Hacker News new | ask | show | jobs
by matchu 3723 days ago
I'm not clear on why we need to upload the full codebase.

The Privacy Policy indicates that this is because their index of public code is too large to copy to the client machine. So why can't my machine just send queries as needed, or, even better, download the subset of the index relevant to my project?

Maybe it was easier to ship the "local code" search features by building on top of the existing public code indexing, which is server-side. But, despite the extra engineering work, client-side indexing for local code would definitely make a lot more sense for most customers.

1 comments

We really did think a lot about this. It turns out that there is no clean separation between "index code" and "query for results". E.g. In Python if you see x.foo() you may need to know a _lot_ about the universe of Python libraries to do the type inference required to figure out what foo is in that particular expression.
I'm not sure that addresses the question; likely I'm missing something. Why does it need to send the user's entire codebase over? If it's a question of needing to analyze a given source file to infer types so query results are relevant, then why not do that inference work client side and ship back just the type information to the server where the query is run?

Even if that is a step too far, you could strip out string literals (e.g. API keys), and obfuscate variable, method, type, and file names while retaining a map on the client side so results line up.

That seems possible with a strongly-typed language, but for something like Python, they'd basically have to recompile the code on every keystroke, right?
Wouldn't they basically need to do this (compile the code) even if they have all of it?