by dcreager 1536 days ago
Note that this article describes our implementation of “search-based” or “ctags-like” Code Navigation, which definitely has the imprecision that you describe. We've also been working over the previous ~year on a framework called Stack Graphs [1,2,3], which lets us tackle “precise” Code Navigation while still having the zero-config and incremental aspects that are described in the paper.
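
To make the imprecision of the search-based approach concrete, here is a minimal sketch of name-based lookup. It is not the implementation from the article: the regex, the `build_index` and `find_definitions` helpers, and the sample files are all hypothetical, and the pattern only recognizes Python-style `def`/`class` lines.

```python
import re
from collections import defaultdict

# Hypothetical pattern: only recognizes Python-style `def name` / `class name` lines.
DEF_PATTERN = re.compile(r"^\s*(?:def|class)\s+([A-Za-z_]\w*)", re.MULTILINE)


def build_index(files):
    """Map each defined name to every (path, line) where a definition appears."""
    index = defaultdict(list)
    for path, text in files.items():
        for match in DEF_PATTERN.finditer(text):
            # Count newlines before the name itself to get a 1-based line number.
            line = text.count("\n", 0, match.start(1)) + 1
            index[match.group(1)].append((path, line))
    return index


def find_definitions(index, name):
    """Search-based lookup: match on the bare name, with no scope or type info."""
    return index.get(name, [])


# Made-up sample files, used only for illustration.
files = {
    "a.py": "class Parser:\n    def parse(self):\n        pass\n",
    "b.py": "def parse(text):\n    return text\n",
}
index = build_index(files)
print(find_definitions(index, "parse"))
```

Both `parse` definitions come back for a single query, because the lookup cannot tell which one a given call site actually refers to. Each file is indexed independently, though, so only changed files need reprocessing.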

The build-based approach that you describe is also used by the Language Server Protocol (LSP) ecosystem. You've summarized the tradeoffs quite well! I've described a bit more about why we decided against a build-based/LSP approach here [4]. One of the biggest deciding factors is that at our scale, incremental processing is an absolute necessity, not a nice-to-have.

[1] https://github.blog/2021-12-09-introducing-stack-graphs/

[2] https://dcreager.net/talks/2021-strange-loop/

[3] https://news.ycombinator.com/item?id=29500602

[4] https://news.ycombinator.com/item?id=29501824


I've read about stack graphs before; they sound interesting!

I think they help, but ultimately I expect you need a compiler to solve the absolute madness of the totality of C++. For example, I think getting argument-dependent lookup right in the presence of 'auto' requires type information? And there are other categories of things (like header search paths) where I think you are forced to involve the build system too.

Yup, it is probably fair to say that C++ accounts for like 50% of the complexity of Kythe at Google. Or certainly it feels like it.

And it is also worth noting that Kythe goes a bit deeper than what LSP can accomplish. In particular Kythe is built around a sort of two-layer graph, where it separates the physical code/line representation from a more abstract semantic graph. This allows us to accomplish some things that are very difficult to do in LSP.
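
A rough sketch of what a two-layer graph could look like, purely for illustration. The class names, fields, and edge maps below are hypothetical and are not Kythe's actual schema; the point is only that text spans and semantic entities live in separate layers connected by edges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """Physical layer: a span of text in a file (hypothetical shape)."""
    path: str
    line: int
    start_col: int
    end_col: int


@dataclass(frozen=True)
class SemanticNode:
    """Semantic layer: an abstract entity, independent of where it is written."""
    kind: str
    qualified_name: str


@dataclass
class TwoLayerGraph:
    # Edges from the physical layer into the semantic layer.
    defines: dict = field(default_factory=dict)     # Anchor -> SemanticNode
    references: dict = field(default_factory=dict)  # Anchor -> SemanticNode

    def add_definition(self, anchor, node):
        self.defines[anchor] = node

    def add_reference(self, anchor, node):
        self.references[anchor] = node

    def definitions_for(self, anchor):
        """Go from a reference span to definition spans via the semantic node."""
        node = self.references.get(anchor)
        if node is None:
            return []
        return [a for a, n in self.defines.items() if n == node]

    def references_to(self, node):
        """All spans that refer to a semantic node, across files."""
        return [a for a, n in self.references.items() if n == node]


# Made-up example: one function defined in a header, referenced from a source file.
graph = TwoLayerGraph()
fn = SemanticNode(kind="function", qualified_name="util::parse")
definition_span = Anchor("util.h", 10, 5, 10)
call_span = Anchor("main.cc", 42, 12, 17)
graph.add_definition(definition_span, fn)
graph.add_reference(call_span, fn)
print(graph.definitions_for(call_span))
```

Because queries hop through the semantic node rather than matching on text, two identically named functions would resolve to different nodes, and a question like "all references to this entity" does not depend on any single file's layout.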

Finally, Kythe at least internally has a big reliance on a unified build system (Blaze, or Bazel). It becomes rapidly more difficult to do when you have to hook in N different build systems up front, which is why search-based references are so appealing. Build integration is hard.

Has Tree Sitter been useful to projects like this? Does it have promise to be useful in the future? It seems to be gaining a lot of adoption among Neovim users and plugin developers, but not really anywhere else. I'm curious if that's because of lack of familiarity, or because it's technically deficient somehow.

In short, yes, very much so! Tree-sitter is what we're using under the covers to parse all of the languages that we support. The ctags-like symbol extraction described in the paper comes straight from tree-sitter, too. [1]

[1] https://tree-sitter.github.io/tree-sitter/code-navigation-sy...