| I built myself an AST-based solution for that over roughly the last six months. I always wondered whether grep and agent-based discovery would be the end of it, and thought a more deterministic approach just had to be better. In the end it's hard to measure, but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it. I used a different approach than tree-sitter because I thought I had found a nice way to get around writing language-specific code. I use VSCode as a language backend and wrote some logic to rebuild the AST from VSCode's symbol data and other APIs. That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information, which most do through LSP. In the end it was way more effort than just using tree-sitter, however, and I'm thinking of slowly migrating to that approach sooner or later. Anyway, I created an extension that spins up an MCP server and provides several tools that replace the vanilla discovery tools in my workflow. The approach is similar to yours: I have an overview tool which runs different centrality ranking metrics over the whole codebase to get the most important symbols and presents that as an architectural overview to the LLM. Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code, which completely replaces grepping and file reading for me. The tool also lists which other symbols call the one in question and which symbols it calls in turn. 
But yeah, sorry for this already quite long comment. If you want to give it a try, I published it on the VSCode marketplace a couple of days ago, and it's basically free right now, although I have to admit that I still want to try to earn a little bit of money with it at some point. Right now the usage limit is 2000 tool calls per day, which should be enough for anybody. Would love to hear what you think :) <https://marketplace.visualstudio.com/items?itemName=LuGoSoft...> |
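To make the two tools described above a bit more concrete, here is a minimal sketch in Python. It is not the extension's actual code: the comment does not say which centrality metrics are used, so plain PageRank is an assumption, and the call graph, the symbol names, and the `symbol_context` helper are all made up for illustration.

```python
# Hypothetical call graph: caller -> list of callees (all names are made up).
CALLS = {
    "main": ["load_config", "run"],
    "run": ["parse", "emit"],
    "parse": ["tokenize"],
    "emit": ["format_node"],
    "load_config": ["parse"],
    "tokenize": [],
    "format_node": [],
}


def pagerank(calls, damping=0.85, iterations=50):
    """Power-iteration PageRank over a caller -> callees graph."""
    nodes = sorted(set(calls) | {c for cs in calls.values() for c in cs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by symbols that call nothing is spread evenly over all symbols.
        dangling = sum(rank[node] for node in nodes if not calls.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for caller in nodes:
            callees = calls.get(caller, [])
            for callee in callees:
                new[callee] += damping * rank[caller] / len(callees)
        rank = new
    return rank


def symbol_context(calls, name):
    """Return what a symbol calls and which symbols call it."""
    callers = sorted(c for c, cs in calls.items() if name in cs)
    return {"symbol": name, "calls": sorted(calls.get(name, [])), "called_by": callers}


ranks = pagerank(CALLS)
# The top-ranked symbols would form the "architectural overview".
overview = sorted(ranks, key=ranks.get, reverse=True)[:3]
```

Calling `symbol_context(CALLS, "parse")` gives the callers and callees of `parse` in one lookup, which is the kind of answer an agent would otherwise piece together with several greps.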
The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.
I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.
I also really like the name "Context Master."
In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.
I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.
In that case, each item in the code map could carry both the LLM response info and some deterministic info, for example from tree-sitter.
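A rough sketch of what such a mixed code map entry could look like, in Python. To keep it runnable without extra dependencies it uses Python's built-in `ast` module as a stand-in for tree-sitter, and the `CodeMapEntry` class, its field names, and the sample source are all hypothetical.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CodeMapEntry:
    name: str
    llm_summary: str = ""  # filled in by the agentic indexing pass
    kind: str = ""         # deterministic: "function" or "class"
    line: int = 0          # deterministic: line where the symbol is defined
    calls: list = field(default_factory=list)  # deterministic: plain names it calls


def deterministic_entries(source):
    """Build entries for top-level functions and classes from the parsed source."""
    entries = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # Only direct name calls such as foo(); method calls like x.foo() are skipped.
            calls = sorted({
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
            entries[node.name] = CodeMapEntry(node.name, kind=kind, line=node.lineno, calls=calls)
    return entries


SAMPLE = '''
def tokenize(text):
    return text.split()

def parse(text):
    return len(tokenize(text))
'''

code_map = deterministic_entries(SAMPLE)
# Placeholder string standing in for whatever the LLM would return for this item.
code_map["parse"].llm_summary = "Counts tokens in the input text."
```

The deterministic fields can be regenerated cheaply on every change, while the LLM summary only needs refreshing when the symbol itself changes.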
That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.
Anyway, what you built looks great, and if it works for you, that's what matters.