by freakynit 31 days ago
What I have personally observed with such tools is that they make the AIs dumb, similar to how relying more on AI tools makes coders dumb.

These agentic AIs are already smart enough to figure out a highly optimized path for code exploration or search. But with these tools they get very aggressive, partly because the search results from these tools, in almost 100% of cases, do not furnish full details, just pointers.

To confirm this behaviour, I did a small test run. This is in no way conclusive, but the results do align with what I have been observing:

---

Task: trace the full ingestion and search paths in a moderately complex project. The harness is Pi.

1. With "codebase-memory-mcp": 85k/4.4k (input/output tokens).

2. With my own regular setup: 67k/3.2k.

3. Without any of these: 80k/3.2k.

As we see, such a tool made it worse (not by much, but still). The outputs were the same in quality and informational content.
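
For a rough sense of scale, the three results above can be expressed relative to the no-tooling baseline. This is only a small sketch of that arithmetic on the single run per setup reported here; the names `runs` and `relative_change` are made up for illustration.

```python
# (input, output) token counts in thousands, as reported in the test above.
runs = {
    "codebase-memory-mcp": (85.0, 4.4),
    "regular setup": (67.0, 3.2),
    "no tooling": (80.0, 3.2),
}


def relative_change(value: float, baseline: float) -> float:
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline_in, baseline_out = runs["no tooling"]
for name, (tokens_in, tokens_out) in runs.items():
    change_in = relative_change(tokens_in, baseline_in)
    change_out = relative_change(tokens_out, baseline_out)
    print(f"{name}: input {change_in:+.1f}%, output {change_out:+.1f}%")
```

With one run per setup, these percentages are indicative at best.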

---

Now, what is the "regular setup" mentioned above?

Just one line in AGENTS.md and CLAUDE.md: "Start by reading PROJECT.md".

And PROJECT.md contains just the following: a 2-3 line description of the project, all relevant files with a one-line description of each, any nuances, and finally this closing section:

    ## To LLM
    Update this file if the changes you have done are worth updating here. The intent of this file is to give you a rough idea of the project, from where you can explore further, if needed.
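
A setup like this could be scaffolded with a few lines of Python. This is a hedged sketch, not the author's actual tooling: the function names, the `## Files` and `## Nuances` headings, and the example file entries are all assumptions; only the pointer line and the "To LLM" note are taken from the comment above.

```python
from pathlib import Path

# Taken from the comment above: the one line placed in AGENTS.md and CLAUDE.md.
POINTER_LINE = "Start by reading PROJECT.md"

# Taken from the comment above: the closing section of PROJECT.md.
LLM_NOTE = (
    "## To LLM\n"
    "Update this file if the changes you have done are worth updating here. "
    "The intent of this file is to give you a rough idea of the project, "
    "from where you can explore further, if needed."
)


def render_project_md(description: str, files: dict[str, str], nuances: list[str]) -> str:
    """Build PROJECT.md text: description, file index, nuances, LLM note."""
    lines = [description.strip(), "", "## Files"]  # heading name is an assumption
    lines += [f"- `{path}`: {summary}" for path, summary in files.items()]
    if nuances:
        lines += ["", "## Nuances"]  # heading name is an assumption
        lines += [f"- {item}" for item in nuances]
    lines += ["", LLM_NOTE]
    return "\n".join(lines) + "\n"


def write_setup(root: Path, project_md: str) -> None:
    """Write PROJECT.md plus the two one-line pointer files into root."""
    (root / "PROJECT.md").write_text(project_md, encoding="utf-8")
    for pointer in ("AGENTS.md", "CLAUDE.md"):
        (root / pointer).write_text(POINTER_LINE + "\n", encoding="utf-8")


# Hypothetical project contents, for illustration only.
example = render_project_md(
    "Small search service. Ingests documents and serves queries.",
    {"ingest.py": "reads and indexes documents", "search.py": "query handling"},
    ["Index must be rebuilt after schema changes."],
)
print(example)
```

Calling `write_setup(Path("."), example)` would write the three files into the current directory; after that, the agent is expected to keep PROJECT.md current on its own.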

> These agentic AI's are already smart enough to figure out a highly optimized path to code exploration or search.

Hasn't been my experience. We used to use Augment Code at work which has a thing called Context Engine - basically an MCP that can answer natural language queries about pre-indexed code. Then we switched to Claude Code, which for some reason prefers to use sed to read from files using line ranges from its own memory (this despite having a range-capable read tool). I don't know, does that really mean that sed is the highly optimized path?

Lol... I noticed it does weird stuff sometimes. I'll see it generate a python script inline on the CLI to edit files. Like... Yo what the fuck? It literally used the edit tool until 5 turns ago.

Also, it'll run a formatter, read, edit to undo auto formatting and then continue on its merry way. What is the point of that??? Lol

I’ve seen Claude write a one-line Perl script to pull items from a JSON file.
Hey, codebase-memory-mcp and semble are not exactly the same, but it's an interesting comparison. I'll put it on the to-do list to check that out and add it to our benchmarks if feasible. If you ever get a chance to run this same comparison with semble, it would be super useful feedback, since these "real" scenarios are hard to benchmark or replicate.
So, I just tested with semble. Your MCP integration did not work and kept throwing an error (Failed to connect to "semble": MCP error -32000: Connection closed), even though I installed it in the documented manner (tried both the pip and uv methods).

Anyways, I made it work by having it generate the relevant doc (using semble init), copying that into AGENTS.md, and then prompting it with this line:

""" Start by reading AGENTS.md in current folder. Now, the task::: `Explore the ingestion and search paths. Do not read README.md at all`. Prefer to use `semble` search for code search. Do not do new installation. semble is already available at `/Users/nitinbansal/.local/bin/semble` . """

The results are much better, even better than my own setup, but they vary a lot. I did 4 runs (input/output tokens):

95k/2.9k

25k/2.7k

71k/2.9k

37k/4.0k
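
To put a number on "vary a lot", here is a minimal sketch that summarizes the four runs above with Python's `statistics` module. The variable names are illustrative, and four runs is far too few for anything beyond a rough impression.

```python
from statistics import mean, stdev

# (input, output) tokens in thousands for the four semble runs listed above.
semble_runs = [(95.0, 2.9), (25.0, 2.7), (71.0, 2.9), (37.0, 4.0)]

# Input tokens of the single "regular setup" run reported earlier in the thread.
regular_setup_input = 67.0

inputs = [tokens_in for tokens_in, _ in semble_runs]
input_mean = mean(inputs)
input_stdev = stdev(inputs)  # sample standard deviation (divides by n - 1)
runs_above_regular = sum(1 for tokens_in in inputs if tokens_in > regular_setup_input)

print(f"mean input: {input_mean:.1f}k, stdev: {input_stdev:.1f}k")
print(f"min/max input: {min(inputs):.0f}k / {max(inputs):.0f}k")
print(f"runs above the regular setup: {runs_above_regular} of {len(inputs)}")
```

On these four runs the mean input is 57k with a sample standard deviation of roughly 32k, so the spread is more than half the mean, and two of the four runs come in above the single 67k run of the PROJECT.md setup.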

Sorry to hear about the MCP integration, that's definitely something we'll look into. If you have any info about your system or how to reproduce it please let me know. Very nice to hear about the results, thanks for checking this! The variance is interesting to see, that's probably non-determinism in the LLM rather than semble since semble is deterministic. But I'm guessing we can make that better with the prompt, I'll look into this.
uv failed for me once, but then worked the second time. I think it has to do with uv just taking a while to install the first time. Maybe if you pre-run the installation it'll work better.
I'm seeing over and over again people claiming absurd optimizations for coding agents:

> Our tool uses 99x fewer tokens and delivers 88x better results.

Okay, great, but...

1) It's VERY difficult to quantify that something is better.

2) They almost never post how they measured how much better it is and what the margin of error might be.

3) I assume they are incompetent and don't even try the tool.

Like you pointed out, the odds that these things make agents worse are FAR higher than the odds that they make them better.

Not saying it's impossible, but if it was possible on the scales they are claiming, it probably would already be done, or put into the next release of the agents...

Hey, this skepticism is fair and we share it, which is why we don't claim end-to-end agent improvements since we haven't measured those (yet). The benchmark we published measures retrieval quality and token count during search, not overall agent performance. We are working on agent-level evals, but those are unfortunately much harder to get right. However, we do believe that Semble makes agents better based on our own experience of using it for the past months while in development (or at the very least, cheaper).
> We are working on agent-level evals, but those are unfortunately much harder to get right.

It's unfortunately a nearly impossible task, as the models change regularly (without letting you know), so you have a moving (invisible) target that's 1) hard to test exhaustively, and 2) very expensive to test with any low margin of error.
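
To illustrate the cost side, here is a back-of-the-envelope sketch using the standard sample-size formula n = (z * sigma / E)^2. It assumes independent runs and a normal approximation, and it plugs in a run-to-run standard deviation of about 32k input tokens, which is roughly what the four semble runs posted earlier in this thread (95k, 25k, 71k, 37k) work out to. The function name is made up for illustration.

```python
import math


def runs_needed(stdev: float, margin: float, z: float = 1.96) -> int:
    """Runs needed so the mean lands within +/- margin at ~95% confidence.

    Uses n = (z * stdev / margin)^2, assuming independent runs and a
    normal approximation for the mean.
    """
    return math.ceil((z * stdev / margin) ** 2)


# Assumption: ~32k standard deviation in input tokens per run (in thousands).
assumed_stdev = 32.0

for margin in (10.0, 5.0):
    n = runs_needed(assumed_stdev, margin)
    print(f"+/- {margin:.0f}k margin: about {n} runs per configuration")
```

Under those assumptions, pinning mean token usage down to within 10k takes about 40 runs per configuration, and within 5k about 158, for a single task on a single model version.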

This is why no one does it and just makes broad sweeping unverified claims instead.

If you figure out how to do it... You should probably just get a job at Anthropic or OpenAI and make $2M+ per year...

I found this prompt works well to nudge it to use a better grep at the start, and then just keep using grep (Cursor's instant grep in my case):

```
- For planning, prefer using morph-mcp `codebase_search` - subagent that takes in a search string and tries to find relevant context. Best practice is to use it at the beginning of codebase explorations to fast track finding relevant files/lines. Do not use it to pin point keywords, but use it for broader semantic queries. "Find the XYZ flow", "How does XYZ work", "Where is XYZ handled?", "Where is <error message> coming from?"
```

(see also https://news.ycombinator.com/item?id=48205911; having higher quality results at the beginning of a thread seems to improve the output vs. having faster search later on).

> Just one line in AGENTS.md and CLAUDE.md: "Start by reading PROJECT.md" .

> And PROJECT.md contains...

…Why not just use that PROJECT.md as the AGENTS/CLAUDE.md?

Doing that would mean copying and keeping two files in sync, AGENTS.md and CLAUDE.md, since I use both Claude and others interchangeably in one project.

Also, I don't want to keep my project's details in those files; I want to keep them separate.

With the current setup, a single line in both satisfies all constraints and requirements.

I use AGENTS.md as the "global" agent file, then use CLAUDE.md as a light wrapper with Claude-specific instructions that ends with an instruction to read AGENTS.md.
That works better. I just personally avoid touching global configs, hence, the project specific copy-pasting of two files.