HN Mirror

by softwaredoug 2 days ago

In my research, grep is fine if you don’t care about tokens and you have fewer than 100k files. The direct corpus interaction paper [1] shows a breakdown past this level. In my personal experience, grep plus an agent gets you slightly better relevance than a BM25 search engine, but it requires you to eat tokens.

If you think grep is great, it’s because you’ve been social engineered to organize your content to be findable. We document why something is useful to an agent. We put it in a logical place.

Just organizing content is at least half of building search, agentic or not. It’s one reason Google is successful: we’re all trying to make our content findable by the search engine. It’s not all technology :)

[1] https://arxiv.org/abs/2605.05242

3 comments

cpburns2009 2 days ago

> If you think grep is great, it’s because you’ve been social engineered to organize your content to be findable. ...

This is such a strange train of thought. How did you get there?

softwaredoug 2 days ago

I'm not literally saying you were social engineered. I'm saying all the incentives are there for you to organize your content.

Incentives to make things findable are more important to search than any technology.

nh23423fefe 2 days ago

I read that as, "you've learned to insert weird entropy meta-breadcrumbs just for finding"

so if i just index and search then i can stop writing like that?

allan_s 2 days ago

Long before AI, I remember asking people in code review to add comments specifically to make the code grep-able. Same for preferring key-value mappings over dynamic string concatenation.
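
A minimal sketch of the key-value versus concatenation point, in Go. The message catalog, the key names, and the helpers `dynamicKey` and `literalKey` are all hypothetical, made up for illustration. The idea is that a key assembled at runtime never appears as a whole string in the source, so a grep for it misses the call site, while a mapping of literal keys keeps every key findable.

```go
package main

import "fmt"

// messages is a hypothetical message catalog. Each key is written out in
// full here, so grepping for "error.user.not_found" finds this definition.
var messages = map[string]string{
	"error.user.not_found":  "user not found",
	"error.order.not_found": "order not found",
}

// dynamicKey assembles the key at runtime. The full string
// "error.user.not_found" never appears where this is called,
// so grepping for the key will not lead you to the caller.
func dynamicKey(entity string) string {
	return "error." + entity + ".not_found"
}

// notFoundKeys maps each entity to a literal key, so every key that can
// be produced is spelled out somewhere in the source.
var notFoundKeys = map[string]string{
	"user":  "error.user.not_found",
	"order": "error.order.not_found",
}

// literalKey looks the key up instead of building it.
// The second return value reports whether the entity is known.
func literalKey(entity string) (string, bool) {
	key, ok := notFoundKeys[entity]
	return key, ok
}

func main() {
	fmt.Println(messages[dynamicKey("user")])
	if key, ok := literalKey("order"); ok {
		fmt.Println(messages[key])
	}
}
```

Both versions resolve to the same message at runtime. The difference is only in what a text search over the source can see, and the mapping also rejects unknown entities instead of silently producing a key that does not exist.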

piekvorst 2 days ago

The social engineering thing runs deep. For example, if you grep for a “Key” method, chances are the type/class name will be on the same line. This is the case in Go and, I think, in many other programming languages (ironically, not C).

Lines are a fundamental building block of text and it’s not unreasonable to optimize them.
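
To make the Go case concrete, here is a small sketch. The types `Cache` and `Session` are hypothetical, chosen only for illustration. In Go the receiver type is part of the method declaration, and gofmt-style code keeps it on the same line as the method name, so a line-oriented search for the method also shows which type it belongs to.

```go
package main

import "fmt"

// Cache and Session are hypothetical types used only for illustration.
type Cache struct{ prefix string }

type Session struct{ id string }

// The receiver type and the method name share one line, so a single
// grep hit for the method declaration also names the owning type.
func (c Cache) Key() string { return c.prefix + ":cache" }

func (s Session) Key() string { return "session:" + s.id }

func main() {
	fmt.Println(Cache{prefix: "img"}.Key())
	fmt.Println(Session{id: "42"}.Key())
}
```

Assuming the file is formatted as above, something like `grep -n ") Key(" *.go` should match the two declaration lines, each carrying its receiver type, and skip the calls in `main`.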

giancarlostoro 1 day ago

You can minimize the token waste using rtk as a proxy, and Claude will happily use rtk.
