HN Mirror



	by asteroidz 1178 days ago
	The problem with your suggested approach is the resulting lack of holistic context. The problem with OP's approach (direct parsing) is cost and context-window limits. There has to be a better way.


jasonjmcghee 1178 days ago

It's searching by semantic meaning, so it should be able to find all relevant pieces. Using overlap during chunking should help too.

Using the "give it everything" method will cause it to forget most of what you're feeding it if you have a large repo anyway, right?
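A minimal sketch of chunking with overlap plus top-k similarity retrieval. Everything named here (`chunk_text`, `embed`, `cosine`, `top_k`, the chunk size and overlap values) is hypothetical. In particular, `embed` is a bag-of-words stand-in, so this retrieves by word overlap, not true semantic meaning. A real setup would swap in an actual embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into windows of `size` chars; each window repeats the
    last `overlap` chars of the previous one so context isn't cut mid-thought."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window reached the end
            break
    return chunks


def embed(text):
    # ASSUMPTION: stand-in for a real embedding model (word counts only).
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, chunks, k=3):
    # Rank chunks by similarity to the query; only the best k go in the prompt.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

For example, `chunk_text("abcdefghij", size=4, overlap=2)` gives `["abcd", "cdef", "efgh", "ghij"]`, and only the chunks returned by `top_k` are sent to the model, which keeps the prompt small no matter how large the repo is.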


deathmonger5000 1178 days ago

I don't think it will forget anything as long as everything fits in the context window, but I could totally be wrong. That's the big problem with the "give it everything" approach: if your codebase doesn't fit then it's game over. I've had success limiting what I give it to the relevant files.


jasonjmcghee 1178 days ago

Right, "forgetting" assumes a rolling context window capped at the model's max token count. As for "if it doesn't fit then it's game over": assuming 8k tokens at roughly 4 characters per token, that's only about 32k characters, which is a pretty small repo. That's one motivation behind the similarity-search approach.
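A rough sketch of the back-of-envelope check above, assuming the ~4 characters/token heuristic and an 8k (taken as 8,000) token window. The function names and the `reserve` parameter (room left for the question and the answer) are hypothetical. Real tokenizers vary, especially for code, so treat the numbers as estimates only.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    # Heuristic from the thread: roughly 4 characters per token.
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(files, max_tokens=8_000, reserve=0, chars_per_token=4):
    """Return (fits, total_tokens) for a dict mapping path -> file contents.
    `reserve` is a hypothetical allowance for the prompt and the reply."""
    total = sum(estimate_tokens(t, chars_per_token) for t in files.values())
    return total <= max_tokens - reserve, total


# Total character budget: 8,000 tokens * 4 chars/token = 32,000 characters.
MAX_CHARS = 8_000 * 4
```

So a repo with more than about 32,000 characters of source (roughly 32 KB of ASCII) already overflows the window before any question is asked, which is why retrieving only the relevant chunks is attractive.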
