| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mplanchard 445 days ago
	What’s 300k tokens in terms of lines of code? Most codebases I’ve worked on professionally have easily eclipsed 100k lines, not including comments and whitespace. But really that’s the vision of actual utility that I imagined when this stuff first started coming out and that I’d still love to see: something that integrates with your editor, trains on your giant legacy codebase, and can actually be useful answering questions about it and maybe suggesting code. Seems like we might get there eventually, but I haven’t seen that we’re there yet.

2 comments

simonw 445 days ago

We hit "can actually be useful answering questions about it" within the last ~6 months with the introduction of "reasoning" models with 100,000+ token contest limits (and the aforementioned Gemini 1 million/2 million models).

The "reasoning" thing is important because it gives models the ability to follow execution flow and answer complex questions that down many different files and classes. I'm finding it incredible for debugging, eg: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8...

I built a files-to-prompt tool to help dump entire codebases into the larger models and I use it to answer complex questions about code (including other people's projects written in languages I don't know) several times a week. There's a bunch of examples of that here: https://simonwillison.net/search/?q=Files-to-prompt&sort=dat...

link

jjani 445 days ago

How much lines of context and understanding can we, human developers, keep in our heads, taken into account and refer to when implementing something?

Whatever the amount may be, it definitely fits into 300k tokens.

link

mplanchard 444 days ago

After more than a few years working on a codebase? Quite a lot. I know which interfaces I need and from where, what the general areas of the codebase are, and how they fit together, even if I don’t remember every detail of every file.

link