| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by elcritch 432 days ago
	Yet despite this all the LLMS I've tried struggle to scale beyond much more than a single module. They're vast improvements on that test perhaps, but in real life they still struggle to be coherent over larger projects and scales.

2 comments

bckr 432 days ago

> struggle to scale beyond much more than a single module

Yes. You must guide coding agents at the level of modules and above. In fact, you have to know good coding patterns and make these patterns explicit.

Claude 4 won’t use uv, pytest, pydantic, mypy, classes, small methods, and small files unless you tell it to.

Once you tell it to, it will do a fantastic job generating well-structured, type-checked Python.

link

viraptor 432 days ago

Those are different kind of issues. Improving the quality of actions is what we're seeing here. Then for the larger projects/contexts the leaders will have to battle it out between the improved agents, or actually moving to something like RWKV and processing the whole project in one go.

link

morsecodist 432 days ago

They may be different kinds of issues but they are the issues that actually matter.

link