| HN Mirror


by johnsmith1840 397 days ago

I seem to be alone in this, but the only models truly good at coding are the slow, heavy test-time-compute models.

o1-pro and o1-preview are the only models I've ever used that can reliably update and work with 1000 LOC without error.

I don't let o3 write any code unless it's very small. Any "cheap" model will hallucinate or fail massively when pushed.

One good tip I've been using lately: remove all comments from your code before passing it to an LLM, and don't let LLM-generated comments persist under any circumstance.
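For anyone who wants to try the comment-stripping tip, here is a minimal sketch, assuming the code is Python (the post doesn't name a language). The function name `strip_comments` is made up for illustration. It uses the standard-library `tokenize` module so that `#` characters inside string literals are left alone, and it does not touch docstrings, which are strings rather than comments.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Remove '#' comments from Python source (hypothetical helper).

    String literals and docstrings are kept as-is. Lines that held
    nothing but a comment are dropped entirely.
    """
    # Read lines the same way the tokenizer does so row numbers match.
    lines = io.StringIO(source).readlines()
    touched = set()

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            idx = row - 1  # token rows are 1-indexed
            # A comment always runs to the end of its physical line,
            # so cut the line at the comment's start column.
            newline = "\n" if lines[idx].endswith("\n") else ""
            lines[idx] = lines[idx][:col].rstrip() + newline
            touched.add(idx)

    # Drop only the lines that became empty because a comment was removed;
    # blank lines that were already in the source stay put.
    kept = [
        line
        for i, line in enumerate(lines)
        if not (i in touched and line.strip() == "")
    ]
    return "".join(kept)


if __name__ == "__main__":
    example = 'x = 1  # set x\n# whole-line comment\ns = "# not a comment"\n'
    print(strip_comments(example), end="")
```

Run it on a copy of the file before pasting into the model. Other languages would need their own tokenizer or parser, since a plain regex can't tell a comment from a string.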

2 comments

_bin_ 397 days ago

Interesting. I've never tested o1-pro because it's insanely expensive but preview seemed to do okay.

I wouldn't be shocked if huge, expensive-to-run models performed better and if all the "optimized" versions were actually labs trying to ram cheaper bullshit down everyone's throat. Basically chinesium for LLMs; you can afford them but it's not worth it. I remember someone saying o1 was, what, 200B dense? I might be misremembering.


johnsmith1840 397 days ago

I'm positive they are pushing users to cheaper models due to cost. o1-pro is now in a submenu for Pro users and labeled legacy. The big inference methods must be stupidly expensive.

o1-preview was, and possibly still is, the most powerful model they ever released. I only switched to pro for coding after months of them improving it and my API bill getting a bit crazy (like $0.50 per question).

I don't think parameter count matters anymore. I think the only thing that matters is how much compute a vendor will give you per question.


doug_durham 396 days ago

I never have LLMs work on 1000 LOC. I don't think that's the value-add. Instead I use them at the function and class level to accelerate my work. The thought of having any agent, human or computer, run amok in my code makes me uncomfortable. At the end of the day I'm still accountable for the work, and I have to read and comprehend everything. If I do it piecewise, it makes tracking the work easier.


johnsmith1840 395 days ago

Big test-time-compute LLMs can easily handle 1k LOC, depending on logic density and prompt density.

Never an agent: every independent step an LLM takes is dangerous. My method is much more about taking the largest safe single step possible at a time. If it can't do it in one step, I narrow down until it can.
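One way to picture the "narrow down until it can" loop is the rough sketch below. Everything in it is an assumption for illustration: the name `narrow_until_safe`, the `attempt` callback (imagine it sends one chunk to the model and returns whether you accepted the result after review), and the choice to narrow by halving, which the comment above doesn't specify.

```python
from typing import Callable, List, Sequence


def narrow_until_safe(
    units: Sequence[str],
    attempt: Callable[[Sequence[str]], bool],
) -> List[List[str]]:
    """Group work units into chunks that `attempt` accepts (hypothetical helper).

    Tries the whole range as a single step first. If that fails, the range
    is split in half and each half is tried on its own, recursively.
    """
    if not units:
        return []
    if attempt(units):
        # The whole range worked as one step, so keep it as one chunk.
        return [list(units)]
    if len(units) == 1:
        # Nothing left to narrow: this unit needs a human.
        raise ValueError(f"could not complete even a single unit: {units[0]!r}")
    mid = len(units) // 2
    return narrow_until_safe(units[:mid], attempt) + narrow_until_safe(
        units[mid:], attempt
    )


if __name__ == "__main__":
    # Stand-in for a real review step: pretend any step touching more
    # than two functions fails.
    functions = ["parse", "validate", "transform", "save"]
    print(narrow_until_safe(functions, lambda chunk: len(chunk) <= 2))
```

Each call to `attempt` is a single, reviewed step, so nothing here runs unattended; the loop only decides how big the next step should be.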
