by Jweb_Guru 35 days ago
This jives with what I've experienced in the brief time I had access to 5.5 Pro. It's the very first LLM that I feel like I can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided, but it does a pretty good job of tracing its own reasoning and correcting itself in a way that the other models do not.

The downside (not noted in the article, but noted by others here) is cost. It uses tokens at an insane rate, the tokens cost a lot, and using it with subagent flows that you can use to have it tackle large problems with high accuracy costs even more. It is also much "slower" for large scale problems because of context limitations -- it has to constantly rediscover context for each part of the problem, and in order to make it accurate you need to wipe its context before progressing to the next small part, or launch even more agents. For mathematical proofs like these, where the required context to understand the problem and proof besides stuff that's already available in its training set is small and the problems are considered "important" enough, this might not be a problem, but for many of the tasks I would like to use it for (ensuring correctness of code that affects large codebases, or validating subtle assumptions) it definitely is one.

So I think it will be a while before the impressive capabilities of these models really percolate into our lives as programmers, unless you're one of the lucky ones given unlimited access to 5.5 Pro.


> It's the very first LLM that I feel like I can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided, but it does a pretty good job of tracing its own reasoning and correcting itself in a way that the other models do not.

I swear that people have said the same thing with effectively every new model that came out in the last six months.

I think it's because people walk every model up to its limits and become very aware of a task they can't make work. They do a lot of work simplifying and understanding limitations at that boundary. Then an improved model comes out and they immediately toe that barrier and make swift progress. They will also notice that the new model is natively doing tricks they had done manually.

The reality is likely that everyone is hitting similar barriers and the solutions are somewhat generalizable and get added to training new models.

Eventually people will reach the new limits and the cycle repeats.

> I swear that people have said the same thing with effectively every new model

That is definitely true, and at the same time, we can measure progress by who is making that claim. When Timothy Gowers, a Fields Medalist, says that models are now capable of "producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me," we can be pretty confident that we are getting into seriously interesting territory.

Many people may have, but I certainly haven't.
They did. The scam continues.
The conspiracy theory mindset, is there anything it can't explain away?
> This jives with what I've experienced

Just as an fyi, the word you are looking for is jibes. Jive is something else entirely.

I'm with you!
Oh look it's jibe's account!
Excuse me stewardess, I speak jive.
I'm going to start using malapropisms so people know I didn't use an llm to write things
Cut me some slack, Jack.
"Jive" is anachronistic black American slang for bullshit.
Interesting, I did not know that. I would have used jives :) thanks
That ship sailed looooong ago.
> looooong

Just as an fyi, the words you are looking for are ages/eons/an eternity.

If we are having this meta-discussion, you can usually guess a person's age by which letter they are elongating. The Millennial generation uses the vowel (as above) but gen alpha elongates the final consonant - "longggg". Doesn't add anything to the convo, just an interesting tidbit.
What has HN become…?
Same as it ever was
...Talking Heads
hooo nooo
The only thing worse than complaining about this is being the guy complaining about the guy complaining about this. So congratulations on being second most annoying.
oh the irony
> can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided,

I don’t know about the rest of y’all but I find “rigidly guiding” LLMs incredibly tedious and frustrating, in the same way that seeing an error code thrown for the 40th time while troubleshooting something on my computer for two hours is frustrating. It also feels somewhat like micromanaging a direct report. I don’t find that process fun or enjoyable in the slightest and it teaches me little in the process. It’s just trading styles of work, and I guess the response to that is “some people prefer that style of work.” I just don’t like being told by the world we all have to work that way now I guess.

I agree. I find it endlessly frustrating and kind of hate what programming has become. But at least for me it meets the minimum bar of "it works if you push things" now. For past models, under no circumstances could I get them to semi-reliably solve these kinds of problems correctly without giving them so many "hints" that they weren't actually saving me time. The kind of reasoning I'm talking about is stuff like "can you actually construct a trace from program start for this condition that looks locally reachable?" Past models simply cannot reliably answer such questions as soon as the control flow involves enough hops or requires tracing through enough function calls.
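
To make the "trace from program start" question concrete, here is a rough sketch (not from the thread, just an illustration). If you treat the program as a call graph, the question becomes: is there a chain of calls from the entry point to the function containing the condition? The call graph, the function names in it, and the `find_trace` helper are all hypothetical.

```python
from collections import deque


def find_trace(call_graph, entry, target):
    """Return one shortest call chain from entry to target, or None if unreachable."""
    parents = {entry: None}  # maps each discovered function to its caller
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the entry point, then reverse.
            trace = []
            while node is not None:
                trace.append(node)
                node = parents[node]
            return trace[::-1]
        for callee in call_graph.get(node, ()):
            if callee not in parents:
                parents[callee] = node
                queue.append(callee)
    return None


# Hypothetical call graph: each key calls the functions in its list.
call_graph = {
    "main": ["parse_args", "run"],
    "run": ["load", "process"],
    "process": ["validate"],
    "legacy_import": ["validate", "unchecked_write"],  # never called from main
}

print(find_trace(call_graph, "main", "validate"))
print(find_trace(call_graph, "main", "unchecked_write"))
```

In this toy graph `unchecked_write` looks locally reachable because it has a caller, but no chain from `main` ever gets there. Real code adds indirect calls and path conditions on top of this, which is presumably where the number of hops starts to hurt.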