

by Jweb_Guru 35 days ago

This jives with what I've experienced in the brief time I had access to 5.5 Pro. It's the very first LLM that I feel like I can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided, but it does a pretty good job of tracing its own reasoning and correcting itself in a way that the other models do not.

The downside (not noted in the article, but noted by others here) is cost. It uses tokens at an insane rate, the tokens cost a lot, and using it with the subagent flows that let it tackle large problems with high accuracy costs even more. It is also much "slower" for large-scale problems because of context limitations -- it has to constantly rediscover context for each part of the problem, and to keep it accurate you need to wipe its context before progressing to the next small part, or launch even more agents. For mathematical proofs like these, where little context is needed to understand the problem and proof beyond what's already in its training set, and where the problems are considered "important" enough, this might not be a problem, but for many of the tasks I would like to use it for (ensuring correctness of code that affects large codebases, or validating subtle assumptions) it definitely is one.

So I think it will be a while before the impressive capabilities of these models really percolate into our lives as programmers, unless you're one of the lucky ones given unlimited access to 5.5 Pro.

3 comments

elAhmo 34 days ago

> It's the very first LLM that I feel like I can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided, but it does a pretty good job of tracing its own reasoning and correcting itself in a way that the other models do not.

I swear that people have said the same thing with effectively every new model that came out in the last six months.

fluidcruft 34 days ago

I think it's because people walk every model up to its limits and become very aware of a task they can't make work. They do a lot of work simplifying and understanding limitations at that boundary. Then an improved model comes out and they immediately toe that barrier and make swift progress. They will also notice that the new model is natively doing tricks they had done manually.

The reality is likely that everyone is hitting similar barriers and the solutions are somewhat generalizable and get added to training new models.

Eventually people will reach the new limits and the cycle repeats.

fasterik 34 days ago

> I swear that people have said the same thing with effectively every new model

That is definitely true, and at the same time, we can measure progress by who is making that claim. When Timothy Gowers, a Fields Medalist, says that models are now capable of "producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me," we can be pretty confident that we are getting into seriously interesting territory.

Jweb_Guru 34 days ago

Many people may have, but I certainly haven't.

hackable_sand 34 days ago

They did. The scam continues.

pfdietz 30 days ago

The conspiracy theory mindset, is there anything it can't explain away?

y1n0 35 days ago

> This jives with what I've experienced

Just as an fyi, the word you are looking for is jibes. Jive is something else entirely.

jibe 35 days ago

I'm with you!

shnock 35 days ago

Oh look it's jibe's account!

pfdietz 34 days ago

Excuse me stewardess, I speak jive.

jfaat 34 days ago

I'm going to start using malapropisms so people know I didn't use an llm to write things

boring-human 35 days ago

Cut me some slack, Jack.

billfor 34 days ago

Blame The Bee Gees: https://en.wikipedia.org/wiki/Jive_Talkin'

pessimizer 34 days ago

"Jive" is anachronistic black American slang for bullshit.

bicepjai 35 days ago

Interesting, I did not know that. I would have used jives :) thanks

refulgentis 35 days ago

That ship sailed looooong ago.

ignoramous 35 days ago

> looooong

Just as an fyi, the words you are looking for are ages/eons/an eternity.

ricardobayes 34 days ago

If we are having this meta-discussion, you can usually guess a person's age by which letter they are elongating. The Millennial generation uses the vowel (as above), but Gen Alpha elongates the syllable - "longggg". Doesn't add anything to the convo, just an interesting tidbit.

hooo 35 days ago

What has HN become…?

sdwr 34 days ago

Same as it ever was

mimentum 33 days ago

...Talking Heads

Culonavirus 34 days ago

hooo nooo

idiotsecant 35 days ago

The only thing worse than complaining about this is being the guy complaining about the guy complaining about this. So congratulations on being second most annoying.

xdavidliu 34 days ago

oh the irony

Forgeties79 34 days ago

> can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided,

I don’t know about the rest of y’all but I find “rigidly guiding” LLMs incredibly tedious and frustrating in the same way seeing an error code throw for the 40th time while troubleshooting something on my computer for two hours is frustrating. It also feels somewhat like micromanaging a direct report. I don’t find that process fun or enjoyable in the slightest and it teaches me little in the process. It’s just trading styles of work, and I guess the response to that is “some people prefer that style of work.” I just don’t like being told by the world we all have to work that way now I guess.

Jweb_Guru 34 days ago

I agree. I find it endlessly frustrating and kind of hate what programming has become. But at least for me it meets the minimum bar of "it works if you push things" now. For past models, under no circumstances could I get them to semi-reliably solve these kinds of problems correctly without giving them so many "hints" that they weren't actually saving me time. The kind of reasoning I'm talking about is stuff like "can you actually construct a trace from program start for this condition that looks locally reachable?" Past models simply cannot reliably answer such questions as soon as the control flow involves enough hops or requires tracing through enough function calls.
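
For readers unfamiliar with that kind of question, here is a minimal sketch of what "constructing a trace from program start" can look like when the program is reduced to a call graph. Everything in it is an assumption made for illustration: the graph, the function names, and the helper `find_trace` are hypothetical, and it is not a description of how any model or tool actually does this. It also only checks structural reachability; a real analysis would additionally have to show that the branch conditions along the path can all hold at once.

```python
from collections import deque

def find_trace(call_graph, entry, target):
    """Breadth-first search for one shortest call path from entry to target.

    Returns the path as a list of node names, or None if target is not
    reachable from entry. Path conditions are ignored (structural only).
    """
    parents = {entry: None}          # node -> node we first reached it from
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the entry point.
            trace = []
            while node is not None:
                trace.append(node)
                node = parents[node]
            return trace[::-1]
        for callee in call_graph.get(node, ()):
            if callee not in parents:
                parents[callee] = node
                queue.append(callee)
    return None

# Hypothetical call graph: keys call the functions listed as values.
call_graph = {
    "main": ["parse_args", "run"],
    "parse_args": ["run"],
    "run": ["handle_request"],
    "handle_request": ["validate"],
    "validate": [],
    "cleanup": ["error_branch"],     # "cleanup" is never called from "main"
}

reachable = find_trace(call_graph, "main", "validate")
unreachable = find_trace(call_graph, "main", "error_branch")
```

In this toy graph, `error_branch` looks locally reachable (it has a caller, `cleanup`), but no trace from `main` exists, which is the distinction the question is probing. The difficulty described above comes from doing this across many hops in a real codebase, where the graph is not handed over explicitly.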