HN Mirror (Hacker News)

by esperent 44 days ago

It's well priced but does that have much relevance for "state of the art coding models", specifically?

I wouldn't use Gemini 3 Flash or GPT 5.4 mini for anything except the most trivial work, although both are useful for basic exploratory work.

So I'm using a heavy model for the bulk of the work, and its cost so far outweighs the light model's that the light model's cost is effectively irrelevant.


julianlam 44 days ago

It's so interesting to see the wild pendulum swings of LLM sentiment here.

If one likes a model then it's capable of one-shotting entire apps.

Otherwise it's "only suitable for the most trivial tasks".

Never in between.


esperent 44 days ago

You're confusing "different people with different opinions" with "wild pendulum swings".

Personally my opinion in this regard is highly consistent over time.


jiggawatts 44 days ago

I have no trivial tasks.

Just last week, I was trying to map the weird and wonderful column names emitted by a NetScaler’s detailed REST log into OpenTelemetry semantics.

The NetScaler is basically abandoned by its dying vendor. Hence, its new features, like sending logs directly to Splunk-compatible receivers, are largely undocumented. I’m sure there’s like three of us masochists out there stumbling our way through the brambles.

The OpenTelemetry end is a mess of “deprecated” and “beta”, copying the shifting sands of other cloud-native projects like Kubernetes.

Even with carefully curated Markdown documentation references and sample logs, every modern “frontier” AI makes basic mistakes and hallucinates like crazy no matter how I stuff their context.

This isn’t an Erdős problem! This is just getting logs from point A to point B.


vonunov 43 days ago

(I would drop this somewhere more relevant, but apparently replying to a thing from 17 days ago would be necroing here, and PMs are hard or something.)

>[1] With what prompt!? I like the terse output! Do share...

Not sure if it's complete or completely right, but the matter came up in a recent session, and when asked what gave, Jim 'n' I came up with some supposedly relevant factors, at least one of which is news to me (supposedly, steering LLMs using negative instructions isn't counterproductive anymore (not that I'd been resisting the temptation anyway)):

https://gemini.google.com/share/7af54a6861d7#:~:text=What%20...

With the caveat (or bonus) that it can go (?:too)? far when told to "be blunt" and not to "pull punches":

https://i.vgy.me/WHRZD7.png (from 2024-09) (in this case, in user-config persistent instructions in Kagi's multi-LLM thing)


2ndorderthought 44 days ago

It's so true. I bet 80% of the questions normal people ask ChatGPT/Copilot could be answered by an 8B model trained on recent data.

I don't think people realize how small the gap is between free-to-cheap models and frontier models. It's going to be commoditized a lot faster than marketing can catch up. Once cash gets tight or prices rise, it's more or less done for.

Especially considering that some of the small free/cheap models can one-shot code now.
