Hacker News
by mistrial9 746 days ago
The models that you have tried... are garbage. Hmm. Maybe you are not among the many, many, many inside professionals and uniformed services that have different access than you do? Money talks?

It is remarkable that folks who tried a garbage LLM like Copilot, GPT-3.5, or Gemini, or who made Meta's LLMs say naughty words, seem to think these are still SOTA. Sometimes I stumble on them and am shocked at the degradation in quality, then realize my settings are wrong. People are vastly underestimating the rate of change here.
People have tried GPT-4. It makes the same kinds of errors as GPT-3; it just has a bigger set of known things where it does OK, so it is immensely more useful.

It is like a calculator that only worked on one digit and now works on two. The improvement is immense, but it's still nowhere close to replacing mathematicians, since it isn't even working on the same kind of problems.

Edit: In several years we might have a perfect calculator that is better than any human at such tasks, but it still won't beat humans at stuff unrelated to calculations. Or, in the case of LLMs pattern matching texts: humans don't pattern match texts to plan or mentally simulate scenarios, so that part isn't covered by LLMs. Human-level planning combined with today's LLM-level pattern matching on text would be really useful. We see a lot of humans work that way, using the LLM as a pattern matcher, but there is no progress on automating human-level planning so far. LLMs aren't it.

> People are vastly underestimating the rate of change here

GPT-3.5 was released in March 2022. We are now in June 2024. Over 2 years later.

And on average GPT-4 is about 40% more accurate.

For me, LLMs are very much like self-driving cars. On the journey towards perfect accuracy it gets progressively harder to make advancements.

And for it to replace the status quo it really does need to be perfect. And there is no evidence or research that this is possible.

It's enough to decrease the number of people you need in IT by 20-30%.

People don't want to hear that, but you see fewer and fewer job offers, and not only for junior positions.

The hard truth is that, as with any tool or automation, the more performance improves, the fewer people are needed for this kind of work.

Just look at how some parts of manual labor were made redundant.

Why people think it won't be the same with mental work is beyond me.

Not yet, because the reliability isn't there. You still need to validate everything it does.

E.g. I had it autocompleting a set of 20 variables today, something like output.blah=tostring(input[blah]). The kind of work you give to a regex.

In the middle of the list, it decided to go output.blah=some long weird piece of code, completely unexpected and syntactically invalid.
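
For what it's worth, the "validate everything" step for this kind of output can itself be mostly mechanical. Here is a rough sketch in Python, assuming each generated line is supposed to look exactly like output.NAME=tostring(input[NAME]) with the same name on both sides. The pattern, the function name find_suspect_lines, and the sample lines are all made up for illustration; they are not from any real tool or from the actual session described above.

```python
import re

# Hypothetical shape of a valid line: output.<name>=tostring(input[<name>])
# This pattern is an assumption based on the example in the comment above.
LINE_PATTERN = re.compile(r"^output\.(\w+)\s*=\s*tostring\(input\[(\w+)\]\)$")


def find_suspect_lines(lines):
    """Return (index, line) pairs that do not match the expected mapping shape."""
    suspects = []
    for index, line in enumerate(lines):
        match = LINE_PATTERN.match(line.strip())
        # Flag lines that do not match at all, or whose two names differ.
        if match is None or match.group(1) != match.group(2):
            suspects.append((index, line))
    return suspects


# Made-up sample of generated lines, with one bad completion in the middle.
generated = [
    "output.foo=tostring(input[foo])",
    "output.bar=tostring(input[bar])",
    "output.baz=some long weird piece of code(",
    "output.qux=tostring(input[qux])",
]

for index, line in find_suspect_lines(generated):
    print(f"suspect line {index}: {line}")
```

This only catches lines that break the expected shape. It says nothing about whether a well-formed line maps the right variable, so it reduces the review work rather than replacing it.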

I am still in my AI evaluation phase, and sometimes I am impressed with what it does. But an unexpected total failure is just as possible. As long as it does that, I can't trust it.