
by jumploops 620 days ago

> There is just no way you can build reliable agents on this foundation, where changing a word or two in irrelevant ways or adding a few bits of irrelevant info can give you a different answer.

LLMs are not magic bullets for every problem, but that doesn't preclude them from being used to build reliable systems or "agents."

It's clear that we don't yet have the all-encompassing AGI architecture, especially with the transformer model alone, but adding steps beyond the transformer leads to interesting results, as we've seen with current coding tools and the new o1-series models by OpenAI.

For example, the featured article calls out `o1-mini` as failing a kiwi-counting test prompt; however, the `o1-preview` model gets the right answer[0].

I also built a simple test that prompts gpt-4o to solve the problem in parts, and it reliably returns the correct answer using only gpt-4o and code generated by gpt-4o[1].
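To make "solve the problem in parts" concrete, here is a rough sketch of the shape such a pipeline could take. It is not the actual loop linked in [1]: `fake_llm` is a hypothetical stand-in for a gpt-4o call with canned replies, the prompt prefixes are made up, and the kiwi question is a paraphrase with illustrative numbers. The idea is just that the model extracts the relevant quantities and writes the expression, while the arithmetic itself is done by ordinary code.

```python
from typing import Callable

# Paraphrased kiwi-style question with an irrelevant detail; numbers are illustrative.
QUESTION = (
    "Oliver picks 44 kiwis on Friday and 58 on Saturday. On Sunday he picks "
    "double Friday's amount, but five of them are a bit smaller than average. "
    "How many kiwis does he have?"
)

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real pipeline would query gpt-4o."""
    if prompt.startswith("EXTRACT"):
        # Step 1: keep only the quantities that affect the count.
        return "friday=44\nsaturday=58\nsunday=2*friday"
    if prompt.startswith("CODE"):
        # Step 2: turn the extracted facts into an arithmetic expression.
        return "friday + saturday + sunday"
    raise ValueError("unexpected prompt")

def solve_in_parts(question: str, llm: Callable[[str], str]) -> int:
    facts = llm("EXTRACT relevant quantities:\n" + question)
    expression = llm("CODE an arithmetic expression for:\n" + facts)
    # Step 3: evaluate deterministically instead of asking the model to add.
    # eval() on model output is only acceptable in a sketch; sandbox it for real use.
    env: dict = {}
    for line in facts.splitlines():
        name, value = line.split("=", 1)
        env[name.strip()] = eval(value, {"__builtins__": {}}, env)
    return eval(expression, {"__builtins__": {}}, env)

if __name__ == "__main__":
    print(solve_in_parts(QUESTION, fake_llm))
```

With the canned replies above, the irrelevant "smaller than average" detail never makes it past the extraction step, so it can't perturb the arithmetic. Whether a real model does that extraction reliably is exactly the thing you'd have to measure.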

Furthermore, there's still a ton of research being done on models built specifically for formal theorem proving, and they show promise[2] (even if `o1-preview` already beats them on, e.g., IMO problems[3]).

I'm of the opinion that we still have a ways to go until AGI, but that doesn't mean LLMs can't be used in reliable ways.

[0]https://chatgpt.com/share/e/67098356-ce88-8001-a2e1-9857064a...

[1]https://magicloops.dev/loop/30fb3c1a-8e40-47ae-8611-91554faf...

[2]https://arxiv.org/pdf/2408.08152

[3]https://openai.com/index/introducing-openai-o1-preview/