by kadushka
408 days ago
Running a vending machine using an LLM is absurd not because we can't run LLMs fast or cheap enough - it's because LLMs are not reliable, and we don't yet know how to make them more reliable. Our best LLM - o3 - doubled the hallucination rate of the previous model (o1). OpenAI says it hallucinated a wrong answer 33% of the time in benchmarks. Do you want a vending machine that screws up 33% of the time? Today, the accuracy of LLMs is by far a bigger concern (and a harder problem to solve) than their speed. If someone released a model that was 10x slower than o3 but 20% better in terms of accuracy, reliability, or some other measure of output quality, I'd switch to it in a heartbeat (and I'd be ready to pay more for it). I can't wait until o3-pro is released.
I don't know what benchmark you're looking at, but I'm sure the questions in it were more complicated than the logic inside a vending machine.
Why don't you just try it out? It's easy to simulate: tell the bot about the task, explain what actions to perform in different situations, then provide some user input and see whether it works.
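
For what it's worth, here is a rough sketch of what such a simulation could look like. Everything in it is made up for illustration: the prompt, the price list, the action names (`DISPENSE`, `REFUND`, `WAIT`), and `stub_bot`, which is only a placeholder where a real model call would go. It scores the bot's replies against plain hand-written vending logic.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task description; a real experiment would tune this wording.
SYSTEM_PROMPT = (
    "You control a vending machine. Prices in cents: cola=150, chips=100. "
    "Reply with exactly one action: DISPENSE <item>, REFUND, or WAIT."
)

PRICES: Dict[str, int] = {"cola": 150, "chips": 100}

# (credit in cents, user selection) pairs used as simulated user input.
CASES: List[Tuple[int, str]] = [
    (150, "cola"),
    (100, "cola"),
    (100, "chips"),
    (0, "cancel"),
    (200, "gum"),
]


def expected_action(credit: int, selection: str, prices: Dict[str, int]) -> str:
    """Reference vending logic that the bot's reply is compared against."""
    if selection == "cancel":
        return "REFUND"
    if selection not in prices:
        return "WAIT"
    if credit >= prices[selection]:
        return f"DISPENSE {selection}"
    return "WAIT"


def run_simulation(
    bot: Callable[[str, str], str],
    cases: List[Tuple[int, str]],
    prices: Dict[str, int],
) -> float:
    """Feed each case to the bot and return the fraction it got right."""
    correct = 0
    for credit, selection in cases:
        user_input = f"credit={credit} selection={selection}"
        reply = bot(SYSTEM_PROMPT, user_input).strip()
        if reply == expected_action(credit, selection, prices):
            correct += 1
    return correct / len(cases)


def stub_bot(system_prompt: str, user_input: str) -> str:
    """Placeholder for a real LLM call; it just applies the reference logic."""
    fields = dict(part.split("=") for part in user_input.split())
    return expected_action(int(fields["credit"]), fields["selection"], PRICES)


if __name__ == "__main__":
    print(f"accuracy: {run_simulation(stub_bot, CASES, PRICES):.0%}")
```

Swapping `stub_bot` for a function that sends `system_prompt` and `user_input` to an actual model, and running many more cases, would give a measured error rate for this specific task instead of a number borrowed from an unrelated benchmark.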