by kovek
231 days ago
Well, if the model can reliably keep the CPU cache, CPU registers, and CPU instructions in context and perform operations based on them, then we've pretty much solved computation using LLMs, right? It could use RAG to operate on RAM and SSD. Here we can see the amount of data a high-end traditional non-SoC CPU holds:

> For a recent high-end non-SoC desktop CPU:
> Cache: ~40-100 MB total (L1 + L2 + shared L3)
> Register files: tens to few hundreds of KB total across cores (e.g., ~200-300 KB or so)
> Combined: So you're looking at ~40-100 MB + ~0.2 MB → roughly ~40-100 MB of total on-chip caches + registers.

I'm sure we can reduce these caches to fit in the context windows of today's LLMs (~500,000 tokens). Then, with temperature 0, we get more "discrete" operations. We still have the problem of hallucinations, but it should be rare with temperature 0.
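
As a rough sanity check on those sizes, here is a back-of-the-envelope sketch in Python. The 4 bytes-per-token figure is an assumption (a common rule of thumb for English text; raw cache contents would likely tokenize worse), and the function names are made up for illustration.

```python
# Back-of-the-envelope comparison: on-chip state vs. LLM context window.
# ASSUMPTION: ~4 bytes of data per token (rule of thumb for English text,
# not a measured figure for binary cache contents).
BYTES_PER_TOKEN = 4
CONTEXT_TOKENS = 500_000  # figure quoted in the comment
MB = 1_000_000            # decimal megabytes, for simplicity


def context_capacity_bytes(tokens: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Approximate number of bytes representable in `tokens` tokens."""
    return tokens * bytes_per_token


def shrink_factor(state_bytes: int, tokens: int = CONTEXT_TOKENS) -> float:
    """How many times smaller the state must get to fit in the context."""
    return state_bytes / context_capacity_bytes(tokens)


if __name__ == "__main__":
    capacity = context_capacity_bytes(CONTEXT_TOKENS)
    print(f"context capacity: ~{capacity / MB:.1f} MB")
    for state_mb in (40, 100):  # cache + register range quoted above
        factor = shrink_factor(state_mb * MB)
        print(f"{state_mb} MB of on-chip state -> shrink ~{factor:.0f}x")
```

Under that assumption the window holds about 2 MB, so the caches would have to shrink by roughly 20-50x, and that is before leaving any room for the instructions or the model's own output.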
And temperature 0 makes outputs deterministic, not magically correct.
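
A toy sketch of that distinction, assuming temperature 0 is treated as greedy argmax (a common convention, not any particular library's API). The logits are made up and `sample_token` is a hypothetical helper.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick a token from `logits`; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(logits, key=logits.get)
    # Softmax with temperature, shifted by the max logit for numerical stability.
    top = max(logits.values())
    weights = [math.exp((v - top) / temperature) for v in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]


# Made-up logits for the prompt "2 + 2 =": the model slightly prefers a wrong answer.
logits = {"5": 2.1, "4": 2.0, "22": 0.3}

rng = random.Random(0)
greedy = [sample_token(logits, 0, rng) for _ in range(5)]
print(greedy)  # the same token on every run, and it is the wrong one
```

Greedy decoding removes the randomness, but if the top-ranked token is wrong, you just get the same wrong token every time.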