by stingraycharles
28 days ago
> The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run. You do realize that a model like Opus is (estimated to be) around 5T parameters, and uses around 5TB of GPU memory? These kind of things are just impossible to run locally.

As I have said, the problem is not that they cannot be run, but that they may run more slowly than is acceptable for a given application. Depending on the model, the speeds reported for inference with weights stored on SSDs range from one token every few seconds to at most a few tokens per second.
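To make that range concrete, here is a rough back-of-envelope sketch. It assumes decoding is limited purely by how fast the weights used for each token can be read from the SSD, with nothing cached in RAM. The function name and every number in it (model sizes, 4-bit weights, a 7 GB/s NVMe drive) are hypothetical values chosen for illustration, not measurements of any real model.

```python
def tokens_per_second(active_params, bytes_per_param, read_bandwidth_bytes_per_s):
    """Rough upper bound on decode speed when every weight used for a
    token has to be read from storage once per token (no caching)."""
    bytes_per_token = active_params * bytes_per_param
    return read_bandwidth_bytes_per_s / bytes_per_token


# Hypothetical setup: 4-bit weights (0.5 bytes each), 7 GB/s sequential reads.
SSD_BANDWIDTH = 7e9

# Assumed dense model: all 70e9 parameters are touched for every token.
dense = tokens_per_second(70e9, 0.5, SSD_BANDWIDTH)

# Assumed mixture-of-experts model: only 3e9 parameters are active per token.
moe = tokens_per_second(3e9, 0.5, SSD_BANDWIDTH)

print(f"dense: {dense:.2f} tokens/s, MoE: {moe:.2f} tokens/s")
```

Under these assumptions the dense model lands at about one token every five seconds and the sparse one at a few tokens per second, which is the same ballpark as the reported speeds. Real systems will differ, since part of the weights usually stays cached in RAM and random reads are slower than sequential ones.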
Computers could solve relatively huge problems even in the early days of vacuum-tube computers, when main memories were measured in kilobytes. At that time nobody expected the data needed for a problem to fit in main memory, or even in the next tier of memory on magnetic drums or magnetic disks. The really big problems were solved by making a great number of passes over data stored on magnetic tapes.
An LLM whose inference could not be run on a small mini-PC would have to be one hundred times bigger than the biggest existing SOTA LLMs.
Any LLM that exists today can be run on almost any PC, just extremely slowly in comparison with datacenter hardware.