|
|
|
|
|
by adwn
365 days ago
|
|
> EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second 50 token/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: Each individual sentence is somewhat grammatically correct, but they have next to nothing to do with each other, they make absolutely no sense. Take this, for example: "I lost my broken broke in my cold rock. It is okay, you can't." Good luck tuning this for turn-based conversations, let alone for solving any practical task. This model is so restricted that you couldn't even benchmark its performance, because it wouldn't be able to follow the simplest of instructions. |
|
Even at that small scale, you can already do useful things like basic code or text autocompletion, and with a few million parameters on a machine like a Cray Y-MP, you could reasonably attempt tasks like summarizing structured or technical documentation. It's constrained in scope, granted, but it's a solid proof of concept.
The fact that a functioning language model runs at all on a Pentium II, with resources not far off from a 1982 Cray X-MP, is the whole point: we weren’t held back by hardware, we were held back by ideas.