by nl 222 days ago
They mean the ability to run a large model entirely on the GPU without paging it in from a separate memory system.

They're basically describing the Jetson and Tegra lineup, then. Those were featured in several high-end consumer devices, like smart cars and the Nintendo Switch.
Sure, but neither had enough memory to be useful for large LLMs.

And neither was really a consumer offering.