by pianopatrick 40 days ago
I'm testing something like this right now just to see what happens. I have an old laptop with 4 GB of RAM. I attached a USB drive holding the Gemma 4 31B model (which is 32.6 GB). The laptop is running llama.cpp and trying to respond to a prompt by streaming the model from disk.

The USB drive light is flickering, showing something is happening. It's been about 8 hours since I entered the prompt and I've gotten about 10 tokens back so far. I'm going to leave it running overnight and see what happens.
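
A rough back-of-envelope on those numbers, as a sketch only. It assumes each generated token needs roughly one full read of the 32.6 GB of weights (plausible for a dense model when only 4 GB of RAM is available for caching, but not something measured here) and that 1 GB means 10^9 bytes. The function names are made up for illustration.

```python
def seconds_per_token(tokens: int, hours: float) -> float:
    """Average wall-clock seconds spent per generated token."""
    return hours * 3600 / tokens


def implied_read_mb_per_s(model_gb: float, sec_per_token: float) -> float:
    """Effective disk read rate in MB/s, ASSUMING one full pass over the
    weights per token (an assumption, not a measurement)."""
    return model_gb * 1000 / sec_per_token


# Figures from the post: about 10 tokens in about 8 hours, 32.6 GB model.
spt = seconds_per_token(10, 8)
rate = implied_read_mb_per_s(32.6, spt)
print(f"{spt:.0f} s/token, ~{rate:.1f} MB/s effective read rate")
```

Under those assumptions it works out to about 48 minutes per token and an effective read rate of roughly 11 MB/s.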

2 comments

Wow, that's a true worst-case scenario, especially if the USB port is just plain old USB 2.0 (max 480 Mbps) and/or if the drive is a spinning disk. How's the CPU doing, though? Is there any headroom given the USB bottleneck?
Running top shows the llama-cli process taking 29% of CPU and 88% of memory, while the usb-storage process is taking 9% of CPU and 0% of memory.
Nice.
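
For scale, here is a quick sketch of the best case the 480 Mbps figure mentioned above would allow. It assumes the link really is USB 2.0 (not confirmed in the thread), ignores protocol overhead and seek time, and treats 1 GB as 10^9 bytes. The function name is hypothetical.

```python
USB2_LINK_MBPS = 480  # USB 2.0 signalling rate in megabits per second


def min_seconds_per_pass(model_gb: float, link_mbps: float) -> float:
    """Lower bound on the time to read the whole model once over the link."""
    bytes_per_second = link_mbps * 1e6 / 8  # megabits/s -> bytes/s
    return model_gb * 1e9 / bytes_per_second


secs = min_seconds_per_pass(32.6, USB2_LINK_MBPS)
print(f"at least {secs:.0f} s (~{secs / 60:.1f} min) per full read of the model")
```

So even at the theoretical maximum, one full read of a 32.6 GB file takes about 9 minutes. Real USB 2.0 throughput is usually well below the signalling rate, so the actual floor would be higher.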

What did you use to do this: something standard like llama.cpp, something else like vLLM, or your own contraption?

llama.cpp

It has now spit out about 40 tokens after maybe 18 hours and has not finished the "thinking" stage of responding to the prompt. I'll let it keep running to see what happens.