HN Mirror

	by pianopatrick 40 days ago
	Currently I'm testing something like this just to see what happens. I have an old laptop with 4GB of RAM. I attached a USB drive with Gemma 4 31B model (which is 32.6 GB). Currently the laptop is running llama.cpp and trying to respond to a prompt by streaming the model from disk. The USB drive light is flickering, showing something is happening. It's been about 8 hours since I entered the prompt and I've gotten about 10 tokens back so far. I'm going to leave it running overnight and see what happens.

2 comments

zozbot234 39 days ago

Wow, that's a true worst-case scenario, especially if the USB drive is plain old USB 2.0 (max 480 Mbps) and/or a spinning disk. How's the CPU doing, though? Is there any headroom given the USB bottleneck?
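To put a rough number on that bottleneck, here is a back-of-the-envelope sketch. It assumes the model is dense (every weight byte is read once per generated token), that almost none of the 32.6 GB stays cached in 4 GB of RAM, and that the drive sustains the full 480 Mbps USB 2.0 signaling rate. Real USB 2.0 mass-storage throughput is lower because of protocol overhead, so treat the result as an optimistic lower bound on time per token.

```python
def seconds_per_token(model_bytes: float, link_bits_per_s: float) -> float:
    """Lower bound on time per token if every weight byte crosses the link once per token."""
    bytes_per_s = link_bits_per_s / 8  # convert bits/s to bytes/s
    return model_bytes / bytes_per_s


MODEL_BYTES = 32.6e9     # model file size from the post (assumed decimal GB)
USB2_BITS_PER_S = 480e6  # USB 2.0 high-speed signaling rate, not real-world throughput

t = seconds_per_token(MODEL_BYTES, USB2_BITS_PER_S)
print(f"{t:.0f} s/token (~{t / 60:.1f} min)")  # prints "543 s/token (~9.1 min)"
```

Even at the theoretical maximum, that's roughly 9 minutes per token before any compute, so the CPU likely has plenty of idle time.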

pianopatrick 39 days ago

Running top shows the llama-cli process taking 29% of CPU and 88% of memory, while the usb-storage process is taking 9% of CPU and 0% of memory.

stuaxo 39 days ago

Nice.

What did you use to do this? Something standard like llama.cpp, something else like vLLM, or your own contraption?

pianopatrick 39 days ago

llama.cpp

It's now spat out about 40 tokens after maybe 18 hours and still hasn't finished the "thinking" stage of its response. I'll let it keep running to see what happens.
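The two progress reports allow a rough throughput estimate. This sketch uses the approximate figures from the thread (about 10 tokens after 8 hours, about 40 after 18 hours) and takes the rate between the two reports, which sidesteps the initial prompt-processing time. As an assumption, it also treats each generated token as one full read of the 32.6 GB file in order to back out an implied effective read rate from the drive.

```python
MODEL_BYTES = 32.6e9  # model file size from the original post

# (elapsed hours, tokens so far): approximate figures reported in the thread
reports = [(8, 10), (18, 40)]

(h0, n0), (h1, n1) = reports
tokens_per_hour = (n1 - n0) / (h1 - h0)  # generation rate between the two reports
sec_per_token = 3600 / tokens_per_hour

# Assumption: every token requires re-reading all weights from the USB drive.
implied_mb_per_s = MODEL_BYTES / sec_per_token / 1e6

print(f"{tokens_per_hour:.1f} tok/h, {sec_per_token / 60:.0f} min/token, "
      f"~{implied_mb_per_s:.0f} MB/s effective")
# prints "3.0 tok/h, 20 min/token, ~27 MB/s effective"
```

An effective rate in the tens of MB/s is in the range typically seen from USB 2.0 drives, which fits the drive link (rather than the CPU) being the limiting factor, though the inputs here are rough.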
