by johnklos 642 days ago
Running Linux on a 4004 is possible, as we've seen, but running llama is just way too far? Interesting take.

Llama takes a lot more MIPS and a lot more RAM than Linux. Linux is more complicated, but computers were running Linux 30 years ago. In this case, quantity has a quality all its own.
It takes 14,493,515,821 cycles to boot Alpine Linux in QEMU.

    perf stat -Bddd qemu-system-x86_64 \
      -m 2048 \
      -cdrom alpine.iso \
      -boot d \
      -enable-kvm \
      -cpu host \
      -smp 2 \
      -net nic -net user,hostfwd=tcp::2222-:22 \
      -nographic \
      -serial mon:stdio \
      -monitor telnet:127.0.0.1:1234,server,nowait \
      -d in_asm,cpu \
      -D qemu.log
It takes 1,927,757,029,221 cycles to summarize a 1625 token Dijkstra essay with LLaMA 8B.

    perf stat -Bddd llamafile -m Meta-Llama-3.1-8B-Instruct.BF16.gguf -f ~/prompt1625.txt -c 4096 -n 40
Ignoring things like AVX512, you're looking at roughly 130x more compute (about 1.93 trillion cycles versus 14.5 billion) to do something serious with LLaMA.

However! If you just want to demo it working, then you could generate 4 tokens using TinyLLaMA 1.1B, which takes 25,164,386,466 cycles. That's within 2x of the cost of booting Linux. So you could do TinyLLaMA if you can do Linux.
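For anyone who wants to check the arithmetic, here is a quick sketch that divides the cycle counts quoted above. The variable names are mine, and the numbers are only the `perf stat` totals from this comment, measured on a modern x86 host, so they say nothing precise about how a 4004 would fare.

```python
# Cycle counts copied from the perf stat runs quoted in this comment.
LINUX_BOOT_CYCLES = 14_493_515_821       # Alpine Linux boot in QEMU
LLAMA_8B_CYCLES = 1_927_757_029_221      # LLaMA 8B, 1625-token summary
TINYLLAMA_CYCLES = 25_164_386_466        # TinyLLaMA 1.1B, 4 tokens

# How many Linux boots' worth of cycles each LLM run costs.
llama_vs_linux = LLAMA_8B_CYCLES / LINUX_BOOT_CYCLES
tinyllama_vs_linux = TINYLLAMA_CYCLES / LINUX_BOOT_CYCLES

print(f"LLaMA 8B summary vs. Linux boot:  {llama_vs_linux:.1f}x")
print(f"TinyLLaMA 4 tokens vs. Linux boot: {tinyllama_vs_linux:.2f}x")
```

So the 8B summary costs a bit over two orders of magnitude more than the boot, while the 4-token TinyLLaMA demo stays under two boots' worth of cycles.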

That's closer than I thought, to be honest.

Note also that the 4004 lacks a floating-point unit of any kind - not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.

A lot of quants just upcast to floats. Some of them work on integer multiplication using pmaddubsw. But oof, it looks like the i4004 doesn't even have that.
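For readers who haven't met it: pmaddubsw is an x86 SSSE3 instruction that multiplies unsigned bytes by signed bytes and adds each adjacent pair of products into a saturated signed 16-bit result, which is why it's handy for int8 dot products. Below is a rough scalar model of that behavior in Python. The function name and sample values are just for illustration, and this is not how any particular quantization kernel is written. A 4004 has no multiply instruction at all, so it would have to build each of these byte multiplies out of 4-bit adds and shifts.

```python
def pmaddubsw(a, b):
    """Scalar model of x86 PMADDUBSW (illustrative sketch, not a real kernel).

    a: unsigned bytes (0..255), b: signed bytes (-128..127), same even length.
    Returns one saturated signed 16-bit sum per adjacent pair of products.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same even length")
    out = []
    for i in range(0, len(a), 2):
        for u, s in ((a[i], b[i]), (a[i + 1], b[i + 1])):
            if not 0 <= u <= 255:
                raise ValueError("a must hold unsigned bytes")
            if not -128 <= s <= 127:
                raise ValueError("b must hold signed bytes")
        pair_sum = a[i] * b[i] + a[i + 1] * b[i + 1]
        # Saturate to the signed 16-bit range, as the instruction does.
        out.append(max(-32768, min(32767, pair_sum)))
    return out


print(pmaddubsw([1, 2, 3, 4], [5, -6, 7, 8]))  # [-7, 53]
print(pmaddubsw([255, 255], [127, 127]))       # [32767] (saturated)
```

Summing the returned 16-bit values into a wider accumulator gives the integer dot product that a quantized matmul needs.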