| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by johnklos 642 days ago
	Running Linux on a 4004 is possible, as we've seen, but running llama is just way too far? Interesting take.

1 comments

pclmulqdq 642 days ago

Llama takes a lot more MIPS and a lot more RAM than linux. Linux is more complicated, but computers were running linux 30 years ago. In this case, quantity has a quality all of its own.

link

jart 641 days ago

It takes 14,493,515,821 cycles to boot Alpine Linux in an qemu.

    perf stat -Bddd qemu-system-x86_64   -m 2048   -cdrom alpine.iso   -boot d   -enable-kvm   -cpu host   -smp 2   -net nic -net user,hostfwd=tcp::2222-:22   -nographic   -serial mon:stdio   -monitor telnet:127.0.0.1:1234,server,nowait   -d in_asm,cpu   -D qemu.log

It takes 1,927,757,029,221 cycles to summarize a 1625 token Dijkstra essay with LLaMA 8B.

    perf stat -Bddd llamafile -m Meta-Llama-3.1-8B-Instruct.BF16.gguf -f ~/prompt1625.txt -c 4096 -n 40

Ignoring things like AVX512 you're looking at about 100x more compute to do something serious with LLaMA.

However! If you just want to demo it working, then you could generate 4 tokens using TinyLLaMA 1.1B which takes 25,164,386,466 cycles. That's about the same cost as booting Linux. So you could do TinyLLaMA if you can do Linux.

link

pclmulqdq 640 days ago

That's closer than I thought, to be honest.

Note also that the 4004 lacks a floating-point unit of any kind - not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.

link

jart 640 days ago

A lot of quants just upcast to floats. Some of them work on integer multiplication using pmaddubsw. But oof, it looks like the i4004 doesn't even have that.

link