| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jart 641 days ago

It takes 14,493,515,821 cycles to boot Alpine Linux in an qemu.

    perf stat -Bddd qemu-system-x86_64   -m 2048   -cdrom alpine.iso   -boot d   -enable-kvm   -cpu host   -smp 2   -net nic -net user,hostfwd=tcp::2222-:22   -nographic   -serial mon:stdio   -monitor telnet:127.0.0.1:1234,server,nowait   -d in_asm,cpu   -D qemu.log

It takes 1,927,757,029,221 cycles to summarize a 1625 token Dijkstra essay with LLaMA 8B.

    perf stat -Bddd llamafile -m Meta-Llama-3.1-8B-Instruct.BF16.gguf -f ~/prompt1625.txt -c 4096 -n 40

Ignoring things like AVX512 you're looking at about 100x more compute to do something serious with LLaMA.

However! If you just want to demo it working, then you could generate 4 tokens using TinyLLaMA 1.1B which takes 25,164,386,466 cycles. That's about the same cost as booting Linux. So you could do TinyLLaMA if you can do Linux.

1 comments

pclmulqdq 640 days ago

That's closer than I thought, to be honest.

Note also that the 4004 lacks a floating-point unit of any kind - not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.

link

jart 640 days ago

A lot of quants just upcast to floats. Some of them work on integer multiplication using pmaddubsw. But oof, it looks like the i4004 doesn't even have that.

link