by jart
641 days ago
It takes 14,493,515,821 cycles to boot Alpine Linux in QEMU:

  perf stat -Bddd qemu-system-x86_64 -m 2048 -cdrom alpine.iso -boot d -enable-kvm -cpu host -smp 2 -net nic -net user,hostfwd=tcp::2222-:22 -nographic -serial mon:stdio -monitor telnet:127.0.0.1:1234,server,nowait -d in_asm,cpu -D qemu.log

It takes 1,927,757,029,221 cycles to summarize a 1625-token Dijkstra essay with LLaMA 3.1 8B:

  perf stat -Bddd llamafile -m Meta-Llama-3.1-8B-Instruct.BF16.gguf -f ~/prompt1625.txt -c 4096 -n 40

Ignoring things like AVX512, you're looking at about 100x more compute (roughly 133x by these numbers) to do something serious with LLaMA. However! If you just want to demo it working, you could generate 4 tokens with TinyLLaMA 1.1B, which takes 25,164,386,466 cycles. That's in the same ballpark as booting Linux (about 1.7x). So if you can do Linux, you can do TinyLLaMA.
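As a quick sanity check on those ratios, here's a short Python snippet that just divides the quoted cycle counts. The constant names are made up for illustration, and the numbers are the perf stat figures quoted above, not new measurements.

```python
# Cycle counts copied from the perf stat runs quoted above (not re-measured).
BOOT_ALPINE = 14_493_515_821          # Alpine Linux boot under QEMU/KVM
LLAMA_8B_SUMMARY = 1_927_757_029_221  # LLaMA 3.1 8B BF16, 1625-token prompt, -n 40
TINYLLAMA_4_TOKENS = 25_164_386_466   # TinyLLaMA 1.1B generating 4 tokens


def ratio(workload: int, baseline: int = BOOT_ALPINE) -> float:
    """Return how many Linux boots' worth of cycles a workload costs."""
    return workload / baseline


if __name__ == "__main__":
    print(f"LLaMA 8B summary:   {ratio(LLAMA_8B_SUMMARY):.0f}x a Linux boot")
    print(f"TinyLLaMA 4 tokens: {ratio(TINYLLAMA_4_TOKENS):.2f}x a Linux boot")
```

Running it reports about 133x for the LLaMA 8B summary and about 1.74x for the TinyLLaMA demo.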
Note also that the 4004 lacks a floating-point unit of any kind, not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.
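To illustrate why integer quantization helps on a chip with no FPU, here's a minimal Python sketch of block-wise symmetric 8-bit quantization. It assumes 32 values per block sharing one scale (similar in spirit to formats like GGUF's Q8_0, but not llamafile's actual code), and the function names are hypothetical. The point is that the inner loop is nothing but integer multiply-adds; only the per-block scale needs a real number, which a 4004 would have to handle in fixed point or software.

```python
BLOCK = 32  # values per block sharing one scale (an assumption for this sketch)


def quantize_block(xs):
    """Symmetric 8-bit quantization: one float scale plus ints in [-127, 127]."""
    amax = max(abs(x) for x in xs)
    scale = amax / 127 if amax > 0 else 1.0
    qs = [max(-127, min(127, round(x / scale))) for x in xs]
    return scale, qs


def quantize(xs):
    """Split a vector into blocks and quantize each one independently."""
    return [quantize_block(xs[i:i + BLOCK]) for i in range(0, len(xs), BLOCK)]


def dot_q8(a_blocks, b_blocks):
    """Dot product of two quantized vectors (hypothetical helper).

    The inner loop is integer multiply-accumulate only. The float scales are
    applied once per block, the only step that needs non-integer math.
    """
    total = 0.0
    for (sa, qa), (sb, qb) in zip(a_blocks, b_blocks):
        acc = 0
        for x, y in zip(qa, qb):
            acc += x * y  # int8 * int8 products summed in a plain integer
        total += sa * sb * acc
    return total


if __name__ == "__main__":
    import math

    a = [math.sin(i * 0.1) for i in range(128)]
    b = [math.cos(i * 0.07) for i in range(128)]
    exact = sum(x * y for x, y in zip(a, b))
    approx = dot_q8(quantize(a), quantize(b))
    print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

The quantized result tracks the float one closely, and real 8-bit formats make the same trade: integer arithmetic in the hot loop in exchange for a small rounding error per weight.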