by pulse7 589 days ago
Options:

A) 128GB RAM with the fastest Intel/AMD CPU, no GPU: you can run big/good models, but very slowly (about 0.5 to 3 tokens/second)

B) Fastest Mac with 128GB/192GB: you can run big/good models at moderate speed (roughly 5-10 tokens/second)

C) 16/32GB RAM + RTX 4090 with 24GB VRAM: you can run smaller (but still good) models very fast - completely in VRAM (20-30 tokens/second)
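
To check which of these options a given model might fit into, here is a rough back-of-the-envelope sketch. It assumes the weights take about (parameter count x bits per weight / 8) bytes, plus a guessed 20% overhead for the KV cache and runtime buffers. The function names and the overhead factor are my own assumptions, not measurements, and the budgets are just the capacities listed above (in practice the OS and other apps take a share, so treat them as upper bounds).

```python
# Rough memory-fit check for the three options above.
# Assumption: weights need params * bits_per_weight / 8 bytes, and the
# 1.2 overhead factor (KV cache, buffers) is a guess, not a measured value.

def model_size_gb(params_billions: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Approximate memory needed in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead


# Memory budgets in GB, taken from the options listed above.
BUDGETS_GB = {
    "A: 128GB RAM, CPU only": 128,
    "B: Mac 128GB": 128,
    "B: Mac 192GB": 192,
    "C: RTX 4090, 24GB VRAM": 24,
}


def options_that_fit(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> list[str]:
    """Return the options whose memory budget covers the estimate."""
    need = model_size_gb(params_billions, bits_per_weight, overhead)
    return [name for name, budget in BUDGETS_GB.items() if need <= budget]


if __name__ == "__main__":
    # Hypothetical example: a 70B-parameter model quantized to 4 bits.
    print(round(model_size_gb(70, 4), 1), "GB needed")
    print(options_that_fit(70, 4))
```

This only answers "does it fit", not "how fast will it run"; the tokens/second figures above depend on memory bandwidth and the runtime, which this sketch does not model.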