by avion23 1061 days ago
> and $10,000+ of compute hardware per inference session.

That is not true. A common MacBook with plenty of RAM (more than 32 GB) is enough, as is any x86 computer with lots of RAM. llama.cpp runs on the CPU only and is quite fast.
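A rough back-of-the-envelope check on why RAM, not a $10,000 accelerator, is the constraint: the memory for the weights is roughly parameter count times bytes per parameter. This is a minimal sketch assuming illustrative parameter counts and a nominal 4 bits per weight. Real quantized formats add a little overhead for scaling factors, and the estimate ignores the KV cache, runtime buffers, and the OS, so actual usage is somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (10**9 bytes).

    Assumption: every parameter takes exactly bits_per_param bits; this
    ignores quantization scale overhead, the KV cache, and runtime buffers.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Illustrative model sizes (billions of parameters); not tied to any release.
    for billions in (7, 13, 30, 65):
        n = billions * 1e9
        fp16 = weight_memory_gb(n, 16)  # unquantized half-precision weights
        q4 = weight_memory_gb(n, 4)     # nominal 4-bit quantized weights
        print(f"{billions}B params: fp16 ~ {fp16:.1f} GB, 4-bit ~ {q4:.1f} GB")
```

Under these assumptions a 65B-parameter model at 4 bits needs about 32.5 GB for weights alone, which is roughly where the ">32 GB" figure lands, while 16-bit weights would need about 130 GB.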