by int_19h 410 days ago
LLMs are a bad deal when you look at how much power you need to run that inference. A device that could barely run one instance of QwQ-32B at glacial speeds will be able to serve multiple concurrent users of Kiwix.

Quick question: which car companies are working on self-driving cars? All of them, plus two other companies (Apple and Google).

Which militaries are working on battlefield AI? All of them.

Could a dual-Xeon machine with 64 GB of RAM serve, say, 50 to 100 users of Kiwix?

To serve multiple users, probably not.

But if you don't think to ask Hacker News every single thing you need to know beforehand, I think you still want the LLM to answer questions and help you bootstrap it.