Hacker News new | ask | show | jobs
by butz 925 days ago
I assume that training LLMs locally requires high-end hardware. Even running a model requires a decent CPU or, even better, a high-end GPU, though that is not as expensive as training one. And usually you have to use hardware that is available in the cloud, so there's not much privacy there.
2 comments

You don't need to train the model on your data: you can use retrieval-augmented generation (RAG) to add the relevant documents to your prompt at query time.
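To make that concrete, here is a rough sketch of the retrieval step, assuming a toy bag-of-words similarity in place of real embeddings. The function names (`retrieve`, `build_prompt`) and the prompt wording are made up for illustration; a real setup would use an embedding model and pass the resulting prompt to whatever LLM you run.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    # Score every document against the query and keep the top k
    # that share at least one term with it.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    # Paste the retrieved documents into the prompt at query time.
    # No training or fine-tuning is involved.
    context = "\n\n".join(retrieve(query, documents, k))
    return (
        "Use the context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Calling `build_prompt("When is the March invoice due?", docs, k=1)` returns a prompt string containing only the best-matching document, which you would then send to the model.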
This works if the document plus prompt fit in the context window. I suspect the most popular task for this workflow is summarization, which presumably means large documents. That's when you begin scaling out to a vector store and implementing more advanced workflows. Sending a large document whole does work on certain local models, but even on the highest-tier MacBook Pro a large document can quickly choke an LLM and bring inference speed to a crawl. Meaning, a powerful client is still required no matter what. Even if you generate embeddings in "real time" and dump them to a vector store, that process would be slow on most consumer hardware.

If you're passing in smaller documents, then it works pretty well for real-time feedback.
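A rough sketch of the "does it fit" check and the chunking you'd fall back on before going to a vector store. This assumes whitespace-separated words as a crude stand-in for tokens (real tokenizers count differently), and the names `fits_context` and `chunk_words` plus the `reserve_for_answer` default are hypothetical:

```python
def estimate_tokens(text):
    # Crude assumption: one whitespace-separated word ~ one token.
    # A real tokenizer will usually report a higher count.
    return len(text.split())


def fits_context(document, prompt, context_window, reserve_for_answer=256):
    # True if document + prompt, plus room left for the model's reply,
    # stay within the context window.
    needed = estimate_tokens(document) + estimate_tokens(prompt) + reserve_for_answer
    return needed <= context_window


def chunk_words(document, chunk_size, overlap=0):
    # Split a document into chunks of at most chunk_size words,
    # with consecutive chunks sharing `overlap` words.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = document.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end
    return chunks
```

If `fits_context` is false, each chunk can be summarized or embedded separately instead of sending the whole document in one prompt.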

Thank you for the explanation. I see there is still a lot I have to learn about LLMs.
As someone else said, you don't need to train any models. Also, small LLMs (~7B parameters) can run really well even on a base M1 MacBook Air from 3 years ago.