| HN Mirror



	by butz 925 days ago
	I assume that training LLMs locally requires high-end hardware. Even running a model requires a decent CPU or, better, a high-end GPU, though that is not as expensive as training one. And usually you have to use hardware that is only available in the cloud, so there is not much privacy there.

2 comments

cjbprime 925 days ago

You don't need to train the model on your data: you can use retrieval augmented generation to add the relevant documents to your prompt at query time.
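To make the idea concrete, here is a minimal sketch of retrieval augmented generation at query time. It is an illustration under loud assumptions: the function names (`retrieve`, `build_prompt`) are hypothetical, and a toy bag-of-words cosine similarity stands in for the embedding model a real setup would likely use. The resulting prompt string would then be passed to whatever local model you run.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (toy scoring, an assumption)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved documents into the prompt; no model training involved."""
    context = "\n\n".join(retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The model itself is never modified here: only the prompt changes per query, which is why no training hardware is needed.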


Art9681 925 days ago

This works if the document plus prompt fit in the context window. I suspect the most popular task for this workflow is summarization, which presumably means large documents. That's when you begin scaling out to a vector store and implementing more advanced workflows. Sending a large document whole does work on certain local models, but even on the highest-tier MacBook Pro a large document can quickly choke any LLM and slow inference to a crawl. Meaning, a powerful client is still required no matter what. Even if you generate embeddings in "real time" and dump them to a vector store, that process would be slow on most consumer hardware.

If you're passing in smaller documents, then it works pretty well for real-time feedback.
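A rough sketch of the decision described above: check whether the document plus prompt fit in the context window, and if not, split the document into overlapping chunks that could be embedded and stored. Everything here is an assumption for illustration: the function names are hypothetical, the 4-characters-per-token figure is a crude heuristic rather than a real tokenizer, and the chunk sizes are arbitrary.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token count; chars_per_token=4 is an assumed heuristic, not a tokenizer."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(document, prompt, context_window, reserve_for_answer=512):
    """True if document + prompt + room for the answer fit in the window."""
    needed = estimate_tokens(document) + estimate_tokens(prompt) + reserve_for_answer
    return needed <= context_window


def chunk_text(text, chunk_chars=2000, overlap_chars=200):
    """Split text into chunks of at most chunk_chars, each overlapping the previous one."""
    if overlap_chars >= chunk_chars:
        raise ValueError("overlap_chars must be smaller than chunk_chars")
    step = chunk_chars - overlap_chars
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

A small document that passes `fits_in_context` can go straight into the prompt; a large one would be chunked first, and only the chunks relevant to the query would be sent to the model.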


butz 925 days ago

Thank you for the explanation. I see there is still a lot I have to learn about LLMs.


smcleod 925 days ago

As someone else said, you don't need to train any models. Also, small LLMs (~7B) can run really well even on a base M1 MacBook Air from 3 years ago.
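Some back-of-the-envelope arithmetic shows why a model of roughly 7B parameters is plausible on a small laptop. This sketch counts weight storage only; it ignores the KV cache, activations, and runtime overhead, and the 4-bit quantization level is an assumption, not something stated above. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Weights only, for a model with 7 billion parameters:
fp16_gb = weight_memory_gb(7e9, 16)  # 16-bit floats
q4_gb = weight_memory_gb(7e9, 4)     # assumed 4-bit quantization

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights come to about 14 GB, while a 4-bit quantized copy is about 3.5 GB, which is the range where a machine with a modest amount of memory becomes usable.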
