
Hacker News


	by alekseiprokopev 940 days ago
	One good use for running LLMs on a CPU is executing long background tasks that do not need a real-time response. llama.cpp seems like a suitable platform for this. It would be interesting to explore how to take advantage of the various acceleration options available on AWS.
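	A possible first step in that exploration is to check which SIMD instruction sets an instance's CPU exposes. llama.cpp's CPU kernels have code paths for extensions such as AVX2 and AVX-512 on x86 and NEON on ARM (e.g. Graviton). The sketch below is a hypothetical helper, not part of llama.cpp. It assumes a Linux host, where x86 CPUs list these under `flags` in /proc/cpuinfo and ARM CPUs list them under `Features` (NEON appears as `asimd`). The name `cpu_features` and the `INTERESTING` flag list are illustrative choices.

```python
# Hypothetical helper (not part of llama.cpp): report which SIMD extensions
# relevant to CPU inference this Linux host exposes, by parsing /proc/cpuinfo.
from pathlib import Path

# Illustrative subset of flag names as they appear in /proc/cpuinfo.
# x86 CPUs list them on "flags" lines; ARM64 CPUs on "Features" lines
# ("asimd" is NEON on AArch64).
INTERESTING = ("avx2", "avx512f", "avx512_vnni", "asimd", "sve", "i8mm")


def cpu_features(cpuinfo_text: str) -> set[str]:
    """Return the subset of INTERESTING flags present in cpuinfo_text."""
    found: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        if key.strip() in ("flags", "Features"):
            found.update(value.split())
    return {flag for flag in INTERESTING if flag in found}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(sorted(cpu_features(path.read_text())))
    else:
        print("No /proc/cpuinfo on this host (not Linux?)")
```

	Comparing this output across instance types is only a rough guide. Whether llama.cpp actually uses a given extension also depends on how it was compiled.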