Hacker News new | ask | show | jobs
by alekseiprokopev 940 days ago
One of the tasks that can be accomplished by running LLMs on a CPU is to execute long background tasks that do not require real-time response. llama.cpp seems like a suitable platform for this. It would be interesting to explore how to leverage the various acceleration techniques available on AWS.