by alekseiprokopev
940 days ago
One good use for running LLMs on a CPU is long-running background jobs that don't need a real-time response. llama.cpp seems like a suitable platform for this. It would be interesting to explore how to take advantage of the acceleration options available on AWS.