I spent my Easter weekend stuck in the house with COVID, so I decided to play with llama.cpp [1] and fauxpilot [2] to see if I could get LLM code assistance working purely on the CPU.
As a proof of concept, I'd say I've shown that it's possible. However, there's still a lot to do: autocomplete is quite slow at the moment. PRs welcome.
Tabby runs inference on the GPU and is still slow; I can only imagine how slow TurboPilot is on the CPU.
If people want autocomplete, it needs to be very fast. With slow inference, a better application would be a chatbot that reads your code and answers questions about it, like Cody from Sourcegraph.