Hacker News new | ask | show | jobs
by Eisenstein 641 days ago
Download koboldcpp and llama3.1 gguf weights, use it with the llama3 completions adapter.

Edit the 'background.js' file in the extension and replace the openAI endpoint with

'http://your.local.ip.addr:5001/v1/chat/completions'

Set anything you want as an API key. Now you have a truly local version.

* https://github.com/LostRuins/koboldcpp/releases

* https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-...

* https://github.com/LostRuins/koboldcpp/blob/concedo/kcpp_ada...