by brucethemoose2 1064 days ago
Kobold.cpp is your best bet.

You can leverage those big CPUs while still loading both GPUs with a 65B model.
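For a rough feel of how that split works out, here is a back-of-the-envelope sketch. It is not anything koboldcpp itself runs. It assumes a 65B LLaMA-style model with 80 layers, roughly 4.5 bits per weight for a 4-bit quant, and weights spread evenly across layers. The 16 GB cards, the 2 GB reserve, and the function name `estimate_layer_split` are all hypothetical, so plug in your own numbers.

```python
import math


def estimate_layer_split(n_params, n_layers, bits_per_weight,
                         gpu_vram_gb, reserve_gb=2.0):
    """Roughly estimate how many layers fit on each GPU.

    Assumes weights are spread evenly across layers and ignores
    embeddings and context-dependent memory beyond reserve_gb.
    Returns (model size in GB, layers per GPU, layers left for the CPU).
    """
    model_gb = n_params * bits_per_weight / 8 / 1e9
    per_layer_gb = model_gb / n_layers
    remaining = n_layers
    split = []
    for vram in gpu_vram_gb:
        # Keep some VRAM back for scratch buffers and the KV cache.
        usable = max(vram - reserve_gb, 0.0)
        fit = min(remaining, math.floor(usable / per_layer_gb))
        split.append(fit)
        remaining -= fit
    return model_gb, split, remaining


if __name__ == "__main__":
    # Hypothetical hardware: two 16 GB GPUs.
    size_gb, per_gpu, on_cpu = estimate_layer_split(
        n_params=65e9, n_layers=80, bits_per_weight=4.5,
        gpu_vram_gb=[16.0, 16.0],
    )
    print(f"model ~{size_gb:.1f} GB, GPU layers {per_gpu}, CPU layers {on_cpu}")
```

With those made-up numbers, about three quarters of the layers land on the GPUs and the rest stay on the CPUs. Real usage will differ with context size and quant type, so treat the result as a starting point and adjust from there.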

... If you are feeling extra nice, you should set that up as an AI Horde worker whenever you run koboldcpp to play with models. It will run API requests for others in the background whenever it's not crunching your own requests, and in return you get priority access to models other hosts are running: https://aihorde.net/

1 comments

oooh, this is a great idea
Also, I would suggest this model as one to play with:

https://huggingface.co/ycros/airoboros-65b-gpt4-1.4.1-PI-819...

Check the prompting syntax here, it has a huge effect on the output:

https://huggingface.co/jondurbin/airoboros-65b-gpt4-1.4