by brucethemoose2 1112 days ago

Kobold.cpp is your best bet.

You can leverage those big CPUs while still loading both GPUs with a 65B model.
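For a rough feel of how that split works out, here is a back-of-the-envelope sketch. It assumes the weights are spread evenly across the model's layers and ignores the KV cache and context buffers, so treat the numbers as a starting point only. The function name `plan_layer_split`, the 4.5 bits per weight, the 2 GiB reserve, and the 24 GiB cards are all illustrative assumptions, not anything koboldcpp reports; the 80-layer count is the LLaMA 65B architecture.

```python
def plan_layer_split(n_params, n_layers, bits_per_weight, gpu_vram_gib, reserve_gib=2.0):
    """Estimate how many layers fit on each GPU; whatever is left runs on the CPU.

    Hypothetical helper: assumes every layer is the same size and ignores
    KV cache / scratch buffers beyond the flat per-GPU reserve.
    """
    bytes_per_layer = n_params * bits_per_weight / 8 / n_layers
    remaining = n_layers
    per_gpu = []
    for vram in gpu_vram_gib:
        # Keep some VRAM free for context and scratch space (assumed amount).
        usable_bytes = max(vram - reserve_gib, 0) * 1024**3
        fit = min(remaining, int(usable_bytes // bytes_per_layer))
        per_gpu.append(fit)
        remaining -= fit
    return per_gpu, remaining


if __name__ == "__main__":
    # Assumed setup: 65B parameters, 80 layers, ~4.5 bits/weight quantization,
    # two 24 GiB GPUs. Swap in your own hardware numbers.
    gpu_layers, cpu_layers = plan_layer_split(65e9, 80, 4.5, [24, 24])
    print("layers per GPU:", gpu_layers, "layers left on CPU:", cpu_layers)
```

If the estimate leaves layers on the CPU, that is where the big CPUs come in; if everything fits on the GPUs, you can lower the quantization level or raise the context size instead.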

If you are feeling extra nice, set that up as an AI Horde worker whenever you run koboldcpp to play with models. It will run API requests for others in the background whenever it's not crunching your own requests, and in return you get priority access to models other hosts are running: https://aihorde.net/

pmarreck 1111 days ago

oooh, this is a great idea

brucethemoose2 1111 days ago

Also, I would suggest this model as one to play with:

https://huggingface.co/ycros/airoboros-65b-gpt4-1.4.1-PI-819...

Check the prompting syntax here; it has a huge effect on the output:

https://huggingface.co/jondurbin/airoboros-65b-gpt4-1.4