by tyfon
1165 days ago
I didn't see this until today, but I'll respond anyway so the siblings can see it too.
There is quite a big difference between the 65B model and the smaller ones, especially the 13B and 7B models. But here is the bash script[0] I use to launch my "go to" AI; it's called Omnius :) As written in the previous comment, it is a modified version of the examples/chat-13b.sh that is included in the llama.cpp GitHub repo.
[0] https://pastebin.com/SeKE3Uac
|
I have a 5950X as well, but sadly token generation is a bit too slow for me now. (I've had turbo turned off for efficiency as well, but maybe I'll see if the extra cycles help.)
I'm giving 30B a try on my GPU now with https://github.com/oobabooga/text-generation-webui/wiki/LLaM... and if it's not good, I'll give layer offloading with 65B a try and see if I can get it running well.