by tyfon 1165 days ago
I didn't see this until today, but I'll respond anyway so the siblings can see it too. There is quite a big difference between the 65B model and the smaller ones, especially the 13B and 7B models.

But here is the bash script[0] I use to launch my "go-to" AI; it's called Omnius :)

As I wrote in the previous comment, it is a modified version of the examples/chat-13B.sh script that is included in the llama.cpp GitHub repository.

[0] https://pastebin.com/SeKE3Uac

1 comment

Ah, thanks a lot! I tried out the llama.cpp examples before the k-shot chat prompts, and this is definitely much better!

I have a 5950X as well, but sadly, token generation is a bit too slow for me right now. (I've had turbo turned off for efficiency, but maybe I'll see if the extra cycles help.)

I'm giving 30B a try on my GPU now with https://github.com/oobabooga/text-generation-webui/wiki/LLaM... and if that's not good enough, I'll give layer offloading with 65B a try and see if I can get it running well.