by lhl 1167 days ago
Sounds like you've had some more success w/ raw LLaMA - would def be interested in how you're prompting it.

BTW, for those interested, here are some notes I'm taking on the nuts and bolts of the local models I'm running (looks like the markdown rendering is a bit messed up): https://mostlyobvious.org/?link=%2FReference%2FSoftware%2FGe...

2 comments

I didn't see this before today, but I'll respond anyway so the siblings can see it too. There is quite a big difference between the 65B model and the smaller ones, especially the 13B and 7B models.

But here is the bash script[0] I use to launch my "go to" AI; it's called Omnius :)

As I wrote in the previous comment, it is a modified version of the examples/chat-13b.sh script that is included in the llama.cpp GitHub repo.

[0] https://pastebin.com/SeKE3Uac

Ah thanks a lot, I tried out the llama.cpp examples before the k-shot chat prompts, this is definitely much better!

I have a 5950X as well, but sadly, token generation is a bit too slow for me now. (I've had turbo turned off for efficiency as well, but maybe I'll see if the extra cycles help.)

I'm giving 30B a try on my GPU now with https://github.com/oobabooga/text-generation-webui/wiki/LLaM... and if it's not good, I'll give layer offloading with 65B a try and see if I can get it running well.

super helpful, thank you.