| HN Mirror

	by brucethemoose2 1052 days ago
	That link above is the fork. It uses the ggml library, just as llama.cpp does, and is indeed a fork of llama.cpp's implementation of ggml.

version_five 1052 days ago

Right, I'm being stupid: that's the fork I saw earlier today, I just didn't realize it. Have you tried it? IIRC the documentation mentioned 2-bit quantization of the 40B model performing well. I've been using a 5-bit 7B llama2, which I'm generally happy with (because it can run on a pretty crappy machine), but I'm interested to see the differences.
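
For a rough sense of the memory trade-off between those two setups, here is a back-of-the-envelope sketch. It assumes the nominal bits per weight only; real quantized formats also store per-block scales and metadata, so actual files come out somewhat larger. The helper name `approx_weight_size_gb` is made up for illustration.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the quantized weights in GB (10^9 bytes).

    Assumption: every weight costs exactly `bits_per_weight` bits.
    Per-block scales and file metadata are ignored.
    """
    return n_params * bits_per_weight / 8 / 1e9


# The two configurations mentioned in the comment above.
for name, n_params, bits in [("7B at 5-bit", 7e9, 5), ("40B at 2-bit", 40e9, 2)]:
    print(f"{name}: ~{approx_weight_size_gb(n_params, bits):.1f} GB")
```

By this estimate the 2-bit 40B model still needs more than twice the memory of the 5-bit 7B one, so it may not fit on the same low-end machine.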

brucethemoose2 1052 days ago

I wouldn't go lower than Q3_K_S, as it's basically the same file size, and llama 33B has a big perplexity drop-off below that.
