


	by kapildev 1044 days ago
	MKML says that they reduced the size of the Llama2-13B model from 26GB to 10.5GB. A similar offering from TheBloke (your first link) is a 10.7GB Q6_K model. Maybe they are using GGML and llama.cpp and packaging it in an attractive way while making people believe it is some proprietary tech.
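
	For a rough sanity check of those sizes, here is a minimal sketch that converts file size into average bits per weight. It assumes roughly 13e9 parameters and 1 GB = 1e9 bytes, and it ignores any non-weight data in the files; the function name is just for illustration.

```python
def bits_per_weight(size_gb: float, n_params: float = 13e9) -> float:
    """Average bits stored per parameter for a model file of size_gb gigabytes.

    Assumes 1 GB = 1e9 bytes and that the whole file is weights.
    """
    return size_gb * 1e9 * 8 / n_params


# Sizes quoted in the comment above; the parameter count is an assumption.
for label, size_gb in [("fp16 baseline", 26.0), ("MKML", 10.5), ("TheBloke Q6_K", 10.7)]:
    print(f"{label}: {bits_per_weight(size_gb):.2f} bits/weight")
```

	Under those assumptions, both compressed files land at about 6.5 bits per weight, which is why the two look so similar on size alone.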


polishgladiator 1044 days ago

Based on the integration examples, I don't think they are simply repackaging llama.cpp.

Rather, it looks like they are implementing their own quantization scheme in a way that is a little easier for basic Python users to integrate, at the cost of performance (compared to llama.cpp and others).

Given that the bar for integrating something with higher perf like llama.cpp isn't very high (and that's the way the world is heading -- ask any 15 year old interested in this stuff), I can't see anything of value here.

Lindon4290 1043 days ago

Looks like their performance is better than llama.cpp - https://news.ycombinator.com/item?id=37018989 - and scales to batches of prompts.

polishgladiator 1043 days ago

Actually no -- that post shows they are not performing measurements and comparisons correctly.

These are not serious people.