by polishgladiator
1046 days ago
Based on the integration examples, I don't think they are simply repackaging llama.cpp. Rather, it looks like they are implementing their own quantization scheme in a way that is a little easier for basic Python users to integrate, at the cost of performance (compared to llama.cpp and others). Given that the bar for integrating something with higher performance like llama.cpp isn't very high (and that's the way the world is heading; ask any 15-year-old interested in this stuff), I can't see anything of value here.