| HN Mirror



	by version_five 1045 days ago
	Where does the performance difference come from? And on what kind of processor and GPU? I didn't even know llama.cpp had a 32-bit option. For now I'm pretty skeptical that it's a fair comparison.

1 comments

tjake 1045 days ago

The default for `convert.py` is F32. This is just a SIMD CPU comparison.

Jlama uses the Vector API in Java 20, and also gets better thread scheduling through work stealing and zero allocation.
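
For readers unfamiliar with the terms, here is a rough illustration of work stealing combined with buffer reuse. It is not taken from the Jlama repo; the class `MatVecSketch` and the method `matVec` are hypothetical names made up for this sketch. It assumes a row-major F32 weight matrix and splits the rows across `ForkJoinPool.commonPool()` (the work-stealing pool behind parallel streams), writing into an output array that the caller allocates once and reuses. The inner loop is plain scalar Java; a SIMD version would swap it for the incubating `jdk.incubator.vector` API, which needs `--add-modules jdk.incubator.vector` and is left out here.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MatVecSketch {
    // Computes out = W * x for a row-major matrix W (rows x cols).
    // The caller owns `out`, so repeated calls can reuse the same buffer
    // instead of allocating a new result array each time.
    public static void matVec(float[] w, float[] x, float[] out, int rows, int cols) {
        // Parallel streams run on ForkJoinPool.commonPool(), a work-stealing pool:
        // idle worker threads take row ranges from busy ones.
        IntStream.range(0, rows).parallel().forEach(r -> {
            float sum = 0f;
            int base = r * cols;
            for (int c = 0; c < cols; c++) {
                sum += w[base + c] * x[c];
            }
            out[r] = sum; // each row writes to its own slot, so no locking is needed
        });
    }

    public static void main(String[] args) {
        int rows = 4, cols = 3;
        float[] w = new float[rows * cols];
        for (int i = 0; i < w.length; i++) {
            w[i] = i; // toy weights 0..11
        }
        float[] x = {1f, 2f, 3f};
        float[] out = new float[rows]; // allocated once, reused for every call
        for (int step = 0; step < 3; step++) {
            matVec(w, x, out, rows, cols);
        }
        System.out.println(Arrays.toString(out));
    }
}
```

Only the data buffers are reused in this sketch. The stream machinery still creates small internal task objects, so it is not strictly allocation-free; a stricter version would manage its own `ForkJoinPool` tasks.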


belfthrow 1044 days ago

Could you link to some of the examples in your repo where you enforce the zero allocation? I don't see much reuse of the buffers, e.g. float buffers, and there is quite a lot of array-based heap allocation. Just for my own interest. Many thanks. Cool to see the use of the new Vector API also.


version_five 1045 days ago

Very interesting, I'll watch for the quantized version.
