| HN Mirror



	by maximilianburke 1086 days ago
	Is it possible we will eventually see 1-bit weights in use?

2 comments

brucethemoose2 1086 days ago

There are already papers on it, and there is 2-bit quant in llama.cpp.

But it seems to be past the point of diminishing returns, where you might as well use a model with fewer parameters... For now.

There was another scheme in a paper where the "sparse" majority of the model was highly quantized, while the "dense" part was left in FP16, with good results.
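To make the idea concrete, here is a rough sketch of mixed precision: a small set of weights is left untouched and everything else is squeezed down to 2 bits. The selection rule (keep the largest-magnitude weights), the names `mixed_precision` and `quantize_uniform`, and the `keep_fraction` parameter are all assumptions for illustration, not the paper's method, and plain Python floats stand in for FP16.

```python
def quantize_uniform(values, bits):
    """Round each value to one of 2**bits evenly spaced levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def mixed_precision(weights, keep_fraction=0.1, bits=2):
    """Keep a few weights at full precision and quantize the rest to `bits` bits."""
    n_keep = max(1, int(len(weights) * keep_fraction))
    # Hypothetical selection rule: protect the largest-magnitude weights.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(by_magnitude[:n_keep])
    rest = [i for i in range(len(weights)) if i not in keep]
    quantized = quantize_uniform([weights[i] for i in rest], bits) if rest else []
    out = list(weights)
    for i, q in zip(rest, quantized):
        out[i] = q
    return out, keep


weights = [0.02, -0.01, 0.03, 4.0, -0.02, 0.01, -3.5, 0.00, 0.015, -0.025]
approx, kept = mixed_precision(weights, keep_fraction=0.2, bits=2)
```

In this toy example the two large weights are excluded from the quantization range, so the four 2-bit levels only have to cover the small values instead of being stretched across the outliers.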


touisteur 1085 days ago

For some time I played with Brevitas and Xilinx's FINN, where you could quantize like crazy. I haven't checked where they stand since transformers took over the AI world.
