by maximilianburke 1086 days ago
Is it possible we will eventually see 1-bit weights in use?
2 comments

There are already papers on it, and there is 2-bit quant in llama.cpp.

But it seems to be past the point of diminishing returns, where you might as well use a model with fewer parameters... For now.

There was another scheme in a paper where the "sparse" majority of the model was highly quantized, while the "dense" part was left in FP16, with good results.
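To make that kind of split concrete, here is a rough sketch in plain Python. It is not the paper's method: the function name `quantize_mixed`, the `keep_fraction` parameter, the choice of "largest magnitude" as the rule for what stays at full precision, and the uniform min/max quantizer are all my own assumptions for illustration.

```python
import random


def quantize_mixed(weights, bits=2, keep_fraction=0.01):
    """Quantize most weights to `bits` bits and leave a small subset untouched.

    Returns (dequantized_weights, kept_indices). Hypothetical helper, not
    taken from any library or paper.
    """
    n = len(weights)
    n_keep = max(1, int(n * keep_fraction))

    # Assumption: the weights worth keeping at full precision are the
    # ones with the largest magnitude (the outliers).
    by_magnitude = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(by_magnitude[:n_keep])

    rest = [weights[i] for i in range(n) if i not in kept]
    if not rest:
        # Nothing left to quantize.
        return list(weights), kept

    # Uniform quantization grid over the range of the remaining weights.
    lo, hi = min(rest), max(rest)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0

    out = []
    for i, w in enumerate(weights):
        if i in kept:
            out.append(w)  # stays at original precision
        else:
            code = round((w - lo) / scale)  # integer code in [0, levels]
            out.append(lo + code * scale)   # value after dequantization
    return out, kept


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0, 1) for _ in range(1000)]
    w[3] = 25.0  # inject one large outlier
    deq, kept = quantize_mixed(w, bits=2, keep_fraction=0.01)
    err = max(abs(a - b) for a, b in zip(w, deq))
    print(f"kept {len(kept)} weights as-is, max abs error {err:.3f}")
```

The reason for pulling the outliers out first is that they no longer stretch the min/max range, so the few quantization levels are spent on the bulk of the weights instead.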

For a while I played with Brevitas and Xilinx's FINN, where you could quantize like crazy. I haven't checked where they stand since transformers took over the AI world.