Personally, I don’t think the 1.58-bit work is going to make it into the mainstream.
I'm not sure why you think fractional representations are only useful for training. Being able to compute natively in lower precision can be a huge performance boost at inference time.
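Some rough arithmetic on why lower precision matters at inference time. This is a minimal sketch that only counts the memory needed to store the weights, assuming a hypothetical 7B-parameter model and ignoring activations, KV cache, and packing overhead. Fewer bytes per weight means less memory traffic per generated token, which is one reason low-precision inference can be faster. The helper `weight_memory_gb` is a made-up name for illustration.

```python
import math

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed just to store the weights, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 7e9  # hypothetical 7B-parameter model, chosen only for illustration

# Compare storage cost across formats; "ternary" uses the log2(3) lower bound.
formats = [("fp32", 32), ("bf16", 16), ("fp4", 4), ("ternary", math.log2(3))]
for name, bits in formats:
    print(f"{name:>8}: {weight_memory_gb(NUM_PARAMS, bits):6.2f} GB")
```

The ternary figure is a lower bound: real kernels have to pack the weights somehow, for example at 2 bits each or 5 ternary values per byte (3^5 = 243 ≤ 256, i.e. 1.6 bits per weight).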
Hopefully Microsoft keeps pushing BitNet too, so that only "1.58" bits per weight are needed.
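For context on the "1.58": a weight restricted to the three values {-1, 0, +1} carries log2(3) ≈ 1.585 bits of information. Below is a minimal sketch of ternary quantization, assuming a per-tensor "absmean" scale (divide by the mean absolute weight, round, clip to [-1, 1]) roughly along the lines of what the BitNet b1.58 paper describes. The function name `absmean_ternary` and the `eps` value are illustrative choices, not Microsoft's code.

```python
import math

def absmean_ternary(weights: list[float], eps: float = 1e-5) -> tuple[list[int], float]:
    """Quantize floats to {-1, 0, +1} using the mean absolute value as the scale."""
    gamma = sum(abs(w) for w in weights) / len(weights)  # mean |w| over the tensor
    scale = gamma + eps  # eps guards against division by zero for all-zero weights
    # Scale, round to the nearest integer, then clip into the ternary range.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

bits_per_ternary_weight = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
q, scale = absmean_ternary(weights)
print(f"{bits_per_ternary_weight:.3f}")  # 1.585
print(q)  # [1, 0, 1, -1, 0, 1]
```

Multiplying `q` back by `scale` gives the dequantized approximation; with only three levels, matrix multiplies reduce to additions and subtractions plus one rescale.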
I think fractional representations are only relevant for training at this point; bf16 is sufficient, and there's no need for fp4 and the like.