by adsharma 253 days ago
2 bits out of FP8 would be 25%; 2 bits out of FP16 would be 12.5%.
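
A quick sanity check of that arithmetic, as a minimal sketch. The helper name `bit_share` is made up for illustration, and the only assumption is that FP8 and FP16 are 8 and 16 bits wide:

```python
def bit_share(bits_used: int, total_bits: int) -> float:
    """Return bits_used as a percentage of total_bits."""
    return 100.0 * bits_used / total_bits

# Hypothetical illustration: 2 bits taken out of each format's total width.
fp8_share = bit_share(2, 8)    # FP8 is 8 bits wide
fp16_share = bit_share(2, 16)  # FP16 is 16 bits wide

print(f"FP8: {fp8_share}%, FP16: {fp16_share}%")
```

Same 2 bits, but the share halves each time the width doubles.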

I've seen recent work that claimed 70% of the params are used for memorization.