by Tepix 12 days ago
No need to try, really. 1100B weights in 256 GB of RAM: that's less than 1.8 bits per weight if you want to leave room for a little bit of context.
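A quick sketch of the arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes), counting weights only, and ignoring OS and runtime overhead. The function name and the 10 GB context reservation are hypothetical, chosen only for illustration:

```python
def bits_per_weight(ram_gb: float, n_params: float, reserved_gb: float = 0.0) -> float:
    """Upper bound on the average bits per weight that fit in RAM.

    Assumes decimal GB and ignores OS/runtime overhead.
    reserved_gb is a hypothetical budget for context (KV cache) and activations.
    """
    usable_bits = (ram_gb - reserved_gb) * 1e9 * 8  # GB -> bytes -> bits
    return usable_bits / n_params


print(round(bits_per_weight(256, 1100e9), 2))      # 1.86 with no room for context
print(round(bits_per_weight(256, 1100e9, 10), 2))  # 1.79 with 10 GB reserved
```

Even with zero context the budget is under 1.9 bits per weight; any KV cache pushes it below 1.8.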

How is that supposed to give good results?