by visarga 1307 days ago
It's just 12B parameters, not 175B. You can probably use int16 or int8.

That's afaik already the size when using half precision. And it's not like GPT-3 doesn't already get half of its queries wrong in its optimal form; degrading it further will just make it unusable. Check out GPT-J if you want to see a functionally useless model that can be run locally.
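
For anyone who wants to check the size arithmetic, here is a rough sketch. It counts weight storage only (no activations, optimizer state, or runtime overhead), and the function name and the exact parameter counts of 12 billion and 175 billion are my own illustrative assumptions taken from the numbers in this thread.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int16": 2, "int8": 1}


def weight_gb(n_params: int, dtype: str) -> float:
    """Approximate weight storage in GB (10^9 bytes); weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, matching the figures quoted in the thread.
    models = (("12B model", 12_000_000_000), ("175B model", 175_000_000_000))
    for name, n_params in models:
        for dtype in BYTES_PER_PARAM:
            print(f"{name} @ {dtype}: {weight_gb(n_params, dtype):.0f} GB")
```

Note that int16 takes the same two bytes per parameter as half precision, so under this accounting only int8 would actually shrink the weights further.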

This isn't something that can be surmounted without extra storage: neural nets are just a fancy system for compression and data access. You can go for either less data or more compression, but either way your results won't ever be as good.
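
To make the "more compression" trade-off concrete, here is a toy sketch of symmetric int8 quantization on a handful of made-up weights. This is not how any particular library does it; the function names and the sample values are assumptions for illustration, and real schemes (per-channel scales, calibration, and so on) are more careful.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Made-up weights; the small one is the interesting case.
    weights = [0.5, -1.27, 0.003, 1.0]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each weight now fits in one byte, but anything much smaller than the scale collapses to zero, which is the kind of information loss the comment is pointing at. How much that hurts output quality in practice is an empirical question this sketch does not answer.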