Hacker News
by moffkalast 1307 days ago
Sure, as long as you've got 350 GB of VRAM to load it.

It's just 12B parameters, not 175B. You can probably use fp16 or int8.
That's afaik already the size when using half precision. And it's not as if GPT-3 doesn't already get half its queries wrong in its optimal form; degrading it further will just make it unusable. Check out GPT-J if you want to see a functionally useless model that can be run locally.
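A quick back-of-the-envelope check of the sizes argued about above: weight memory is roughly parameter count times bytes per parameter. This is a minimal sketch under stated assumptions. It counts only the weights (no activations, KV cache, or framework overhead), uses decimal GB (1e9 bytes), and the function name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Hypothetical helper: approximate memory for the weights alone, in decimal GB.

    Assumption: ignores activations, KV cache, and runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in [("175B", 175e9), ("12B", 12e9)]:
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{name} @ {precision}: {gb:.0f} GB")
```

Under these assumptions, 175B parameters at fp16 come to 350 GB, while 12B parameters come to about 24 GB at fp16 or 12 GB at int8.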

It's not like this can be surmounted without extra storage: neural nets are just a fancy system for compression and data access. You can either go for less data or more compression, but either way your results won't ever be as good.
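One concrete instance of the "more compression" trade-off is the int8 quantization suggested upthread. The sketch below assumes a simple symmetric per-tensor scheme (one common choice among several) and uses random Gaussian numbers as stand-in weights, not real model weights. It shows that a round trip to int8 loses information, with per-weight error bounded by half a quantization step. It does not by itself measure how much model quality drops.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme).

    Maps [-max|w|, max|w|] onto integers in [-127, 127] with a single scale.
    """
    # Fall back to scale 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to floats; the rounding error is not recoverable."""
    return [v * scale for v in q]


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    random.seed(0)
    # Stand-in "weights": small Gaussian values, purely illustrative.
    weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    err = max_abs_error(weights, restored)
    print(f"scale={scale:.3g}, max abs error={err:.3g}, bound scale/2={scale / 2:.3g}")
```

Storage drops from 2 bytes per weight (fp16) to 1, but every weight can move by up to `scale / 2`, which is the kind of loss the comment is pointing at.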