by
CuriouslyC
1217 days ago
A 7-billion-parameter model can run on 16+ GB GPUs as fp16, and a 14-billion-parameter model can run on 16+ GB if quantized to int8. 14B at fp16 and 30B at int8 will require one of the 48 GB cards (they need less than that, but hardware mostly jumps from 24 GB to 48 GB).
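For anyone who wants to check the arithmetic, here is a rough sketch of where those numbers come from. It counts weights only (2 bytes per parameter for fp16, 1 for int8) and ignores activations, KV cache, and framework overhead, so real usage will be somewhat higher. The card sizes in `CARD_SIZES_GB` and the function names are my own assumptions for illustration, not anything from a library.

```python
# Weights-only memory estimate: parameters * bytes per parameter.
# Assumption: 1 GB = 1e9 bytes, so N billion params at B bytes each = N * B GB.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}

# Hypothetical list of common card sizes, chosen for illustration.
CARD_SIZES_GB = [16, 24, 48]


def weight_gb(params_billion, dtype):
    """Approximate GB needed just to hold the weights."""
    return params_billion * BYTES_PER_PARAM[dtype]


def smallest_card(params_billion, dtype):
    """Smallest card that fits the weights, or None if none does.

    Ignores runtime overhead, so treat a tight fit with suspicion.
    """
    needed = weight_gb(params_billion, dtype)
    for size in CARD_SIZES_GB:
        if size >= needed:
            return size
    return None


if __name__ == "__main__":
    for params, dtype in [(7, "fp16"), (14, "int8"), (14, "fp16"), (30, "int8")]:
        print(f"{params}B @ {dtype}: ~{weight_gb(params, dtype)} GB weights, "
              f"smallest card: {smallest_card(params, dtype)} GB")
```

The 14B fp16 and 30B int8 cases come out at roughly 28 GB and 30 GB of weights, which is why they skip past the 24 GB tier.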
2 comments
brucethemoose2
1217 days ago
Requirements could be reduced with something like DeepSpeed or ColossalAI (or even just simple hacks to offload more of the model to system RAM).
blablablub
1217 days ago
thanks