by mlstudies 934 days ago
Wouldn't that mean trying to fit it on one machine?

Indeed :P

Honestly, I'm not sure how context "sharding" works across multiple GPUs atm. Decent, really long-context OSS models like Yi 200K and YaRN finetunes are very new.