| HN Mirror



	by brucethemoose2 982 days ago
	Indeed :P Honestly I'm not sure how context "sharding" works across multiple GPUs atm. Decent, really long-context OSS models like Yi 200K and the YaRN finetunes are very new.