HN Mirror



	by shubb 930 days ago
	> You won't be able to run this on your home GPU.

	Would this allow you to run each expert on a cheap commodity GPU card, so that instead of using expensive 200GB cards we can use a computer with 8 cheap gaming cards in it?


dragonwriter 930 days ago

> Would this allow you to run each expert on a cheap commodity GPU card so that instead of using expensive 200GB cards we can use a computer with 8 cheap gaming cards in it?

I would think no differently than you can run a large regular model on a multi-GPU setup (which people do!). It's still all one network even if not all of it is activated for each token, and since it's much smaller than a 56B model, it seems like there are significant components of the network that are shared.


terafo 930 days ago

Attention is shared. It's ~30% of params here. So ~2B params are shared between experts and ~5B params are unique to each expert.
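
For anyone who wants the arithmetic spelled out, here is a rough sketch using only the approximate figures quoted above (~2B shared, ~5B per expert). The expert count of 8 is inferred from the "56B" and "8 cards" framing in this thread, and the top-2 routing is purely an assumption for illustration; neither number is confirmed here, and all names in the code are made up for the example.

```python
# Approximate figures from the comment above, in billions of parameters.
SHARED_B = 2.0       # attention and other weights shared by every expert
PER_EXPERT_B = 5.0   # weights unique to each expert

# Assumptions (not stated in the thread):
NUM_EXPERTS = 8      # inferred from the "56B" / "8 cards" framing
ACTIVE_EXPERTS = 2   # hypothetical top-2 routing per token


def total_params_b(shared, per_expert, num_experts):
    """Parameters that must be stored: shared weights once, plus every expert."""
    return shared + per_expert * num_experts


def active_params_b(shared, per_expert, active_experts):
    """Parameters actually used for one token: shared weights plus routed experts."""
    return shared + per_expert * active_experts


# Naive estimate that treats each expert as a full, independent 7B model.
naive_b = NUM_EXPERTS * (SHARED_B + PER_EXPERT_B)
total_b = total_params_b(SHARED_B, PER_EXPERT_B, NUM_EXPERTS)
active_b = active_params_b(SHARED_B, PER_EXPERT_B, ACTIVE_EXPERTS)

print(f"naive 8 x 7B estimate: {naive_b:.0f}B")
print(f"stored (shared counted once): {total_b:.0f}B")
print(f"used per token (top-{ACTIVE_EXPERTS}): {active_b:.0f}B")
```

With these rough inputs the stored total comes out well under the naive 56B, which is the point made upthread about shared components. It also shows why splitting "one expert per card" is not a clean cut: the shared part is needed for every token regardless of which experts are picked.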


terafo 930 days ago

Yes, but you wouldn't want to do that. You will be able to run that on a single 24 GB GPU by the end of this weekend.


brucethemoose2 929 days ago

Maybe two weekends.
