HN Mirror



	by bick_nyers 632 days ago
	You could always split one of the experts up across multiple GPUs. I tend to agree with your sentiment; I think researchers in this space tend not to optimize that well for inference deployment scenarios. To be fair, there are a lot of different ways to deploy something, and a lot of quantization techniques and parameters.
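
	To make "split one of the experts up" concrete, here is a rough sketch of one way it could be done: slicing the expert's hidden dimension so each GPU holds part of the weights and the partial outputs are summed. Everything here is an assumption for illustration: the expert is taken to be a plain two-layer ReLU MLP, the "GPUs" are just Python lists, and the names (`shard_expert`, `sharded_forward`) are made up. A real deployment would use a framework's tensor-parallel support and an all-reduce rather than a local sum.

```python
import random

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def expert_forward(W1, W2, x):
    # Unsplit expert (assumed form): y = W2 @ relu(W1 @ x).
    return matvec(W2, relu(matvec(W1, x)))

def shard_expert(W1, W2, num_shards):
    # Slice the hidden dimension into contiguous chunks, one per hypothetical GPU.
    hidden = len(W1)
    assert hidden % num_shards == 0, "hidden size must divide evenly"
    step = hidden // num_shards
    shards = []
    for s in range(num_shards):
        lo, hi = s * step, (s + 1) * step
        W1_part = W1[lo:hi]                   # rows of W1 for this slice
        W2_part = [row[lo:hi] for row in W2]  # matching columns of W2
        shards.append((W1_part, W2_part))
    return shards

def sharded_forward(shards, x):
    # Each shard computes a partial output from its slice of the hidden units.
    partials = [matvec(W2p, relu(matvec(W1p, x))) for W1p, W2p in shards]
    # Summing the partials stands in for an all-reduce across devices.
    return [sum(vals) for vals in zip(*partials)]

random.seed(0)
d_model, d_hidden = 4, 8
W1 = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
W2 = [[random.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_model)]
x = [random.uniform(-1, 1) for _ in range(d_model)]

full = expert_forward(W1, W2, x)
split = sharded_forward(shard_expert(W1, W2, 2), x)
```

	The split works because ReLU is applied elementwise, so each slice of hidden units can be computed independently; `full` and `split` should agree up to floating-point rounding. The cost is the extra communication step to combine the partial outputs.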