	by Jrxing 244 days ago
	Sharing the big GPU cluster with non-latency-critical load is one solution we also explored. In this work, we are targeting the problem of smaller models running on SOTA GPUs. Distilled/fine-tuned small models have shown comparable performance on vertical tasks.