| HN Mirror



	by ilaksh 481 days ago
	There is a limit because model responses need to be nearly instant, and the smaller models that are generally capable of that come with their own trade-offs. That is, unless you have unique hardware: only Cerebras can run medium to large models at truly near-instant speed.