HN Mirror

	by eigenvalue 506 days ago
	Fair enough, but that still uses a lot more memory during training than what DeepSeek is doing.