HN Mirror



	by ianand 453 days ago
	The model architecture is the same during RL, but the training algorithm is substantially different.
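To make the distinction concrete, here is a toy sketch. It is not from the comment and not how any particular lab trains models. The "model" is just a softmax over three logits, and everything else (`supervised_step`, `reinforce_step`, the 0/1 reward, the learning rate) is a hypothetical stand-in chosen for illustration. REINFORCE is used as the RL example only because it is the simplest policy-gradient method; the comment does not say which algorithm is meant.

```python
import math
import random

def softmax(logits):
    """The shared 'architecture': logits in, token probabilities out."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_step(logits, target, lr=0.5):
    """One cross-entropy gradient step toward a labeled target token."""
    probs = softmax(logits)
    # d(-log p[target]) / d logit_i = p_i - 1[i == target]
    return [x - lr * (p - (1.0 if i == target else 0.0))
            for i, (x, p) in enumerate(zip(logits, probs))]

def reinforce_step(logits, reward_fn, rng, lr=0.5):
    """One REINFORCE step: sample an output, score it, weight the update by reward."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)  # no label, only a score for what was sampled
    # d(-reward * log p[action]) / d logit_i = reward * (p_i - 1[i == action])
    return [x - lr * reward * (p - (1.0 if i == action else 0.0))
            for i, (x, p) in enumerate(zip(logits, probs))]

def train(step, steps=200):
    """Run the same parameters through whichever update rule is passed in."""
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = step(logits)
    return softmax(logits)

rng = random.Random(0)  # fixed seed so the sampled run is repeatable

# Supervised: the trainer is told that token 2 is the right answer.
sft_probs = train(lambda l: supervised_step(l, target=2))

# RL: the trainer only sees a reward of 1.0 when the sampled token happens to be 2.
rl_probs = train(lambda l: reinforce_step(l, lambda a: 1.0 if a == 2 else 0.0, rng))

print("supervised:", sft_probs)
print("reinforce: ", rl_probs)
```

Both runs push probability toward token 2 using the identical `softmax` model. The only thing that changes is where the learning signal comes from: a given label in one case, a reward on the model's own sampled output in the other.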