| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by snek_case 95 days ago
	Probably constrained by training resources. It's much easier to experiment with a smaller architecture. You may need many training runs to figure out hyperparameters for example. If each run needs multiple GPUs for a week the cost adds up quickly. I think it makes a lot of sense to start small.