Hacker News
by beernet 1385 days ago
It depends on the "DL model", which is a very vague term. A model with 10K parameters and a model with 10T parameters both fit this description equally well.