by bobbylarrybobby
1203 days ago
> Do people just throw random layers and activation functions together until something works?

In a lot of cases, yes. You can start with a reasonable baseline architectural guess, like “convolution should be good for vision” or “attention should be good for language”, but after that it's a lot of guess and check.

I consider it a tragedy that we throw huge hogs of models into datacenters and let them churn without giving much thought to improving performance. The climate impacts relative to the real benefits are measurable and depressing for a field so focused on innovation. Usually the 100x model built by a team of "data scientists" over a month is no better than a 1x model built by a couple of SMEs over a couple of weeks.