by galangalalgol
1199 days ago
I can't imagine the architects of the model designed the layers and kernels without some idea in mind of what sorts of operations they wanted to simulate, right? Do people just throw random layers and activation functions together until something works? I'm only on the periphery of DL, but those I know who work in it either have a general idea of what equations they would use to do it without DL, or have a biological inspiration in mind. This doesn't seem like a biologically inspired sort of domain.

In a lot of cases, yes. You can start with a reasonable baseline architectural guess, like “convolution should be good for vision” or “attention should be good for language”, but after that it's a lot of guess and check.