by blackbear_
1292 days ago
Sorry, but this is just wrong. Using only fully connected layers would result in pretty bad performance on images, text, audio, etc., or at the very least would require much more data to perform well. At least use the right type of architecture for each data modality; then I agree that the basic version won't perform much worse than SOTA in the real world.
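To make the "much more data" point concrete, here is a rough parameter count comparing a fully connected layer on a flattened image with a small convolutional layer. The image size (224x224 RGB), the 1000 hidden units, and the 64 filters of size 3x3 are arbitrary assumptions for illustration, and the helper names are hypothetical.

```python
def dense_params(in_features: int, out_features: int) -> int:
    # Every input connects to every output, plus one bias per output.
    return in_features * out_features + out_features


def conv2d_params(in_channels: int, out_channels: int, kernel: int) -> int:
    # One kernel x kernel x in_channels filter per output channel, shared
    # across all spatial positions, plus one bias per output channel.
    return kernel * kernel * in_channels * out_channels + out_channels


# Hypothetical example: a 224x224 RGB image.
h, w, c = 224, 224, 3
fc = dense_params(h * w * c, 1000)  # flatten the image, then 1000 hidden units
conv = conv2d_params(c, 64, 3)      # 64 filters of size 3x3

print(f"fully connected: {fc:,} parameters")  # fully connected: 150,529,000 parameters
print(f"3x3 conv:        {conv:,} parameters")  # 3x3 conv:        1,792 parameters
```

Weight sharing is one reason convolutional layers tend to need less data on images than a flattened fully connected layer, though parameter count alone is not the whole story.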
There are many rules of thumb that took the last 5+ years to discover but are now quite standard. You are nitpicking on fully connected layers, but if we add dropout, weight initialization, and adaptive learning rates to what they said, then we are fairly close to being able to at least get a deep architecture to overfit a toy dataset, and from there we're off to the races applying it to a larger dataset.
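A minimal pure-Python sketch of the three tricks mentioned, assuming a ReLU network (so He initialization), inverted dropout, and Adam as the adaptive learning-rate method. The function and class names are hypothetical; in practice you'd use a framework's built-ins, e.g. PyTorch's `torch.nn.init.kaiming_normal_`, `torch.nn.Dropout`, and `torch.optim.Adam`.

```python
import math
import random


def he_init(fan_in, fan_out, rng):
    # He (Kaiming) normal init for ReLU layers: std = sqrt(2 / fan_in).
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def inverted_dropout(xs, p, rng, training=True):
    # Zero each activation with probability p and scale survivors by 1/(1-p),
    # so no rescaling is needed at inference time.
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


class Adam:
    # Adam optimizer over a flat list of scalar parameters.
    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


rng = random.Random(0)
weights = he_init(fan_in=256, fan_out=128, rng=rng)
hidden = inverted_dropout([1.0] * 10, p=0.5, rng=rng)  # each entry is 0.0 or 2.0
updated = Adam(n=2, lr=0.1).step([1.0, 1.0], [2.0, -0.5])
print(updated)  # roughly [0.9, 1.1]
```

On the first step Adam's bias-corrected update is about `lr * sign(grad)` regardless of the gradient's scale, which is part of why it is forgiving when you're just trying to get a network to overfit a small dataset.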