Hacker News new | ask | show | jobs
by visarga 2790 days ago
Depends. Many standard layers express a form of prior knowledge. A CNN layer embeds the assumption of spatial translation invariance, an RNN does the same for temporal translation. Graph Neural Nets have permutation invariance. Assumptions can also be expressed as regularisation terms added to the loss function. One common practice is to initialise a net with the weights of another net trained on a related task - usually CNNs trained on ImageNet, and word embeddings for NLP (though lately it is possible to use deep neural nets such as BERT, ELMo, ULMFiT and OpenAI transformer pre-trained on large text corpora).