by carpdiem 670 days ago
https://www.mnist.org

I wanted to build first-hand intuition for the choices that go into training neural networks: hyperparameters, activation functions, network architectures, etc. So I've been rigorously exploring them by training and testing models on the MNIST dataset.

Coming up soon: vision transformers, the effect of depth in CNNs, batch size investigations, and more.

Let me know if you have suggestions for what to investigate next!