by leminimal 1095 days ago
Maybe this is more of a general ML question, but I ran into it when transformers became popular. Do you know of a project-based tutorial that covers neural net architecture, hyperparameter selection, and debugging? Something that walks through getting poor results and makes the reasoning behind each tweak explicit?

When I try to use transformers or any AI thing on a toy problem I come up with, it never works. And training is a black box that's hard to debug. Yes, with the available resources, if you pick the exact problem, the exact NN architecture and the exact hyperparameters, it all works out. But surely the authors didn't get that on the first try. So what's the tweaking process?

1 comments

There is A. Karpathy's recipe for training NNs but it is not a walkthrough with an example:

https://karpathy.github.io/2019/04/25/recipe/

but the general idea of "get something that can overfit first" is probably pretty good.
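
A minimal sketch of that "overfit first" check, using plain Python and a logistic regression instead of a real network so it runs anywhere. The toy batch, learning rate and step count are made up for illustration and are not taken from the recipe. The point is only the shape of the test: train on a handful of fixed examples and confirm the training loss heads toward zero before trying anything bigger.

```python
import math
import random


def overfit_tiny_batch(steps=2000, lr=0.5, seed=0):
    """Fit a logistic regression to 4 fixed points; return the loss per step."""
    rng = random.Random(seed)
    # Hypothetical toy batch: two features, label is 1 when x0 > x1.
    xs = [(0.0, 1.0), (1.0, 0.0), (0.2, 0.9), (0.8, 0.1)]
    ys = [0, 1, 0, 1]
    w = [rng.uniform(-0.1, 0.1) for _ in range(2)]
    b = 0.0
    history = []
    n = len(xs)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        loss = 0.0
        for (x0, x1), y in zip(xs, ys):
            z = w[0] * x0 + w[1] * x1 + b
            p = 1.0 / (1.0 + math.exp(-z))
            # Binary cross-entropy, with a small epsilon to avoid log(0).
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            grad_w[0] += (p - y) * x0
            grad_w[1] += (p - y) * x1
            grad_b += p - y
        # Plain full-batch gradient descent on the same 4 examples every step.
        w = [w[i] - lr * grad_w[i] / n for i in range(2)]
        b -= lr * grad_b / n
        history.append(loss / n)
    return history


history = overfit_tiny_batch()
print(f"first loss {history[0]:.3f}, last loss {history[-1]:.3f}")
```

If the last loss does not drop far below the first one on a batch this small, the problem is more likely in the model, loss or data pipeline than in the hyperparameters.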

In my experience, getting the data right is probably the most underappreciated thing. Karpathy has data as step one, but I have found that data representation and sampling strategy also work wonders.

In Part II of our book we do an end-to-end project including e.g. a moment where nothing works until we crop around "regions of interest" to balance the per-pixel classes in the training data for the UNet. This has been something I have pasted into the PyTorch forums every now and then, too.
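
To make the class-balance effect of cropping concrete, here is a rough sketch on nested lists. It is not the code from the book: `crop_around_roi`, `positive_fraction` and the `margin` parameter are hypothetical names for illustration, and a real pipeline would work on tensors and likely crop fixed-size patches.

```python
def positive_fraction(mask):
    """Share of pixels labelled 1 in a 2D mask given as a list of rows."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def crop_around_roi(image, mask, margin=1):
    """Crop image and mask to the bounding box of positive pixels plus a margin."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        # No positive pixels: nothing to crop around, return the inputs as-is.
        return image, mask
    r0 = max(min(rows) - margin, 0)
    r1 = min(max(rows) + margin + 1, len(mask))
    c0 = max(min(cols) - margin, 0)
    c1 = min(max(cols) + margin + 1, len(mask[0]))
    cropped_image = [row[c0:c1] for row in image[r0:r1]]
    cropped_mask = [row[c0:c1] for row in mask[r0:r1]]
    return cropped_image, cropped_mask


# A 10x10 mask with a 2x2 positive block: 4% of the pixels are positive.
mask = [[1 if 4 <= r <= 5 and 4 <= c <= 5 else 0 for c in range(10)]
        for r in range(10)]
image = [[0.0] * 10 for _ in range(10)]
_, small_mask = crop_around_roi(image, mask, margin=1)
print(positive_fraction(mask), positive_fraction(small_mask))
```

With a one-pixel margin the crop is 4x4, so the positive class goes from 4% to 25% of the pixels the loss sees.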

Thanks for linking me to that post! It's much better at expressing what I'm trying to say. I'll have a careful read of it now.

I think I'm still at a step before the overfit. It doesn't converge to a solution on its training data (fit or overfit). And all my data is artificially generated so no cleaning is needed (though choosing a representation still matters). I don't know if that's what you mean by getting the data right or something else. Example problems that "don't work": fizzbuzz, reverse all characters in a sentence.
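
As an illustration of how much the representation can matter for fizzbuzz, one option is to feed the number as a vector of bits rather than as a single scalar, and to treat the output as a 4-way classification. This is only a sketch of the data side under those assumptions; the helper names, the 10-bit width and the class numbering are all made up here, and nothing guarantees a given network will converge on it.

```python
def fizzbuzz_label(n):
    """Class index: 0 = print the number, 1 = fizz, 2 = buzz, 3 = fizzbuzz."""
    if n % 15 == 0:
        return 3
    if n % 5 == 0:
        return 2
    if n % 3 == 0:
        return 1
    return 0


def to_bits(n, width=10):
    """Little-endian binary encoding of n as a list of 0/1 features."""
    return [(n >> i) & 1 for i in range(width)]


def make_dataset(start, stop, width=10):
    """Pairs of (bit vector, class index) for every n in [start, stop)."""
    return [(to_bits(n, width), fizzbuzz_label(n)) for n in range(start, stop)]


# Example split: hold out 1..100 for evaluation, train on the rest up to 2**10.
train = make_dataset(101, 1024)
test = make_dataset(1, 101)
print(len(train), len(test), train[0])
```

The same generated data can then be used for the tiny-batch check: pick a handful of pairs from `train` and see whether the model can memorize them at all.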