Hacker News
by leminimal 1083 days ago
Is there a project-based tutorial that talks more about neural net architecture, hyperparameter selection and debugging? Something that walks through getting poor results and makes explicit the reasoning for tweaking?

When I try to use transformers or any AI thing on a toy problem I come up with, it never works. Even FizzBuzz, which I thought was easy, doesn't work (because division or modulo is apparently hard for NNs to represent). And there's this black box of training that's hard to debug. Yes, for the available resources, if you pick the exact same problem, the exact same NN architecture and the exact same hyperparameters, it all works out. But surely they didn't get that on the first try. So what's the tweaking process?

Somehow this point isn't often talked about in courses and consequently the ones who've passed this hurdle don't get their experience transferred. I'd follow an entire course on this if it were available. An HN commenter linked me to this

https://karpathy.github.io/2019/04/25/recipe/

which is exactly on point. But it'd be great if it were one or more tutorials with a specific example, wrapped in code and peppered with many failures.

3 comments

There’s an interactive neural network you can train here, which can give some intuition on wider vs deeper networks:

https://mlu-explain.github.io/neural-networks/

See also here:

http://playground.tensorflow.org/

There's no great answer to this question. It is a bunch of tricks. Fundamentally:

If you're saying FizzBuzz doesn't work, presumably you mean that encoding n directly as a single input doesn't work. Neither does scaling n to the range 0 to 1 or -1 to 1 (and don't forget: obviously don't use ReLU with inputs in -1 to 1).

Neural networks can do a LOT of things, but they cannot deal with numbers. And they certainly cannot deal with natural or real numbers. BUT they can deal with certain encodings.

Instead of using the number directly, give one input to the neural network per bit of the number. That will work. Just pass in the last 10 bits of the number.
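A minimal sketch of that bit encoding, in plain Python. The function names, the 4-class label scheme and the train/test split are illustrative assumptions, not something from the comment above; the model itself is left out.

```python
def fizzbuzz_label(n):
    """Class index: 0 = the number itself, 1 = Fizz, 2 = Buzz, 3 = FizzBuzz."""
    if n % 15 == 0:
        return 3
    if n % 5 == 0:
        return 2
    if n % 3 == 0:
        return 1
    return 0


def to_bits(n, num_bits=10):
    """Lowest `num_bits` bits of n, least significant first, as 0.0/1.0 inputs."""
    return [float((n >> i) & 1) for i in range(num_bits)]


# Assumed split: hold out 1..100 for evaluation, train on the rest of the
# numbers that fit in 10 bits.
train = [(to_bits(n), fizzbuzz_label(n)) for n in range(101, 1024)]
test = [(to_bits(n), fizzbuzz_label(n)) for n in range(1, 101)]

print(to_bits(6))         # one input per bit instead of one scalar input
print(fizzbuzz_label(6))  # 1, i.e. Fizz
```

Each example is now a vector of ten 0/1 inputs with a class index as the target, which is the shape a small classifier expects.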

Or cheat and use transformers. Pass in the last 5 generated lines and have it construct the next FizzBuzz line. That will work. Because it's possible.
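For the sequence approach, the training data could be built roughly like this. It is only a sketch of the data preparation, assuming each output line is treated as one token and a context of 5 lines; `fizzbuzz_line` and `make_windows` are hypothetical helper names, and tokenization and the model are not shown.

```python
def fizzbuzz_line(n):
    """The text FizzBuzz prints for n."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 5 == 0:
        return "Buzz"
    if n % 3 == 0:
        return "Fizz"
    return str(n)


def make_windows(lines, context=5):
    """Pair each run of `context` consecutive lines with the line that follows."""
    return [
        (lines[i:i + context], lines[i + context])
        for i in range(len(lines) - context)
    ]


lines = [fizzbuzz_line(n) for n in range(1, 101)]
pairs = make_windows(lines)

print(pairs[0])  # (['1', '2', 'Fizz', '4', 'Buzz'], 'Fizz')
```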

To make the number-based neural network for FizzBuzz "perfect", think about what it has to do. The neural network needs to be able to divide by 3 and 5. It can't, and you can't fix that. You must make it possible for the neural network to learn the algorithm for dividing by 3 and 5. But 2, 3 and 5 are pairwise coprime (and actual primes), so a single binary digit tells you nothing about divisibility by 3 or 5. So "cheat" and pass in the number in base 15 (by one-hot encoding the number mod 15, for example).
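A small sketch of the mod 15 one-hot input, with a check of why it makes the problem easy: the correct answer depends only on n mod 15, so the network is left with a 15-entry lookup table to learn. The helper names and the label scheme are assumptions for illustration. Note that this moves the modulo into the feature engineering, which is why it counts as cheating.

```python
def fizzbuzz_label(n):
    """Class index: 0 = the number itself, 1 = Fizz, 2 = Buzz, 3 = FizzBuzz."""
    if n % 15 == 0:
        return 3
    if n % 5 == 0:
        return 2
    if n % 3 == 0:
        return 1
    return 0


def one_hot_mod15(n):
    """15 inputs, all zero except a 1.0 at position n mod 15."""
    vec = [0.0] * 15
    vec[n % 15] = 1.0
    return vec


# Collect every label seen for each residue class.
labels_by_residue = {}
for n in range(1, 1001):
    labels_by_residue.setdefault(n % 15, set()).add(fizzbuzz_label(n))

# Each residue maps to exactly one label, so the encoding already
# separates the four classes.
unambiguous = all(len(labels) == 1 for labels in labels_by_residue.values())
print(unambiguous)  # True
```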

PM me if you'd like to debug whatever network you have together over Zoom or Google Meet or whatever.

https://en.wikipedia.org/wiki/One-hot

This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it; I didn't pass the first time and had to work full time to be able to study), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine learning related.

Thanks for answering, what you wrote here is exactly the sort of thing I'm talking about. Something implicit that's known but not obvious if you look at the first few lectures of the first few courses (or blogs or announcements, etc).

You mention a bag of tricks and that's indeed one issue, but it's worse than that because it includes knowing which "silent problems" need a trick applied to them in the first place!

Indeed, despite using vectors everywhere, NNs are bad with numerical input encoded as itself! It's almost like the only kind of variables you can have are fixed-size enums, which you then encode into vectors that are as far apart as possible, and unit vectors ("one-hot vectors") do this. But that's not quite it, and sometimes you can still have some meaningful metric on the input that's preserved in the encoding (example: word embeddings). And so it's again unclear what you can give it and what you can't.
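To make the "as far apart as possible" point concrete, here is a small check (my own illustration, with hypothetical helper names): every pair of distinct one-hot vectors is the same distance apart, whereas raw integer codes impose an ordering where category 0 is closer to 1 than to 3.

```python
import math
from itertools import combinations


def one_hot(index, size):
    """Unit vector with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


vectors = [one_hot(i, 4) for i in range(4)]

# Distances between all pairs of distinct one-hot vectors.
one_hot_distances = {round(euclidean(a, b), 6) for a, b in combinations(vectors, 2)}

# Distances between the same categories encoded as the integers 0..3.
integer_distances = {abs(a - b) for a, b in combinations(range(4), 2)}

print(one_hot_distances)  # {1.414214}: every pair is sqrt(2) apart
print(integer_distances)  # {1, 2, 3}: an ordering the categories may not have
```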

In this toy example, I have an idea of what the shape of the solution is. But generally I do not and would not know to use a base 15 encoding or to send it the last 5 (or 15) outputs as inputs. I know you already sort of addressed this point in your last few paragraphs.

I'm still trying out toy problems at the moment so it might be a "waste" of your time to troubleshoot these, but I'm happy to take you up on the offer. HN doesn't have PMs though.

Do you remember when you first learned about the things you are using in your reply here? Was it in a course or just from asking someone else who had worked on NNs for longer? I learned by googling and finding comment threads like these! But they are not easy to collect or find together.

(I've added an email to my profile. I hope you can see it. Feel free to flick me an email or google chat me)

> This may be catastrophically wrong. I only have a master's in machine learning (a European master's degree, meaning I've written several theses on it; I didn't pass the first time and had to work full time to be able to study), and I was writing captcha crackers using ConvNets in 2002. But I've never been able to convince anyone to hire me to do anything machine learning related.

Oh wow, those are great credentials. I'm surprised that you haven't run across a position yet. Maybe it is a matter of your location? It seems like a lot of these jobs want onsite workers, which can be a real problem.

TBH, I get the feeling that a lot of us without such credentials are in a similar position right now. Slowly trying to work our way towards what seems to be a big new green field, but having a really unclear path to getting there...

Yes. I created a course which uses implementing Stable Diffusion from scratch as the project, and goes through lots of architecture choices, hyperparam selection, and debugging. (But note that this isn't something that's fast or easy to learn: it'll take around a month of full-time intensive study.) https://course.fast.ai/Lessons/part2.html

Thanks for making that course. It was on my list of courses to look at since GPT-4 recommended it (with all the caveats that entails :) ). Thanks for also making notebooks available alongside the videos.

However, can you point me to the lectures where training happens (along with the architecture choices, hyperparam selection, and debugging)? I'm less familiar with SD, but at a quick glance it seems like we're using a pretrained model and implementing bits that will eventually be useful for training, but not training a new model, at least in the beginning of the deep dive notebook and the first few lessons (starting at part 2, lesson 9).