Hacker News
by tyrael71 3613 days ago
'It's also worth checking out existing neural net code-bases to see what tricks they have. The fine details usually aren't in papers, and they're not all in the text-books either.'

Given that you are highly qualified to answer, I am genuinely curious: why do you think that is? Reimplementing algorithms from scratch is an efficient way to learn, to understand the underlying concepts, and to attempt improvements in a research context.


A lot of machine-learning papers are limited to eight pages. Speech conference papers (heavy users of neural nets) are often only four. Details that aren't part of the main message often don't make it in. Code is frequently available, though, and initialization and other tweaks can be found there (even if you aren't going to use that code yourself).

That said, there are also whole papers, even collected volumes, on initialization and other practical details.

Textbooks aren't always up to date with the latest practical knowledge, since deep-learning practice is moving quickly. Or the authors simply don't want to clutter their high-level maths descriptions with code-level implementation details. Teaching is all about tradeoffs. I'm sure several books do mention the appropriate scale of the initial weights for simple feed-forward networks, though, as that's not an implementation-level detail, and it has probably been well known since the 1980s.
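To make the "scale of weights" point concrete, here is a minimal sketch, assuming a single linear layer (no bias, no nonlinearity) with unit-variance Gaussian inputs and zero-mean Gaussian weights. The helper name `layer_output_variance` is hypothetical. Drawing weights with standard deviation 1/sqrt(fan_in) is one common heuristic, not necessarily what any particular codebase does; with it, the pre-activation variance stays near 1, while unscaled weights inflate it by roughly a factor of fan_in.

```python
import math
import random


def layer_output_variance(fan_in, fan_out, weight_std, seed=0):
    """Hypothetical helper: variance of one linear layer's pre-activations.

    Inputs are drawn i.i.d. from N(0, 1); weights from N(0, weight_std**2).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
    outputs = []
    for _ in range(fan_out):
        # One output unit: a = sum_i w_i * x_i, with fresh random weights.
        a = sum(rng.gauss(0.0, weight_std) * xi for xi in x)
        outputs.append(a)
    mean = sum(outputs) / fan_out
    return sum((a - mean) ** 2 for a in outputs) / fan_out


fan_in, fan_out = 256, 1000
scaled = layer_output_variance(fan_in, fan_out, 1.0 / math.sqrt(fan_in))
unscaled = layer_output_variance(fan_in, fan_out, 1.0)
print(f"std = 1/sqrt(fan_in): variance ~ {scaled:.2f}")  # close to 1
print(f"std = 1:              variance ~ {unscaled:.1f}")  # roughly fan_in
```

The reasoning behind the heuristic: for independent zero-mean terms, Var(a) = fan_in * weight_std^2 * Var(x), so choosing weight_std = 1/sqrt(fan_in) keeps the output variance equal to the input variance, and stacking layers doesn't blow activations up or shrink them towards zero.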

I'll weigh in: papers aren't necessarily worded to convey new information in an ideal manner (especially to newbies). They are worded so that expert researchers can reproduce the work, especially the parts that constitute the paper's contribution to the field.

As for textbooks, I imagine that the field is moving too fast; half the stuff I use has only existed for the past year or two.