| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jimfleming 2450 days ago

More anecdata: we consistently outperform lightgbm, xgboost, random forests, linear models, etc. using neural networks even on smaller datasets. This applies whether we implemented the other algorithms ourselves or simply compared to someone else’s results with them. In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.

I call them “tricks” but really they’re just design decisions based on what current research indicates about certain problems. This is largely where the “art” part of neural networks comes from that many people refer. The search space is simply too big to try everything and hope for the best. Therefore, how a problem is approached and how solutions are narrowed and applied really matter. Even simple things like which optimizer you use, how you leverage learning rate schedules, how the loss function is formulated, how weights are updated, feature engineering (often neglected in neural networks), and architectural priors make a big difference on both sample efficiency and overall performance. Most people, if they’re not just fine-tuning an existing model, simply load up a neural network framework, stack some layers together and throw data at it expecting better results than other approaches. But there’s a huge spectrum from that naive approach to architecting a custom model.

This is why neural networks are so powerful and why we tend to favor it (though not for every problem). It’s much easier to design a model from the ground up with neural networks than it is for e.g. xgboost because not only are the components more easily composable thanks to the available frameworks but there’s a ton more research on the specific interactions between those components.

That doesn’t mean than every problem is appropriate for neural networks. I completely agree with you that no matter what the problem is you should never jump to an approach just because its popular. Neural networks are a tool and for many problems you need to be comfortable with every one of those decision points to get the best results and even if you’re comfortable it can take time and that isn’t always appropriate for every problem. My other point is that I wouldn’t draw too many conclusions about a particular algorithm being better or worse than another. I’m not saying that was the intention with your comment but I know many people in the ML industry tend to take a similar position. It really depends on current experience with the applied algorithms, not just experience with ML in general.

3 comments

suresk 2450 days ago

This was a really interesting and insightful comment, thanks for sharing. I think the conclusion I shared in my sibling comment was probably a little too broad.

I particularly like this:

> In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.

This is pretty true - the lack of knobs to turn on something like XGBoost or LightGBM both make it pretty easy to get good results and harder to fine tune results for your specific problem. Maybe this isn't the most correct way to look at it, but I've always sort of pictured it as curve where you are plotting effort vs results, and the one for LightGBM/XGBoost starts out higher but is more flat, and the NN one is steeper.

I guess reading your post makes me wonder where the two curves cross? Do you have good intuition for that, or do you feel so comfortable with neural networks that they are sort of your default? I peeked at the company you have listed in your bio, and it looks like you have pretty deep experience with neural networks and work with other people who have been in research roles in that area too, and I wonder how that changes your curve compared to the average ML practitioner? Certainly figuring out how to pick the best layer combinations, optimizer, loss functions, etc benefits hugely from intuition gained over years of experience.

link

jimfleming 2449 days ago

I think your conclusions are accurate. For many problems LightGBM or xgboost can often yield decent results in short amounts of time and for many problems that’s sufficient. A lot of the work we do is about pushing the results as far as we can take them and the business case justifies the extra time it can take to get there. For those types of problems, today, we would probably choose a neural network because then we have a lot more knobs as you mentioned.

Just like the rest of ML, whether neural networks are the right choice still depends on the problem at hand and the team implementing the solution. It definitely impacts where the performance / time curves intersect. If we just need something decent fast, or we’re working with another team that doesn’t have the same background, we tend to focus on approaches with fewer moving pieces. If we need the best possible performance, have a qualified team to get there, and have the time to iterate on development then the curves would favor neural networks.

link

oli5679 2449 days ago

Do you have any advice on how to increase performance of NN? Id be interested to see some examples of NN doing better than lgbm benchmark on medium size tabular data. What black magic is needed to achieve this? Would be super valuable to my job:)

This tuning approach gets good results for Lightgbm. I'd recommend using TimeSeriesSplit.

https://www.kaggle.com/nanomathias/bayesian-optimization-of-...

I've seen colleagues do something like this, or random search over NN architecture (NUM layers, nodes per layer, learning rate, dropout rate), always falling short of results this archives, despite far longer time to code up an tune model.

link

jimfleming 2449 days ago

It’s really very problem dependent. I allude to a few low-hanging things in my post above: e.g. feature engineering. Just because neural networks have an easier time learning non-linear feature transformations doesn’t mean its good to ignore feature engineering entirely.

Possibly more important is to focus on the process for how you derive and apply model changes. You get some model performance and then what? Rather than throwing something else at the model in a “guess-and-check” fashion, be methodical about what you try next. Have analysis and a hypothesis going in to each change that you make and why it’s worth spending time on and why it will help. Back that hypothesis by research, when possible, to save yourself some time verifying something that someone else has already done the legwork on. Then verify the hypothesis with empirical results and further analysis to understand the impact of the change. This sounds obvious (it’s just the scientific method) but in my experience ML practitioners and data scientists tend to forget or were never taught the “science” part. (I’m not accusing you of this; it just tends to be my experience.)

Random search, AutoML, hyperparameter searches, etc. are incredibly inefficient at model development so they’ll rarely land you in a better place unless a lot of upfront work has been put in. For us, they’re useful for two things: analysis and finalization. For analysis the search should be heavily constrained since you’re trying to understand something specific. For finalization of a model before going into production, a search on only the most sensitive parameters identified during development usually yields additional gains.

link

Stasis5001 2449 days ago

Any good references on the art part, or is it just an intuition you develop over time? In my experience, all the ML education will teach you a ton of theory and basics but none of the practical details you're referring to.

link

jimfleming 2449 days ago

It takes time and a lot of hands-on experience. Many ML teams tend to work on one or just a few tightly coupled project for years. By contrast, we’ve worked on a lot of unique projects with real-world constraints so it gives us a different perspective. An important part has been developing a rigorous process; sort of a framework for applying the “art”. As you mention, this often isn’t covered in ML or data science education, which tends to focus on (important) fundamentals.

link