by 995533 2727 days ago
Nobody cares how long it takes to train a model. What matters is prediction speeds, which are comparable (and NLP is less likely to require high-frequency prediction, where a few more milliseconds matter).

Besides that, the accuracy gains are not marginal anymore (BoW can't compete like it used to, especially with pre-trained models).


> Nobody cares how long it takes to train a model.

This isn't true. It depends on your priorities and goals. Machine learning that spends most of its time unable to learn is not real AI. Some of us are interested in sample and energy efficient learning capable of on-line incremental updates immune to catastrophic forgetting. Not just because this is truer to actual learning but because it moves away from being dependent on a handful of companies to do the actual training.

Anticipating some replies: no, transfer learning or meta-learning methods don't really avoid this. In the case of transfer learning, you still have that high coupling to a handful of sources. The downsides of this are their own discussion. In addition, there are times when the ability to extract local relations can be dulled by the dominant Wikipedia and Common Crawl representations. Meta-learning gets you fast updates, but you still cannot stray too far from the domains that were met at training time.

> What matters is prediction speeds

I'm not a fan of bag-of-words models either, but a simple dot product is always going to be faster than many matrix multiplies and/or convolutions. The implementor should always try these as a baseline and decide whether the speed/accuracy trade-off is worth it for them.
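To make the "simple dot product" concrete, here is a minimal sketch of bag-of-words prediction with a linear model. The weights and the names `bow_features`, `linear_score` and `predict` are made up for illustration; they are not from any particular library, and a real baseline would learn the weights from data.

```python
from collections import Counter

def bow_features(text):
    """Lowercase, split on whitespace, and count tokens (a bag of words)."""
    return Counter(text.lower().split())

def linear_score(features, weights, bias=0.0):
    """Sparse dot product: only tokens present in the text contribute."""
    return bias + sum(count * weights.get(token, 0.0)
                      for token, count in features.items())

# Hypothetical weights, as if learned by a linear sentiment classifier.
WEIGHTS = {"great": 1.5, "good": 1.0, "bad": -1.0, "terrible": -2.0}

def predict(text):
    """Classify by the sign of the linear score."""
    score = linear_score(bow_features(text), WEIGHTS)
    return "positive" if score > 0 else "negative"
```

Prediction cost here grows with the number of distinct tokens in the input, which is why it makes a cheap baseline to measure a heavier model against.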

Nobody in business cares if you are doing proper AI or dumb curve fitting. What matters is the complexity (engineering debt) and performance (accuracy, robustness).

Online learning and sample and energy efficiency are unrelated to training times. As I said: nobody cares if you ran Vowpal Wabbit for 1 hour or 100 hours, as long as you are not constantly babysitting it and calling that paid work (or have the unusual requirement of daily retraining while using an online model).

> simple dot product is always going to be faster than many matrix multiplies

If you care about this (because it is profitable), you rewrite in a lower-level language or predict on a cloud GPU (which will be at least comparable to a simple dot product, while adding performance).

You've clarified your stance from "nobody" to "nobody in business". That's good, although I think that is an opinion based on your experiences. I suspect that business will care if researchers can make it easy to learn on premise on their small datasets while maintaining high accuracy. The ability to easily update and adapt under non-stationarity without having to retrain from scratch benefits all. The same is true of models that maintain uncertainty or that can explain decision outputs. Tracking uncertainty, robustness to changes, on-line updatability and explainability are all related in that they are examples of things that become easier under causal modeling.

A parallel discussion we are having is whether the gain in accuracy is always worth the gain in complexity and loss in speed. It's something to decide on a case by case basis. It's basic hygiene to reach for the simplest model first.

> Nobody in business cares if you are doing proper AI or dumb curve fitting.

What is proper AI? It's all dumb curve fitting right now.

> Nobody cares how long it takes to train a model.

LOTS of people care how long it takes to train a model. A few minutes, vs. a day, vs. a week, vs. a month? Yeah, that matters.

Think about how long it takes to try out different hyperparameters or make other adjustments while conducting research...

If you're Google, maybe you don't care as much because you can fire off a hundred different jobs at once, but if you're a resource-limited mere mortal, yeah, that wait time adds up.

Yes, I agree. Most people who come to us at alpes AI do care about training time, that is, how fast they can run experiments.

Another important aspect is training and incremental training on edge devices.

At a time when privacy is becoming very important and you cannot export data from mobile devices etc., training time on mobile is an important factor.

If you are building large-scale systems that take weeks or months to train, you are at a point where you shouldn't care about this. Throw more compute at the problem, it will pay for itself.

If we are talking days or hours: start parameter search on Friday and return best parameters on Monday.

Do research and iteration on heavily subsampled datasets.
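A minimal sketch of what iterating on a subsample could look like, assuming the dataset fits in a list. The `subsample` helper is hypothetical, and the fixed seed is there so that successive experiments see the same subset.

```python
import random

def subsample(rows, fraction, seed=0):
    """Return a reproducible random subset with about `fraction` of the rows."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rows = list(rows)
    if not rows:
        return []
    k = max(1, int(len(rows) * fraction))  # keep at least one row
    rng = random.Random(seed)              # local RNG, global state untouched
    return rng.sample(rows, k)
```

For example, `subsample(data, 0.1)` keeps about 10% of the rows. Whether conclusions drawn on the subset carry over to the full data is something to check before the long run.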

If you are building models for yourself, or for Kaggle, you may care inasmuch as your laptop gets uncomfortably hot.

Time to train a model matters for applications where you want to have end users training models on their own computers without spending so much CPU/GPU time that they have to plan their day around it.

Consider for instance an RSS reader that classifies articles to determine whether or not to interrupt the user with a notification. This should be fast to train and update the model on the fly every time the user enters a correction (e.g. 'this article actually isn't interesting', or 'interrupt me with articles like this in the future'.)

I would not retrain such a model on all data, just do online updates. Also I still think for that use case training times and latency are negligible (nobody cares or nobody notices any difference between training a BoW and bi-LSTM.)
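A rough sketch of such an online update for the RSS example, assuming a logistic regression over bag-of-words tokens with one SGD step per user correction. The class name, learning rate and tokens are all made up for illustration.

```python
import math

class OnlineLogistic:
    """Logistic regression over tokens, updated one example at a time."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, tokens):
        """Probability that the article is interesting."""
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokens)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tokens, label):
        """One SGD step on the log loss; label is 1 (interesting) or 0."""
        error = label - self.predict_proba(tokens)
        self.bias += self.lr * error
        for t in tokens:
            self.weights[t] = self.weights.get(t, 0.0) + self.lr * error
```

Each correction touches only the tokens of one article, so the update cost does not depend on how much history has accumulated.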

If you are deploying on resource-constrained devices (e.g. low-end PCs without a GPU), it is not unusual to take a lot of time training a model on a very powerful computer (which nobody cares about), then distilling or transferring the result for test time.

I recently had a very real world project be forced to abandon some promising methods because they were taking too long to train.

It was not possible to increase speed by getting more powerful compute?

No. Resources are not infinite, and we were already on the edge of what the resources at most sites where training would be done could be expected to have.

Thanks. I think you are a correct exception to what I said. I should have known that using words like "nobody" would not go over well on HN (but tedious to type "a very large percentage"), despite that statement being a verbatim quote from one of the world's leading ML engineers and, to me, not controversial.

I do consider the cloud both widely available and near infinite in resource adding capability.

If it is really not economically feasible to add resources, then the performance gains were not as promising as thought (whether cloud or on-site).

> Thanks. I think you are a correct exception to what I said. I should have known that using words like "nobody" would not go over well on HN (but tedious to type "a very large percentage")

In the future, you could use “most”.

So the problem in my circumstance is two-fold:

1) The ML experts in the field have all, pretty much, settled on the need for a uniform method to train models, but with each model needing to be trained on-site.

2) While the cloud might be near infinite in terms of adding capacity, "Hey guys, let's stage up some health-data compatible AWS instances to do something that was a side project we're not even sure will work" in what is always a cash-starved part of healthcare is...well...a pretty big ask.

> Nobody cares how long it takes to train a model.

That's a reckless generalization. I care.

My thesis would take forever if I didn't do any optimization. Also my data is 20 rows with ~6000 predictors.

There are models out there that can take months! I worked on one that took months. We had to tweak it and optimize it to see if we could get it to an acceptable training time.

> "Nobody cares how long it takes to train a model."

In some Kaggle competitions it takes over 7 hours to train a model, and I can generally think of 10 things a day to try. Prediction only takes about a minute.
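The arithmetic behind that complaint, as a small sketch. It assumes runs happen back to back around the clock and uses 7 hours as a lower bound; the function names are made up.

```python
import math

def runs_per_day(train_hours, workers=1):
    """How many training runs finish per day if machines never idle."""
    return workers * 24.0 / train_hours

def workers_needed(ideas_per_day, train_hours):
    """Machines needed so that every idea gets its own run each day."""
    return math.ceil(ideas_per_day * train_hours / 24.0)
```

With 7-hour runs, `runs_per_day(7)` is a bit over 3, so a single machine cannot keep up with 10 ideas a day, and `workers_needed(10, 7)` gives the number of machines it would take.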

> "especially with pre-trained models"

If the corpora are different, pre-trained models do not help much, and may even hurt.