by Cybiote 2727 days ago
> Nobody cares how long it takes to train a model.

This isn't true. It depends on your priorities and goals. Machine learning that spends most of its time unable to learn is not real AI. Some of us are interested in sample and energy efficient learning capable of on-line incremental updates immune to catastrophic forgetting. Not just because this is truer to actual learning but because it moves away from being dependent on a handful of companies to do the actual training.

Anticipating some replies: no, transfer learning or meta-learning methods don't really avoid this. In the case of transfer learning, you still have that high coupling to a handful of sources. The downsides of this are their own discussion. In addition, there are times when the ability to extract local relations can be dulled by the dominant Wikipedia and Common Crawl representations. Meta-learning gets you fast updates, but you still cannot stray too far from the domains seen at training time.

> What matters is prediction speeds

I'm not a fan of bag-of-words models either, but a simple dot product is always going to be faster than many matrix multiplies and/or convolutions. The implementor should always try these as a baseline and decide whether the speed/accuracy trade-off is worth it for them.
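To make the "simple dot product" concrete, here is a minimal sketch of a bag-of-words linear scorer, assuming plain lowercase whitespace tokenization. The function names and the toy weights are hypothetical, for illustration only; in practice the weights would come from training something like logistic regression. Scoring is a sparse dot product, so its cost grows with the number of tokens in the input rather than with model depth.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase whitespace tokenization; a real pipeline would do more.
    return Counter(text.lower().split())


def linear_score(features: Counter, weights: dict[str, float], bias: float = 0.0) -> float:
    # Sparse dot product: only tokens present in the input contribute,
    # so cost is proportional to input length, not vocabulary size.
    return bias + sum(count * weights.get(token, 0.0) for token, count in features.items())


# Hypothetical weights for a toy sentiment baseline.
weights = {"good": 1.5, "great": 2.0, "bad": -1.5, "awful": -2.5}
score = linear_score(bag_of_words("Good food but awful awful service"), weights)
# "good" contributes 1 * 1.5, "awful" contributes 2 * -2.5, so score is -3.5
```

A baseline like this takes minutes to set up, which is what makes it useful for judging whether a heavier model's accuracy gain justifies its extra latency.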


Nobody in business cares if you are doing proper AI or dumb curve fitting. What matters is the complexity (engineering debt) and performance (accuracy, robustness).

Online learning, sample efficiency, and energy efficiency are unrelated to training times. Like I said: nobody cares whether you ran Vowpal Wabbit for 1 hour or 100 hours, as long as you are not constantly babysitting it and calling that paid work (or have the unusual requirement of daily retraining while using an online model).
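For readers unfamiliar with what "online" means here, below is a minimal sketch of the general technique: logistic regression updated one example at a time with hashed features and plain SGD, so new data is folded in without retraining from scratch. This is an assumption-laden illustration, not Vowpal Wabbit's actual update rule; the class name, `n_bits`, and learning rate are hypothetical choices. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is salted per process.

```python
import math
import zlib


class OnlineLogReg:
    """Hypothetical sketch: logistic regression trained one example at a time
    with hashed features and plain SGD. Not Vowpal Wabbit's actual update rule."""

    def __init__(self, n_bits: int = 18, lr: float = 0.1):
        self.size = 1 << n_bits          # number of hash buckets (weights)
        self.w = [0.0] * self.size
        self.lr = lr

    def _index(self, token: str) -> int:
        # Deterministic hashing trick: map each token to a weight slot.
        return zlib.crc32(token.encode("utf-8")) % self.size

    def predict(self, tokens: list[str]) -> float:
        z = sum(self.w[self._index(t)] for t in tokens)
        z = max(-35.0, min(35.0, z))     # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tokens: list[str], label: int) -> None:
        # Gradient of log loss w.r.t. each active weight is (p - y).
        grad = self.predict(tokens) - label
        for t in tokens:
            self.w[self._index(t)] -= self.lr * grad


# Usage: consume a stream one example at a time; no full retrain needed.
model = OnlineLogReg()
stream = [(["cheap", "pills"], 1), (["meeting", "notes"], 0)] * 50
for tokens, label in stream:
    model.update(tokens, label)
```

Each update touches only the weights of the tokens in that example, which is why this style of model can keep learning cheaply as data arrives.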

> simple dot product is always going to be faster than many matrix multiplies

If you care about this (because it is profitable), you rewrite in a lower-level language or predict on a cloud GPU (which will be at least comparable in speed to a simple dot product, while improving accuracy).

You've narrowed your claim from "nobody" to "nobody in business". That's good, although I think that is an opinion based on your own experience. I suspect that business will care if researchers can make it easy to learn on premises from their small datasets while maintaining high accuracy. The ability to easily update and adapt under non-stationarity without having to retrain from scratch benefits everyone. The same is true of models that maintain uncertainty or that can explain their outputs. Tracking uncertainty, robustness to change, online updatability and explainability are related in that they are all examples of things that become easier under causal modeling.

A parallel discussion we are having is whether the gain in accuracy is always worth the increase in complexity and loss in speed. It's something to decide case by case. It's basic hygiene to reach for the simplest model first.

> Nobody in business cares if you are doing proper AI or dumb curve fitting.

What is proper AI? It's all dumb curve fitting right now.