by mattkrause 1624 days ago
I'm sure this happens, but do you think the problem is actually one of mathematical savvy?

My guess would be that more machine learning projects go off the rails for want of understanding the data or the {business, research} problem.

2 comments

In my experience, the bulk of the problem is insufficient monitoring. ML systems need heavy monitoring and should be sending lots of metrics to something like Prometheus/Grafana. There should also be validation/consistency checks for every data pipeline and feature transformation. And you should strongly avoid duplicating logic for things like feature preprocessing. I've seen people implement the "same" feature preprocessing pipeline twice (once in Python, once in Java), and it is very common to keep finding edge-case bugs for a long time, especially when those bugs only slightly impact model behavior.
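To make that concrete, here is a rough sketch of what I mean by a single preprocessing function with counters and a consistency check. Everything in it is made up for illustration: the `age` feature, the [0, 120] range, the default of 0.0 for missing values, and the `METRICS` counter, which is just a stand-in for whatever metrics client you actually export to Prometheus with.

```python
import math
from collections import Counter

# Stand-in for a real metrics client; in production these counts would be
# exported to a monitoring system rather than kept in a local Counter.
METRICS = Counter()


def preprocess_age(raw):
    """Hypothetical feature transform: clip age to [0, 120], scale to [0, 1].

    This is the single source of truth for the transform, so training and
    serving should both call it instead of re-implementing it.
    """
    METRICS["age_total"] += 1
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        METRICS["age_missing"] += 1
        return 0.0  # assumed default for missing values
    value = float(raw)
    if value < 0.0 or value > 120.0:
        METRICS["age_out_of_range"] += 1
        value = min(max(value, 0.0), 120.0)
    return value / 120.0


def check_feature(value):
    """Consistency check on the transformed feature: fail loudly, not silently."""
    if not (0.0 <= value <= 1.0):
        raise ValueError(f"age feature out of expected range: {value}")
    return value


raw_ages = [30, None, -5, 200, 60.0]
features = [check_feature(preprocess_age(x)) for x in raw_ages]
```

The point is less the specific checks and more that the missing and out-of-range rates become numbers you can graph and alert on, so a slow drift in the input data shows up before it quietly degrades the model.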

Another issue is the proliferation of data pipelines. The more distinct pipelines you have, the more painful they become to monitor. It is much better to minimize the number of pipelines and build views on top of a small number of them. I think the proliferation of models is a similar issue. It is often easier to build 4 models instead of 1 multi-task model, but monitoring and operational tasks grow more and more painful as you manage more models.
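A toy sketch of the "few pipelines, many views" idea, assuming a made-up row schema (`user_id`, `country`, `spend`) and hypothetical view names. The cleaning logic lives in one place, and each consumer gets a cheap derived view instead of its own pipeline.

```python
from collections import defaultdict


def base_pipeline(rows):
    """The one shared pipeline: drop rows without a user_id, normalize fields."""
    cleaned = []
    for row in rows:
        if row.get("user_id") is None:
            continue  # invalid row; a real pipeline would also count these
        cleaned.append({
            "user_id": row["user_id"],
            "country": (row.get("country") or "").upper(),
            "spend": float(row.get("spend") or 0.0),
        })
    return cleaned


def view_spend_by_country(cleaned):
    """View 1: total spend per country, derived from the shared output."""
    totals = defaultdict(float)
    for row in cleaned:
        totals[row["country"]] += row["spend"]
    return dict(totals)


def view_high_spenders(cleaned, threshold=100.0):
    """View 2: user ids whose spend meets an assumed threshold."""
    return [row["user_id"] for row in cleaned if row["spend"] >= threshold]


rows = [
    {"user_id": 1, "country": "us", "spend": 150},
    {"user_id": 2, "country": "de", "spend": 40},
    {"user_id": None, "country": "us", "spend": 999},
    {"user_id": 3, "country": "US", "spend": 60},
]
cleaned = base_pipeline(rows)
by_country = view_spend_by_country(cleaned)
high_spenders = view_high_spenders(cleaned)
```

With this layout you only have one place to attach validation and monitoring, and a bug fix in `base_pipeline` reaches every view at once.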

It's not necessarily mathematical savvy, though a lot of deeper understanding can follow from a strong grasp of the fundamentals. I think it has more to do with the alignment between intuitions and outcomes, and as far as I can tell this is not taught well in most academic programs.