Four lessons from a year building tools for machine learning (humanloop.com)
76 points by peadarohaodha 1805 days ago
4 comments

I find #1 "Subject matter experts have as much impact as data scientists" surprising only in that it was considered surprising.
I think this is one of those points that is obvious in retrospect but almost universally underappreciated.

Almost all data science workflows treat the annotators or subject matter experts as secondary. The tooling isn't set up to put them at the centre of the process and make it easy for them to collaborate with the more technical folks.

Perhaps it should be obvious, but it's definitely overlooked in much of academic ML and in MLOps.

> it's definitely overlooked in much of academic ML

right - but it seems like a, if not the, first lesson you learn when you leave the classroom for the "real" world.

It's not surprising to me that any people think this way, but it seems to be a characteristic of inexperience (or narrow experience).

Maybe so, but most data science workflows still don't acknowledge this "obvious" truth.
Compare this with the famous quote:

> Every time I fire a linguist, the performance of our speech recognition system goes up. - Fred Jelinek

> Every time I fire a linguist, the performance of our speech recognition system goes up. - Fred Jelinek

This one is easy to misapply. If you are applying your domain experts to the model, you might have a bad time. If you are applying them to the data, most likely not. And data is usually more important than the model.

> And data is usually more important than the model.

idk. we went from conv nets to transformers just to have the quality of our predictions go up as well as reducing the amount of data prep time by a factor of 20.

no change in data, just a better model.

in my field, improvements are nearly always made in the model. never in the data or data prep. (crowd counting, people tracking, etc)

I very nearly said this myself!

I think the mistake of this quote is in the application of the expertise. The bitter lesson is that data + compute can outperform inductive biases but that doesn't mean you don't need domain expertise to get the right data.

The 80s called and want their subject matter training back
They have a kick-ass ML team, including David Barber [1], but could use a good web designer, it seems.

I also wish it was 'one lesson from four years of building tools for ML'.

On a serious note, there is a book on Human-in-the-Loop ML by Robert Monarch, published just a few weeks ago [2], where concepts like "active learning" are elucidated. Also, Andrew Ng recently started a 'Data-Centric AI' competition, focusing on improving the data while keeping the model fixed [3].

There seems to be a growing emphasis on data quality while models become commoditized and outsourced to 'ML as a service' (MLAAS) platforms. If I understood correctly, the Humanloop project aspires to be an 'all-in-one' MLAAS, serving the models/predictions while also taking care of data annotation, targeting the market currently served by e.g. Scale.AI and Salesforce Einstein.

[1] Bayesian Reasoning and Machine Learning http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...

[2] Human-in-the-Loop Machine Learning https://www.manning.com/books/human-in-the-loop-machine-lear...

[3] https://https-deeplearning-ai.github.io/data-centric-comp/

Hi Andy, thanks for the feedback on the site! We're actually redesigning at the moment so it should hopefully be fresher soon :P. Also, great pointer to Rob Munro's book. He actually used to be CTO at Figure Eight before they were acquired.

You seem to be pretty clued up on the area, what do you see as the pros and cons of an end-to-end approach?

I'm actually using Scale.AI and a few other annotation products. If you can provide a clear example of how your product stands out from, or compares to, existing annotation services, that would be great, specifically focusing on the quality of annotations.

Normally we do this kind of benchmark internally by sending the same dataset to each service and running some stats on the results, but if a vendor comes in with a ready-to-use comparison report, that would be an easier sale.
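For anyone wondering what "running some stats on the results" might look like, here is a minimal sketch, assuming a small gold-labelled subset exists and each vendor returns one label per item. The vendor names, labels, and helper functions are all made up for illustration; it reports plain accuracy and Cohen's kappa (chance-corrected agreement) against the gold labels.

```python
from collections import Counter


def accuracy(labels, gold):
    """Fraction of items where the vendor label matches the gold label."""
    return sum(a == b for a, b in zip(labels, gold)) / len(gold)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two equal-length label lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if both sides labelled at random with their own label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both sides used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical data: the same six items sent to two annotation services.
gold = ["pos", "neg", "pos", "neg", "pos", "neg"]
vendors = {
    "vendor_a": ["pos", "neg", "pos", "neg", "pos", "pos"],
    "vendor_b": ["pos", "pos", "neg", "neg", "pos", "neg"],
}

report = {
    name: {
        "accuracy": accuracy(labels, gold),
        "kappa_vs_gold": cohen_kappa(labels, gold),
    }
    for name, labels in vendors.items()
}

for name, stats in report.items():
    print(name, stats)
```

On real data you would probably also want per-class breakdowns and vendor-vs-vendor agreement on the items that have no gold label.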

As for end-to-end, you would be competing with large internal ML teams and revenue-bringing internal ML engines; I'm probably not the right audience for that type of product. Salesforce seems to be doing alright on that front, but from my discussions with them there is a lot of hand-holding and customization for each client use case; it's a high-touch thing.

We see ourselves as quite different to Scale really as we don't provide annotation services, mainly the software.

One of the main differences is that we've pretty exclusively focussed on language rather than vision which has quite a different tech stack.

We also view human-in-the-loop not just as a way to get better data but actually as a better deployment paradigm.

P.S. You're right that David is awesome, btw!

>In just a few hours the lawyers had trained a model that provided the outcome of all 80,000 judgements without needing the input of a data scientist at all.

If this is meant to imply it predicted them all correctly, that rings alarm bells to me. 100% accuracy is much more likely to mean something is wrong (label leakage?) than it is to mean you have an amazing model.

No, not 100% accuracy. I've left out details for the sake of brevity but with a precision and recall high enough for the team to be able to answer the questions they cared about.
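For readers less familiar with the two metrics, here is a minimal sketch of how precision and recall are computed for one class. The labels and data are hypothetical and are not the figures from the legal project.

```python
def precision_recall(predicted, actual, positive):
    """Precision and recall for one class, given parallel lists of labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)  # true positives
    fp = sum(p == positive and a != positive for p, a in pairs)  # false positives
    fn = sum(p != positive and a == positive for p, a in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical judgement outcomes, purely for illustration.
actual = ["granted", "denied", "granted", "granted", "denied"]
predicted = ["granted", "granted", "granted", "denied", "denied"]

precision, recall = precision_recall(predicted, actual, positive="granted")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of predicted "granted" outcomes that were right; recall is the share of real "granted" outcomes the model found.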
Not how I intended to kick off the discussion, but is anyone else seeing really messed up formatting? Like this: https://ibb.co/5LF2fY0 (bit of a mare today getting Ghost on a subdirectory...)