| HN Mirror



	by llm_trw 477 days ago
	Data is king. Even when a new, better model comes along, a high-quality dataset is still just as valuable. Paying top performers above market rates to do nothing but data labelling is a moat that just keeps getting deeper.

xnx 477 days ago

Good data and good evals are two legs of the 3-legged stool that a lot of AI teams are missing.

antognini 477 days ago

It also can't really be overstated how helpful it is as an ML engineer to simply spend the time going through thousands of examples yourself. If you abstract yourself away from the data and just "make metric go up" you'll be missing out on valuable insights about how and why your model might be failing.
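
A minimal sketch of that habit in Python, assuming each example is a hypothetical `(text, gold_label, predicted_label)` tuple: tally which gold/predicted pairs the model confuses most, then pull a random sample of the misclassified examples to read by hand. The function name `review_sample` and the tuple layout are illustrative, not taken from any particular library or from the comment above.

```python
import random
from collections import Counter


def review_sample(examples, k=20, seed=0):
    """Return (confusion counts, random sample of errors) for manual review.

    `examples` is assumed to be an iterable of (text, gold_label, predicted_label)
    tuples; this layout is a hypothetical choice for the sketch.
    """
    # Keep only the examples the model got wrong.
    errors = [ex for ex in examples if ex[1] != ex[2]]
    # Count how often each (gold, predicted) pair is confused.
    confusions = Counter((gold, pred) for _, gold, pred in errors)
    # Seeded sampling so the same review set can be revisited later.
    rng = random.Random(seed)
    sample = rng.sample(errors, min(k, len(errors)))
    return confusions, sample
```

Reading the sampled errors next to the confusion counts can show whether mistakes cluster around label noise, ambiguous inputs, or a genuine model weakness, which a single aggregate metric hides.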

almostgotcaught 475 days ago

It's almost as if (bear with me ...) these "artificial intelligences" actually need "human intelligences" to guide them. Maybe we can think up a "system" where "experts" can codify rules for the "artificial intelligence" to follow.

Ok, the sarcasm got too thick, but my point is: if the engineer has to spend the time combing through thousands of examples, then you don't have AI; you have a man in a box pretending to be a machine that plays chess.

llm_trw 474 days ago

We have human teachers for much the same reasons.

Are humans just other humans hiding in boxes pretending to play chess?

jsemrau 477 days ago

What would a product look like in this space?

llm_trw 477 days ago

It's not a product. It's a core business competency in the ML space.

ellisv 476 days ago

There are several data labeling products on the market such as Label Studio.

I’ve resorted to building my own annotation apps.

sgarland 476 days ago

For my one foray into ML, in 2020, I also built my own labeling system. It was stupidly simple; IIRC, it was a Jupyter Notebook that presented you with text to label, and you’d do so by hitting 1-5, which were mapped to sentiments / emotions. If you got bored, or just wanted to see how it performed with X% training, you could save progress and quit. It worked well enough, and I think I labeled a couple of thousand entries using it.
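
A rough Python approximation of the kind of loop described above, assuming a plain `input()` prompt rather than notebook widgets and a made-up mapping of keys 1-5 to sentiment labels (the comment doesn't say which labels were used). Labels are written to a JSON file after every answer, so typing `q` quits and a later run resumes where it left off. `label_loop`, `LABELS`, and the file format are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical key-to-label mapping; the original tool's labels are unknown.
LABELS = {"1": "very negative", "2": "negative", "3": "neutral",
          "4": "positive", "5": "very positive"}


def load_progress(path: Path) -> dict:
    """Return saved labels ({index-as-string: label}), or {} on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_progress(path: Path, labels: dict) -> None:
    """Persist labels so a session can be stopped and resumed."""
    path.write_text(json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8")


def label_loop(texts, path: Path, read_key=input) -> dict:
    """Show each unlabeled text, read a key 1-5, and save after every answer.

    Typing 'q' saves and returns early; already-labeled items are skipped.
    """
    labels = load_progress(path)
    for i, text in enumerate(texts):
        key_id = str(i)  # JSON object keys must be strings
        if key_id in labels:
            continue  # labeled in an earlier session
        while True:
            key = read_key(f"\n{text}\n[1-5, q to quit] > ").strip()
            if key == "q":
                save_progress(path, labels)
                return labels
            if key in LABELS:
                labels[key_id] = LABELS[key]
                save_progress(path, labels)
                break
            print("Please enter 1-5 or q.")
    return labels
```

In a notebook cell this would be called as something like `label_loop(my_texts, Path("labels.json"))`. Keying by list index keeps duplicate texts distinct but assumes the list order stays the same between sessions.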

collingreen 475 days ago

I ALSO have resorted to building my own labeling tools, even though there are great generic labeling tools out there. I think this is a missing piece of the landscape, but I don't know enough about the space yet to say what the solution should be.
