Hacker News
by jonnycat 803 days ago
Can we start a list of technological magic that is actually "1000 people in India watching and labeling videos" (or functional equivalent)?
8 comments

Not India, but my favorite example was the Kiwi food delivery robot fleet in Berkeley, CA. They were controlled manually from Colombia, and from the looks of it, one person was trying to drive 20 robots at once.
A lot of these robot delivery things seem to need a human handler 100 yards back, lest people try to kick them open. What's the point, then? Just have the person walk it over.
These ones didn't have a human handler. I think a lot of them just got broken into. I've also seen one get run over by a truck because it wandered into the road.

Also, even if nothing went wrong, they were so slow that I can't imagine people actually used them. Wonder if they were just pretending to deliver stuff.

Expensify was a pretty well-known case of this several years ago: their marketing was all about their advanced scanning technology, and it turned out they were using Mechanical Turk in many cases, with little regard for PII (or corporate security) concerns.

(I have no idea if this is still the case, for the record.)

That would explain why their receipt scanning is so damn slow even for easily scannable PDF receipts.
In robotics this was called a "Wizard of Oz" approach: when you pull back the curtain, it's much less impressive than it seems on the surface.
While I agree with the sentiment, as an Indian, I hope this doesn't happen in India. Countries that do this Mechanical Turk-like work typically don't raise themselves out of poverty (esp. Philippines, Indonesia, etc.). If anyone wants a specific example, I led an aspect of web crawling for a FAANG and then other public companies. Over the last 10 years we heavily used the aforementioned offshore teams to do sanity checks, labeling, etc. Now we have initiatives with GPT APIs which perform just as well for pennies on the dollar we spent offshore - and the offshore team that's been loyal for years? They're getting cut.
That's just exploitative business.

I know companies that operate in that space and they pay incredibly well, between $20 and $50/hour.

> GPT APIs which perform just as well

That's because they were also trained by exploiting third world groups, paying about $2/hour.

The problem here isn't offering work to developing countries; the problem is major corporations squeezing them for every cent and not allowing the work to be used as a means of getting out of poverty. And that's also why the workers end up performing half-assed work by using automated classifiers and faking their credentials. It's not hard to see where this goes for both.

If you don't think FAANGs (and most companies) participate in "exploitative business" you should find out how your iPhone was made (hint: lots of exploited workers).
Never said it wasn't. Amazon's antics especially are well known. The point here is that data labelling itself isn't fundamentally exploitative, even when leveraging developing countries.
> the offshore team that's been loyal for years

Why, though? I'd say outsourcing that way is a clear indicator that loyalty is not part of the picture.

I wonder if GPT-4's performance has degraded in recent months because there are fewer human data contractors on standby to answer questions GPT flags as low-confidence. GPT might be "refusing to answer questions" because it's not able to escalate tough queries to a human.
Not plausible. Even when it's on slow mode, it's too fast to be contracted out to humans.
To be clear: ChatGPT-4 is in general both far too fast and far too stupid for humans to be answering any more than a tiny fraction (<< 1%) of the queries.

But last year I repeatedly saw ChatGPT-4 respond token-by-token much more slowly than a human would! E.g. several seconds between words. It was clearly not a human responding: at least a few times I was testing on preschool counting questions and GPT-4 was not able to answer them. I interpreted the slowness as GPT's poor quantitative reasoning. But what you're saying is simply not true: sometimes ChatGPT-4 is (or was) extremely slow.

Regardless, if OpenAI was running this con it probably wouldn't have been real-time humans writing. First of all it might be enough to have a human in the "mixture of experts" who decides the best of multiple responses when GPT-4 is unable to come to an automated conclusion. But humans could be writing ChatGPT responses due to a quirk in their UX:

- ChatGPT errors out on a certain question and asks you to try again later, as it does (or used to do) frequently

- the response is prepared by the human contractor while the user waits patiently for ChatGPT to resolve its technical difficulty

- when the user asks again ChatGPT can largely read off the answer, using its (genuine) language-processing abilities to handle variations in phrasing/etc

I suspect some/many of the online exam proctoring services are that way. Even some that market themselves as fully AI-driven might really be AI = Actually Indians.
Sort of surprised it wasn't just using their Mechanical Turk site.
Mechanical Turk suffers from coordinated fraud by people who want to be paid for doing a task without actually doing it [1]. The company I work for had to spend more engineering effort on building an internal reviewing-the-reviewers system to make it useful than we spent on the original Mechanical Turk integration. I'm not surprised that Amazon would avoid Mechanical Turk for higher-consequence applications.

[1] e.g. https://timryan.web.unc.edu/2020/12/22/fraudulent-responses-...
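
For anyone wondering what a "reviewing-the-reviewers" layer can look like, here is a minimal Python sketch. It is not the internal system described above, just one common pattern under stated assumptions: some tasks have known ("gold") answers, each worker is scored against those, and only workers above an accuracy threshold get a vote on the real tasks. The function names, the `(worker_id, task_id, label)` tuple layout, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def worker_accuracy(answers, gold):
    """Score each worker on the gold tasks only.

    answers: iterable of (worker_id, task_id, label) tuples (assumed layout)
    gold:    dict mapping task_id -> known-correct label
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, task, label in answers:
        if task in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[task])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def trusted_labels(answers, gold, min_accuracy=0.8):
    """Majority vote per non-gold task, counting only trusted workers.

    Workers who never answered a gold task are treated as untrusted.
    The 0.8 threshold is an arbitrary illustrative default.
    """
    accuracy = worker_accuracy(answers, gold)
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        if task in gold:
            continue  # gold tasks are only used for scoring workers
        if accuracy.get(worker, 0.0) >= min_accuracy:
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}
```

A real system would also need to handle ties, rotate gold tasks so they can't be memorized, and look for colluding accounts, which is presumably where the extra engineering effort goes.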

It will be fun to watch as AI tools flood the mainstream.

We already had a lawyer have a case thrown out because he didn’t do the job properly and got hallucinations from an LLM.

Mostly because people don’t want to work.

Quite a few of the current LLM chatbots from big players are at least partially trained in this way.