|
|
|
|
|
by bee_rider
812 days ago
|
|
> In 2016, Microsoft released their chatbot named Tay on Twitter to learn from human interactions by posting comments. But after the release, it started to act crazy. > It started using vulgar language and making hateful comments. This was one of the first incidents of data poisoning. Is this true? I remember when this happened but I thought the story was that 4chan basically found an “echo” type debug command or something like that. The ML mode wasn’t being trained to say bad things, it was just being sent some kind of repeat-after-me command and then the things it was told to repeat were bad. It seems odd that somebody would write a whole blog post without bothering to check that, though, so maybe I’m mis-remembering? |
|
That is certainly what Microsoft wanted people to think[0]:
> a coordinated attack by a subset of people exploited a vulnerability in Tay.
Realistically, though, Tay’s website was open about using tweets directed at it as part of its training set[1]:
> Data and conversations you provide to Tay are anonymized and may be retained for up to one year to help improve the service.
So all that this group did was tweet racist things at it, and it ended up in its training set. Microsoft hints at it in the earlier blog post:
> AI systems feed off of both positive and negative interactions with people. In that sense, the challenges are just as much social as they are technical.
There are technical solutions for this issue however; for instance, when creating ChatGPT, the OpenAI team designed ChatML[2] to distinguish assistant messages from user messages, so that it would send messages in the style of the assistant only, not in the style of the user. Along with RLHF, it allowed OpenAI to use ChatGPT messages as part of their training set.
[0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-in...
[1]: https://web.archive.org/web/20160323194709/https://tay.ai/
[2]: https://github.com/MicrosoftDocs/azure-docs/blob/main/articl...