| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by espadrine 859 days ago

> 4chan basically found an “echo” type debug command or something like that

That is certainly what Microsoft wanted people to think[0]:

> a coordinated attack by a subset of people exploited a vulnerability in Tay.

Realistically, though, Tay’s website was open about using tweets directed at it as part of its training set[1]:

> Data and conversations you provide to Tay are anonymized and may be retained for up to one year to help improve the service.

So all that this group did was tweet racist things at it, and it ended up in its training set. Microsoft hints at it in the earlier blog post:

> AI systems feed off of both positive and negative interactions with people. In that sense, the challenges are just as much social as they are technical.

There are technical solutions for this issue however; for instance, when creating ChatGPT, the OpenAI team designed ChatML[2] to distinguish assistant messages from user messages, so that it would send messages in the style of the assistant only, not in the style of the user. Along with RLHF, it allowed OpenAI to use ChatGPT messages as part of their training set.

[0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-in...

[1]: https://web.archive.org/web/20160323194709/https://tay.ai/

[2]: https://github.com/MicrosoftDocs/azure-docs/blob/main/articl...

1 comments

bee_rider 859 days ago

> That is certainly what Microsoft wanted people to think[0]:

Maybe I’m reading between the lines in your post too hard, but are you saying they wanted people to think this because it is somehow less embarrassing or makes them look better? Including this “repeat after me” functionality seems like an extremely stupid move, like I must assume they found the 3 programmers who’ve never encountered the internet or something.

In 2016, I can see thinking they got the filtering right and that users wouldn’t be able to re-train the bot as a sort of reasonable mistake to make, on the other hand. It doesn’t look so bad, haha.

link

espadrine 859 days ago

Yes, they employed security terminology for something that was instead data pipeline contamination. As the saying goes, garbage in: garbage out. I don't mean to be harsh on them though: experimentation is useful, and it became a great lesson on red teaming models.

link