
by eric4smith 1590 days ago

Impressive BUT.

Who is defining toxic speech? Where is that data being taken from?

This is the definition of using AI to set what the edges of “speech” should be based on potentially flawed data.

This is a clown world.

2 comments

r3trohack3r 1590 days ago

> In this example, we’re using the Copilot extension for Visual Studio Code, and a free toxicity dataset that we built;

(Emphasis mine)

Following that link:

> Surge AI is a data labeling platform and workforce. Our labeling team pored over tens of thousands of social media comments to build this toxicity dataset. Each comment was then evaluated by multiple members of our team to determine its severity level.

bobsmooth 1590 days ago

I feel so sorry for the labeling team. Hope they were paid well.

Xorlev 1590 days ago

I think you missed the forest for the trees. It isn't the model that matters; it's that Copilot is building the classifier from intent (comments). It wouldn't matter if it were classifying flowers instead.

eric4smith 1590 days ago

No. I did not miss it. The work is pretty good.

My problem is with this dataset, and datasets like it in general, which set the tone through AI for what is acceptable and what is not.
