Hacker News
by andy99 818 days ago
This is only a made-up issue for a few who are looking for something to criticize. Almost nobody cares, in the sense that appears to be meant here, about "ownership" of the training data. And this unfortunately hampers research and understanding of models, because companies are reluctant to talk about training lest the trolls start jumping on them. We're all worse off because of this.
2 comments

Is it really so hard to imagine that someone might not like the idea of their work being used, with no compensation to them, to train a machine to imitate that work? And that the machine is then used to benefit the shareholders of large corporations instead?

The position that this is a made-up issue when there are multiple large pending lawsuits about exactly this thing is pretty bizarre.

What a short-sighted view.

Having open access to the training data is how you prevent poisoning/biasing of the dataset. People complaining about bad data in the dataset improve the quality of the dataset. That's in addition to the benefit of creators being labeled in the dataset.

Hiding the data from public view seems only to help nefarious actors.

Pretty sure we're saying the same thing
> And this unfortunately hampers research and understanding of models because companies are reluctant to talk about training lest the trolls start jumping on

Respecting artists and being open about training data should go hand in hand. That companies feel the need to hide the training data from public scrutiny should immediately be suspect.

It seems like you are saying no one cares about copyright; I can tell you that is not the case. I disagree with (most current forms of) copyright, but I do respect artists and their need to feed themselves. Proper attribution, labeling, and scrutiny of the dataset are imperative.

> This is only a made-up issue for a few that are looking for something to criticize. Almost nobody cares, in the sense that appears to be meant here about "ownership" of the training data

So my point is that it's not just 'trolls' who want the data to be open and labeled. If the companies are hurting artists (or their economic output), that should be examined and fixed: the harm stopped and the companies' attention redirected.

'Trolling' (bothering) a company into being 'good' (not acting against human interests) isn't a bad thing.