by disntthinkthis 1151 days ago
Except that Stack Overflow’s CEO, in this very article, says that it’s a violation of the Creative Commons license to train an LLM on their answers. So what he’s actually proposing is very unclear.

> When AI companies sell their models to customers, they “are unable to attribute each and every one of the community members whose questions and answers were used to train the model, thereby breaching the Creative Commons license,” Chandrasekar says.


> Except that Stack Overflow’s CEO, in this very article, says that it’s a violation of the Creative Commons license to train an LLM on their answers.

Yes, because it's a license violation — "If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original". That includes derived data products, like AI models, built using the content.

That seems rather debatable. While I don't think the overall use case favors fair use, due to the commercial nature of most of the end products, the fact that such use is clearly transformative is definitely a positive factor on the side of LLM creators:

> A key consideration in later fair use cases is the extent to which the use is transformative. In the 1994 decision Campbell v. Acuff-Rose Music Inc,[13] the U.S. Supreme Court held that when the purpose of the use is transformative, this makes the first factor more likely to favor fair use.[14] Before the Campbell decision, federal Judge Pierre Leval argued that transformativeness is central to the fair use analysis in his 1990 article, Toward a Fair Use Standard.[11] Blanch v. Koons is another example of a fair use case that focused on transformativeness. In 2006, Jeff Koons used a photograph taken by commercial photographer Andrea Blanch in a collage painting.[15] Koons appropriated a central portion of an advertisement she had been commissioned to shoot for a magazine. Koons prevailed in part because his use was found transformative under the first fair use factor.

https://en.wikipedia.org/wiki/Fair_use

…yet in the same article he’s talking about selling the data to LLM developers.

It’s hard to make sense of.

SO content is dual licensed to SO, giving them the right to "commercially exploit" it. That means they can relicense under terms that also allow commercial exploitation.

https://stackoverflow.com/legal/terms-of-service