by CharlesW 1151 days ago
> Except that Stack Overflow’s CEO, in this very article, says that it’s a violation of the Creative Commons license to train an LLM on their answers.

Yes, because it's a license violation — "If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original". That includes derived data products, like AI models, built using the content.

That seems debatable. I don't think the overall use case favors fair use, given the commercial nature of most of the end products, but the fact that such use is clearly transformative is a positive factor on the side of LLM creators:

> A key consideration in later fair use cases is the extent to which the use is transformative. In the 1994 decision Campbell v. Acuff-Rose Music Inc,[13] the U.S. Supreme Court held that when the purpose of the use is transformative, this makes the first factor more likely to favor fair use.[14] Before the Campbell decision, federal Judge Pierre Leval argued that transformativeness is central to the fair use analysis in his 1990 article, Toward a Fair Use Standard.[11] Blanch v. Koons is another example of a fair use case that focused on transformativeness. In 2006, Jeff Koons used a photograph taken by commercial photographer Andrea Blanch in a collage painting.[15] Koons appropriated a central portion of an advertisement she had been commissioned to shoot for a magazine. Koons prevailed in part because his use was found transformative under the first fair use factor.

https://en.wikipedia.org/wiki/Fair_use

…yet in the same article he’s talking about selling the data to LLM developers.

It’s hard to make sense of.

SO content is dual-licensed to SO, giving them the right to "commercially exploit" it. That means they can relicense it under terms that also allow commercial exploitation.

https://stackoverflow.com/legal/terms-of-service