by JoeAltmaier 1049 days ago
That's silly. Humans review books all the time, using very similar words. Where's the outrage over that?

These are manufactured, stretched, overhyped objections. I believe it's all as the OP suggests: because the word AI is in there, not because anything illegal or immoral is going on. In fact it's a terribly useful tool, and once the mob cools off it'll likely return.

You are exactly modeling the chauvinistic Silicon Valley attitude that is causing the outrage in the general population to begin with.

“Our algorithms are pretty much the same as human art criticism, so put down the pitchforks you unenlightened scum” is up there with telling them to eat (a Stable Diffusion generated picture of) cake.

> You are exactly modeling the chauvinistic Silicon Valley attitude that is causing the outrage in the general population to begin with.

Just like the writers he talked to and got positive feedback? Everybody not agreeing with you represents "chauvinistic SV attitude"?

(Edited)

No, he didn’t say anything about them. People side against their interests all the time, finding a few writers that like this is trivial. Are those people the majority opinion on this or are we just trying to prove how wonderful this technology is?

I'm assuming you read the article.

Let's recap:

> I launched the prosecraft website in the summer of 2017, and I started showing it off to authors at writers conferences. The response was universally positive, and I incorporated the prosecraft analytic tools into the Shaxpir desktop application [...]

And he goes on mentioning that some authors even reached out to him to get their books added.

Unless you are accusing him of lying or unreasonably overstating the response he got ("universally positive"), for which I really don't see any indication, then a statement like "finding a few writers that like this is trivial" is not a good faith engagement with this topic/conversation.

There’s no way to quantify the sample size of writers from his claims, so within the bubble of his experience I’m sure it’s correct, but it’s not a useful basis for arguing that writers at large are on board with this. As for good faith engagement, your response to parent…

“Everybody not agreeing with you represents "chauvinistic SV attitude"?”

…wasn’t very good faith either, as it’s unclear whether the writers share the belief of some tech people that AI and humans doing stuff are the same, and use that idea to further a pro-AI agenda, or whether they just find the tool useful to incorporate into their workflow regardless of the underlying technology or politics. Your response assumed the former and painted the parent poster as wrong based on that assumption. Some writers liking the tool, just like some artists liking Stable Diffusion, doesn’t invalidate the original criticism or imply their ideology.

Indeed, my experience jibes with what he said. Many AI people I’ve seen comment are very much “adapt or die” when it comes to AI technology, suggesting that writers/artists must (even if begrudgingly) use these tools to stay competitive, and see many datasets as fair game even when the authors are against their work's inclusion in said datasets, such as the author of this article.

There's no outrage in the general population, just in a minority that is as small as Silicon Valley.
Counting the ratio of nouns to verbs in a novel is an algorithm and I think it's like one of the most basic examples of what the thing in the article does, if I understood it correctly.

But I guess there would also be people up in arms about this.

Do you even know the meaning of chauvinism? Because this is literally the opposite of chauvinism. They aren't stating their view is superior, they just want it to exist.
People starved while it was suggested they eat cake. Not sure how that relates - are the rights around art crit not the same as AI crit?
Of course they were never told to eat cake in reality. 1) The actual French quote referred to brioche, a type of bread, and was mistranslated as cake because brioche wasn't common in English-speaking countries. 2) It was never an actual suggestion: the French philosopher Rousseau was making a sarcastic remark suggesting that if the people didn't have bread, they should eat brioche (a fancier kind of bread) instead. But for some reason pop culture falsely transformed this into an actual suggestion by Marie Antoinette.
Glad to see someone else mention the falsehood of the original quote.

It's actually really fitting to see that (mis-)quote used in the context of this outrage since from reading through the original vitriolic Twitter thread it's clear that many of the most outraged are incorrect about what the product does.

It doesn't really matter to the idiom: what is understood by the reference is what it means.
Or talking about humans as "just another matrix", so how dare they not want to offer their artwork for new models.

This doesn't even cover how developers decided to let people lose their jobs. People are angry because they are worried about losing their jobs.

Or AI-generated Soylent.
> That's silly. Humans review books all the time, using very similar words. Where's the outrage over that?

Easy: humans are not machines. "X does it all the time, so I should be able to do it" is never a valid conclusion. It depends on the situation.

> In fact it's a terribly useful tool, and once the mob cools off it'll likely return.

Maybe this tool in particular does not "abuse" the books. Maybe this tool in particular is terribly useful. But you can't blame authors and artists for taking a stance against those new algorithms that provably have the potential to automatically "steal" from their work. You can believe that asking ChatGPT to "write a novel in the style of X" is not abusing the copyright, that's fine. And the authors can answer that they fear it has the potential to break their source of revenue to a point where they won't want to publish anything anymore. And they are entitled to it. And maybe someday we come up with licenses that prevent the use as training data (how in the world could one conclude today that "it is most definitely fair use", given that this is a very new way of using IP material?).

That was the accusation, and it was misplaced here. So we agree, this is a smear campaign in this case, not a sensible reaction to a reasonable application of machine algorithms.

The idea that counting adverbs is stealing their work to the point they won't want to publish anymore is clearly FUD. As my remark made clear.

> The idea that counting adverbs is stealing their work to the point they won't want to publish anymore is clearly FUD.

I did not mean that, I am genuinely not sure if you rephrased my point to make it sound wrong or if you missed it.

My point was that, IMO, it does not matter to the authors whether counting adverbs is stealing their work or not. Probably if you counted them manually they would be fine (and most likely they were fine before generative AI).

What matters to them is that generative AI is trained from their copyrighted material, and they fear it (I would, too).

The day people stop reading my blog because they can just ask ChatGPT and will get something generated (partly) from my material without any kind of attribution, I can promise you I will stop my blog.

This project was not generative AI. Comments are saying this project, which is not at all similar to generative AI, seemed to be okay. But you keep replying to say essentially “but if it was generative AI then authors have a legitimate reason to be angry”.

There is no need to shoehorn that debate into this particular situation, and I see no merit in defending authors that had a knee jerk reaction to this project on the grounds that they have reasonable fears about other types of projects.

I think it is not completely off topic. Here is how I see it:

Engineers tend to globally think that LLMs are not really a problem for copyright holders. At least those who develop LLMs pretty clearly don't give a damn. And on top of that, it is in their interest to not be constrained by copyrights.

If this is my feeling (that engineers globally don't care about copyright holders), then it seems reasonable to me that non-engineers could feel the same. That sounds fair, doesn't it?

So those people start speaking up when they see a situation where they feel like "it is happening". And because they don't really know the technology, it is hard for them to know if this particular case is a problem or not. And they can't really trust engineers to tell them, because engineers built LLMs in the first place, and really it does not seem like they care about copyright holders.

Finally, engineers see this reaction from authors, and instead of trying to understand where they come from, they dismiss their opinion. Which probably will reinforce the feeling that engineers don't remotely understand the concerns of those people, and keep building their AI-powered laundering machines. Again, engineers working on those technologies in big companies have absolutely no interest in even considering that it is a problem. Because they get a big salary to help their big company get more profitable, even if it kills many jobs and is a net loss for society (because they benefit from that).

To rephrase in my own understanding of what you wrote:

1) Some engineers (or more broadly, software developers) do not respect copyright

2) Therefore you reasonably are skeptical of projects related to material under copyright.

3) It is not always obvious if a project is respectful of copyright.

Now, applying these #1,#2,#3 you believe they justify the outrage for this particular project.

I disagree, because outrage combined with a lack of understanding (#3) is pretty much my definition of a knee-jerk reaction and vastly counterproductive to the interests of copyright holders because it will make the dismissiveness you predict a self-fulfilling prophecy.

If you want to be a pitchfork mob against generative AI, at least understand whether the AI is generative or not? Seems like a reasonably low bar. This was non-generative AI: it didn't produce content, it output metrics and labelled some existing content.
What makes you think that I don't understand whether AI is generative or not? What I said was that for artists who are complaining about their copyright being abused, it does not matter. 10 years ago they were not complaining, because AIs looking like ChatGPT (to users who see it as a black box) did not exist (or were not remotely as powerful).

And I understand that. It is not their job to learn how the black box works. What they see are "machine learning models" (which they probably call "AI" now) that are complete black boxes to them (and that's justified: engineers who train them also don't know exactly what they do, but rather test their model on some dataset and judge it from there). And those black boxes are being trained on their copyrighted work and have the potential to generate a ton of money which they will never see.

You can go and say "you guys should learn how the technology works instead of complaining", but let's be honest: probably you are not an expert in AI yourself, and anyway why would the artists have to care? It is a totally legit question that they have: "Why can engineers take my copyrighted work, run it through an algorithm that does stuff no algorithm has done in history at a scale never seen before, make money out of it, and not even consider that maybe they are abusing my IP?".

Before dismissing the artists, you should try to understand their point of view.

I would disagree. Just because you don't quite understand something doesn't mean your concerns are not worth consideration. Consider the recent Zoom TOS issue. I doubt that many of us have a deep understanding of how that data's being used, or of the internal guidelines that Zoom follows for its data use, and most people aren't lawyers specializing in IP law who would know exactly how the law would treat Zoom if they were to accidentally (or "accidentally") leak IP. We just see that they are putting a clause in their TOS to allow themselves to do so, remember our own heuristics of how LLMs have behaved in the past, and understandably start raising questions.

For all we know, Zoom's AI might be constrained to a framework which doesn't allow such data leaks to occur, or its generative capabilities might be constrained in some other way. They're just demanding legal permission to do so, but that still rubs a lot of us the wrong way. Our concerns are still justified, even if Zoom never actually touches AI.

Artists lack heuristics as concrete as those of the technical crowd. But they still have concerns that need addressing, and those concerns about the effects of AI should still be considered and respected. If the details of the situation don't match their concerns, care should be taken to explain how they don't match to the people in question, in a way that isn't looking down on them (admittedly, trying to be the calm voice is often a waste of time on the internet). That said, if you were to make an informational video which succinctly summarizes the technical details that are relevant to artists, it might become sufficiently popular to influence debate.
> It is not their job to learn how the black box works

If you have not learned the basics of how something works, you have no right for your opinion on it to be considered valid. Period.

Invalid opinions do harm to democracy and endanger our way of life.

Because fair use allows transformation, and the output of their algorithm looks nothing like the input of the copyrighted work? For generative models it's more complicated, because generative models can actually reproduce large sections of a copyrighted work, so transformation is a bit less clear.
They… quantify the number of adverbs and voices? I'm sorry, but have you ever read either a book or a review of one?
Yes, and they have 1000 times the spoilers and quotations and judgemental attitude of ... a summary of adverbs and voices.

So yes, I understand what a review is, thanks for the put-down, that certainly added something to the conversation.

I think we are in agreement - doing statistical analysis on written works is entirely a lesser thing than simple review, and is harmless.