Hacker News
by LikelyABurner 1049 days ago
I’m with the artists on this one. Our obsession with converting everything into input for an algorithm that spits out an ill-defined number (what the hell is “vividness”?) needs to stop.

We already tried this with human communication and gave birth to the dystopian nightmare that is social media, why keep repeating our mistakes?

17 comments

I kind of don't understand the issue - IANAL, so I'm not going to delve into the legality of things, but I think making automated book recommendations better absolutely fits the bill for 'transformative use' - as in book recommendations are in no way a substitute product for books themselves.

And personally, I think book recommendations are an absolutely underserved market, if I liked a book, having the ability to find more like it would be an absolute godsend for connecting authors with people who would be interested in their works, resulting in much more potential sales for them.

I can't count how many times I have accidentally discovered an absolutely great book on Amazon with like 50 reviews, as well as other, objectively less recommendable books that have nevertheless made an impression on me.

Discovering these books is sort of a hobby of mine, and is the exact kind of activity an LLM would be a great help with.

Going further, if there was an LLM that could be asked for book recommendations for your particular tastes, it could also identify markets for books not yet written, and would give a hint to authors on what sort of books to write to find an audience.

> And personally, I think book recommendations are an absolutely underserved market

I haven't read about the industry in years but isn't it the case that the job of "book recommendations" is essentially the publisher's job? They unironically try to sell you more than a book. An algorithm would threaten their worth.

(There are, of course, other useful functions like publishing and the irreplaceable editors, but neither requires the capital strength of marketing.)

Stop with the "this is good, I want more" Skinner-box model of happiness. Try some serendipity instead of being led by a generic algorithm.

I discovered absolutely great books by moving slowly along the shelves of a library or a bookshop.

And you need to read bad books to understand the great ones.

The whole point of this site is for people to express their opinions, and torginus took the time to write a thoughtful comment about LLMs and how they might help both authors and readers.

As for your point about serendipity, torginus never said that he didn't wander book stores and libraries looking for books he wouldn't have been previously exposed to.

Based on the post, I'm sure they understood the basics of reading a variety of books, both good and bad -- there is no need to get judgemental.

Are you seriously telling someone else how they should enjoy something?
That's silly. Humans review books all the time, using very similar words. Where's the outrage over that?

These are manufactured, stretched, overhyped objections. I believe it's all as the OP suggests, because the word AI is in there. Not because anything illegal or immoral is going on. In fact it's a terribly useful tool, and once the mob cools off it'll likely return.

You are exactly modeling the chauvinistic Silicon Valley attitude that is causing the outrage in the general population to begin with.

“Our algorithms are pretty much the same as human art criticism, so put down the pitchforks you unenlightened scum” is up there with telling them to eat (a Stable Diffusion generated picture of) cake.

> You are exactly modeling the chauvinistic Silicon Valley attitude that is causing the outrage in the general population to begin with.

Just like the writers he talked to and got positive feedback? Everybody not agreeing with you represents "chauvinistic SV attitude"?

(Edited)

No, he didn’t say anything about them. People side against their interests all the time, finding a few writers that like this is trivial. Are those people the majority opinion on this or are we just trying to prove how wonderful this technology is?

I'm assuming you read the article.

Let's recap:

> I launched the prosecraft website in the summer of 2017, and I started showing it off to authors at writers conferences. The response was universally positive, and I incorporated the prosecraft analytic tools into the Shaxpir desktop application [...]

And he goes on mentioning that some authors even reached out to him to get their books added.

Unless you are accusing him of lying or unreasonably overstating the response he got ("universally positive"), for which I really don't see any indication, then a statement like "finding a few writers that like this is trivial" is not a good faith engagement with this topic/conversation.

There’s no way to qualify the sample size of writers based on his claims. Within the bubble of his experience I’m sure it’s correct, but it’s not useful as a basis for arguing that writers at large are on board with this. And as for good faith engagement, your response to parent…

“Everybody not agreeing with you represents "chauvinistic SV attitude"?”

…wasn’t very good faith either as it’s unclear whether the writers share the same belief as some tech people that AI and humans doing stuff are the same and use that idea to further a pro AI agenda as opposed to them just finding a useful tool to incorporate into their workflow regardless of the underlying technology or politics. Your response assumed the former and paints parent poster as wrong based on your assumption. Some writers liking the tool, just like some artists liking stable diffusion, doesn’t invalidate the original criticism or imply their ideology.

Indeed my experience jibes with what he said. Many AI people I’ve seen comment are very much “adapt or die” when it comes to AI technology, suggesting that writers/artists must (even if begrudgingly) use these tools to stay competitive, and they see many datasets as fair game even when authors are against the inclusion of their work in said datasets, such as the author of this article.

There's no outrage in the general population. Just of a minority that is just as small as silicon valley.
Counting the ratio of nouns to verbs in a novel is an algorithm and I think it's like one of the most basic examples of what the thing in the article does, if I understood it correctly.

But I guess there would also be people up in arms about this.
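
For illustration, here is a minimal sketch of that kind of counting. It is not the code from the article. A real noun-to-verb ratio needs a part-of-speech tagger, so this stand-in just counts words ending in "ly" as a crude adverb proxy, and the name `adverb_ratio` is made up for the example.

```python
import re


def adverb_ratio(text: str) -> float:
    """Return the share of words ending in "ly" (a crude adverb proxy)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Naive heuristic (assumption): it also counts words like "family" or "only".
    ly_words = [w for w in words if len(w) > 3 and w.endswith("ly")]
    return len(ly_words) / len(words)


sample = "She quickly and quietly closed the door."
print(adverb_ratio(sample))
```

The point is only that the output is a single statistic about the text, not a reproduction of it.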

Do you even know the meaning of chauvinism? Because this is literally the opposite of chauvinism. They aren't stating their view is superior, they just want it to exist.
People starved while it was suggested they eat cake. Not sure how that relates - are the rights around art crit not the same as AI crit?
Of course it was never actually suggested that they eat cake. 1) The actual French quote referred to brioche, a type of bread, and was mistranslated as cake because brioche wasn't common in English-speaking countries. 2) It was never an actual suggestion -- the French philosopher Rousseau was making a sarcastic remark suggesting that if the people didn't have bread, they should eat brioche (a fancier kind of bread) instead. But for some reason in pop culture this was falsely transformed into an actual suggestion by Marie Antoinette.
Glad to see someone else mention the falsehood of the original quote.

It's actually really fitting to see that (mis-)quote used in the context of this outrage since from reading through the original vitriolic Twitter thread it's clear that many of the most outraged are incorrect about what the product does.

It doesn't really matter to the idiom - what is understood by the reference is what it means.
Or talking about humans as "just another matrix", so how dare they not want to offer their artwork for new models.

And this doesn't even cover how developers decided to let people lose their jobs. People are angry because they are worried about losing their jobs.

Or AI-generated Soylent.
> That's silly. Humans review books all the time, using very similar words. Where's the outrage over that?

Easy: humans are not machines. "X does it all the time, so I should be able to do it" is never a valid conclusion. It depends on the situation.

> In fact it's a terribly useful tool, and once the mob cools off it'll likely return.

Maybe this tool in particular does not "abuse" the books. Maybe this tool in particular is terribly useful. But you can't blame authors and artists for taking a stance against those new algorithms that provably have the potential to automatically "steal" from their work. You can believe that asking ChatGPT to "write a novel in the style of X" is not abusing the copyright, that's fine. And the authors can answer that they fear it has the potential to break their source of revenue to a point where they won't want to publish anything anymore. And they are entitled to it. And maybe someday we come up with licenses that prevent the use as training data (how in the world could one conclude today that "it is most definitely fair use", given that this is a very new way of using IP material?).

That was the accusation, and it was misplaced here. So we agree, this is a smear campaign in this case, not a sensible reaction to a reasonable application of machine algorithms.

The idea that counting adverbs is stealing their work to the point they won't want to publish anymore is clearly FUD. As my remark made clear.

> The idea that counting adverbs is stealing their work to the point they won't want to publish anymore is clearly FUD.

I did not mean that, I am genuinely not sure if you rephrased my point to make it sound wrong or if you missed it.

My point was that, IMO, it does not matter to the authors whether counting adverbs is stealing their work or not. Probably if you counted them manually they would be fine (and most likely they were fine before generative AI).

What matters to them is that generative AI is trained from their copyrighted material, and they fear it (I would, too).

The day people stop reading my blog because they can just ask ChatGPT and will get something generated (partly) from my material without any kind of attribution, I can promise you I will stop my blog.

This project was not generative AI. Comments are saying this project, which is not at all similar to generative AI, seemed to be okay. But you keep replying to say essentially “but if it was generative AI then authors have a legitimate reason to be angry”.

There is no need to shoehorn that debate into this particular situation, and I see no merit in defending authors that had a knee jerk reaction to this project on the grounds that they have reasonable fears about other types of projects.

I think it is not completely off topic. Here is how I see it:

Engineers tend to globally think that LLMs are not really a problem for copyright holders. At least those who develop LLMs pretty clearly don't give a damn. And on top of that, it is in their interest to not be constrained by copyrights.

If this is my feeling (that engineers globally don't care about copyright holders), then it seems reasonable to me that non-engineers could feel the same. That sounds fair, doesn't it?

So those people start speaking up when they see a situation where they feel like "it is happening". And because they don't really know the technology, it is hard for them to know if this particular case is a problem or not. And they can't really trust engineers to tell them, because engineers built LLMs in the first place, and really it does not seem like they care about copyright holders.

Finally, engineers see this reaction from authors, and instead of trying to understand where they come from, they dismiss their opinion. Which probably will reinforce the feeling that engineers don't remotely understand the concerns of those people, and keep building their AI-powered laundering machines. Again, engineers working on those technologies in big companies have absolutely no interest in even considering that it is a problem. Because they get a big salary to help their big company get more profitable, even if it kills many jobs and is a net loss for society (because they benefit from that).

If you want to be a pitchfork mob against generative AI, at least understand whether AI is generative or not? Seems like a reasonably low bar. This was non-generative AI: it didn't produce content, it output metrics and labelled some existing content.
What makes you think that I don't understand whether AI is generative or not? What I said was that for artists who are complaining about their copyright being abused, it does not matter. 10 years ago they were not complaining, because AIs looking like ChatGPT (to users who see it as a black box) did not exist (or were not remotely as powerful).

And I understand that. It is not their job to learn how the black box works. What they see is "machine learning models" (which they probably call "AI" now), which are complete black boxes to them (and that's justified: engineers who train them also don't know exactly what they do, but rather test their model on some dataset and judge it from there). And those black boxes are being trained from their copyrighted work and have the potential to generate a ton of money which they will never see.

You can go and say "you guys should learn how the technology works instead of complaining", but let's be honest: probably you are not an expert in AI yourself, and anyway why would the artists have to care? It is a totally legit question that they have: "Why can engineers take my copyrighted work, run it through an algorithm that does stuff no algorithm has done in history at a scale never seen before, make money out of it, and not even consider that maybe they are abusing my IP?".

Before dismissing the artists, you should try to understand their point of view.

They… quantify the number of adverbs and voices? I'm sorry, but have you ever read either a book or a review of one?
Yes, and they have 1000 times the spoilers and quotations and judgemental attitude of ... a summary of adverbs and voices.

So yes, I understand what a review is, thanks for the put-down, that certainly added something to the conversation.

I think we are in agreement - doing statistical analysis on written works is entirely a lesser thing than simple review, and is harmless.

"It's a mistake." OK, you could be right.

"Needs to stop." OK, you could be right on that one too. I don't think you are, but that's not the point.

Neither of those adds up to "it's currently illegal". (Whether it's actually illegal probably depends on the details of how he did what he did.)

Further, neither of those things adds up to "the howling mob should attack him until he stops". (Even if the "attacks" are purely online.) I am against "attack him with outrage dialed all the way up to 11 without actually understanding what his tool is and does". I am also against giving in to the outrage - it just shows the mob that baseless outrage attacks work.

You think it needs to stop? Fine. Persuade him that it needs to stop, and therefore that he should stop. Convince him - not with a mob screaming in outrage, but with reason.

As someone who has published two novels: The outrage over this site was stupid, ignorant and a demonstration of a witch-hunt that will help nobody.
If you’re an author of books intended for children, your texts are likely already being quantified to produce a reading level difficulty score:

https://metametricsinc.com/parents-and-students/lexile-for-p...

Honestly, this is the really offensive part of the article. Who cares about whether or not it's legal, the idea that it's, in any way, shape, or form, useful is bafflingly laughable.

Not everything can be meaningfully quantified. Not everything needs to be.

Certainly something interesting is bound to come out of quantifying things? "Hm, this three act structure thing seems to work, I wonder why." "Children don't seem to understand texts which include these words, I wonder why."

Patterns rarely show themselves before we investigate.

> Certainly something interesting is bound to come out of quantifying things?

In science they call this trap P-hacking. Even data "scientists" know to be wary of overfitting. We're really good at finding patterns, but few of them actually mean anything.

>> Certainly something interesting is bound to come out of quantifying things?

> In science they call this trap P-hacking. Even data "scientists" know to be wary of overfitting. We're really good at finding patterns, but few of them actually mean anything.

Quantifying things is not always p-hacking. When people do experiments on novel materials or structures they quantify the data, make readings and record them, and then look for patterns. For example, measuring the electronic properties of a novel nanostructure or molecule.

When I think of p-hacking[1] I think of using the same static data and doing various data analysis over and over again until something potentially interesting is found and ignoring the risks of false positives as you do so.

[1] https://en.wikipedia.org/wiki/Data_dredging
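
To put a rough number on that false-positive risk, here is a small sketch. It assumes the repeated analyses are independent and each uses the usual 0.05 significance threshold, which real re-analyses of the same data usually are not, so treat it as an idealized illustration. The function name is made up for the example.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests looks
    'significant' at level alpha when there is no real effect at all."""
    return 1.0 - (1.0 - alpha) ** num_tests


# The chance of a spurious "finding" grows quickly with the number of tries.
for k in (1, 5, 20, 100):
    print(k, round(chance_of_false_positive(k), 3))
```

Under those assumptions, running 20 unrelated analyses already gives better than even odds of at least one spurious hit.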

> Not everything can be meaningfully quantified. Not everything needs to be.

Ok, so who decides what's OK to analyze or not? Is there some obvious moral line I fail to see, that everyone would immediately agree on?

It seems the project was about analyzing books, not about producing new books. How is that hurting the authors?

What will hurt artists is, when in 10 years, all publishers are demanding that the vividness score (TM) be at least a 95% “because that’s what drives sales”.

Which is what will happen if the authors don’t proactively stop it from happening. Look at how the music industry has evolved over time.

How is this different from all the vampire novels that hit the shelf after the success of Twilight? Publishers always preferred the money makers, just the measure changed.

Nowadays writers can at least publish their books without the need of publishers and I think some like the help of the bad Silicon Valley stuff that made writing, publishing and interacting with the readers easier.

I'm on your side if it's about automatic content creation and style copying but text analysis is not the real danger. Especially when the usefulness of such statistics isn't even given.

> publish their books without the need of publishers

Except those are very likely to be me-too vampire novels. And lately LLM generated.

I'd move that on the contrary, the role of the publisher as a curator will only become more important in the future.

But publishers will have to deal with a lot more content thanks to LLMs.
Or it could help me find terser books I like, people will still have preferences and if the author tries to pander to only the largest market segment I'd argue that's on them.
I think it’s much more likely you would get the book equivalent of crap SEO sites spammed out to satisfy numerical measures of quality.
How is this different to the current process, other than feedback is slower (if forthcoming at all) and less specific?
> How is this different to the current process, other than feedback is slower (if forthcoming at all) and less specific?

Let me rephrase your question: "how is it different to the current process, other than <the fact that it is different>?" :-). I would say that the answer lies in the question.

Sounds as though your view of the AI is purely positive, in that case. That's fair enough. The answer for other people may well not lie in the question (e.g. for all the people who don't like this development), but it did for you!
the difference is that a machine analysis is necessarily limited and can't account for all the factors that make a text interesting. so it is possible that this analysis rejects texts that would not be rejected by a human.

it is objective but potentially biased. and it could even be discriminating if the input for this tool isn't diverse enough. but these are the issues that can go wrong with any use of technology, and we have seen many examples of that happening. however i don't think that is problematic if writers use it to analyse their own texts in comparison. it is however a serious issue if publishers use it to decide what to accept

Again, I don't particularly care about whether this is allowed to exist, I'm just here to laugh at the mindset that led to it being created. But sure, I can see this being used in harmful ways.

> It seems the project was about analyzing books, not about producing new books. How is that hurting the authors?

"Vivid books are really in this year, we're gonna have to ask that you aim for a Vividness(tm) of 85 or above."

"US books have 15% more adjectives, clearly this is proof of our superior detail-oriented work ethic!"

"What does the rise in Emotion(tm) have to say about the decline of society?"

So if I understand you correctly, you're saying that we should not create "metrics" for anything because said metrics could be misused by clueless people?
The analysis is cool. The problematic thing is what would have happened next, if this tool turned out to be any good.

Publishers rejecting manuscripts because "this year's trend shows customers are looking for vividness in the 70+ percentile, your book is only at 55". Everything becoming the same style. If you thought Hemingway, Joyce or Nabokov had it bad with rejections, there'd be zero chance for actual innovative writing to break through the walls of The Algorithm.

Joyce should have had more rejections, but that’s just my personal opinion
> Not everything can be meaningfully quantified.

Sure, but written words _can_ be meaningfully quantified. We have been doing that for thousands of years. Starting with numerology and other mystical/religious beliefs, poetic meter, stylometry, cryptanalysis, stroke counting, to name a few.

> Not everything needs to be.

Why not?

> Honestly, this is the really offensive part of the article.

I would argue that "Offensive" is either hyperbolic or you've used the wrong word.

> the idea that it's, in any way, shape, or form, useful is bafflingly laughable.

I don't know if it's useful because I never tried it. I might harbour my doubts but I'd like to find out. This is how I approach new things.

If you don't find it useful, don't use it. But why get outraged about something that others find useful? It's clearly a tool that other writers were positive and excited about. Why not let them have it? If you don't find those quantifications meaningful, so be it. You don't need to use it. Why force your opinion on others?
Simple. Just allow an opt-out for authors or publishers. Then the service will include, and be used by, only interested parties, like you want.
As the article stated, there is nothing either legally or morally wrong with what the site did, and many authors found it useful. Let us know when you come up with an actual counterargument based on reason instead of an appeal to emotion fallacy.
I probably agree, but how does this have any relevance to copyright? If the tool is bad but otherwise legal then it should just fail on its lack of merit.
There is a difference between a statistical analysis of a text to categorize by certain words or word groups and training an AI model to generate texts on the data used for training.

The latter creates massive competition to human writers, the former is just information for potential readers.

Both the former and latter are information for writers. Neither creates massive competition for writers (not that there is any law against creating competition), just FUD and better tools for writers.
Things like GPT already create competition for authors even using their names.

https://news.ycombinator.com/item?id=37042561

Pure text statistics won't do the same.

Wrt. your link, the same thing could have happened to this author if these spam books contained complete gibberish (and someone listed them on Amazon/goodreads using the author's name). This isn't legitimate competition (i.e. books written by LLMs that rival the quality and style of the actual author). This is a failure of the selling platform to QC the books they are selling.
LLMs make the scams better just like they will make spam better.

If it's gibberish you know you got scammed, LLM texts look convincing so you don't know for sure.

I agree LLMs can make better spam. But good spam isn't real competition. It's not like anyone is debating whether they should buy the latest book from their favorite author or the latest book from their favorite author's clone LLM (which is known to have written some solid books). Again, this is an issue that needs to be solved by vendors (it seems like all they need is a system where authors get a copyright to their name, and can curate the list of titles published under their name).
I agree that it's probably not that useful, but to actually take offense? The outrage seems to misunderstand the law and the technology. If you think the numbers offer no meaning, then just ignore them. People produce bad tools every day and the world still turns.
I'm disappointed he went for "vividness" and not novelty. Judging text based on how uncommon the n-gram is/how much it differs from an LLM could be interesting for sure.

The better an LLM can complete your joke the worse it is, for instance. Important to have a good Letterman-MacDonald quotient.
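
A toy version of the n-gram idea, as a sketch only: it scores a text by the fraction of its word bigrams that never appear in some reference text. Comparing against an actual LLM's predictions would be a different and heavier technique. The function names and the tiny reference string are invented for the example.

```python
def bigrams(text: str) -> list[tuple[str, str]]:
    """Split on whitespace and return consecutive word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def novelty(text: str, reference: str) -> float:
    """Fraction of the text's bigrams that never occur in the reference."""
    seen = set(bigrams(reference))
    grams = bigrams(text)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in seen) / len(grams)


reference = "the cat sat on the mat"
print(novelty("the cat sat on the hat", reference))
```

With a large enough reference corpus, a higher score would mean more unusual phrasing, though unusual is not the same as good.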

> I’m with the artists on this one. Our obsession with converting everything into input for an algorithm that spits out an ill-defined number (what the hell is “vividness”?) needs to stop.

Usefulness is immaterial here.

Is he allowed to do this? Yes.

What's wrong with presenting a page count and word count, for example?

> I’m with the artists on this one. Our obsession with converting everything into input for an algorithm that spits out an ill-defined number (what the hell is “vividness”?) needs to stop.

Anyone who is with the artists should pass a law. Moral outrage is not law.

And promptly smack face first into the First Amendment. There is a reason they are going with moral outrage. Because they know they don't have the right.
The guy who wrote code is also an artist, and he is allowed to publish his book reports.
Well... We still can not agree if...

Technology has to be protected from dumb people, or is it worth protecting dumb people from technology....

If anything, it’s the smart people that need to be protected from technology, because Silicon Valley is obsessed with pulling them down into a Harrison Bergeron nightmare where they’re absorbed into the same modeled probability distribution as the rest of the population to better sell them ads (outliers are bad for profits.)
What do you even mean by this comment? Have you considered the possibility that people are smart in ways that you are not considering, rather than just labeling it “dumb”?
Have you considered ... in ways that you are not considering...?

I am pretty confident they haven't. Sounds like you've set yourself up for a reverse "true scotsman" here ;)

Nice catch, thanks for pointing it out.
For this very particular project I agree the reaction seems exaggerated, even though it does walk the thin line of copyright infringement. But as it happens, it rides the wave of all the other AI projects that started small and then headed we know where. Because once your book is in the database of company X you can bet safe money they will take it and continue their "analysis" as much further as they like because hey, you did not complain - and I believe you must defend yourself in order to keep the copyright.
The response to this undermines my ability to take the “backlash” against AI as anything other than innumerate, mob idiocy. It’s hard to prevent myself from being negatively polarized against the backlash when people in the backlash defend outrage against obviously innocuous things like this.
We repeat the mistakes because in the short term, someone finds it profitable, hence a prisoner's dilemma type situation.

If an AI tool was killed, I consider it a victory. That's because even if there are some small useful applications of AI, AI on the whole will certainly put most creatives out of business.

Instead, I propose the following: anyone who is interested in preventing AI from taking over their craft should join me in a coalition to ban AI from their own business. By placing a notice that your work is "100% AI FREE", you are doing something akin to the fair-trade/sustainably sourced sticker on chocolate or other food products: you are letting consumers know that your work was made by a human, so that they can support you.

If enough people get in on this, and pledge to support only those creators who don't use AI, then we can make AI an unprofitable venture and hopefully kill it forever!

I already put a 100% AI FREE badge on my YouTube channel, which means that I will never use AI for writing scripts, editing videos, producing images, etc. Moreover, I also pledge to support other creators who pledge never to use AI, by buying their products over others!

Without trying to sound flippant - what do you define AI as? Things like autofocus in your video cameras or automatic gain control or noise cancellation in your audio pipeline could also be considered AI. Do you remove those too? What about the AI recommendation algorithm built into YouTube - how do you reconcile being AI free while still using that platform?
Yes, you are right, and I advocate the following: a detailed look at each of these technologies.

However, for practical purposes, a direct definition that encompasses every situation is not necessary, but can evolve. For now, I think we do not need a precise definition and we can start with the following: AI such as ChatGPT, LLMs, image generation tools like DALL-E and others, should be restricted.

As for YouTube's algorithm, I agree it is also dangerous. For now, I have restricted the use of direct content generation algorithms, in other words, all content can reasonably be said to be human generated in terms of writing, composition, etc.

In other words: AI that makes any creative decision in making content should be banned. Other algorithms should be carefully debated.

Banning automation technology because it could put workers out of business... isn't that the textbook definition of a luddite? Also, are you saying no creative people are using these technologies? It's not all "enter 1 prompt, get image, call it a day", they are tools that can be and often are part of a complex chain. Creatives that don't want to use these tools are probably going to be superseded by creatives who do.

What's your take on generative fill in Photoshop?

I am a luddite. What's wrong with that? I don't believe that all technology is bad, but that AI has reached a stage where the order of magnitude of the changes it can effect is too damaging for humanity. I do believe that AI has become advanced enough to pose such a risk to us.

Some creative people are using these technologies, and while it is quite human guided NOW, at some point, the guidance that humans put into it will lessen. That's not to say that AI will ever produce a work like Dostoevsky --- maybe it won't, but it WILL be enough to eliminate most creative jobs, and reduce them to, at most, being supervised by people who don't have much of a passion for creative works. And that's a shame, because it will remove the passion of creativity from society.

Generative fill: I don't use it, and that's part of my personal ban. It goes too far. I only use traditional editing techniques in my photography that work with basically what is there.

Yes, you can say that photography has always been about manipulation, but basically, I have a personal line that I believe I can define sufficiently well, that is far behind the line of AI.

What about the youtube speech to text AI that creates automatic CC transcripts for the hearing impaired? What about the AI that translates transcripts and comments into other languages? (translation certainly makes use of creativity since not every word maps 1:1)
One can always say that AI has some positive uses like CC transcripts. And of course, I can't prevent the platform from making that. I only mean to say that personally, I will not use it in the basic process of video creation. If YouTube ever forces any sort of editing on my videos through AI, I will quit the platform.

But returning to the topic: even though AI has some benefits, I believe that AI in the long run will have negatives that FAR outweigh the positives, so I believe it still should be restricted.

As for translation, well, the AI transcription/translation sucks. I do attempt to put manual captions in my videos as much as I can though.

So far we have identified like 10 ways you and your audience are currently benefitting from AI, but you haven't mentioned any concrete way AI is harming you.

Also, what do you mean by "forces any sort of editing on my videos through AI". Do you mean like, changing the actual content of your videos?