by manuelmoreale 695 days ago
> So the snapshot of the web as it was in 2023 will be the last time we had original content, as soon we will stop producing new content and just recycle existing content.

I’ve seen this take before and I genuinely don’t understand it. Plenty of people create content online for the simple reason they enjoy doing it.

They don’t do it for the traffic. They don’t do it for the money. Why should they stop now? It’s not like AI is taking away anything from them.

3 comments

The question is how do you separate that fresh signal from the noise going forward, at scale, when LLM output is designed to look like signal?
You start from the people you know are not pushing out LLM generated nonsense and you go from there.

It’s gonna be a mess, I can tell you already, but it’s not going to be impossible.

There’s plenty of people who love writing and won’t stop.

You ask an LLM to do it. Not sarcasm, they’re quite good at ranking the quality of content already and you could certainly fine tune one to be very good at it. You also don’t need to filter out all of the machine written content, only the low quality and redundant samples. You have to do this anyways with human generated writing.
I just tried asking ChatGPT to rate various BBC and NYT articles out of 10, and it consistently gave all of them a 7 or 8. Then I tried today's featured Wikipedia article, which got a 7, which it revised to an 8 after regenerating the response. Then I tried the same but with BuzzFeed's hilariously shallow AI-generated travel articles[1] and it also gave those 7 or 8 every time. Then I asked ChatGPT to write a review of the iPhone 20, fed it back, and it gave itself a 7.5 out of 10.

I personally give this experiment a 7, maybe 8 out of 10.

[1] https://www.buzzfeed.com/astoldtobuzzy

ChatGPT has a giant system prompt that you have no control over. Try using Llama and create a system prompt with clear instructions and examples. If you were going to use a model in a production system you would also want to either fine tune it or train a BERT-like model as a classifier that just outputs a score. Maybe even more than one for ranking along different dimensions.
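
For what it's worth, here is a rough Python sketch of the "system prompt with clear instructions and examples" idea. Everything in it is made up for illustration: the rubric wording, the example texts and their scores, and the `build_prompt` / `parse_score` helpers. It does not call any model; you would send the prompt to whatever model you run (Llama or otherwise) and hand the reply to `parse_score`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    text: str
    score: int  # 1 (low quality) to 10 (high quality)


# Hypothetical rubric; the wording is an assumption, not a tested prompt.
SYSTEM_PROMPT = (
    "You rate the quality of a piece of writing from 1 to 10.\n"
    "Use the whole scale. Shallow, redundant or generic text gets 1-3.\n"
    "Reply with a single integer and nothing else."
)

# Hypothetical few-shot examples that anchor both ends of the scale.
EXAMPLES = [
    Example("10 amazing places you must visit! Number 7 will shock you.", 2),
    Example("We measured query latency before and after the index change "
            "and explain why the p99 got worse.", 9),
]


def build_prompt(examples: List[Example], candidate: str) -> str:
    """Assemble the rubric, the scored examples, and the text to rate."""
    parts = [SYSTEM_PROMPT, ""]
    for ex in examples:
        parts.append(f"Text: {ex.text}")
        parts.append(f"Score: {ex.score}")
        parts.append("")
    parts.append(f"Text: {candidate}")
    parts.append("Score:")
    return "\n".join(parts)


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone integer from 1 to 10, or None."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


prompt = build_prompt(EXAMPLES, "Some article text to be rated.")
```

Anchoring the examples at the low and high ends is meant to push back on the "everything is a 7 or 8" behaviour described above, though whether it actually does depends on the model. A reply that does not parse comes back as `None`, so it can be dropped or retried instead of being counted as a score.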
Yes, do not rely on it for assessments. It generates ratings of 7 or 8 because those ratings are statistically common in its training data.
Except AI in search is taking away significant traffic from everywhere, and it hits small blogs as well as nonprofits like encyclopaedias the hardest, while misrepresenting and “remixing” the actual content.

I’ve given up on the internet as a place to share my passions and hobbies for the most part, and while LLMs weren’t the only reason, this current trend is a significant factor. I focus most of my attention on talking directly with people. And yes that does mean the information I share is guaranteed to be lost to time, but I’d rather it be shared in a meaningful manner in the moment than live on in an interpreted zombie form in perpetuity.

I have a blog. Been writing on that for 7 years. Should I care if AI in search is taking away traffic? If yes, why? I’m not writing for traffic. I write because I enjoy doing it. People find their way to my site mostly thanks to other people linking to it. And a solid % of traffic comes from RSS anyway.

I think giving up on the web because of AI is the wrong move. You should still create and focus more on connecting with others directly, when online. Get in touch, write emails, sign guestbooks.

I’m personally having great exchanges daily with people from all over via email and that won’t stop because of stupid ChatGPT or whatever.

And don’t get me wrong, it’s awesome to spend more time offline so if you want to go down that path it’s great.

I just don’t think it’s the only solution.

The only reason to put things you write online is to make it available to others. If writing simply for my own enjoyment or reference I write in my notebooks, as I do all the time. I never stopped doing that.
No one cares about your content being merged into the LLM slop. No one will notice whether your content is in or out.

So why harm your audience and your own baseline preferences just to spite a system that will never notice the attack?

A lot of people who create content don't want their content to feed AI. They love what they do and they don't want their work to support a system whose purpose is to debase and commoditize that work. The only way to avoid that is to never publish to the web, everything published to the web feeds AI. That is the web's purpose now.

Also there are plenty of people who create content because they love it, and also need to be able to make a living at it, because doing so at the level of quality they want is time consuming and expensive.

But mostly because even people who produce content because they love it want to share that content with the world and that will be nigh impossible when the only content anyone sees, and that any platform or algorithm surfaces, is AI generated. Why put in the effort and heart and work to create something only for an AI to immediately clone it for ad revenue? Why even bother?

> The only way to avoid that is to never publish to the web, everything published to the web feeds AI. That is the web's purpose now.

And in doing that you also prevent real humans from accessing that same content. Look, I have no sympathy for AI companies. I wrote about it before on my site, will probably write again. The current situation sucks. But giving up is not the right answer imo.

> Also there are plenty of people who create content because they love it, and also need to be able to make a living at it, because doing so at the level of quality they want is time consuming and expensive.

Fair but those are the minority. I'd argue the vast majority of people create content because they enjoy the process and earn a living in other ways. I run a newsletter where I interview people with blogs and so far, after a year running it, not a single person has told me they blog for a living. Every single one is doing it for passion. And I suspect that's true for the vast majority of people out there. The bulk of internet content (when it comes to creative content that is) is created by people who do it as a hobby.

> But mostly because even people who produce content because they love it want to share that content with the world and that will be nigh impossible when the only content anyone sees, and that any platform or algorithm surfaces, is AI generated. Why put in the effort and heart and work to create something only for an AI to immediately clone it for ad revenue? Why even bother?

Why even bother? Because there are people out there who care. And the assumption that "the only content anyone sees, and that any platform or algorithm surfaces, is AI generated" is a wrong one imo. I can assure you that there are PLENTY of people out there who still value original content, still value connecting with real human beings doing things because they love the craft. Assuming everything is doomed is not helpful.

Is it going to be harder? Yes. Are there solutions? Yes.