by shortformblog 654 days ago
As a publisher who publishes a full-text RSS feed at a time when not a lot of publishers do, I must say: The publisher should have a say in this.

This is not to say that this is a good idea or a bad one, but I think you will, long-term, have better luck if people don’t feel their content is being siphoned.

A great case-in-point is what my friends at 404 Media did: https://www.404media.co/why-404-media-needs-your-email-addre...

They saw that a lot of their content was just getting scraped by random AI sites, so they put up a regwall to try to limit that as much as possible. But readers wanted access to full-text RSS feeds, so they went out of their way to create a full-text RSS offering for subscribers with a degree of security so it couldn’t be siphoned.

I do not think this tool was created in bad faith, and I hope that my comment is not seen as being in bad faith, but: You will find better relationships with the writers you share if you ask rather than just take. They may have reasons for not having RSS feeds you may not be aware of. For example, I don’t want my content distributed in audio format, because I want to leave that option open for myself.

People should have a say in how their content is distributed. I worry what happens when you take those choices away from publishers.

3 comments

This.

I love these projects, but they can often have negative side effects.

I've never implemented it, but it should be possible to check whether the content still lives at the URL where it was originally found before serving any kind of archived copy (preferably with contact info for the unwilling author). Using it for a search index should be fine, of course.
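For what it's worth, here is a rough sketch of that check in Python. Everything in it is an assumption for illustration: the function names `fetch_status` and `may_serve_archived_copy` are made up, and "still lives" is read as "the original URL answers with a 2xx status", which a real tool might want to refine (soft 404s, paywalls, redirects to a homepage).

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (404, 410, ...).
        return err.code
    except OSError:
        # Covers URLError, timeouts, DNS failures and refused connections.
        return None


def may_serve_archived_copy(
    url: str,
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> bool:
    """Hypothetical policy: serve an archived copy only while the original is still up.

    If the original is gone (or we cannot tell), assume the author may have
    taken it down on purpose and do not serve the copy.
    """
    status = get_status(url)
    if status is None:
        return False
    return 200 <= status < 300
```

The `get_status` parameter is only there so the policy can be exercised without touching the network, e.g. `may_serve_archived_copy(url, get_status=lambda _: 410)` returns `False`.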
I disagree. If you put your content out in the open for everyone to read, it is totally valid to scrape that content. Otherwise, put it behind a paywall. If I can access it for free with a browser, then you should be fine with me consuming your content with the tool of my choice, so I can search it or use it however I see fit. Why not?

Getting consumed by AI scrapers will be inevitable in the long run, I think.

Just because I make the information available in a convenient way doesn't mean I expect it to be harvested. That you make that leap is 100% troubling and makes me not want to have you as a reader, because you don't respect my work.

You are describing the “give an inch, take a mile” concept neatly.

I think your mindset will just lead a lot of people who otherwise would not want to regwall their content to do so. And if I ever do so, I will include a link to your post so they know who to blame.

I feel like the two massive unspoken caveats are:

1. Downloading and polling that doesn't resemble a cyberattack.

2. Not reproducing their content in a way that could compete with theirs or tarnish their identity... and there's a lot of open ongoing debate about how that principle relates to different ways of using LLMs.
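To make the first caveat concrete, here is a minimal sketch of what "polite" polling could look like, assuming two common conventions: a minimum wait between requests to the same URL, and conditional requests via `If-None-Match` so an unchanged feed costs the publisher almost nothing. The class name `PolitePoller`, the one-hour default, and the User-Agent string are all made up for illustration. It says nothing about the second caveat, which is not something code settles.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PolitePoller:
    """Hypothetical helper that decides when and how to re-fetch a URL."""

    min_interval_seconds: float = 3600.0           # assumed default: at most once an hour
    clock: Callable[[], float] = time.monotonic    # injectable so it can be tested
    _last_poll: Dict[str, float] = field(default_factory=dict)
    _etags: Dict[str, str] = field(default_factory=dict)

    def request_headers(self, url: str) -> Optional[Dict[str, str]]:
        """Return headers for the next poll, or None if it is too soon to ask again."""
        now = self.clock()
        last = self._last_poll.get(url)
        if last is not None and now - last < self.min_interval_seconds:
            return None
        self._last_poll[url] = now
        # Identify the client honestly so a publisher can reach out or block it.
        headers = {"User-Agent": "example-reader/0.1 (hypothetical contact address)"}
        etag = self._etags.get(url)
        if etag:
            # Lets the server reply 304 Not Modified instead of resending the body.
            headers["If-None-Match"] = etag
        return headers

    def record_etag(self, url: str, etag: str) -> None:
        """Remember the ETag from the last successful response for this URL."""
        self._etags[url] = etag
```

The caller would pass the returned headers to whatever HTTP client it uses, skip the fetch when `None` comes back, and call `record_etag` with the `ETag` response header when one is present.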

So I can take all the words written here by you and use them to pretend to be you elsewhere online, right?
As a one-off thing you personally do, yeah that’s probably okay. Turning that into a product that you then offer to others is where the line is drawn, in my opinion.
I think this is a fair line. I don't want to mess with the tinkerers of the world, and to be clear I'm not even entirely opposed to this. I just think we do not put enough stock into discussing potentially damaging actions with creators.

Which is why so many writers and artists are upset at OpenAI and Anthropic right now.