Hacker News
by sccxy 538 days ago
Her RSS feed is the last 100 posts with full content.

So that means 30 months of blog post content in a single request.

Sending 0.5MB in a single RSS request is more of a crime than those 2 hits in 20 minutes.

6 comments

I generally agree here.

There are a lot of very valid use cases where defaulting to deny for an entire 24-hour cycle after a single request is incredibly frustrating for your downstream users (a shared IP at my university means I will never get a non-429 response... And God help me if I'm testing new RSS readers...)

It's her server, so do as you please, I guess. But it's a hilariously hostile response compared to just returning less data.

> But it's a hilariously hostile response compared to just returning less data.

So provide a poor service to everyone, because some people don't know how to behave. That seems like an even worse response.

Send only the most recent year's posts and you've reduced bandwidth by 50%.
People don't want to have to customize refresh rates on a per-feed basis. Perhaps the RSS or Atom standards need to support importing the recommended refresh rate automatically.
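For what it's worth, RSS 2.0 already defines an optional channel-level `<ttl>` element: the number of minutes a feed may be cached before refreshing. Whether a given reader honors it is a separate question. A minimal sketch of a reader picking it up, where the function name and the 60-minute fallback are made up for illustration:

```python
import xml.etree.ElementTree as ET


def recommended_refresh_minutes(rss_xml, default=60):
    """Return the channel's <ttl> in minutes, or `default` if absent or invalid.

    `default=60` is an arbitrary fallback chosen for this sketch.
    """
    root = ET.fromstring(rss_xml)
    # In RSS 2.0 the root element is <rss> and <ttl> sits directly under <channel>.
    ttl_text = root.findtext("channel/ttl")
    if ttl_text is None:
        return default
    try:
        minutes = int(ttl_text.strip())
    except ValueError:
        return default
    return minutes if minutes > 0 else default
```

A reader could then poll no more often than the returned interval, with no per-feed configuration from the user.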
They don't need to change the refresh rate, though. They need to make conditional requests with an ETag or a Last-Modified date, so the server can respond with a 304 Not Modified if no changes have been made.

No standards need to be updated. The client software needs to be a better HTTP citizen.
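To make that concrete, here is a rough sketch of what a well-behaved client might do, using only Python's standard library. The function names are hypothetical, and it assumes the reader saved the `ETag` and `Last-Modified` values from its previous fetch:

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError rather than a normal response.
        if error.code == 304:
            # Nothing changed: keep the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

On a 304 the server sends headers only, so the hourly poll costs a few hundred bytes instead of the full feed.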

What about people who live in the same place and run multiple RSS aggregators that fetch the same feed? I don't think her analysis handles that. At some point she is going to have to talk to the engineers who made these readers if she wants something done. Or she could take it upon herself to fix the software (at least the open-source readers). If she's just sharing the investigation, then it's fine. But if the goal is to get the problems fixed, whining to us is probably the least efficient way to do it. She is knowledgeable enough to fix probably half of the RSS readers that she is complaining about, and definitely knowledgeable enough to engage with all of them about fixing their code.
If there were a widely supported standard for pagination in RSS, then it would make sense to limit the number of posts. As there isn't, sending 500kB seems eminently reasonable, and RSS readers that send conditional requests are fine.
"Pagination in feeds like ATOM and RSS?" - https://stackoverflow.com/questions/1301392/pagination-in-fe...

Sounds like something that could be scored in the rss reader tests.

Did you actually write 500KB as 0.5MB to make it sound BIGGER?

Clever.

Yes, that's right. Most blogs that are popular enough to have this problem send you the last 10 post titles and links or something. THAT is why people refresh every hour, so they don't miss out.
I hate RSS feeds that don't include full content.
If only there were some kind of HTTP headers that could help them stop doing a GET every hour!

Gosh darn, if only I could say "Hey, please only send me the data if it's been modified since I last requested it an hour ago" somehow.
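For anyone who hasn't seen the server side of this, a minimal sketch of the decision follows. It is not her server's code. The function name is made up, headers are assumed to arrive as a plain dict with canonical capitalization, and ETags are compared as exact strings (the weak `W/` comparison is skipped for brevity):

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True if the server may answer 304 Not Modified.

    request_headers: dict of request header names to values (assumed shape).
    current_etag: the feed's current ETag string, e.g. '"v2"'.
    last_modified: timezone-aware datetime of the latest feed change.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since when both are sent.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or current_etag in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution, so drop microseconds.
            return last_modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            # Unparseable or timezone-less date: ignore it and send the full feed.
            return False
    return False
```

If this returns True, the server replies 304 with an empty body. Otherwise it sends the full feed along with fresh `ETag` and `Last-Modified` headers.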

Sure, but whining to the broader public about it before talking to the engineers who made the offending software seems like a bad idea. The public doesn't really care and will keep using their preferred readers. In all my years on the Internet (a lot) I have never seen anyone complain about the volume of RSS traffic they got. If Rachel enjoys sharing her experience of investigating this issue, that's fine. But if she is sharing it with the expectation that her readers will randomly go fix other people's software, that's becoming unreasonable.
Complains about traffic, sends 0.5MB of everything.

That’s my kind of humor.

Sigh. Feed readers set the If-Modified-Since header so that the feed is only resent when there are new items.