by l1k 32 days ago
Fun fact (or not so fun if you're a subscriber):

Somebody is spamming kernel mailing lists under the name Marian Corcodel with a 26 MByte message multiple times per day containing a collection of nonsensical patches. Looks AI-generated, perhaps with the intention to poison LLMs. This has been going on for a few days now.

https://lore.kernel.org/all/CAGg4U=GNtCObd_Nbm_1Rr5FEvPb69Yz...

3 comments

I'd warn HN users not to click on that link, simply because it loads a 26 MB message, which will likely put quite a strain on kernel.org's servers if everyone here does it.
I was curious how much of an impact HN could have. Napkin math:

HN gets 24M views a day. Assume those views are evenly distributed across the front page (they aren’t), and that’s about 1M views for each front page post, assuming each user clicks on one post.

By the rule of 10s (also not exact), there are 10x fewer views on comment threads. So assume around 100k views on a comment thread as a theoretical average.

If everyone viewing this thread clicked on the link, that would be 2.6 TB of transfer across the day (100k views x 26 MB). But by the rule of 10s we have to assume 10x fewer people will interact (upvote, click, anything) than view. So we’re down to 260 GB of transfer over the course of a day.

I wonder how close that is. It seems plausible that a link in the top comment of a thread could garner 10,000 clicks.

That’s still about one click every 8 seconds, or roughly 24 Mbit/s sustained, which on a 10 Mbit/s link would indeed overwhelm the server by a factor of about 2.5x. But I clicked through and it loaded in just a few seconds, so presumably the pipe is faster than 10 Mbit/s.
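For anyone who wants to poke at the numbers, here is a rough Python sketch of the napkin math above. Every input is a guess taken from the comment (100k thread views, a 10x view-to-click drop-off, a 26 MB message, a hypothetical 10 Mbit/s link), and the function and parameter names are made up for illustration. It assumes decimal units (1 GB = 1000 MB) and clicks spread evenly across the day, which real traffic would not be.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def napkin_estimate(message_mb=26, thread_views=100_000,
                    views_per_click=10, link_mbit_s=10):
    """Back-of-the-envelope load estimate; all defaults are guesses from the thread."""
    # Rule of 10s: only one in ten viewers actually clicks the link.
    clicks = thread_views // views_per_click
    # Total transfer over the day, in decimal gigabytes.
    transfer_gb = clicks * message_mb / 1000
    # Average gap between clicks if they were spread evenly over 24 hours.
    seconds_between_clicks = SECONDS_PER_DAY / clicks
    # Sustained bandwidth needed: megabytes -> megabits, divided by seconds.
    required_mbit_s = clicks * message_mb * 8 / SECONDS_PER_DAY
    return {
        "clicks": clicks,
        "transfer_gb": transfer_gb,
        "seconds_between_clicks": seconds_between_clicks,
        "required_mbit_s": required_mbit_s,
        # How many times over the assumed link would be saturated.
        "oversubscription": required_mbit_s / link_mbit_s,
    }


if __name__ == "__main__":
    for name, value in napkin_estimate().items():
        print(f"{name}: {value:.2f}")
```

Changing `link_mbit_s` or `views_per_click` shows how sensitive the conclusion is to the guesses: the whole estimate scales linearly with each input.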

Another caveat is that many websites are already several megabytes, so it seems strange that 26 MB would be the breaking point for a reasonable web host.

Don't forget scrapers. Scrapers can be biased towards top posts and comments.
Aren't AI agents worse than scrapers now, since they're basically a DDoS that runs over and over, whereas scrapers will actually cache data?
> HN gets 24M views a day

This is available info?

https://news.ycombinator.com/item?id=33450094

2022 from dang:

> There's no stats page but last I checked it was around 5M monthly unique users (depending on how you count them), perhaps 10M page views a day (including a guess at API traffic), and something like 1300 submissions (stories) and 13k comments a day.

> The most interesting number is the 1300 submissions because that hasn't grown since 2011 - it just fluctuates. Everything else has been growing more or less linearly for a long time, which is how we like it.

Plenty of people deliberately posting to HN have their servers overwhelmed.
It's mirrored by Akamai, which is designed to repeatedly serve the same thing over and over. It won't really hurt anyone.
Does a 26 MB message actually cause noticeable strain on the server much beyond loading the page? I would think serving a contiguous 26 MB chunk would be roughly comparable to serving, say, 20 normal-sized messages.
Way off. I went to an arbitrary message on lore.kernel.org. Firefox's network inspector says 7.37kB was transferred, including stylesheets. 26MB is roughly 3500x 7.37kB.
Data transferred is not what generates load. sendfile() is about the lowest-overhead thing a web server does.
I don't think needlessly straining the Internet Archive's servers is any better.
IA's infra is slightly better for big loads though: they tend to just have higher latency rather than aborted or timed-out requests, for better or worse. It can be a bit slow, but as long as you're ready to wait, you'll eventually get the response. Usually hosts just cut you off with a hardcoded timeout instead, which for people on high-latency/low-bandwidth connections can be super fun.
IA's resources are very limited as it is. There are so many (emulation/ROM) YouTubers linking to Archive.org downloads for full ROM sets.

It's a big problem. Donate to Archive.org if you can!

Will clicking on this link download a 26MB message putting extra load on archive.org's servers?
Thank you for the warning. I rarely click on links these days, though; the only exception I make is for HN links to main articles.
How do you navigate the web? Is everything Ctrl+L and then manually typing the address, or do you have some fancier solution?
the web is useless outside of hn
90% of it yeah, but the 10% is still worth it, like HN.
The page is gzipped in transit, so only 5 MB of traffic is generated.
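A quick sketch of what that does to the load estimate, in Python. The 26 MB and 5 MB figures come from this thread; the 10,000 clicks per day is only the guess made upthread, and the function name is hypothetical. It assumes decimal units and evenly spread clicks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gzip_adjusted_load(raw_mb=26, compressed_mb=5, clicks_per_day=10_000):
    """Rough on-the-wire load once gzip is accounted for; inputs are guesses."""
    # How much smaller the message is in transit than on disk.
    compression_ratio = raw_mb / compressed_mb
    # Daily transfer in decimal gigabytes, counting compressed bytes only.
    daily_gb = clicks_per_day * compressed_mb / 1000
    # Average sustained bandwidth in Mbit/s (megabytes -> megabits).
    avg_mbit_s = clicks_per_day * compressed_mb * 8 / SECONDS_PER_DAY
    return compression_ratio, daily_gb, avg_mbit_s


if __name__ == "__main__":
    ratio, daily_gb, avg_mbit_s = gzip_adjusted_load()
    print(f"compression ratio: {ratio:.1f}x")
    print(f"daily transfer: {daily_gb:.0f} GB")
    print(f"average bandwidth: {avg_mbit_s:.2f} Mbit/s")
```

Under these guesses the sustained rate drops to under 5 Mbit/s, though this only counts bytes on the wire and says nothing about the CPU cost of compressing, or whether a cached compressed copy is served.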
Why can't they block the people doing this? Bring on the ban hammer.
> perhaps with the intention to poison LLMs

How does that work?

This is just nonsensical changes and slurs, but particularly degenerate input data can cause big issues in training:

https://x.com/gabriberton/status/2051873677998956851