Hacker News
by geofft 1312 days ago
Honestly, I think if the GDPR had been around before HTTP, we would have seen HTTP as the unreasonable part in this system.

You don't have to make a direct TCP/IP connection for two people to communicate. We had systems like Usenet and UUCP that replicated data through a series of servers. Even today, when you use email, you talk to your email provider who talks to the recipient's email provider, and they have no need to share your personal IP addresses in the process. Some providers used to include this in Received: headers, but many today do not, rightly seeing it as a privacy concern. And even on HTTP we had (and still have, in some cases) mirrors, where legally-unrelated entities host copies of each other's data. Someone in the EU can visit http://ftp.icm.edu.pl/pub/linux/Documentation/ and never have their connection known to the US-jurisdiction host of TLDP.
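To make the Received: header point concrete, here is a rough sketch of how a provider-side script could find and redact client IP literals in those headers. The sample message, the hostnames, and the helper names `exposed_ips` and `redact` are all made up for illustration, and the regex only covers bracketed IPv4 literals; real relays format this header in many different ways.

```python
import re
from email import message_from_string

# Hypothetical sample message. 203.0.113.7 is a documentation-range address.
RAW = """\
Received: from laptop.example.org (unknown [203.0.113.7])
\tby smtp.provider.example with ESMTPSA id ABC123;
\tMon, 1 Jan 2024 10:00:00 +0000
From: alice@provider.example
To: bob@other.example
Subject: hello

Hi Bob.
"""

# Assumption: the client address appears as a bracketed IPv4 literal.
IP_LITERAL = re.compile(r"\[((?:\d{1,3}\.){3}\d{1,3})\]")


def exposed_ips(raw: str) -> list[str]:
    """Return every bracketed IPv4 literal found in the Received: headers."""
    msg = message_from_string(raw)
    found = []
    for value in msg.get_all("Received", []):
        found.extend(IP_LITERAL.findall(value))
    return found


def redact(header_value: str) -> str:
    """Replace bracketed IPv4 literals in one header value with a placeholder."""
    return IP_LITERAL.sub("[redacted]", header_value)
```

A provider that cares about this would apply something like `redact` to the first hop only (the one that saw the customer's device), since later hops only name infrastructure addresses.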

It is both socially sensible for these providers to consent to sharing their own infrastructure IP addresses with other providers (but not share their customers' IP addresses) and legally practical for them to make that consent under the GDPR.

Why should it be the case that when you visit my personal website, which I happen to self-host, I have access to your IP address? I don't want that information. I don't even get that information when using higher-level services like Hacker News or Twitter or GitHub, even though those services operate over HTTP. It's weird that I get it, honestly.

I understand there's a huge planetary investment in HTTP, and so the collision of abstractly-reasonable privacy rights with that reality is an extremely hard engineering and policy problem. But that doesn't make the privacy rights unreasonable.

2 comments

> Why should it be the case that when you visit my personal website, which I happen to self-host, I have access to your IP address?

So when you misbehave, I have the means to block you in particular.

My personal website is a publicly-accessible static site. Blocking people from it is not meaningful.

It might be meaningful under the model of direct HTTP, where you could be DoSing me or trying to exploit my web server. But if you don't contact me over HTTP, then that problem doesn't arise. There's no meaningful concept of blocking people from a Usenet post I write. Even for indirect HTTP, I don't need to block people from my GitHub Pages or from my HN comments. They're public.

If I add a dynamic feature like a comment system or discussion forum to my website, then it becomes meaningful, but also at that point I can implement a way for you to consent to sharing your IP address with me as part of signing up.

I get your point, but you said “happen to self-host”. You’re (I think inadvertently) conflating content distribution and infrastructure. If you are self-hosting a website then there aren’t other people distributing your content: you are. If you are just publishing content to be hosted on some static site platform like gh-pages, then yeah, blocking bad infrastructure players is their problem.

The thing about the internet is that it allows both types of models, and both are widespread and actively used today. It wouldn’t be hard for e.g. GH to run EU servers and manage mirroring all content and static sites so that traffic is roughly region-local. I wouldn't be surprised if they did this to some extent just for efficiency concerns irrespective of any legal ones.

You also seem to be conflating TCP and HTTP.

Yeah, but I think the term "self-host" is a little bit conditioned on the HTTP model itself. If we had some sort of Usenet-style distributed infrastructure, you could imagine two ways of publishing content, either running a static site generator locally and using that to push content to the world (and you would be directly negotiating with people about whether you're a spammer), or using some helper online tool a la WordPress to render the content and have them do the push (and they would take responsibility for making sure it gets pushed). In that world, where the end-to-end model of our world's HTTP is uncommon, I think people would still call the first way "self-hosting."

In fact I think we do use the term "self-host" in exactly that way when talking about "self-hosted newsletters." I can (and do!) run a newsletter where I generate the HTML and the MIME document locally, find an SMTP provider of my choice, and instruct it to directly mail recipients. I maintain the mailing list (in a text file in a Git repo) and pass it to my SMTP provider every time I do a mailing, and people contact me directly to sign up. I could also use Substack/Tinyletter/Buttondown/etc., which would have various advantages and disadvantages; the hosting provider would handle most of this for me, including maintaining the list of subscribers. You can also talk about "self-hosted Mailman," etc. In these cases, the self-hoster sees the email addresses of subscribers but not (necessarily) their IP addresses.
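As a rough sketch of that self-hosted newsletter workflow (not the actual scripts, which aren't shown here): read the subscriber list from a text file, build the MIME document locally, and hand it to whatever SMTP provider you picked. The function names, the one-address-per-line file format, and the host/port/credentials are all assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def parse_subscribers(text: str) -> list[str]:
    """Assumed format: one address per line; blank lines and '#' comments skipped."""
    subscribers = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            subscribers.append(line)
    return subscribers


def build_issue(sender: str, subject: str, plain: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message with plain-text and HTML parts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(plain)                     # text/plain part
    msg.add_alternative(html, subtype="html")  # text/html part
    return msg


def send_issue(msg: EmailMessage, recipients: list[str],
               host: str, port: int, user: str, password: str) -> None:
    """Relay one copy per subscriber through the chosen SMTP provider."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        for rcpt in recipients:
            del msg["To"]      # no-op on the first pass, clears it afterwards
            msg["To"] = rcpt
            smtp.send_message(msg, to_addrs=[rcpt])
```

Note what the sender handles here: email addresses and message content. The subscriber's mail client only ever talks to their own provider, so nothing in this path hands the sender a subscriber IP address.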

I don't think I'm conflating TCP and HTTP. NNTP, UUCP, and SMTP all use TCP, but they're designed in a way that doesn't have this property. In fact it's not even HTTP per se that's a problem. It's mostly about what I called "direct HTTP" - though you posted your comment to me over HTTP, there's no HTTP (nor TCP) connection to me.

(Also other comments claim that the CLOUD Act means that if GH the US entity runs EU servers, that doesn't actually solve the problem - it'd have to be a non-US entity not subject to US jurisdiction. That's why I think the old-school-web model of mirrors is a better example; they're generally run by universities or other entities with no legal relation to the site they're mirroring.)

Now you have mentioned mail providers.

Is it illegal for an EU-based SMTP relay to have the source IP address?