Hacker News
by zdimension 587 days ago
Very nice. Modern GPUs really are fast at drawing points.

It's pretty similar to a project I've been working on for the past year, scraping Facebook instead of Bluesky (which is a bit harder since FB doesn't expose an API for that). I currently have about 140 million nodes in my scraped graph and a GUI with pathfinding and stuff like that.

It's a shame though because as nice as the thing is, I'm not sure I can publish it online, given it contains names of people. I don't think the GDPR would be very happy.

Which is why I'm a bit surprised you published this. Aren't you afraid of people, uh, disliking the fact that they're present in your dataset?


AT proto is an open network. Everything you do is public by default; e.g., anyone else can just drink from the firehose.
Yeah, but that doesn't solve the data privacy problem. Not that I care, I'd love to be able to do all sorts of stuff with scraped datasets.
One would hope the people on Bluesky understand that they're posting publicly to a centralized database. What data privacy problem are you concerned with?
As I understand it, the moment you're processing someone's personally identifiable information, you're in the red zone, GDPR-wise. The users consented to publish their info on Bluesky, but not on OP's website.

I get the idea behind the GDPR, and it's nice to protect consumers, but I'm scared for hobby projects like this.

I think GDPR itself is a bit unclear here. As far as I know, Google Search still operates in Europe even though it scrapes and indexes people's websites without explicit consent, and I suspect GDPR doesn't intend to make this illegal. Could be wrong...

IANAL, but at least in the U.S., I'm pretty sure publicly available data is generally excluded from whatever protections do exist on PII. I'm not sure what, if anything, has been said about this in the EU.