by half-kh-hacker 593 days ago
What else would "self-hosting all of Bluesky" mean other than a copy of the entire site? If you just want to participate in the network, host a PDS, which only stores your own posts.

Surely there's some middle ground between only hosting your own data and being reliant on another site to keep track of your following / followers and hosting a duplicate copy of the entire network?
For sure. If you just want to host your own data, you can do that. A PDS for you and maybe some friends is very small and cheap to host.
My understanding though is that having a PDS on its own is useless without an AppView to collect the data from the relay? Or am I misunderstanding the architecture here? https://docs.bsky.app/docs/advanced-guides/federation-archit...
I'm talking about the case where you wanted to run your own PDS and use all of the other infrastructure being run by Bluesky.

If you fully want your own copy of everything, then you'd want to run a copy of everything. But you don't have to. It really depends on what your goals are. That's why the post is about the maximal scenario. "Just your own PDS" is the minimalist scenario. But I think it's the one that makes sense for 95% of users who want to self-host.

Right, and I'm saying surely there must be a middle ground between "using all of Bluesky's infrastructure" and "having a 4.5 TB copy of every post ever made on the network".
What exactly would that be?

I feel like the middle ground you're talking about could be just a feed?

A feed is a server that consumes the firehose and decides whether to store each post. When the feed is loaded in the app, the server returns some of those posts to build it.

So essentially you only store references to part of the network rather than storing the whole thing
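
To make the "only store references" idea concrete, here is a minimal sketch of that kind of feed server's core logic. It is not how any particular feed generator is written. The class name KeywordFeed and the keyword rule are made up for illustration, and the event shape is assumed to be Jetstream-style JSON (did, kind, commit.operation, commit.collection, commit.rkey, commit.record). The returned shape follows my understanding of what app.bsky.feed.getFeedSkeleton responds with.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordFeed:
    """Hypothetical feed index: keeps AT-URI references, not the posts themselves."""

    keywords: tuple          # lowercase substrings to match (illustrative rule)
    limit: int = 1000        # maximum number of references to retain
    uris: list = field(default_factory=list)

    def handle_event(self, event: dict) -> bool:
        """Decide whether to keep a reference to the post in one event."""
        commit = event.get("commit") or {}
        # Only newly created posts are interesting; skip likes, follows, deletes.
        if event.get("kind") != "commit" or commit.get("operation") != "create":
            return False
        if commit.get("collection") != "app.bsky.feed.post":
            return False
        text = (commit.get("record") or {}).get("text", "").lower()
        if not any(keyword in text for keyword in self.keywords):
            return False
        # Store only the reference; the post content stays on the author's PDS.
        uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
        self.uris.insert(0, uri)      # newest first
        del self.uris[self.limit:]    # drop the oldest references past the cap
        return True

    def skeleton(self, count: int = 50) -> dict:
        """Return the stored references in a feed-skeleton-like shape."""
        return {"feed": [{"post": uri} for uri in self.uris[:count]]}
```

Storage here grows with the number of matching posts, capped by limit, rather than with the size of the whole network.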

Consider the Nostr protocol.
Your following list is stored in your own repo, so it lives on your PDS. You can theoretically have partial replicas of the network, but nobody has bothered yet. If you want to make software like that, a good start would be subscribing to the firehose and filtering down to the DIDs you care about, or supplying the wantedDids parameter to a Jetstream instance.
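
A rough sketch of both options, under some assumptions: the Jetstream query parameters are taken to be wantedDids and wantedCollections, the host below is one of the public instances as far as I know (swap in your own), and the helper names jetstream_url and keep_event are hypothetical. The client-side filter assumes Jetstream-style JSON messages; the raw firehose is CBOR-encoded and would need decoding first.

```python
import json
from urllib.parse import urlencode

# Assumed public Jetstream instance; replace with whichever instance you use.
JETSTREAM_BASE = "wss://jetstream2.us-east.bsky.network/subscribe"


def jetstream_url(dids, collections=("app.bsky.feed.post",), base=JETSTREAM_BASE):
    """Build a subscribe URL that asks the server to filter by DID and collection."""
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in dids]   # one repeated parameter per DID
    return f"{base}?{urlencode(params)}"


def keep_event(raw_message, watched):
    """Client-side fallback: return the parsed event only if its DID is watched."""
    event = json.loads(raw_message)
    return event if event.get("did") in watched else None
```

You would hand the URL to whatever WebSocket client you prefer and run each incoming message through keep_event (or skip that step when the server is already filtering for you).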
The middle ground you're looking for is impossible in the AT Protocol; it is, however, what the Nostr protocol is aiming for.