What else would "self-hosting all of Bluesky" mean other than a copy of the entire site? If you just want to participate in the network host a PDS, which only stores your own posts.
Surely there's some middle ground between only hosting your own data and being reliant on another site to keep track of your following / followers and hosting a duplicate copy of the entire network?
I'm talking about the case where you wanted to run your own PDS and use all of the other infrastructure being run by Bluesky.
If you fully want your own copy of everything, then you'd want to run a copy of everything. But you don't have to. It really depends on what your goals are. That's why the post is about the maximal scenario. "Just your own PDS" is the minimalist scenario. But I think it's the one that makes sense for 95% of users who want to self-host.
Right, and I'm saying "surely there must be a middle ground between "using all of Bluesky's infrastructure" and "having a 4.5tb copy of every post ever made on the network""
Your following list is stored in your own repo, so it lives on your PDS. You can theoretically have partial replicas of the network but nobody has bothered yet; if you want to make software like that, a good start would be subscribing to the firehose and filtering down to DIDs you care about / supplying the watched DIDs parameter to a Jetstream instance