Hacker News new | ask | show | jobs
by kyle-rb 285 days ago
Running a PDS server for yourself and a few friends is not very expensive afaik, but there's also not much benefit to doing so, because the point of the PDS is to have a clean separation between your own data and the rest of the network.

The expensive things in ATProto are the Relay (crawls/listens to PDSs to produce the firehose) and the AppView (keeps a DB of all posts/likes/etc to serve users' requests). Expensive at scale anyway; if you want your own small network for hosting non-Bluesky posts (like WhiteWind's longer character limit), the event volume will be manageable.

For a lot of stuff though ATProto is built in a way that you shouldn't have to host your own; you can implement your own algorithmic feed that reads from the Bluesky Relay's firehose, or your own frontend that still gets data from the Bluesky AppView.

2 comments

> For a lot of stuff though ATProto is built in a way that you shouldn't have to host your own; you can implement your own algorithmic feed that reads from the Bluesky Relay's firehose, or your own frontend that still gets data from the Bluesky AppView.

ATProto isn't "built this way".

Twitter was also built in a way where you could implement your own stuff - and then Twitter took that away.

With Mastodon, there is one large instance (controlled by the non-profit Mastodon gGmbH). If they tried closing themselves off, their users would be losing access to the majority of people in the network. Plus, while non-profits aren't perfect, they don't have VC investors to answer to.

Bluesky could decide to stop publishing the firehose or restrict its APIs - just as Twitter did. Given that they control 99.55% of the network, they can close it off without worrying about their users losing access to anything. And Bluesky is a for-profit company that has raised around $30M in VC.

What you talk about isn't a feature of ATProto. It's a feature of being centralized and having a company willing to let you use their servers for free (at least for now). This was the case with Twitter for a long time. You could read the Twitter firehose and build your own apps and frontends getting data from the Twitter APIs - just as you can do from Bluesky today.

But unless there's a reason why shutting off the firehose/APIs would be bad for Bluesky, they can do that at anytime. It might anger some users (as Reddit and Twitter both did), but they control the network and network effects are powerful. For most Bluesky users, they'd continue using it because they aren't there for some open protocol. They're on Bluesky because Twitter became a nazi bar. Until we see real decentralization with ATProto, it's just a centralized network like Twitter or Reddit which hasn't shut off its firehose and public API yet. Hopefully that won't happen, but it certainly could.

> ATProto isn't "built this way".

ATProto is built this way. "Closing off bluesky" would be an extraordinarily non-trivial process and would break basically everything. This is in large part why private data isn't a thing yet. The architecture is diametrically opposed to it.

> Bluesky could decide to stop publishing the firehose or restrict its APIs - just as Twitter did.

They could "technically" completely rearchitect their application frontend and backend to do this but any effort to do so would be visible from miles away given the architecture and warning claxons would start ringing immediately.

> And Bluesky is a for-profit company that has raised around $30M in VC.

Bluesky PBLLC is a for-profit "public benefit corporation" for what it's worth and the way they are structured would open them up to legal consequences if they were to move diametrically opposed to the mission they were founded on. Not just because they are a PBLLC but because their initial investment funding was drawn up under an explicit contract that views moves against "the development of decentralised social media" as a violation of the terms of that contract.

No I think ATProto is "built this way". The firehose is just the output of the Relay. If Bluesky wanted to shut off the firehose, they would have to make changes to ATProto or stop conforming to ATProto.

I understand that Bluesky's conformance to ATProto is just a promise, but it's a better promise than you get from most websites. Also in the meantime, if you migrate to a self-hosted PDS, you can ensure that even if Bluesky restricts access to their Relay's firehose, 3rd party Relay servers can still pick up your posts and publish their own unrestricted firehose.

What do you think would happen if the Bluesky company suddenly blocked everyone but https://bsky.app/ servers from using their relays?

And what if, before they did that, they updated the PDS code so it blocked all relays except for their one?

I'm not asking what you would do. I'm asking what would happen because of what everyone does. I think the name "Bluesky" would refer to the fully centralized bsky.app, and 99.9% of users would never notice a difference. Users who had other PDSes would either quit (nobody noticing their departure) or sign up to bsky.app like everyone else. The events of Twitter show it's probably the latter - people bent over backwards to comply with Musk to keep their accounts.

> What do you think would happen if the Bluesky company suddenly blocked everyone but https://bsky.app/ servers from using their relays?

That's not how it works. Appviews pick the relay they use, not the client/user. The relay is used for gossip into the appview (and other things).

More importantly, appviews never see the client/user directly. Appviews only talk to the PDS. Really most things other than the client ever only talk to the PDS or listen to the relay. The only thing that ever directly talks to the client is the PDS.

The way atproto services generally work is the client configures a series of XRPC requests with HTTP headers to determine what appview, labelers, etc to use and it issues that request to the PDS. The PDS then proxies that request to the appview or wherever and they respond back to the PDS which routes the response back to you.

So in a real sense your PDS is not just a data host, but also operates akin to an IRC bouncer.

-----

> And what if, before they did that, they updated the PDS code so it blocked all relays except for their one?

PDS relay routing, etc is mostly all handled manually via config files,etc so this isn't really a concern. And PDS code is probably the "easiest" part of the ecosystem to hack on which is why there are like 6 different implementations with the majority (like 4) that maintain near feature parity with the "bluesky PDS" software.

And importantly, the bluesky PDS is literally a sqlite DB, an OAUTH implementation, some go IPLD data structure manipulation code, and a go XRPC router. It's fairly trivial to hack on as needed.

------

> I'm not asking what you would do. I'm asking what would happen because of what everyone does. I think the name "Bluesky" would refer to the fully centralized bsky.app [...]

Migration currently isn't perfect but within ~6 months it should be ironed out by the community at which point migrating off a PDS to another is just a matter of:

1. click button on new PDS to transfer/"create new account".

2. set your new email, password, and list your old/current handle.

3. get auth code via email (one from the new PDS and one from your DID provider)

4. input codes into migrator interface (for whichever migrator you are using)

5. log into your apps again.

There are multiple large PDS operators working really really hard to spin up operations (proper backups, failover, HA, etc) so they can run reliably and avoid the "my mastodon instance imploded guess everything is gone" issue. Open federation is only about ~ a year old (plus change) so the community is only just now really reaching the "mature third parties" stage.

So, you're making stuff up that obviously has no basis in reality here.

Migration would be impossible because the bsky.app PDS wouldn't allow anyone to access the data except for the bsky.app relay.

other appviews wouldn't display bsky.app data because both the PDS and relay would block them.

First of all, Bluesky the company would have to implement auth on the communication between relay and AppView, cut off everyone from using their relay, and specifically stop ingesting content from non-Bluesky PDSs. There would be time when we realize that's happening to get another AppView up and running. There are arguably enough resources in either the ATproto dev community or other funding sources that another AppView could pop up within a week or two, and be maintained by the community for months without issue. And Bluesky wouldn't likely stop existing altogether, so people would log into Bluesky, see the news (or hear it elsewhere), and see people talking about how to get access to your account.

Secondly, while you're right that most people don't have copies of their repos, some copies of the entire network exist in the community, as well as copies of the PLC Directory. Within a couple weeks we would have community run versions of the AppView, PDSs, Relays, and PLC, and some seriously pissed off community members who would now want to do everything they can to take up the mantle temporarily. Soon Bluesky the app, as well as Instagram and Twitter, would be flooded with tutorials of how to recover your accounts and migrate to another PDS.

If your point is that we'd lose at least half of the users of Bluesky, then yea probably. But ATproto would be just fine, and if people want to get their data back they likely can, as long as they can stomach a little work. And that process is getting easier all the time.

No. That's just not remotely close to true or feasible.

> So, you're making stuff up that obviously has no basis in reality here.

I cannot understand why you are claiming this. I'm basing off the actual architecture and the way the parts interact. The design is just not feasible for locking down. Doing so completely breaks the model and it still leaks like a sieve if you try to.

----------

> Migration would be impossible because the bsky.app PDS wouldn't allow anyone to access the data except for the bsky.app relay.

Nope. Migration is still fully possible. Migration doesn't happen via the relay or any PDS->PDS mechanism. Migration is done via the client. The client/user runs operations on the source PDS, the destination PDS, and the DID registry. All the data is transported between by the client.

Specifically the way it works is you export/backup your information from your current PDS (in the form of a CAR file + blobs). Technically this step is optional. Even if the PDS goes offline or becomes hostile you can actually largely reconstruct this data from the network. Then you "create a new account" on the new PDS and upload your data that you backup up/recovered onto the new PDS. Then you update your DID to point to the new PDS. And finally you deactivate the account on the original PDS (basically saying I no longer store stuff here anymore).

This is part of the reason why migration tooling is a bit bumpy. Your JS script or app has to do the entire process by itself rather than letting the backends handle it. However it does make them extraordinarily resistant to data loss and/or takeover.

----------

> other appviews wouldn't display bsky.app data because both the PDS and relay would block them.

Relays work via gossip. If you can see the relay at any point, you can gossip 100% of their contents to another relay.

In the event bluesky PBLLC locked down their appview and PDS, they'd still have to make the relay open or everything breaks. Feed providers need access to the firehose. Labelers/Moderation Services need access to the firehose. And so on.

Everything is built with an assumption of a public firehose and if you lock down the firehose, all you need is one person to listen to the locked down firehose to 100% replicate it and gossip onto any other relay.

there are already independent relays running. a full relay (which is open source software) costs around 30 USD per month to operate
Running a relay is not expensive anymore (it used to be), with recent changes it's about $30/mo. Running an AppView that ingests all ongoing Bluesky traffic (and puts it into database) is more expensive ($300/mo currently) but if you were happy with a partial view of the network, you could get it down by a lot.
> Running a relay is not expensive anymore [...]

$30/mo is $360/yr, which for most people is a prohibitively large sum of money. That would make Bluesky access more expensive than even the most expensive Netflix subscription; closer to the cost of a cellular plan.

For comparison: for my Mastodon account I pay $5/mo or $60/yr to a dedicated hosting provider. This puts it in the same ballpark as paying for a private email host or a VPN subscription.

This makes sense, but a Relay isn't something you'd expect a normal user to run.

It doesn't meaningfully make you "more independent" because all Relays are trivial (they're just dumb re-broadcasters of a stream) and it makes sense to use one run by somebody else — a company or a community that's pooling resources.

Aren't the entities who manage relays the ones who broker access to the network? E.g. if you subscribe to a relay, you must also subscribe to their moderation decisions.

Say if Bluesky (the company) bans someone, that person could still have the keys to their data, but their feeds will no longer be "re-broadcast" by that company's servers - right?

Relays aren't really user-facing, so most relay operators would probably only censor a user on the relay if there is some legal or network reason to do so, say content illegal in their jurisdiction or excessive spam.

If an app is concerned that a relay is censoring some user that they care about, the easiest solution is just to host their own relay. It's probably cheaper to operate than their app is. But if they really wanted to, they could listen to multiple relays to "cover the gap" or just manually listen to the event stream from specific users' PDSs directly whenever they notice censorship (effectively operating a partial relay in addition to listening to a full but censored one). But, again, in reality they'd just host their own relay and not bother complicating things.

The hardest problem of relays censoring content is to notice it happening, but once you notice you can easily verify it and switch to a different relay.

The PDS is closer to the Mastodon account and will run you the same amount of money. The relay or appview is what takes the load when one of your posts goes viral, whereas in mastadon your $5 VPs has to handle that spike in traffic. Been several a story about how AP has DDoS'd a small server because there is no equivalent to the relay
There's no real reason to set up a relay for just one person, though.

It's less like a cellular plan and more like building your own private cell tower just because you can.

I thought only the way to avoiding the full firehose was connecting to Bluesky's centralized Jetstream instances, and that if anyone else wanted to host Jetstream without depending on Bluesky infra other than the PDS, they would still need to pay the full price for the firehose bandwidth and storage.

I'd be happy to be wrong here though.

Yes and no.

If you want to avoid the entire bandwidth of the firehose, you need something like jetstream (at least until something like sharded relays come around).

However the relay gossip protocol is not as taxing as it used to be. Relay Sync 1.1 massively decreased overhead and it allows relays to run "thin", i.e. running with only a certain backlog of history and not carrying the full history of the network. So you can make a relay that only keeps 24 hours of history and it'll perpetually stay under like 100gb of storage (I don't remember the exact storage amount but storage size is pretty linear with backlog history).

I'm admittedly not versed in the AT protocol, but $30/month is high for me. Something like $10/month is quite fair, and i would expect should be more than enough for a VPS to host my and 2 other members of my family for any social network service...This $10/month is overkill for other servers on the ActivityPub via say gotosocial, pleroma, etc. Not that ActivityPub is perfect or anything like that, i just mean that $30/month is not yet what i would call a sweet spot for self-hosting something...but of course, that is absolutely leagues better than any cost that the bluesky team or other bigger players would naturally have to pay for ongoing infra., etc.

Then again, i will not deny that there's also the possibility that i am simply cheap! :-)

Right, but a Relay is literally a network-wide optimization. That's why in my other comment I'm mentioning that it's hard to do apples-to-apples comparison. With Mastodon, you only have one possible role: "hosting an instance" (which is like hosting a mini-Twitter with almost no data). ATProto aims higher in terms of UX (shared world by default) so there are different distributable pieces with different incentives to run. Anyone can self-host a PDS for super cheap (which will store only your data). But running a Relay is useful as an optimization for whoever's running the actual backends. So if you're a company building on ATProto, maybe you run your own Relay just in case (or maybe you don't and reuse an existing one). Or if you're a collective of people, maybe you pool resources together to run your own independent Relay. It's not something you'd just run for yourself unless you're a huge enthusiast.
Gotcha, thanks for that context!
An important note here is the $30/mo is for the relay which is _not_ something the average user will ever need to run

Instead, you're looking for hosting a PDS which you absolutely can do for $10/mo (or less)

I run a PDS on a OVH Cloud VPS for $5/mo for myself, some alts, and some bots

Aaah, ok, thanks for the clarification!