by epicsponge 1140 days ago
Hi, original author here. Some comments:

> We sign the records so that authenticity can be determined without polling the home server, and we use a repository structure rather than signing individual records so that we can establish whether a record has been deleted (signature revocation).

Why do you need an entirely separate protocol to do this? Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem. This is the issue: instead of using ActivityPub, which is simpler to implement, more generic, and significantly easier for developers to understand, you invented an overly complex alternative that doesn't work with the rest of the federated Internet.
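
The repository idea in the quoted reply can be illustrated with a toy model: hash every record, sign one root over the whole set, and anyone holding a copy can check it against the latest signed root without polling the home server, which also reveals whether a record has since been deleted. This is a hypothetical sketch, not AT Protocol's actual repository format; HMAC stands in for a real public-key signature, so the shared `SIGNING_KEY` is an assumption of the toy.

```python
import hashlib
import hmac
import json

# Hypothetical key. A real system uses an asymmetric key pair so that
# verifiers never hold the signing secret; HMAC is only a stand-in here.
SIGNING_KEY = b"demo-key-not-for-production"


def record_hash(record: dict) -> str:
    """Hash a record's canonical JSON encoding."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def commit(records: dict[str, dict]) -> tuple[str, str]:
    """Return (root, signature) covering every record key and hash in the repo."""
    leaves = sorted(f"{key}:{record_hash(rec)}" for key, rec in records.items())
    root = hashlib.sha256("\n".join(leaves).encode()).hexdigest()
    sig = hmac.new(SIGNING_KEY, root.encode(), hashlib.sha256).hexdigest()
    return root, sig


def verify_commit(records: dict[str, dict], root: str, sig: str) -> bool:
    """Check a local copy against a signed root, with no server round trip."""
    expected_root, expected_sig = commit(records)
    return hmac.compare_digest(expected_root, root) and hmac.compare_digest(expected_sig, sig)


repo = {"post/1": {"text": "hello"}, "post/2": {"text": "oops"}}
root_v1, sig_v1 = commit(repo)

del repo["post/2"]  # the author deletes a post
root_v2, sig_v2 = commit(repo)

# A mirror still holding the old copy can tell it no longer matches.
old_copy = {"post/1": {"text": "hello"}, "post/2": {"text": "oops"}}
print(verify_commit(old_copy, root_v1, sig_v1))  # True: matches the old commit
print(verify_commit(old_copy, root_v2, sig_v2))  # False: latest commit excludes post/2
```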

> The schema is a well-defined machine language which translates to static types and runtime validation through code generation. It helps us maintain correctness when coordinating across multiple servers that span orgs, and any protocol that doesn't have one is informally speccing its logic across multiple codebases and non-machine-readable specs.

OpenAPI specs already exist and do the same job. They support much more tooling and are much easier for developers to understand. There is objectively no reason why you could not have used them, you are literally just making GET and POST requests with XRPC. If you really wanted to you could've used GraphQL.
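
To make the point about plain HTTP concrete: an XRPC query is an HTTP GET to `/xrpc/<method NSID>` with its parameters in the query string. The sketch only builds the request URL with the standard library and does not send it; `com.atproto.identity.resolveHandle` is a real AT Protocol method, while the host and handle are placeholders.

```python
from urllib.parse import urlencode


def xrpc_query_url(host: str, nsid: str, params: dict[str, str]) -> str:
    """Build the URL for an XRPC query, which is an ordinary HTTP GET."""
    return f"https://{host}/xrpc/{nsid}?{urlencode(params)}"


url = xrpc_query_url(
    "pds.example.com",  # placeholder host
    "com.atproto.identity.resolveHandle",
    {"handle": "alice.example.com"},  # placeholder handle
)
print(url)
# https://pds.example.com/xrpc/com.atproto.identity.resolveHandle?handle=alice.example.com
```

Procedures, the write side, work the same way with POST and a JSON body.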

There are plenty of protocols which do not include machine-readable specs (including TCP, IP, and HTTP) that are incredibly reliable and work just fine. If you make the protocol simple to understand and easy to implement, you really don't need this (watch Simple Made Easy by Rich Hickey).
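
What "runtime validation from a machine-readable schema" means in practice can be sketched in a few lines: one declarative schema that a single validator enforces identically on every server. The schema format is invented for illustration and is not Lexicon (or OpenAPI) syntax; the `text`/`createdAt` fields and the 300 limit are hypothetical.

```python
# Hypothetical schema format: field name -> (required type, max length or None)
POST_SCHEMA = {
    "text": (str, 300),
    "createdAt": (str, None),
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, (expected_type, max_len) in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif max_len is not None and len(value) > max_len:
            errors.append(f"{field}: longer than {max_len}")
    return errors


print(validate({"text": "hi", "createdAt": "2023-05-09T00:00:00Z"}, POST_SCHEMA))  # []
print(validate({"text": 42}, POST_SCHEMA))
# ['text: expected str', 'missing field: createdAt']
```

The disagreement in the thread is not about whether such checks are useful, but whether they need a new schema language rather than an existing one.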

> The DID system uses the recovery key to move from one server to another without coordinating with the server (ie because it suddenly disappeared).

Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue. You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.
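
The mechanism being questioned can be modelled as an append-only log of identity operations: each update names the server currently hosting the user's data and is signed by a key the user holds, so anyone replaying the log learns the new host without asking the old one. This is a toy model, not the real DID operation format; HMAC stands in for a public-key signature, and `RECOVERY_KEY` and the hostnames are hypothetical.

```python
import hashlib
import hmac
import json

RECOVERY_KEY = b"user-held-recovery-key"  # hypothetical; held by the user, not the server


def sign_op(op: dict, key: bytes) -> dict:
    """Attach a signature over the operation's canonical JSON."""
    payload = json.dumps(op, sort_keys=True).encode()
    return {**op, "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}


def current_host(log: list[dict], key: bytes) -> str:
    """Replay the log, accepting only validly signed ops that chain to the previous op."""
    host, prev = None, None
    for signed in log:
        op = {k: v for k, v in signed.items() if k != "sig"}
        expected = sign_op(op, key)["sig"]
        if not hmac.compare_digest(expected, signed["sig"]) or op.get("prev") != prev:
            raise ValueError("invalid operation in log")
        host, prev = op["service"], signed["sig"]
    return host


genesis = sign_op({"prev": None, "service": "https://old-host.example"}, RECOVERY_KEY)
# The old host has vanished; the user signs a move without its cooperation.
move = sign_op({"prev": genesis["sig"], "service": "https://new-host.example"}, RECOVERY_KEY)
print(current_host([genesis, move], RECOVERY_KEY))  # https://new-host.example
```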

> It supports key rotations and it enables very low friction moves between servers without any loss of past activity or data.

This forces community servers to store even more data, data that may not even be relevant or useful. Folks might have gigabytes of attachments and hundreds of thousands of tweets. That is not a fast or easy thing to import if you're hosting a community server. This stacks the deck against community servers.

Most people want some of their content archived, not all, and there is no reason why archival can't be separate from where content is posted. Those can be two separate problems.

> That design is why we felt comfortable just defaulting to our hosting service; because we made it easy to switch off after the fact if/when you learn there's a better option. Given that the number one gripe about activitypub's onboarding is server selection, I think we made the right call.

Mastodon is able to do this on top of ActivityPub. Pleroma works with it. Akkoma works with it. There's already a standard for this. Why are you inventing an unnecessary one?

Mastodon also changed their app to use Mastodon.Social as the default server, so this is a non-issue.

18 comments

I think it’s important to say this: I think asking questions is great, and I’m glad that we’re not just taking statements at face value because making social suck less is a worthy goal.

However, you are coming across as highly adversarial here. Mostly because you immediately follow your questions with assertions, indicating that your questions may be rhetorical rather than genuine.

I’m not accusing you of anything per se, but I very much want a dialog to happen in this space and I think your framing is damaging the chances of that happening.

Whether on Twitter or Mastodon, people deep into that type of social network love TO SHOUT LIKE THIS to get likes or boosts.

That is why passersby like me can't get into either Twitter or Mastodon: it is a culture of getting outraged and shouting at each other to collect a choir of people nodding along in the replies with "well done for saying it like it is."

These people forgot how humans talk and have arguments outside of their Internet echo chambers.

Anger, insults and hate sell more (create more engagement) than reasoned arguments. No one would have posted this on HN if it were otherwise. So don't worry, they are not hurting their chances; the next topic that can be summarised with an angry title like "xxx is the most obtuse crock of shit" will get great traction on HN.

They're explicitly not debating in good faith:

"Also I don't care if I'm spreading FUD or if I'm wrong on some of this stuff. I spent an insane amount of time reading the docs and looking at implementation code, moreso than most other people. If I'm getting anything wrong, it's the fault of the Bluesky authors for not having an understandable protocol and for not bothering to document it correctly."

(https://urbanists.social/@sam/110340956133434975)

Yeah, the way he conflates "crypto" to refer to both cryptography and cryptocurrency is odd, and so is the rhetoric itself: https://urbanists.social/@sam/110340265606422596

It's unfortunate because there are some valid points in his criticism.

"crypto" was used to mean cryptography long before it was used to mean currency, and in some circles still primarily means cryptography.

> in some circles still primarily means cryptography.

It still does in my circle. The overloading of "crypto", though, has become such a source of confusion and misunderstanding that I have stopped using it and just use the full word, be it cryptography or cryptocurrency, instead.

I don't think it's "not in good faith" to say "I made a real substantial effort to understand this, and am trying to describe it accurately; if at this point my descriptions don't match the reality, it's not my fault but that of the people who made it impossible to understand".

(Of course it's perfectly possible, for all I know, that SW is not debating in good faith. But what you quote doesn't look to me like an admission of bad faith.)

I don't see what charitable take could possibly be made wrt "I don't care if I'm spreading FUD" even with all these caveats.

Well, I thought I already described what seemed to me to be a charitable and reasonable take on it.

"I put as much effort in as can reasonably be expected; I tried to evaluate it fairly; but the documentation and supporting code is so bad that I may have made mistakes. If so, blame them for making it impossible to evaluate fairly, not me for falling over their tripwires."

If something is badly documented and badly implemented, then I think it's OK to say "I think this is badly designed" even if you found it incomprehensible enough that you aren't completely certain that some of what looks like bad design is actually bad explanation.

If some of the faults you think you see are in fact "only" bad documentation, then in some sense you're "spreading FUD". But after putting in a certain amount of effort, I think it's reasonable to say: I've tried to understand it, I've done my best, and they've made that unreasonably difficult; any mistakes in my account of what they did are their fault, not mine.

(I should reiterate that I haven't myself looked at the AT protocol or Bluesky's code or anything, and I don't know how much effort SW actually put in or how skilled SW actually is. It is consistent with what I know for SW to be just maliciously or incompetently spreading FUD, and I am not saying that that would be OK. Only that what SW is admitting to -- making a reasonable best effort, and possibly getting things wrong because the protocol is badly documented -- is not a bad thing even when described with the words "I don't care if I'm spreading FUD".)

I agree, thank you for stating that in a respectful way.

The linked article/toot and certain replies seriously make me want to just shut down my personal Mastodon server and move on from the technology altogether.

> The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.

This has actually happened. It's a real problem. For example, "Mastodon instance mstdn.plus with over 4K users suddenly broke" https://lapcatsoftware.com/articles/mastodon.html

As far as I'm concerned, the Mastodon Server Covenant is a joke.

I came here to say this.

Another example: Mastodon.lol, which had 12,000 users, literally shut down a few hours ago. They did manage to give notice, but the point remains that people had to move instances, cannot take their posts with them, and it’s a giant PITA, server covenant or not.

To call this stuff a “non-issue” seems incredibly obtuse, especially when the data portability piece is clearly an afterthought by the Mastodon devs, and something that ActivityPub would need some major changes to accomplish. Changes that the project leads have been fairly against implementing.

There's also a trend of most servers not even being compliant with the 'covenant'.

Y’all should see the dead letters in the publish queues from dead indie servers, thousands of which have gone offline but whose addresses will get looked up forevermore.

> Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem.

On the contrary, email has no solution to the authenticity problem that’s being talked about. Even what there is is a right mess and not even slightly how you would choose to build such a thing deliberately.

If you want to verify authenticity via SPF/DKIM/DMARC, you have to query DNS on the sender’s domain name. This works to verify at the time you receive the email, but doesn’t work persistently: in the future those records may have changed (and regular DKIM key rotation is even strongly encouraged and widely practised).
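
The timing problem can be shown with a toy model: verification looks up the signer's key in DNS under a selector, so it succeeds on the day the message arrives and fails once the domain rotates that key out. Real DKIM publishes RSA or Ed25519 public keys in TXT records; here a dictionary stands in for DNS and HMAC for the signature, and the selectors and domain are hypothetical.

```python
import hashlib
import hmac

# Stand-in for DNS: "<selector>._domainkey.<domain>" -> key material
dns = {"s2023._domainkey.example.com": b"key-2023"}


def sign(body: bytes, selector: str, domain: str) -> dict:
    """Sign with the key currently published for this selector."""
    key = dns[f"{selector}._domainkey.{domain}"]
    return {"s": selector, "d": domain, "b": hmac.new(key, body, hashlib.sha256).hexdigest()}


def verify(body: bytes, sig: dict) -> bool:
    """Look the key up *now*; a missing or changed key means verification fails."""
    key = dns.get(f"{sig['s']}._domainkey.{sig['d']}")
    if key is None:
        return False
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig["b"])


body = b"Hello from 2023"
sig = sign(body, "s2023", "example.com")
print(verify(body, sig))  # True at receipt

# Later the domain rotates keys and withdraws the old selector.
del dns["s2023._domainkey.example.com"]
dns["s2024._domainkey.example.com"] = b"key-2024"
print(verify(body, sig))  # False: the old signature can no longer be checked
```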

What you are replying to says that AT wants to be able to determine authenticity without polling the home server, and establish whether a record has been deleted. Email has nothing like either of those features.

I think they're talking about GPG, not SPF/DKIM/DMARC.

Which is a risky thing to do, because most people don't associate GPG with positive feelings about well-designed solutions, but they're right that it works well, solves the problem and is built squarely on top of email.

The reason that it's not generally well received is that there's no good social network for distributing the keys, and no popular clients integrate it transparently.

In this case GPG, DKIM and even S/MIME are on equal standing. Validity can be checked only on reception because there are no validity-stapling mechanisms.

I’m curious about this. So email that I’ve sent, let’s say from a Gmail account to an iCloud account, isn’t guaranteed to be verifiable years later because of DKIM key rotation?

That’s not great. I wonder if the receiver could append a signed message upon receipt with something like “the sender’s identity was valid upon receipt”.

The receiver absolutely does that with the Authentication-Results header, but can you trust its integrity in your mailbox, your email provider and all your email clients (to not modify it)? It's indeed not great for non-repudiation.

> I wonder if the receiver could append a signed message upon receipt with something like “the sender’s identity was valid upon receipt”.

That's exactly what does happen: if you view the raw message in Gmail/iCloud, you should see a DMARC pass/fail header added by the receiving server (iCloud in your example).

(Well, not exactly: it's not signed, but I'm not sure that's necessary? Headers are applied in order, like a wrapper on all the content underneath/already present, so you know in this case it was added by iCloud, not Gmail, because it comes after (above) 'message received at x from y' etc.)
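
Reading what the receiving server recorded is straightforward with Python's standard `email` package. The raw message below is a made-up example (hosts, addresses and results are hypothetical), and the parsing is deliberately naive: it ignores the comments and quoted strings that RFC 8601 allows in this header.

```python
from email import message_from_string

RAW = """\
Authentication-Results: mx.receiver.example;
 dkim=pass header.d=sender.example;
 spf=pass smtp.mailfrom=alice@sender.example;
 dmarc=pass header.from=sender.example
Received: from mail.sender.example by mx.receiver.example; Tue, 9 May 2023 10:00:00 +0000
From: alice@sender.example
To: bob@receiver.example
Subject: hi

Hello Bob
"""


def auth_results(msg) -> list[tuple[str, list[str]]]:
    """Return (authserv-id, [method=result, ...]) for each Authentication-Results header."""
    out = []
    for header in msg.get_all("Authentication-Results", []):
        # Unfold continuation lines, then split into the authserv-id and the results
        parts = [p.strip() for p in " ".join(header.split()).split(";")]
        out.append((parts[0], [p.split()[0] for p in parts[1:] if p]))
    return out


msg = message_from_string(RAW)
print(auth_results(msg))
# [('mx.receiver.example', ['dkim=pass', 'spf=pass', 'dmarc=pass'])]
```

Nothing in that header is signed, which is the integrity gap raised a few comments up.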

Thanks for the response. Do you know if this extra “dkim sig was verified header” is part of a protocol or is it just something that is done bc otherwise bad stuff happens?

I’m also curious how the original comment, about DKIM/SPF/DMARC not being sufficient due to key rotation, still factors into the conversation given what we've discussed.

I'm not sure off the top of my head, I'd guess it's a MAY or SHOULD. Verifying DKIM/SPF/DMARC is optional anyway, if you want to just read everything without caring you can; you've received the message by that point, I can't see what bad stuff would happen if it wasn't added.

Key rotation would have the same effect as 'DNS rotation' (if you stopped leasing the domain, or changed records) - you might get a different result if you attempted to re-verify later.

I just don't really see it as a problem, you check when you receive the message; why would you check again later? (And generally you 'can't', not as a layman user of GMail or whatever - it's not checked in the client, but the actual receiving server. Once it's received, it delivers the message, doesn't even have it to recheck any more. Perhaps a clearer example: if you use AWS SES to receive, ultimately to an S3 bucket or whatever for your client or application, SES does this check, and then you just have an eml file in S3, there's no 'hey SES take this message back and run your DKIM & virus scan on it again'.)

It's just for humans, it's not usually used for anything else. For machines we have ARC (Authenticated Received Chain) which basically contains almost the same info but signed across the entire chain.
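
ARC stores one set of three headers per hop (`ARC-Authentication-Results`, `ARC-Message-Signature` and `ARC-Seal`), each tagged with an instance number `i=`. The sketch below groups them by hop; the header values are truncated placeholders, and it reads only the `i=` tag rather than verifying any signature.

```python
import re
from collections import defaultdict

ARC_NAMES = ("ARC-Seal", "ARC-Message-Signature", "ARC-Authentication-Results")


def arc_sets(headers: list[tuple[str, str]]) -> dict[int, dict[str, str]]:
    """Group ARC headers by their i= instance number (1 = first hop)."""
    sets = defaultdict(dict)
    for name, value in headers:
        if name in ARC_NAMES:
            match = re.search(r"\bi=(\d+)", value)
            if match:
                sets[int(match.group(1))][name] = value
    return dict(sets)


headers = [
    ("ARC-Seal", "i=2; a=rsa-sha256; cv=pass; d=relay.example; ..."),
    ("ARC-Message-Signature", "i=2; a=rsa-sha256; d=relay.example; ..."),
    ("ARC-Authentication-Results", "i=2; relay.example; dkim=pass"),
    ("ARC-Seal", "i=1; a=rsa-sha256; cv=none; d=first.example; ..."),
    ("ARC-Message-Signature", "i=1; a=rsa-sha256; d=first.example; ..."),
    ("ARC-Authentication-Results", "i=1; first.example; dkim=pass"),
]
chain = arc_sets(headers)
print(sorted(chain))  # [1, 2]
print(all(len(hop) == 3 for hop in chain.values()))  # True
```

The `cv=` tag in each seal records the chain-validation result at that hop, which is how a later receiver decides whether to trust the authentication results recorded by earlier hops.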

The notion that server disappearance is a non-issue is quite misleading. Servers go offline for various reasons, such as technical difficulties, financial constraints, or legal issues. Recovering and transferring data without relying on the original server is essential for users to maintain control over their data and identities. DIDs and recovery keys provide a valuable solution to this problem, ensuring user autonomy.

Your reply fails to address that push-based systems are prone to overwhelming home servers due to burst loads when content becomes viral. By implementing pull-based federation, the AT Protocol allows for a more balanced and efficient distribution of resources, making self-hosting more affordable and sustainable in the long run.
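
The load argument can be made concrete with a deliberately simplified count of requests the origin server answers for one post. The model is an assumption, not a measurement: under push, the origin delivers to every server hosting a follower and then serves a fetch to each further server that sees the post through boosts; under pull, the origin serves a single relay that fans out on its behalf. Real deployments cache and batch in ways this ignores, and the numbers are hypothetical.

```python
def origin_requests_push(follower_servers: int, boost_servers: int) -> int:
    """Deliveries to follower servers plus fetches by servers reached via boosts."""
    return follower_servers + boost_servers


def origin_requests_pull(relays: int = 1) -> int:
    """Each relay or indexer pulls the post once and serves everyone else."""
    return relays


for boosts in (0, 1_000, 10_000):
    print(boosts, origin_requests_push(200, boosts), origin_requests_pull())
# 0 200 1
# 1000 1200 1
# 10000 10200 1
```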

> The likelihood of a server just randomly disappearing is incredibly low.

Everything else aside, this is completely untrue.

I self-hosted my first fediverse account on Mastodon and got fed up with the complexity of it for a single person instance and shut it off one day (2018 or so?).

On another account, at some point 50% of the people I followed vanished because 2 servers that everyone in that bubble was on just went offline. Took a while to recreate the list manually.

This may be anecdotal but I've seen it happen very often. Also people on small instances blocking mastodon.social for its mod policies comes close to this experience.

Alternatively: the likelihood of any one server going away tomorrow is small, but the likelihood of something in your social graph going away tomorrow is high.
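
The arithmetic behind that point, assuming each server fails independently with the same small probability (both simplifying assumptions, and the numbers are hypothetical):

```python
def p_any_gone(p_single: float, servers: int) -> float:
    """Probability that at least one of `servers` independent servers disappears."""
    return 1 - (1 - p_single) ** servers


# Hypothetical: 1% yearly chance per server, follows spread over 100 servers.
print(round(p_any_gone(0.01, 1), 3))    # 0.01
print(round(p_any_gone(0.01, 100), 3))  # 0.634
```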

> I have 55k tweets, that would be a nightmare to host locally)

They're tweets, how much space could they take? At 280 bytes each, that's like 15MB. Double it for cryptographic signatures and reply-to metadata. Is that really too much to ask for the capacity to transfer to another host at any time?
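
The back-of-envelope figure checks out; a quick calculation using the comment's own assumptions (280 bytes per tweet, doubled for metadata) and ignoring media:

```python
TWEETS = 55_000
BYTES_PER_TWEET = 280  # the comment's assumption; multi-byte UTF-8 text could be larger

raw = TWEETS * BYTES_PER_TWEET
with_metadata = raw * 2
print(f"{raw / 1e6:.1f} MB raw, {with_metadata / 1e6:.1f} MB with metadata")
# 15.4 MB raw, 30.8 MB with metadata
```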

(also, leaving aside the fact that 55k tweets puts you in the 0.1% of most prodigious users)

I have every post made on BlueSky up to a certain point last weekend and it's only 3 GB.

I have every email I've ever received or sent (and not deleted) and it's only 4GB.

Should something require I download all that every time I login? No. But having a local copy is amazing, and a truly federated system should have and even be able to depend on those.

The Mastodon Server Covenant is a joke; the only enforcement is to remove the server from the list of signup servers; which if it just fell over dead because the admin died/doesn't care/got arrested/got a job will not matter.

How have you pared down your email to just 4GB?

Not sure, I guess I don't send or receive many attachments, and I delete marketing/spam.

My work email is 10gb.

As of my last Twitter export, I had 54425 tweets and the tweet data comes to 110M. But there's also 2G of media files that goes with it.
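
Dividing the export's text size by its tweet count gives the per-tweet cost including metadata. Whether "110M" means 10^6 or 2^20 bytes isn't stated, so both readings are computed:

```python
TWEETS = 54_425
TEXT_BYTES = {"110 MB (decimal)": 110 * 10**6, "110 MiB (binary)": 110 * 2**20}

for label, size in TEXT_BYTES.items():
    print(f"{label}: {size / TWEETS:,.0f} bytes per tweet")
# 110 MB (decimal): 2,021 bytes per tweet
# 110 MiB (binary): 2,119 bytes per tweet
```

Either way it comes to roughly 2KB per tweet before media.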

So… not only does that fit in every popular cloud storage provider's free tier, it also fits on your phone.

Seriously. This is setting off all sorts of red flags. I’m old enough not to trust non-W3C standards.

How did we get to 55k tweets being a nightmare for any social media platform?

A quick search got me to Twitter stats from 2013, when people were posting 200 billion tweets per year. That's more than six orders of magnitude more. You don't get a 10000x improvement just by federating and hosting multiple nodes.
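
The ratio in question, taking the 2013 figure of 200 billion tweets per year at face value:

```python
import math

yearly_tweets = 200e9  # 2013 figure quoted above
one_archive = 55_000   # the tweet count under discussion

ratio = yearly_tweets / one_archive
print(f"{ratio:,.0f}x, about 10^{math.log10(ratio):.1f}")
# 3,636,364x, about 10^6.6
```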

The discussion here was about archiving each user's tweets on their own client device - this is where the 55k was brought up as a problem. I still think it's a low number, even if it includes plenty of images.

With a decent amount of images and videos this can easily be 100+GB. Even if it's a fraction of that, not something I want to sync down to my device.

> double it for cryptographic signatures and reply-to metadata

Ah, email, where a message of 114 characters with no formatting ends up over 9KB due to authentication and signatures, spam analysis stuff, delivery path information and other metadata. Sigh. Although I doubt this will end up as large as email, the lesson is that metadata can end up surprisingly large.

In this instance, I think 1–2KB is probably more realistic than the half kilobyte of “double it”.

There’s also all the media to go along with them.

Pretty sure a nontrivial percentage of people's smartphones have that many thumbnails for the photo gallery app alone.

Anybody who has had a smartphone for a decade likely has at minimum 10k photos in their cloud locker with local thumbnails.

Ok, so add some more megabytes to that. Most people don't have that much microblogging data.

I actually think photos could potentially add up to quite a lot!

Sure, they could. Most people don't post tons of hi-res photos. But I'm sure there are ways you could optimize to not have all the content on the local device, if it's such a big deal. This just seems like a really strange point to be hung up on.

I think you’d be surprised at the number of photos posted, but also many people post tons of GIFs (especially reaction GIFs), which are fairly large.

"The likelihood of a server just randomly disappearing is incredibly low."

No. Just no.

If (IF!) some distributed social network breaks through and hundreds of millions or billions of people are participating, they are going to do things that The Powers That Be don't like. For better or worse, when that happens they will target servers, and servers WILL just disappear. Domains will disappear. Hosting providers will disappear. You can take that straight to the bank and cash it.

Uncoordinated moves are table stakes for a real distributed social network at scale. The fact AT Protocol provides this affordance on day one is a great credit.

> That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.

If your identity is separate from your Gmail account (as it can be with a custom domain, for email and for Bluesky), this seems like a very plausible and desirable thing to be able to do. Just recently there was an article about how Gmail is increasing the number of ads in the inbox; for some people that might tip the balance on whether Gmail's UX is more good than bad. If packing up and leaving is low-friction enough, people might do it (and that would also put downward pressure on the provider to not make the experience suck over time).
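
With a custom domain, the identity lives in DNS rather than at the provider. For Bluesky handles this is a TXT record at `_atproto.<domain>` whose value has the form `did=<DID>`. The sketch only parses such values (it does not perform a DNS query), and the DID shown is a made-up placeholder.

```python
from typing import Optional


def did_from_txt(records: list[str]) -> Optional[str]:
    """Return the DID from `_atproto` TXT values, or None if absent or ambiguous."""
    dids = [r[len("did="):] for r in records if r.startswith("did=")]
    return dids[0] if len(dids) == 1 else None


# Placeholder value as it might appear at _atproto.alice.example.com
print(did_from_txt(["did=did:plc:exampleexampleexample"]))
# did:plc:exampleexampleexample
```

Switching providers then changes only where the DID points for hosting; the handle and the DID itself stay the same.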

And that's not even getting into things like censorship, getting auto-banned because you tripped some alarm, hosts deciding they no longer want to host (which has happened to some Mastodon instances), etc.

> The likelihood of a server just randomly disappearing is incredibly low.

It happens all the time. mastodon.social, the oldest and biggest Mastodon instance, has filled up with cached ghost profiles of users on dead instances. Last I checked, I could still find my old server in there, which hasn't existed for several years.

Email has only solved the "authenticity problem" by centralizing to a tiny number of megaproviders with privileged trusted relationships. Forestalling that sort of "solution" seems to me one of the Bluesky team's design goals.

Servers go down or get flaky all the time for various reasons. Easy relocation (with no loss of content & relationships) and signed content (that remains readable/verifiable even through server bounciness) soften the frustrations.

55k tweets is little challenge to replicate, just like 50k signatures is little challenge to verify, here in the 2020s.

If Mastodon does everything better with a head start, it should have no problem continuing to serve its users, and new ones.

Alas, even just the Mastodon et al. community's emphasis on extreme limits on visibility & distribution – by personal preferences, by idiosyncratic server-to-server discourse standards, by sysop grudges, whatever – suppresses a lot of the 'sizzle' that initially brought people to Twitter.

Bluesky having an even slightly greater tilt towards wider distribution, easier search, and relationships that can outlive server drama may attract some users who'd never be satisfied by Mastodon's twisty little warrens & handcrafted patterns-of-trust.

There's room for multiple approaches, different strokes for different folks.

> Why do you need an entirely separate protocol to do this? Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem.

But if we started today, we wouldn't build email that way. There are so many baked-in well-intended fuckups in email that reflect a simpler time where the first spam message was met with "wtf is this, go away!" I remember pranking a teacher with a "From: president@whitehouse.gov" spoofed header in the 90s.

Email is the way it is because it can't be changed, not because it shouldn't be.

I'm sorry, but this is ridiculous. Just because a protocol exists doesn't mean that, if someone doesn't build on top of it, you can describe their work as a crock of shit.

> > The DID system uses the recovery key to move from one server to another without coordinating with the server (ie because it suddenly disappeared).

> Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low.

The likelihood of a server just randomly disappearing at any point in time is low. The likelihood of said server disappearing altogether, based on the 20+ years of the internet, can & will approach 100% as the decades go on. Most of the websites I knew in the early 2000s are defunct now. Heck, I have a few webcomic sites from the 2010s in my bookmarks that are nxdomain'd.

Also, as noted by lapcat, these sudden server disappearances will happen. Marking this problem as a non-issue is not, in any realm of possibility, a good UX decision.

https://news.ycombinator.com/item?id=35883409

This is coupled with the fact that Mastodon (& ActivityPub in general) doesn't have to do anything when it comes to user migration: the current system in place on Mastodon is completely optional, and servers can simply choose not to allow users to migrate.

https://news.ycombinator.com/item?id=35883570

https://news.ycombinator.com/item?id=35884682

> There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.

*The Covenant is not enforced in code by Mastodon's system, nor by ActivityPub's protocol.* It's heavily reliant on good faith & manual human review, with no system-inherent capabilities to check if the server actually allows user data to be exported.

> You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.

An outcome *that can still happen*. As noted by the incidents linked above, they're happening within the Mastodon platform itself, with many users from those incidents being unable to fully recover their own user data. Assuming that this isn't needed at all is the equivalent of playing with lightning.

The recovery key bit is the one part I actually like.

But ActivityPub user migration is also only a minor/trivial change away from doing much better than today: you just need to change ActivityPub IDs to either a fully content-addressable hash or a reference to a base that is under user control, plus a revocation-key-style mechanism that lets the user sign claims about their identity in order to allow unilateral moves.
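
The first half of that proposal, content-addressable IDs, can be sketched as deriving an object's ID from a hash of its canonical encoding, so the ID stays the same whichever server serves the bytes. Everything here is hypothetical (the `hash://sha256/` scheme, the field values, the URL), not part of ActivityPub.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    """Hypothetical content-addressed ID: SHA-256 of the canonical JSON encoding."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "hash://sha256/" + hashlib.sha256(canonical.encode()).hexdigest()


note = {"type": "Note", "content": "hello fediverse", "attributedTo": "did:example:alice"}

# A host-bound ID like this breaks when the host disappears:
host_bound_id = "https://old-instance.example/users/alice/statuses/1"

id_on_old_host = content_id(note)
id_on_new_host = content_id(dict(note))  # the same object re-served elsewhere
print(id_on_old_host == id_on_new_host)  # True
print(content_id({**note, "content": "edited"}) == id_on_old_host)  # False
```

The trade-off is that an edit produces a new ID, which is why the proposal pairs it with a user-controlled key for signing identity claims.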

You say "objectively" a lot, when most of what you write seems to be just overly emotional, flashy hand-wringing.

> Mastodon also changed their app to use Mastodon.Social as the default server, so this is a non-issue.

They have instead created another issue.

Before, it was a usability issue: normal users looking for an alternative social network got confused by the sign-up process and gave up.

If that wasn’t an issue why did the Mastodon devs decide to select a default server in the app after seeing this?

Now, they have traded that off and created a centralization issue going against the point of encouraging federation.

This only shows that centralization wins in the end.

> You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider.

In the world I currently live in, all my emails are stored locally on my devices. Also, text files take up little to no storage, so why does it matter?

> Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.

I literally read about a case over a month ago where some obscure Mastodon-server admin blocked someone's account on their server so it was impossible to move to another instance. The motivation was "I don't want capitalist here, can change my mind for money" (slightly paraphrasing). Basically, it's stupid to use any Mastodon instance other than the few largest ones or your own.

That's why BlueSky's approach makes sense.

> with the rest of the federated Internet.

You're saying it like it's a thing that won, and not a niche project with <10M users globally.