by benjaminjackman 3126 days ago
I only had time to skim it and didn't see this mentioned anywhere, so it might be a good time to suggest multihash: https://multiformats.io/multihash/

Having git use that could be a great opportunity to standardize on a de facto encoding for hash functions.

What would be the best way to suggest that (if it hasn't been already, though I am guessing it likely has)?


But that does not solve the problem. Multihashes are not unique identifiers of a message, which is what git mostly uses hashes for. Now, instead of a single unique identifier, you have N possible ones, where N is the number of hash functions your multihash library implements. And it is not possible to convert between two hash types without having the original message.
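
To make the "N possible identifiers" point concrete, here is a minimal sketch, assuming the multihash layout of `<code><digest length><digest>` and the published codes for SHA-1, SHA-256, and SHA-512. The `multihash` helper is hypothetical, written for illustration, and is not the API of any multihash library.

```python
import hashlib

# Multihash function codes, as listed in the multiformats table.
CODES = {"sha1": 0x11, "sha2-256": 0x12, "sha2-512": 0x13}
HASHERS = {
    "sha1": hashlib.sha1,
    "sha2-256": hashlib.sha256,
    "sha2-512": hashlib.sha512,
}

def multihash(name: str, message: bytes) -> bytes:
    """Hypothetical helper: prefix a digest with its code and length."""
    digest = HASHERS[name](message).digest()
    # Every code and length used here is below 128, so each varint
    # fits in a single byte.
    return bytes([CODES[name], len(digest)]) + digest

message = b"hello"
ids = {name: multihash(name, message).hex() for name in CODES}
for name, value in ids.items():
    print(name, value)
```

The same message yields three unrelated identifiers, and none of them can be derived from another without rehashing the message.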
Wasn't there an issue with JWT that was summarized as this:

"This is a good idea, but it doesn't solve the underlying problem: attackers control the choice of algorithm" ?

Here's another quote from the Wireguard paper[1]:

"Finally, WireGuard is cryptographically opinionated. It intentionally lacks cipher and protocol agility. If holes are found in the underlying primitives, all endpoints will be required to update"

[1]: https://www.wireguard.com/papers/wireguard.pdf

Sorry, I wasn't suggesting that any algorithm be allowed, just that whichever one is chosen next be encoded in a way that lets it be replaced again if needed, and, if possible, that the numeric ID for that algorithm be standardized beyond just git.

https://github.com/multiformats/multihash/blob/master/README...

That’s only true of JWT if you allow your server to accept all algorithms.

You don’t actually have to.

Correct: your token authority should specify which algorithms are valid, and your clients should self-configure via a secure back channel to accept only the algorithms your token authority issues.
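
As a rough illustration of accepting only the algorithms the authority issues, here is a standard-library sketch. It assumes the authority issues HS256 tokens only; `ALLOWED_ALGS`, `sign_hs256`, and `verify` are hypothetical names, and a real deployment would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json

ALLOWED_ALGS = {"HS256"}  # assumption: the only algorithm the authority issues

def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def b64url_decode(part: str) -> bytes:
    # Restore the padding that JWT strips from base64url segments.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def sign_hs256(payload: dict, key: bytes) -> str:
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(payload).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"

def verify(token: str, key: bytes) -> dict:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Reject anything outside the allowlist before touching the signature,
    # so the token cannot pick its own verification algorithm.
    if header.get("alg") not in ALLOWED_ALGS:
        raise ValueError("algorithm not allowed")
    expected = hmac.new(key, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body_b64))
```

A token whose header claims `"alg": "none"` is rejected by the allowlist check, regardless of what its signature segment contains.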
Exactly! JWT is a much-misunderstood system, it seems, though it doesn't exactly help itself by being quite complex.
Well-designed protocols generally include algorithm identifiers. That doesn't mean upgrades will always be easy, though.

I really don't like giving this a new name ("multihash"). We have a name already: algorithm agility. We should use that name.

I also don't like this idea of having a standard for algorithm agility for hash functions (and another for encryption algorithms, and...).

It's also not obvious that making every hash/MAC/public key payload carry an algorithm ID is the right design for every protocol (it's not), though for git it is.

Yeah, this came out of the IPFS camp. It might still be sensible to use the same numeric IDs for the hashing algorithms, all other things being equal.

Generally, and this is just my gut feeling, I think that for any hash written to disk or stored in some way, not having an identifier for the hashing algorithm used is such a common bite-you-in-the-ass-later thing that it makes sense to always just include one from day one. To that end, it's easier to do on day one if everyone agrees on a standard set of numeric codes.

Multihash is the one standard set of numeric codes for different algorithms that I am aware of.
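
For illustration, here is a small sketch of what reading such a tagged hash back from storage could look like. It assumes the multihash layout of two unsigned varints (function code, digest length) followed by the digest; `read_varint`, `parse_multihash`, and the three-entry `NAMES` table are hypothetical and cover only a few of the published codes.

```python
import hashlib

def read_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128-style varint; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7

# A few codes from the multiformats table (not the full list).
NAMES = {0x11: "sha1", 0x12: "sha2-256", 0x13: "sha2-512"}

def parse_multihash(stored: bytes) -> tuple[str, bytes]:
    code, pos = read_varint(stored)
    length, pos = read_varint(stored, pos)
    digest = stored[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match header")
    if code not in NAMES:
        raise ValueError(f"unknown hash function code {code:#x}")
    return NAMES[code], digest

# Simulate a value written to disk years ago and read back today.
stored = bytes([0x12, 32]) + hashlib.sha256(b"some object").digest()
name, digest = parse_multihash(stored)
print(name, digest.hex())
```

The reader learns which function produced the digest from the stored bytes alone, without any out-of-band convention.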

Unifying here might allow git objects to be served natively over IPFS.

Just a quick note: while we would still really love to have git use multihash, you can already serve git objects natively over IPFS via: https://github.com/magik6k/git-remote-ipld

Which uses our new plugin system: https://github.com/ipfs/go-ipfs/blob/master/docs/plugins.md

> Generally, and this is just my gut feeling, I think that for any hash written to disk or stored in some way, not having an identifier for the hashing algorithm used is such a common bite-you-in-the-ass-later thing that it makes sense to always just include one from day one. To that end, it's easier to do on day one if everyone agrees on a standard set of numeric codes.

Yes, that's the basic idea of all multiformats: "it's never gonna change" is considered harmful.

> Unifying here might allow git objects to be served natively over IPFS.

IPFS can already do that thanks to the CID format: https://github.com/ipld/cid

There are no good examples for Git specifically yet, but there's a good bunch of working code for transporting e.g. Ethereum and Zcash transaction blobs over IPFS. For Git it's in principle the same: import the raw object into IPFS, and start addressing it with /ipfs/<git-cid><original-git-hash>
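
As a partial sketch of why the original git hash can be carried over unchanged: git's blob ID is a SHA-1 over a `blob <size>\0` header plus the content, and that digest can be wrapped in a multihash prefix as-is. The helper names below are hypothetical, and the CID wrapping itself (version and codec prefix) is deliberately left out.

```python
import hashlib

def git_blob_id(content: bytes) -> bytes:
    """SHA-1 object ID that git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).digest()

def as_sha1_multihash(digest: bytes) -> bytes:
    # 0x11 is the multihash code for SHA-1; the length is 20 bytes (0x14).
    return bytes([0x11, len(digest)]) + digest

oid = git_blob_id(b"")
print(oid.hex())                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(as_sha1_multihash(oid).hex())  # same digest with the 1114 prefix
```

The digest bytes are untouched, so the original git hash remains recoverable from the wrapped form.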

Something about multihash makes me worry it's a security risk. Like I worry that it encourages this mistake:

1. Define a new protocol with multihash somewhere in it.

2. Import a super convenient multihash library.

3. Verify all hashes with a simple library function.

That sounds super natural and convenient to me, but if it means that you support MD4 by default, then you've introduced a downgrade attack into your protocol.

You can lock it down to specific hash functions no problem.
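
A minimal sketch of what locking it down could look like, assuming a protocol that decides to accept SHA-256 and SHA-512 only. `ALLOWED` and `verify` are hypothetical names, not the interface of any particular multihash library.

```python
import hashlib
import hmac

# Explicit allowlist: multihash code -> hash constructor.
# Anything not listed here (SHA-1, MD4, ...) is rejected outright.
ALLOWED = {0x12: hashlib.sha256, 0x13: hashlib.sha512}

def verify(multihash: bytes, message: bytes) -> bool:
    if len(multihash) < 2:
        return False
    # Allowed codes and lengths are below 128, so each is a single byte.
    # A multi-byte code starts with a byte >= 0x80 and fails the lookup.
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    hasher = ALLOWED.get(code)
    if hasher is None:
        return False
    expected = hasher(message).digest()
    if length != len(expected):
        return False  # no truncated digests
    return hmac.compare_digest(expected, digest)

msg = b"payload"
good = bytes([0x12, 32]) + hashlib.sha256(msg).digest()
weak = bytes([0x11, 20]) + hashlib.sha1(msg).digest()
print(verify(good, msg), verify(weak, msg))
```

The allowlist is the part that has to be a deliberate choice; a verifier that accepts every code its library knows is where the downgrade risk comes from.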
If I’ve learned anything from being in this field it’s that:

  1) many if not most implementations will support lots of algorithms by default,
  2) as a result, approximately zero users will lock it down, and
  3) the users who do lock it down will be harangued about not being compatible with less secure versions, barring a major incident.
Yeah that's exactly what I'm worried about. The nature of the beast makes it tricky to define a safe default.