| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by _prometheus 3395 days ago

Hey everyone. The SHAttered break made us release a new website for Multiformats: http://multiformats.io which includes a page explaining Multihash -- http://multiformats.io/multihash --

The page goes through a bunch of examples of how a multihash would work in practice, why it is useful to do this, and goes through some FAQs.

I'm writing a post about all this that I'll push out soon, but wanted to drop the site here now.

Please, as you make new systems or start to upgrade your systems out of SHA1 (or MD5!!!), please use Multihash so that your upgrades will be much, much easier in the future. This makes rolling up a matter of recompiling some tools, and by far not facing the horrible breakages that occur when tools and scripts assume certain things (like the hash digest of git will always be 160bits... now git faces breaking a lot of things and may have to stick with 160bits of a truncated SHA3 or BLAKE2...)

Oh also-- definitely check out BLAKE2, it's a fantastic hash function from Zooko, JP Aumasson, Samuel Neves, and Christian Winnerlein -- https://blake2.net -- use it for all your hashing needs! So much faster than SHA2, and has the same "unbrokenness" that SHA3 enjoys. (And, I believe deep cryptanalysis has gone into BLAKE, Keccak is not particularly safer)

The Multiformats site also has a Multiaddr description, but that's far from complete and doesn't have the examples. The other multiformats arent well explained on the website yet. Sorry, coming soon :) we wanted to get this out ASAP -- PR's accepted at https://github.com/multiformats/website (oh and yes, TLS certs coming too)

3 comments

JoshTriplett 3395 days ago

How would multihash handle parameterized algorithms like siphash, which take multiple arbitrary parameters? Adding every combination of "number of rounds" and "number of finalization rounds" as a separate table entry seems problematic.

hobofan 3395 days ago

As long as the algorithm is sufficiently described, the parameters can be part of the body. So instead of `<hash-type><hash-result>` it would be `<hash-type><hash-rounds><hash-result>`.

_prometheus 3395 days ago

@hobofan Yeah, exactly. We've explored this by making the value of such hashes (when generated and checked with multihash libs) literally be the `<hash-rounds><hash-result>` pair. This avoids complicating the table or multihash model/implementations. We can just redefine the hash function verify to use the first bits as the round, etc. We haven't done this yet, but we may.

_prometheus 3395 days ago

See https://news.ycombinator.com/item?id=13740369 -- does that make sense? the idea is to treat those parameters as "part of the hash digest value", and write wrappers for siphash and functions like it, s.t.:

  mh_siphash_digest(input, length, rounds) {
    digest := siphash_digest(input, length, rounds)
    mhdigest := concat(rounds, digest)
    return mhdigest
  }

  mh_siphash_verify(input, mhdigest) {
    mh := multihash.parse(mhdigest)
    rounds := mh.digest[0]    # the rounds
    expected := mh.digest[1:] # rest
    actual := siphash_digest(input, mh.length, rounds)
    return expected == actual
  }

JoshTriplett 3395 days ago

Makes perfect sense, thanks.

mtdewcmu 3395 days ago

It seems like it might be better to put the metadata at the end. That would make it easier to truncate to a certain amount of entropy. Also, it would make it possible to trim off the metadata without having to understand the format to figure out the variable length.

panic 3395 days ago

What are you going to do with the hash after you trim off the metadata if you don't understand the format?

mtdewcmu 3395 days ago

I've dealt with variable-length headers on (binary) things before, and it creates annoyances. It screws up the memory alignment, for instance.

IanCal 3395 days ago

I have two minor, but useful, uses:

One is that the prefix is probably unique for small sets. So it's both easy to read and parse and then use just a prefix for doing something like stopping a docker container.

Second is that a short prefix on a larger set of data is probably not unique but at a rate I can predict. I can take a random, and repeatable, sample of data by selecting everything with the same prefix. This gives me a fast way of taking a random sample from a database (or file or anything else).

mtdewcmu 3395 days ago

Good question. There are instances where it allows you to write cleaner (and faster) code if you can identify the parts without necessarily having to parse the metadata.

mtdewcmu 3395 days ago

You could probably omit the length, because the length is probably already known implicitly. E.g. git knows the length of its hashes without having to read the hash. If the length < the full hash, it can be assumed to be truncated.

cmrx64 3395 days ago

Some hashing algorithms have a configurable output length. You need to encode the length, or have it as part of the type, and it's more uniform to just have it separate from the type.

mtdewcmu 3395 days ago

I guess it depends on how precious space is. A naked hash is very information-dense. In certain applications, inserting a prefix of several bytes to each hash makes a difference. OTOH, if the hashes end up being inserted into a table in MySQL, then space is probably not that precious.

_prometheus 3395 days ago

Attackers can exploit that by cutting off a stream. it's best to be explicit.

mtdewcmu 3395 days ago

Hashes don't currently know their own lengths, so I don't see why they'd need to.

matt_kantor 3393 days ago

Putting this kind of metadata inline with your hashes is the entire point of multihash:

> Multihash is a protocol for differentiating outputs from various well-established hash functions, addressing size + encoding considerations. It is useful to write applications that future-proof their use of hashes, and allow multiple hash functions to coexist.[0]

[0]: http://multiformats.io/multihash/

Kubuxu 3395 days ago

In case of multihash length field allows for two things, 1. truncation; 2. you don't have to know the hashing function to transparently pass it through buffers and/or compare.

ctz 3395 days ago

Danger: truncation shouldn't be done by just chopping off the end of a hash; they should be different hash functions with entirely different images.

Compare, for instance, multihash's treatment of SHA512 truncated to 256-bits, and the standardised hash function SHA512/256. The paper which introduced SHA512/256 even gives a generic way to safely truncate SHA512: https://eprint.iacr.org/2010/548.pdf section 5.

Multihash's design begs implementations to validate these things by allowing the sender to arbitrarily truncate them. That's bad.

_prometheus 3395 days ago

Yeah this is a good point, which we discussed somewhere (not finding the issue atm). The resolution was to treat those as different hash function codes, because the normal usage of truncating hashes by chopping off bits is extremely common. We had direct use cases from past experience trying to help old systems that did things like take a sha2-512, trunc to 256bits and use instead of sha2-256 (for the speedup on some archs). So we saw the need for _literally_ a different size of the exact same function.

When the functions encourage it, we support the addition of the specific different constructions (if named): https://github.com/multiformats/multihash/blob/master/hashta... -- we did a silly thing with Blake2 where we imported all the valid numbers. (this is suboptimal in table space, but super explicit).

Are there other functions you think we should add for the sha2-256/512 set?

> Multihash's design begs implementations to validate these things by allowing the sender to arbitrarily truncate them

If the sender is manipulating the hash you get (i.e. changing the length prefix counts), you're already in huge trouble. They could change the code and the value too. The threat model here is that the hash you have cannot be altered by the attacker. If the attacker manages to truncate a stream to get you to think it's a shorter hash, the attack fails as you have the length to tell you what you should be expecting. (again, the crux here is that if the attacker can change the length bits they can probably also change the function and own you anyway)

Also-- as noted in https://github.com/multiformats/multihash/issues/70 -- we should make implementations allow clients to lock the hash function and length combinations they want to use, so that attackers cannot manipulate those parameters.

mtdewcmu 3395 days ago

You could be a little bit opinionated and limit the choice of formats to a small set of ones that are a good idea. That would encourage good design and it would allow you to shrink the metadata. Perhaps one byte would suffice. It's not necessary to support every algorithm/length under the sun.

mtdewcmu 3395 days ago

I'm thinking that it might be useful to include a magic number, so that you can distinguish a Multihash from a plain hash.