| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by AlexSW 1665 days ago
	The output length/size of a hash function is fixed, whereas it takes an arbitrary-length/size input.

1 comments

didericis 1665 days ago

I know it's normally fixed, but I did a quick google and saw a stack overflow answer saying there are some algorithms that allow for variable length outputs: https://crypto.stackexchange.com/a/3564

That doesn’t necessarily mean you can figure out what length output for a given input is needed to make it one to one. Not sure you could avoid collisions even if the length of the output was infinite, but I’m assuming different inputs have to have outputs that diverge at some point.

link

SAI_Peregrinus 1665 days ago

The variable output length functions are called eXtensible Output Functions (XOFs). An XOF isn't a cryptographic hash function, though they can be very similar (and can have an identical internal function doing the work).

We use different words for Cryptographic Hashes, Password Hashes, XOFs, MACs, and (non-cryptographic) Hashes because they do different things and have different security properties. Misusing the terminology makes reasoning about what is meant difficult.

link

dchest 1665 days ago

They all have a fixed-length internal state and will have internal state collisions regardless of the output size. (Basically, they "consume" input into the state and then "expand" the resulting state into the output.)

Suggested reading: Sponge Functions - https://keccak.team/files/SpongeFunctions.pdf

"Informally speaking, a random oracle* maps a variable-length input message to an infinite output string. It is completely random, i.e., the produced bits are uniformly and independently distributed. The only constraint is that identical input messages produce identical outputs. A hash function produces only a fixed number of output bits, say, n bits. So, a hash function should behave as a random oracle whose output is truncated to n bits."

* https://en.wikipedia.org/wiki/Random_oracle

link

7steps2much 1665 days ago

You are correct, at least in theory.

Assume you have two inputs, A and B.

A may hash to: A38uT75kjGz B may hash to: A38uHso629t

So yes, if you were to cut these off after the A38u then you would no longer be able to say for sure if you hashed A or B to arrive at your hash.

Of course in practice this usually isn't a problem as long as you have "a long enough" output.

link

bitkrieg 1665 days ago

Your example made me wonder, is there a known instance from a common hash algorithm where the input results in exactly the same string representation of the output hash? Eg. "AE485D" hashes to "AE485D". Is this even mathematically possible?

link

morelisp 1664 days ago

The mathematical term for this is a "fixed point", where f(x) == x.

Assuming a perfectly random uniform distribution, the usual desirable property of a cryptographic hash - the probability of a hash function not having a fixed point (that is, hashing at least one x to itself) is (1-1/n)**n, where n is the number of possible outputs. As n approaches infinity - which it does pretty rapidly in this case, since we're talking about 2**32 to 2**512 in practice - this approaches 1/e, or about 37%.

So, not only is it possible, but most "good" hash functions (63% of them) will have them.

link

kevinventullo 1665 days ago

Java’s built-in hash function for integers is the identity function.

link

charcircuit 1665 days ago

The modulus function has this property.

link

charcircuit 1665 days ago

Those are XOFs (extendable output functions), not hash functions.

link