Hacker News new | ask | show | jobs
How SHA-256 Works Step-by-Step (blog.boot.dev)
197 points by kyleShropshire 1521 days ago
3 comments

Every time one of these is posted, I’m expecting the steps to explain why they are being done. Like what makes this combination of operations have the particular properties we need?
You're not wrong. You're 100% right. This is like posting a story about how a game engine works, and having it be just blocks of assembly language with no explanation other than "here's some code multiplying a matrix, and then...".
Geez, now I want to write an article about how a game engine works, explaining the rationale for every piece of it; I don't think I've ever seen somebody do that before. It's all kind of spread around in disparate bits and pieces, often quite hard to find (or even to know that you ought to be looking for it).

Problem is that such a work would probably have to be more of a textbook, in terms of length and impenetrability. And it's not immediately obvious who the audience would be, as the vast majority of game developers these days are using pre-existing engines, rather than building new ones. But maybe there's still enough conceptual underpinnings that are worth discussing which would still be relevant for Unity/Unreal/Godot folks.

I think there is a middle ground. You can give a bird's eye view of the way things are done and a hand-wavy intuition for why they are done that way. Analogies go a long way towards building the latter. Then for those who want to dive deep, you can leave pointers to articles that cover such things in detail.

It doesn't always have to be a choice between writing a book and nothing at all because the topic is too complex. I personally have benefited from bird's eye view blog posts and mini-articles more than I can remember. If you have the knowledge then go for it, someone out there will thank you.

I think with most posts and also textbooks the problem is scope. I think the Handmade Hero project shows quite well what an amount of work it is to explain it. And there is the difference between a generic game engine like Unity and a specific engine for a specific project. And I mainly mean platform targeting and the likes. But I would read a blog post series :)
I'm sure there's way too much variability to write a whole game engine and describe every detail as "this is the best way to do it because X" rather than "this is the way we did it because personal opinion/time/money/we had no clue how it works."

When it comes to the 3d renderer alone, there's at least enough common ground to explain the big ideas. Like "we multiply matrices because GPUs can do that very well in parallel and we describe our geometry as points because we can then describe the whole image using simple coordinate transformations which boil down to matrix multiplications."

Yeah, I agree that you'd never get anywhere with a "this is the best way to do it because X" approach; it's just too divisive and often there's no single "best" answer that can be equally applied to every project.

I feel like a comprehensive article would need to be more of a catalog of "here are the basics of the various major approaches and how they differ and why you might choose one over another", rather than (for example) advocating specifically for forward shading vs. deferred vs. forward+ vs. clustered as being the universally "best" option. (especially when that whole discussion has become kind of moot for the majority of modern games developers, who mostly just take their engine's default rendering pipeline and just use it)

I wrote something up in a previous post. I'm no crypto-expert. But I did study it a bit.

https://news.ycombinator.com/item?id=30248439

Obviously, any actual crypto-experts can feel to correct me if I got history and/or understanding incorrect here.

-----

Addenendum to my previous post.

1. Confusion -- Bytes should turn into other bytes in random-looking ways. For example, the byte 0x25 may turn into 0x88. Aka: S-boxes in 90s-era ciphers.

2. Diffusion -- Bit-changes should "spread" to as many bits as possible.

3. Invertible -- Invertible operations minimizes the loss of information. Encryption/Decryption must be invertible by definition, but even Hash-functions should be largely built out of invertible operations. Try to make confusion and/or diffusion steps invertible.

4. ADD / XOR / Shift / Rotate -- These operations are the more popular way to make Invertible Confusion/Diffusion functions today.

5. SBox + Galois Fields -- For AES (a 90s-era algorithm), SBox was the source of confusion, and Galois Field arithmetic was the source of Diffusion. I could explain why but that gets more complicated.

5. Testing -- Test your functions against linear cryptography (how is the input related to the output?) and differential cryptography (how is each input bit related to each output bit on a bit-by-bit basis?)

------

Obviously, hash functions (like SHA256) must be non-invertible by the end of it all. But you want to carefully think about where the source of non-invertibility comes from, and to minimize the loss of entropy/information at any particular step.

With these principles, its not very hard to make your own hash function. I'd suggest studying Bob Jenkin's "JOAAT" hash, just-one-at-a-time. Its a non-crypto hash, but it is probably one of the simplest hashes that uses the above principles: https://en.wikipedia.org/wiki/Jenkins_hash_function

> even Hash-functions should be largely built out of invertible operations

Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

> I could explain why but that gets more complicated.

If you’d care to go into more detail, I’d love to hear it. It was my understanding that the AES S-box was the result of some nonlinear transformation of the input bytes.

> Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

SHA 256 is built out of 64 rounds of an internal mixing function.

That is, SHA256 is basically a glorified mix(mix(mix(mix...(mix(internal_state))))))))))))))) function, with 64x mixes applied to the internal data per input.

Now consider how "mix" should be designed.

1. If "mix" is invertible, it means that every possible input "pigeonholes" to exactly one possible output. Mathematically, we'd call invertible functions "one-to-one" and "onto".

2. If "mix" is not invertible, then the above condition does not hold.

-------------

So lets look at the simplest non-invertible function for 3-bit numbers as an example. And see what happens when we use it as a "mixing" function.

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 <----
    111   | 111 <----
Notice that 110 and 111 both "map" to 111. So what happens when we run mix(mix(mix(mix...(data)))) ??

Eventually, after about 8 rounds, all your data turns into "111" eventually. You're "losing" information. It means that everyone can predict that your hash probably will be 111 eventually.

Now lets consider the invertible function:

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 
    111   | 000 <----
000 was "unchosen" in previous function, and now we have one-to-one and onto . invertibility. As such mix(mix(mix(mix(...(data)))) will 100% depend on the input data.

Of course, this mixing function is simply "(data+1) modulo 8", which is insufficient for mixing purposes. (The "confusion" is pretty bad and very predictable, but you can see how "add +1" is actually kind of good at diffusion when "all the carries" line up... such as 011 -> 100 is changing a lot of bits from one mix). But you can imagine that even in this simplest example, the invertible function leads to a less-predictable internal state than the non-invertible function.

---------

Functions should also be "nonlinear". The above x+1 mixing function is bad because its linear (5x mixing functions can be simplified into x+5). It turns out that combinations of XOR, bitshifts, and ADDs could be non-linear, but you need to run a bunch of tests over the inputs/outputs to be sure that your particular mixing function is sufficient.

What differentiates a non-crypto hash from a crypto hash. Is there a fundamental difference between the two which prevents one from being used for cryptographic purposes?
A non-crypto hash is run enough to confuse "simple" statistical tests.

A crypto-hash is run enough times to confuse "difficult" statistical tests from a dedicated adversary.

For example, the JOAAT hash is a simple add / shift / XOR hash run 1x round per byte. A singular round is all you need to confuse the "simple" statistical tests such as birthday attacks, dice rolls, etc. etc.

SHA-256 is also a simple add / shift / XOR hash, albeit using 256-bits at a time instead of 8-bits at a time. It is a 64-round hash function. That is: it is round(round(round(round(data)))) 64-times deep.

--------

If you wanted to turn JOAAT non-crypto hash into a crypto-hash, a good first step would be to run JOAAT(JOAAT(JOAAT...() maybe 64x or 128x over the data.

Obviously, you'd need to run cryptoanalysis over the end result, and also test for non-linearity and other such properties. But "large number of rounds" is probably the most obvious difference between non-crypto and crypto hashes.

---------

In the 90s, functions like AES were designed with "maximum confusion / diffusion per round". In just 10 rounds, the bits are mixed up enough that it defeats linear and differential cryptography.

I don't know which cipher "changed the meta", but maybe it was the XTEA cipher? Since XTEA was so simple, they knew it would need many-many rounds to defeat linear and differential crypto. So they just did 64-rounds and called it the end of it.

----------

So you can see, a "crypto" hash is effectively a regular old hash done an absurd number of times to defeat the most incredible statistical-tests known to man... done under the assumption that the "opponent" is performing the most difficult statistical tests.

Yes! We’d like our cryptographic hash functions to be collision resistant and preimage resistant.

That is, we’d like it to be hard to generate 2 different messages m1 and m2 where the hash of m1 is equal to the hash of m2, and we’d also like for it to be hard to compute any function of the message m(except the hash of m) if you’re given only the hash of m.

Non cryptographic hash functions don’t require these properties, and in fact some hashing algorithms used for data mining are designed to, for example, map near inputs to near outputs.

There's some good discussion in the comments of another "one of these" a short while ago: https://news.ycombinator.com/item?id=30244534
huh, I didn't post this here, but I'm the author and kinda fun to just see it show up. Thanks for the feedback, it's a good point. I'll be making those updates soon.
Question: why didn't you post it to hn?
Linear and differential cryptanalisis are the 2 classical way of breaking cryptographic hash functions, I think they are a cool way to learn about the importance of the current constructs being in use:

https://alldifferences.net/difference-between-linear-and-dif...

A simple way to look at them is this: if you change some specific bits in the input, maybe not all bits change by exacly 50% chance in the output, or they are not independent. For only a few rounds of S-Box-es you can probably find something like this by hand, for many rounds, SAT-solver or a special tool is needed.

Can you help me understand how one might use a SAT solver to find ways in which cryptographic primitives (or their components) deviate from ideal pseudorandomness?

I know what all of these things are, my intuition just isn't jumping to a way to formulate statistical correlations as a SAT problem.

You're better off reading the Wikipedia page which covers things like that (for instance, the Merkle–Damgård construction) in some detail and without all the oversimplifications and almost unavoidable wrongness of slight write-ups like this.
On the contrary, I just want the dumbed down version. I’m not ever going to implement it (nobody should RYO crypto unless you really know what you’re doing), nor am I going to have to rely on the knowledge in my daily life - I’m not a maths educator, and I’m certainly not a cryptographer.

It’s still good to understand the basics.

I learnt a lot about PKI by forcing myself to learn the absolute basics of modulo, but I’ll never implement it or do a full calculation.

This isn't really a 'dumbed down version' and the person I'm replying to specifically asked about the sort of things not covered in this version.
Exactly. Also SHA-256 is in the same family as MD-5. It would be nice to kind of go over how the family in general works, what weaknesses were discovered, and what changes were incorporated into SHA-256 that addresses these issues.
I’ve always been curious about whether it’s possible to have a collision when your input is 256 bits or less (in the case of sha256)… I even emailed Bruce Schneier. I got a polite response that he didn’t have time to look at it, which indicated to me he didn’t know “off the top of his head”, which I interpreted as “it’s not impossible”… but I still don’t know.
There are 2^257 - 1 inputs of 256 bits or less and only 2^256 possible hashes, so there must be a collision.
The input is always padded out to 256 bits, though, isn't it?
The input is padded to form a sequence of 512 bit chunks, but that doesn't change anything, padding is one-to-one so there's still the same number of inputs.
One way to think of hash functions is that the hash of any value is basically a random number. So we can consider what happens if the results were actually random, then what would be the probability of a collision?

Consider a function from [0, 2^256 - 1] to [0, 2^256 - 1]. That is it maps 256bit numbers to 256bit numbers. It could in theory be represented as an enormous table lookup.

Now how many such functions are there? Well there are 2^256 ways to map 0. And then 2^256 ways to map 1.. etc. We have 2^256 inputs we need to deal with each of which can give one of of 2^256 results. That turns into (2^256) ^ (2^256)

Now, how many collision free functions are there? In this case there are 2^256 ways to map 0, but then we have to pick a different number for 1, so there are only 2^256 - 1 possiblites. Then 2^256 - 2 etc... It becomes (2^256)! where the exclamation mark is factorial.

So the probability of no collisions is (2^256)! / [(2^256) ^ (2^256)]. It may not be obvious but that's a very small number. A little bit of intuition: let's say we fill out the values of our gigantic table by hand. Let's assume that by some miracle we filled out the first half without creating any collisions. Now that means we used up half the possible outputs. So every cell we fill out from now we have a 50/50 of creating a collision. And the further we get the more numbers are used up.

Now sha256 is not a randomly chosen function. But without any evidence either way, we might suspect it does have collisions.

To give some further context, I stumbled across this thought while reading how “community ids” are calculated. Community ids are commonly used to simplify joining/lookups for network security tools (suracata, zeek). They essentially concatenate the “quad tuple” (src ip/port, dest ip/port), and a “seed”, then run sha against it. I didn’t entirely understand the reason the authors chose sha (other than being security people who might have just reached for a crypto secure hash function). SHA is slow vs something like xxh, and given the number of sessions these things process, seemed like overkill. Further, it’s unclear to me what’s gained by using sha vs xxh or simply concatenating the bits. Then I started wondering about the downsides: whether it’s possible to have a false correlation because two sessions yielded the same sha digest.
What sort of collision? Or asking in a different way, what would you consider a collision?
2 messages producing the same digest. That’s actually a really interesting question.
I think he means a set-theoretical collision, not a feasible collision attack. I.e. "is SHA256 a bijection on 256-bit integers?"
AOL. It's like the articles that would explain how unix software by explaining in detail how to ./configured;make;make install and leave out everything specific to the software in question.

I can explain one small part, though. A lot of these things need some fixed constants that have to be not all-zero, not all-one, not easily predictable and not someone's backdoor. So people will say things like "we need five ten-digit numbers here, so we'll use the second to eleventh digits of the square roots of the first five primes".

That is, the use of the square root has nothing to do with being the square root of two, there's no deep math. The square root of two is just some number that you won't suspect of being a backdoor.

https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number

Came here to ask the same. I enjoyed the blog, yet as a non-cryptographer, I'd like to see the reasons behind the actions: e.g. Okay we right rotate that input, but why are we doing this? Or Why are we taking cube roots of 64 primes? etc.
> Why are we taking cube roots of 64 primes?

https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number

That was helpful. It'd be great if there are tips like these (even Wikipedia links like this is very helpful) are also present in the posts like the blog post.
The wikipedia article spells it out pretty clearly, and the pseudo-code is easily translated into any language, here it is in JavaScript if anyone is interested:

sha-256: https://github.com/jeffallen6767/sha-256-js/blob/master/src/...

Once you understand the concept that most of these algos are simply dividing the input into computer-friendly sized blocks and then stacking and manipulating these bits in 3d space like a rubics cube, then the whole thing becomes a bit easier to understand.

Here's a few more for comparison:

sha-1: https://github.com/jeffallen6767/sha-1-js/blob/master/src/sh...

md5: https://github.com/jeffallen6767/md5-js/blob/master/src/md5....

keccak: https://github.com/jeffallen6767/keccak-p-js/blob/master/src...

side note, I also ended-up implementing the keccak algo in c for open cl usage, because I wanted to see if I could use it in parallel on my graphics card from nodejs ( spoiler: it's indeed possible ):

https://github.com/jeffallen6767/chain/blob/master/src/minin...

Please forgive my terrible programming style, this was done years ago...

> Append a single 1:

Just begs the question, why?

AFAIK the 1 bit does nothing. Padding with 1+(number of 0s)+(length of message) is traditional though and used by MD4/MD5/SHA1/SHA2.

You can read in a few places that this is the padding proposed by Merkle, but I looked it up and in my reading of One Way Hash Functions and DES he does not use the 1

> As an aside, we note that x is padded with 0's until its size is an integral multiple of the size of the convenientlySizedChunks. Note that padding with 0s might introduce some ambiguity about the exact value of x that is being authenticated, as discussed by Juenernan[4]. The message "010110" padded to 8 bits would be "01011000" and it is now unclear how many trailing 0's were present in the original message. Several methods are available to resolve this ambiguity. We select a method that will also be useful in resolving a further problem: we append the length of x, in bits, at the end of x. To make this additional "length" field easy to find. we shall right justify it in the final block. If the length field won't fit, we add additional blocks to the end of x.

If your input ends with a 0 and you don't append the 1, you have no idea if that 0 is part of the padding or input
Yes you do, since you still have the length field.
Good point, I never thought about it that way. The reverse is also true: The length is probably unnecessary, since the "1" is there. I suppose the length makes it harder to produce collisions with inputs of different lengths, but in practice I don't think anyone actually tries to do that.

If I had to take a wild guess, I'd guess that some early design in the family used only the "1", and that the length was added later?

Padding without the length isn't suffix-free, ie. there's two different messages x and y with pad(x) a suffix of pad(y). You want that basically because if there's ever a collision part-way through the loop on two inputs, if there's a common suffix, the rest of the loop will be the same so there's no chance to "escape" the collision.

Being some kind of artifact seems plausible.

> Padding without the length isn't suffix-free

Can you give a more concrete example? Specifically with padding with 1 then all 0s without length. Appending the 1 then all 0s is supposed to prevent collisions.

The length suffix is used as an early attempt to avoid length extension attacks on MACs of the form H(secret|M). However, we've later seen that it's not sufficient as it's easy to determine length of the secret by trial and error. This eventually led to the creation of HMAC H(H(secret^opad)|H(secret^ipad|M)).

In theory, the length suffix is no longer needed (or the "1" suffix but we save more space by removing the length). Maybe a cryptographer with more history knowledge can explain this but personally I think it's now one of those "don't fix what's not broken" things. It doesn't hurt security and it's already been thoroughly analyzed (and hardware optimized) so we just leave it.