| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by korbin 724 days ago

SHA256 is designed as such that the maximum amount of data that can be contained within a single block is 440 bits (55 bytes.)

If you carefully organize the nonce at the end and use all 55 bytes, you can pre-hash the first ~20/64 rounds of state and the first several rounds of W generation and just base further iterations off of that static value (this is known as a "midstate optimization.")

> If you limit your variable portion to a base16 alphabet like A-P

The more nonce bits you decide to use, the less you can statically pre-hash.

In FPGA, I am using 64 deep, 8-bit-wide memories to do the alphabet expansion. I am guessing in CUDA you could something similar with `LOP3.LUT`.