Hacker News new | ask | show | jobs
by enedil 2435 days ago
There's another problem, if the author tells us that this is crc, while it's a MAC (not particularly strong tbh), it shows that we, as developers and users, cannot trust that cryptographic decisions of the author are of any consideration. For reference, crc is not a signature, its full name is "cyclic redundancy check", which is a really simple mathematical operation, that computes a specific remainder that comes from polynomial long division (of polynomials decided from the message). It is not any kind of cryptographic signature and given a string, it's crc and target crc, we can find another string that differs in one byte only, that has the target crc, all of this in time faster than computing whole original crc.
3 comments

Not one byte, but a number of bytes equal to the length of the CRC (typically 4).

But yes, people saying "CRC" when they really mean "checksum" or "signature" is a pet peeve of mine, and I treat it as a code smell. CRC has a precise defined technical meaning.

For the curious: CRC is linear with respect to XOR. This means that if you XOR two equal-length strings, the CRC will be the individual CRCs XORed together. It's also a bijection for messages of length equal to the polynomial (typically 32 bits): every input maps to a distinct output and vice-versa. Together, these mean it's trivially invertible for a message of length equal to the CRC: the CRC function can be represented as a square matrix over GF-2 (i.e., bits + XOR), which can be inverted with standard Gaussian elimination to produce the inverse function: generate a (short) string from any CRC.

More fun CRC tidbits: it's not a given that two arbitrary CRC functions, or a CRC and another linear(ish) checksum (e.g. ones' complement), are linearly independent. This can bite you when you try to use two different CRCs to get "more" error protection, or when you use "independent" CRCs to route work at two different points in a system. Again this is easy to verify using linear algebra (just check that the GF-2 matrix formed by the concatenation of your two functions is of full rank).

I've done it for people with any level of programming. People have already got used to that "CRC" is "checksum". But I will think about it.
Your are correct in the "trust but verify" methodology, but the developer is Russian so it could be a translation error
>we can find another string that differs in one byte only, that has the target crc

reading that has me curious; if one can do that for any arbitrary string, and then iterate that process, doesn't that stand to reason that with enough work, one could make any two given strings calculate out to the same crc? I guess maybe not because that one byte constraint isn't specified as far as where it occurs and whether it's an insertion, deletion, permutation etc, but the way I see it if you can do it with one string to another, you could likely keep chaining that indefinitely and get countless strings that come to the same crc.

sorry if this is old news or anything I was just struck by that thought while reading your comment

At Sun we had a server with a flaky NIC with very high error rates, but worse, occasionally would introduce two one-bit errors in packets such that all the CRCs involved were defeated. This actually caused some version control history corruption. I forget the details though. Maybe some other ex-Sun'er can chime in with the details.
The amount of bits you need to modify is not “one byte” but the same as the degree of used polynomial.

And producing two strings with same CRC is trivial to the extent that it is how you are originally supposed to use the algorithm. Notice that CRC-32 of every valid ISO9660 filesystem is 0xffffffff ;)

Yes, sorry for the lack of scrupulosity. Of course, it might happen it suffices to change one byte, especially that this byte can be at any position.
It can be done with any byte. In CRC, you encode your input to a polynomial somehow, take an agreed-upon polynomial (dependent on the type of CRC, not the input data) - call it P. Compute the remainder of the input when divided by P, encode it, and that's CRC. So to get any desired CRC, you just need to solve a linear equation (in quite a lot variables, but it doesn't really matter). All in all, it's really easy.