by rvanlaar 1485 days ago
It will catch random webmasters.

In short, Dutch hoster TransIP had to pause their scanning service. They got hashes that included default images from WordPress.

"Where it went wrong, according to TransIP spokesperson Marco Edelman, is that hashes of default images from WordPress installations and plug-ins were accidentally added." [1] https://tweakers.net/nieuws/182766/transip-pauzeert-hashchec...
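
To make the failure mode concrete, here is a minimal sketch of exact-hash matching against a blocklist. Everything in it is an assumption for illustration: the byte strings are placeholders for real files, the `is_flagged` helper is hypothetical, and MD5 is used only as an example digest. It is not TransIP's actual implementation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Placeholder contents standing in for real files (assumption).
stock_image = b"placeholder bytes of a default WordPress image"
customer_photo = b"placeholder bytes of an unrelated customer upload"

# A blocklist that mistakenly contains the hash of a stock image.
blocklist = {md5_hex(stock_image)}


def is_flagged(data: bytes, hashes: set) -> bool:
    """Hypothetical check: flag a file when its digest is on the list."""
    return md5_hex(data) in hashes


# Every site that ships the identical stock file is flagged,
# even though no hash collision is involved.
sites = {
    "site-a": [stock_image, customer_photo],
    "site-b": [stock_image],
    "site-c": [customer_photo],
}
flagged_sites = sorted(
    name for name, files in sites.items()
    if any(is_flagged(f, blocklist) for f in files)
)
print(flagged_sites)
```

The point of the sketch is that a bad list entry matches byte-identical copies everywhere, so one wrong hash flags every installation that ships the same default file.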

3 comments

With the high likelihood of false positives, this is just another Child Benefits scandal waiting to happen if the system were to run completely unsupervised.
What's so worrying about this is that these stock images were added there in the first place. Clearly the process is completely broken; if there had been any kind of human oversight, this would have been caught right there.

Right now it was clear because it was an often-used package. What if it's something more niche next time?

It uses the MD5 of the image, so it's no surprise that there were collisions with ordinary files. I've run into MD5 hash collisions causing mayhem several times in my life (usually between unrelated email addresses, IIRC).
You found 2 actual emails with the same md5? That sounds very unlikely. MD5 is weak against attack, but you're not going to be hitting collisions with "normal" data and especially not short strings.
Just looked over my notes. It was not email addresses, but a collision of two separate UPC codes concatenated with a timestamp. We never figured out which UPC codes they were or what the timestamp was (this was picking/sorting software for a warehouse in 2012). I wasn't there when it happened, and only heard about it after the fact (I was on my honeymoon). It crashed the software pretty hard.
That would be interesting, since it would likely be the shortest known MD5 collision. The shortest ones known are 512 bits, differing by 2 bits.

If I had to guess, you might have been hitting collisions in a truncated MD5 hash. It's not uncommon to do something like take only the first 64 bits of the hash. Doing this leaves you quite vulnerable to birthday-problem collisions.
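
A rough sketch of why truncation matters, assuming hash outputs behave like uniformly random values. The function names are hypothetical, and the 24-bit prefix is chosen only so the search finishes quickly; it is not the 64-bit truncation mentioned above.

```python
import hashlib
import math


def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    uniformly random hash_bits-bit values: 1 - exp(-n(n-1) / 2N)."""
    space = 2.0 ** hash_bits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


def find_truncated_collision(prefix_bits: int = 24):
    """Hash the strings "0", "1", "2", ... and stop when two of them
    share the same leading prefix_bits bits of their MD5 digest.
    prefix_bits must be a multiple of 8 in this simple sketch."""
    nbytes = prefix_bits // 8
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        prefix = hashlib.md5(msg).digest()[:nbytes]
        if prefix in seen:
            # Return both inputs and how many strings were hashed.
            return seen[prefix], msg, counter + 1
        seen[prefix] = msg
        counter += 1


a, b, tries = find_truncated_collision(24)
print(a, b, tries)
print(birthday_collision_probability(2**32, 64))    # 64-bit truncation
print(birthday_collision_probability(10**9, 128))   # full MD5 width
```

Under the uniform-randomness assumption, about four billion items give a 64-bit truncation roughly a 39% chance of at least one collision, while a billion items at the full 128 bits stay far below any practical concern. That only covers accidental collisions, not deliberate attacks on MD5.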

It was in PHP using the built-in md5 function. I’m actually reaching out to a friend who still has the code to see what the exact string is that got hashed.

I have no idea if the database is still somewhere. If so, we can look at the git commit, figure out what the possible UPCs are, and narrow it down to a couple of hours' worth of timestamps. I have a feeling it would probably be more than 512 bits though. UPCs can be longish, plus a timestamp with each digit being a byte…

I highly doubt the database exists in any meaningful way though, since the product was shut down years ago.