You found 2 actual emails with the same md5? That sounds very unlikely. MD5 is weak against attack, but you're not going to be hitting collisions with "normal" data and especially not short strings.
Just looked over my notes. It was not email addresses, but a collision of two separate UPC codes concatenated with a timestamp. We never figured out which UPC codes they were or what the timestamp was (this was picking/sorting software for a warehouse in 2012). I wasn't there when it happened, and only heard about it after the fact (I was on my honeymoon). It crashed the software pretty hard.
That would be interesting, since it would likely be the shortest known MD5 collision. The shortest ones known are 512 bits (a single 64-byte block) and differ by 2 bits.
If I had to guess, you might have been hitting collisions in a truncated MD5 hash. It's not uncommon to do something like take only the first 64 bits of the hash. Doing this leaves you quite vulnerable to the birthday problem.
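To put rough numbers on the truncation guess, here is a minimal Python sketch. It assumes "truncated" means keeping the leading bits of the digest, and it uses the standard birthday approximation for the collision probability. The function names are made up for illustration and are not from the warehouse code, which nobody here has seen.

```python
import hashlib
import math


def truncated_md5(data: bytes, bits: int = 64) -> int:
    """Return the leading `bits` bits of the MD5 digest as an integer.

    Hypothetical helper: assumes truncation keeps the first bits of the
    128-bit digest, as described in the comment above.
    """
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n random values share a `bits`-bit hash.

    Birthday approximation: p ~= 1 - exp(-n * (n - 1) / 2**(bits + 1)).
    expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


if __name__ == "__main__":
    # With a 64-bit truncation, about 2**32 inputs already give a
    # collision chance of roughly 0.39.
    print(collision_probability(2**32, 64))
    # With the full 128 bits, a million inputs are nowhere near enough.
    print(collision_probability(10**6, 128))
```

With the full 128-bit output the probability for any realistic number of warehouse records is vanishingly small, which is why an accidental collision on untruncated MD5 would be so surprising.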
It was in PHP, using the built-in md5 function. I’m actually reaching out to a friend who still has the code to see what the exact string is that got hashed.
I have no idea if the database is still around somewhere. If it is, we can look at the git commit, figure out what the possible UPCs are, and narrow it down to a couple of hours' worth of timestamps. I have a feeling it would probably be more than 512 bits, though. UPCs can be longish, plus a timestamp with each digit being a byte…
I highly doubt the database still exists in any meaningful way, though, since the product was shut down years ago.