A proper cryptographic hash function is a one-way function, assuming one-way functions exist.
But I'd frame your question the other way around then. You do not want to store the emails in a form that leaks any data. For that we need a compressing function. My (unwritten) assumption was that if the adversary compromises the system to get the data, they'll get any secrets too. This means that an HMAC is no better than a cryptographic hash function.
I know that it is quite possible to create a system where getting the secrets would be significantly harder than just a DB dump. But that is both significantly more difficult and more expensive to build. I'll admit that the formulation "provable the best we can do" should've had a big fat asterisk with a disclaimer about the threat model.
So, if an attacker has the data set and the secrets, and wants to compute whether a particular input is a member of this set: can you do better than a cryptographic hash function?
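To make that threat model concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash and HMAC-SHA-256 as the keyed variant; the key, the addresses, and all function names are hypothetical and only there for illustration. It shows that once the key leaks together with the data, the membership test costs the attacker the same either way.

```python
import hashlib
import hmac


def plain_digest(email: str) -> str:
    """Unkeyed SHA-256 digest of an email address."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def keyed_digest(email: str, key: bytes) -> str:
    """HMAC-SHA-256 digest of an email address under a server-side key."""
    return hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical server state. Under the threat model above, the attacker
# obtains the key in the same compromise that yields the stored digests.
SECRET_KEY = b"server-side-secret"
emails = ["alice@example.com", "bob@example.com"]
hashed_db = {plain_digest(e) for e in emails}
hmac_db = {keyed_digest(e, SECRET_KEY) for e in emails}


def attacker_is_member_plain(candidate: str, db: set) -> bool:
    # The attacker hashes a guess and looks it up in the dumped set.
    return plain_digest(candidate) in db


def attacker_is_member_hmac(candidate: str, db: set, key: bytes) -> bool:
    # With the leaked key, the keyed lookup is exactly as easy.
    return keyed_digest(candidate, key) in db
```

The HMAC only helps in the case I left out of the threat model: the digests leak but the key does not. Then the attacker cannot compute `keyed_digest` for their guesses at all.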