Hacker News
by zamadatix 733 days ago
Loved the interactions and flow overall, but I'm a bit lost on the zero knowledge proof example. I'm familiar with the concept but I don't follow how the example is one. E.g. "By repeating the process enough times, the probability that you never catch me becomes smaller than, say, getting struck by lightning" doesn't seem to show it's a proof? If I pick a hundred numbers, it'll look like I just proved that some black box function, which happens to be Sin[n] + 0.999999999999, is always positive, even though I'd be able to clearly show it goes negative with knowledge of the function.

It feels like something that got detached from the things that make it work during simplification. Or it could be that I just have a misunderstanding/oversight in the zero knowledge proof :).

On an unrelated note: I colored the larger graph and it didn't even play along!

5 comments

Very glad you enjoyed it!

For the ZK example, the math behind it is this: if there are m bordering regions and I am lying, you have a 1/m chance of catching me each time. Thus after k repetitions the chance you haven't caught me is (1-1/m)^k \approx e^{-k/m} which is extremely small for k sufficiently larger than m.
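
To put rough numbers on that bound, here is a minimal Python sketch. The function names and the sample values m = 20 and k = 200 are made up for illustration and are not taken from the demo. It compares the exact escape probability (1-1/m)^k with the e^{-k/m} approximation.

```python
import math

def escape_probability(m: int, k: int) -> float:
    """Exact chance a lying prover survives k rounds when each round
    catches the lie with probability 1/m."""
    return (1 - 1 / m) ** k

def escape_probability_approx(m: int, k: int) -> float:
    """The e^{-k/m} approximation of the same quantity."""
    return math.exp(-k / m)

if __name__ == "__main__":
    m, k = 20, 200  # hypothetical sizes, chosen only for illustration
    print("exact: ", escape_probability(m, k))
    print("approx:", escape_probability_approx(m, k))
```

Since 1 - x <= e^{-x}, the approximation slightly overestimates the liar's chances, so using it errs on the cautious side.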

Now, you may rightfully say: hey that's still not a "proof," you could still be lying! There are two responses to this:

1. The probability can be made incredibly small, like smaller than the chance, say, your computer got hit by a gamma ray burst that would flip bits from 0 to 1 (I really have no idea if this actually happens but people have said it to me).

2. It turns out it is mathematically impossible to get the zero knowledge property if you want true proofs (i.e., no probability of being wrong). So, there's a trade-off: if you want zero knowledge, you have to accept some (small) failure probability.

P.S. Adding an easter egg for coloring the larger graph is on the todo list :)

Yeah, I got tripped up by that formulation as well and it's actually something that annoys me with a lot of algorithms that have some properties proven in a limit: It's "easy" (or at least possible) to mathematically prove that in the limit of some variable, the property will hold: If you repeat the challenge increasingly often, the probability of being lied to will get arbitrarily close to zero; for sufficiently large input sizes, some algorithm runs in linear time; with sufficiently large amounts of training data and iterations, some prediction error will become arbitrarily small, etc etc.

But none of that is telling you how much is "sufficient", or even which order of magnitude we're talking about. If the quantity has a real life cost, this would result in enormous practical differences.

(With the formula you have given for the ZK proof, we're at least one step further: You can start with the desired probability, e.g. the gamma ray burst, and calculate the required minimum k from that. Also, it's easy to see that the color problem lends itself well to such proofs because the probability of failure drops exponentially quickly with growing k, so the actual k you choose can be relatively small. But if all you have is a proof in the limit, that's not possible.)
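
As a rough sketch of that "start from the desired probability" calculation, the snippet below inverts (1-1/m)^k <= target to get the smallest k. The name `rounds_needed` and the example values (m = 20, target = 1e-9) are assumptions for illustration only.

```python
import math

def rounds_needed(m: int, target: float) -> int:
    """Smallest k with (1 - 1/m)**k <= target, i.e. the number of rounds
    after which a liar escapes with probability at most `target`."""
    if m < 1 or not 0 < target < 1:
        raise ValueError("need m >= 1 and 0 < target < 1")
    if m == 1:
        return 1  # a single edge is checked every round, so one round suffices
    return math.ceil(math.log(target) / math.log(1 - 1 / m))

if __name__ == "__main__":
    m, target = 20, 1e-9  # hypothetical graph size and tolerance
    print(rounds_needed(m, target))
```

Because ln(1 - 1/m) is close to -1/m, k grows roughly like m * ln(1/target): linearly in the graph size but only logarithmically in how small you want the failure probability to be.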

The problem with doing this on a computer is getting us to believe you didn't just make up the colors as we tell you to reveal them (after being “dishonest” before).
That's the idea at the end about presenting the "sticky notes" as products of primes. Assuming you can't factor the products yourself, you can be given the whole grid of those products and then interactively ask for the factors of one of them, or of a pair of them. The prover can't give an alternative factorization (i.e., make up a color on the spot) since each number can only be factored one possible way, and it's easy to verify.
I really liked this part that shows all the numbers up front, and none of them change during the reveal step.
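
A toy Python sketch of why a published product pins down the answer: the verifier accepts a reveal only if both revealed numbers are prime and multiply to the number shown up front. The function names and the tiny example product 10403 are assumptions for illustration, and how the demo maps a factor pair to a color is not covered here.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division; adequate for toy sizes only."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def verify_opening(commitment: int, p: int, q: int) -> bool:
    """Accept a reveal only if p and q are primes whose product is the
    number that was published before the verifier made a choice."""
    return is_prime(p) and is_prime(q) and p * q == commitment

def all_openings(commitment: int) -> list[tuple[int, int]]:
    """Brute-force every valid opening with p <= q. Feasible only because
    the toy number is tiny; real commitments would be far too large."""
    return [
        (p, commitment // p)
        for p in range(2, math.isqrt(commitment) + 1)
        if commitment % p == 0 and verify_opening(commitment, p, commitment // p)
    ]

if __name__ == "__main__":
    published = 101 * 103  # hypothetical "sticky note"
    print(verify_opening(published, 101, 103))
    print(all_openings(published))
```

Unique factorization means `all_openings` can never contain more than one pair, which is what stops the prover from changing its answer after seeing which regions were picked.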

I think it would present better if introduced as "To show that there's no cheating going on behind the scenes, we will..."

Yeah, I agree it could be presented more clearly. Maybe make the analogy of multiplying the factors together and "covering with a post-it" more explicit.
You're right. That does cover it. I was playing with my kid and I didn't get it at first.

I might use smaller factors then.

Thanks for the explanation, it seems the definition was slightly different than I assumed it to be previously and that was my missing link to it all making sense. Thanks also for the demonstration to share this info!

Looking forward to the easter egg :)

OP: Although I'm not really in the target audience for this demo (I already knew all the punchlines), it does occur to me that it might be helpful to readers like the parent-commenter — and even perhaps thought-provoking to us know-it-alls — if you provided a mode at the end of the demo where the graph was in fact not three-colorable, and the computer actually would lie about its being three-colorable. So it would generate a "three-coloring" with a flaw somewhere, and display its representation as products of primes, and you'd get to choose two adjacent products and receive their factors... and so you could see for yourself exactly how long it took for you to luck into catching the computer in one of its lies.

And the demo could also tell you the expected number of iterations to catch the lie with (50/90/99%) probability. It'd be a pretty large number even for such a small graph, I'd bet.
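
A minimal sketch of what such a mode might compute, under loud assumptions: the graph has m = 30 edges (a made-up number), the fake coloring is flawed on exactly one edge, and the verifier picks a uniformly random edge each round. None of these names or numbers come from the demo.

```python
import math
import random

def rounds_for_confidence(m: int, confidence: float) -> int:
    """Smallest k such that a lie hidden on 1 of m edges is caught
    within k rounds with probability >= confidence."""
    if m == 1:
        return 1  # the only edge is the flawed one
    return math.ceil(math.log(1 - confidence) / math.log(1 - 1 / m))

def rounds_until_caught(m: int, rng: random.Random) -> int:
    """Play one session: each round the verifier checks a uniformly random
    edge, and edge 0 stands in for the single flawed edge."""
    rounds = 1
    while rng.randrange(m) != 0:
        rounds += 1
    return rounds

if __name__ == "__main__":
    m = 30  # hypothetical edge count
    rng = random.Random(0)
    sessions = [rounds_until_caught(m, rng) for _ in range(10_000)]
    for c in (0.50, 0.90, 0.99):
        k = rounds_for_confidence(m, c)
        rate = sum(s <= k for s in sessions) / len(sessions)
        print(f"{c:.0%}: k = {k}, simulated catch rate within k rounds = {rate:.3f}")
```

A single flawed edge is the best case for the liar; a coloring with more bad edges would be caught sooner.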

(Of course the computer could also lie about the factorizations, since it's unlikely a human would bother to catch it in that kind of lie; but let's assume it doesn't ever do that.)

Readers might also be interested in the McGregor map, https://mathworld.wolfram.com/McGregorMap.html (reported, on 1 April 1975, to require five colors!)

For me it was less about how quickly it converges to you almost certainly outing them, and more that I had misunderstood the idea: a zero knowledge proof is about, more or less, the "limit" of the validation behavior, carried to an arbitrary point chosen by the tester, and not necessarily an actual guarantee that you can reach the conclusion in finitely many steps.

Prior to this I'd only seen "proof" in math where it has meant you can absolutely guarantee there to be no counterexample, not just that it seems impossibly unlikely there could be a counterexample. E.g. the Tarry-Escott problem where we have proof there is no sets exists with n=4 and m=5 even though we haven't ever found numerical values of sets matching that description, or the Mertens conjecture, where the smallest counterexample is estimated to be so large (~10 billion digits) we've not even been able to find the first counterexample value despite knowing it exists due to a proof. On the other side of things we have things like the Goldbach conjecture or Riemann Hypothesis, where we've poured our hearts, brains, and souls into trying to find a counterexample or proof and don't claim to have either yet.

Adjusting to that definition of "proof" for the context it all makes a lot more sense now.

I've got another problem about this zero knowledge proof. The digital version doesn't make a lot of sense to me. It depends on the fact we don't have a fast integer factorization algorithm. But integer factorization is not proven to be NP-complete, and 3-coloring is NP-complete.

So isn't it possible that there is a polynomial time algorithm for integer factorization, but no polynomial time algorithm for 3-coloring, and therefore the "zero knowledge proof" actually reveals the answer?

I think you're right, and integer-factorization is often used in these examples as a process that is hard to do but easy to verify. There are plenty of other processes that could be substituted in, e.g. reversing SHA256 hashes, that would likely be even less tractable to the target audience.

However, if P = NP, there is no process that works here - there's nothing that is hard to do but easy to demonstrate, and therefore no zero knowledge proofs exist.

Actually, that's not true either. It requires the definition that all polynomial-time algorithms run quickly and all superpolynomial ones run slowly. This is not an accurate definition for all practical problem sizes and this is where the analogies all break down. Polynomial vs nonpolynomial is more interesting to complexity theorists than "how many years would this actually take with a fast computer".

> However, if P = NP, there is no process that works here - there's nothing that is hard to do but easy to demonstrate, and therefore no zero knowledge proofs exist.

IIRC technically, there are zero-knowledge proofs for all statements in P: the proof is "prove it yourself", which the verifier can do because it's in P.

> E.g. "By repeating the process enough times, the probability that you never catch me becomes smaller than, say, getting struck by lightning" doesn't seem to show it's a proof?

That's fine though, because the point isn't really to publish math papers without disclosing proofs. For example, presenting a valid digital signature is sometimes colloquially called a proof that you had the private key, even though there is a 1 in a gazillion chance that you didn't. For such practical tasks, a very high chance tends to be good enough.

I think the answer is that each time you reveal the colours, you observe that they are within the set of three colours illustrated at the beginning of the proof. Whichever you reveal, you never find a fourth colour.

This confused me at first.

For anyone confused by this response: I had edited my comment after reading https://news.ycombinator.com/item?id=40740557 but before esquivalience had hit reply, and now their reply is left hanging. Sorry esquivalience! To summarize the linked answer on trusting that the second dot isn't just randomly assigned: keep the context as physical post-its. Barring something like a matter-bending psychic, you'd be able to tell if the dot under the second post-it was swapped as you made your pick.

That still leaves how to rely on chance of picks for a proof though.

It's the same thing as limits in spirit.

It's not that the chances of lying are small, it's that they can be made arbitrarily small.

Let's say my standards of "proof" are that there's only 0.1% chance that you're cheating. We play that game several times, and I'm satisfied.

Next comes someone else whose standard is 0.001% chance of cheating. They simply play the game a few more times, and they're satisfied too.

If they change their mind and decide that only 0.0000001% will make them happy, they simply tack on a few more rounds.

The key here is that the probability that you can cheat for arbitrarily long is exactly zero — for the same reason that Zeno's paradox is resolvable (and limit of 1, 1/2, 1/4, 1/8, 1/16, ... is exactly zero, and not just a very small number).

Great description in that "proof" in this context is more referring to the limiting behavior and being able to get to your desired level of arbitrary happiness than necessarily providing a traditional "proof" about it being a certainty within a finite amount of estimation. Thanks.
Thanks for the feedback, glad my comment was helpful!
Reminds me of different colored swans