Zero Knowledge Proofs
So one day at uni I was taught about a thing called a zero knowledge proof. We were told at the time - "You probably won't ever need this, but it is cool so you are going to learn it." or something along those lines. Ever since then I've been trying to find an excuse to use one, and I finally succeeded the other day at work.
Basically, I had two parts of a program, and they each wanted to make sure the other part was there, and not some evil imposter program. There are boring ways to do this (A sends B a random string, B signs the string and returns it then vice versa) but, because they are boring (and vulnerable to a man in the middle attack) I didn't use them.
Instead, I generated a large graph (as in dots joined by lines, not bar-graph) where I knew a way to start from dot, and by following the lines, I could get to every other dot exactly once and return to where I started (this is called a Hamiltonian cycle). I then put the description of this path(*) in one of my programs (which will be called the prover).
When the other program (provee) wanted to check that the prover was actually the prover, it would make sure the prover knew what the path was. But, the prover couldn't just say what the path was, because if the provee was actually some jerk program just pretending to be the provee, so then it later can pretend to be the prover then we would have given up our secret.
So, instead of directly telling what the path was, the prover woud send the provee a new graph, one which has the same structure as the original graph (so it still has a Hamiltonian cycle) but where all the labels are different, so it isn't obvious which dot corresponds to which dot in the original. The provee then asks the prover either - "how did you get from the original graph to this?" or "where is the Hamiltonian cycle in this new graph?". It is only allowed to ask one of these questions, and the prover will only answer one question. The prover can easily answer both questions, because it knows how it relabelled the graph itself, and if it just relabels the dots in the secret path, then it can answer the second question too. Then the prover sends a new graph, and they repeat the process as many times as they want (each time increases the certainty - at first it is 1/2, then 3/4, then 7/8 etc).
This relies on two things being hard:
- Finding such a cycle.
- Finding how you got from one graph to another (called finding an isomorphism).
If you can do either of them easily, then the security of this whole thing falls apart.
So the first thing is - why is it called a zero knowledge proof? This type of proof doesn't tell anyone except for the provee anything except for the fact that the prover knows the cycle. Anyone else who listens to them talk will not have a satisfactory proof (the prover and the provee might have conspired before, so the provee only asks questions the prover will know the answer to) and the provee can't find out what the cycle was, because they would have to find the relabelling in the graph (which is hard).
So next, how hard is it really? Obviously that depends on the size of the graph.
So to get some idea how hard this really is, I wrote a program which would try to find a hamiltonian path in a random graph. It is slightly smarter than just trying every possible path, but it is by no means the best (it is especially inefficient since it is written in ruby).
I was actually expecting my program to be much faster than it was. It is a randomised algorithm (it picks a random path until it reaches a node that it has reached before, then it reverses the direction of the nodes found in the loop it just found, and keeps trying. Every now and then it restarts the search.
So there is a chance that it will be lucky and find the solution straight off, or it could be very unlucky and never find the solution.
My very unscientific results:
A graph with 50 nodes and 50 edges takes 0.0 seconds (there aren't many choices)
A graph with 50 nodes and 60 edges takes 0.6 seconds
A graph with 50 nodes and 70 edges takes 3.7 seconds
A graph with 50 nodes and 80 edges takes 0.8 seconds (lots of different answers now)
A graph with 100 nodes and 120 edges takes 11.2 seconds.
A graph with 100 nodes and 150 edges takes more than a minute.
Now I'm going to bed. Don't rely on these results if you are designing a crypto system.
(*) Technically it is called a cycle (because it starts and finishes at the same spot). But path sounds more natural.

RSS