It is not that hard. You take multiple photos, add depth information, and you have 3D versions of the pictures you took. I know I oversimplified it, but in a nutshell, this is it.
Yes, in a nutshell, this is it :-)
But knowing the depth information is not enough: you must also know, very precisely, the position of the camera each picture was taken from. Computing both the depth information and the camera position is in fact a very challenging task.
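
To make that concrete, here is a rough sketch of why the camera position matters, assuming a simple pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the two example poses are all made up for illustration; they are not from any particular library or pipeline. The same pixel with the same depth ends up at two very different 3D points depending on where the camera was and how it was oriented.

```python
# Minimal pinhole-camera sketch (illustrative assumptions, not a real pipeline).

def backproject(u, v, depth, fx, fy, cx, cy):
    """Turn a pixel (u, v) with known depth into a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def camera_to_world(p_cam, R, t):
    """Apply a camera-to-world pose: rotation R (3x3 nested lists) and translation t."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i]
        for i in range(3)
    )

# Hypothetical intrinsics for a 640x480 image.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

# One pixel with one depth value.
p_cam = backproject(420, 240, 2.0, fx, fy, cx, cy)

# Pose A: camera at the world origin, no rotation.
R_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t_a = (0.0, 0.0, 0.0)

# Pose B: camera rotated 90 degrees about the Y axis and moved 1 unit along X.
R_b = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
t_b = (1.0, 0.0, 0.0)

world_a = camera_to_world(p_cam, R_a, t_a)
world_b = camera_to_world(p_cam, R_b, t_b)

print("camera frame:", p_cam)
print("world (pose A):", world_a)
print("world (pose B):", world_b)
```

So depth alone only places the point relative to the camera. Any error in the assumed pose moves the reconstructed point by the same amount, which is why the pose has to be estimated so carefully before views can be merged into one 3D model.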