| Just to add some detail regarding the "blob optimization" phase. The algorithm that recovers the camera positions from the reference images also gives you a sparse cloud of points (it places some of the pixels from the images in 3D space). Use those points as the centers of the initial blobs, and give each blob an initial size. This is almost certainly not enough detail, but it's a start. Then you run the "training" for a while, optimizing the position and shape of the blobs. Then you try to optimize the number of blobs. The key aspect here is to determine where more detail is needed. To do so they exploit the fact that they already have derivatives of several properties, including the screen position of each blob. If the previous training pass tried to move a given blob a significant distance on the screen, they take that as a signal that the backpropagation is struggling to cover an area. They then add blobs there, either by duplication or by splitting, depending on whether the blob is small or large. If it's small they assume there's detail it can't fill in on its own, so they duplicate the blob and move the new blob slightly in the direction it wanted to move the source blob, so the two don't overlap exactly. If the blob is large they assume the detail is too fine and is overcovered by the blob, so they split it up, calculating the properties of the new blobs so that they best cover the volume the source blob covered. This process of training followed by blob optimization is repeated until the error is low enough or stops changing, suggesting convergence or a failure to converge, respectively. |
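
To make the duplicate-or-split decision above a bit more concrete, here is a rough sketch in Python with NumPy. It is not the authors' code. The function name `densify`, every threshold and default value, the single scalar size per blob, and the way split blobs are placed (two random samples around the source center, each shrunk by a fixed factor) are all assumptions made for illustration; the real method calculates the new blobs' properties more carefully.

```python
import numpy as np

def densify(pos, size, pos_grad, screen_grad,
            grad_thresh=0.0002, size_thresh=0.01,
            step=0.01, shrink=1.6, rng=None):
    """One hypothetical "blob optimization" pass.

    pos:         (N, 3) blob centers
    size:        (N,)   one scalar size per blob (a simplification)
    pos_grad:    (N, 3) gradient of the loss w.r.t. each center
    screen_grad: (N,)   how hard training tried to move each blob on screen
    All thresholds and factors are illustrative guesses.
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # Blobs that training tried to move a lot are where detail is missing.
    struggling = screen_grad > grad_thresh
    small = struggling & (size <= size_thresh)
    large = struggling & (size > size_thresh)

    # Small blob: duplicate it and nudge the copy in the direction the
    # optimizer wanted to move the source (downhill, i.e. against the gradient).
    norm = np.linalg.norm(pos_grad[small], axis=1, keepdims=True) + 1e-12
    clone_pos = pos[small] - step * pos_grad[small] / norm
    clone_size = size[small]

    # Large blob: replace it with two smaller blobs scattered inside the
    # volume it covered (a crude stand-in for a proper fit).
    n_large = int(large.sum())
    offsets = rng.normal(size=(2, n_large, 3)) * size[large][None, :, None]
    split_pos = (pos[large][None] + offsets).reshape(-1, 3)
    split_size = np.tile(size[large] / shrink, 2)

    # Keep everything except the large blobs that were split.
    keep = ~large
    new_pos = np.concatenate([pos[keep], clone_pos, split_pos])
    new_size = np.concatenate([size[keep], clone_size, split_size])
    return new_pos, new_size
```

In a full pipeline you would alternate a stretch of ordinary gradient training with a call like this, and stop once the error is low enough or no longer changes.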