by dartmoose 1964 days ago
My understanding of this paper (please comment/correct based on your understanding):

* Instead of using a single latent feature per implicit surface, the authors propose using a "volume" of latent features per surface. This allows the NNs to better capture geometric detail while remaining relatively shallow. The result is a neural SDF that is more accurate and faster to compute. Contrary to the claim of another comment, the neural SDF alone is not the interesting part of this paper--the prior work it cites includes at least three other papers that have explored representing an implicit surface (an SDF or the closely related occupancy field) with a neural net: Park et al.'s DeepSDF https://arxiv.org/pdf/1901.05103.pdf, Mescheder et al.'s Occupancy Networks http://www.cvlibs.net/publications/Mescheder2019CVPR.pdf, and Chen et al.'s Learning Implicit Fields https://arxiv.org/pdf/1812.02822.pdf. All very interesting papers.

* When I say a "volume" of latent features, I specifically mean a voxel grid where each voxel corner holds a latent feature, and any position X has a corresponding feature Z that is simply the trilinear interpolation of the features at the corners of the voxel containing X. As the authors mention, they try to keep this sparse by leaving any voxel that does not contain the surface "empty".
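To make the interpolation step concrete, here is a minimal sketch in plain Python. The sparse `grid` dict, the function name, and the toy corner features are my own assumptions for illustration, not the paper's data structures; the paper's actual storage and network are not reproduced here.

```python
import math

def trilinear_feature(grid, x, y, z):
    """Interpolate a latent feature at (x, y, z), given in voxel units.

    `grid` is a hypothetical sparse dict mapping integer corner coordinates
    (i, j, k) to feature vectors (lists of floats). Returns None when any
    corner of the enclosing voxel is missing, i.e. the voxel is "empty".
    """
    i, j, k = math.floor(x), math.floor(y), math.floor(z)
    tx, ty, tz = x - i, y - j, z - k  # fractional position inside the voxel
    feature = None
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                corner = grid.get((i + di, j + dj, k + dk))
                if corner is None:
                    return None
                # Weight of this corner: product of per-axis linear weights.
                w = ((tx if di else 1.0 - tx)
                     * (ty if dj else 1.0 - ty)
                     * (tz if dk else 1.0 - tz))
                if feature is None:
                    feature = [0.0] * len(corner)
                for c, value in enumerate(corner):
                    feature[c] += w * value
    return feature

# Toy data: one populated voxel whose corner (i, j, k) stores [i, j, k, 1].
grid = {(i, j, k): [float(i), float(j), float(k), 1.0]
        for i in (0, 1) for j in (0, 1) for k in (0, 1)}

z_inside = trilinear_feature(grid, 0.25, 0.5, 0.75)
z_empty = trilinear_feature(grid, 1.5, 0.5, 0.5)  # neighbouring voxel is not stored
```

Because trilinear interpolation reproduces linear functions, the toy features come back as the query position itself, which is a handy sanity check. A query in a voxel with a missing corner returns `None`, which is one simple way to model the "empty" voxels.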

The authors use an octree to create L different feature volumes, one per level of detail. As the level increases, the resolution of the feature volume increases, which means that more fine-grained details can be encoded as features.

Finally, the authors describe a rendering procedure that makes use of their LOD model (still need to read this part more thoroughly).

Some additional thoughts:

Why are SDFs useful at all?

One comment suggests this is a form of "compression", but meshes have a far smaller memory footprint and are computationally less expensive to render. Ray tracing a mesh is extremely fast, largely because it is a primitive operation in graphics and so much time and energy has been invested in making it faster with acceleration structures such as BVHs.

So are SDFs actually useful?

Yes. Triangle or polygon meshes are great when you have them, but are terribly challenging to work with for reconstruction tasks. For instance, you effectively have to pause occasionally during reconstruction to fix your mesh up so that it isn't complete garbage (triangles with small angles, self-intersections, extremely lopsided side lengths, etc). SDFs support arbitrary topology painlessly, which is why they show up so much in reconstruction/computer vision.

So why do we need neural nets to represent them?

I think the primary reason you use a neural network to represent a signed distance function is that it's a more efficient representation than storing the SDF in some sort of grid structure (maybe someone else has more thoughts on this?). As a side benefit, it can simplify any sort of differentiable rendering, since the surface is already represented in a form that is naturally differentiable via back-propagation.

>Why are SDFs useful at all?

Meshes and Bezier patches are boundary representations with no information about the volume they enclose. Imagine you'd like to cut an object out of smoke or clouds. SDFs enable you to cut out arbitrary volumes from any material. This is more realistic than skinning a mesh object with textures, especially with translucent objects.

> So why do we need neural nets to represent them?

You don't. I didn't notice render times, but "interactive frame rates" would be lacking.

Most SDF primitives and their compositions are not analytic (from use of abs, fract, etc). Differentiation by finite differences is most common. Few are using automatic differentiation. I don't quite follow how back prop would produce the surface gradient, but I doubt it would be faster than these methods.
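For reference, the finite-difference approach mentioned above usually looks something like this minimal sketch. The sphere SDF, the step size `h`, and the function names are assumptions picked for illustration.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def fd_gradient(sdf, p, h=1e-4):
    """Central-difference estimate of the SDF gradient at p."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * h))
    return grad

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

# Approximate surface normal direction at a point above the sphere.
normal = normalize(fd_gradient(sphere_sdf, (0.0, 2.0, 0.0)))
```

This costs six SDF evaluations per normal and treats the SDF as a black box, so it works even when the function is built from `abs`, `fract`, and similar pieces, at the price of some truncation error that depends on `h`.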

> Meshes and Bezier patches are boundary representations with no information about the volume they enclose

Pedantic on my part, but an SDF is still just a surface/boundary representation at the end of the day. It may have additional benefits that you point out, such as quickly computing whether you're inside or outside or being able to more easily deform the shape (through any number of procedures, including slicing), but you'd probably use a density grid instead of an SDF if you're dealing with anything volumetric, such as clouds or smoke.

> You don't. I didn't notice render times, but "interactive frame rates" would be lacking.

Yeah definitely, poor wording on my part.

> Most SDF primitives and their compositions are not analytic (from use of abs, fract, etc). Differentiation by finite differences is most common. Few are using automatic differentiation. I don't quite follow how back prop would produce the surface gradient, but I doubt it would be faster than these methods.

I don't think we're talking about the same thing: I'm not referring to the surface gradient when I say differentiable rendering.

Both differentiable rendering with an SDF (e.g. SDFDiff https://arxiv.org/pdf/1912.07109.pdf) and with a neural SDF (e.g. DIST http://b1ueber2y.me/projects/DIST-Renderer/) use automatic differentiation to compute the gradient of the rendering/image-generating process. Section 3.4 in the DIST paper discusses where back prop comes into all of this. Basically your surface is defined by the weights of some neural net (i.e. a neural SDF) and you need to know the gradient of your image with respect to those weights.
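A toy illustration of what "gradient of the rendering process with respect to the surface parameters" means, under heavy assumptions: a sphere radius stands in for the network weights, a single ray's hit depth stands in for the image, and forward-mode automatic differentiation with dual numbers stands in for back prop. This is not how SDFDiff or DIST are implemented; the `Dual` class and `traced_depth` are hypothetical helpers written for this sketch.

```python
import math

class Dual:
    """A value together with its derivative w.r.t. one chosen parameter."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def dsqrt(x):
    root = math.sqrt(x.val)
    return Dual(root, x.der / (2.0 * root))  # chain rule; assumes x.val > 0

def traced_depth(radius, center_z=3.0, eps=1e-6, max_steps=64):
    """Sphere-trace from the origin along +z; return the hit depth as a Dual."""
    t = Dual(0.0)
    for _ in range(max_steps):
        dz = t - center_z                 # the ray point is (0, 0, t)
        dist = dsqrt(dz * dz) - radius    # sphere SDF evaluated on the ray
        if dist.val < eps:
            return t
        t = t + dist                      # derivatives flow through each step
    return None

# Seed der=1.0 to differentiate with respect to the radius.
depth = traced_depth(Dual(1.0, 1.0))
```

For this setup the hit depth is `center_z - radius`, so the derivative carried through the marching loop should come out as -1: growing the sphere moves the hit point toward the camera. A real pipeline would use reverse mode over many pixels and many weights, but the idea of differentiating through the renderer is the same.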

For a variety of reasons, these surface representations are easier to handle than triangle meshes, which break certain desirable properties for differentiation and require extra care as a result (see edge sampling https://people.csail.mit.edu/tzumao/diffrt/ for an early example of the challenges of doing differentiable rendering with a triangle mesh).

>SDF is still just a surface/boundary representation

Meshes and Bezier patches are parametric representations (defined over R^2, evaluated with vec2, output vec3). Implicit functions are volumetric representations of surfaces (defined over R^3, evaluated with vec3, output float). This is a key difference between parametric modelers like AutoCAD and implicit modelers like nTopology.
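The difference in signatures is easy to see on a unit sphere. A minimal sketch, with function names and the (u, v) convention chosen here for illustration:

```python
import math

def sphere_parametric(u, v, radius=1.0):
    """Parametric form, R^2 -> R^3: (u, v) in [0, 1]^2 maps to a surface point."""
    theta = 2.0 * math.pi * u   # azimuth
    phi = math.pi * v           # polar angle
    return (radius * math.sin(phi) * math.cos(theta),
            radius * math.sin(phi) * math.sin(theta),
            radius * math.cos(phi))

def sphere_implicit(p, radius=1.0):
    """Implicit form, R^3 -> R: signed distance, negative inside the sphere."""
    return math.sqrt(sum(c * c for c in p)) - radius

on_surface = sphere_parametric(0.3, 0.6)
residual = sphere_implicit(on_surface)       # close to zero on the surface
inside = sphere_implicit((0.0, 0.0, 0.0))    # negative: inside
outside = sphere_implicit((2.0, 0.0, 0.0))   # positive: outside
```

The parametric form can only generate points on the surface; the implicit form can be queried anywhere in space, which is what makes inside/outside tests and cutting volumes straightforward.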

>density grid instead of an SDF if you're dealing with anything volumetric, such as clouds or smoke.

Smoke and clouds are functions of time that change every frame. In the implicit paradigm, it's not necessary to evaluate the smoke ahead of time, since sphere tracing can reduce to marching adaptively when we are within smoke. Image quality will not be bound to any resource resolution other than the screen (a low-res smoke grid would appear blocky).
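For anyone unfamiliar with sphere tracing, a minimal sketch of the basic loop follows. It only covers reaching the boundary; the adaptive marching inside the medium is not shown. The scene (one sphere at z = 3), the tolerances, and the function names are assumptions for illustration.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from p to a sphere."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center))) - radius

def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-5, t_max=100.0):
    """March along a unit-length ray, stepping by the SDF value each time.

    The SDF value is the distance to the nearest surface, so a step of that
    size cannot overshoot. Returns the hit distance, or None on a miss.
    """
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(point)
        if dist < eps:
            return t
        t += dist
        if t > t_max:
            break
    return None

hit = sphere_trace(sphere_sdf, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
miss = sphere_trace(sphere_sdf, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Once the ray is inside the boundary, a volume renderer would switch from these surface-bounded steps to stepping through the medium and accumulating density.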

I agree with you that the current triangle-based RT implementations have faster alternatives that accomplish similar things on the surface.

Thanks for the links!

Thanks again for the insights here and in the other thread!

> In the implicit paradigm, it's not necessary to evaluate the smoke ahead of time since sphere tracing can reduce to marching adaptively when we are within smoke.

Makes sense. The SDF is separate from the density grid for the cloud/smoke. We need both, one to detect the boundary (SDF) and another to actually render the volume (density grid).

In offline rendering we usually just have some primitive (sphere, cube etc) that acts as the boundary but that obviously isn't as adaptive and doesn't let you easily reveal arbitrary slices of a volume.

>In offline rendering we usually just have some primitive (sphere, cube etc) that acts as the boundary

Right, using an SDF as a boundary condition means checking it discretely as an indicator function (if negative do x, else do y). You can use basically any SDF to bound your procedural medium, including a procedural boundary (via domain distortion).

You can also apply a continuous blend instead of the discrete check (apply a smoothstep across the level curve). This is preferred since discrete checks will often result in popping.
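A minimal sketch of the two options, assuming a signed distance `d` that is negative inside the boundary. The blend `width` and the function names are illustrative choices; `smoothstep` follows the usual cubic Hermite definition used in shading languages.

```python
def smoothstep(edge0, edge1, x):
    """Cubic Hermite ramp from 0 (x <= edge0) to 1 (x >= edge1)."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def hard_mask(d):
    """Discrete indicator: 1 inside the boundary (d < 0), 0 outside."""
    return 1.0 if d < 0.0 else 0.0

def soft_mask(d, width=0.1):
    """Continuous blend: 1 well inside, 0 well outside, smooth in between."""
    return 1.0 - smoothstep(-width, width, d)

# Sample both masks at a few signed distances around the boundary.
samples = [(d, hard_mask(d), soft_mask(d)) for d in (-0.2, -0.05, 0.0, 0.05, 0.2)]
```

Multiplying the medium's density by `soft_mask(d)` instead of `hard_mask(d)` means a small change in `d` from frame to frame produces a small change in density, which is what avoids the popping.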

Render times are listed in Table 3 of the paper: interactive, but perhaps not high-performance!