by desdenova 613 days ago
I think the closest we have right now is 3D gaussian splatting.

So far it's only been used to train a scene from photographs from multiple angles and rebuild it volumetrically by adjusting densities in a point-cloud.

But it might be possible to train a model on multiple different scenes, and perform diffusion on a random point cloud to generate new scenes.

Rendering a point cloud in real time is also very efficient, so it could be used to create insanely realistic game worlds instead of polygonal geometry.

It seems someone already thought of that: https://ar5iv.labs.arxiv.org/html/2311.11221


Interesting, I guess that takes things even further and removes the need for hand-crafted 3D assets altogether, which is probably how things will end up going in gaming, long-term.

I was suggesting a more modest approach, I guess, one where the reverse-denoising process involves picking and placing existing 3D assets, e.g., those in GTA 5, so that the process is actually building a plausible map, using those 3D assets, but on the fly...

Turn your car right and a plausible street decorated with buildings, trees and people is dreamt up by the algorithm. All the lighting and physics would still be done in-engine, with stable diffusion acting as a dynamic map creator, with an inherent knowledge of how to decorate a street with a plausible mix of assets.

I suppose it could form the basis of a procedurally generated game world where, given the same random seed, it could generate whole cities or landscapes that would be the same on each player's machine. Just an idea...
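To make the "same random seed, same city on every machine" part concrete, here is a minimal sketch. It swaps the diffusion model for a plain seeded random pick, so it only illustrates the determinism, not the "plausible street" part. The asset names and `generate_block` are made up for illustration.

```python
import random

# Hypothetical asset names, purely for illustration.
ASSETS = ["house", "shop", "tree", "lamp_post", "parked_car"]

def generate_block(world_seed: int, block_x: int, block_y: int, slots: int = 6) -> list:
    """Pick assets for one city block from the world seed and block coordinates."""
    # Seeding with a string is deterministic across processes,
    # unlike hash() of a str, which is salted per process.
    rng = random.Random(f"{world_seed}:{block_x}:{block_y}")
    return [rng.choice(ASSETS) for _ in range(slots)]

# Two "players" asking for the same block with the same seed get the same layout.
player_a = generate_block(42, 3, -1)
player_b = generate_block(42, 3, -1)
print(player_a == player_b)
```

Because each block is seeded from its own coordinates, blocks can be generated on demand in any order (turn right, generate the next street) without storing the whole map. A learned generator would need the same property: all of its sampling noise would have to come from the shared seed.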

The thing is, there are generators that can do exactly this; there's no need to have an LLM as the middleman. Things like terrain generation, city generation, crowd control, and character generation can be done quite easily with far less compute and energy.

Someone has to write those by hand, and they don't generalize.

Diffusion-based generators will do everything soon. And in every style imaginable.

We'll probably solve the energy issue in time.

Technically, I guess one could do a Stable Diffusion-like model on voxels, where instead of pixel intensity values it produces a scalar field, which you could turn into geometry using marching cubes or something similar.

Not sure how efficient that would be, though, and it would only work for assets like teapots and whatnot, not, say, whole game maps.
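A rough sketch of the scalar-field-on-voxels idea, with an analytic sphere standing in for whatever a model would output. It only does the first step of marching cubes: finding the cells the surface passes through. Emitting the actual triangles needs the usual lookup tables, which are omitted here. All function names are hypothetical.

```python
import math

def sphere_field(x, y, z, radius=0.6):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return math.sqrt(x * x + y * y + z * z) - radius

def sample_grid(field, n=16, lo=-1.0, hi=1.0):
    """Sample the field at n*n*n grid corners spanning [lo, hi] on each axis."""
    step = (hi - lo) / (n - 1)
    return [[[field(lo + i * step, lo + j * step, lo + k * step)
              for k in range(n)] for j in range(n)] for i in range(n)]

def surface_cells(grid, iso=0.0):
    """Return the (i, j, k) cells whose 8 corners straddle the iso level.

    These are the cells for which marching cubes would emit triangles.
    """
    n = len(grid)
    cells = []
    for i in range(n - 1):
        for j in range(n - 1):
            for k in range(n - 1):
                corners = [grid[i + a][j + b][k + c]
                           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
                # Some corner is inside (< iso) and some corner is outside (>= iso).
                if min(corners) < iso <= max(corners):
                    cells.append((i, j, k))
    return cells

grid = sample_grid(sphere_field)
cells = surface_cells(grid)
print(len(cells), "of", 15 ** 3, "cells contain surface")
```

The efficiency worry shows up directly: the grid costs n^3 samples, while only a thin shell of cells actually contains surface.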

That's a simplified version of what a point cloud stores, but it only works with cubes then.

A point cloud is basically a 3D texture of colors and densities, so a raymarching algorithm can traverse it, accumulating the densities it passes through to find the final fragment color. That's how realistic fog and clouds are rendered in games nowadays, and it's very fast, except they use a noise function instead of a scene model.
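The raymarching loop described above, as a minimal CPU sketch: step along the ray, look up density and color, and composite front to back. The field here is a uniform fog ball rather than a trained scene or a noise function, and every name (`march_ray`, `fog_ball`, `orange`) is made up for illustration. A real renderer would do this per fragment on the GPU.

```python
import math

def march_ray(density_at, color_at, origin, direction, n_steps=80, step=0.05):
    """Accumulate color along a ray through a density field (front-to-back compositing)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still reaches the camera
    for s in range(n_steps):
        t = (s + 0.5) * step  # sample at the midpoint of each segment
        p = tuple(o + t * d for o, d in zip(origin, direction))
        sigma = density_at(p)
        if sigma <= 0.0:
            continue  # empty space contributes nothing
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this segment
        c = color_at(p)
        for ch in range(3):
            color[ch] += transmittance * alpha * c[ch]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-3:  # early exit once the ray is nearly opaque
            break
    return tuple(color), transmittance

def fog_ball(p, radius=1.0, sigma=2.0):
    """Constant density inside a sphere at the origin, zero outside."""
    return sigma if sum(v * v for v in p) < radius * radius else 0.0

def orange(p):
    """Same color everywhere, to keep the example small."""
    return (1.0, 0.5, 0.0)

hit_color, hit_T = march_ray(fog_ball, orange, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_color, miss_T = march_ray(fog_ball, orange, (5.0, 5.0, -2.0), (0.0, 0.0, 1.0))
print(hit_color, hit_T)
print(miss_color, miss_T)
```

The ray through the center crosses 2 units of fog at density 2, so the remaining transmittance should come out close to exp(-4); the ray that misses the ball keeps a transmittance of 1 and stays black.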

> A point cloud is basically a 3D texture of colors and densities

That's not how I'm familiar with it. As I know it[1], a point cloud is literally that: a collection of individual points that represents an object or scene.

What you describe is more like the scalar field[2] I mentioned, where each position in space has some value. You can render it directly like you say; I was thinking that a level-set method could be interesting for extracting geometry.

[1]: https://en.wikipedia.org/wiki/Point_cloud

[2]: https://en.wikipedia.org/wiki/Scalar_field
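To illustrate the distinction in code: a point cloud is just a list of positions, while the grid version has a value at every cell. One simple way to get from the first to the second is to count points per voxel. This is a sketch with made-up names, and counting is only one of several possible choices of density.

```python
import math

def voxelize(points, n=4, lo=0.0, hi=1.0):
    """Count how many points fall in each cell of an n*n*n grid over [lo, hi)^3."""
    grid = [[[0 for _ in range(n)] for _ in range(n)] for _ in range(n)]
    scale = n / (hi - lo)
    for x, y, z in points:
        # floor (not int) so points slightly below lo are not rounded into cell 0
        i, j, k = (math.floor((v - lo) * scale) for v in (x, y, z))
        if all(0 <= idx < n for idx in (i, j, k)):
            grid[i][j][k] += 1
    return grid

# Point cloud: an unordered collection of individual points.
cloud = [(0.1, 0.1, 0.1), (0.12, 0.05, 0.2), (0.9, 0.9, 0.9), (1.5, 0.0, 0.0)]

# Scalar field on a grid: every cell has a value, including the empty ones.
density = voxelize(cloud)
print(density[0][0][0], density[3][3][3])
```

The last point lies outside the grid bounds and is dropped. The resulting grid is the kind of input a level-set or marching-cubes pass expects.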