

by joewhatkins 1324 days ago

From my understanding the model in the linked video only stylizes existing meshes from text.

There are plenty of papers that have tried text->3D model generation using photogrammetry-like methods similar to what the parent comment suggested. The Two Minute Papers video on one example is here:

https://youtu.be/L3G0dx1Q0R8

Outside of cherry-picked examples, this style of model tends to suffer from what people are calling the “Janus Problem”: the easiest way for it to satisfy the loss is to make the object look like the input prompt from as many angles as possible. So if you enter “a rubber duck”, it tends to generate yellow blobs with multiple head-like appendages sprouting from them.
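To make the failure mode concrete, here is a toy sketch of why a per-view score pushes toward extra heads. Everything in it is made up for illustration: `view_score` is a hypothetical stand-in for a 2D image model judging a single rendered view, and the "shape" is just four flags saying whether a head faces each camera. It is not how any of these papers actually compute their loss.

```python
from itertools import product

N_VIEWS = 4  # hypothetical camera positions: front, right, back, left

def view_score(shape, view):
    # Stand-in for "does this rendered view look like the prompt?".
    # The scorer sees one view at a time and has no idea which side it is
    # looking at, so it simply rewards a head facing the camera.
    return 1.0 if shape[view] else 0.0

def total_score(shape):
    # The 3D objective sums the per-view score over all camera positions.
    return sum(view_score(shape, v) for v in range(N_VIEWS))

def best_shape():
    # Brute-force every shape: each side either has a head or does not.
    candidates = product([False, True], repeat=N_VIEWS)
    return max(candidates, key=total_score)

one_headed_duck = (True, False, False, False)
print(total_score(one_headed_duck))  # 1.0
print(best_shape())                  # (True, True, True, True)
```

The sensible one-headed duck scores 1.0, while the shape with a head on every side scores 4.0, so the optimizer prefers the multi-headed blob. Nothing in a per-view score like this one ever penalizes a second head.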

Google’s paper that tried this approach using Imagen as the text->image generator had great results, but they might be cherry-picked. Someone replicated it with Stable Diffusion as the text->image backend, and it still has major Janus problems.