by layer8 1365 days ago
From the abstract: “We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss.”
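A toy illustration of the loss described in the abstract, which the DreamFusion paper calls Score Distillation Sampling. As I understand it, the update is w(t)·(ε̂(z_t; y, t) − ε)·∂x/∂θ, where x = g(θ) is a rendering, z_t is a noised copy of it, and ε̂ is the diffusion model's noise prediction (the denoiser's own Jacobian is skipped). The sketch below makes heavy simplifying assumptions: the "renderer" is the identity (θ is the image itself), and the text-conditioned diffusion model is replaced by the exact noise predictor for a Gaussian "prior" N(mu, s²). No NeRF or neural network is involved, and the names `gaussian_eps_pred` and `sds_step` and all constants are hypothetical.

```python
import numpy as np

def gaussian_eps_pred(z, alpha, sigma, mu, s):
    """Exact E[eps | z_t] for data x ~ N(mu, s^2 I) noised as z_t = alpha*x + sigma*eps.
    Hypothetical stand-in for a trained diffusion model's noise prediction."""
    return sigma * (z - alpha * mu) / (alpha**2 * s**2 + sigma**2)

def sds_step(theta, rng, mu, s, lr=0.1, n_samples=64):
    """One score-distillation-style update with an identity 'renderer' x = g(theta) = theta."""
    t = rng.uniform(0.02, 0.98)                      # random noise level
    alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)      # variance-preserving: alpha^2 + sigma^2 = 1
    x = theta                                        # rendering; dx/dtheta is the identity here
    eps = rng.standard_normal((n_samples,) + theta.shape)
    z = alpha * x + sigma * eps                      # noised renderings
    w = 1.0                                          # weighting w(t), held constant for simplicity
    residual = gaussian_eps_pred(z, alpha, sigma, mu, s) - eps
    grad = w * residual.mean(axis=0)                 # Monte Carlo estimate, times dx/dtheta = I
    return theta - lr * grad                         # plain gradient descent

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0, 0.5])    # mean of the toy Gaussian "prior"
theta = rng.standard_normal(3)     # randomly initialised parameters
for _ in range(500):
    theta = sds_step(theta, rng, mu, s=0.5)
print(theta)  # ends up close to mu
```

In expectation the residual is proportional to (θ − mu), so the update pulls the parameters toward the prior's mode. In the real method θ would be the NeRF's weights, and ∂x/∂θ would come from backpropagating through the volume renderer.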

This seems like basically plugging together a couple of techniques that already existed, allowing one to turn 2D text-to-image generation into 3D generation.


> This seems like basically plugging a couple of techniques together that already existed [...]

In his Lex Fridman interview, John Carmack makes a similar assertion about the prospects for AGI: that it will likely be the clever combination of existing primitives (plus maybe a couple of novel ones) that makes the first AGI feasible in just a couple thousand lines of code.

That was a great interview. I really liked his perspective on how close we are to having AGI. His point is that there are only a few more things we need to figure out, and then it will basically happen.

I also liked the analogy he made with his earlier work on 2D and 3D graphics engines, where taking a few shortcuts basically put him on a path to success. For a while we had this "almost 3D" capability long before the hardware was ready to do 3D properly. It's the same with AGI. A few shortcuts will get us AI that is pretty decent and can already do some impressive things, as witnessed by the recent improvements in image generation. It's not general AI, but it has enough intelligence to produce photorealistic images that make sense to us. There's a lot of that happening right now, and just scaling it up is going to be interesting by itself.

That's a great example, and it reminds me of another one: there was nothing conceptually new about Bitcoin. It was all concepts we already had, just in a new combination: IRC, hashing, proof of work, distributed consensus, difficulty algorithms, you name it. Aside from Base58, there wasn't much that was original other than the combination of those elements.
Base58 really should have been base57.
Hello Stavros, I agree. When I look at the goal Base58 sought to achieve (eliminating visually similar characters), I couldn't help but wonder why more characters were not eliminated. There is quite a bit of typeface androgyny once you consider both case and font.
Yeah, I don't know why 1 was left in there; it seems like a lost opportunity. Discarding l, I, 0 and O, but then leaving 1? I wonder why.
I can only assume it was for a superstitious reason, so that the original address prefixes could be a 1. That's the only sense I can make of it.
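For reference, a minimal sketch of plain Base58 encoding with the alphabet Bitcoin uses (without the Base58Check checksum that addresses add). It bears on the guess about '1': because the alphabet starts at '1', that character is the zero digit, so every leading zero byte is written as '1'. Legacy pay-to-pubkey-hash addresses start with the version byte 0x00, which is where their leading '1' comes from; this doesn't say why '1' was kept in the alphabet. The function name `base_encode` and the `BASE57_ALPHABET` variant are illustrative only.

```python
# Bitcoin's Base58 alphabet: digits and letters minus 0, O, I and l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Hypothetical "Base57" alphabet that also drops '1', as suggested in the thread.
BASE57_ALPHABET = BASE58_ALPHABET.replace("1", "")

def base_encode(data: bytes, alphabet: str = BASE58_ALPHABET) -> str:
    """Encode bytes as a big-endian number written in base len(alphabet)."""
    base = len(alphabet)
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    # Leading zero bytes vanish in the integer conversion, so each one is
    # written explicitly as the alphabet's zero digit ('1' in Base58).
    n_zeros = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * n_zeros + "".join(reversed(digits))

print(len(BASE58_ALPHABET), len(BASE57_ALPHABET))   # 58 57
print(base_encode(bytes.fromhex("0000287fb4cd")))    # 11233QC4
```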
Billions of creatures with stronger neural networks, more parameters and better input have lived on Earth for millions of years, but only recently did something like humans show up. I fully expect AI to do everything animals can do pretty soon, but since whatever it is that differentiates humans didn't happen for millions of years, there's a good chance AGI research will get stuck at a similar point.
Nature has the advantage of self-organisation and (partially because of that) parallelism, which has proved hard to mimic in man-made devices. On the other hand, nature also faces obstacles such as energy consumption, procreation and development, and survival, which AI doesn't have to worry about.

I think finding a niche like the human one has proved difficult precisely because of those obstacles, and AI can clear those hurdles much more easily.

Change arrives gradually, and then suddenly.

It takes nature thousands of years to create a rock that looks like a face, just through geology. A human can do that in a couple of hours. And this AI can generate 50 3D human faces per second (assuming enough CPU).

It could be that AGI is around the corner, as they say. We might not be machines, but we are way faster than nature at getting places. We don't have the option of waiting thousands of years.

> This seems like basically plugging a couple of techniques together that already existed

as with a majority of ML research

True (I made such a proposal myself a few hours ago, albeit in vaguer terms). The thing is, deployment infrastructure is now good enough that we can treat these models as modular signal flows and experiment a lot without having to engineer a whole pile of custom infrastructure for each impulsive experiment.
Isn't that what the Singularity was described as a few decades ago? Progress so fast it's unpredictable even in the short term.
Same as it ever was: scientific revolutions arrive all at once, punctuating otherwise uneventful periods. As I understand it, the present one is the product of the paper "Attention Is All You Need": https://arxiv.org/pdf/1706.03762.pdf.
... that one has 52K citations, while the 2D-to-3D paper, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", has 1488.

https://arxiv.org/abs/2003.08934
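Since the thread is about how these pieces plug together: the "2D renderings from random angles" in the DreamFusion abstract come from NeRF-style volume rendering, where a pixel's colour is a weighted sum of colours sampled along the camera ray, with weights T_i·(1 − exp(−σ_i·δ_i)). A minimal NumPy sketch of just that compositing step, assuming the densities and colours have already been queried from the network (the MLP, positional encoding and ray sampling are omitted, and `render_ray` is a hypothetical name). Because every operation is differentiable, a loss on the rendered pixel can be backpropagated into the densities and colours.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite N samples along one camera ray into a single pixel colour.

    sigmas: (N,) densities, colors: (N, 3) RGB values, deltas: (N,) spacing between samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)        # opacity contributed by each segment
    optical_depth = np.cumsum(sigmas * deltas)
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j): fraction of light reaching sample i.
    trans = np.exp(-np.concatenate([[0.0], optical_depth[:-1]]))
    weights = trans * alphas
    return weights @ colors, weights

sigmas = np.array([0.0, 50.0, 5.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rgb, weights = render_ray(sigmas, colors, deltas=np.full(3, 0.1))
print(rgb)  # dominated by the dense middle (green) sample
```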

>as with a majority of ML research

Plus "we did the same thing, but with 10x the compute resources".

But yeah.

> This seems like basically plugging a couple of techniques together that already existed

Do this enough times and eventually the thing you have looks indistinguishable from something completely novel.

Time and time again these ML techniques are proving to be wildly modular and pluggable. Maybe sooner or later someone will build an end-to-end text-to-effective-ML-architecture framework that just plugs different things together and optimizes them.
I think this is what Hugging Face ("GitHub for machine learning") is attempting with the diffusers library: https://huggingface.co/docs/diffusers/index

They have others as well.

Fascinating stuff! But who is working on the text-to-ML-architecture thing?