by SilverBirch 1431 days ago
Can I ask, was there an underlying reason that people decided to pursue this image generation task, or is this literally just the result of throwing lots of tasks at different types of AI until you finally find one it seems to do well?

I don't mean to denigrate this; the results are clearly interesting. But I just don't understand what problem this solves. It just seems to raise the noise floor on reality.

5 comments

There is a real use case for this type of technology.

I'm the founder of https://ayvri.com, and we have a 3D virtual world where outdoor athletes watch their activities, and the activities of others.

As the resolution (and speed) of our 3D world improved, people got more interested and engaged with it.

I believe this is the future of video. Not volumetrically created through 20+ cameras, but with a single camera capturing the scene, and AI filling in the blanks based on what it knows.

Right now, a whole bunch of architectures are being discovered to be good for certain tasks.

At some point, there is going to be some sort of higher-level ML research into generating an architecture for a particular task, and all this research is going to feed into that.

When you look at GANs, I could see the data from this endeavor being used to improve their output. I get that it doesn't seem to have a direct application, but I suspect it is actually quite valuable to the media and entertainment space in the long run.
Probably an AI that could help us rewatch our dreams?
tl;dr - It’s a GAN. GANs have some interesting limitations, but they can output 1024px images in real time on a consumer GPU.

The training labels may have been “segmentation maps”. These mark out regions of an image, each tagged with a known scene label such as “cloud”, “trees”, or “sky”. I’m not certain what model they use, but I bet it is a StyleGAN2/3 modified to generate an image from a given set of segmentation masks.
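To make the segmentation map idea a bit more concrete, here is a minimal sketch in plain Python. It assumes a map is stored as a grid of integer class ids and that it gets split into one binary mask per class before being handed to a generator, which is a common convention but only my guess for this model. The class list and the `one_hot_segmentation` helper are made up for illustration.

```python
# Hypothetical label set; the real classes used by the model are not known here.
CLASSES = ["sky", "cloud", "trees"]


def one_hot_segmentation(seg_map, num_classes):
    """Turn an H x W grid of class ids into num_classes binary H x W masks."""
    return [
        [[1 if cell == c else 0 for cell in row] for row in seg_map]
        for c in range(num_classes)
    ]


# A tiny 3 x 4 "scene": mostly sky, a cloud near the top, trees along the bottom.
seg_map = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 2, 2, 2],
]

masks = one_hot_segmentation(seg_map, len(CLASSES))

# Each mask answers "is this pixel class X?" for one class.
for name, mask in zip(CLASSES, masks):
    print(name, mask)
```

Every pixel ends up set in exactly one mask, so a conditional generator would see one input channel per label telling it where that kind of content should go.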

Indeed, without the research context, it’s a little unclear why you would want a product like this. Nvidia has done a lot of research to get GANs to run very fast on their RTX cards, which works because GANs are mostly convolutional and operate directly in pixel (or wavelet) space rather than an embedding space. On my RTX 2070, I can run StyleGAN2 at 1024px at a somewhat reasonable 10 FPS.