by ShamelessC 1432 days ago
tl;dr - It’s a GAN. GANs have some interesting limitations but can output 1024px images in real time on a consumer GPU.

The training labels may have been “segmentation maps”: regions of an image tagged with a known scene description such as “cloud”, “trees”, or “sky”. I’m not certain what model they use, but I bet it is a StyleGAN2/3 modified to generate an image from a given set of segmentation masks.
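
To make “segmentation map” concrete, here is a minimal sketch in plain Python. It is not taken from Nvidia's code: the class list, the tiny grid, and the helper name `one_hot_masks` are all made up for illustration. It only shows the common trick of splitting a per-pixel label grid into one binary mask per class, which is the kind of input a mask-conditioned generator could plausibly take.

```python
# Hypothetical class list; the real product's label set is not known here.
CLASSES = ["sky", "cloud", "trees"]

def one_hot_masks(label_map, num_classes):
    """Turn an H x W grid of class ids into num_classes binary H x W masks."""
    return [
        [[1 if cell == c else 0 for cell in row] for row in label_map]
        for c in range(num_classes)
    ]

# A tiny 3x4 "segmentation map": each cell holds an index into CLASSES.
label_map = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
]

masks = one_hot_masks(label_map, len(CLASSES))

# Print each mask under its class name.
for name, mask in zip(CLASSES, masks):
    print(name)
    for row in mask:
        print(" ", row)
```

Each pixel ends up set in exactly one mask, so the stack of masks carries the same information as the original label grid, just in a form a convolutional network can consume channel by channel.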

Indeed, without the research context, it’s a little unclear why you would want a product like this. Nvidia has done a lot of research to get GANs running very fast on their RTX cards. GANs suit this because they are mostly convolutional and operate directly in pixel (or wavelet) space rather than an embedding space. On my RTX 2070, I can run StyleGAN2 at 1024px at a somewhat reasonable 10 FPS.