by bambax 1381 days ago
Good article, although "unbundling" is an odd choice of word. Before AI, the execution step was necessary, but not necessarily attached to the "coming up with an idea" step. Conceptual artists have for years (centuries?) produced ideas that were executed by others.

AI makes the execution step near-instantaneous and almost free. It kind of erases it. It's more disintermediation than unbundling. Execution artists are being removed as gatekeepers.

One question remains: if an AI generator produces deterministic, consistent output for a given (prompt, seed, model) triple, does that mean that all possible images are somehow contained in the model?

If there was a Borgesian book containing, say, all possible 512x512 images, one on each page, then surely two people having the same copy of that book wouldn't need to exchange images, they could simply exchange page numbers, and see exactly what the other one is referring to.

If we are now able to exchange prompts and seeds and get a predictable, consistent result out of SD, isn't that what we're doing?

3 comments

> If there was a Borgesian book containing, say, all possible 512x512 images, one on each page, then surely two people having the same copy of that book wouldn't need to exchange images, they could simply exchange page numbers, and see exactly what the other one is referring to.

If that were the case, the page number would be as long as the image, so exchanging one is the same as exchanging the other. Heck, with the pages in the right order, the page number IS the image, in a known format.

In other words, the information content of such a book is exactly 0.
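To make the "page number IS the image" point concrete, here is a minimal sketch. It assumes the book is ordered by reading each image's raw pixel bytes as one big base-256 number, and that a 512x512 image is RGB at 8 bits per channel; the function names are made up for illustration.

```python
def image_to_page(pixels: bytes) -> int:
    """Read the raw pixel bytes as one big-endian base-256 number."""
    return int.from_bytes(pixels, "big")


def page_to_image(page: int, n_bytes: int) -> bytes:
    """Invert image_to_page: write the page number back out as pixel bytes."""
    return page.to_bytes(n_bytes, "big")


# A tiny 2x2 8-bit grayscale "image" stands in for the full-size case.
tiny = bytes([0, 255, 128, 7])
page = image_to_page(tiny)
assert page_to_image(page, len(tiny)) == tiny  # lossless round trip

# Assumed format for the full-size case: 512x512, RGB, 8 bits per channel.
N_BYTES = 512 * 512 * 3
PAGE_NUMBER_BITS = N_BYTES * 8  # the page number needs as many bits as the image
```

Under that ordering the page number and the pixel data are the same bits, so sending the page number saves nothing.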

What could contain information is a book of all 512x512 images that a human being would perceive as being "an image". I.e., the vast majority of possible 512x512 images look like random noise to humans. Excluding those massively shrinks the size of the book.

So that does mean image models like DALL-E/SD are effectively a compression scheme over the space of images they can generate (a space that at least attempts to emulate the space of images "meaningful" to a human). Given a prompt and seed, they'll deterministically produce the same image, and that prompt and seed are much smaller than the information needed to describe every pixel in the image.
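A rough back-of-the-envelope sketch of the size gap. The 64-bit seed width, the RGB 8-bit format, and the example prompt are assumptions chosen for illustration, not properties of any particular model; it also assumes both sides already share the same model weights, which play the role of the book.

```python
# Assumed raw image format: 512x512, RGB, 8 bits per channel.
IMAGE_BITS = 512 * 512 * 3 * 8


def key_bits(prompt: str, seed_bits: int = 64) -> int:
    """Bits needed to send a (prompt, seed) pair; seed width is an assumption."""
    return len(prompt.encode("utf-8")) * 8 + seed_bits


prompt = "a lighthouse at dusk, oil painting"  # hypothetical prompt
k = key_bits(prompt)
ratio = IMAGE_BITS / k  # how many times smaller the key is than the raw pixels

print(f"raw image: {IMAGE_BITS} bits, key: {k} bits, ratio: {ratio:.0f}x")
```

The flip side is that a k-bit key can address at most 2^k images, so only a vanishing fraction of all possible pixel grids is reachable this way.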

Ah yes, conceptually, SD is a filter that removes noise (~ things that look like noise to us) from the space of all possible pixel combinations. It's interesting that it's also how it works in practice.

> does that mean that all possible images are somehow contained in the model?
That would depend largely on what kinds of non-linearities the model is using, particularly towards the later steps. It's entirely possible for there to exist spaces that a given trained model cannot output. It's unlikely those spaces are particularly interesting: perhaps a pixel-by-pixel checkerboard of pure black and white, for example.

A perfectly linear model with linearly-independent matrix columns could generate any possible value, but would be exactly equivalent to a single vector-matrix multiplication, unable to do any interesting multi-step reasoning.
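As a small illustration of why stacked linear layers collapse, here is a sketch in plain Python with made-up 2x2 integer weights. Two linear steps give the same result as one multiplication by the product matrix, and inserting a ReLU (one example of a non-linearity) between them breaks that equivalence.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical weights and input, chosen only to keep the arithmetic small.
W1 = [[1, 2], [3, 4]]
W2 = [[0, 1], [1, 1]]
x = [[5, -2]]  # a single row vector

two_steps = matmul(matmul(x, W1), W2)  # "deep" but purely linear
one_step = matmul(x, matmul(W1, W2))   # one pre-multiplied matrix
assert two_steps == one_step

# With a ReLU between the layers, the single-matrix shortcut no longer matches.
hidden = [[max(0, v) for v in row] for row in matmul(x, W1)]
with_relu = matmul(hidden, W2)
assert with_relu != one_step
```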

At least as it works now, on a given system, you can generate an identical image given the exact same prompt and the same random seed. That is assuming you are using the exact same software version and model version and the exact same graphics card.
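The reproducibility claim can be illustrated with a toy stand-in. This is not Stable Diffusion: `toy_generate` is a hypothetical function that just seeds a pseudo-random generator from (model version, seed, prompt). It shows the same contract, where identical inputs give identical bytes, while skipping the hardware and floating-point details that the graphics-card caveat is about.

```python
import hashlib
import random


def toy_generate(prompt: str, seed: int, model_version: str,
                 width: int = 8, height: int = 8) -> bytes:
    """Toy stand-in for an image generator: deterministic in all its inputs."""
    key = f"{model_version}|{seed}|{prompt}".encode("utf-8")
    # Derive one integer from all inputs so that changing any of them
    # changes the pseudo-random stream.
    derived = int.from_bytes(hashlib.sha256(key).digest(), "big")
    rng = random.Random(derived)
    # One 8-bit grayscale value per pixel.
    return bytes(rng.randrange(256) for _ in range(width * height))


a = toy_generate("a cat", seed=42, model_version="v1")
b = toy_generate("a cat", seed=42, model_version="v1")
c = toy_generate("a cat", seed=43, model_version="v1")
assert a == b  # same prompt, seed, and model version: same image
assert a != c  # a different seed gives a different image
```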

Plus, I can't imagine it ever becoming faster to generate a 512x512 image using AI than transmitting the full image over a network.