by IdiotSavage 51 days ago
> Transform this image into a photographed claymation diorama of assorted artisan chocolates and candies […] viewed from a low-angle

Side note: whenever I read prompts for image generation, I notice very specific details which the model obviously ignored. Here the chocolates / candies in the last two images look anything but artisanal. They look very "sterile" and mass-produced. The viewing angle is also not accurate.

Why do we even bother writing such elaborate prompts, when the model ignores most of it anyway?

6 comments

I loved the example where he requested ‘studio lighting’ and it put a bunch of studio lights in the picture.
The candies aren’t trying to look artisanal, they’re trying to match training data marketed and labelled by companies as artisanal.

Rustic, homemade, amateur, etc. might align better with the tagging.

I have noticed the same thing. The few times I wanted to use image generation, it failed me in exactly these aspects. I always put it down to a lack of prompting skill on my end. Once you start to keep an eye out for these inconsistencies, they turn out to be very common.
I believe most detailed prompts are AI generated.
This is 100% true. There are entire nodes/pipelines in programs like ComfyUI that are designed to take a simple prompt and "enhance it", which usually involves making it more verbose, adding detail, etc., depending on the target model.

  Original Prompt: "Man with Trapezoid Head"
  
  AI Expansion: 
  Portrait of a man with a trapezoid-shaped head, sharp geometric facial structure, angular jawline wider at the top and narrowing toward the chin, realistic skin texture, detailed pores, dramatic studio lighting, ultra-detailed, 85mm lens, shallow depth of field, dark neutral background, cinematic, photorealistic, 8k resolution.

Note: Most people (outside the generative space) won’t pick up on this, but in many cases, if you don't prompt otherwise, you’ll end up with a prompt that’s better suited to older, keyword‑based models like Stable Diffusion, which rely heavily on specific sets of positive and negative prompt keywords, more akin to magical incantations, to improve the output.
Yes, this is exactly the kind of prompt I often see. Then you have stuff like "8k resolution". WTF? The output is fixed anyway.
That's funny if it's true. I'd like to see the prompt which generates the prompt.
There might be an overlap between people who use AI enough to write such posts, and people who don’t respect craftsmanship. The output looks fine to them because they never trained their eye to look closer. They vaguely hear music but never listen to the notes.
I wonder how long it took to come up with all this?

Because if I wanted a spiral of little "buttons" like the last one at the end (and they don't look very much like sweets) I'd be able to knock that out in Blender in an afternoon, and I'm not very good at Blender.

I think you're vastly overestimating the average person's ability to use Blender if you can do that in an afternoon; just figuring out how to place a colored cube and the camera probably takes an afternoon if you pick up Blender for the first time.
Yeah, I've bounced off Blender twice now. And I've written a (basic) 3D modeller.

I think part of the problem is that pretty much all the tutorial material for Blender seems to be in video form, which is easily my least effective way to learn, even leaving aside the "I've only got one screen" issue.

And learning these little tricks to get what you want from image generation models also takes time. Not to mention you need some knowledge of some other software just to make the underlying layout.
I guess I'm coming at it from having used Blender for an afternoon or so, and already knowing Python.

If you were good at GLSL you could do it in that maybe.

Someone somewhere is going to write something that directly draws it to a framebuffer in Brainfuck, you just know it, don't you?

I'd just do it with Pillow.
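
For anyone curious what the Pillow route might look like, here is a rough sketch, not anyone's actual code from this thread. It places flat coloured discs along an Archimedean spiral; the function names, sizes, spacing and colours are all made up for illustration, and the result is a plain 2D layout, nowhere near a claymation render.

```python
import math


def spiral_points(count, center=(256, 256), turn_gap=4.0, arc_gap=24.0):
    """Return `count` (x, y) centres along an Archimedean spiral r = turn_gap * theta.

    theta is chosen so that consecutive points are roughly `arc_gap` pixels
    apart along the curve (arc length of this spiral is about
    turn_gap * theta**2 / 2 once theta is large). Near the centre the
    spacing is only approximate.
    """
    cx, cy = center
    points = []
    for i in range(count):
        theta = math.sqrt(2.0 * i * arc_gap / turn_gap)
        r = turn_gap * theta
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


def render(path="spiral.png", size=512, count=60, radius=10):
    """Draw the spiral of 'buttons' and save it as an image (needs Pillow)."""
    # Imported here so the geometry above works without Pillow installed.
    from PIL import Image, ImageDraw

    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    # Hypothetical palette, picked only so neighbouring discs differ.
    palette = ["#6b3e26", "#d9a066", "#c0392b", "#f5e6c8"]
    centre = (size / 2, size / 2)
    for i, (x, y) in enumerate(spiral_points(count, center=centre)):
        box = (x - radius, y - radius, x + radius, y + radius)
        draw.ellipse(box, fill=palette[i % len(palette)], outline="black")
    img.save(path)
```

Calling `render()` should write `spiral.png` to the current directory. The geometry lives in its own function so the same points could feed an SVG or a Blender script instead.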
OP here. It took me an afternoon to try different methods and test the limits. But now that we know how it works, it’s very fast to create new ones:

1. Prompt to make SVG - review in browser, iterate.

2. Prompt to write image prompt - review in editor, refine

3. Send to Gemini, get image

So maybe 5-10 mins.

I don’t know how to use Blender.

Also, this method can be done over WhatsApp/Telegram, which is another plus over a Blender-type approach.

I remember opening Blender for the first time years ago and thinking it had the steepest learning curve of any software I'd ever used.
It's not perfect, but it's been vastly improved in recent years. If you lost interest in 3D art because of Blender's bad UX in the past, I recommend you give it another shot.

Also, there might be other new 3D software with better UX. I am not a Blender fanboy, but I do love 3D art and graphics programming and want as many people as possible to get into it :^)

Yeah somewhere between 2 and 3 it got a very much improved UI.
I think 2.8 is where the Blender Foundation showed their commitment to improving the UX, and it has kept improving from there.