The headline (currently: “Trying to craft AI images that are worth displaying to end users”) is misleading and changed from the original. The author isn’t crafting any AI images; they’re using AI in tandem with manual work to help choose from a set of human-authored images.
Ok! That was my attempt to avoid linkbait and make the title less provocative (submitted title was "How to design an AI app with a sense of taste"). But I missed the mark this time, so I've reverted the title to the article's own headline, except I'm not going to keep the word 'beautiful' up there since that would be certain to provoke shallow objections.
The title on HN is incorrect/misleading: they are not generating AI images. They are hand-curating a database of images by location and using an LLM to pick the pictures.
Speaking of "crafting", I think this is the perfect word to describe something more than "prompting".
It's extremely hard to block out a scene with just words, e.g., "rotate hand 45 degrees, stand perpendicular to the column, shadows from light source 60 degrees above horizon, large box in front of chest, approximately 2 feet wide", etc.
Image-to-image, ControlNets, previz-to-final, etc. are the way to go, and I'm convinced this is the core interface for image and video creation. Text prompts will get you a coarse-grained first approximation, which you then visually adjust to your exact needs with UI/UX-first models.
I built an intentional "crafting" engine so people could mold images like clay, with full intention:
This is really early days though. I expect more tools and models to enable you to fully manipulate everything first-class, in 2D/3D. As if everything in an image were mutable.
As a film director, this is really exciting stuff.