Hacker News
by echelon 144 days ago
Speaking of "crafting", I think this is the perfect word to describe something more than "prompting".

It's extremely hard to block out a scene with just words, e.g., "rotate hand 45 degrees, stand perpendicular to the column, shadows from light source 60 degrees above horizon, large box in front of chest, approximately 2 feet wide", etc.

Image-to-image, ControlNets, previz-to-final, etc. are the way to go, and I'm convinced this is the core interface for image and video creation. Text prompts will get you a coarse-grained first approximation, which you then visually adjust to your exact needs with UI/UX-first models.

I built a "crafting" engine so people could mold images like clay, with full intention:

https://github.com/storytold/artcraft

It's still really early days, though. I expect more tools and models that let you manipulate everything as a first-class object, in 2D or 3D, as if everything in an image were mutable.

As a film director, this is really exciting stuff.