|
|
|
|
|
by petercooper
38 days ago
|
|
Because image models at the basic level are just text tokens in, image tokens out. You'd need an agentic process on top to come up with a strategy, review output, try again, and so on. I believe Nano Banana and gpt-image-2 have a little of this going on, but it's like asking a model to one-shot some code vs having an agentic harness with tools do it. Even the most basic agent can produce better code than ChatGPT can. |
|