by losvedir
34 days ago
The question was whether you were giving it the rendered image and using the model's vision capability, or just feeding the SVG text back in. It's hard to "imagine" what an SVG will look like once rendered, for both humans and LLMs, so iterating on the text alone isn't as useful a test. But if you show it what it rendered, it might notice the bad-looking bicycle and be able to fix the text that way.
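
Roughly the loop I have in mind, as a sketch only. `refine_svg`, `render`, and `revise` are hypothetical names, not any real API: `render` stands in for whatever rasterizes the SVG to an image, and `revise` for a call to a vision-capable model that gets both the SVG text and the rendered image.

```python
from typing import Callable


def refine_svg(
    svg: str,
    render: Callable[[str], bytes],        # hypothetical: SVG text -> image bytes
    revise: Callable[[str, bytes], str],   # hypothetical: (SVG text, image) -> new SVG text
    rounds: int = 3,
) -> str:
    """Iterate on SVG text, showing the model the rendered image each round."""
    for _ in range(rounds):
        image = render(svg)            # rasterize the current SVG text
        new_svg = revise(svg, image)   # model sees the text and what it rendered to
        if new_svg == svg:             # model made no changes, so stop early
            break
        svg = new_svg
    return svg
```

The text-only variant is the same loop with the `image` argument dropped, which is the comparison I was asking about.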