|
|
|
|
|
by theptip
151 days ago
|
|
Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ascii schematics (probably very few of these in the corpus). |
|
There are some potential solutions to this problem that come to mind. Use subagents to isolate the interesting bits about a screenshot and only feed that to the main agent with a summary. This will all still have a significantly higher token usage compared to a text based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.