

	by theptip 151 days ago
	Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ASCII schematics (probably very few of these in the corpus).


cheema33 150 days ago

Computer use and screenshots are context-intensive. Text is not. The more context you give to an LLM, the dumber it gets. Some people think that at 40% context utilization, the LLM starts to get into the dumb zone. That is where the limitations are as of today. This is why CLI-based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside.

A few potential solutions to this problem come to mind. One is to use subagents to isolate the interesting bits of a screenshot and feed only those to the main agent, along with a summary. This will still have significantly higher token usage than a text-based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.


fragmede 150 days ago

> And any attempt at computer use has fallen by the wayside.

You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

https://claude.com/blog/cowork-research-preview

https://news.ycombinator.com/item?id=46593022

More to the point, though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context and return only the salient details. E.g., I have an Agent that runs "make" and returns the return status and just the first error message, if there is one. This means the hundreds or thousands of lines of compilation output don't pollute the main Claude Code context, letting me get more builds in before I run out of context there.
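The comment doesn't show how that Agent is defined, so here is only a hypothetical sketch of the kind of helper such an agent could call: it runs `make`, keeps the full log out of sight, and returns just the status plus the first error line. The names `run_make` and `summarize_build`, and the regex heuristic for spotting an "error" line, are assumptions for illustration, not a Claude Code feature.

```python
import re
import subprocess
from typing import Optional

# Heuristic (an assumption, tune it for your toolchain): treat the first line that
# contains the standalone word "error" as the first error message. This matches
# lines like "a.c:3:1: error: ..." and make's own "Error 1", but not "-Werror".
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)


def summarize_build(log: str, returncode: int) -> str:
    """Reduce a full build log to the return status plus the first error line, if any."""
    summary = f"exit status: {returncode}"
    if returncode != 0:
        # Only look for an error message when the build actually failed, so a
        # successful build never reports stray matches (e.g. a file named error.c).
        first_error = next(
            (line.strip() for line in log.splitlines() if ERROR_RE.search(line)),
            None,
        )
        if first_error is not None:
            summary += f"\nfirst error: {first_error}"
    return summary


def run_make(target: Optional[str] = None) -> str:
    """Run make, capture all of its output, and return only the short summary."""
    cmd = ["make"] if target is None else ["make", target]
    # Merge stderr into stdout so compiler errors and make's messages stay in order.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return summarize_build(proc.stdout, proc.returncode)
```

Calling `run_make("all")` (assuming `make` is on the PATH) yields one or two lines instead of the whole compiler log, which is the part worth handing back to the main context.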


cheema33 143 days ago

>> And any attempt at computer use has fallen by the wayside.

> You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

Claude Cowork does not do "computer use" in the traditional sense. For example, it cannot use your computer to drive the interface of Adobe Premiere. It does not take screenshots of your desktop the way a traditional "computer use" product does.


Jaysobel 151 days ago

I had tried the browser screenshotting feature for agents in Cursor and found it wasn't very reliable: screenshots eat a lot of context, and the agent didn't have a good sense of when to use them. I didn't try it in this project. I bet it would work in some specific cases.


nanapipirara 150 days ago

Claude helped me immensely in getting an image converter to work. Giving it screenshots of the wrong output (lots of layers had unpredictable offsets that were not supposed to be there) alongside the output I expected helped Claude understand the problems, and it fixed the bugs immediately.


deepl_y 149 days ago

I'm not sure if this proves anything, but I saw this article about Opus playing Pokémon; there it was given actual screenshots, and the article still says it navigated visual space pretty poorly despite the advancements: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...
