by aaronharnly 513 days ago
Relatedly, we find LLM vision models absolutely atrocious at counting things. We build school curricula, and one basic task for our activities is counting – blocks, pictures of ducks, segments in a chart, whatever. Current models can't reliably count four or five squares in an image.
1 comment

IMHO, that is expected, at least for the general case.

That is one of the implications of transformers being in DLOGTIME-uniform TC0: they don't have access to counter analogs.
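
For anyone who hasn't met the class: TC0 is constant-depth, polynomial-size circuits built from unbounded fan-in AND/OR/NOT gates plus threshold gates. A rough Python sketch of a single threshold gate, only to pin down the definition (the function names are my own, not from any library or from Immerman):

```python
def threshold_gate(bits, k):
    """Return 1 if at least k of the input bits are 1, else 0."""
    return 1 if sum(bits) >= k else 0


def majority_gate(bits):
    """Majority: the threshold gate with k set to strictly more than half the inputs."""
    return threshold_gate(bits, len(bits) // 2 + 1)


if __name__ == "__main__":
    print(threshold_gate([1, 0, 1, 1], 3))
    print(majority_gate([1, 1, 0, 0]))
```

Each gate only compares the number of 1 inputs against a fixed threshold k, and the circuit depth stays constant as the input grows.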

You would need to move to log-depth circuits, add mod-p_n gates, etc., unless someone finds some new mathematics.

If you want a cite, Proposition 6.14 in Immerman is where this is lost.

It may seem counterintuitive that division is in TC0 but (general) counting is not.