Hacker News
by ashed96 223 days ago
In my experience, LLMs tend to take noticeably longer to process images than text.
2 comments

It has to fetch the image data first, which is basically just I/O time before any processing starts.
IIRC there's pre-processing (embedding/tokenization?) before feeding images to LLMs?
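A rough sketch of what that pre-processing can look like, assuming a ViT-style vision encoder that cuts the image into fixed-size square patches and turns each patch into one token. The 14-pixel patch size and the function names are hypothetical illustrations, not the values or API of any particular model. The point is that the token count grows with pixel area, so a large image can cost far more tokens than a paragraph of text.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values (hypothetical format)


def patchify(img: Image, patch: int = 14) -> List[List[float]]:
    """Split an H x W image into flattened patch x patch tiles, zero-padding the edges.

    In a ViT-style encoder each tile would then be linearly projected to one embedding
    (one "image token"); that projection step is omitted here.
    """
    h, w = len(img), len(img[0])
    rows, cols = math.ceil(h / patch), math.ceil(w / patch)
    patches = []
    for r in range(rows):
        for c in range(cols):
            tile = []
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    # Pixels past the image border are padded with 0.0.
                    tile.append(img[y][x] if y < h and x < w else 0.0)
            patches.append(tile)
    return patches


def image_token_count(width: int, height: int, patch: int = 14) -> int:
    """Number of patch tokens for an image, counting partial patches as whole ones."""
    return math.ceil(width / patch) * math.ceil(height / patch)


if __name__ == "__main__":
    for side in (512, 1024):
        print(side, image_token_count(side, side))
```

Under these assumptions, doubling the side length roughly quadruples the number of image tokens the model has to process.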

Hit this issue while optimizing LLM request times. Ended up lowering the image resolution. Lost some accuracy, but I could live with that.
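One way to pick the lower resolution, sketched under the same hypothetical patch-token model (14-pixel patches, one token per patch; `fit_to_budget` and `tokens_for` are illustrative names, not a provider's API): scale the image down, keeping its aspect ratio, until its estimated token count fits a chosen budget. The actual resize would be done with whatever image library you already use; only the target size is computed here.

```python
import math
from typing import Tuple


def tokens_for(width: int, height: int, patch: int = 14) -> int:
    """Estimated image tokens under the hypothetical one-token-per-patch model."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def fit_to_budget(width: int, height: int, max_tokens: int, patch: int = 14) -> Tuple[int, int]:
    """Return a (width, height), aspect ratio roughly preserved, within max_tokens."""
    if tokens_for(width, height, patch) <= max_tokens:
        return width, height  # already small enough, keep full resolution
    # Token count scales with area, so scale each side by sqrt(budget / current area).
    scale = math.sqrt(max_tokens * patch * patch / (width * height))
    w = max(patch, int(width * scale))
    h = max(patch, int(height * scale))
    # Rounding partial patches up can overshoot the budget; shrink by ~1% until it fits.
    while tokens_for(w, h, patch) > max_tokens and (w > patch or h > patch):
        w = max(patch, int(w * 0.99))
        h = max(patch, int(h * 0.99))
    return w, h


if __name__ == "__main__":
    w, h = fit_to_budget(2048, 1536, max_tokens=1000)
    print(w, h, tokens_for(w, h))
```

The budget is the knob that trades latency against accuracy: fewer tokens means faster requests but less visual detail for the model.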

I wonder whether these stay in the prefix cache.
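Whether a given provider or serving engine actually caches image tokens isn't settled in this thread. Conceptually, though, a prefix cache can cover images if each image is hashed by its raw bytes into the same chained key as the text before it. Below is a minimal sketch of that idea. `PrefixCache` and `prefix_keys` are hypothetical names, and matching whole prompt segments (rather than fixed-size token blocks) is a simplification.

```python
import hashlib
from typing import List, Set, Union

Segment = Union[str, bytes]  # str = text chunk, bytes = raw image data


def prefix_keys(prompt: List[Segment]) -> List[bytes]:
    """Chained hashes: key i identifies the whole prompt prefix up to segment i."""
    keys, prev = [], b""
    for seg in prompt:
        # Tag text vs. image so identical bytes in different modalities don't collide.
        tag, data = (b"T", seg.encode("utf-8")) if isinstance(seg, str) else (b"I", seg)
        prev = hashlib.sha256(prev + tag + data).digest()
        keys.append(prev)
    return keys


class PrefixCache:
    """Tracks which prompt prefixes have (hypothetically) cached encoder/KV state."""

    def __init__(self) -> None:
        self._cached: Set[bytes] = set()

    def insert(self, prompt: List[Segment]) -> None:
        self._cached.update(prefix_keys(prompt))

    def matched_segments(self, prompt: List[Segment]) -> int:
        """How many leading segments could reuse cached work instead of recomputing."""
        n = 0
        for key in prefix_keys(prompt):
            if key not in self._cached:
                break
            n += 1
        return n
```

Usage: after caching `["You are helpful.", image_bytes, "What is in this image?"]`, a follow-up that reuses the same system prompt and the same image bytes with a different question matches the first two segments, so the image would not need re-encoding. A different image, or any change before it, breaks the match from that point on.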