by ashirviskas 158 days ago
I mean, this is exactly what it is: just a wrapper that replaces the tokenizer. That is exactly how LLMs can read images.

I'm just focusing on different parts.
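
To make the "wrapper that replaces the tokenizer" idea concrete, here is a toy sketch, not any real model's code. Text goes tokenizer -> embedding table; an image skips the tokenizer and gets projected straight into vectors of the same width, so the model sees one sequence either way. All names and sizes here (`EMBED_DIM`, `PATCH`, `embed_text`, `embed_image`, the random weights) are made up for illustration, and real systems typically use a trained vision encoder rather than a single linear projection.

```python
import random

EMBED_DIM = 4  # hypothetical model width
PATCH = 2      # hypothetical patch side length, in pixels

random.seed(0)

# Text path: the tokenizer yields ids, and the ids index an embedding table.
VOCAB = {"a": 0, "cat": 1}
token_table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed_text(words):
    """Look up one embedding vector per known word."""
    return [token_table[VOCAB[w]] for w in words]


# Image path: no tokenizer. Each flattened patch is projected to EMBED_DIM.
proj = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
        for _ in range(PATCH * PATCH)]


def embed_image(image):
    """Split a 2D grayscale image into PATCH x PATCH patches and project each."""
    vectors = []
    for r in range(0, len(image), PATCH):
        for c in range(0, len(image[0]), PATCH):
            flat = [image[r + i][c + j]
                    for i in range(PATCH) for j in range(PATCH)]
            vectors.append([
                sum(flat[k] * proj[k][d] for k in range(PATCH * PATCH))
                for d in range(EMBED_DIM)
            ])
    return vectors


image = [
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
]

# Both paths produce vectors of the same width, so they can share one sequence.
sequence = embed_image(image) + embed_text(["a", "cat"])
```

The 4x4 image becomes four patch vectors, followed by two text vectors. Downstream, nothing needs to know which ones came from pixels.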