Hacker News
by pwatsonwailes 302 days ago
Vision language models (VLMs). Basically an LLM plus a vision encoder, so the LLM can take images as input alongside text.
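
To make the "LLM plus a vision encoder" idea concrete, here is a toy sketch of one common way the two get wired together: the image is cut into patches, each patch is projected into the same embedding width the LLM uses, and those image vectors are placed in front of the text token embeddings. Everything here is an assumption for illustration: the names (`patchify`, `encode_image`, `build_llm_input`), the sizes, and the random weights standing in for trained ones. Real models use a proper vision transformer and a trained projector, and not all of them combine the two this way.

```python
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

PATCH = 2    # patch side length in pixels (toy value, hypothetical)
D_MODEL = 4  # embedding width of the toy "LLM" (toy value, hypothetical)


def patchify(image, patch=PATCH):
    """Split a 2D grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def make_linear(n_in, n_out):
    """Random weight matrix standing in for learned parameters."""
    return [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]


def apply_linear(vec, weights):
    """Multiply a vector by a weight matrix (n_in x n_out)."""
    n_out = len(weights[0])
    return [sum(v * weights[i][j] for i, v in enumerate(vec)) for j in range(n_out)]


# Stand-in for the vision encoder plus projector: one linear map per patch.
VISION_W = make_linear(PATCH * PATCH, D_MODEL)

# Stand-in for the LLM's token embedding table, filled lazily.
TEXT_EMBED = {}


def encode_image(image):
    """Turn an image into a list of D_MODEL-wide 'image token' vectors."""
    return [apply_linear(p, VISION_W) for p in patchify(image)]


def embed_text(tokens):
    """Look up (or create) a D_MODEL-wide vector for each text token."""
    for tok in tokens:
        if tok not in TEXT_EMBED:
            TEXT_EMBED[tok] = [random.uniform(-1, 1) for _ in range(D_MODEL)]
    return [TEXT_EMBED[tok] for tok in tokens]


def build_llm_input(image, tokens):
    """Image vectors first, then text vectors: the sequence the LLM would read."""
    return encode_image(image) + embed_text(tokens)


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    sequence = build_llm_input(image, ["what", "is", "this"])
    print(len(sequence), "vectors of width", len(sequence[0]))
```

The point of the sketch is just that, from the LLM's side, an image ends up as a handful of extra vectors in its input sequence, the same shape as the text ones.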