Hacker News
by NullCascade 301 days ago
Considering that most SOTA LLMs are also multimodal/vision models, could they get better results if they were given visual feedback as well?