Hacker News
by gruez 29 days ago
I thought all the recent models were "multimodal"? Is the image part just an image recognizer stuck in front of the text model?