Hacker News
by efavdb 213 days ago
Are you suggesting using the CLIP embedding of the text as features to train a standard ML model on?

I think they're suggesting doing that with BERT for text and CLIP for images, which in my experience is indeed quite effective (and easy and fast).
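A minimal sketch of the "frozen embeddings as features" idea, assuming the embeddings have already been computed once by a frozen encoder. In practice they might come from something like Hugging Face's `CLIPModel.get_text_features` / `get_image_features` or a BERT encoder. Here the hypothetical `fake_embeddings` helper generates synthetic vectors in their place so the example runs offline, and a plain NumPy logistic regression stands in for the "standard ML model"; any off-the-shelf classifier would fill the same role.

```python
import numpy as np

def fake_embeddings(n_per_class, dim, rng):
    """Hypothetical stand-in for frozen CLIP/BERT embeddings of a 2-class dataset.

    Each class is a random direction plus Gaussian noise; rows are then
    L2-normalised, a common preprocessing step for CLIP features.
    """
    centers = rng.normal(size=(2, dim))
    X = np.concatenate(
        [centers[c] + 0.5 * rng.normal(size=(n_per_class, dim)) for c in (0, 1)]
    )
    y = np.repeat([0, 1], n_per_class)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, y

def train_logreg(X, y, lr=1.0, epochs=500, l2=1e-3):
    """Plain batch gradient descent on L2-regularised logistic loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y=1)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict(X, w, b):
    """Threshold the logit at 0, i.e. P(y=1) at 0.5."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
X, y = fake_embeddings(n_per_class=200, dim=512, rng=rng)

# Shuffle, then hold out a test split; the embeddings themselves are never fine-tuned.
perm = rng.permutation(len(y))
train, test = perm[:300], perm[300:]

w, b = train_logreg(X[train], y[train])
accuracy = np.mean(predict(X[test], w, b) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

The design point is that the expensive encoder runs once per item, after which the classifier is cheap to train and retrain; swapping in scikit-learn's `LogisticRegression` or a gradient-boosted model would not change the overall shape.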

That said, there have been some recent developments for images of text and other non-photographic images, for instance from Meta (although they seem unsure what exactly their AI division is called): https://arxiv.org/abs/2510.05014, and from Qihoo360: https://arxiv.org/abs/2510.27350.

I think he is. I do that kind of thing a lot.