Hacker News
by Eisenstein 714 days ago
> My guess is that the systems are running image recognition models

Your guess is incorrect. Look up CLIP, BLIP, and SigLIP for an idea of how they work.

1 comment

Will do, thank you.