by lxe 1180 days ago
Ah, yes, that's right. Well, technically they do use a transformer for the CLIP text encoder, as I understand it (the vision transformer, ViT, is CLIP's image encoder).