Hacker News new | ask | show | jobs
by stealthcat 700 days ago
Marketed as “multimodal” but actually texts and images.

Multimodal dataset should be multimedia: text, audio, images, video, and optionally more like sensor readings and robot actions.