Hacker News
by treesciencebot 518 days ago
For traditional LLMs this might be true (especially large MoEs at bs=1), but I strongly disagree with the "multi-modal models" part, since most models that output other modalities are compute bound. That means fewer FLOPs will make the experience much worse (imagine waiting a couple of minutes for an image and hours for a video).
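To make the memory-bound vs. compute-bound distinction concrete, here is a rough roofline-style sketch. Everything in it is an assumption for illustration: the hardware numbers (`PEAK_FLOPS`, `MEM_BW`) are hypothetical round figures rather than a specific chip, the function names are made up, and the model is simplified to dense matmuls where each weight is read once per forward pass and costs about 2 FLOPs per token. It ignores activations, attention, and KV-cache traffic.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Assumes ~2 FLOPs (multiply + add) per parameter per token, and that
    each parameter is read from memory once per pass (fp16 = 2 bytes).
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def bound(tokens_per_pass: int, peak_flops: float, mem_bandwidth: float,
          bytes_per_param: float = 2.0) -> str:
    """Classify a workload by comparing its intensity to the machine balance."""
    machine_balance = peak_flops / mem_bandwidth  # FLOPs the chip can do per byte moved
    ai = arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "compute-bound" if ai > machine_balance else "memory-bound"


# Hypothetical accelerator: 1 PFLOP/s peak, 3 TB/s memory bandwidth.
PEAK_FLOPS = 1e15
MEM_BW = 3e12

# LLM decoding at bs=1: one token per pass over the weights.
print(bound(1, PEAK_FLOPS, MEM_BW))
# Image model step over an assumed 4096 latent tokens per pass.
print(bound(4096, PEAK_FLOPS, MEM_BW))
```

Under these assumptions the bs=1 decode case sits far below the machine balance (bandwidth limits it), while the many-tokens-per-pass case sits above it, so cutting FLOPs directly lengthens generation time.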