Hacker News
by be7a 261 days ago
The biggest takeaway is that they claim SOTA for multi-modal stuff, even ahead of proprietary models, and still released it as open weights. My first tests suggest this might actually be true; I'll continue testing. Wow.
2 comments

Most multi-modal input implementations suck, and a lot of them suck big time.

Doesn't seem to be far ahead of existing proprietary implementations. But it's still good that someone's willing to push that far and release the results. Getting multimodal input to work even this well is not at all easy.

I feel like most open-source releases, regardless of size, claim to be similar in output quality to SOTA closed-source stuff.