by neosat 33 days ago
Do you find the video understanding work there also to be 'silly little slop', or did you only look at the gifs on the page and not read about the understanding work in a 3B model?

This is not ground-breaking by any means, but achieving this in a 3B model and sharing the approach + weights advances engineering and is certainly more of a contribution than 'silly little slop videos', imo.

1 comments

It’s not a 3B model, it has 3B active parameters. The full model is much larger.
That's true, I should have said 3B active. Total params are likely closer to 12B-14B, given the 40GB VRAM usage.
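
One way to sanity-check a VRAM-based estimate like this is a quick back-of-envelope. The sketch below assumes 2 bytes per parameter (bf16/fp16 weights) and that roughly 60-70% of the 40GB is weights, with the rest going to activations, KV cache and framework overhead. Both numbers are illustrative assumptions, not measurements from the model in question, and `estimate_params_billions` is a made-up helper name.

```python
def estimate_params_billions(vram_gb: float,
                             bytes_per_param: float,
                             weight_fraction: float) -> float:
    """Rough parameter count (in billions) implied by observed VRAM usage.

    vram_gb         -- observed VRAM usage in GB (taken as 1e9 bytes)
    bytes_per_param -- assumed storage per weight (2 for bf16/fp16)
    weight_fraction -- assumed share of VRAM that holds weights
    """
    weight_bytes = vram_gb * 1e9 * weight_fraction
    return weight_bytes / bytes_per_param / 1e9


# Assumed: bf16 weights, 60-70% of 40GB used by weights.
low = estimate_params_billions(40, 2, 0.60)
high = estimate_params_billions(40, 2, 0.70)

# Upper bound if every byte of the 40GB were weights.
ceiling = estimate_params_billions(40, 2, 1.0)

print(f"~{low:.0f}B to ~{high:.0f}B params, ceiling ~{ceiling:.0f}B")
```

Under those assumptions the range comes out at roughly 12B-14B, with a hard ceiling around 20B at 2 bytes per parameter. A quantized checkpoint (fewer bytes per parameter) would push the estimate up, so the figure is only as good as the precision assumption.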