by niek_pas 2 days ago
This is definitely a good point. I imagine the max capacity for video models is significantly lower than for text models (there just aren't as many professionals in video as there are people who write text or code), but I could be wrong.