“the pipeline”: this seems to be just a personal hackathon project?
Why these models rather than other multimodal models? Which “nvidia models”?