by tivert 672 days ago
> There are not enough license bureau images to train a CLIP model, not enough expressly licensed text content to train T5. A CLIP model needs 2 billion images to perform well, not the 600m Adobe claims they have access to. It's right in the paper.

Not an expert on this, but I wonder:

1) how many images you could create/buy/tag with a billion-dollar investment, and

2) if you could lower the training requirements with targeted training data creation (e.g. get low-priced/amateur models to come in singly and in groups for an hour each and work through a catalog of poses/costumes designed to result in a very good generative model for "people").