by Philpax 990 days ago
I more or less agree with you (I'm not convinced that training models on the imagery of the internet isn't fair use), but I wouldn't rule out a CC0 model just yet.

There's Mitsua Diffusion One [0], which doesn't produce incredible results, but it's a start and they're planning on adding more data, including opt-in work from artists.

PIXART-alpha [1] was trained on only 25 million images and still gets competitive results. That could pair well with Fondant AI's 25-million-image Creative Commons-only dataset [2] (not all CC0, but a sizeable share of it is).

I don't think it's as far away as you think it is!

[0]: https://huggingface.co/Mitsua/mitsua-diffusion-one

[1]: https://pixart-alpha.github.io/

[2]: https://huggingface.co/datasets/fondant-ai/fondant-cc-25m