Based on the FAQ (https://laion.ai/faq/) of the dataset that was used for training https://huggingface.co/spaces/stabilityai/stable-diffusion:

> LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):

1. The LAION group possibly committed copyright infringement and even left undeniable evidence that they did, on top of their written testimony (dumping the "stolen goods into the river" does not undo the infringement, does it?).
2. Any model trained on the "linked" data may commit copyright infringement.
3. As a consequence, you may be liable for using the generated images.

I always wonder how this can possibly be legal at all. As a human artist, if I were to copy material and remix it without proper permission, I would be liable (again, depending on the situation). But suddenly ML comes along, it's all great, and now you can keep remixing the potentially problematic output further, no questions asked!? I guess there are no precedent cases, but why should an automaton/software (and its creators) be judged differently from people? I don't want to spoil the fun, but what am I missing?

I'm also disappointed that this dataset did not make sure to only collect unproblematic content, such as Creative Commons content that allows remixing. It would be a hell of an attribution list, but definitely better than what is presented here.

EDIT: Formatting

EDIT2: I actually followed one of the projects mentioned, not the linked repository. Clarified above.
Now, I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process; you are judged on the end results. The same should be done for the works created by these AIs.