Hacker News
by mahnerak 1203 days ago
I could not find a license in the Hugging Face repo, but it seems like the codebase is Apache 2.0. Are the pretrained weights / checkpoints also covered under this (or another permissive) license?

In other words, can we use it for commercial purposes for free?

3 comments

Hi! Just added Apache 2.0 to the HF model card. Thanks!
Are the pretraining and training pipelines available anywhere under a FOSS license? I'd love to take a swing at training a mid-fusion model on data other than text and images (e.g., sound, neuron spike trains, etc.)
Are weights even copyrightable under US law? It seems like they'd be the output of an automatic process (the training program) the same way the art/text produced by AI models is, which to my understanding makes them not copyrightable material.
Compression, even lossy compression, doesn't remove copyright. Whether this is more like compression or a more "transformative use" is something the courts will have to decide someday.

It might be a good time to reread What Color Are Your Bits:

https://ansuz.sooke.bc.ca/entry/23

There is a lot of manual work involved, such as writing training scripts, scraping and processing training data, choosing the best weights among several runs, and spending lots of costly computation. Maybe these should make the weights copyrightable.
Good question, was about to ask the same!