by nickpsecurity
562 days ago
I’ve been wanting to run into someone on the Databricks team. Can you ask whoever trains models like MPT to consider training an open model only on data clear of copyright claims? Specifically, one trained only on Project Gutenberg and the permissively licensed code in The Stack, or on Gutenberg alone? Since I follow Christ, I can’t break the law or use what might be produced directly from infringement. I might be able to do more experiments if a free, legal model were available. Also, we can legally copy datasets like PG19 since they’re public domain, whereas most others contain works that I might need a license to distribute. Please forward the request to the model trainers. Even a 7B model would let us do a lot of research on optimization algorithms, fine-tuning, etc.