by michaelt
503 days ago
Some people believe they can dodge copyright issues so long as they have enough indirection in their training pipeline. You take a terabyte of pirated college physics textbooks and train a model that can pose and answer physics 101 problems. Then a separate, "independent" team uses that model to generate a terabyte of new, synthetic physics 101 problems and solutions, and releases this dataset as "public domain". Then a third "independent" team uses that synthetic dataset to train a model. The theory is that this forms a sort of legal sieve: pass the knowledge through a grid with a million fact-sized holes and, with enough shaking, the knowledge falls through but the copyright doesn't.