|
|
|
|
|
by lxgr
418 days ago
|
|
Hm, I wouldn't say that that's the key point of open software. There are many open source projects that don't have reproducible builds (some don't even offer any binary builds), and conversely there is "source available" software with deterministic builds that's not freely licensed. On top of that, I don't think it works quite that way for ML models. Even their creators, with access to all training data and training steps, are having a very hard time reasoning about what these things will do exactly for a given input without trying it out. "Reproducible training runs" could at least show that there's not been any active adversarial RHLF, but seem prohibitively expensive in terms of resources. |
|
There are different variations, of course. Mostly related to the rights and permissions.
As for big models even their owners, having all the hardware and training data and code, cannot reproduce them. Model may have some undocumented functionality pretrained or added in post process, and it's almost impossible to detect without knowing the key phrase. It can be a harmless watermark or something else.