by nradov
546 days ago
There is an enormous "iceberg" of untapped non-public data locked behind paywalls or licensing agreements. The next frontier will be spending money and human effort to get access to that data, then transform it into something useful for training.
The highest-quality language data that exists is in the public domain.