by benxh
1119 days ago
To be honest, I've been asking myself the same thing. Technically, the amount of "good quality" data in libgen is huge, way larger than the books3 dataset. However, using it would probably run afoul of copyright. Then again, a huge amount of the data that LLMs go through is copyrighted.