|
|
|
|
|
by benxh
1109 days ago
|
|
Yes, but the acquisition of that data itself is illegal in almost all jurisdictions, since libgen is treated as a piracy website. Now if there were a pipeline to access books from Amazon or the Google Books project for training it would be a different story. Still, for certain languages, only libgen and public piracy websites contain any scientific or fiction material in digital formats. E.g. my native language doesn't have easily accessible e-books at all, unless you go through illegal means. I hope somebody undertakes the steps necessary to train on the entirety of libgen. The amount of high quality tokens in libgen should be substantial. |
|