by sillysaurusx
1661 days ago
GitHub sent OpenAI something like 57 terabytes of data. Good luck scraping that. (I helped build The Pile, the largest openly available text dataset.) You're right that you theoretically can do this, but doing it in practice requires either funding or time.
By the way, I'm so fucking stoked that Shawn Presser of The Pile responded to me. Your work is proto-solarpunk incarnate. Really amazing contributions dude, can't wait to see what's next.