HN Mirror


by sillysaurusx 1661 days ago

GitHub sent OpenAI something like 57 terabytes of data. Good luck scraping that.

(I helped build The Pile, the largest openly-available text dataset.)

You're right that you theoretically can do this, but doing it in practice requires either funding or time.

1 comment

nefitty 1661 days ago

Yeah, I thought of mentioning that but wasn't sure how in the weeds anyone would want to go, lol. Besides, I'm optimistic about what an enterprising individual is capable of when faced with those sorts of limits... It's those clear bounds that set creativity free.

By the way, I'm so fucking stoked that Shawn Presser of The Pile responded to me. Your work is proto-solarpunk incarnate. Really amazing contributions dude, can't wait to see what's next.


sillysaurusx 1661 days ago

I'm really happy to hear that. Thank you.

When I started out, I only wanted to make some small contribution somewhere. It's really surreal that there are people rooting for me now. I'll do my best to continue to contribute in ways that I can.

You can too, by the way. There's not a lot of difference between me and you. I believe in you.


nefitty 1661 days ago

I was literally sitting here trying to stop the waves of sadness I'm feeling from not meeting my own expectations. Bumping into a kindred spirit that's getting shit done really helps. Thank you for the nudge.


sillysaurusx 1661 days ago

No stress, friend :) Remember, small contributions really matter! You can do it!
