by dom96 1103 days ago

I've been considering using Reddit data to pre-seed the content in a successor to Reddit. Though I am unsure how that would stand legally.

As a side note, I created an alternative Reddit API[1] and Reddit didn't like that so much they banned my 13 year old Reddit account.

1 - https://api.reddiw.com

6 comments

lelandfe 1103 days ago

IANAL. For the US, users grant Reddit a license to use their content when they post it. The users still own that content. Reddit's license does not extend to your reuse of it[0], nor have the underlying users directly granted you permission, so it would not be legal (in the US) for you to reuse it like that.

[0] "you may not... license, sell, transfer, assign, distribute, host, or otherwise commercially exploit the Services or Content" https://www.redditinc.com/policies/user-agreement-september-...

RobotToaster 1103 days ago

Wouldn't that mean it would be down to the individual users who still own each bit of content to issue a DMCA takedown if they objected?

I imagine the number of such requests would be small.

scosman 1103 days ago

Ah. The old “I did so much copyright violation it would be infeasible for everyone I took content from to enforce” defence. I see nothing that could go wrong.

lelandfe 1103 days ago

Posting that you’re going to be “using Reddit data to pre-seed the content” may make it a bit harder to dodge Reddit in court.

danielheath 1103 days ago

Although prompting “write a comment replying to the text ‘<snip>’ in the style of u/landfe” would yield something I copyrightable…

doix 1103 days ago

I was chatting about this with some friends. If we had a million or so spare, just fork Reddit. Grab the latest open source version of Reddit, pay the pushshift guys for the most up-to-date dump they have and get it in.

Make a system for claiming your old Reddit account. I'm guessing if you try to use OAuth, Reddit will just ban you. So you need to get creative, probably make an extension that grabs the user's session ID from their cookies or something (or let people copy-paste it in if they are technical enough).

Fun to imagine but unfortunately probably won't happen.

soeptical 1103 days ago

No one will use it.

SheinhardtWigCo 1103 days ago

Just launder it through an LLM, problem solved.

scrum-treats 1103 days ago

Indeed. Could call it something like the RedditCrawl corpus.

micromacrofoot 1103 days ago

You don’t even need Reddit with an LLM. I did some back-of-the-napkin token math, and you can fake a year of activity for a couple thousand dollars (it varies with the number of users and comment length, of course). Hell, you can even make it look active in real time and respond to real users. As long as you give it some guidance about commenting style (as in, not the default GPT 8th-grade-essay style), it’s very hard to tell.
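
The napkin math itself isn't shown in the comment, so here is only a rough sketch of how such an estimate could be set up. The function name and every input value below (user count, comments per day, tokens per comment, price per million tokens) are made-up placeholders, not figures from the comment, and the sketch counts generated tokens only, ignoring prompt tokens.

```python
def estimate_cost_usd(
    users: int,
    comments_per_user_per_day: float,
    tokens_per_comment: float,
    usd_per_million_tokens: float,
    days: int = 365,
) -> float:
    """Rough cost of generating fake comment activity.

    Counts only generated (output) tokens; prompt/context tokens
    would add to the total and are deliberately left out here.
    """
    total_tokens = users * comments_per_user_per_day * tokens_per_comment * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # All four inputs are hypothetical placeholders, chosen only to
    # show the shape of the calculation.
    cost = estimate_cost_usd(
        users=1_000,
        comments_per_user_per_day=5,
        tokens_per_comment=150,
        usd_per_million_tokens=2.0,
    )
    print(f"Estimated cost for one year: ${cost:,.2f}")
```

The total scales linearly with each input, which matches the caveat that it varies with the number of users and comment length.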

drodgers 1103 days ago

Adversarial interoperability like this would be a great way to neutralise network lock-in effects and create a more level competitive playing field between social media companies. I think we should enshrine protections for this kind of thing.

There was a strong 2019 precedent in favour of allowing this kind of scraping of public content (from LinkedIn in that case): https://www.techdirt.com/2019/09/10/big-news-appeals-court-s...

tredre3 1103 days ago

> As a side note, I created an alternative Reddit API[1] and Reddit didn't like that so much they banned my 13 year old Reddit account.

"I broke Reddit's TOS deliberately and repeatedly and they banned me!" is another way to put it. But it doesn't sound as good and because of the current zeitgeist people will tend to side with you anyway. Perfect timing for you :)

lostlogin 1103 days ago

Having first rephrased it all via ChatGPT.

Load up those liabilities.

slimebot80 1103 days ago

Do you mean using ChatGPT this way would also be a liability?
