Ask HN: Reddit API vs. Browser Requests

Hacker News

3 points by jerdthenerd 1142 days ago

I have been following the Reddit API saga quite closely, and I understand how and why Reddit as a company has an incentive to effectively take third-party apps off the market.

My question is, what is stopping someone from simply writing a web scraper that acts as if it's a web browser, scrapes the actual subreddit pages (via reddit.com, not api.reddit.com), and stores them in a local cache? I'm picturing an app that runs on popular NAS software such as TrueNAS, Synology, etc., so storage is not an issue.

Is there a way for Reddit to detect that this isn't authentic traffic from an actual user? If the web scraper authenticates as a normal user, and respects the request throttling, wouldn't it just fly under the radar as a particularly addicted user?
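
To make the idea concrete, here is a rough sketch of the "throttled scraper with a local cache" described above. It is only an illustration under assumptions: the `ThrottledCache` class, the two-second default interval, the cache file naming, and the User-Agent string are all hypothetical, and nothing here reflects Reddit's actual rate limits or detection logic.

```python
import hashlib
import time
from pathlib import Path
from urllib.request import Request, urlopen


def fetch_with_urllib(url: str) -> bytes:
    """Fetch a page over HTTP. The User-Agent value is a made-up placeholder."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (hypothetical cache client)"})
    with urlopen(req, timeout=30) as resp:
        return resp.read()


class ThrottledCache:
    """Serve pages from a local directory; hit the network at most once per interval."""

    def __init__(self, cache_dir, min_interval=2.0, fetch=fetch_with_urllib,
                 clock=time.monotonic, sleep=time.sleep):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_interval = min_interval  # assumed polite delay, in seconds
        self.fetch = fetch
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the most recent network request

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe file name.
        return self.cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")

    def get(self, url: str) -> bytes:
        path = self._path(url)
        if path.exists():
            # Cache hit: no network traffic at all.
            return path.read_bytes()
        if self._last is not None:
            # Wait out whatever remains of the minimum interval.
            wait = self.min_interval - (self.clock() - self._last)
            if wait > 0:
                self.sleep(wait)
        body = self.fetch(url)
        self._last = self.clock()
        path.write_bytes(body)
        return body
```

Usage would look something like `ThrottledCache("/mnt/pool/cache").get(url)`, where the path is a placeholder for a NAS volume. The sketch only covers pacing and storage; it does nothing about authentication or about whether the traffic pattern looks like a human's.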

2 comments

alexdanilowicz 1142 days ago

I imagine engagement metrics would make it pretty obvious whether traffic comes from a regular user (scrolling, stopping to read, upvoting) or a robot.

Not to mention the sheer amount of content you'd have to scrape, which would definitely surpass "normal" user engagement.

harrelchris 1142 days ago

Scraping will only enable reading from Reddit. To write to Reddit or to read/write private user data, you would need to automate a browser and handle user credentials in plaintext.
