239,621 Hacker News datasets from the past 5 years (github.com)
12 points by massanishi 2043 days ago
2 comments

OP: Hacker News offers an excellent API via Firebase: https://github.com/HackerNews/API. Unfortunately, its fetch method doesn't offer any way to filter by item type, so collecting all posts takes excessively long because of all the comments.

Based on the official item API, this dataset focuses only on the main posts with at least 2 engagements. The posts in the dataset are about 1.5% of the items returned.

With posts as a starting point, you can easily trace the hierarchy from "kids" fields.
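For anyone who wants to try this, here is a rough sketch of the idea, not the code behind the dataset. It uses the item endpoint and the "type", "score", "kids", "deleted" and "dead" fields from the official API docs. The helper names (fetch_item, is_main_post, walk_comments) are made up for illustration, and reading "engagement" as the score field is my assumption.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://hacker-news.firebaseio.com/v0"


def fetch_item(item_id):
    """Fetch one item from the official API (None if the item is missing)."""
    with urlopen(f"{API_ROOT}/item/{item_id}.json") as resp:
        return json.load(resp)


def is_main_post(item, min_engagement=2):
    """Keep live stories whose score is at least min_engagement.

    Assumption: "engagement" is read here as the `score` field.
    """
    if not item or item.get("deleted") or item.get("dead"):
        return False
    return item.get("type") == "story" and item.get("score", 0) >= min_engagement


def walk_comments(item_id, fetch=fetch_item, depth=0):
    """Yield (depth, item) depth-first, following each item's "kids" list."""
    item = fetch(item_id)
    if item is None:
        return
    yield depth, item
    for kid_id in item.get("kids", []):
        yield from walk_comments(kid_id, fetch, depth + 1)
```

Calling walk_comments(post_id) on a post id yields the post at depth 0 followed by its comment tree. It makes one request per item, which is exactly why crawling everything is slow. The fetch parameter lets you swap in a cache or a local copy instead of hitting the network.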

Potential Usage:

- Generate popular titles.

- Collect comments in a hierarchical order for further training.

- Analyze popular topics in the engineering community.

- Identify the best time to post for maximum engagement.
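As a starting point for the last item, a minimal sketch that averages score by posting hour. It assumes each post is a dict carrying the API's "time" (Unix seconds) and "score" fields; the function name is hypothetical, and the dataset's actual file layout may differ.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mean_score_by_hour(posts):
    """Return {hour_utc: mean score} for posts with "time" and "score" fields."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [score sum, post count]
    for post in posts:
        hour = datetime.fromtimestamp(post["time"], tz=timezone.utc).hour
        totals[hour][0] += post.get("score", 0)
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}
```

Hours are in UTC, so shift them to your audience's time zone before drawing conclusions. A mean is also easily skewed by a few front-page hits, so a median may be the fairer measure.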

I wish someone would use GPT-3 to generate automatic replies to comments based on previous HN history. I'm sure discussions repeat themselves a lot.
The only thing GPT-3 is good for is producing useless text that humans find hard to process.

So it's great for blog spam.

GPT-3 would produce comments that would get upvoted but contribute absolutely nothing.

You'd be better off crowdsourcing your idea: a page to point people to that explains how the thread's been done before and what the common answers are.

Would love to see some examples of what you are talking about. Do you have access to the OpenAI API?
There's a silicon user on HN, but no idea if they are/identify as GPT-3. They have some training particular to HN.

But here is an outed Reddit user:

https://www.theregister.com/2020/10/09/reddit_gpt3_bot/

https://www.reddit.com/user/thegentlemetre/