I scraped 1,576 HN snapshots and found 159 stories that hit the maximum score (score=100). Then I crawled the actual articles and ran sentiment analysis. The results surprised me.

*The Numbers*

- Negative sentiment: 78 articles (49%)
- Positive sentiment: 45 articles (28%)
- Neutral: 36 articles (23%)

Negative content doesn't just perform well, it dominates: nearly half of all top stories, and about 1.7 times the positive count.

*What "Negative" Actually Means*

The viral negative posts weren't toxic or mean. They were:

- Exposing problems ("Why I mass-deleted my Chrome extensions")
- Challenging giants ("OpenAI's real business model")
- Honest failures ("I wasted 3 years building the wrong thing")
- Uncomfortable truths ("Your SaaS metrics are lying to you")

The pattern: something is broken and here's proof.

*Title Patterns That Worked*

From the 159 viral posts, these structures appeared repeatedly:

1. [Authority] says [Controversial Thing] - 23 posts
2. Why [Common Belief] is Wrong - 19 posts
3. I [Did Thing] and [Unexpected Result] - 31 posts
4. [Company] is [Doing Bad Thing] - 18 posts

Average title length: 8.3 words. The sweet spot is 6-12 words.

*What Didn't Work*

Almost none of the viral posts were:
- Pure product launches
- "I'm excited to announce..."
- Listicles ("10 ways to...")
- Generic advice

*The Uncomfortable Implication*

If you want reach on HN, you're better off writing about what's broken than what you built. This isn't cynicism; it's selection pressure. HN readers are skeptics. They've seen every pitch. What cuts through is useful criticism backed by evidence.

*For Founders*

Before your next launch post, ask: what problem am I exposing? What assumption am I challenging? What did I learn the hard way? That's your hook.

---

Data: I built a tool that snapshots HN/GitHub/Reddit/ProductHunt every 30 minutes. I analyzed 1,576 snapshots, found 2,984 instances of score=100, deduped them to 159 unique URLs, crawled 143 successfully, and ran GPT-4 sentiment analysis on the full article text. Happy to share the raw data if anyone wants to dig deeper.
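
For anyone who wants to reproduce the dedupe and percentage steps, a minimal Python sketch might look like the following. The record shape (a dict with a `url` key), the URL normalization rule, and the function names are all assumptions made for illustration, not the tool's actual code; the sentiment labels are taken as already computed.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed rule: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # The query string is kept, because it can identify distinct pages.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_by_url(sightings):
    """Keep the first sighting of each normalized URL (hypothetical record shape)."""
    unique = {}
    for sighting in sightings:
        unique.setdefault(normalize_url(sighting["url"]), sighting)
    return list(unique.values())


def sentiment_shares(labels):
    """Return each label's share of the total, as a whole-number percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# The counts reported in the post: 78 negative, 45 positive, 36 neutral.
labels = ["negative"] * 78 + ["positive"] * 45 + ["neutral"] * 36
print(sentiment_shares(labels))  # {'negative': 49, 'positive': 28, 'neutral': 23}
```

How aggressively URLs are normalized changes the unique count, so the 159 figure depends on whatever rule the real tool applies.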
About that data though, just publish it. Throw the data and tooling up on GitHub, or on Hugging Face if it's a massive dataset. I'd be interested in comparing methodologies for deriving sentiment.