Hacker News
Show HN: A dataset of all HN submission texts (2006-2024) in Markdown
(huggingface.co)
1 point by shutty, 617 days ago
We're at nixiesearch.ai building yet another search engine over HN, but we found no public dataset of the actual submission texts, so we scraped one!
TL;DR: 2.1M texts; around 55% of all stories are still available online.