by xnx 1102 days ago
I second this. You've done a great service by collecting this data. I'm guessing the file must be much smaller than 20GB when compressed.
2 comments

I also did an experiment: generating and searching embeddings for all the comments on HN. Here is the walkthrough: https://www.youtube.com/watch?v=hGRNcftpqAk
It is only around 5 GB in ClickHouse. Details: https://github.com/ClickHouse/ClickHouse/issues/29693