Hacker News
by clu3 5149 days ago
I think the dataset is way too small. Only 166K rows? That's not really interesting. If you want to compare, make it at least a few million.
2 comments

166k is kid's play. At this size we will not see any measurable differences among serious contenders. The key data is how these perform on multiple servers, with sharding, master-slave configurations, etc.

The author is developing a web interface to mailing list archives [0]. In that context 166,672 emails sounds pretty substantial.

[0] https://fedorahosted.org/hyperkitty/