by spmurrayzzz 1876 days ago
Agree totally on the "double down on what you know" point. That pays off in spades usually.

Tangentially related to that: their mongo benchmark numbers always looked odd to me. Given that I've used mongo for 10+ years for high throughput time series data without major issues, I decided to do my own benchmarks. In my testing, mongo outperformed timescale significantly both in write throughput and query performance.

This is likely due in part to the fact that I'm using well-understood internal data from real production systems, so my ability to build performant indexes and query strategies in the database I know best introduces a performance bias.

I always take benchmarks with a grain of salt, for this reason. And I try to lean into the tech I understand best.


Hi @spmurrayzzz thanks for the feedback. (Timescale person)

We always strive to run the best and fairest benchmarks we can, and for that reason all our benchmarks are fully open source, both for repeatability and for improvements/contributions:

https://github.com/timescale/tsbs/blob/master/docs/mongo.md

We also really did spend a lot of time investigating approaches with MongoDB, so you'll see our benchmarks actually evaluate two _different_ ways to use time-series data with MongoDB (culled & optimized from suggestions in MongoDB forums). But we always welcome feedback:

https://blog.timescale.com/blog/how-to-store-time-series-dat...

Thanks!

Thanks for engaging here, and congrats on the round!

I've reviewed all these resources multiple times in the past, which is what prompted me to do my own benchmarks (in which mongo comes out ahead in both multinode and single node configurations).

Some issues I noticed:

- you're using gopkg.in/mgo.v2, which is a mongo driver that hasn't had a release in 6 years. I'm not sure of the general performance impact here, but my tests use mongo 4.2 with a modern node.js driver, so that's one difference.

- your indexing strategy for mongo could easily be changed to get much better performance than the naive compound approach you took in the code (measurement > tags.hostname > timestamp).

- you didn't test the horizontal scaling path at all, and this is where mongo arguably shines.
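
To make the indexing point a bit more concrete, here's a rough sketch of why key order in a compound index matters. It's a simplified model of mongo's index prefix rule, not a real query planner. The first key order is the one from the benchmark code; the second is purely a hypothetical reordering for illustration, not the index I actually use, and the helper name is made up. The key specs are written as (field, direction) pairs, the shape pymongo's create_index accepts.

```python
# Key order described above: measurement > tags.hostname > timestamp.
NAIVE_INDEX = [("measurement", 1), ("tags.hostname", 1), ("timestamp", 1)]

# Hypothetical alternative (an assumption for illustration only):
# lead with the host tag, then newest-first timestamps.
ALT_INDEX = [("tags.hostname", 1), ("timestamp", -1)]


def usable_prefix(index, equality_fields):
    """Return the leading index fields covered by equality filters.

    Simplified model: a compound index helps with the filtered fields only
    as far as they form an unbroken prefix of the index keys.
    """
    prefix = []
    for field, _direction in index:
        if field not in equality_fields:
            break
        prefix.append(field)
    return prefix


# A query filtering only on hostname cannot use the leading key of the
# naive index, but it matches the first key of the alternative.
print(usable_prefix(NAIVE_INDEX, {"tags.hostname"}))
print(usable_prefix(ALT_INDEX, {"tags.hostname"}))
```

Which order is actually better depends entirely on your query mix, which is kind of the whole point about benchmark bias.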

I'm glad you all open source this stuff because it helps engineering leaders make better decisions, so thank you for that. But your data does not align with my own, whether from our production metrics or from structured load testing.

Thanks for the concrete feedback/suggestions!
I also recall that when we [Timescale] first did our benchmarks vs Mongo for time-series, our use of MongoDB for time-series beat Mongo's own benchmarks :-)

That's probably not something most companies would do for benchmarking, but we take ours seriously :-)

I appreciate all the work it takes to do them and document them. Doesn't go unnoticed I promise you.
Thank you :-)