How does InfluxDB compare to TimescaleDB? My understanding is that the use case is pretty similar (time series/metrics), are they good at different things?
For perhaps the most comprehensive comparison between InfluxDB and TimescaleDB, see this blog post [1], which compares the two DBs on data model, query language, reliability, performance, ecosystem, operational management, and company/community support.
This comparison is vs. InfluxDB. This thread is about a new project called InfluxDB IOx. It's under development and we're not producing builds, so any kind of operational comparison would be very premature.
Its architecture is dramatically different from InfluxDB's. You can do a comparison of the design goals. Read the post this thread refers to. I think you'll find it has very different goals than Postgres, an OLTP database, and Timescale, which is built on top of it.
Thank you for clarifying. OP's original comment refers to TimescaleDB vs InfluxDB, hence the comparison blog between the two.
As you note, a comparison between InfluxDB IOx and TimescaleDB is not possible at this time, because InfluxDB IOx is still under development and unavailable for comparison, so that blog is the next best thing for developers looking for answers today.
It would be great to see a similar comparison for TimescaleDB vs VictoriaMetrics :) There are some benchmarks ([1], [2], [3]) that compare performance and resource usage between TimescaleDB, InfluxDB and VictoriaMetrics, but these benchmarks may be outdated.
Hi valyala (cofounder of VictoriaMetrics) — I know we’ve talked about this several times before already, so you should know — those benchmarks you linked are from 2018. They pre-date many key features in TimescaleDB, including features like native compression, which invalidate many of those findings. So they aren’t really relevant or valid anymore. Just don’t want people to draw wrong conclusions.
Timescale is built on top of Postgres, which is a row-oriented database. They've built a kind of columnar layer on top of it, which is quite interesting. Because it's Postgres, you get its full SQL support.
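To make the row-oriented vs. columnar distinction concrete, here's a toy sketch in Python. This is not how Postgres or TimescaleDB actually lay data out on disk; the sample readings and the names `rows`, `columns`, and `to_columns` are made up for illustration. It only shows why an aggregate over a single field touches less data when each field is stored contiguously.

```python
# Toy illustration only: real engines use pages, compression, and indexes.
# All names and sample values here are hypothetical.

# Row-oriented: every field of a record is stored together.
rows = [
    {"time": 1, "host": "a", "cpu": 0.50},
    {"time": 2, "host": "a", "cpu": 0.75},
    {"time": 3, "host": "b", "cpu": 0.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented: all values of one field are stored together.
columns = to_columns(rows)

# Averaging one field in the row layout walks every whole record...
avg_from_rows = sum(r["cpu"] for r in rows) / len(rows)

# ...while the columnar layout only reads the "cpu" list.
avg_from_columns = sum(columns["cpu"]) / len(columns["cpu"])
```

Both averages come out the same; the difference is how much unrelated data (`time`, `host`) has to be read along the way.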
Meanwhile, InfluxDB IOx has a very different set of goals than Postgres. It's not an OLTP (transactional) DB and never will be. It's firmly targeted at OLAP and real-time OLAP workloads.
That means we can do things like optimize for running on ephemeral storage with object storage as the persistence layer. It'll have fine grained control over replication, how data is partitioned in a cluster, and where data is indexed, queried, queued for writes and more. Push and pull replication, bulk transfer, and persistence with Parquet. This last bit means you get integration with other data processing and data warehousing tools with minimal effort.
It'll also support Arrow Flight which will give it great integration into the data science ecosystems in Python and R.
Right now, InfluxDB IOx is really too early to do any real comparison on actual operation. We're putting this out now so that people can see what we're doing, comment on it, and maybe even contribute. We think it's an interesting approach where no single item is completely novel, but the composition of everything together makes it an entirely unique offering in open source.
Edit: one other thing I forgot to mention. InfluxDB IOx is open source, Timescale isn't. For some that matters, for many it doesn't. Depends on your use case.
It's under a community license, which has restrictions. The limitations on derivative works and value added products or services are the ones that will create the most problem for people trying to build a business on it: https://www.timescale.com/legal/licenses
Users within large organizations likely can't use the software without approval from their legal department, because it doesn't fall under any open source license.
Like I said, whether you care is really case dependent.
This is misinformation. Most of TimescaleDB is open source under Apache 2. The difference is that the advanced features of TimescaleDB (e.g., clustering) are under a source-available license and are free, while advanced InfluxDB features like clustering are under a paid enterprise license. In fact, TimescaleDB recently made all of our enterprise features available for free. So one could argue that TimescaleDB is more open than Influx.
My post is about InfluxDB IOx, which is the project this thread is about. You're correct about InfluxDB having HA and clustering under a closed source enterprise license. If you read the post, I even mention this as a shortcoming of the project. One which we're hoping to rectify with InfluxDB IOx.
So some parts of Timescale are under actual Apache 2 and some parts are under a proprietary source-available license. I'm not sure how many lines of code fall under each license, or how it's actually organized in your repo. I'll leave it up to your potential users to figure out which is which and disentangle what parts are actually open.
As I recall, AWS very publicly forked Elastic because of this very same type of confusion. The difference is that if AWS were going to fork your project, they'd just fork Postgres, which is the real open source software that you're benefitting from.
If I were building a developer-focused analytics, monitoring, or data analysis product, I wouldn't do it on top of Timescale because some parts of your codebase most definitely prevent that through your license. But that's me.
> If I were building a developer-focused analytics, monitoring, or data analysis product, I wouldn't do it on top of Timescale because some parts of your codebase most definitely prevent that through your license. But that's me.
That's also FUD, two ways.
First, what the Timescale License prevents is somebody offering our Community Edition as a standalone "TimescaleDB-as-a-Service", a la AWS bundling it as part of RDS, or Microsoft as part of Azure Postgres. There is a clean technical test for "DDL access to the database" by users in the license. It's not tricky. You can absolutely develop/sell/distribute/provide analytics, monitoring, or data analysis products on top of TimescaleDB Community Edition. Many companies do.
As to "hopelessly-entangled source", if you know what a directory is, you can tell the difference. There's a "/tsl" subdirectory with Timescale Licensed code. Everything else is Apache 2. You can compile pure Apache 2 versions with a single compile flag, and we distribute Apache 2 binaries. In fact, the Postgres community itself distributes Apache 2 binaries, and Microsoft, Digital Ocean, Rackspace, and other clouds make the Apache 2 version available as part of their managed database offerings.
TimescaleDB behaves more like a regular relational database, while Influx is fairly different and has some interesting nuances that you'll need to understand if you want to have a table with a bunch of columns. For example, having a lot of metadata columns has different performance implications than having a high cardinality of actual measurements (although it's been long enough since I've used Influx that I don't remember what those differences actually are).
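A rough sketch of that metadata vs. measurement distinction, going by InfluxDB's line protocol: metadata goes in tags, which are indexed, measured values go in fields, which are not, and each distinct tag set becomes its own series. The helper `to_line` and the sample points below are hypothetical, and the sketch skips escaping and non-float field types, so treat it as an illustration rather than a client implementation.

```python
# Hypothetical helper: formats one point as an InfluxDB line protocol string.
# Simplified on purpose: no escaping, and every field is written as a float.
def to_line(measurement, tags, fields, timestamp_ns):
    # Tags are key=value pairs attached to the measurement name.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Fields hold the actual measured values.
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


points = [
    ("cpu", {"host": "a", "region": "us"}, {"usage": 0.5}, 1600000000000000000),
    ("cpu", {"host": "b", "region": "us"}, {"usage": 0.25}, 1600000000000000000),
    ("cpu", {"host": "a", "region": "us"}, {"usage": 0.75}, 1600000001000000000),
]

lines = [to_line(*p) for p in points]

# Each distinct (measurement, tag set) pair is one series, so adding tag
# values grows the series count, while adding field values does not.
series = {(m, tuple(sorted(tags.items()))) for m, tags, _, _ in points}
series_count = len(series)
```

Here three points produce two series, because only `host` varies across the tag sets. Many distinct tag values multiply the number of series, which is where cardinality starts to matter.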
If you're used to writing SQL, TimescaleDB is much easier to write queries with, although once you get over the learning curve, both of the query languages in Influx seem very powerful.
One notable advantage of Influx is its integrations with other tools for ingest and visualization, and it seems like 2.0 is doubling down on that.
[1]: https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...