Would you mind comparing TiDB to other HTAP databases like SAP HANA, MemSQL, HyPer? I'm more interested in the architecture, trade-offs, best/worst use cases. How would you compare the analytical bit with regard to analytical databases like ClickHouse, SQL Server tabular model, MapD?
- We suggest you use TiDB for "ad hoc OLAP", and Spark for more complicated cases. While query execution is parallel, the data is still stored in a row format (more on that next year!), so an OLAP-only solution may still have performance advantages. TiDB also supports hash joins/aggregation/sort merge joins etc., so compared to MySQL, for example, you should see quite a performance improvement.
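For readers who haven't met the term, here is a minimal Python sketch of the general hash join technique mentioned above. It is a textbook illustration, not TiDB's implementation; the function name `hash_join` and the sample `customers`/`orders` rows are made up for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner-join two lists of dict rows on equal key values."""
    # Build phase: index one input (ideally the smaller one) by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: one hash lookup per row of the other input.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},  # no matching customer, dropped by the inner join
]

result = hash_join(customers, orders, "customer_id", "customer_id")
```

Each input is scanned once, so the work grows roughly with the sum of the two input sizes rather than their product, which is why this kind of operator matters for ad hoc analytical queries.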
Thanks for the info. As far as I know other HTAPs often use row storage and column storage together, moving data from row to column over time. It seems like lack of such structure could be a drawback for TiDB in comparison to others.
On https://www.pingcap.com/docs/sql/mysql-compatibility/ it mentions "FOREIGN KEY constraints" under unsupported features. Is that right? Isn't that a rather big problem for an OLTP DB? Or am I missing something?
Greg from the TiDB team here. I do share your sentiment, and at the moment you can probably best track our progress on this issue here: https://github.com/pingcap/tidb/issues/8484
The explanation is just that TiDB is being developed with tight feedback from our customers that have many TB of data. The feedback from that scale of users is overwhelmingly that they do not want to take the performance hit of foreign keys. It is worth mentioning though that you can declare foreign keys and that on master we do properly check DDL statements (but there is no DML enforcement).
I am trying to figure out a design that will satisfy users with large and small data alike and even let users use foreign keys for documentation purposes when they are not enforced for performance reasons. It would be great to have more community input on this.
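To make "declared but not enforced" concrete, here is a small Python sketch. It uses SQLite as a stand-in (not TiDB) with foreign key enforcement explicitly switched off, so the constraint is accepted in the DDL but ignored on INSERT. The tables and the `find_orphans` helper are hypothetical, and this is only one way an application might audit referential integrity itself.

```python
import sqlite3


def find_orphans(conn, child, fk_col, parent, pk_col):
    """Return child foreign key values that have no matching parent row."""
    # Table and column names are interpolated directly, so pass trusted identifiers only.
    query = (
        f"SELECT c.{fk_col} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk_col} = p.{pk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL"
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
# Stand-in for "no DML enforcement": the FK is declared but never checked on writes.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, "
    "FOREIGN KEY (customer_id) REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
# The second order references a customer that does not exist, yet the insert succeeds.
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 99)")

orphans = find_orphans(conn, "orders", "customer_id", "customers", "id")
```

The declaration still documents the relationship in the schema, while the cost of checking it moves to whenever the application chooses to run the audit.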
Yes, that is correct. I hope to see FOREIGN KEY constraints added in the future.
In the interim though, when comparing TiDB to (application) sharded systems, it is important to clarify that in those systems FOREIGN KEYS are only available locally to a single server. So it is a limitation that some of the large deployments we encounter are already familiar with.
Do you guys plan to add support for the new MySQL X-Protocol? Seems like it would be well suited for this type of architecture, especially when using the document store type APIs.
TiDB is an open source NewSQL database that speaks the MySQL protocol. You can scale it horizontally by adding nodes.
It is a relational DB (not time series). To describe a couple of differentiators from its peers:
- It aims to optimize both OLTP and OLAP workloads (aka HTAP)
- It uses a component-based architecture: the TiDB server is stateless and speaks the MySQL protocol, and TiKV is the distributed storage layer, so you can scale either independently. You can also connect to TiKV directly from Spark.
I don't believe TiDB is a time series DB. It's an OLTP and OLAP database, and it is not indexed by time by default unless that's part of the table schema. I would also assume that TiKV by default uses size/leveled based compaction, since it uses RocksDB, rather than time window compaction.