by morgo 2758 days ago
Morgan from the TiDB team here. Happy to answer any questions you have.
4 comments

Would you mind comparing TiDB to other HTAP databases like SAP HANA, MemSQL, HyPer? I'm more interested in the architecture, trade-offs, best/worst use cases. How would you compare the analytical bit with regard to analytical databases like ClickHouse, SQL Server tabular model, MapD?
At a high level:

- TiDB is Open Source (Apache 2.0 license). Several others that you mention here are commercial offerings.

- The expected data volume for TiDB is larger than memory. I believe MemSQL, for example, is memory-only.

- The architecture of TiDB is inspired by Google Spanner.

- We try to be transparent about less-suited cases. See the notes on large and small transactions and on single-threaded workloads at: https://www.pingcap.com/docs/sql/mysql-compatibility/

With regard to the analytical piece:

- We suggest you use TiDB for "ad hoc OLAP", and Spark for more complicated cases. While query execution is parallel, the data is still stored in a row format (more on that next year!), so an OLAP-only solution may still have performance advantages. TiDB also supports hash joins, hash aggregation, sort-merge joins, etc., so compared to MySQL, for example, you should see quite a performance improvement.
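
For readers unfamiliar with the term, here is a minimal sketch of the build/probe idea behind a hash join. It is a generic textbook illustration in Python, not TiDB's implementation; the `hash_join` function and the sample `users`/`orders` rows are made up for the example.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dicts using a build/probe hash join."""
    # Build phase: index the left input (ideally the smaller one) by join key.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_key]].append(row)

    # Probe phase: look up each right-hand row in the hash table.
    joined = []
    for row in right:
        for match in buckets.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data, for illustration only.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},  # no matching user, dropped by the join
]

result = hash_join(users, orders, "user_id", "user_id")
for row in result:
    print(row)
```

Each input is scanned once, which is why a hash join tends to beat a nested-loop join on large equi-joins where no useful index exists.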

Hope this helps!

Thanks for the info. As far as I know, other HTAP systems often use row storage and column storage together, moving data from row to column over time. It seems like the lack of such a structure could be a drawback for TiDB in comparison to the others.
Yes, that's correct. Expect more development on this front soon.
On https://www.pingcap.com/docs/sql/mysql-compatibility/ it mentions "FOREIGN KEY constraints" under unsupported features. Is that right? Isn't that a rather big problem for an OLTP DB? Or am I missing something?
Greg from the TiDB team here. I do share your sentiment, and at the moment you can probably best track our progress on this issue here: https://github.com/pingcap/tidb/issues/8484

The explanation is just that TiDB is being developed with tight feedback from our customers that have many TB of data. The feedback from that scale of users is overwhelmingly that they do not want to take the performance hit of foreign keys. It is worth mentioning though that you can declare foreign keys and that on master we do properly check DDL statements (but there is no DML enforcement).

I am trying to figure out a design that will satisfy users with large and small data alike and even let users use foreign keys for documentation purposes when they are not enforced for performance reasons. It would be great to have more community input on this.
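
To make "declared but not enforced" concrete, here is a rough sketch of what an application-side integrity check could look like. It uses SQLite with foreign key enforcement switched off purely as a local stand-in, not TiDB; the `parent`/`child` tables and the `find_orphans` helper are hypothetical names for the example.

```python
import sqlite3

# SQLite stand-in: the constraint is declared, but enforcement is turned off,
# which roughly mimics "foreign keys for documentation purposes".
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES parent(id))"
)

conn.execute("INSERT INTO parent (id) VALUES (1)")
conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")
# Accepted even though parent 99 does not exist: no DML enforcement.
conn.execute("INSERT INTO child (id, parent_id) VALUES (11, 99)")


def find_orphans(connection):
    """Return ids of child rows whose parent_id matches no parent row."""
    rows = connection.execute(
        "SELECT c.id FROM child AS c "
        "LEFT JOIN parent AS p ON p.id = c.parent_id "
        "WHERE c.parent_id IS NOT NULL AND p.id IS NULL "
        "ORDER BY c.id"
    ).fetchall()
    return [r[0] for r in rows]


orphans = find_orphans(conn)
print(orphans)
```

A periodic check like this trades immediate enforcement for write throughput, which is the trade-off the large-data users above are asking for.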

Yes, that is correct. I hope to see FOREIGN KEY constraints added in the future.

In the interim though, when comparing TiDB to (application) sharded systems, it is important to clarify that in those systems FOREIGN KEYs are only available locally within a single server. So it is a limitation that some of the large deployments we encounter are already familiar with.

Do you guys plan to add support for the new MySQL X-Protocol? Seems like it would be well suited for this type of architecture, especially when using the document store type APIs.
Yes. As a general comment: the component-based architecture lends itself well to adding additional protocols on top.

The community has added a Redis protocol on top of TiKV with Titan: https://medium.com/@shafreeck/titan-a-distributed-redis-prot...

I expect to see more, including native language drivers directly to TiKV.

Would you like to TLDR TiDB? And a quick comparison with other time series dbs?
TiDB is an open source NewSQL database that speaks the MySQL protocol. You can scale it horizontally by adding nodes.

It is a relational DB (not time series). To describe a couple of differentiators from its peers:

- It aims to optimize both OLTP and OLAP workloads (aka HTAP)

- It uses a component-based architecture: the TiDB server is stateless and speaks the MySQL protocol, and TiKV is the distributed storage layer, so you can scale either independently. You can also connect to TiKV directly from Spark.
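
As a toy illustration of why a stateless SQL layer scales independently of storage, here is a small Python sketch. `KVStore` and `SQLFrontend` are hypothetical classes invented for this example; they model the split described above and bear no relation to TiDB's or TiKV's actual code or APIs.

```python
class KVStore:
    """Stand-in for a shared storage layer; this is where all state lives."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SQLFrontend:
    """Stand-in for a stateless query server: it keeps no rows of its own."""

    def __init__(self, store):
        self.store = store

    def insert(self, table, primary_key, row):
        # Rows are mapped to key-value pairs keyed by (table, primary key).
        self.store.put((table, primary_key), row)

    def select(self, table, primary_key):
        return self.store.get((table, primary_key))


storage = KVStore()
frontend_a = SQLFrontend(storage)
frontend_b = SQLFrontend(storage)  # "scaling out" is just adding a frontend

frontend_a.insert("users", 1, {"name": "ann"})
row_seen_by_b = frontend_b.select("users", 1)
print(row_seen_by_b)
```

Because the frontends hold no data, adding or removing one needs no data movement; only the storage layer has to rebalance when it grows.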

I don't believe TiDB is a time series DB; it's an OLTP and OLAP database and is not indexed by time by default unless that's part of the table schema. I would also assume that TiKV by default uses size-based or leveled compaction, since it is built on RocksDB, rather than time-window compaction.