|
|
|
|
|
by umur
3736 days ago
|
|
(Part 3/3 - please see two comments below as the starting point) Aside from the different use-cases they address, there is one other, important difference between Citus and Redshift (and any other distributed database in the world, for that matter). Citus does not fork the underlying database, PostgreSQL. Instead, Citus extends PostgreSQL to transform it into a parallel processing, distributed database. We use PostgreSQL's powerful extension APIs to accomplish this (you simply CREATE EXTENSION Citus on PostgreSQL's latest version, 9.5, to get your distributed PostgreSQL database). While this might appear as an implementation piece at first, it has important product implications, and might even impact how you might want to think about your database stack. By not forking the core database, you are choosing to always stay with the core PostgreSQL product. For starters, you get the uber-cool (and uber-fast) JSONB type that came with 9.4, or the recently checked in UPSERTs, or the popular PostGIS extension for geospatial capabilities. More philosophically, the moment you use forks of database, you know you'll be diverging over time. And when you introduce new databases and/or piece together many different ones to build one application, your development cycles will only get costlier and more complex over time. This was a long answer to a short question, but hopefully useful. Let me know if you have questions, or any feedback using Citus – would love to hear your thoughts! |
|
If the data is big and I need to run analytic queries then I think I have to use a columnar storage format because row-oriented formats cause too much overhead for aggregation queries that usually need to process single column efficiently. If I use CitusDB as an analytical database, then it's comparable with Redshift, Hive etc. As you said, they're suitable for offline data but Can I use cstore_fdw in CitusDB and able to take advantage of real-time nature of Postgresql? Maybe I can push hot data to a table that use row-oriented format and move the data periodically to another table that uses cstore_fdw and execute queries that fetches data from both cold storage and hot storage tables? If CitusDB makes it easy for me, then I think this is huge.
I guess another use case is using CitusDB as distributed data store and executing filter queries such as "SELECT * FROM table WHERE partition_key = x and predicate1 = y ...". Instead of using multiple Postgresql instances and routing the queries in application level, I can just use CitusDB that takes care of replication && query routing && sharding etc. I think it can also be comparable to databases such as Cassandra, Mongo (using jsonb) since they also have similar use-cases.
Or should I think CitusDB as distributed Postgresql?