by buremba
3735 days ago
I think it would be better to position CitusDB by comparing it to other products in terms of use cases. If the data is big and I need to run analytic queries, then I think I have to use a columnar storage format, because row-oriented formats add too much overhead for aggregation queries that usually only need to process a single column efficiently. If I use CitusDB as an analytical database, then it's comparable with Redshift, Hive etc. As you said, they're suitable for offline data, but can I use cstore_fdw in CitusDB and still take advantage of the real-time nature of PostgreSQL? Maybe I can push hot data to a table that uses a row-oriented format, periodically move the data to another table that uses cstore_fdw, and execute queries that fetch data from both the cold-storage and hot-storage tables? If CitusDB makes that easy for me, then I think this is huge. I guess another use case is using CitusDB as a distributed data store and executing filter queries such as "SELECT * FROM table WHERE partition_key = x and predicate1 = y ...". Instead of using multiple PostgreSQL instances and routing the queries at the application level, I can just use CitusDB, which takes care of replication, query routing, sharding etc. I think it is also comparable to databases such as Cassandra and Mongo (using jsonb), since they have similar use cases. Or should I think of CitusDB as distributed PostgreSQL?
> If I use CitusDB as an analytical database, then it's comparable with Redshift, Hive etc.
A particular difference is in response times and concurrency. Data warehouses and Hive are great for reporting use cases, but not for use cases that require fast responses and serve many users, such as analytical dashboards. Citus is particularly well suited to that kind of workload (see for example the CloudFlare dashboard).
> Can I use cstore_fdw in CitusDB and still take advantage of the real-time nature of PostgreSQL?
Yes. Since cstore_fdw and Citus are both developed by Citus Data, we made sure they're fully integrated. We've even seen some deployments that use a mixture of columnar and row-based storage in a single distributed table.
We find that row-based storage generally has better ingestion performance and more indexing possibilities. Citus can do very fast execution of analytical queries by parallelizing over row-based shards and using the indexes on each of them. However, if you only need a small number of columns and have analytical queries that are not very selective, you can use columnar storage just as easily and even mix and match (might require some support).
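As a rough illustration of the hot/cold split discussed in this thread, here is a minimal single-node PostgreSQL sketch. It assumes the cstore_fdw extension is installed; the table names, columns, and the one-day cutoff are made up for the example, and it does not show how the same layout would be distributed across Citus workers.

```sql
-- Hypothetical schema: names and columns are illustrative only.
CREATE EXTENSION IF NOT EXISTS cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Hot data: ordinary row-based table, fast to ingest into and indexable.
CREATE TABLE events_hot (
    event_id   bigint,
    user_id    bigint,
    created_at timestamptz NOT NULL,
    amount     numeric
);
CREATE INDEX events_hot_created_at_idx ON events_hot (created_at);

-- Cold data: columnar foreign table backed by cstore_fdw.
CREATE FOREIGN TABLE events_cold (
    event_id   bigint,
    user_id    bigint,
    created_at timestamptz NOT NULL,
    amount     numeric
)
SERVER cstore_server
OPTIONS (compression 'pglz');

-- Periodic job: copy rows older than the cutoff into columnar storage,
-- then remove them from the hot table. now() is fixed for the duration
-- of the transaction, so both statements use the same cutoff.
BEGIN;
INSERT INTO events_cold
    SELECT * FROM events_hot
    WHERE created_at < now() - interval '1 day';
DELETE FROM events_hot
    WHERE created_at < now() - interval '1 day';
COMMIT;

-- Query both tiers through a single view.
CREATE VIEW events_all AS
    SELECT * FROM events_hot
    UNION ALL
    SELECT * FROM events_cold;

SELECT user_id, sum(amount) FROM events_all GROUP BY user_id;
```

cstore_fdw tables are append-only (no UPDATE or DELETE), which is why the rows only ever move in one direction here. A production job would also need to handle late-arriving rows with old timestamps, which this sketch ignores.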
> I guess another use case is using CitusDB as a distributed data store
Yep, Citus can definitely be used for that by using hash-partitioned tables.
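For readers who want to see what that looks like, here is a minimal sketch of a hash-partitioned table. The table and column names are hypothetical, and it assumes a coordinator with the citus extension installed and worker nodes already registered. It uses `create_distributed_table`, the function in current Citus releases; releases from around the time of this thread used `master_create_distributed_table` together with `master_create_worker_shards` instead, so check the documentation for the version you run.

```sql
-- Hypothetical table; column names mirror the query in the question above.
CREATE TABLE user_events (
    partition_key bigint NOT NULL,
    predicate1    text,
    payload       jsonb
);

-- Hash-distribute the table on partition_key (current Citus API).
SELECT create_distributed_table('user_events', 'partition_key');

-- An equality filter on the distribution column lets the planner send
-- the query to the shard that owns that key rather than to every shard.
SELECT *
FROM user_events
WHERE partition_key = 42
  AND predicate1 = 'y';
```

The application connects to the coordinator as if it were a single PostgreSQL instance, so no routing logic lives in application code.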