by dvtkrlbs 1748 days ago
Not a database expert, but here are my thoughts after using CockroachDB for a mid-sized project:
- Single cluster mode is really nice for local deployment.
- The web UI is a cool idea, but I wish it had more info.
- The biggest problem is Postgres compatibility; some of the quirks were annoying. Examples: the default integer types have different sizes, and CockroachDB has a really nice IF NOT EXISTS clause for CREATE TYPE, but there is no way to write it that also works on Postgres. (I think this one is on Postgres: only being able to do it with PL/pgSQL scripts is cumbersome.)
- For me, the neutral one: IT HAS ZERO DOWNTIME SCHEMA CHANGES. But if you are coming from Postgres and just using a regular migration tool with transactions, it is scary that schema changes inside transactions have serious limitations and could leave your database in an inconsistent state. (The docs do document the behavior, but I would still expect a runtime warning for it.)
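To make the CREATE TYPE point concrete, here is a rough sketch of how a migration script might emit different DDL per database. The helper name `create_enum_sql` and the dialect strings are made up for illustration, and the type name is assumed to be a trusted identifier. The Postgres branch uses the common workaround of catching `duplicate_object` inside an anonymous PL/pgSQL block.

```python
def create_enum_sql(name: str, labels: list[str], dialect: str) -> str:
    """Return idempotent DDL for an enum type (hypothetical helper)."""
    # Escape single quotes in labels; `name` is assumed to be trusted.
    values = ", ".join("'" + label.replace("'", "''") + "'" for label in labels)
    if dialect == "cockroachdb":
        # CockroachDB accepts IF NOT EXISTS directly on CREATE TYPE.
        return f"CREATE TYPE IF NOT EXISTS {name} AS ENUM ({values});"
    if dialect == "postgres":
        # Postgres has no IF NOT EXISTS for CREATE TYPE, so the duplicate
        # error is swallowed inside an anonymous PL/pgSQL block instead.
        return (
            "DO $$\n"
            "BEGIN\n"
            f"    CREATE TYPE {name} AS ENUM ({values});\n"
            "EXCEPTION\n"
            "    WHEN duplicate_object THEN NULL;\n"
            "END\n"
            "$$;"
        )
    raise ValueError(f"unknown dialect: {dialect}")
```

Calling `create_enum_sql("mood", ["sad", "happy"], "postgres")` returns the DO block, which is the cumbersome part: the same migration file cannot simply be shared between the two databases.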

CDB was great until we started creating tables every minute and doing hundreds of inserts on those tables. When we tried to drop tables on a schedule, CDB never could catch up and would generally just crash (3 nodes). I really don't like the magic you have to do with CDB for things that your commodity DBs can be expected to do.
CockroachDB dev here. We've gotten a good bit better in the last few versions in terms of schema change stability. We're still not very good at large schemas with more than tens of thousands of tables, but we've got projects under way to fix that, which I expect will be in the spring '22 release.

I'd like to hear more about the magic.

To be fair, while I kind of agree the system should be able to handle it regardless, 10,000s of tables sounds outside the realm of 99.99% of all use cases.
You might be surprised to learn how common the "store the results of the user's query into a temporary table" pattern is.
Temporary tables in cockroach exist, but the implementation was done largely to fulfill compatibility rather than for serious use.

The implementation effectively just creates real tables that get cleaned up; they have all the same durability and distributed state despite not being accessible outside of the current session.

Getting something done here turned out to be a big deal in order to get ORM and driver tests to run, which is extremely high value.

A better implementation would just store the data locally and not involve any of the distributed infrastructure. If we did that, then temp tables wouldn't run into the other schema scalability bottlenecks I'm raising above.
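For anyone unfamiliar with the pattern being discussed, here is a minimal sketch of "store the results of a query in a temporary table", using SQLite through Python's `sqlite3` module purely as a stand-in. The table and column names are invented, and this says nothing about how CockroachDB implements temp tables; it only shows the session-local behavior that client code relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (1, 5), (2, 7)])

# Materialize a query result into a temporary table that is only visible
# to this connection and disappears when the connection closes.
conn.execute(
    "CREATE TEMP TABLE user_totals AS "
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
)

# Later queries in the same session read the cached result.
totals = conn.execute(
    "SELECT user_id, total FROM user_totals ORDER BY user_id"
).fetchall()

# SQLite tracks temp tables in a separate, per-connection catalog.
temp_tables = conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
).fetchall()
```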

Thanks for all of that information in those 2 posts.
Can you tell me more about the use case where you'd create a new table every minute?
Not OP, but for analytics use cases it is common to create new tables for each minute, hour, day, or whatever granularity you collect data in. This makes it easier to aggregate later, you don't end up with extremely big tables, you can drop a subset of the data without affecting the performance of the table currently being written to, etc.
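A rough sketch of that bookkeeping, assuming per-minute tables named with a zero-padded timestamp suffix. The naming scheme and both function names are hypothetical; the point is only that routing writes and finding droppable tables both reduce to computing a bucket name.

```python
from datetime import datetime, timedelta

def bucket_table(prefix: str, ts: datetime) -> str:
    """Name of the per-minute table that a row with timestamp ts lands in."""
    return f"{prefix}_{ts.strftime('%Y%m%d_%H%M')}"

def expired_tables(prefix: str, now: datetime, retention: timedelta,
                   existing: list[str]) -> list[str]:
    """Tables whose minute bucket is older than the retention window."""
    cutoff = bucket_table(prefix, now - retention)
    # The suffix is fixed-width and zero-padded, so string order matches
    # chronological order and a plain comparison is enough.
    return sorted(
        t for t in existing if t.startswith(prefix + "_") and t < cutoff
    )
```

A scheduled job would then issue one DROP TABLE per name returned by `expired_tables`, which is the workload described upthread.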
That sounds like what table row partitioning is for. I thought all the major databases supported that?
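As a sketch of what the partitioning alternative looks like, the snippet below generates PostgreSQL declarative range-partitioning DDL for one partition per day. The `events` table, its columns, and the function names are assumptions for illustration, and the syntax differs between databases.

```python
from datetime import date, timedelta

def parent_ddl() -> str:
    """DDL for a hypothetical parent table partitioned by timestamp range."""
    return (
        "CREATE TABLE events (ts timestamptz NOT NULL, payload jsonb) "
        "PARTITION BY RANGE (ts);"
    )

def daily_partition_ddl(day: date) -> str:
    """DDL for the partition covering [day, day + 1)."""
    next_day = day + timedelta(days=1)
    return (
        f"CREATE TABLE events_{day:%Y%m%d} PARTITION OF events "
        f"FOR VALUES FROM ('{day.isoformat()}') TO ('{next_day.isoformat()}');"
    )
```

Writes go to the single `events` table and the database routes each row to the matching partition; retiring old data is still a DROP TABLE on one partition, so the drop workload does not go away, it just moves behind one logical table.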