Hacker News
by Supermancho 1747 days ago
CDB was great until we started creating tables every minute and doing hundreds of inserts on those tables. When we tried to drop tables on a schedule, CDB could never catch up and would generally just crash (3 nodes). I really don't like the magic you have to do with CDB for things that your commodity DBs can be expected to do.
2 comments

CockroachDB dev here. We've gotten a good bit better in the last few versions in terms of schema change stability. We're still not very good at large schemas with more than tens of thousands of tables, but we've got projects under way to fix that, which I expect will be in the spring '22 release.

I'd like to hear more about the magic.

To be fair, while I kind of agree the system should be able to handle it regardless, 10,000s of tables sounds outside the realm of 99.99% of all use cases.
You might be surprised to learn how common the "store the results of the user's query into a temporary table" pattern is.
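For readers unfamiliar with the pattern, here is a minimal sketch of "store the query result in a temporary table". It uses SQLite from Python's standard library purely as a runnable stand-in and says nothing about CockroachDB's behavior; the `orders` table, the `big_orders` name, and the `cache_query_result` helper are all hypothetical.

```python
import os
import sqlite3
import tempfile


def cache_query_result(conn, temp_name, select_sql):
    """Materialize a SELECT into a temp table and return its row count.

    temp_name and select_sql are interpolated directly, so this sketch
    assumes they come from trusted code, never from user input.
    """
    conn.execute(f"CREATE TEMP TABLE {temp_name} AS {select_sql}")
    return conn.execute(f"SELECT COUNT(*) FROM {temp_name}").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    # Session A owns the base data and the cached result.
    session_a = sqlite3.connect(path)
    session_a.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    session_a.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 5.0), (2, 50.0), (3, 500.0)],
    )
    session_a.commit()
    cached_rows = cache_query_result(
        session_a, "big_orders", "SELECT * FROM orders WHERE amount >= 50"
    )

    # Session B opens the same database but cannot see A's temp table.
    session_b = sqlite3.connect(path)
    try:
        session_b.execute("SELECT * FROM big_orders")
        visible_elsewhere = True
    except sqlite3.OperationalError:
        visible_elsewhere = False

    session_a.close()
    session_b.close()
```

Every user query handled this way costs one table creation and, eventually, one drop, which is how an application can end up issuing schema changes at a high rate.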
Temporary tables in CockroachDB exist, but the implementation was done largely to fulfill compatibility rather than for serious use.

The implementation effectively just creates real tables that get cleaned up; they have all the same durability and distributed state despite not being accessible outside of the current session.

Getting something done here turned out to be a big deal in order to get ORM and driver tests to run, which is extremely high value.

A better implementation would just store the data locally and not involve any of the distributed infrastructure. If we did that, then temp tables wouldn't run into the other schema scalability bottlenecks I'm raising above.

Thanks for all of that information in those 2 posts.
Can you tell me more about the use case where you'd create a new table every minute?
Not OP, but for analytics use cases it is common to create new tables for each minute, hour, day, or whatever granularity you collect data in. This makes it easier to aggregate later, you don't end up with extremely big tables, you can drop a subset of the data without affecting the performance of the table currently being written to, etc.
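A rough sketch of that table-per-bucket pattern, again using SQLite from Python's standard library only so the example runs anywhere. The `events_YYYYMMDD_HHMM` naming scheme and the helper names are assumptions made for illustration, not anything a commenter described.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def bucket_table(ts):
    """Hypothetical naming scheme: one table per minute of data."""
    return ts.strftime("events_%Y%m%d_%H%M")


def insert_event(conn, ts, payload):
    """Write an event into the table for its minute, creating it if needed."""
    table = bucket_table(ts)
    # The name is built by strftime from a datetime, so interpolation is safe here.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, payload TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (ts.isoformat(), payload))


def drop_older_than(conn, cutoff):
    """Drop every bucket strictly older than the cutoff's bucket."""
    cutoff_name = bucket_table(cutoff)
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'events_*'"
    ).fetchall()
    # Zero-padded timestamps sort lexicographically in time order.
    stale = sorted(name for (name,) in rows if name < cutoff_name)
    for name in stale:
        conn.execute(f"DROP TABLE {name}")
    return stale


conn = sqlite3.connect(":memory:")
base = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
for i in range(5):
    insert_event(conn, base + timedelta(minutes=i), f"event-{i}")

dropped = drop_older_than(conn, base + timedelta(minutes=3))
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
```

Retention here is a `DROP TABLE` per expired bucket rather than a large `DELETE`, which is the appeal of the pattern. It also means one schema change per bucket on creation and another on expiry.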
That sounds like what table row partitioning is for. I thought all the major databases supported that?