Hacker News new | ask | show | jobs
by andyfowler 2210 days ago
Nutshell does this! We have 5,000+ MySQL databases for customers and trials. Each is fully isolated into their own database, as well as their own Solr "core."

We've done this from day one, so I can't really speak to the downsides of not doing it. The piece of mind that comes from some very hard walls preventing customer data from leaking is worth a few headaches.

A few takeaways:

- Older MySQL versions struggled to quickly create 100+ tables when a new trial was provisioned (on the order of a minute to create the DB + tables). We wanted this to happen in seconds, so we took to preprovisioning empty databases. This hasn't been necessary in newer versions of MySQL.

- Thousands of DBs x 100s of tables x `innodb_file_per_table` does cause a bit of FS overhead and takes some tuning, especially around `table_open_cache`. It's not insurmountable, but does require attention.

- We use discrete MySQL credentials per-customer to reduce the blast radius of a potential SQL injection. Others in this thread mentioned problems with connection pooling. We've never experienced trouble here. We do 10-20k requests / minute.

- This setup doesn't seem to play well with AWS RDS. We did some real-world testing on Aurora, and saw lousy performance when we got into the hundreds / thousands of DBs. We'd observe slow memory leaks and eventual restarts. We run our own MySQL servers on EC2.

- We don't split ALBs / ASGs / application servers per customer. It's only the MySQL / Solr layer which is multi-tenant. Memcache and worker queues are shared.

- We do a DB migration every few weeks. Like a single-tenant app would, we execute the migration under application code that can handle either version of the schema. Each database has a table like ActiveRecord's migrations, to track all deltas. We have tooling to roll out a delta across all customer instances, monitor results.

- A fun bug to periodically track down is when one customer has an odd collection of data which changes cardinality in such a way that different indexes are used in a difficult query. In this case, we're comparing `EXPLAIN` output from a known-good database against a poorly-performing database.

- This is managed by a pretty lightweight homegrown coordination application ("Drops"), which tracks customers / usernames, and maps them to resources like database & Solr.

- All of this makes it really easy to backup, archive, or snapshot a single customer's data for local development.