Hacker News new | ask | show | jobs
by niklearnstodev 964 days ago
Having never worked on a multi-tenant SaaS app, is this how multi-tenancy is typically implemented (a per-tenant-database)? Is there a certain scale at which this becomes the ideal pattern? If so, has anyone made the shift from a single-database approach to a per-tenant-database approach?
6 comments

Nile CEO here. There are many approaches to doing multitenancy. - You can create one physical DB per tenant - You can place multiple tenants on the same DB

Both approaches have pros and cons. This is exactly what Nile is solving. We want users to not worry about all the operational complexity of picking a choice. You can choose any approach in Nile.

In Nile, the tenant DB is a virtual concept. You can choose to place it on a multitenant DB along with all the other tenants or choose to place it on a dedicated DB. Nile provides a single experience irrespective of your choice and takes care of all the operational complexity. We also go one step further and also let you place a tenant in any location worldwide but still have one Postgres experience.

You can read more about it here https://www.thenile.dev/docs/tenant-management https://www.thenile.dev/docs/tenant-placement

Would love the feedback

Notion does something similar, sharding postgres grouped by their tenants. It takes a huge effort and preparation to get it right, both at the application and infra level. Link: https://www.notion.so/blog/sharding-postgres-at-notion
Exactly that! Notion, Figma, Loom, Slack, Discord, Sentry have all been an inspiration toward the problem we are trying to solve.
I'd say it's probably not the typical implementation. Mainly because it's a pain to implement and manage.

I guess that's why this was created though. It's definitely an intriguing tool.

I did shift in past from single database to sharded via citus. The actual migration was smooth, but 6+ months of work went into ensuring compatibility.

There's ways to nearly guarantee compatibility without actually using a sharded approach, like enforcing all tables have a uniform shard key.

Good point, this product does make this approach far more palatable.

Curious if the sharding strategy that you were shifting to was company-based, as is implemented in this case?

Yup! I lean towards a tenant ID approach. I ALWAYS use a library that enforces these tenant checks on queries. Ruby (https://github.com/citusdata/activerecord-multi-tenant) and Elixir (I wrote https://github.com/sb8244/ecto_tenancy_enforcer) are the ones I have experience with.
> is this how multi-tenancy is typically implemented (a per-tenant-database)?

There's a few different approaches and deciding which to use can be tough. When I was adapting a formerly single tenant Django application to a multi-tenant solution, I went for a "db-per-customer" (with many databases on a single db server) for a few main reasons:

1) Allows you to backup and restore databases on a per customer basis.

2) Helps with IT reviews if you can say "your data exists in its own DB and is not co-mingled with other customers data"

3) Implementation seemed like it was going to be simpler, especially for adapting an existing single tenant application to a multi-tenant one.

4) Makes it easy to migrate existing single tenant-db's to the multi-tenant app.

The main drawback for us I would say is that it makes managing database migrations/schema changes a bit more challenging since you now need to update N db's every time there's a schema change. This was not super difficult though.

Overall I am very happy we decided to go with a multi-db multi-tenancy solution.

Same setup for us, as this is the only option for B2B/enterprise products you have contractual obligations to delete all tenant data at the end of the term.

Additional benefits: easy to support customer-managed encryption keys and can give their BI people direct access.

Yeah, worked in projects with both of those methodologies. Combining tenants in one database is good for cross tenant aggregation but causes people to reach for...

> brittle permission logic at the application level or complex, hard-to-debug, row-level security policies in databases like Postgres

Separating them into separate databases has its own problems with connection pooling, resource management, and query speed. It's also a pain when you have to combine the data. Then you go down the rabbit hole of ETL and SaaS products with a "scratch" interface that promises to speed things up through (yet another) customer DSL and new naming convention for everything you already know.

A better option is separating tenants by schema. Think about how you switch between schemas in Postgres. Queries look something like this:

> set search_path to public, tenantA; > select * from somewhere as smw inner join elsewhere lsw on smw.id = lsw.somewhere_id;

The somewhere table is in the public schema, elsewhere is in the tenantA schema.

In Rails, there are a couple gems that come to mind for multi-tenancy. acts_as_tenant works exactly like this, by swapping the tenant schema in ActiveRecord (the Rails ORM layer). The ros-apartment gem is another (formerly just 'apartment'). I ran across another gem called 'roomer' too but it is outdated. It works the same way though. You could even roll your own if you insert the logic at the right layer of whatever stack you're using.

Set the tenant at the request level and the ORM auto switches the schema for you. Match a subdomain to the tenant and away you go. You got a SaaS application. edit: And, it'll pass a security audit because of the separation. And you can query across LOB tenant data in a BI front end if you can live without real-time data and build some materialized views; if Redshift and Snowflake are just too expensive.

Using schema separation you get the best of both worlds; smaller data in tenant specific tables with the ability to cross tenant query because they're in the same database. And yes, I'm well aware foreign servers are a thing in Postgres but with sufficiently large data you have to spend more time tuning them for fetch size and indexes to get decent performance. Forget the fact that sometimes you have to implement dynamic SQL using SQL which is a nightmare of repetitious quotation marks with any complex data structure. Foreign servers amplify the escaped quotes. My stomach turns remembering having to write that code...

Is there a npm package for something like this? React/node is not my daily driver.

hey :) I also like Rails acts_as_tenant. We searched far and wide for something similar for JS ecosystem and so far, didn't find anything.

We build our JS SDK by wrapping Knex (nice query builder, not quite an ORM) and injecting our logic to the connection process. You could do the same.

The issue is that there are at least 5 popular JS ORMs, so we don't get the same solve-it-once that we get in the Rails ecosystem.

Sometimes it is. Often it's done in the database itself, but then you have to be very careful you never have a query that returns data that belongs to someone else.