Hacker News new | ask | show | jobs
by new_test 4481 days ago
Since we are on the subject of PostgreSQL, does anyone know of a good resource that would teach me how to create webapps where each "customer" would get their own "database"? What are the best practices for that? Also, what are the best practices for ALTER'ing tables in production (e.g. adding/deleting columns).

Here's a concrete (made-up) example: I want to write a webapp where people would sign up to do some personal tracking. They create variables they are interested in (weight, mood, calory intake, etc.) And then enter their data daily. So for each customer I need to have a different database with different columns, and I want them to be able to add/delete variables "on the fly". Is it very straightforward, and hard to get wrong? Or are there "best practices" for this sort of thing? Thanks.

4 comments

You can use a generic data model (example [1]) with appropriate metadata etc. What you are proposing seems more difficult to maintain (separate databases for different customers). If this are separate deployments than different databases are an option with some core module and than specific databases for each customer. I don't recommend doing DDL updates in realtime because of other issues for example: logging of data in some log tables, the data structures in your application (consider what changes would be implied to your ORM entities or other domain entities) etc.

[1] Generic Data Model: http://c2.com/cgi/wiki?GenericDataModel

Look up the following terms

1. SaaS tenancy models. This is the data separation.

2. Custom fields would be usually represented as an EAV model (Entity-Attribute-Value).

3. Don't ALTER in production if you can help it. If you're going to do it, use migrations which are scripts which first add columns, then transform data, then reapply constraints.

I've also seen applications basically just assign a fixed number of additional columns on the relevant tables for custom fields - although this approach doesn't win any awards for elegance it can be pretty straightforward.

From what I've seen, applications that use EAV tend to evolve to suffer from bad cases of the "inner platform effect":

http://en.wikipedia.org/wiki/Inner-platform_effect

NB There is nothing "wrong" with using EAV - just that it seems prone to misuse (a bit like XML).

The only problem with a limited number of custom fields attached to the table is that they aren't necessarily optimised for sorting, might extend past the row size limit of database tables (this is a pain!), can't be reliably typed and a client will always want one more.

EAV is a sort of inner platform thing I agree, but the correct solution i.e a document store with full field level indexing that works with enterprise loads doesn't exist (yet). CouchDB was promising on that front but didn't go all the way.

XML is fine. Just don't stick it in database columns (my favourite chunk of pain!)

I like HStore a lot better than XML for postgres. And it's Waaaaay better than the wide table model.

Way back when, I did a data model that added columns for arbitrary data fields that the users wanted in PG, and wound up with a wide table model. PG can store a surprisingly large number of fields, I think it got up to the mid to high hundreds after years of this. At the time, hstore wasn't there, xml was either not there yet, or just recently added. And the queries for EAV looked surprisingly awful, especially when added to the not exactly straightforward queries we were doing on the events.

It wound up being an extremely large, extremely sparse table, with some fields having 100% usage, most having >>.01%, and a few getting used in the 1% range. On the plus side, it was possible to index any of the fields, which was especially useful with functional or 2 column indexes.

If I had to do it again, I'd be on hstore, or maybe hstore/json. It wasn't pretty but it wasn't the fatal flaw in that startup.

"XML is fine. Just don't stick it in database columns"

I've seen systems that used XML in a database column and were perfectly sensible, I've also seen systems that did terrible things with XML in a database (e.g. using complex XML string as part of a query).

That they do. In fact for a recent company I worked for, Salesforce was pretty much the inspiration for the entire company with some added sector-specific stuff.

So they built Salesforce basically...

Totally insane but it sort of works.

In addition to the previous answers, consider using hstore for custom variables. I've found it helpful for things like settings on customer accounts where these might differ quite a bit between accounts and change quite a bit over time. Otherwise the table layout applies to all accounts.

If you really need to separate data by customer then consider using PostgreSQL schemas. But I'd steer away from any solution that involves adding and dropping columns for different customers.

You could use a Postgres JSON or hstore column for the custom variables. If you use a JSON column, and want to do queries on the custom data, try out this tool I made for creating views of properties within a JSON column: https://github.com/mwhite/JSONAlchemy