Hacker News new | ask | show | jobs
by Xurinos 4913 days ago
Having done this for a while, I can solidly say that the relationships and structure of data has been consistently better when designed as a set of schema tables than as a set of classes because they are especially constrained. The constraints forced people to actually think about the relationships between facets of their data, and the resulting classes were much cleaner and more maintainable. Should we ever decide to move aspects of the classes into a database, the transition requires little code refactoring; this comes up fairly often for us, since we are trying to give users more customization powers.

People should be doing that when they design classes, but the flexibility tends to work against them. It could just be a mindset that makes it work well, and this might be a more appropriate approach for larger projects than for smaller types.

1 comments

In my experience, relational modelling works fine for domains containing simple logic.

As soon as you start having constraints such as "from date must be before to date" and "the quote must have at least one line, and these lines should sum to > 0 and < 1000000" then relational modelling fails. Hard.

I like to use a Quote with QuoteLines as an interview question. Relational modellers make two tables, both having unique identifiers. Domain modellers create two classes, sometimes exposing only one to the outside world. Only one of those classes (Quote) has an identity.

Domain/class models tend, from what I've seen in the wild, to fail faster under load and under change. Ironically, experienced relational modellers tend to build cleaner class models. Although that might be more a function of experience than anything else..

"from date must be before to date"

In Postgres:

    CREATE TABLE foo (
        from_date datetime,
        to_date datetime,
        CHECK (from_date < to_date)
    );
(This fails in MySQL. MySQL parses check constraints without error, but ignores them.)

I don't understand the quote problem, so I won't attempt to model it.

> As soon as you start having constraints such as "from date must be before to date" and "the quote must have at least one line, and these lines should sum to > 0 and < 1000000" then relational modelling fails. Hard.

Those types of requirements rarely have a big impact on the design. Sure, you might have to write some procedural code in a trigger, but that's just a handful of lines of code. (I don't know why you think the "from_date < to_date" is a difficult requirement though -- as someone else pointed out, that's just a CHECK constraint).

Relational modelling is often a very clean, concise, and readable way to represent many kinds of businesses. You can pile a few extra requirements on top, and a good DBMS will make it easy to do so.

So I don't see a "hard failure" of anything here.

> As soon as you start having constraints such as "from date must be before to date" and "the quote must have at least one line, and these lines should sum to > 0 and < 1000000" then relational modelling fails. Hard.

Really? Both those problems are insanely trivial to solve with one or two SQL statements. Your second and third criterium are perfect candidates for relational modelling; the first is so trivial it beggars belief.

Enforcing that a quote has at least one quote_line is pretty tough, at least in Postgres, and involves triggers and locks.

How do you enforce that two transactions don't delete the last quote_line at the same time?

And how would you enforce that the sum of the quote_lines is correct with multiple writers?

Not saying it can't be done easily, but I'd love to hear how.

"Enforcing that a quote has at least one quote_line is pretty tough, at least in Postgres, and involves triggers and locks."

The reason it seems simple when using a few classes is because there is an implicit assumption that concurrency is a non-issue and there's only one application involved. But those aren't good assumptions, so the approach using classes will start to look more complex (and involve locks, etc.).

In Postgres, the trick is to use SERIALIZABLE transactions everywhere (should be the default, eventually), which avoids the need for explicit locking. Then, add a trigger that is fired before changes to either table, and it would simply check that the condition holds for the quote that was modified.

Here's some code, since you asked:

  create table quote(
    quote_id int8 primary key,
    customer_id int8,
    valid daterange
  );
  
  create table quote_line(
    quote_id int8 references quote(quote_id),
    amount numeric(10,2)
  );
  
  create or replace function check_quote() returns trigger language plpgsql as $$
  declare
    line_count int;
    line_sum numeric;
  begin
    select into line_count, line_sum
      count(*), sum(amount) from quote_line where quote_id=NEW.quote_id;
    if (line_count < 1 OR line_sum < 0 OR line_sum > 1000000)
    then raise exception 'invalid quote';
    else return NEW;
    end if;
  end;
  $$;
  
  create constraint trigger quote_check_trig after insert or update or delete on quote deferrable initially deferred for each row execute procedure check_quote();
  create constraint trigger quote_line_check_trig after insert or update or delete on quote_line deferrable initially deferred for each row execute procedure check_quote();
Note that I did not need to add the CHECK constraint, because it's much better to use the appropriate data type -- DATERANGE -- instead of hacking it together from parts.
I figured that serializable transactions would be necessary. That's going to be the default?

If you forget to use serializable transactions every time you update these tables, is it possible to store incorrect data?

I think that SERIALIZABLE should be the default eventually, but others may disagree and I won't make a prediction.

However, it isn't necessary to solve the problem. It would be relatively easy to use a row lock in this case to solve the problem, as well, but I like to avoid those unless there's a reason.

If you want to have a mix of SERIALIZABLE and other transactions, or you are worried about making a mistake (or some malicious user), then you need to use the row lock. Eventually there should be a way to force users into serializable transactions.

EDIT: actually, in the trigger, you could explicitly check if the transaction isolation mode is serializable. That would be the best approach:

  if current_setting('transaction_isolation') <> 'serializable' then
    raise exception 'serializable mode required';
  end if;
I wonder if SSI (Serializable Snapshot Isolation) would be of use here. http://wiki.postgresql.org/wiki/SSI
As soon as you start having constraints such as "from date must be before to date" and "the quote must have at least one line, and these lines should sum to > 0 and < 1000000" then relational modelling fails. Hard.

Why?

Ignorance is often the reason why they find such things "difficult". Many NoSQLers just aren't aware of the existence of the check constraints and triggers offered by basically all relational databases.
apart from all the other comments here, you seem to be confusing natural and surrogate keys. the "identity" of the quote (the quote number) is a natural key. the "identity" of the quotelines table is a surrogate key - the equivalent in the class model would be the address of an instance (the thing to which the pointer in the quote points).

there's nothing significant in the class model only having the one identity; it's just a natural consequence of what's implicit and explicit in the two technologies.