Hacker News new | ask | show | jobs
by dragonwriter 2442 days ago
This looks good but some of the background has the air of the kind of artificial problems presented in late night infomercials:

> It’s impossible to split a single payment into multiple payouts, since there is a many-to-one relationship of payments to payouts.

So your model used a many-to-one relationship when you really wanted a many-to-many?

> Since this is just a SQL database, there’s nothing preventing the payouts from becoming inconsistent.

Preventing inconsistencies by enforcing constraints is a key point of an RDBMS. This is literally why people use SQL DBs.

> The payout_id can be ensured to be a valid foreign key, but nothing is stopping it from being nulled out.

You mean, like a NOT NULL constraint? Or, if you mean zeroed out, an appropriate constraint and/or before update trigger protecting the amount?

5 comments

I think this mocking of their intention and problem statements is misplaced.

> So your model used a many-to-one relationship when you really wanted a many-to-many?

In many schema discussions you’d be right, but in the case of payments systems it’s not the case. Payments and payouts are separate transactions and only loosely related. The allocation of credit from one leads to an increment in the settlement of the other, but there’s no structural relationship in the journal itself.

> Preventing inconsistencies by enforcing constraints is a key point of an RDBMS. This is literally why people use SQL DBs

The inconsistencies in question relate to the domain model of accounting, not referential integrity or other (normally) easily enforced database invariants.

> You mean, like a NOT NULL constraint? Or, if you mean zeroed out, an appropriate constraint and/or before update trigger protecting the amount?

When you want to archive off aged records, or constrain across db shards, this is a serious issue. Taking the piss doesn’t make it go away. But in that particular paragraph I believe they’re referring to the payout itself become void e.g. due to a reversal, and having designed around similar issues I understand their concern. It’s exactly where single-column ledgers become a huge pain in the audit.

> The inconsistencies in question relate to the domain model of accounting, not referential integrity or other (normally) easily enforced database invariants.

Constraints can be more than referential integrity. When PostgreSQL released transition tables, I built a very small proof-of-concept using them to constrain a double-entry (-like) system. https://github.com/jimktrains/pgdea allows any number of debits and credits in a transaction, but the whole transaction must have a 0 balance.

>> You mean, like a NOT NULL constraint? Or, if you mean zeroed out, an appropriate constraint and/or before update trigger protecting the amount?

> When you want to archive off aged records, or constrain across db shards, this is a serious issue. Taking the piss doesn’t make it go away. But in that particular paragraph I believe they’re referring to the payout itself become void e.g. due to a reversal, and having designed around similar issues I understand their concern. It’s exactly where single-column ledgers become a huge pain in the audit.

My impression from the article was more the "baby record" problem: not everything has been paid yet. They were using NULL to indicate "not paid" and thus couldn't put a NOT NULL constraint on that column because it was a valid value for new rows. There are ugly workarounds that could have allowed them to add NOT NULL such as using a special/magic "guard" payment that indicates not paid instead. (Going down that route you'd probably want a "guard" payment per client and then you're getting close to a backwards form of double-entry accounting.)

It wouldn't be too remarkable if they stopped there. It's not unusual for enterprise-targeted application databases to return nothing when one queries sys.foreign_keys. However, they've done the extra credit work to redefine double entry bookkeeping for some indiscernable reason.

> In fact, for the ease of modeling we relaxed how we reason about it and we don’t stick to standard Double Entry Accounting which has debit-normal and credit normal books which determine the sign as we prefer to consistently treating debits as positive and credits and negative.

Surely some of their customers have competent accountants maintaining the business's books. This unasked-for effort will make their job harder.

The replacement of the debit-normal and credit-normal account structure with simple use of positive and negative numbers is not unprecedented in computerized double-entry accounting systems (ledger-cli and it's ports and offshoots do this, too); it's quite natural since the credit-debit system was itself a hack around the medieval European resistance to negative numbers and a way to reduce subtraction operations by grouping everything into addition of columns of positive numbers and then one substraction of the two column totals.

> Surely some of their customers have competent accountants maintaining the business's books. This unasked-for effort will make their job harder.

Uh, why? It's trivial to present information tracked as positive debits and negative credits in two positive columns; keeping the internal architecture of the database this way should be irrelevant to the end user, it's just avoiding using a hack designed to optimize the experience of people doing a manual process with medieval European technology and attitudes towards negative numbers in an automated backend where none of the problems it mitigates exist in the first place.

That’s actually pretty common for payment systems where you care about the journal balancing but every account is basically a liability because it’s all someone else’s money.

You just end up with an apparently negative balance in the income account that represents your own cut due to fees etc. From the GL point of view it looks like a subledger with a contra account but that’s nothing to faze a competent accountant. Back in the 13th century it was probably tricky to represent on an abacus, not so much now.

(Disclaimer: I work at Square, but not on Books.)

>Preventing inconsistencies by enforcing constraints is a key point of an RDBMS. This is literally why people use SQL DBs.

This comment is not specific or related to the work at hand, but note that this is only true for a single-machine database or a sharded setup in which you never have to perform transactions across shards. That is, the ACID guarantees are only enforceable in most RDBMS if all your data is on the same machine.

Huh? Am I missing something obvious here? Plenty of databases support ACID guarantees across multiple machines in a cluster. FoundationDB and CockroachDB come to mind, but I’m sure there are others. What you’ve built is cool, but I’m still confused why existing tools aren’t good enough.

From the Cockroach documentation (source: https://www.cockroachlabs.com/docs/stable/transactions.html )

> CockroachDB supports bundling multiple SQL statements into a single all-or-nothing transaction. Each transaction guarantees ACID semantics spanning arbitrary tables and rows, even when data is distributed.

Right -- that's why I said most RDBMS. To get ACID and CAP-consistent (which, as an aside, are not the same thing) cross-shard transactions requires some form of migrating your production system onto such a distributed-capable RDBMS that you describe (of which, Spanner arguably fits the description of).
The thing I'm confused about is why migrating your production system onto a custom in-house data store (that you've written and you maintain) is easier / less work than migrating onto something you can use off the shelf which solves the problem you have.

I hear the argument that migrating from (say), postgres -> postgres + books is easier than moving everything into cockroachdb. But why is postgres + books easier than postgres + cockroach? The latter doesn't require you to write your own database from scratch. (And writing your own database is fun, so if you're doing it for that reason I understand, but thats not what the post says.)

Edit: Ah, I see - my confusion was because I missed the part which mentioned that this is just built on top of spanner. That makes sense!

Yep :P

For a fun fact, CockroachDB was actually started by ex-Squares, so we're definitely very familiar with this exact same argument internally, back in the day when neither CockroachDB nor Spanner existed as a viable option :)

Not everyone has the luxury of using databases like those in high-stakes production.
Yeah this is really confusing to me. Especially given that tenants are unlikely to mix books, just out of the box multitenant Citus + Postgres would probably get them where they needed with super good perf characteristics.

Hindsight and all that, and every team has inhouse stuff that could probably be gotten rid of, but this seems like a major operational danger compared to relying on more battle-tested stuff. They should probably get the Jepsen guy on this and see if he can get race conditions

Infomercials work.

And not just for overpriced knives.