Hacker News
by kuschku 3174 days ago
Indeed, the trend is clear, but for now, nothing has changed.

Google's implementation is not very helpful, though, to all those companies, individuals, NGOs, and governments that have to follow privacy laws, HIPAA, etc., because it isn't open source, and these entities can't use Google Cloud. Or don't want to.

Until we get an open source solution for this, SMP machines will be useful.

And even then, you save money by having fewer, larger machines in your cluster rather than only tiny ones. Larger machines mean less per-machine overhead.
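
To make the overhead point concrete, here is a minimal sketch in Python. The numbers are made up for illustration (256 total cores, and one core per machine lost to the OS, monitoring agents, and so on); they don't come from any real cluster.

```python
def usable_fraction(total_cores: int, machines: int,
                    overhead_cores_per_machine: float) -> float:
    """Fraction of the cluster's cores left for the workload after a
    fixed per-machine overhead (hypothetical model, not a measurement)."""
    cores_per_machine = total_cores / machines
    usable_per_machine = max(cores_per_machine - overhead_cores_per_machine, 0.0)
    return usable_per_machine * machines / total_cores


# Same 256 cores, split two ways (assumed figures).
small = usable_fraction(total_cores=256, machines=64, overhead_cores_per_machine=1.0)
large = usable_fraction(total_cores=256, machines=4, overhead_cores_per_machine=1.0)

print(f"64 x 4-core machines: {small:.1%} usable")
print(f"4 x 64-core machines: {large:.1%} usable")
```

Under these assumptions the fixed cost is paid 64 times in the first layout and only 4 times in the second, which is the whole argument.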


Uhhhh... most orgs that run HIPAA workloads do so slavishly on Windows and EPIC, neither of which are famously what most would consider open source.
You’re right, the real distinction is on-premises vs. off-premises. I should have made that clearer (although free software implies that on-premises is possible).
I’m still confused... most orgs running HIPAA workloads are still doing so on prem, AND using very much non-free software.

Google, and Amazon, and MS, undergo extensive third-party audits and people can, and do, run HIPAA workloads there.

I’m not sure what distinction you are trying to draw.

The distinction is that companies with HIPAA workloads aren’t going to upload their entire dataset into Google Cloud Spanner, which is not available as an on-prem version, and which isn’t HIPAA certified.

So either we need an on-prem version of Cloud Spanner, Cloud Spanner needs to be HIPAA certified, certified to match German privacy laws, etc, or Cloud Spanner can’t serve these situations.

There's CockroachDB, which is roughly an open source Spanner that you can deploy on your own systems.
Which still has horrible join performance, and several other tradeoffs (see https://www.cockroachlabs.com/blog/cockroachdbs-first-join/).

Maybe in a year, or two. But not today.

Spanner has a lot of tradeoffs too. I'm not sure what you see as problematic in Cockroach; joins are good enough. The biggest tradeoffs are inherent to strongly consistent distributed systems. Even precise clocks and fast networks won't help as much as you might think. You still have to accept vastly different latencies and performance from those of traditional single-node RDBMSs.
If I run CockroachDB on several servers in the same rack, on a local private 10Gbps network, with spinning HDDs, I expect to get the same (or better) throughput as with a single PGSQL instance, and similar latency (when accessing it from another server via the public internet).

That’s not always the case, though.

That blog post is from a year ago. And since CockroachDB 1.0 was only released in May this year, it's a bit misleading to link to that post as though it were the current state of the software.
That blog post is referenced in their FAQ today, under the topic of what it can't do right now. Sorry if I misunderstood the situation, I’d appreciate any updated links.

See: https://www.cockroachlabs.com/docs/stable/frequently-asked-q...

Ah, my apologies then. It's their fault for not updating the FAQ and you can hardly be blamed for quoting from it!

There was an update to that 6 months later: https://www.cockroachlabs.com/blog/better-sql-joins-in-cockr...

I'd assume they've made even more improvements since, so they really should update their docs.