
	by panghy 2987 days ago
	We (Wavefront) have been operating petabyte-scale clusters for the last 5 years with FoundationDB (we got the source code via escrow), and we are super excited to be involved in the open-sourcing of FDB. We have operated over 50 clusters on all kinds of AWS instances, and I can talk about all the amazing things we have done with it. https://www.wavefront.com/wavefront-foundationdb-open-source...

panghy 2987 days ago

We basically replaced MySQL, ZooKeeper and HBase with a single KV store that supports transactions and watches, and that scales. It's not a trivial point that you can just develop code against a single API (finally Java 8 CompletableFutures) and not have to set up a ton of dependencies when you are building on top of FDB. We are (obviously) experts at monitoring FoundationDB with Wavefront, and we hope to release the metric harvesting libraries and template dashboards that we use to do so.

Almost 5 years in and we have not lost any data (but we have lost machines, connectivity, seen kernel panics, EBS failures, SSD failures, etc., your usual day in AWS =p).

qaq 2987 days ago

"but we have lost machines, connectivity, seen kernel panics, EBS failures, SSD failures, etc., your usual day in AWS" <=== This. I wish more people realized that this is the day-to-day reality if you are on AWS at scale.

Joeri 2985 days ago

The best way I've heard it described is "complex systems run in degraded mode".

https://cdn.chrisshort.net/How-Complex-Systems-Fail.pdf

Basically, once a system is complex enough, some part of it is always broken. The software must be designed from the assumption that the system is never running flawlessly.
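
To make that concrete, here is a minimal sketch of one common way to code against an always-partly-broken system: retrying a failing call with exponential backoff and jitter. This is a swapped-in illustration, not anything from the linked paper or from FoundationDB; `call_with_retries` and its parameters are hypothetical names.

```python
import random
import time


def call_with_retries(op, attempts=5, base_delay=0.01,
                      sleep=time.sleep, rng=random.random):
    """Call op() and retry on failure with exponential backoff plus jitter.

    Hypothetical helper for illustration; assumes attempts >= 1.
    `sleep` and `rng` are injectable so the behaviour can be tested
    without real waiting or real randomness.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:  # broad on purpose: treat any error as transient
            last_error = exc
            if attempt == attempts - 1:
                break  # out of attempts, give up
            # Delay doubles each attempt; jitter factor is in [0.5, 1.5).
            delay = base_delay * (2 ** attempt) * (0.5 + rng())
            sleep(delay)
    raise last_error
```

In real code you would probably narrow the `except` clause to errors you know are transient, and cap the total time spent retrying.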

qaq 2979 days ago

No doubt, but that's pretty high overhead for many projects. Colo is actually a decent choice, but I guess that's not a popular opinion.

koide 2986 days ago

As I understand it, it's like that everywhere at scale, not just on AWS; it's a property of operating at scale.

Or are you saying that AWS is particularly unreliable at scale?

panghy 2986 days ago

I think cloud providers are particularly opaque about small glitches (i.e. they aren't going to tell you that a router or switch was rebooted for maintenance if it comes back right away, and if you email support it's always the same response: "it's working right now") :)

qaq 2986 days ago

On the network side no, it's much more crappy on AWS.

koide 2986 days ago

Which provider is the best, network wise?

qaq 2986 days ago

I only have experience with AWS, on-prem, and high-quality colo like Equinix. Possibly due to reduced complexity and having full control over the networking setup, I've seen significantly fewer issues there than on AWS.

killertypo 2987 days ago

And FoundationDB has kept our data durable through all of this.

qaq 2987 days ago

Sounds like whoever bolts a PG-compatible SQL layer on top will have a killer product on their hands :)

socceroos 2987 days ago

Have a look at CockroachDB

qaq 2987 days ago

Already playing with it, but FoundationDB is used for production petabyte-scale deployments, and the whole deterministic simulation thing for testing is really reassuring as far as bugs/stability go. I am guessing that with Apple's resources that approach was taken to a whole new level after the acquisition?
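
For anyone unfamiliar with the deterministic simulation idea, here is a toy sketch of the general technique. It is not FoundationDB's actual simulator, and `simulate` and the replicated-counter model are hypothetical. The point is that every random choice, including injected faults, comes from a single seeded generator, so any failing run can be replayed exactly from its seed.

```python
import random


def simulate(seed, steps=200, replicas=3):
    """Run a toy replicated-counter simulation driven by one seeded RNG.

    Hypothetical model for illustration only; not FoundationDB's simulator.
    Returns (event trace, final replica states).
    """
    rng = random.Random(seed)   # the only source of nondeterminism
    up = [True] * replicas      # which replicas are currently alive
    state = [0] * replicas      # each replica's copy of the counter
    committed = 0               # last value acknowledged to the client
    trace = []

    for step in range(steps):
        roll = rng.random()
        if roll < 0.10:
            # Inject a fault: crash a random replica, keeping at least one alive.
            victim = rng.randrange(replicas)
            if sum(up) > 1 and up[victim]:
                up[victim] = False
                trace.append((step, "crash", victim))
        elif roll < 0.20:
            # Recover a random crashed replica and resync it.
            node = rng.randrange(replicas)
            if not up[node]:
                up[node] = True
                state[node] = committed
                trace.append((step, "recover", node))
        else:
            # Client write: apply to every live replica, then acknowledge.
            committed += 1
            for i in range(replicas):
                if up[i]:
                    state[i] = committed
            trace.append((step, "write", committed))

        # Invariant checked on every step: live replicas hold the committed value.
        assert all(state[i] == committed for i in range(replicas) if up[i])

    return trace, state
```

Calling `simulate(42)` twice returns identical traces, so if the invariant ever tripped for some seed, rerunning with that seed would reproduce the exact same sequence of crashes, recoveries, and writes.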
