by nemesisj 5125 days ago

I love Heroku, but am I the only one that thinks their choice of words describing their architecture is a bit pretentious?

"...streaming data API which connects the dyno manifold to the routing mesh."

Give me a break!

7 comments

tptacek 5125 days ago

Well, there's the problem right there. They're using a dyno manifold to connect to the routing mesh. If they'd just use a flux capacitor, they could use a static manifold instead.

Cushman 5125 days ago

Love it. Times like this I wish there were more of a sketch comedy scene in our community... Not sure whether to go with "The AWS Enterprise" or "Bob the Cloud Mechanic".

dllthomas 5125 days ago

Reminds me of the good ol' Turbo Encabulator...

chadmaughan 5125 days ago

This is classical geek owning up. My first thought was, this is written with two purposes:

1) to prevent the average, non-technical person from understanding it ("Phew, I'm glad these guys are figuring out this stuff and not me - that's why I host with them. I don't even know what a 'dyno manifold' is!")

2) to show management how smart we are and that you still need us ("because who else is going to figure this 'routing mesh' stuff out if you fire those responsible for the outage")

A simple "we're sorry and we've given 10 lashes to the engineer performing the manual garbage collection" may have been a better approach.

Having said that, I still think Heroku is awesome.

aiscott 5125 days ago

Read up on the heroku architecture. These are the terms used.

The manual garbage collection wasn't the problem. An unexpected data structure created by garbage collection wasn't handled in a fault tolerant manner.

dsl 5125 days ago

All the reading in the world will not change the fact they pick retarded names.

Routing mesh? We call that a cluster of load balancers in the real world.

rhizome 5125 days ago

I can guarantee you that the Heroku architecture uses an internal slang for common sysadmin concepts.

erikpukinskis 5125 days ago

OK, I'll bite.

Instead of "dyno", they could possibly use a word like "VM". Except that they're not really virtual machines, nor are they EC2 instances. Read Only Chroot Jails plus Precompiled Application, Libraries, and Environment (ROCJPALEs?) They also have a pretty complex set of support structures that provide connectivity to databases and other resources. Perhaps someone can suggest an existing name for that, but I know of none.

Instead of "manifold", perhaps they could use the word "cluster". Except it's not really a cluster, it's a set of distributed clusters. And nodes in a cluster are typically machines. The nodes in the dyno manifold aren't machines or virtual machines; they're ROCJPALEs. You could use the word "array", but again, it's not really an array. It's a multi-layered, geographically distributed structure of co-hosted application jails. "Manifold" seems as good a term as any.

"Streaming" seems like a good word. It's specifically relevant to this incident... they describe how the API is not atomic; that each message is built on top of the previous entries, and the data structures are implicit in the stream. That sounds like the definition of "streaming" to me.
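To make the "each message is built on top of the previous entries" point concrete, here is a minimal Python sketch of replaying such a stream. Everything in it is hypothetical: the record shape, the `op`/`app`/`dyno` field names, and the `replay` function are assumptions for illustration, not Heroku's actual API. It also shows one way an unexpected record could be skipped rather than crashing the consumer.

```python
def replay(stream):
    """Rebuild a routing table by applying each record on top of the prior state.

    The record format ("op", "app", "dyno") is invented for this sketch.
    """
    table = {}    # app name -> set of dyno ids currently serving it
    skipped = []  # records we did not understand, kept for inspection
    for record in stream:
        op = record.get("op")
        if op == "add":
            table.setdefault(record["app"], set()).add(record["dyno"])
        elif op == "remove":
            table.get(record["app"], set()).discard(record["dyno"])
        else:
            # Unknown record type: set it aside instead of failing the replay.
            skipped.append(record)
    return {"table": table, "skipped": skipped}


stream = [
    {"op": "add", "app": "app1", "dyno": "d1"},
    {"op": "add", "app": "app1", "dyno": "d2"},
    {"op": "gc-compact"},  # an "unusual record" the consumer was not written for
    {"op": "remove", "app": "app1", "dyno": "d1"},
]
result = replay(stream)
print(result["table"])
print(len(result["skipped"]))
```

No single record describes the table; the state only exists as the result of applying the whole stream in order, which is why one odd record can matter so much.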

"API" seems like a widely accepted term. They could've described it as a "protocol", perhaps. But neither seems more jargony than the other.

"Data"... well I suppose "streaming API" without the data would work. But it serves to differentiate it from a streaming video protocol.

"Mesh" has a very specific meaning. It means that you have a set of nodes that are connected peer-to-peer and that messages travel through the network by hopping from node to node. I'm assuming that their routing layer is organized in this way.

"Routing" is also pretty well defined. Requests come in and need to be sent to the machine that can serve responses to it. What would you call that instead of routing?

I feel like people who object to this kind of language are the same folks who object to the word "cloud". People don't take the time to understand different strategies for provisioning and application hosting APIs, and then think these words don't mean anything. Yeah, salespeople use the word to hustle the Same Old Shit, but it also actually means something to people like us who are actually building stuff.

moe 5125 days ago

Man, that's a long and contrived justification for what amounts to a pile of bullshit.

We have seen very elaborate post-mortems from Google, Facebook, Twitter, and not least from Amazon themselves (you know, the playground that Heroku builds their sandcastles in).

The aforementioned companies had no problem explaining their respective issues in plain language that every engineer could understand.

Heroku doesn't even try to explain themselves. They just throw around fantasy words without real explanations, seemingly overwhelmed by their own awesomeness (in a failure report, no less).

As an engineer I feel insulted by this pamphlet. All I can gather from it is that they screwed up, and that it was apparently somehow related to their request-routing layer. Thanks, we knew as much before reading that text.

I still have no idea what actually went wrong and how they intend to prevent it in the future. But I'll certainly advise people to avoid a company that babbles about "control rods" when their software screws up.

erikpukinskis 5124 days ago

Are you a Heroku customer? I am, and I understand everything they said, and I appreciate that they went into detail about what happened.

tedunangst 5125 days ago

If I mechanically replace the words "routing mesh" with "load balancer", I instantly know what they're talking about without losing out on any important details.

jsolson 5125 days ago

Other than the fact that a load balancer is generally a monolithic piece of hardware. The failure modes are well defined, but most of them result in catastrophic outages.

I'm going to assume their routing mesh has many points of ingress and a larger number of exit paths (the dyno manifold), but that the nodes they've got participating in the mesh are actually in some sort of mesh topology (or form a connected graph).

This has the upside that if you lose several nodes in the mesh you probably haven't lost a path to any dynos. If you lose a whole AZ you can spin up new dynos in one of the remaining ones and reconfigure the mesh quickly. My experience with load balancers, especially big load balancers, is that updating a large swath of VIPs is NOT a fast operation (although you would start failing health checks on the missing nodes pretty quickly, adding new capacity to replace them is hard).

The mesh has the downside that the failure modes are a lot more complicated. Oh, and nobody knows what the hell you're talking about.

Of course, I could be wrong. They could just be using NetScalers (or ELB) and calling it a "routing mesh".
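The redundancy argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `MESH` topology, the node names, and the `reachable` helper are all made up, and nothing here reflects how Heroku's routing layer is actually built. It checks which nodes an ingress point can still reach once some routing nodes have failed.

```python
from collections import deque


def reachable(links, start, failed=frozenset()):
    """Return every node reachable from start, never passing through failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in links.get(node, ()):
            if peer not in failed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Hypothetical topology: one ingress, three interconnected routers, two dynos.
MESH = {
    "ingress": ["r1", "r2"],
    "r1": ["r2", "r3", "dyno-a"],
    "r2": ["r1", "r3", "dyno-b"],
    "r3": ["r1", "r2", "dyno-a", "dyno-b"],
}

one_down = reachable(MESH, "ingress", failed={"r1"})
two_down = reachable(MESH, "ingress", failed={"r1", "r2"})
print("dyno-a" in one_down, "dyno-b" in one_down)
print("dyno-a" in two_down, "dyno-b" in two_down)
```

Losing one router leaves both dynos reachable through the remaining peers; losing both routers next to the ingress cuts everything off, which is the "more complicated failure modes" side of the trade.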

tedunangst 5125 days ago

> Oh, and nobody knows what the hell you're talking about.

Yes. It's a level of detail that borders on obfuscation.

fwiw, I've always used the term "load balancer" to also refer to two redundant load balancing machines. (If I worked with more complex load balancers, I doubt I'd stop.) In the general sense, it just means "the apparatus that balances the load".

chris_wot 5125 days ago

It's "dyno manifold" that's the issue here. However, a quick Google search took me here:

https://devcenter.heroku.com/articles/dynos

It explains all :-)

moe 5125 days ago

Pretentious is an understatement.

This seems to be an unfortunate attempt to apply Corporate Speak to a technical announcement; "Let's see how many paragraphs we can fill with technical-sounding gibberish without actually telling anything..."

gojomo 5125 days ago

If a future update mentions 'phase modulation' we'll know they're just cribbing excuses from old Star Trek episodes.

moe 5125 days ago

Nothing a directed tachyon beam can't fix.

pnathan 5125 days ago

It might be pretentious. But it (1) serves as sales-speak and (2) provides a frame of metaphor that may well assist the engineers internally.

neilmiddleton 5125 days ago

How else would you describe them?

moe 5125 days ago

How about words like "ec2 instances", "erlang processes", "haproxy", "nginx" and similar stuff that was likely involved in the incident?

If they're too embarrassed to tell what happened then they should just keep quiet. Don't insult your customers with handwavy bullshit bingo, that just leaves a sour taste in everyone's mouth.

Just imagine the hilarity when the PHB asks his inhouse engineer to translate this "post-mortem" into layman's terms for him. Most bosses have a bit of humor, but not when it comes to hosting infrastructure.

ajasmin 5125 days ago

Amazon also has its own fancy vocabulary but instead of cool sounding words like manifold they prefer short acronyms.

Things like EC2, RDS, AWS, S3, EBS, EMR, IAM, AMI, SQS, SNS, SES, HPC, VPC.

But Amazon is the reference in cloud hosting and these terms are well understood in the field.

Heroku also had to coin some words to describe their architecture. But frankly this outage report is worthy of a Hollywood hacker movie:

"A manual garbage collection process which created an unusual record in the data stream" Wow!

FireBeyond 5124 days ago

Heroku had to coin some words of their own to "mask" the fact that their services are but engineering on top of the AWS stack (which isn't to belittle the effort involved).

dllthomas 5125 days ago

"connects the tachyon emitter to the warp nacelles"?
