by ngz00 1237 days ago
I worked there and I can say that this is not accurate at all. It is very much a blame culture. I've seen people fired for less severe incidents. Beyond the core technology of the Pillar engine, the place is not comparable to a modern tech company in almost any way.
4 comments

As somebody who worked with them as a client, I can confirm this. There is currently a spec-level bug in their core Pillar engine; it was essentially bounced between several different teams and ultimately ignored as nobody's problem.
Unlike all of the "modern tech company" problems which are never ignored and only solved when someone's problem goes viral on social media.

They're a big company, some groups are better than others, some customers get more attention than others.

So basically like any other medium to large company? This doesn't sound unique in the slightest.
I would think that the company being a securities exchange would factor into the analysis. Don't you?
How does them being a securities exchange in any way affect the analysis of their software engineering practices? They're not some special snowflake, they can suffer the same software engineering and business process issues as other companies.
>> They're not some special snowflake

But they are. The consequences of a one-day or even one-hour shutdown of their system are exponentially worse than at almost any other company. I would expect them to have more rigorous systems, including more rigorous attention to development. Comparing the NYSE to any other business is like calling Fort Knox just like any other bank vault.

I disagreed with you until: "...like calling Fort Knox just like any other bank vault."

Interesting point that teeters on false equivalence. I think AWS or Azure might make for a better analogy. Your point identifies the inherent risk of actually operating a platform business. A bank vault is (mostly) synonymous with a cloud in this context. If a vault is robbed or a cloud goes offline, losses extend beyond the business itself, which compounds the severity of downtime.

Linear loss vs. parabolic loss.

No company or organization is immune to bad business practices.

Being a securities exchange does not stop them from building rigorous systems that still have oversights, nor does it make bureaucracy magically go away.

Likewise, the impact of an outage being more extreme does not mean the people there are infallible. Things slip through. Especially random customer requests being bounced around from team to team, the thing in question.

> like calling Fort Knox just like any other bank vault.

Main difference being that most bank vaults aren't actually empty. ;)

No they aren't.

There are far more critical snowflakes out there: FAA airspace management, a medical radiation device, avionics in an aircraft, and Facebook.

I used to work for a small startup, and postmortems were truly no blame - engineers would talk about exactly what happened and wouldn't hesitate to put the blame on their mistakes.

But as the company grew, the postmortems became more about blame, since now you're blaming not an engineer but an entire team, so singling them out isn't personal. The postmortems were no longer a single engineer describing what happened in his code; they were team leads talking on behalf of teams. They were all about shifting blame away from your own team and explaining why another team's service led to the problem, even if your team could have (and should have) worked around it without melting down.

I'm no longer at the company. Postmortems are much more useful when they really are no-blame, because you can get to the real root of the problem, but I don't know if that's possible in a large company.

This happens in organizations large enough that individual units start to feel like small companies of their own. I would say a good program, whether it is small or a chunk of a massive org, does a blameless postmortem.

A few years back, a task to modify an index was given to a scrum team. The lead was away and the senior people could not be bothered. The junior developer found an answer on Stack Overflow, asked for review, tested the script and let it rip. She missed that the change deleted everything. Every environment, every data center wiped out. 10B records in each prod instance. Lessons were learned and processes fixed. She was not fired, but rather became one of the people safeguarding the keys to our prod kingdom as we fixed our broken process. I stole her away as my first report when I switched groups.

Suddenly I don't feel so bad about deleting an entire PVCS repository (happily answering 'yes' to all the 'are you sure?' questions) at 4:30PM on a Friday.
As organizations become larger they become more political. It's unavoidable.
Having been in the industry for a couple decades, and having worked at both, they're not all that different. Some groups are going to be better than others in the same company. Some companies are floating on venture money today, and might disappear tomorrow. Most technologies constantly cycle. Our experiences working at the same company were different.
Blame cultures and process cultures are both problems in different ways. Blame cultures don't care about individual accountability, only that someone suffers. Process cultures only care that no one suffers, not that individuals are accountable. Both have some misguided notion that something other than personal accountability can lead to good results. Misattributed blame and suffering do not deter poor performance or mistakes. Not even correctly aimed punishments are very good at that. Accountability isn't about punishment; it is about limiting power to the level of responsibility demonstrated. Rules and procedures don't prevent poor performance; in fact they can entrench and guard it, and they only mildly reduce mistakes. Best practices can mitigate mistakes to the same extent or better (because they adapt more easily), but people keep trying to turn them into rules, and that has to be fought. If you followed all the rules but didn't get the job done, you still shouldn't be handed the same task again, but not out of blame.