by dos1 4880 days ago
On the plus side, if they're willing to share, I bet this will be a very interesting postmortem. Presumably Amazon.com is one of the more bulletproof web properties in the world. Whatever could have occurred to take it down for nearly an hour (at this point) can only be interesting!

I can't compare to other web properties, but when I worked at Amazon, the store going down was a regular event. Something broke almost daily, though it was rare for the whole store to go down. (E.g., you might not be able to search, or checkout might be down, etc.)

The store went through periods of relative stability and relative instability. In the periods where it was not doing so well, it (or a major piece of functionality) would go down in some key area at least once a week, sometimes multiple times a week during the holidays.

While it's been several years and I'm sure they've improved reliability, the sheer mass of the store made it very slow to evolve. And as an ex-Amazonian, sometimes I go and check for bugs that were issues back in the day; several of them have come back over the years, which is not surprising given that the entire group that was working on the parts I was working on disbanded because so many people were driven off by bad management. (A one-two punch in that case: a bad manager backed by another bad manager, neither of whom had any technical knowledge.)

At the time I worked there, large swaths of code in the store had no team responsible for them, because the team had been disbanded in one of the regular shuffles of employees. Amazon had a tendency to get a team together to do a feature, launch it, get the PR and the stock bump, then disband the team and put them on other projects. Of course some of these things stuck around if they were successful, but there was a lot of cruft from past efforts, like local restaurant menus, the movie times system, and various "social shopping features" (a perennial favorite to try again and again). Hell, they used to have catalogs for mail order merchants: scanned paper catalogs!

At the time, they were claiming that "AWS is what we built the amazon store on!" That was totally false. S3 was engineered completely separately from the store, and to its credit, as obidos and gurupa were crap. The only thing the store shared with AWS for at least the first several years was being hosted in some of the same datacenters.

At least at the time I worked there, I'd call it a mess held together by the code equivalents of duct tape and baling wire.

One of the things Amazon excels at is customer service, so when these problems would impact the customer, their bacon was often saved by customer support fixing the problem manually (e.g., messed up orders).

Granted, operating at Amazon's scale is no trivial matter. But Amazon is a retailer and stock marketing company (i.e., one of their primary products is Amazon stock), more than an engineering company.

I'm kinda amazed that people perceive them as a "tech giant" along with Google, Facebook and Amazon. Shows the power of a good (actually, GREAT) side business like AWS. They get the credit for building something good and scalable with AWS, but of course it was a separate team led by a senior executive with enough political clout to shelter that team.

'I'm kinda amazed that people perceive them as a "tech giant" along with Google, Facebook and Amazon ' err.. we are talking about Amazon here
Amazon is a weird company, and it has lots of parts. Even at, say, Microsoft there can be a huge amount of variation from division to division and team to team in how things are run, the corporate culture micro-climate, etc. At Amazon this is even more true: each team is substantially on its own, and while there is a certain amount of global overarching corporate culture, every group is different and some groups buck the trend successfully.
What a great Freudian slip.
They have one of the biggest logistics systems run by a large amount of software in the US, one of the biggest robotics deployments in the warehouse, AND they developed AWS on the IT side. Amazon's software is largely behind the curtains but they are definitely a tech giant.
> as obidos and gurupa were crap.

Except for the part where Gurupa enables scores of developers to build web apps that make hundreds of service calls yet emit results faster than the website we're using right now.

The website we're on is restarted every few days because memory leaks are hard.
It could just be that mzscheme never returns memory to the OS. Perl doesn't.
Not returning memory is different from a memory leak. Not returning memory means the memory footprint equals peak memory footprint. A memory leak is a bug in the program which causes space complexity in memory to grow unbounded. mzscheme certainly doesn't leak memory. HN leaks memory.