by tlogan 3400 days ago
These service health boards are more like an advertisement page than the actual status of the service.

I guess their bizarre thinking is something along the lines of: "unless we have proof that no one can access the service, we won't change the indicator from green to yellow."

Seriously: I don't understand why you guys stay with AWS.

Because you perceive public clouds only as virtual machine providers that you can replace with another provider in two days. A detailed cloud migration consists of replacing some parts of your software to use managed services provided by a specific cloud provider, and AWS still has the best service offerings IMHO. When you also use these services carefully, you will see that AWS is very cheap and reliable enough. Outages like today's happen on every platform, and it is possible to mitigate them.

You can use AdWords as a self-service user. Without knowing many of the details you can run your ads, but you can also very easily ruin your budget. But many enterprise customers use it very differently from those users, and they optimize their costs aggressively. Cloud is the same. If you don't know how big customers use AWS, it is normal to be surprised that AWS is still leading the market.

You say GCP is better than AWS. Which part is better? GCP lacks many of the AWS services we benefit from. How can you compare totally different providers? You can only say AWS EC2 is worse than GCP. But you cannot compare whole platforms in one sentence.

(Sorry, I'm late to reply, but since you addended your comment you might still be listening...)

After spending a year evaluating both AWS and GCP (with an emphasis on their managed database services; both SQL and no-SQL) my general feeling is this:

"Microsoft Windows is to Unix as AWS is to GCP".

(Or perhaps closer to the truth: "VMS is to Unix as AWS is to GCP".)

Basically AWS services seem like they are badly designed by bureaucratic mediocre engineers following some bureaucratic template for "a service".

GCP feels a lot saner (both API- and UI/console-wise). I often got the feeling it's designed by people who:

a) are smart and well-rounded in terms of experiences. It does take cleverness and experience to design something elegant that is also useful.

b) take pride in their work (it does show)

(And then, as a bonus: It's cheaper!)

You talk about SQL and No-SQL as managed services, and it shows that your experience is limited to a classical application consisting of virtual machines and some data storage. However, these are not the only services offered by both platforms, and currently AWS has a richer feature set. For example, Lambda and its deep integration with the whole AWS platform is the biggest game changer from my point of view. If we are talking about virtual machines and databases, I can accept this comparison. However, we are talking about 30+ services, some of which are not even available anywhere else, solving serious business problems in production and at scale. It is very wrong to put everything into one basket and compare. Maybe GCP has a better pub/sub service and AWS has better object storage. These should be compared separately. To answer your question of why we still stay with AWS: because it solves our problems in the most cost-effective way and with reduced complexity, we are happy with it.
You're probably assuming too much again :)

I specifically spent a lot of time on Lambda and found it quite annoying compared to GCP AppEngine. So much bureaucracy. Just this thing that you have to specifically register every single Lambda API call and its parameters using an interface built by non-thinking people.. Sheesh.

For on-demand processing I just want a single HTTP-ish entry point, like AppEngine provides. (That way I can move my service between different providers, if I wanted to move away from e.g. AWS.)
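
A minimal sketch of the single-entry-point idea, assuming plain WSGI as the portable interface. The `handle_job` function and the JSON payload shape are hypothetical; nothing here is prescribed by AppEngine or Lambda.

```python
import json


def handle_job(payload):
    """Hypothetical business logic; it knows nothing about the hosting provider."""
    return {"processed": len(payload.get("items", []))}


def bad_request(start_response, message):
    """Send a plain-text 400 response."""
    start_response("400 Bad Request", [("Content-Type", "text/plain")])
    return [message]


def app(environ, start_response):
    """Single WSGI entry point; any host that speaks WSGI can serve it."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    raw = environ["wsgi.input"].read(length) if length else b""
    try:
        payload = json.loads(raw) if raw else {}
    except ValueError:
        return bad_request(start_response, b"invalid JSON")
    if not isinstance(payload, dict):
        return bad_request(start_response, b"expected a JSON object")
    body = json.dumps(handle_job(payload)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a local run, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough. The point is only that the provider-specific part shrinks to whatever adapter hands requests to `app`.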

Anyway, I just updated my HN profile with more details about my experience. Please visit it to judge if I might know what I'm talking about.
Sorry for the endless number of typos and mistakes. Obviously I was sleepy while I was writing this.
> Seriously: I don't understand why you guys stay with AWS.

Personally I've been using it for ages and I know most services inside and out. They do suffer downtime in some regions occasionally, but it'd be too expensive at this point to move.

And who doesn't suffer downtime? You can't avoid it; you just need a plan to deal with it. For example, having a backup replica bucket in another region and the ability to quickly switch your CDN over would probably be a good idea here; that's what I did.

If you want to go further you can replicate your data to another cloud provider entirely and use low TTLs to switch to a backup CDN if your system is that mission-critical (in the event of a worldwide AWS failure doomsday scenario).
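
A rough sketch of that kind of failover, assuming each origin exposes a small health object over HTTP. The URLs are placeholders and `pick_origin` is a hypothetical helper; the actual CDN or DNS switch would go through your provider's own tooling, which is not shown.

```python
from urllib.request import urlopen

# Hypothetical origins in priority order: primary bucket, replica bucket in
# another region, then a copy held at a second cloud provider.
ORIGINS = [
    "https://primary.example.com/health.txt",
    "https://replica.example.com/health.txt",
    "https://other-cloud.example.net/health.txt",
]


def is_healthy(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def pick_origin(origins, probe=is_healthy):
    """Return the first origin that passes the probe, or None if all fail."""
    for url in origins:
        if probe(url):
            return url
    return None
```

Calling `pick_origin(ORIGINS)` from a scheduled job gives you the origin the CDN should point at; passing a custom `probe` makes the selection logic easy to exercise without touching the network.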

All systems will fail you and it's our responsibility as IT professionals to have a plan to mitigate this.

Low TTL on DNS entries might do more harm than good: if your DNS provider gets seriously DDoSed, being able to rely on caches can save the day.

Anyway, I agree with your conclusion.

Sunk cost fallacy.

I do agree that we should all plan for failures.

However, I also think it's a sign of failure in planning and architecture foresight if it's too expensive to move away from a particular cloud provider.

The sunk cost fallacy is when you (irrationally) decide to stick with what you're doing purely because you've already spent a lot of resources on it. It doesn't apply when you've done an economic analysis and found out it doesn't make sense to swap.

There are plenty of cases where it just wouldn't make sense to switch after looking at the costs, opportunity costs, etc. For example, if his site makes him $10 a month, outages cost him $1 a month that could be mitigated by moving, and it would cost $1000 of labor to swap providers. (Depends on interest rates.)
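
To put numbers on that example, here is a small sketch of the arithmetic, assuming savings arrive at the end of each month and a constant monthly discount rate. The function names are illustrative, and the $10 a month of revenue plays no part in the comparison.

```python
import math


def payback_months(switch_cost, monthly_saving):
    """Months needed to recoup the switch cost, ignoring interest."""
    if monthly_saving <= 0:
        return math.inf
    return switch_cost / monthly_saving


def present_value_of_savings(monthly_saving, monthly_rate, months=None):
    """Discounted value of the savings; months=None means they last forever."""
    if monthly_rate == 0:
        return math.inf if months is None else monthly_saving * months
    if months is None:
        # Perpetuity: saving / rate
        return monthly_saving / monthly_rate
    # Ordinary annuity over a fixed number of months
    return monthly_saving * (1 - (1 + monthly_rate) ** -months) / monthly_rate


def worth_switching(switch_cost, monthly_saving, monthly_rate, months=None):
    """True if the discounted savings exceed the one-off switch cost."""
    return present_value_of_savings(monthly_saving, monthly_rate, months) > switch_cost
```

With $1000 of labor and $1 a month saved, `payback_months(1000, 1)` is 1000 months, a bit over 83 years. Even if the savings ran forever, their discounted value is 1/r dollars at a monthly rate r, so the move only pays off when r is below 0.1% a month.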

Perhaps it was originally a failure to not have a plan to easily move from a provider, but it doesn't seem unreasonable to me that right now it may cost too many hours of work to justify the move.

It's not as though it would be impossible; our integration with AWS isn't that deep, it's not as though we use DynamoDB for our core data store or anything like that. But even migrating from one traditional datacenter to another isn't easy from an operational point of view.

There needs to be a clear financial win. Even taking into account the failures we've seen so far, I don't see a compelling reason to leave AWS.

(You're right, I used that term incorrectly.)

Still stand behind the other two points I made in that post though.

> I don't understand why you guys stay with AWS.

Who do you recommend instead (assuming in-house or Hetzner-equiv is out of reach)? Google Cloud? Azure? Rackspace?

Google Cloud if you're looking for something similar. It's just so much better and cheaper. I think a lot of the resistance here towards that kind of move is just because people are inherently lazy and they aren't paying the bill themselves.

(I'm guessing a relatively large part is also selfish attachment to the market leader because of employment reasons. I hate wasting money, both for myself and for my employer, so I don't really understand this kind of thinking - but I do understand how it could flourish in a venture capital-rich time/locale.)

I also recommend reading:

https://thehftguy.com/2016/06/15/gce-vs-aws-in-2016-why-you-...

Google Cloud doesn't exactly have the greatest reliability/uptime either.
https://status.cloud.google.com/summary tells a different story or do you have other information?

I have used GCP for some time without being affected by any incident.

I'm not sure what you mean. If anything that link underscores my point. GCE has absolutely had its own catastrophic errors. Remember last April when ALL instances in ALL regions went down?

https://status.cloud.google.com/incident/compute/16007?post-...

That is a pretty awesome page.. way better than a page full of green icons, during an obvious outage... I like that they have writeups a few days after the incidents....
Google also doesn't have the best record for developer tools.
GC's CDN doesn't cache files bigger than 4Mb. No Windows VMs. Bound to AWS for these 2 reasons.
As already mentioned, they do have Windows VMs but there are some caveats that indicate it's not fully baked yet. 1.) They require that each VM MUST have a public IP address so that Windows can talk to an activation server every 30 days. 2.) You cannot yet bring your own license.
Someone else already mentioned Windows VMs.

Looks like CDN has a 10MB limit:

https://cloud.google.com/cdn/docs/caching

(work at Google Cloud)

What about something like B2 from https://www.backblaze.com/ ?
S3 in a single region is based out of multiple data centres / availability zones, with data distributed so that the loss of a single availability zone won't impact either data availability or durability, even to the point of being comfortable with complete physical destruction of an AZ. The same applies for Azure, GCP etc.

B2 is based out of a single DC (or at least, was at launch and I don't see anything that suggests that has changed?) You've got to decide what's most important to you. Data persistence or $$$.

OVH
Bad idea there, support is horrible.
OVH doesn't even want to take my money to keep my server running. Their auto-billing process is busted and when it goes wrong they just delete your server.
That's not what I've seen. I misconfigured my auto-billing and got paged in the middle of the night by nodes mysteriously disappearing, but they released those machines minutes after my CC went through. Not that I'm a big fan of OVH but if you design your system to allow for failure you can't match their value for money.
What is your last datapoint on that?

The last year or two has seen a remarkable improvement according to those customers of mine that host there.

I think it's more, "if the service can't do what people need it to do, that's a problem; if the service cluster gets wedged hard enough to stop responding to the requests of our monitoring system, that's a failure."

Which would make sense (and is sorta-kinda a best practice) if Amazon wrote services such that they "crashed early", but instead they're seemingly written so that the backend can lock up and be rendered completely useless at "doing its job" while continuing to run just fine.

Either of those two design decisions is potentially a good thing on its own, but they need to be considered in light of one-another if you want your status page to make any sense. If you want to report cluster failures, code your clusters to actually fail. If you want to keep your clusters up, write your monitoring checks as whole-stack acceptance tests.
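
A toy illustration of the difference, assuming a key-value backend with `ping`, `put`, and `get` methods. Both store classes are made up for the example and say nothing about how Amazon's monitoring actually works.

```python
import time
import uuid


class DictStore:
    """Hypothetical healthy backend backed by an in-memory dict."""

    def __init__(self):
        self._data = {}

    def ping(self):
        return True

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class WedgedStore(DictStore):
    """Hypothetical wedged backend: still answers pings, silently drops writes."""

    def put(self, key, value):
        pass


def liveness_check(store):
    """Status based only on whether the process responds at all."""
    return "green" if store.ping() else "red"


def acceptance_check(store):
    """Status based on a full write-then-read round trip of a canary object."""
    key = f"canary/{uuid.uuid4()}"
    value = str(time.time()).encode("utf-8")
    try:
        store.put(key, value)
        ok = store.get(key) == value
    except Exception:
        ok = False
    return "green" if ok else "red"
```

Against `WedgedStore()`, `liveness_check` reports green while `acceptance_check` reports red, which is the mismatch described above: the cluster is "up" by one definition and useless by the other.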

> Seriously: I don't understand why you guys stay with AWS.

You don't seem to have enough experience to comment on the issue.

Please visit this comment sub-tree:

https://news.ycombinator.com/item?id=13765786

That is a regurgitation of your opinion without any facts.

Comparing technology and saying "it seems" or "I feel" isn't really a good argument to convince me one way or the other.

> Seriously: I don't understand why you guys stay with AWS.

I tried them all and Amazon is still the best.

Postgres on RDS
Come to NEXT in a week! :).
Any chance UDF iterators for Cloud Bigtable are in the works?

Being able to run distributed D4M/GraphBLAS queries in Cloud Bigtable would be killer.

"From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database" https://arxiv.org/pdf/1606.07085.pdf