by manigandham 2779 days ago
When everything works, GCP is the best. Stable, fast, simple, reliable.

When things stop working, GCP is the worst. Slow communications and they require way too much work before escalating issues or attempting to find a solution.

They already have the tools and access, so most issues should take them minutes to gather diagnostics. Instead, they keep sending tickets back for "more info", inevitably followed by a hand-off to another team in a different time zone. In the past we have spent days trying to convince them there was an issue at all, which just seems unacceptable.

I understand support costs money, but there should be a test (with all vendors) where I can officially certify that I know what I'm talking about and don't need to go through the "prove it's actually a problem" phase every time.


As someone who works for Government and Enterprise - all I care about sometimes is how a company behaves when everything goes wrong.

With the Government organizations I have dealt with, the issue is rarely the outage itself; it's whether there is strong communication about what is occurring, realistic approximate ETAs, or options for mitigation.

Being able to tell the Directors/Senior managers that issues have been "escalated" and providing regular updates are critical.

If all I could say was that a "support ticket" had been logged and we were waiting on a reply (hours later), I guarantee the conversation after the outage would be about moving to another solution provider with strong SLAs.

Very similar thing at our office. Considering the scale at which we run things, any outage could be a potential loss of millions _every minute_.

Sure, we use support tickets with vendors for small things, like a console button bugging out. But for large incidents, every vendor has a representative within an hour's driving distance who will be called into a room with our engineers to fix the problem. This kind of outage, with zero communication, means dropping the contract.

Communication is critical for trust, especially if we're running a business off it.

Going single cloud on that scale is simply irresponsible though.

You need failovers to different providers, and ideally your own hardware for general workloads too.

And suddenly the CEO doesn't care anymore if one of your potential failovers is behaving flaky in specific circumstances

Not saying it's good as it is. Communication as a SaaS provider is, as you said, one of the most important things. But this specific issue was not as bad as some people in this thread insinuate.

Agreed. If we are really talking about millions per minute (whoa), then you can afford to fail over to AWS.
As a government or large enterprise, you should get a support contract with the provider and have a dedicated support to contact.

Don't get me wrong. AWS is exactly the same as Google. All you will do is log a ticket and receive an automated ack by the next day.

You are incorrect about AWS. If you pay for Business support and something is happening to your production environment, they are on a call with you in less than an hour.
How could I be incorrect when that's exactly what I said? You gotta pay for a support contract to have any meaningful support.
You also said that all you would get was an "automated ack". That doesn't seem to be the case if AWS provides an on-call support engineer.
I think the point is that that only happens if you have a contract. With GCP you can also get an oncall support engineer if you're large enough.
Use AWS and government "region".
"Support costs" calculation often doesn't include the costs of not having support.

When I worked at GoDaddy, around 2/3 of the company was customer support.

At my current company, a cryptocurrency exchange, our support agents frequently hear from customers that they prefer our service over others because of our fast support response times (crypto exchanges are notorious for really poor support).

All of my interactions with Amazon support have been resolved to my satisfaction within 10 minutes or less.

Companies really ought to do the math on the value that comes from providing fast, timely, and easy (don't have to fight with them) customer support.

Google hasn't learned this lesson.

> Google hasn't learned this lesson.

They have, though; they've just concluded that they'd rather put massive amounts of effort into building services that users can use without needing support. This approach works well once the problems have been ironed out, but it's horrible until then. Google's mature products like Ads, Docs, Gmail, etc. are amazing. Their new products ... aren't.

There's a big difference between SaaS applications and compute infrastructure for your business.

Google Ads and such also have a terrible support reputation, even with clients spending 8 figures.

> Google's mature products like Ads, Docs, Gmail, etc. are amazing.

Until something goes wrong and the only recourse is to post an angry Hacker News thread or call up people you personally know at Google to get it fixed. For example: https://techcrunch.com/2017/12/22/that-time-i-got-locked-out....

I've seen Google projects where the project lead explained (actually, responded) that they don't want to provide support, end of story. Google puts folks in charge but doesn't give them much in the way of accountability objectives.
With Dell you can get certified so you can get replacement parts and such without the BS back and forth with some guy in India. Saves everyone time and money.
I did this many years ago and it was great.

We actually got to a point where we had a few spare parts on site (sticks of RAM, HDs, etc.), so we could repair immediately and then request the replacement. This was on a large HPC cluster, so we had almost daily failures of some kind (most commonly a stick of RAM that would repeatedly fail ECC checks).

> instead they keep sending tickets back for "more info"

Isn't that the case with basically every support request, no matter the company or severity? The first couple of emails from 1st and even 2nd level support are mostly about answering the same questions about the environment over and over again. We've had this ping-pong situation with production outages (which we eventually analysed and worked around ourselves) and with fairly small issues, like requesting more information about an undocumented behavior that didn't even affect us much. No matter how important or urgent the initial issue was, most requests eventually end up closed unresolved.

I've definitely had interactions with smaller companies where you can effectively bypass first and second line by demonstrating you know what you're doing, mostly just saying the right things for them to accept that you've done basic troubleshooting steps already and really do need to talk to someone beyond that point.
Yes, same experience here. Support at smaller companies can be more dedicated when talking to "knowledgeable" customers. It's generally easier to get to their 3rd level, sometimes just because there is no 1st or 2nd level at all. But at "big" enterprises, not so much.
> I've definitely had interactions with smaller companies where you can effectively bypass first and second line by demonstrating you know what you're doing, mostly just saying the right things for them to accept that you've done basic troubleshooting steps already and really do need to talk to someone beyond that point.

"Shibboleet" https://www.xkcd.com/806/

Smaller companies or personalized support structures (like named engineers) are very different. You can build up a relationship, usually bypass many questions to get to the main issue, and even get it resolved before you could open a case at a larger organization.

GCP does have role-based support models with a flat-rate plan, which is really great, but the overall quality of the responses leaves much to be desired.

Heh, your "test" reminds me of an old Hanselman article:

https://www.hanselman.com/blog/FizzBinTheTechnicalSupportSec...

To say "when it works it's stable and reliable" implies that it is neither...
60% of the time, it works every time...