Hacker News
by rckoepke 2151 days ago
What types of AWS data would be trawled? Are we talking about data inside S3 buckets, database schemas, particular architecture styles, the fact that a product is consuming {x, y, z} amounts of cloud resources, or simply "spending $m / year" in gross?
5 comments

I worked in an area where it is really hard to figure out exactly what workloads were being run and where it would have been extremely useful to know even basic things like CPU utilization patterns, network throughput patterns, etc for a specific customer.

We had access to absolutely none of that information. We flew blind, relying entirely on the fact that we gave our customers enough hand-holding support that they would willingly volunteer information about their workloads so we could help them optimize it/save money.

No one even attempted to get more detailed customer information AFAIK because it would have been extremely against company culture. That isn't Earning Trust or having Customer Obsession. The idea of reading data in someone's S3 bucket or inspecting what is happening inside of someone's EC2 instance in any way was unthinkable. Amazon is huge and imperfect, but from what I saw AWS takes data privacy extremely seriously.

I can confidently tell you that Amazon's employees cannot see customers' data inside S3 buckets or EC2 instances. They are extremely serious about that stuff since they know that will erode their customers' confidence.

But there's probably other superficial business data that's helpful to evaluate that.

> I can confidently tell you that Amazon's employees cannot see customers data inside S3 buckets or EC2 instances.

From a technical standpoint, that statement is false.

Every employee might not have the credentials to, but for AWS to function as it does, SOMEONE inside the company has to have those credentials.

If you change 'cannot' to 'don't', well then we've just gotta take you at your word, which is where we started anyway.

> SOMEONE

That's not necessary unless SOMEONE includes computer programs.

Yes, when things go very seriously wrong, I believe AWS can have literal people override that permission, which will leave a mile-long audit trail and will likely be accompanied by an internet-scale outage.

The point I’m trying to get across is that the default viewpoint of many knowledgeable developers I know is ‘Of course AWS can’t see inside my EC2 instance because X’ — where X is some magical technology that doesn't exist.

I don’t want to devolve into audit logs and permissions and multi-user key signing and whether they actually do or not.

The statement that ‘they can’t’ is 100% false, full stop. That’s all I’m trying to get across.

The technology to do it does exist, likely on hardware you possess. Trusted computing lets you build a signed OS that encrypts its data using keys on the TPM. Using this, you could build an S3 implementation that stores customer data, but doesn’t let you access it.

It’s probably not a good idea to make a system with no human fallback, but it IS possible with current, non-magic technology.

The reality is that groups of people inside AWS have access to your stuff. A given person might only be on the S3 or EC2 team... but each of those teams can ssh to hosts in production, or has other access that could be used to compromise your data.

Amazon does take privacy and security very seriously, but these systems are run by people. Attacks like the recent Twitter attack could work for various AWS services.

Source: I used to work in EC2 Networking.

Except they get audited by 3rd parties on statements like that, and have controls tested. It's not like they're just ... digital ocean or somebody.
Do you have evidence of this claim re DO?

I worked with DO on a technical issue, and they were steadfastly against me granting them temporary access to our servers even though it would have made the issue easier to diagnose. Cloud providers that verifiably get caught doing this will quickly lose the trust of all their large customers.

DO doesn't have a great track record for customer trust. I run personal workloads on it but couldn't recommend it over AWS to a larger company.

  - https://news.ycombinator.com/item?id=23117660
  - https://news.ycombinator.com/item?id=20064169
Sales != Engineering (in regards to the first one), AWS have had similar issues. The second one wasn't good.

https://www.zdnet.com/article/aws-error-exposed-godaddy-serv...

Reading through that second one, while the inciting incident was certainly pretty bad, their eventual response was, to my mind, all that could be hoped from a company in this day and age:

https://www.digitalocean.com/blog/an-update-on-last-weeks-cu...?

They recognized that their processes were too mechanistic and inhuman, and introduced a lot more compassion and open communication into them—and even chose to spend more money on hiring people to reduce ticket queue wait times.

I'd say that speaks volumes in DigitalOcean's favour.

The audits check that controls are in place, not that the controls are technically bulletproof or people-proof.

Source: Worked at AWS for several years including working on systems that had audit requirements for [secret project where I could not know the name of the customer because I don't have TOP SECRET security clearance].

Nobody said things were perfect or bullet proof. But that they are there, and it's not just 'trust us'. And it's not just single technical controls - the control regimes include mitigations against technical failure and requirements for ways to catch collusion and actions taken outside of authority.

And there are lots of things that many folks at the big cloud providers don't know about their internal threat management and monitoring. Source: Audited most of them for that customer you weren't allowed to know the name of. :)

Yeah. True. I guess what I meant is that just a handful of employees have access to that and they need to have legitimate reasons.
Also, it is possible to build systems such that, no, there isn't a 'root' or 'unlimited permission' or whatever. Or that there is, but it's a multi-person credential.
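
To make the multi-person credential idea concrete, here is a minimal sketch of a quorum check: a privileged action only proceeds once enough distinct approvers have signed off. The names (`QuorumGate`, `approve`, `is_authorized`) are hypothetical and say nothing about how AWS actually implements this; it only illustrates the concept.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumGate:
    """Hypothetical k-of-n approval gate for a privileged action."""
    eligible: frozenset          # people allowed to approve
    required: int                # number of distinct approvals needed
    approvals: set = field(default_factory=set)

    def approve(self, person: str) -> None:
        if person not in self.eligible:
            raise PermissionError(f"{person} is not an eligible approver")
        self.approvals.add(person)   # a set ignores duplicate approvals

    def is_authorized(self) -> bool:
        return len(self.approvals) >= self.required


gate = QuorumGate(eligible=frozenset({"alice", "bob", "carol"}), required=2)
gate.approve("alice")
gate.approve("alice")                    # same person twice still counts once
single_person_ok = gate.is_authorized()
gate.approve("bob")
two_person_ok = gate.is_authorized()
```

One person approving twice is not enough here; only a second, different approver opens the gate.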

This is one area where AWS takes things MUCH more seriously than its competition, and they don't talk about it enough publicly.

The critical factor here is whether there are controls in place to prevent it. Sure, somebody probably could, but to what lengths must that person go to do it, and what happens when it is discovered? Most things are not technically impossible, after all.
for its faults aws takes data privacy super serious. if you are in support you cant even see attachments customers put on cases without providing auditable justification

and you def cant see in s3 buckets or instances. hell if a customer sends you a link to an object in their s3 youre not supposed to open it

Some group of people on the S3 team likely have root access to the machines where your objects are stored. If you don't have encryption turned on...
You keep making factually incorrect statements. I'm not going to go into detail to refute them, because I don't feel comfortable sharing internal design details and security mechanisms, but your comfort in confidently asserting falsehoods is disconcerting, to say the least.
If you work in AWS security, then you of all people know about the litany of service teams who don't meet their security goals every year.
I find it funny that none of the people here arguing really understand what data is important from a strategic sales point of view and what's not. The customers' databases and other crap they store on the cloud. Not really important.

The raw billing information, oh motherfucking yes.

Agree. The billing data gets explicitly or implicitly discussed when various orgs talk about their successes, annual planning etc.
This is incorrect, at least from a logical POV and why it's hard to trust what cloud vendors say. A statement like this is either naive (most likely) or actively attempting to mislead.

Technically, it's absolutely possible. Most likely you'll just need a support ticket or bug, and then you can troll around as an engineer.

Security teams also usually have access to stuff when things get interesting.

Better to say that access is strictly on a case by case basis and monitored thoroughly.

Ideally customer is notified each time it happens - that would be cool, but likely technically not possible since data ends up in so many systems (like logs, SIEM, telemetry, debug files, backups, data scientist desktops,....)

> Ideally customer is notified each time it happens - that would be cool, but likely technically not possible

You're underestimating the investments that AWS (and Amazon at large) make in security, confidentiality, and auditing. You're also missing a fundamental implication of building AWS on AWS primitives.

As a relevant example there is only one AWS IAM and one CloudTrail. It's a core tenet of AWS IAM to put that control and root of trust in the customer's hands. That means when developer support is helping with your ticket they do so via your account's AWSServiceRoleForSupport role. That means you can control whether that role exists, which principals can assume it, the capabilities it has, and you can see those same API calls in your CloudTrail logs. Although it would make support difficult you're welcome to delete that service-linked role and prevent support.amazonaws.com from assuming said role in your account.

https://docs.aws.amazon.com/awssupport/latest/user/using-ser...
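
As a rough illustration of what "see those same API calls in your CloudTrail logs" can look like, here is a minimal sketch that filters CloudTrail-style records for calls made through the support role. It assumes the records are already loaded as Python dicts and relies on the `userIdentity.sessionContext.sessionIssuer.userName` field to carry the role name; the sample records are made up and trimmed to the fields used.

```python
SUPPORT_ROLE = "AWSServiceRoleForSupport"


def calls_made_via_role(records, role_name=SUPPORT_ROLE):
    """Return (eventTime, eventSource, eventName) for calls made through role_name."""
    hits = []
    for rec in records:
        identity = rec.get("userIdentity", {})
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        if identity.get("type") == "AssumedRole" and issuer.get("userName") == role_name:
            hits.append((rec.get("eventTime"), rec.get("eventSource"), rec.get("eventName")))
    return hits


# Made-up sample records, trimmed to the fields used above.
sample = [
    {"eventTime": "2020-07-24T10:00:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "DescribeInstances",
     "userIdentity": {"type": "AssumedRole",
                      "sessionContext": {"sessionIssuer": {"userName": SUPPORT_ROLE}}}},
    {"eventTime": "2020-07-24T10:05:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "ListBuckets",
     "userIdentity": {"type": "IAMUser", "userName": "deploy-bot"}},
]
support_calls = calls_made_via_role(sample)
```

Only the first record is attributed to the support role; the IAM user's call is ignored.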

Yes, those are great features for compliance. But you seem to believe that your AWS instance is indeed yours. IAM is a concept built on top of lower level primitives that you do not control, but Amazon does.

I'm not talking about Amazon SSH into your EC2 instance - but of course they can do that also - at will, without you authorizing it.

Lower level disks, logs, hypervisor, telemetry, etc.. are accessible beyond your control.

> IAM is a concept built on top of lower level primitives that you do not control, but Amazon does.

Of course there are lower level primitives. And if the public documentation and observed behavior is insufficient I encourage you to inquire more about the various compliance, certification, and third party auditing programs in place https://aws.amazon.com/compliance/programs/. However at some point this approaches solipsism and I can’t prove a negative in a HN thread.

> I'm not talking about Amazon SSH into your EC2 instance - but of course they can do that also - at will, without you authorizing it.

No. Extraordinary claims need evidence. Either you have serious non public information counter to many AWS statements ... or you misunderstand some fundamentals of SSH and public key cryptography.

> Lower level disks, logs, hypervisor, telemetry, etc.. are accessible beyond your control

I would encourage you to read the AWS data privacy statements https://aws.amazon.com/compliance/data-privacy-faq/. Particularly the definitions of “customer content” and the “shared responsibility model.”

This really isn't how modern security works at most cloud companies. Even if you have root-class credentials or the ability to escalate to them in some way (and that's a big if by itself), it's a LOT of steps to get access to customer data, almost always involving break-glass procedures and many protection layers, and often requiring cooperation of multiple other root-level people/credentials from completely different teams.

Depending on how the infrastructure is built, or how the particular service is set up, it may not even be possible to gain access to specific data without extraordinary means, possibly involving replacing physical hardware.

I already corrected my statement in another reply. You're right. I said probably only a handful of people can access customer data to do their job. I personally never met one. The goal of my comment was to illustrate that in my experience handling customer data there was a big deal. It's not like something you can casually query to see if a particular customer has a good business or not.
Amazon is a massive company. How can you know this with confidence? Are you in the C-Suite?
It’s the thing they tell you the most when you work there. Like, in an obnoxious way. Most infosec training is about that.

If someone has access to customers’ data for their work they have to do a bunch of extra training and do other stuff. Potentially sign some things and there’s probably a different way to authenticate. I really don’t know because I never had to do that and nobody I knew had that type of access but I heard when you do you have to put up with more things.

But then what about other commenters saying that this is exactly what their sectors of the company do? Do you think it's impossible that a massive company like Amazon that controls an ungodly amount of the Internet would break those rules? Especially when the government of their home country hasn't pursued an antitrust case in God knows how long
>But then what about other commenters saying that this is exactly what their sectors of the company do?

i don't see anybody claiming that amazon is harvesting data from inside their customers' infrastructure. amazon has a lot of data that's "amazon's data" that would tell them about businesses that are operating on AWS that might be ripe for competition.

For example, they know what your AWS bill is, and how it's been trending. If you pay a huge bandwidth bill and it goes up 50% each month, they know you've got a business model that's working and that they can undercut you on one of your big expenses.
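
To put numbers on that, here is a small sketch of the trend a provider could read straight off the invoices. The bill amounts are made up; the only assumption is one total per month.

```python
def month_over_month_growth(bills):
    """Return fractional growth between consecutive monthly bills."""
    return [(cur - prev) / prev for prev, cur in zip(bills, bills[1:])]


# Made-up bandwidth bill growing 50% each month.
bills = [1000.0, 1500.0, 2250.0, 3375.0]
growth = month_over_month_growth(bills)

# Compounding 50% monthly for a year: 1.5 ** 12, roughly a 130x bill.
yearly_multiple = 1.5 ** 12
```

A steady 0.5 in `growth` is exactly the kind of signal that needs no access to anything inside the account.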

You're right that other commenters aren't necessarily saying that they're peering into buckets and PII...but I err on the side of questioning that the company is committing wrongdoing.
Amazon does not trawl customer data.

However, metrics like AMI popularity are Amazon's data... and that definitely informs first-class AWS product development. Once the company identifies a business opportunity, different teams often investigate "build" and "buy" options simultaneously.

Same goes for retail - Amazon works backwards from high-margin categories to identify opportunities, then pursues investment in existing brands versus spinning up products under the company brands.

This all feels very monopolistic to me, but regardless it's worlds apart from the accusation of stealing private information through faux investment offerings.

I don't think the difference is all that large. Legally, yes. But ethically they are pretty close. After all, any product launched like that will be at the expense of those already operating in that niche including Amazon's platform users.
Yeah I don’t know. It’s possible that there’s some evil stuff happening. I’m just relating my experience as a pawn employee. They parrot this incessantly.
1. Did you work on a team at Amazon in the likes of what user throwaway_aws mentioned?

2. What measures, that you know of, is Amazon implementing to make sure no employees on any team have access to said resources?

As I said below this is something that they will talk about like every freaking day. They talk about customer’s data as the most important thing to take care of.

Basically it is preferable to get a bullet in the head than to ever reveal or tamper with customer’s data.

I cannot answer your question about who has access or not but I’m telling you what’s the culture when it comes to customer’s data.

At the end of the day I was just another IC doing menial work so probably not a good reference, but that was my experience

I'm sorry but what you just said is patently false:

https://www.bloomberg.com/news/articles/2019-07-29/capital-o...

Quote:

Capital One Financial Corp. said data from about 100 million people in the U.S. was illegally accessed after prosecutors accused a Seattle woman identified by Amazon.com Inc. as one of its former cloud service employees of breaking into the bank’s server.

While the complaint doesn’t identify the cloud provider that stored the allegedly stolen data, the charging papers mention information stored in S3, a reference to Simple Storage Service, Amazon Web Services’ popular data storage software.

My reading of this is that the ex-employee used the knowledge about EC2 instance credentials being accessible as a path to gain unauthorized access to data. In theory anyone could have exploited this vulnerability even if they had never worked for Amazon. They never say that Amazon employees had privileged credentials that would give them unauthorized access to customer data.

AWS customers that want to avoid this vulnerability should disable IMDSv1 as per https://aws.amazon.com/blogs/security/defense-in-depth-open-...
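
For anyone wanting to script that, a minimal sketch follows. It assumes a boto3-style EC2 client and uses the `modify_instance_metadata_options` call with `HttpTokens="required"`, which makes the instance accept IMDSv2 session-token requests only. The `RecordingClient` class and the instance ID are hypothetical stand-ins so the sketch runs without AWS access.

```python
def require_imdsv2(ec2_client, instance_id):
    """Ask EC2 to reject IMDSv1 requests by making session tokens mandatory."""
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpTokens="required",    # "required" means IMDSv2 only
        HttpEndpoint="enabled",   # keep the metadata endpoint reachable
    )


class RecordingClient:
    """Stand-in for boto3.client("ec2") that only records the call it receives."""
    def __init__(self):
        self.calls = []

    def modify_instance_metadata_options(self, **kwargs):
        self.calls.append(kwargs)
        return {"InstanceId": kwargs["InstanceId"]}


fake = RecordingClient()
response = require_imdsv2(fake, "i-0123456789abcdef0")   # hypothetical instance ID
```

With real credentials you would pass `boto3.client("ec2")` instead of the stand-in.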

There was zero inside knowledge and they were an ex employee at all times relevant to the incident.

The EC2 instance credentials via the metadata url is publicly documented functionality. It's how things like the SDK “just work.”

The S3 bucket policy, instance creds, and (inferred) overly permissive IAM policy are all publicly documented functionality. This looks like a simple case of an initial intrusion being escalated via permissive configuration and controls. There would be no story if the suspect had not been employed by AWS in the past.

Disclaimer: I'm a Principal in AWS but have no direct or inside knowledge of this incident. Everything I know or have stated here is public record (e.g. the indictment) or public AWS docs.

That leak didn't involve any insider access. So it doesn't prove that employees get access to the S3 data.
Can speak for AWS. Only the latter. Basically the usage information for cloud resources. This constitutes the foundation for billing. BTW, this is true for any cloud, any SaaS.

There is no way an employee can look into customer data. There's enough trail inside AWS to prove that without any doubt.

What are the measures being implemented to ensure that no employee can look into customer's data?
I used to work for AWS and had to deep dive into IAM to build a feature.

Basically every time you touch AWS your session is tagged with your credentials and has a unique ID. So everything downstream you touch has your session ID associated with it.

Now say somebody from Redshift wants to access the customer's data. They will then need access to the encryption key in KMS. The trail will be there since KMS lives in the customer's account (you can audit your own access). And for production services, human actors cannot access these keys - only production credentials can. An engineer who can log into a prod host in theory can grab the temporary credentials there but they expire in 15 minutes so your trail will be rather visible. Also access to prod hosts has a high bar - only senior people can do it.

Now in theory somebody can coordinate with a malicious user on the KMS team - but the bar is high. Also the actual master key never leaves the KMS premises so your attack surface is very limited.

Of course there are some core teams like IAM and KMS where if they become vulnerable the whole thing falls apart. But that's a big stretch for those systems since they are the core to the business.

This is about as bad a revelation as the original one. So the encryption key is fair game without explicit customer approval?
I think perhaps you misunderstand the architecture of KMS. KMS master keys are used to remotely decrypt the symmetric encryption keys for encrypted data that are stored alongside the encrypted data. KMS master keys don't ever leave the KMS servers themselves, and servers can't be accessed directly by anyone. AFAIK they don't have open ports except for handling production traffic and are hardened against opening a shell. An engineer on a different team with access to a host running a customer workload could potentially run off with a temporary customer credential being used by the customer workload, which they could then use to call KMS to decrypt encryption tokens for as long as the credential lasted. But they couldn't get at the KMS key itself or retain access past the expiration of the stolen credential, and all of the aforementioned audit logs would report all of the activity of the stolen credential.
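
The scheme described above is usually called envelope encryption, and a toy sketch may make the moving parts clearer. Everything here is illustrative: `ToyKMS` is a hypothetical stand-in whose method names loosely mirror the KMS GenerateDataKey and Decrypt operations, and the XOR keystream cipher is NOT secure and only stands in for a real cipher so the example runs on the standard library alone.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical key service; the master key never leaves this object."""
    def __init__(self):
        self._master_key = secrets.token_bytes(32)

    def generate_data_key(self):
        # Returns the data key in plaintext plus a copy wrapped by the master key.
        plaintext_key = secrets.token_bytes(32)
        return plaintext_key, _keystream_xor(self._master_key, plaintext_key)

    def decrypt_data_key(self, wrapped_key: bytes) -> bytes:
        return _keystream_xor(self._master_key, wrapped_key)


kms = ToyKMS()

# Encrypt: use the plaintext data key once, store only the wrapped copy.
data_key, wrapped_key = kms.generate_data_key()
stored = {"wrapped_key": wrapped_key,
          "ciphertext": _keystream_xor(data_key, b"customer object")}
del data_key

# Decrypt: ask the key service to unwrap the data key, then decrypt locally.
recovered = _keystream_xor(kms.decrypt_data_key(stored["wrapped_key"]),
                           stored["ciphertext"])
```

The point of the layout is that a stolen `stored` record is useless without a call to the key service, and every such call is a place to authenticate and log.
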
I think you misunderstand my concern. What I'm missing in the above scenario is that a resource that should be 100% under the control of the customer and nobody else can be accessed by AWS personnel to open up a door that should be closed unless the customer permits access.

What the technical implications are is moot, the process that hands out these credentials should not be accessible to anybody but the customer. It implies that AWS personnel can impersonate customer representatives or processes run on behalf of those customers. That's a serious problem.

In all the years that I've been co-locating I do not remember a single instance where a representative of the hosting facilities that I've used gained access to our data or hardware without my very explicit permission.

As for audit logs: they are only as useful as those inspecting them, and more often than not are entirely passive until required for evidentiary purposes.

Plus, if there is any legitimate concern about AWS having access to KMS keys (at this point it would be that they own the servers, and that's about it), you can roll a CloudHSM and import your own keys.

KMS is very clear about its usage and what it involves. With symmetric encryption, AWS obviously needs to hold the key at some point so that it can decrypt the data.

However, as customers can't even export these keys and the whole system is based on using KMS to actually perform the decrypt operations it is a non-starter. It's a lot more secure than most infrastructure which probably encrypts locally but is stored in a broom cupboard with a $10 lock.

I can tell you generally how this works in Azure, I can't speak for AWS, but unless a customer is using BYOK for encryption of their data, I can't imagine how AWS couldn't be capable of accessing data, and even then I wouldn't guarantee they couldn't still get your data. In Azure (as of a couple years ago), in order to access a customer's tenant it required VP approval, the support engineer was granted access for a specific amount of time, and typically only to specific services, all with the customer's knowledge beforehand. It may have changed since the last time I had to go through this process and was restricted to blue badge employees. I have worked support cases since then and the support engineer would not even do a LogMeIn/WebEx, etc. session as they said they were not allowed to see the portal. But it may have been that they were not a blue badge and/or because the customer was a critical infrastructure customer.

In order for AWS to comply with LEOs they must have some way of accessing data; that is NOT to say they do this for business purposes.

At the end of the day there's obviously nothing other than remotely storing your keys that will keep your data opaque. Even supposing that the IAM team doesn't have a way to forge a valid credential if they need to, the confirm/deny response of their service to authorization checks is the source-of-truth for whether a credential is valid, and they could update their service endpoint to affirm bad credentials if they wanted to. Presumably for law enforcement purposes they have a way to forge a credential that doesn't show up in audit logs.
Other than the data each service actually retains itself (e.g. the Lambda service itself stores your Lambda Functions because it needs to execute them) customer data is generally stored encrypted at rest with KMS keys belonging to the customer (or sometimes managed by the storage team). It wouldn't be possible to peer into unencrypted data without persuading the KMS API to authenticate your access to the key. Presumably this capability exists, because otherwise Amazon wouldn't be able to honor warrants for customer data, but the premise that KMS is handing out decryption tokens for customer data for the benefit of Amazon Retail's business analysts is pretty silly.

And of course, you're always vulnerable to someone with access to the physical host of an EC2 instance where your workload is running. Only GCP AFAIK offers an encrypted-in-processing compute service, and it's like a week old.

https://cloud.google.com/blog/products/identity-security/int...

Given how granular AWS billing data is, I would expect the odds to be fairly good that it alone is sufficient to make a good analysis for which third-party offerings are compelling markets. Then AWS takes their execution advantage, along with things like the lower friction that arises from first-party integration with IAM and billing, as well as not having to pay retail for the cloud resources, and it becomes very difficult to retain a moat unless you have a paradigm or perspective that is both critical to succeeding and is also incompatible with AWS culture.
You’re correct. It’s disturbingly detailed as far as what it reveals about architecture.
aggregated api usage stats and api client headers, tied to customers by account id, are often enough to identify competitor products and their traction, and are non-sensitive.
Do you have to use AWS to sell on Amazon?
no