by gwbas1c 414 days ago
Depending on what the metadata is, it can be a huge security risk.

For example, some US government agencies consider computer names sensitive, because the computer name can identify who works in what government role, which is very sensitive information. Yet, depending on context, the computer name can be considered "metadata."


AWS does not treat metadata with the same level of sensitivity as other data. The docs explicitly say that sensitive information should not be stored in e.g. tags or policies. If you are attempting to do so, you’re fighting against the very tool you’re using.

To add to this point, in my interactions with AWS employees it seems that:

- The account manager and the enterprise support TAM can view a list of all resources on the account, including metadata like resource name, instance type and cost explorer tags. Enterprise support routinely presents a monthly cost review to us, so it is clear that they can always access this information without our explicit consent. They do not have the ability to view detailed internal information about those resources though, such as internal logs.

- When opening a support case, the ticketing system asks for the resource ARN, which may contain the name. It seems that the support team can view some data about that object, including monitoring data and internal logs, but potentially accessing "customer data" (such as ssh-ing into an RDS instance) requires explicit, one-off consent.

- I never opened any issues about IAM policies, so I don't know if they can see IAM role policy documents.

- It seems that the account ID and account name are also often used by both AWS' sales side and the reseller's side. I think I read somewhere that it is possible to retrieve the AWS account ID if you know an S3 bucket name or something, and when exchanging data with an external partner via AWS (e.g. S3, VPC peering) you're required to share your account ID with the partner.
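
To make the ARN point concrete, here is a minimal Python sketch of what a pasted ARN gives away. It only assumes the general `arn:partition:service:region:account-id:resource` layout; the two ARNs and the names in them are made up for illustration.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its colon-separated fields.

    The resource part may itself contain colons, so split at most 5 times.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


# Hypothetical ARNs, invented for this example.
rds = parse_arn("arn:aws:rds:us-east-1:123456789012:db:payroll-prod")
s3 = parse_arn("arn:aws:s3:::example-corp-invoices")

# The RDS ARN carries both the account ID and a human-chosen name.
print(rds.account_id, rds.resource)
# S3 bucket ARNs leave region and account ID empty, but still carry the name.
print(repr(s3.account_id), s3.resource)
```

So anything put in the resource name travels with every ticket that quotes the ARN.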

I used to work for an AWS support partner, essentially being tier 3 support. We were also involved in onboarding large customers to AWS and working with their teams to migrate their systems.

We always told them that AWS account IDs are not considered sensitive information by AWS, and neither are S3 bucket names. Metadata (tags etc.) is generally visible to AWS in a variety of ways if you ask for help, but is not public.

This helped them use the services so that the sensitive information was actually hidden, and apply the correct security policies.

Most of the access you're describing is based on AWSServiceRoleForSupport https://docs.aws.amazon.com/awssupport/latest/user/using-ser.... You can see both the IAM policy and the CloudTrail access logs in your account. There are internal tools which use IdP + business justification (e.g. an open support ticket for a specific service) before giving a human limited, predefined access to that role.

Some services will have an internal “admin” tool that is limited to a smaller group with similar limited access + review mechanisms. IME those tools are generally built into the service implementation and don't expose access via a similar service principal/role. The reduced customer visibility is mitigated by very restrictive access, like “team primary oncall + manager approval + high severity ticket”.

I invite you to consider the possibility that even though that’s the case, it’s Amazon’s fault for this design choice and one that can be critiqued especially since metadata disclosure can be paired with other exploits. For example, if I know a bucket name then I know the bucket’s domain name since buckets are by default created open to the public.

There’s no inherent reason for treating metadata as less sensitive and there would be fewer problems if it were treated with the same sensitivity as normal data.

Said another way, some users expect the metadata to be treated sensitively and Amazon’s subversion of this is an Amazon problem not a user problem since this user expectation is rather reasonable.

> Said another way, some users expect the metadata to be treated sensitively and Amazon’s subversion of this is an Amazon problem not a user problem since this user expectation is rather reasonable.

It's an Amazon problem to the extent that they lose business over it. But if people choose to use AWS, despite having different requirements for data security than AWS provides, that is a user problem. At some point the onus is on the user to understand what a tool does and doesn't do, and not choose a tool that doesn't meet their requirements.

S3 buckets being public by default was stopped 2 or 3 years ago: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3...

Longer ago if using the console.

S3 buckets were never public by default. From the link you posted:

"...Amazon S3 buckets are and always have been private by default. Only the bucket owner can access the bucket or choose to grant access to other users..."

The feature and announcement you linked were about enabling an additional safety feature by default, one that blocks buckets from becoming public even if you intentionally (or accidentally) configure them with public access.

The well-known accidents in the past, of Facebook or the Pentagon having private data in public S3 buckets, I can only attribute to the modern practices of self-paced learning, skipping videos on Udemy courses, or deciding formal training is no longer necessary because "I can Google it"...

S3's globally unique bucket names could be problematic even if the name is the only metadata exposed.

You could figure out how a company names their S3 buckets. It's subtle, but you could create a bunch of typo'd variants of the buckets and sit around waiting for S3 server access logs/CloudTrail to tell you when someone hits one of the objects.

When that happens, you could get the accessing AWS account number (which isn't inherently private, but is something that you wouldn't want to tell the world about), the IAM user accessing the object, and which object they attempted to access.

Say the IAM user is a role with a terribly insecure assume-role policy... Or one could put an object where the misconfigured service was looking and it'd maybe get processed.

This kind of attack is preventable but I doubt most people are configuring SCPs to the level of detail you'd need to completely prevent this.

That’s why Amazon recommends the use of the expected owner parameter for S3 operations.
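
For anyone who hasn't used it: in boto3 the parameter is `ExpectedBucketOwner`, and S3 rejects the request with a 403 if the bucket belongs to a different account. A rough sketch follows; the helper name, bucket, key and account ID are all made up, and the helper only builds the arguments so it runs without credentials.

```python
import re


def owner_checked_kwargs(bucket: str, key: str, expected_owner: str) -> dict:
    """Build keyword arguments for an S3 object call that pins the bucket owner.

    Hypothetical helper: it validates the account ID shape and returns
    the arguments, nothing more.
    """
    if not re.fullmatch(r"\d{12}", expected_owner):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Bucket": bucket,
        "Key": key,
        # If a typo'd bucket name resolves to someone else's bucket,
        # the owner check makes the request fail instead of succeeding.
        "ExpectedBucketOwner": expected_owner,
    }


kwargs = owner_checked_kwargs("example-corp-invoices", "2024/01.csv", "123456789012")

# With boto3 installed and credentials configured, this would be used as:
#   boto3.client("s3").get_object(**kwargs)
print(kwargs)
```

It has to be passed on every call, which is why people also reach for org-wide guardrails.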

ISTR it’s also possible to apply an SCP that limits S3 reads and writes outside your organization. If not via an SCP then via a permission boundary at the least.

Yep, an SCP can restrict what S3 buckets you can access via IAM.
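
A minimal sketch of what such an SCP might look like, built as a Python dict so it can be dumped to JSON. It assumes the `aws:ResourceOrgID` global condition key; the org ID and Sid are placeholders, and a real policy would likely need carve-outs for AWS-owned buckets that some services read from.

```python
import json


def s3_org_only_scp(org_id: str) -> dict:
    """Return an SCP document denying S3 actions on resources outside the org."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyS3OutsideOrg",  # placeholder name
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    # Deny when the target resource belongs to another org.
                    "StringNotEqualsIfExists": {"aws:ResourceOrgID": org_id}
                },
            }
        ],
    }


# Hypothetical organization ID.
scp = s3_org_only_scp("o-a1b2c3d4e5")
print(json.dumps(scp, indent=2))
```

With something like this attached, a request to a typo'd bucket owned by a stranger is denied no matter what the caller's IAM policy allows.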

If you're using a VPC you can deploy a VPC S3 Gateway Endpoint which has a policy document on it, this will restrict which buckets the whole VPC can access no matter what their IAM policy says. This also has the benefit of blocking access using non-IAM methods, like signed URLs or public buckets.
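
Roughly, the endpoint policy is just an allow-list of bucket ARNs. A sketch in Python, with made-up bucket names and Sid; treat it as a starting point, not a vetted policy.

```python
import json


def endpoint_policy(allowed_buckets: list[str]) -> dict:
    """Return a gateway endpoint policy that only allows the listed buckets."""
    resources = []
    for name in allowed_buckets:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedBucketsOnly",  # placeholder name
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket names.
policy = endpoint_policy(["example-corp-invoices", "example-corp-logs"])
print(json.dumps(policy, indent=2))
```

Anything not listed gets no allow at the endpoint, so traffic to other buckets through that endpoint is refused.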

I know because it was one of the key decisions we made with R2 and pushed this point in the community.

The majority of S3 buckets, especially valuable ones, were created back when it was the default, and thus the metadata sensitivity of bucket names remains (and that isn’t the only metadata issue).

AWS S3 buckets have been private by default since forever.

> since buckets are by default created open to the public.

This is false

"We kill people based on metadata."

- Ex-NSA chief Michael Hayden

Metadata is data. In a large corporation, metadata can also reveal projects under NDA that only a select few employees are supposed to know about.

That sounds more like the government’s fault for putting a secret in the name.

Make the computer name a random string or random set of words, no relation to the person or department who uses it. Problem solved.
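
Something like this, as a rough Python sketch. The `ws-` prefix, the function names and the in-memory dict are all invented for illustration; the mapping back to a person lives in the registry, not in the name.

```python
import secrets

# Hypothetical registry: opaque hostname -> who actually uses the machine.
registry: dict[str, dict[str, str]] = {}


def new_hostname(prefix: str = "ws") -> str:
    """Return a name with no relation to the user, e.g. 'ws-' plus 8 hex chars."""
    return f"{prefix}-{secrets.token_hex(4)}"


def assign(owner: str, department: str) -> str:
    """Pick an unused random hostname and record who it belongs to."""
    name = new_hostname()
    while name in registry:  # retry on the unlikely collision
        name = new_hostname()
    registry[name] = {"owner": owner, "department": department}
    return name


host = assign("alice", "finance")
print(host, registry[host])
```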

And more problems created.

Now you have to have another system that decodes the random words to human usable words. Is that information going to be stored all in one system? Is each team going to be responsible for the translation? How is that going to be protected from information loss?

I work with systems like this so, yea, it can be done. But it cannot be done trivially.

As someone who has been in the IT department for a company, this is a basic and trivial function of an asset tracking system. It’s literally just matching the serial number or computer name of the asset to the employee.

I don't think the US government is representative of any kind of advisable behavior. Perhaps if they weren't doing stuff that makes people want to murder them we wouldn't have to light piles of cash on fire to protect the perpetrators.

Whether or not it’s advisable doesn’t really change the fact that if the US government is commonly doing something, then it is not correct to describe a security impact on those SOPs as “hardly a security risk”.