Hacker News
by ykl 23 days ago
If you've used AI coding models in a large corporate setting, you'll know that a lot of big corporate deployments basically require using AWS Bedrock for two simple reasons:

1. Large companies tend to already have an existing relationship with AWS, which makes things way easier to go through vs. setting up a new vendor relationship
2. Large companies tend to have strong internal requirements about making sure that internal data stays under company control. With AWS Bedrock, you can be a lot more confident that what you're feeding into the models is not going to end up in someone's training set somewhere. For where I work, this requirement is a dealbreaker for going directly through OpenAI's API instead of going through AWS Bedrock.

4 comments

To go a step further, the reason it's often impossible to add a new vendor is that you've signed a bunch of contracts with your customers saying, in various flavors, that you're not going to send their data to other vendors.
And the pain of the procurement process, especially when you follow a certification such as ISO 27001, SOC 2 or similar.
SOC 2 is stupidly viral but generally not a blocker since it's pretty straightforward to get.

It's really the per-customer contractual agreements you had to sign to grow that make things horrible.

3. From my perspective: for many (not all) LLMs, Bedrock gives you control over which country the data stays in. You have no control over that with the Claude API, for example. We do not work in the US and have strong requirements for the data to stay in our country, which Bedrock gives us control over.
> We do not work in the US and have strong requirements for the data to stay in our country, which Bedrock gives us control over.

It doesn't actually. The US can request data from whatever country US companies store it, and companies must comply.

So if you have strong requirements for data to stay in your country, using a US provider, whichever it is, is out of the question no matter what the company's marketing claims (they are not maintaining these claims under oath, for what it's worth: https://www.senat.fr/compte-rendu-commissions/20250609/ce_co... )

Ah yes, there is a gap between what our regulator wants and what the reality is. I have no doubt that they'll hoover up the data if they want to; we've known that since Snowden. But I have to comply with the regulator, not with reality.
The definition of malicious compliance…
4. AWS billing is already cross-charged to different departments per account. Copilot/Claude/Codex would need that setting up all over again, and is (probably) all coming out of a central bucket right now. Switching to Bedrock APIs is really easy, and solves a problem for people high enough up in the organisation that they can insist on it.
A very interesting comment.

Curious to understand how AI will continue to grow if this is the trend. Assuming most valuable data is behind such firewalls. And whatever is public has been harvested, trained on top of whatever has been acquired illegally (this is a grey area).

Will it become a closed ecosystem without outside input?!

The pace of data creation is only increasing, and our capacity for sharing and storing it is growing as well. Lots of this is out in the open, ready for anyone to crawl and scrape.

There probably is a point of “peak data” where the amount of new data will start decreasing, but that’s likely a 22nd or 23rd century problem.

Pace of data creation ignores the fact that the majority of the big gains in LLM “intelligence” have come from scraping and feeding in the huge amount of public data that already exists.

Unless we’re producing data on the order of an entire new internet every couple of years, it’s hard to see how LLMs can achieve further leaps in capability comparable to the one from training on effectively 0% of the internet to training on 100% of it.

That is without going into the fact that many already use AI to type out and write stuff. I have a customer in the Far East who routinely uses it even for simple emails, since he is not so familiar with English.
The majority of the gains come from the size of the supercomputers used to train them. That's still growing. The algorithms used and how the data is annotated are also part of the secret sauce.
If anything, the trend will go towards sharing data less. It will become more important to keep the know-how and data to yourself, so companies will do that.

And individuals will lose motivation to share, because it won't be that much of a pro-social activity anymore anyway.

imo it will slowly turn into where people run their own AI
How can one be certain that Bedrock data isn’t being shuttled to external providers?
What other people are saying, but also because Amazon does not want to fuck around in this space. They don't want the legal fight or the reputational damage that would come with it.
They also don't really stand to benefit from doing so, unlike basically everyone else in this space.

They have access to a ridiculous amount of private customer data and so far have not shown any predilection to misusing that access.

To take an easy example that has actually had lawsuits I can link to, you must be unfamiliar with the lawsuits against Amazon for misusing sellers' data in order to undercut them with their own products... https://www.reuters.com/sustainability/13-bln-uk-lawsuit-acc...

There's zero reason to "trust" Amazon about anything. (And yes, I know the retail and AWS sides of the company are different, but it's still the same company. The same rot is always there, just shuffled around.)

this is not related to AWS, but merely to amazon's retail business and their sellers know and sign up for the deal when they sell via amazon.

every single retail company does this: they allow suppliers to sell the product using the retailer's infrastructure, and then the retailer turns around and creates private label products using sales data (Costco's Kirkland Signature and Walmart's Great Value are just some examples)

Yes, but Kirkland Signature comes from the same factory. If I'm the factory owner and Costco is going to guarantee me sales, albeit at a slightly lower margin, so long as I slap a different sticker on it, that's different from Amazon finding out which of my products sells best and then getting someone else to rip it off so I don't get paid anything.
The retail side is completely different from AWS.
Wow there must be an echo in here because I swear I said just that... And then pointed out that it's the same crap being recycled back and forth across the company. There is no real separation.
They have very little to gain and a hell of a lot to lose.
In contrast to Microsoft, OpenAI, and Anthropic, AWS has never done anything close to sneaking in unwanted training opt-outs after the fact.

They are the only ones I trust not to do that so far. And their terms are extremely clear on that, no fuzzy language. Exactly what we want to see. So we use Bedrock.

Contracts and the force of law?
Bezos and Altman pinky-promised and are super trustworthy.
Seems like trusting AWS with your data has been a good bet for a long time. They wouldn’t have the size/scale otherwise.
Bezos is not in AI gold rush. AWS is shovel rental.

Also, unlike Altman, they are trustworthy: a lot of Amazon competitors have run on AWS for decades.

You really don't understand what AWS offers if you think this is what is getting them workloads (including competitors and highly sensitive govt workloads).
Andy Jassy is actually trustworthy.
They could be lying with all this:

https://docs.aws.amazon.com/bedrock/latest/userguide/data-pr...

But it seems tremendously unlikely with how explicit they are being with it. It is clearly one of the top selling features for the service.

The only response with an actual link to the docs, thanks homie!

Edit: From your source - “You are responsible for maintaining control over your content that is hosted on this infrastructure.”

So they don’t.

Contractual obligation, external third party audits, and above all, AWS’s reputation.

AWS isn’t going to risk their reputation, and thus huge chunks of their business, just so a few AI labs can get some extra training data. That’s an insane risk with zero upside for AWS. AWS knows full well they will make insane quantities of cash without breaking legal contracts with companies who pay them billions each year for infra.

Having worked with lots of companies, I can say that trust is there. But the true test is Amazon's competitors. Does Walmart use them? eBay? Although they're not in exactly the same business.
they’re crap on a lot of dimensions of how they treat customers but data privacy/security is one that’s taken pretty seriously at AWS, perhaps owing to the massive reputational damage that would result if they played loose with it.