Hacker News
by ushakov 1255 days ago
The tech diagram on AWS is insane. This is what I call vendor lock-in porn

I'm wondering how much your monthly bill is?

6 comments

This is like the 5th iteration of that darn diagram, and I still can't quite get it right. The other comments are correct, the underlying setup isn't that complicated, and it wouldn't be too tricky to build it for another cloud, since all that's really happening is a Python script reading a NetCDF file. The real perk to building everything using a serverless approach was that everything scaled down to 1 user (me), and then has scaled up beautifully as more people have signed up.

The monthly bill isn't great, but has been manageable through donations so far. I'm handling about 10 million API calls/month, so lots of Lambda invocations, but they're all thankfully very short with tiny outbound data transfer totals, and luckily inbound is free. If the costs ever get out of hand, the fallback is to throttle the request rate down, but there are still a few optimizations I can do on the AWS side that should help (ARM here I come!).

Want me to take a gander at it? This is a project I’d love to see thrive.
It's rather simple, just messy to look at. It's a timer that triggers a cluster that processes raw data files and uploads the JSON to a server with a reliable file system. Then a Lambda function processes API requests to read that JSON and serve a response.
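
A minimal sketch of the serving half of that description, assuming the processed JSON sits as one file per location on the mounted file system. The names `DATA_DIR` and `handler`, the `location` request key, and the file layout are all hypothetical, not taken from the project's code:

```python
import json
import os
from pathlib import Path

# Hypothetical mount point for the pre-processed JSON files.
DATA_DIR = Path(os.environ.get("DATA_DIR", "/mnt/data"))


def handler(event, context=None, data_dir=DATA_DIR):
    """Lambda-style entry point: look up pre-computed JSON and return it."""
    # Assumed request shape: {"location": "<name>"}.
    location = event.get("location", "")
    # Only allow letters, digits and underscores so the path cannot escape data_dir.
    if not location or not location.replace("_", "").isalnum():
        return {"statusCode": 400, "body": json.dumps({"error": "bad location"})}
    path = Path(data_dir) / f"{location}.json"
    if not path.is_file():
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    # The heavy processing already happened upstream; serving is just a file read.
    return {"statusCode": 200, "body": path.read_text()}
```

The point of the split is that each request does almost no work, which is why the invocations stay short.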

"Lock-in" is just a trade-off like any other engineering decision, same as what programming language or API schema you choose. AWS is massive and reliable, and these services are cheap and widely available so I don't see much of an issue here.

Which part is insane?

Put another way, how would you implement this without self-managing pieces like nginx/apache, rabbitmq/kafka, or mongodb on a compute instance?

The pieces of managed infra they're depending on are all pretty interchangeable from one cloud provider to another. It's not the cheapest way to solve the problem if you discount the cost of management, upgrades, etc. But if you factor those costs in, it's quite competitive.

I would go to cloud providers, which can offer me managed rabbitmq, kafka or mongodb

Certainly, I wouldn't get any vendor-specific services like Lambda or SNS (in this example)

The Lambda functions will just be running your business logic code. The only Lambda-specific part is the glue, which is easy to replace with any other cloud or self-hosted alternative.
At the end of the day, the real reason I went with Lambda to build this is that when I started, I didn't know anything about spinning up a server, building a storage array, or serving a web request. But what I did know was how to write a Python script to extract weather data from a file. The really cool thing about some of these cloud tools is that they let me ignore all of those (very difficult) problems and just focus on the data.
Exactly what the cloud is best used for imho. You can focus on your core problems and still shift to a different solution later on when that becomes more effective cost wise.

But still, when I first started in AWS we had lift and shift'ed most of our platform onto AWS and when it came to adding new features I first looked at Lambda instead of building out on the existing platform. The speed at which I could set up the feature and start measuring and iterating on it was so much better compared to doing it on our existing platform, even though that was pretty ironed out and fully automated.

Where is your lambda_handler defined? I searched your repos but could not find it.
Easy enough to move to other clouds if it was defined in something like Terraform or even CloudFormation. Could even move out of the cloud to your own bare metal services pretty easily, since the various components are so well defined. AWS makes the most sense though as stated in the docs because a lot of the data being processed is in S3, and reading data from S3 inside of most AWS services is free.
This is a pretty standard looking diagram. It probably isn't very expensive to run.

AWS diagrams are pretty intimidating until you've built a few things with several AWS services.

The benefit is that a lot of maintenance work is taken care of for you, and your costs can be low if you don't need a lot of compute.

I am an Infrastructure Architect (aka "Cloud Architect") so I design cloud systems like this on the daily. The "vendor lock-in" argument always makes me laugh. It's the #1 thing I hear all day long.

This diagram is actually pretty simple. It looks worse than it is. All it uses are Lambdas (serverless functions), S3 buckets (object storage), and SNS (broadcast/push queues). There appears to be one traditional server in there with EFS, which is just an elastic file system.

All of these systems have equivalents in all the major cloud providers. So if the builder of this wanted to move to GCP or Azure, they are not really locked to AWS. This can all be built in another cloud.

Now, could you do it in a day? No. Assuming they are building it with Infrastructure as Code (such as Terraform) then they would need to convert the provider and change resource blocks. But this is akin to refactoring in a codebase. It's work, but it's not terribly difficult. Then they point it to their new cloud and run `terraform apply`.

There is almost no way to entirely remove vendor lock-in. The closest you could come is by designing everything yourself on bare metal servers and renting those from a cloud provider. So instead of using a managed queue system, you run some sort of messaging queue on the server. Then you host files on the server's filesystem, and you run the "lambdas" as applications on the server. But that almost causes more headaches than you save or solve for.

I look at Cloud Providers as similar to cell phone providers. I know people who live in fear of being locked into a contract with Verizon or something. But really, what are you going to do? You will always need a cell phone. The only other real choice is AT&T or maybe Sprint/TMobile. How often are you really going to switch and what are you really gaining by doing so? Energy spent worrying about being "locked in" to a cloud vendor is energy wasted. Yeah you can move from AWS to Azure or GCP. But that's about it. What do you gain by switching? Probably almost nothing. They are all pretty comparable at this point in reliability, features, and price (GCP is the slight laggard here, but not by much). If Google calls your company and offers you a huge discount to switch, you could still do it. Aside from that, there's minimal incentive to do so.

There are a few weird services that AWS has that might be considered "lock-in" services, for example AWS Snowball or AWS Ground Station. These don't have comparable systems on other platforms. In the case of Snowball you probably have so much data on AWS that just transferring data would take months (or even years), which could be considered a form of lock-in.

tl;dr - This is a very tame arch diagram. A few lambdas, s3 buckets, and messaging queues, all of which have comparable services on all major clouds. There isn't significant vendor lock-in, this could be rebuilt fairly easily (assuming they used IaC) on any major cloud provider.

Hello

> This diagram is actually pretty simple

The diagram looks like an ad

> All it uses are Lambdas (serverless functions), S3 buckets (object storage), and SNS (broadcast/push queues)

Do you actually need all of this or do you use it because Amazon tells you to? I know for instance you cannot use Amazon SES without also using S3 and Lambda

> So if the builder of this wanted to move to GCP or Azure, they are not really locked to AWS. This can all be built in another cloud

You're saying that I cannot move to other cloud provider without my existing code becoming useless?

> Assuming they are building it with Infrastructure as Code (such as Terraform) then they would need to convert the provider and change resource blocks

What about the data pipelines and business logic?

> There is almost no way to entirely remove vendor lock-in

There is: avoiding vendor-specific APIs altogether

> Closest you could come is by designing everything yourself on bare metal servers and renting those from a cloud provider

I don't have to. There are things like Railway, Fly.io, PlanetScale, Supabase, Upstash, Minio, which can work without locking me in

> What do you gain by switching?

Freedom

> There isn't significant vendor lock-in, this could be rebuilt fairly easily (assuming they used IaC) on any major cloud provider

You are contradicting yourself

Also a cloud engineer. I use a similar setup professionally and for various personal projects.

For someone who isn't familiar with standing up cloud resources the diagram can look overwhelming but once you play around with AWS for a bit, most of the resources you see are fairly boilerplate.

VPC is essentially just setting up a virtual LAN for your resources. S3 is being used as an API mediator to NOAA. CloudFront is a CDN for your API Gateway. Lambdas run your logic. API Gateway triages api requests to lambdas, and a couple other services act as mediators.

There is some vendor lock-in in that everything here is built on AWS, but all the major cloud providers have similar/equivalent services. If you decided to move to GCP or Azure you could probably move the entire stack in a few days (maybe more depending on how much your lambdas use AWS-specific internal APIs).

If vendor lock-in is a really big concern for you, you can run everything on an EC2 instance running Ubuntu instead. That way you can just spin up your service on another Ubuntu instance in another datacenter, or locally, or whatever.

Soooo, yes. There is some vendor lock-in here, but not much.

To answer your cost question. I run a very similar setup for personal projects and I rarely exceed AWS's free tier for most services. On a high-usage month it's around $85. It isn't super optimized. I could make it cheaper and nearly free if I put in the work.

That said, cost for a service like this scales very proportionally to usage. For example, AWS API Gateway can process 1 million requests for free before they start asking you for money. If the service becomes super popular we'd likely see the "Buy me a coffee" button promoted a little more and eventually you may see a paid tier as an option, but as it is, it's probably pretty affordable to run.
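
The free-tier arithmetic can be sketched roughly like this. The price is deliberately a required parameter: the `1.0` in the usage line is a placeholder to show the shape of the calculation, not a quoted AWS rate, and the free allowance is the 1 million figure mentioned above rather than something checked against current pricing:

```python
def monthly_request_cost(requests: int, free_requests: int, price_per_million: float) -> float:
    """Cost of requests beyond a free allowance, billed per million requests."""
    # Nothing is billed until usage exceeds the free allowance.
    billable = max(0, requests - free_requests)
    return billable / 1_000_000 * price_per_million


# 10 million calls with 1 million free leaves 9 million billable requests.
# price_per_million=1.0 is a placeholder value, NOT a real price.
print(monthly_request_cost(10_000_000, 1_000_000, price_per_million=1.0))  # 9.0
```

Below the free allowance the result is zero, and above it the cost grows linearly with traffic.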

I'm the dev behind this, and really appreciate all the insight from actual cloud professionals! Your guess here is spot on, I designed it so that I could more or less fit in the free tier with exactly one user, with costs scaling pretty linearly afterwards. There are a few more optimizations I could do, but it's honestly pretty impressive how much traffic (I'm at about 10 million calls/month) can be (somewhat) cheaply handled using AWS.
> "I know for instance you cannot use Amazon SES without also using S3 and Lambda"

You can absolutely use SES without S3 and Lambda. I've used it many times in various projects.

As that page makes clear, SES can hand off incoming mail to a Lambda, or to S3 – or to SNS which can deliver it to any HTTP endpoint, or e-mail address, or text it to your phone for that matter.
What do you expect SES to do with your mail after receiving it? S3 and Lambda are optional delivery locations, amongst other choices.
Exactly this. It receives the email, now what? You need to run some code on it and so the way to do that is one of the compute services. AWS isn't forcing you to do anything here.

99.9% of SES users I promise are only sending mail anyway. You aren't forced to have Lambdas or anything else to send mail.

Wow, I struck a nerve. I'm happy to address these points however.

> The diagram looks like an ad

Lol, it's an architecture diagram. You could swap the AWS-specific icons for generic ones I suppose, and it wouldn't change anything. It is fulfilling its purpose of explaining how all the services connect together to deliver the product. Just because it is an AWS Lambda icon doesn't mean you couldn't make it an Azure Function icon instead and perform the same goal. You're just too focused on hating AWS here to see the forest for the trees.

> Do you actually need all of this or do you use it because Amazon tells you to? I know for instance you cannot use Amazon SES without also using S3 and Lambda

This is just a standard event-driven architecture. There's really nothing exciting to see here. Data comes in, it gets stored somewhere (S3), that triggers an event (SNS), a compute service (Lambda in this case, but could be anything, even a standard VM, bare metal or anything else) picks up the task and processes it or performs a job on the data and stores it, it triggers another event, something else picks it up, and so forth. This isn't an AWS design, it's just an event-driven architecture design and this is how they work.
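
To make the pattern concrete, here is a toy in-memory version of that store, notify, process loop. Everything in it is a stand-in: the `storage` dict plays the role of object storage, `subscribers` plays the role of the topic service, and the two handler functions are hypothetical processing steps, not anything from the actual project:

```python
from collections import defaultdict

subscribers = defaultdict(list)  # topic -> handlers (stand-in for a topic service)
storage = {}                     # key -> data (stand-in for object storage)


def subscribe(topic, handler):
    subscribers[topic].append(handler)


def publish(topic, key):
    # Deliver the event to every handler registered on the topic.
    for handler in subscribers[topic]:
        handler(key)


def put(topic, key, data):
    """Store data, then announce that it exists."""
    storage[key] = data
    publish(topic, key)


def process_raw(key):
    """First compute step: read, clean, store, which emits the next event."""
    cleaned = [value for value in storage[key] if value is not None]
    put("processed", key.replace("raw/", "clean/"), cleaned)


def summarize(key):
    """Second compute step: reacts to the event emitted by the first."""
    values = storage[key]
    storage[key.replace("clean/", "summary/")] = sum(values) / len(values)


subscribe("raw", process_raw)
subscribe("processed", summarize)

put("raw", "raw/temps", [10.0, None, 20.0])
print(storage["summary/temps"])  # 15.0
```

Neither handler knows about the other; they only share storage keys and topic names, which is what lets each piece be swapped out independently.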

SES can be used standalone. It doesn't require Lambda or S3 like you postulate. There are only a few times AWS requires something else and it's usually CloudWatch or S3, and these will sometimes be the destinations required for specific types of logging or auditing and so forth.

AWS is forcing you to do nothing here. The creator chose this stuff. But I assume they chose it to keep it free. Most of this will survive under the free tier until the project becomes massive. If it inches over the free tier, it will still be cheap. That's probably the incentive for a lot of this.

> You're saying that I cannot move to other cloud provider without my existing code becoming useless?

Correct, your code is your code. Think about a lambda. In this scenario data comes from NOAA and is put in a bucket. You write a serverless function that takes that data out of the bucket, reformats the NOAA data into your proprietary format and puts it in another bucket. The code that does that is written in Go, Python, C++, Java, or whatever you want. If written correctly, it accepts data, processes it, and outputs it. So if you move to another cloud provider your code would still work. It might run in an Azure Function instead of an AWS Lambda, but that doesn't matter. Your code does the same thing; you don't need to throw it out.
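
A minimal sketch of what "written correctly" could look like: the logic lives in a plain function and each platform only gets a thin adapter around it. The field names `id` and `temp_f` and the function names are made up for illustration and are not the project's real data format:

```python
import json


def reformat_record(raw: dict) -> dict:
    """Pure business logic: no cloud imports, no provider-specific types."""
    # Hypothetical input field names; real source data will look different.
    return {
        "station": raw["id"],
        "temp_c": round((raw["temp_f"] - 32) * 5 / 9, 1),
    }


def lambda_handler(event, context=None):
    """Thin AWS-style adapter: unpack the event, call the logic, pack the result."""
    return {"statusCode": 200, "body": json.dumps(reformat_record(event))}


def generic_http_handler(body: str) -> str:
    """Equivalent adapter for any platform that passes a JSON string in and out."""
    return json.dumps(reformat_record(json.loads(body)))
```

Moving providers then means rewriting the few lines of adapter, while `reformat_record` stays untouched.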

> What about the data pipelines and business logic?

Hard to say without more information. But it's possible BI dashboards need to be changed to point to the new service and stuff like that. Sure. Again, you're not switching clouds in an afternoon. But the point is the next cloud system is eerily similar. It's not like you have to rebuild, it would be more of a refactor.

> There is: avoiding vendor-specific APIs altogether

Possibly. But if built correctly your code doesn't need to be aware of the environment it is in. But there might be cases where you interact with the services directly, like downloading from S3. This would change with the next provider possibly (although most actually have S3-compatible APIs). But most of your application will not directly interact with the cloud, it will interact with the services. So for example you use RDS to host a managed Postgres db, but your application is just interacting with postgres, not AWS here. But you're right there might be some scenarios that use vendor-specific APIs.
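
One way to keep those direct storage calls contained is to put them behind a small interface, sketched below. `BlobStore`, `LocalStore` and `uppercase_copy` are hypothetical names; only a local file system backend is shown, and an S3 or other object-storage backend is assumed to be implementable with the same two methods:

```python
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """The only storage surface the application is allowed to see."""

    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class LocalStore:
    """File-system backend; a cloud backend would expose the same two methods."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


def uppercase_copy(store: BlobStore, src: str, dst: str) -> None:
    """Application code: depends only on the BlobStore interface."""
    store.put(dst, store.get(src).upper())
```

With this shape, the vendor-specific API surface is limited to one small class per provider.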

> I don't have to. There are things like Railway, Fly.io, PlanetScale, Supabase, Upstash, Minio, which can work without locking me in

I fail to see how these are any different than tying yourself to a product like S3 or Lambda. In many ways, these solutions are TRUE vendor lock-in, with all the vendor-specific APIs that you live in fear of. Fly.io is a PaaS, which is going to be way harder to move away from than switching from AWS to Azure. PlanetScale, Minio, and Upstash are literally no different than equivalent products in AWS/GCP/Azure. I guess you could host the instance that runs these products on different clouds and it would be the same, but you're still tying yourself to something. The risk of tying yourself to a startup is higher than tying yourself to Amazon/Microsoft/Google. You're trading one evil for another; in most ways you are actually losing freedom with these, not gaining it.

> You are contradicting yourself

My point is that there isn't as much vendor lock-in as people fear. Yes it exists, but don't live your life in fear of it. Yes you would need to refactor stuff here and there. But the same architecture diagram we saw for AWS is basically the same one that would exist in Azure or GCP. The underlying tools don't change. The marketing names and logos change, which clearly bothers you, but the underlying system doesn't change.

You clearly know way more about cloud implementations than I do, so I really appreciate the time you took to explain that out! Since I am the dev here though, the one thing I can confirm is that you're 100% correct about the setup methodology: almost every decision was based on "how can I do this in a way that's cost effective". In particular, the underlying data was already on AWS, so it just made sense to build it there.

I think one thing that gets lost in discussion is the advantages of serverless approaches for people without a ton of technical background. I built 90% of this without knowing anything about servers or APIs, but the cloud tools (from whoever) let me ignore all of that and just write a bunch of Python scripts that do cool things. I know it ends up sounding a bit like an AWS ad (I wish I was sponsored by them, but am not), but there really are perks to the approach.