Hacker News
by itsmemattchung 1107 days ago
> When looking at the cloud resources, we noticed many On-Demand EC2 instances with relatively low CPU utilization, which can be expected considering they don't have customers yet.

As a software consultant myself, I'd probably stop the conversation right there and ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers. Still want to be deployed in AWS? Toss the damn app on a single EC2 instance...

16 comments

I’ve been exploring this lately because, honestly, the cloud is total overkill for small startups and hobby projects.

Kubernetes has its value even for small scale workloads like that, but it’s still a few steps more than, say, running a Capistrano script to push your code to a small Linux box with a database on a second one.

You’ll get really far on minimal resources these days, especially with cheaper ARM boxes that offer far more bang for your buck. Paying 1k+ a month to AWS/GCP/Azure is total insanity when you’re not even averaging a single active user a day.

At the beginning, just for the development experience I would just put an instance in some cloud provider and use microk8s or k3s to serve the app. It's very straightforward and then you can move to a managed service if needed. You will probably be using the same tooling and integrations at different steps. Context switching is low and you can reproduce locally. I'm down for serverless options when needed but I have a strong preference for local development.
> …the cloud is total overkill for small startups and hobby projects.

It absolutely can be, sure. But solutions like Vercel, Cloudflare Workers, Supabase, etc. can be excellent and inexpensive for those use cases.

And surely a vivid tech stack does more to make you look good in front of VCs than an overkill architecture does to make you look incompetent.
IME the investor cares more about Traction than Technology.
IDK, I remember seeing a tweet from Paul Graham saying that any new startup should use Typescript (I guess instead of js) so there might be some rules of thumb that some investors follow.
I consider them separate from the cloud on the basis they’re offering a platform as a service that just happens to re-sell cloud resources.

If you tried to replicate them on the same cloud provider, end to end, it would cost far more than they charge.

Vercel is a lot of things but I don't think I've ever seen it referred to as "inexpensive".
So far, for my hobby projects it's basically free.
This is exactly how the serverless guys "get you": Low traffic is nearly free but you pay for it on the slope of the scale cost ramp.
Sure, but aren't all cloud services notoriously expensive as you scale? At some point I assume you'd do what companies like Dropbox and Basecamp did, and re-host some or all of it.
It’s not the first time I’ve written about this. The hyperscalers are pretty much the most expensive way to build a business that isn’t presently hyperscale, and their ecosystems are increasingly optimized for sprawling stacks built on a virtually unlimited number of microservices.

That’s just not a realistic or necessary approach for everyone.

AWS is engineered for excruciatingly detailed billing right down to the moment you’re consuming or releasing capacity, and that’s how they built it. Managing that spend is exhausting.

My business runs on under $200/mo in Linode compute resources and the performance is significantly better than on similarly situated EC2 instances. We were spending that on databases alone with AWS and getting a fraction of the performance.

I make extensive use of “pure” Linode Kubernetes Engine k8s. It’s portable to any other Kubernetes cluster, and it lets me take my stack _anywhere_, even to a rack in the nearest data center willing to rent me space, if I really wanted.

With so many developers I feel that there is a complete lack of familiarity with what it takes to just run a website. So many came up in the land of cloud and k8s, etc. There are use cases for these more advanced production environments. But if more developers just learned how to make a website on Linux, with a db, a webserver, and an application, they would know that a lot of the more complex things just aren't needed... especially when you don't even have customers.
Truly, a very small number of real servers, just enough for blue/green deployments and so you can stay up if any one server goes offline, meets any plausible needs for a really, really high percentage of businesses & products. A ton of early-stage ones can get away with skipping most of that and just run on one or two servers, period, for quite a while.

If you're outsourcing operations to AWS or whomever, a couple largish instances and a couple supporting services can get you pretty much that same thing, for a bit more money and a bit less control over performance-consistency.

All that HA/scaling/clustering/cloud stuff is expensive, not just in monetary terms, but in performance terms. If you don't actually need it, a high percentage of your compute & (especially) your network traffic may be going to that, rather than actually serving the product. It also adds a hell of a lot of complexity, which comes at a significant time-cost for development, unless you want your defect rate to shoot up.

> But if more developers just learned how to make a website on linux, with a db, a webserver, and an application.

And hell, nothing's stopping you from writing 12-factor apps and deploying containers, and scripting your server set-up and config, even if you don't go straight for heavy, "scalable" architecture. Even if your server's a beige Linux box in a closet. Enough benefits that the effort's probably a wash at worst (hey, documentation you can execute is the best documentation!) even if you never need to switch architectures, and then you'll have a relatively easy time of it, if you do end up needing to.
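To make the 12-factor bit concrete: the part that pays off even on a single box is keeping config in the environment instead of in the code. A minimal sketch, assuming made-up variable names (`DATABASE_URL`, `PORT`, `DEBUG`) and made-up defaults; none of these come from the thread.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool


def load_settings(env=os.environ) -> Settings:
    """Read config from environment variables (names and defaults are hypothetical)."""
    return Settings(
        # Default to a local SQLite file so a fresh checkout runs on one box.
        database_url=env.get("DATABASE_URL", "sqlite:///app.db"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "0") == "1",
    )
```

The same code then runs on the closet server and in a container; only the environment changes if you ever do switch architectures.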

> just run on one or two servers, period, for quite a while

famously, StackOverflow

i had a client who was burning… $10k? maybe $20k per month largely on nodes for EKS when they had no paying customers and ~zero load. (they had fully “production” sized clusters in all of their environments, and they had a slew of weird not-quite-prod environments.)

they also had some rabbitmq-on-k8s system going that fell over during small tests because they couldn’t get k8s to actually scale it. (which then convinced them they needed k8s, and bigger nodes)

sigh

The promise of cloud infrastructure is that it can scale to fit demand — start small, and grow as needed. But sometimes the truth is that it just lets people spend money more easily (:

Back in the day, it would have required a whole procedure to buy that hardware, have it set up, etc. Now you can needlessly spend $10k per month with just a few clicks!

This is one reason I like serverless. It works for a bunch of cases when you can wrap your head around it, and cost can scale linearly with your growth.

At some point, it might make sense to move off for cost reductions, but tools like GCP Cloudrun (deploy dockerized app servers that scale dramatically better than k8s) can be really nice for a small team.

And in that case, why ec2, why not a more affordable provider?
Because I already have an AWS account that bills directly to my credit card along with some other stuff that I'm already paying for. Every time I go down the "let me save money" route, I spend hours reading through CD website reviews for hosting providers without any real understanding of their quality, all to save a few dollars, and end up burning tens of hours of time. Or I could just fire the fucking thing up on AWS and then turn it off if I decide not to work on the project further.
Who would you recommend as a more affordable provider?
I use digital ocean for simple projects. It’s not bad
There's a lot of expertise in AWS-land.
A DevOps junior should be able to start a VM just about anywhere, and without specialized experience.

AWS/GCP/Azure knowledge definitely helps when deploying there, but it's also not really necessary to get something running.

There are a lot of Salesforce consultants too, but I'd still advise looking at other solutions.
The OP here, thanks for your comment.

To be honest I wasn't hired to challenge their entire setup, only to make it more cost effective.

So I chose the most straightforward way I could think of that would allow us to come up with a cost effective setup that will be scalable, fault tolerant and simple to maintain later on.

It all probably started with such a single instance running Docker compose, but then over time it evolved into this setup.

The ideal setup I mentioned would have been also cost effective, scalable and resilient.

I recently spoke with some folks who declined to invest because our solution was too simple: specifically, the fact that we don't use Kubernetes was a negative signal.

That's baffling to me, but that perspective is out there too.

>ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers

I think this is one of those things that really depends on the use case. If they are performing expensive inference, I think having any queue is better than no queue. Going from a synchronous system to an asynchronous one is not easy and it's not something you would want anyone to be paged for once it starts to matter. Getting SQS/SNS up and running now could be a couple hours of work today and is practically free if your traffic is low.

Similarly I have a number of side projects that run extremely cheaply just using ECS and Fargate. I don't even think about Kubernetes really, it's just a PaaS to me that I'm shipping ARM binaries to. As a result I don't think very hard about autoscaling, failover, load balancing or deployment. A github action just pushes master to ec2 and everything "just works".

What does SQS have to do with EC2?

One is a queuing service, the other one is a VM.

So instead of using SQS that has $0 cost when there are no customers, you suggest I install, configure and run RabbitMQ on an EC2, to save $0 when there are no customers?

Or save $1 when I have 100 customers? SQS is dirt cheap.

The point of SQS or any other usage-based AWS _developer_ service compared to DIY is that you can be up and running in minutes at a minuscule cost.

I agree with you about over-engineering and building a distributed "microservices" architecture when you have no customers.

But I'll pick SQS any time of the day when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.
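For anyone who wants to sanity-check the "$0 with no customers" arithmetic, here is a back-of-the-envelope sketch. The numbers are assumptions from memory, not from the article: a free tier of 1M requests per month, roughly $0.40 per million standard-queue requests after that, and three requests per message (send, receive, delete). Check the current pricing page before trusting any of it.

```python
# Assumed prices; verify against the current SQS pricing page.
FREE_REQUESTS_PER_MONTH = 1_000_000
USD_PER_MILLION_REQUESTS = 0.40
# Assumption: each message costs at least a send, a receive and a delete.
# Empty polls are also billed as requests and are ignored here.
REQUESTS_PER_MESSAGE = 3


def sqs_monthly_cost(customers: int, messages_per_customer: int) -> float:
    """Rough monthly SQS bill in USD under the assumptions above."""
    requests = customers * messages_per_customer * REQUESTS_PER_MESSAGE
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * USD_PER_MILLION_REQUESTS
```

Under these assumptions, 100 customers sending 1,000 messages each stay inside the free tier, and 100 customers sending 10,000 messages each come to well under a dollar.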

The AMQP protocol alone and its various, good client libraries (compared to terrible AWS SDK which is a very thin abstraction over just sending/parsing raw JSON off the wire) is by itself enough to justify RabbitMQ.

> when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.

SQS still requires configuration, which means you either need to use the (terrible) AWS console UI or spin up a whole Terraform/CloudFormation/CDK/etc stack, not to mention that merely connecting to it requires correctly setting up AWS IAM (so you don't use a key that gives access to your entire AWS account). Vim'ing the RabbitMQ config file in contrast doesn't seem so bad, and even just using a static hardcoded password means the worst an attacker can do is take down your queue instead of taking over your entire cloud infra.

The question is: what are you queueing for zero customers?
You might as well ask “why use a database when you have no customers?”
If I'm building a marketing automation app that allows customers to do a newsletter blast, I'll put those 1000 email recipients into a queue and run through it at a required pace with a retry interval if anything fails.

What do you suggest I do before I get my first customer?

- Blast 1000 emails in one go and pray upstream accepts it?

- Push these to a database and keep checking it with a CRON?

- Run RabbitMQ on an EC2 and push 1000 messages there?

- Implement SQS in "15 minutes" at $0 cost?
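For comparison, the "database plus cron" option from the list is not much code either. A rough sketch using only the standard library: the `outbox` table, its columns, and the `send` callback are all hypothetical, and a real mailer would want locking if more than one worker drains the table.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    id INTEGER PRIMARY KEY,
    recipient TEXT NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0,
    next_try REAL NOT NULL DEFAULT 0,
    done INTEGER NOT NULL DEFAULT 0
)
"""


def enqueue(db, recipients):
    """Add one pending row per recipient."""
    db.executemany("INSERT INTO outbox (recipient) VALUES (?)",
                   [(r,) for r in recipients])
    db.commit()


def drain(db, send, batch_size=50, retry_after=60.0, max_attempts=5, now=None):
    """Send one batch of due rows; call this periodically (e.g. from cron).

    batch_size sets the pace, retry_after the retry interval in seconds.
    Returns the number of messages sent successfully in this run.
    """
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT id, recipient FROM outbox "
        "WHERE done = 0 AND attempts < ? AND next_try <= ? "
        "ORDER BY id LIMIT ?",
        (max_attempts, now, batch_size),
    ).fetchall()
    sent = 0
    for row_id, recipient in rows:
        try:
            send(recipient)
        except Exception:
            # Leave the row pending and push its next attempt into the future.
            db.execute(
                "UPDATE outbox SET attempts = attempts + 1, next_try = ? WHERE id = ?",
                (now + retry_after, row_id),
            )
        else:
            db.execute("UPDATE outbox SET done = 1 WHERE id = ?", (row_id,))
            sent += 1
    db.commit()
    return sent
```

Whether this beats SQS is exactly the argument in this thread; the sketch only shows that the pacing and retry requirements are met by either approach.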

A single EC2 instance with SQLite as the database can get you pretty far.
Yeah that's a good starting point. Maybe just docker on those when you have two apps so they don't step on each other.
Exactly. Worry about scaling when scaling is on the horizon.
No no no. We want to be like Google. Web Scale. Big big data. Huuge
It is rather amusing how over engineered most seed projects have a tendency to be.

I do think ddb and lambda hit a sweet spot for costs on ramping up. The rest, though, really struggle.

For me, setting up connections between SQS, SNS, DDB, Lambda, Step Functions, S3, Route53, API Gateway in CloudFormation is just muscle memory. I’m much faster at it at this point than I am at standing up an EC2. I agree it can be hard to learn, but it certainly isn’t hard to do.

Elsewhere in the comments, there’s a suggestion that this kind of thing isn’t appropriate for “hobby projects” and early stage but I disagree. Those are the times when you really want something you can step away from without doing a disservice to your customers (i.e. letting packages go out of date and get vulnerable) and cost you as little as possible in a steady state so you can focus on acquiring customers and not worrying about fuddling around with the guts.

Your muscles must be tuned to enormous amounts of IAM-fu ;-)
Indeed. One of the hard things to figure out is keeping the number of roles small while avoiding stars (IAM ain’t GitHub).
Yes. Stars should be removed frankly. The fact they admit new actions without any review or awareness alone is scary.

However IAM isn’t really for humans. It is just really hard to reason about roles programmatically. Some of the new minimal rights discovery from cloud trail analysis leads to an interesting pattern I’ve not seen a lot of : in lower environments permissions are wide open, but a capture of the required roles happens pre-prod and is used and tested against in preprod then promoted to production. This seems like a really useful pattern, and it exposes where your integration tests are incomplete.
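A toy version of that capture-then-promote pattern, in case it helps. It assumes you already have a list of `(action, resource_arn)` pairs observed in a lower environment; how you extract them from CloudTrail is not shown, and the input shape and function names are made up. Matching is exact-string only, which is nowhere near real IAM evaluation.

```python
from collections import defaultdict


def build_policy(observed):
    """Build a wildcard-free policy document from observed (action, resource) pairs."""
    by_resource = defaultdict(set)
    for action, resource in observed:
        by_resource[resource].add(action)
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


def uncovered(policy, calls):
    """Return calls seen in pre-prod that the captured policy does not allow.

    Exact match only: no wildcard, condition or deny handling.
    """
    allowed = {
        (action, statement["Resource"])
        for statement in policy["Statement"]
        for action in statement["Action"]
    }
    return sorted(set(calls) - allowed)
```

Anything `uncovered` returns in pre-prod is either a missing permission or a code path your lower-environment tests never exercised.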

A single EC2 instance is an equally bad trade-off on the opposite side of the spectrum from over architected SQS, SNS, etc…

The ideal trade-off is a single Kubernetes cluster with as much in the cluster as makes sense for the team and stage of the project. As you say, toss the app on a single node to start, but the control plane is tremendously valuable from the onset of most projects.

I don’t see the reasoning.

A startup that outgrows an EC2 server will be making enough money to hire more people to scale the system properly, beyond what was initially designed. Until then, trade away everything for development velocity.

Kubernetes is not the right tool for this startup. Kubernetes is what large, old-school non-tech companies use to orchestrate resources, because it’s easier to find someone that “knows k8s” (no one knows k8s unless they’re consulting) than it is to find someone that can build properly distributed systems (in the eyes of whoever is in charge of hiring).

Most startups are at least going to want to be able to deploy, scale up or down, and restart an app without downtime. I wouldn't say that's overkill.

While it's not impossible to do with a single instance, you can spend a lot of time shaving that yak. It's reasonable to pay a bit more to have that stuff handled for you in a robust way.

These reasons relate to deployment, but there's also lots of value in the security aspects of the control plane.

  * automatic service account for each workload
  * automatic service to service auth to 3rd party services
  * the audit log
  * role based access control
  * well defined api
  * the explain subcommand
  * liveness and readiness probes
  * custom resources
The list goes on, but the big ones for a small team just getting started are workload identity and security.
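On the probes item: the app side of liveness and readiness is small. A minimal sketch using only the standard library; the `/healthz` and `/readyz` paths are a common convention rather than anything Kubernetes requires, and you would still have to point the probe config at them yourself.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set this once startup work (migrations, cache warm-up, ...) has finished.
ready = threading.Event()


def probe_status(path: str, is_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer at all.
        return 200
    if path == "/readyz":
        # Readiness: only accept traffic once startup has finished.
        return 200 if is_ready else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, ready.is_set()))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocking helper; call it from your own entry point."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

The same two endpoints work behind a plain load balancer health check, so this is not Kubernetes-specific.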
Is that right?

K8S is basically another answer to Conway’s Law. Every startup I’ve worked at switched to it because then the infrastructure could map more closely to the code. Not unlike microservices at a higher level.

The old-skool approach is depending on a team of SREs or sysadmins to provision hardware for you and basically handle the deployment, which K8S plus container images basically abstract away.

Not to say that dedicating resources to platform development (k8s style) isn’t a time sink when you’re trying to build product and find a fit in the market.

In my experience, giving code preferential treatment is how you end up with complexity lunacy; so I’ll add an addendum to Conway’s Law:

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure — and which mirrors the skills of its key creators.”

K8s is designed to solve Google problems. Your startup will not have Google problems. Your startup will have Pinterest problems, or Gitlab problems, or Reddit problems, at which point you do not need K8s; you need someone who knows infra (which I expect devs working on distributed systems to understand).

Using K8s in a startup context is a sign of conformist thinking, detached from any critical aspect.

> The old-skool approach is depending on a team of SREs or sysadmins to provision hardware for you

This assumes that K8s won't require a "team of SREs". My experience is you need the same amount of SREs to maintain Kubernetes, probably more, because now you have a complicated control plane, a networking nightmare, then you layer that on top of resource-contention issues, security issues, cloud provider compatibility issues, buggy controllers, the list goes on.

The only thing K8s is great for is the maintainers, the consultants, and highly experienced SREs that inevitably have to be hired to clean up the mess that was created. This is my experience working in two similar sized environments, one with >1M containers, and another with an equivalent scale of bare metal servers.

> then you layer that on top of resource-contention issues, security issues, and the list goes on.

applications running on bare metal don't have resource contention issues? or security issues?

Conway's law is about mapping teams to code+infrastructure (generally: areas of responsibility), not about mapping code to infrastructure. It's about people and politics.

You're right that K8S is an answer to Conway's Law: our people don't get along or can't collaborate or we have too many of them, so we will split them into team per service and force them to collaborate over network interfaces. Likewise, the infrastructure people will communicate with the other teams using Dockerfiles.

Why would you plan not to have customers? Don't you think the company is able to forecast demand for a new product launch?

Disney: We'd like to launch a new streaming service.

Consultant: Great! You have no customers right now so you can run it on a singleton EC2 instance until you outgrow that scale!

Disney: ...We expect 20 million people to sign up in the first week

> Don't you think the company is able to forecast demand for a new product launch?

I'm pretty sure "follow the forecast" is exactly what motivated that post.

In other words, the infrastructure is overkill for the initial forecast of customers.

They're not working for Disney.

It wasn't really that the infrastructure was overkill, it was that scalable choices weren't made in the first place.

Remember, the comment I replied to said:

> As a software consultant myself, I'd probably stop the conversation right there and ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers. Still want to be deployed in AWS? Toss the damn app on a single EC2 instance...

But in the article, it's pointed out that SQS and SNS would have been better choices at lower costs for low usage:

> When it comes to the application, if I had been involved from scratch, I would have recommended SQS and/or SNS for the message bus, which are free of charge at low utilization.

Basically, this company is in a pickle because they didn't have architecture experts from the beginning, and the development team started writing an application without much thought to areas where SRE and DevOps teams often get involved: scaling and cost optimization.

Which is another way to say that most startups seem to wait too long to hire DevOps/SRE teams because they are roles considered to be "cost centers:" work that is not directly contributing to the money-making business logic.

SQS and SNS are perfectly good primitives for building a robust distributed system that costs $0 when not in use, by triggering compute via Lambda or Batch.

Your comment is really pretty ignorant of how these tools interact. Using serverless primitives is the opposite of leaving nodes running for no reason.
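For readers who haven't wired this up, the Lambda side of "SQS triggers compute" is roughly the following. It's a sketch, not anyone's production code: `process` and the `recipient` field are hypothetical, and returning `batchItemFailures` only has an effect if the event source mapping is configured to report batch item failures.

```python
import json


def process(message: dict) -> None:
    """Hypothetical business logic; raises on messages it cannot handle."""
    if "recipient" not in message:
        raise ValueError("missing recipient")


def handler(event, context):
    """Lambda entry point for an SQS-triggered function.

    Nothing runs (and nothing is billed for compute) until messages arrive.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so SQS redelivers it alone.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```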

This is 100% the line of questioning to pursue.