Ask HN: How would you, a solo dev/small team, make a scalable web app in 2022?
15 points by star_juice 1553 days ago
Title says it all, how would you make a scalable web application in 2022 as a solo/small team of devs? To limit the problem space, how about something that can handle CRUD operations for an internet T-shirt store getting 1000-100000 simultaneous visitors? I'm certainly open to learning clever ways to architect a mini-Youtube if that's somehow possible though.

Perhaps to make advice easier, I've played around with things like Firebase and Meteor, did a kubernetes tutorial or two, deployed small toy apps on AWS/GCP, as well as have some familiarity with ReactJS/HTML/CSS (I'm trying to get into gatsby but we'll leave that aside), Docker, and the Django/DRF framework. The issue is I don't really know how to integrate them into a "scalable" or "highly available" web application, or even a methodology for comparing scalability for one approach vs another. I also understand CI/CD pipelines are a part of the typical microservices architecture, but would they even make sense for a solo dev vs trying to get a monolith running? Given "scalable" can mean a lot of different things, I'm entirely fine with more open ended answers about focusing on things that will matter more than sheer connections handled and dynamic content served, so please don't feel limited to answering in as few steps as possible if you think there are other insights about better practices/design patterns that should be mentioned.

So for extra context: I'll have a bit of free time in the next couple months to just sketch out ideas and try things (probably the last time for awhile), but I'm genuinely curious how the real software development practitioners would approach an open ended task like this. Unfortunately they weren't exactly offering a class on how to make scalable web apps (probably because there's thousands of different ways) during my pretty conventional college CS program, ya know?

11 comments

Going from 0 users to 1 user is the (first) hardest part of scaling a web app. Optimize your early architecture for speed of change, not for speed of serving web pages. You will need to make lots and lots of changes to get to 1 user, and then to 10 users, and so on.

So use the tools and platforms that you already know, that you can hire for, and that will allow you to try out changes rapidly. Do everything you can to keep things as simple as possible for as long as you can -- the more architectural pieces you add now, the more you will have to change later.

So, as I mentioned in the other reply I'm not really trying to start a business here I just want to understand more about the processes and tools underpinning these vast scalable web apps that surround us. But this advice sounds pretty reasonable relative to how one would actually go about creating a small tech startup and iterating on it enough to get into YCombinator (for example).

Given it doesn't sound like you're gonna share the info about scalable infrastructure, could you possibly provide some guides or extra reading for the patterns and practices one should be going with at the small and feisty stage? I might as well take some notes down about this side of things if you have materials that speak to it.

The advice given by the original commenter is how you should start any application, whether it's one that's going to be used internally at a company, the main product at a startup, or a single app among a suite of existing applications at an established company.

Scaling is a reward for building something useful. Building the useful thing is harder.

Some generalities though, try to organize your application in such a fashion that data that is often needed together is stored close together, geographically, and try to shard (i.e. separate out) your data based on some identifier that can be sliced into many small pieces.
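To make the sharding suggestion a bit more concrete, here is a minimal Python sketch. The shard count, the choice of a customer id as the shard key, and the in-memory dicts standing in for separate databases are all assumptions for illustration, not something from the comment above.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count, purely illustrative

def shard_for(customer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an identifier to a shard index using a stable hash."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each dict stands in for one database shard (hypothetical).
shards = [dict() for _ in range(NUM_SHARDS)]

def save_order(customer_id: str, order: dict) -> int:
    """Store an order on its customer's shard so related data stays together."""
    idx = shard_for(customer_id)
    shards[idx].setdefault(customer_id, []).append(order)
    return idx

def orders_for(customer_id: str) -> list:
    """Read all of a customer's orders from the single shard that holds them."""
    return shards[shard_for(customer_id)].get(customer_id, [])
```

Keying both writes and reads on the same identifier means one customer's data never has to be gathered from several shards. A plain modulo like this makes changing the shard count painful, which is one reason real systems often reach for something like consistent hashing instead.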

Given I have zero intention of making it any of those things, and this question was aimed at specifically learning more about the technical underpinnings of the rewarding part, I'm not really sure it's the right advice for what I'm doing. You will note that I indeed recognize it as being good advice in general though.

The generalizable advice about data co-location/data sharding is definitely something I will keep in mind (if this weird learner project really involves data in such quantities) however, thanks!

I can give you examples of scalability benchmarks from gaming and extrapolate from there why it's such a moving target to pin down what makes systems fast and scalable.

Now, the general measure of technical proficiency in game engines is in how detailed your scenes are and how fast they render. If you are rendering empty space then it's quite easy to blow up the scale by making the numbers big, and this is how early space games like the original Elite operate; a simple scene defined by a few numbers and some procedural generation can be made into the whole galaxy by repeating that scene with a different seed number. It is taking advantage of the saying "it's easy to get a wrong answer infinitely fast" by defining wrong answers to be right.

So we have to look at what's actually being processed to simulate and render the scene to understand scaling. And right away that should trigger something in your head about applications: if they have fewer features, their processing is simpler, so they scale more readily. Scaling problems are produced by feature complexity creating bottlenecks that can't be optimized by rote. And in most cases, we would rather have our apps produce right answers slowly than wrong ones quickly, hence the product design is a critical part to optimization: if we know our design will never need a certain feature, that's the place where we can optimize it.

From there, you can dig into the nuts and bolts of defining what kind of performance envelope you expect to have: so in games you might use a target frame rate, polygon count, texture memory, and the number of live AIs and entities. But as you build out the game the numbers start moving around because you're still adding features: when you add detailed animations with a lot of bones you spend some more of your CPU budget to deform the model. Every shader effect could have a GPU time cost. When you add audio and audio processing you have to allocate some memory and CPU time to the playback and effects. If you want to continuously stream in a scene (as is done in open-world games) you have to consider the rate and latency at which you can load it off persistent media, which leads to various different strategies. So you don't know at the beginning quite what you need. Instead you try to set general targets for what you'll try to hit, stand up a test scene with similar numbers, and then iterate on them later as you get more developed, with more features, fleshed out scenes and final assets.

On early cartridge platforms streaming was generally done off ROM with bank switching, which made it nearly instant: NES Zelda 2 does it up to hundreds of times on the overworld screen, because it was given a rough port from Famicom Disk System, which had more working RAM. This causes slowdowns in some parts of the map.

Games on CDs and DVDs had a huge capacity but limited bandwidth and high latency: this meant that the strategy to get the most out of them involved physically locating the data in places where the drive head would seek quickly, and then linearizing the data so that it didn't have to stop and start. Which meant that some data would have multiple copies for different scenes.

Modern gaming on SSDs changes the paradigm again, back towards lower latency accesses bolstered by hardware decompression: that allows the games on new consoles to eliminate loading screens.

Now, in a web app you can encounter a similar kind of thing with your database accesses and frontends. Some applications need to write very frequently, others need to read a lot. The distribution of reads and writes can vary (e.g. hosting one very popular video versus a sprawling e-commerce platform). These things determine where scaling needs to take place. But if you have no real users, you have an "empty space" scene where the bottlenecks aren't present because there's nothing to do - you can guess, but even the best guesses tend to be wrong when a site starts getting serious traction. Will you be able to batch things up like a DVD access? Will you need global state, like a social network does, or is the state just limited to the user session? You don't really know what it'll look like until the features go in and you can start profiling against the real-world samples.

It's not that anyone is trying to hide the secrets - it's just that scaling is a speciality you only end up possessing through the direct experience of trying to get a little more out of the architecture you have; the specific thing you learned may not apply if your next project has a different performance profile and different hardware.

In the meantime, the next best thing would be to take large existing datasets, construct synthetic benchmarks out of those, and then have fun optimizing them. Stuff like "how fast can I load this enormous CSV, do trivial processing, then store the result".
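A rough sketch of what such a "load a CSV, do trivial processing, store the result" benchmark could look like in Python. The generated data here is a stand-in (an assumption) for a real large dataset, and the column names are made up.

```python
import csv
import io
import time

def make_csv(rows: int) -> str:
    """Generate a synthetic CSV; a stand-in for a real downloaded dataset."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "price_cents", "quantity"])
    for i in range(rows):
        writer.writerow([i, 1000 + i % 500, 1 + i % 3])
    return buf.getvalue()

def process(csv_text: str) -> str:
    """Trivial processing: compute a total per row and write a new CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "total_cents"])
    for row in reader:
        total = int(row["price_cents"]) * int(row["quantity"])
        writer.writerow([row["id"], total])
    return out.getvalue()

def benchmark(rows: int) -> float:
    """Return the seconds spent in process() for a dataset of the given size."""
    data = make_csv(rows)
    start = time.perf_counter()
    process(data)
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (10_000, 100_000):
        print(f"{n} rows: {benchmark(n):.3f}s")
```

From here the fun part is changing one thing at a time (streaming from disk, batching writes, using several processes) and watching which change actually moves the number.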

Alright, you win: this answer is fantastic. This is a far, far better way to think about what limits scalability than simple things like pages served per millisecond. I never did expect to find ready answers that will guide me to making the universally scalable app, but now I see the problem can be reduced even further into niche sorts of scaling which, while solvable with hardware tricks, do their very best to escape generalization.
Here are three books you might find interesting:

* The Pragmatic Programmer, by Dave Thomas & Andy Hunt

* Scalability Rules, by Marty Abbott & Michael Fisher

* Release It!, by Michael Nygard

Thanks! Do they include coding examples or is it advised you think of your own wrt the material?
A sensible plan shouldn't mention both "100000 simultaneous visitors" and "I'll have a bit of free time in the next couple months to just sketch out ideas".

100000 simultaneous visitors is a huge number that implies both progressive growth over a long time, without wasting resources in premature, oversized and overcomplicated infrastructure, and a solid business plan (in fact, an exceptional one) behind that growth, making products and services more important and more sophisticated than generic web application scalability.

On the other hand, two months of experimentation can allow you to learn a lot about highly scalable web application architectures and related technology, but building a real one requires an actual business; the most you can do on your own is reimplementing some benchmark.

That's why I put the range starting at 1000 simultaneous users funny enough, but I also wasn't sure if 100k was comically low (I suppose if you just wanted to serve static pages with a jpg and one line of text it might be) or high so left it in. That's already some insight to work from, but you may be mistaking me wanting to serve 100k simultaneous T shirt buyers with an actual expectation of needing that much margin: I don't, but I do want to know how/why the back of store infrastructure for such a large number of users works, and more about what a solo developer/small team would realistically want to choose instead.

To be clear I am not trying to create a business, I'm wondering what the ecosystem of best practices, tools, and frameworks to solve an open ended problem like this is so I can try them out before trying to build a real one (which I think you're right, would require an actual business to justify its existence). For example: could you elaborate on some possible benchmarks to try reimplementing, to this end?

> but I'm genuinely curious how the real software development practitioners would approach an open ended task like this

I just panic and procrastinate. Some of us never grow out of that.

But that aside, pick one of the mature traditional "batteries included" web frameworks to get started. It's never going to be the ideal architecture, but it's not going to be an awful one either. Really learn how to use it, and keep the code quality high from the start. Too many projects I've worked on have ended up in a deep hole because people decided to try a whole new tech stack in production without actually learning how to use it first. Learning by doing is great and all, but there will always be a pressure to "just ship it" instead of reworking all the beginner mistakes that will eventually come back and bite you down the line.

Technical debt is a made up problem and cannot hurt you :,(

All of this is entirely fair and worth considering, given pretty much all mature frameworks do tend to have some extension or build out option to make them "scalable" as far as I can tell. Now to choose one that's not Django, which I've heard is a bit of a nightmare for this sort of building out (but maybe more because of the problems you mentioned than anything wrong with the framework itself?)

I think any framework that has been deployed at scale is going to have people saying it's a nightmare. But very few people have deployed comparable applications in different frameworks at a comparable scale. With only one data point you can't really draw any conclusions.

It's a different matter if you can argue from a specific technical feature of a framework that makes it unsuitable. I don't know Django, so I don't know if anything like that applies to it. For now I think you should focus on finding something, anything, that's both enjoyable to work with, and lets you focus more on developing your business than worrying about architecture or menial implementation details.

So a lot of the scalability complaints arise from the fact it's written in Python, but as far as I can tell they're largely newer developers trying to grow some web app they made rather than people presenting at PyCon about why it's awful. Fortunately I am not trying to bootstrap a business from this thread, but I agree that it's far better to spend a few weeks getting my hands dirty with some framework or language that I'm familiar with than just reading about all the cool things I can do in some other language or framework that I need to still learn.
I've been working on a project that largely addresses this need. It's a distillation of knowledge gained from building web apps of my own as well as for startups for the past several years. The stack consists of React (via NextJS), GraphQL, Express, Node.js, and PostgreSQL - all written w/ TypeScript.

The goal was to provide a hardened boilerplate app with no distinct featureset but at a minimum have things like ORM, user authentication, migrations, as well as a frontend web UI set up so that it could be used as a head-start for any web app projects in the future.

It's built for fast developer experience, while decoupled enough (a la containers) so that it can easily be taken into a kubernetes cluster for scale. (Still a monolith though.)

The project is private now, but I'm happy to share it on request. Also happy to answer any questions.

This sounds right up my alley honestly, could you explain a bit more about the affordances you set up for the decoupled container component to allow it to grow from a few containers to an orchestrated K8s cluster? Just some insight into how and when one decides to grow from a container or two to full on clusters would be really good to know.

As for sharing, while it's very generous of you to offer access and I'm certainly interested in using it once it gets a public launch, I'm much more in pursuit of the thinking behind system design choices and better developing the intuition that makes those choices. Honestly the framework you're putting together sounds like it could help a lot of people with similar problems to me though :D

Sure; because it is a monolith, it makes things simpler by having just the one backend container to scale. So it's a matter of making more instances of it available (via # of pod replicas in your k8s cluster) for increasing availability.

You can start using k8s right away, and just have 3 replicas running to start. As you scale, just up the number of replicas (and nodes as you need them) as you go.

Your real bottleneck becomes the database at that point (in addition to any blocking 3rd party APIs you may be using), which I would not host in k8s but use a managed service such as AWS RDS. This bottleneck will make itself apparent later on, depending on your application and the scale you reach. But you should definitely have the resources to cross that bridge once you, if ever, reach it, because you should be dealing with a large number of customers at that point.

Ah gotcha yeah at the small scale(?) scaling we're talking about monolithic applications, being a bit simpler to organize and run, do still make for a compelling solution. That's a great tip regarding how to handle ballooning storage issues via managed cloud offerings (in the weird case I make something that really works) that I hadn't considered. However, it's starting to feel like these scaling questions/solutions are a lot more akin to Factorio bottleneck chasing than I would like haha.
You should pick a hypothetical target first. So you've got the T-shirt store, but I'd pick something more complex. Then go through all layers from the start, make your app modular, and take advantage of 3rd party services. For example, for the T-shirt store: use a CDN. All your pages should be ssr or issr if not ssg from the get-go. This reduces your server load and offloads the brunt of your demand onto a 3rd party (Cloudflare, CloudFront).
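For anyone unfamiliar with the abbreviations, here is a toy Python sketch of the difference between static site generation (SSG) and server-side rendering (SSR). The catalogue, the URL scheme, and the function names are all hypothetical; real frameworks do a lot more than this.

```python
import html

# Hypothetical catalogue; names and prices are made up for illustration.
PRODUCTS = {
    1: {"name": "Red tee", "price_cents": 1500},
    2: {"name": "Blue <limited> tee", "price_cents": 2000},
}

def render_product(product: dict) -> str:
    """Turn one product record into an HTML fragment."""
    name = html.escape(product["name"])
    price = product["price_cents"] / 100
    return f"<h1>{name}</h1><p>${price:.2f}</p>"

def build_static_site(products: dict) -> dict:
    """SSG: render every page once at build time, keyed by URL path."""
    return {
        f"/products/{pid}.html": render_product(p)
        for pid, p in products.items()
    }

def handle_request_ssr(path: str, products: dict) -> str:
    """SSR: render the page on the server each time a request arrives."""
    filename = path.rsplit("/", 1)[1]
    pid = int(filename.split(".")[0])
    return render_product(products[pid])
```

Both paths produce the same HTML. With SSG the output can be uploaded to a CDN and your server never sees the request, while SSR costs server time on every request but always reflects the current data.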

Then use an auth provider and payment processor (Firebase, PayPal) with DBaaS.

All that's left is glue and serverless functions. And none of the scaling is something you have to do.

Once you get that then sure, piece by piece you can put it on your own hardware and think it through from there (cdn, db)

The heavy usage of 3rd party tools does sound about right wrt what I've heard get used to vastly expand apps after they've got a minimum viable product people actually want to use though, particularly the usage of CDNs to reduce latency. However I haven't heard much about the details regarding ssr/issr vs ssg though, so if you have any reading that elaborates more on it I'd love to take a look.

Additionally, taking this scaled t-shirt app approach as a template can you share any insight into how a more complicated platform (eg Youtube, Amazon) would further leverage such in-house/3rd party services to most efficiently use their resources (ie at the fully mature platform stage)?

I use golang (no framework, just pure golang) and my server can serve up to 1500 req/sec as tested using ab (the Apache benchmark tool) on a dual core i3-7130u.

For scaling it to 100000 req I can go with a beefier machine and scale it vertically, or I can use a load balancer, start sending customer requests to different servers, and scale it horizontally. This is only possible because the t-shirt selling functionality can be handled independently. Homework problem for you: how would you keep track of the stock of your t-shirts when scaling horizontally?
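A back-of-the-envelope version of the horizontal option, sketched in Python. It assumes "100000 req" means requests per second, that throughput adds up linearly across identical servers, and that a plain round-robin policy is good enough; none of that is guaranteed in practice, and the backend names are made up.

```python
import itertools
import math

MEASURED_RPS_PER_SERVER = 1500  # the ab figure quoted above
TARGET_RPS = 100_000            # assumption: "100000 req" read as req/sec

def servers_needed(target_rps: int, per_server_rps: int) -> int:
    """Servers required if throughput scales linearly (no headroom included)."""
    return math.ceil(target_rps / per_server_rps)

class RoundRobinBalancer:
    """Hand out backends in rotation; each request is independent of the last."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)

if __name__ == "__main__":
    print(servers_needed(TARGET_RPS, MEASURED_RPS_PER_SERVER))
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.pick() for _ in range(4)])
```

The rotation only works because no request depends on which server handled the previous one, which is exactly why shared state like the stock count becomes the interesting part.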

I'd actually been thinking about giving golang a try after seeing some very clever usage of concurrently operating goroutines, do you know of any particularly interesting server examples you could share?

As for the homework...

naive answer: constantly update every T shirt app instance with a record of the current stock, as every transaction completes, and then make sure the system all agrees before allowing additional transactions. Sort of like a really inefficient internal distributed ledger.

horizontal/vertical scale agnostic answer: split the stock tracking into two separate services. One that is focused on managing transactions/maintains the number of T shirts (presumably a hash table for different types of T-shirts with their quantities), the other being updated every time that inventory table changes. The first service can probably be further decomposed into constituent services but for now we'll keep it simpler as a single big one. At page load time the app instance can send a GET (or equiv) request to the secondary service to get quantities of all the shirts on a given page (using matching item ID/hashes) that is then cached in something like localStorage, with ones in limited supply or out of stock denoted as such by the browser/app on render. On a transaction completion you send an update request to the primary stock service, which then begins its process of updating the table/passing the updated table to service two. This will allow customers to add products to their cart with a much lower chance of finding out some are out of stock at check out, but also (hopefully) keeps the stock tracking footprint lower than something like a constantly polling no-matter-what or monolithic tracking system.

Given the bottlenecks I can already see with the checkout transaction part, I'm actually extra curious how to improve it, so I hope you'll share what the exemplar solution would be!

Use managed services for any crud services like your REST api or whatever with auto scaling. Use firebase auth (free at all scales) and Firestore for real time updates and firebase messaging for notifications. You can use it for static content hosting too.

I’m still working on my idea but I ran some cost and performance simulations and this setup should work well enough at scale at a low cost that once it’s an issue I’m sure I will have enough money to hire experts to build custom solutions to fill any gaps.

Is it possible you could share more specifics about these simulation figures? While I'm still very much pursuing technical/organizational info about what underpins the big scalable services (or the meta-scalable ones like firebase, which itself helps other devs make scalable apps) I'd be very interested to know how stark the difference in cost/performance is for this managed crud/firebase-heavy solution vs other offerings, or even just as a baseline of what's out there in the space of easier-to-architect scaling options.
1000 concurrent visitors is literally 31 billion page views per year if you assume a whopping 1 sec.
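The arithmetic behind that figure, as a small Python sketch. It assumes a 365-day year and that every concurrent visitor loads one page per "seconds per page view"; the function name is made up.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

def page_views_per_year(concurrent_visitors: int, seconds_per_page_view: float) -> float:
    """Yearly page views if the site stays at this concurrency all year."""
    views_per_second = concurrent_visitors / seconds_per_page_view
    return views_per_second * SECONDS_PER_YEAR

if __name__ == "__main__":
    # 1000 visitors, one page view per second each: about 31.5 billion a year.
    print(f"{page_views_per_year(1000, 1):,.0f}")
```

Stretching the time per page view shrinks the total proportionally, so the same function can be used to try more relaxed assumptions.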

What you're asking depends on the type of load. A t-shirt website load will be on the DB, and the need for ACID operations, but a YouTube site will need tons of CPU for transcoding, but losing a comment or two, or them showing up in the wrong order is no big deal so you could use distributed nosql.
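To show why the t-shirt case leans on ACID, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the SKU are invented for illustration, and a real store would use a server database rather than an in-memory SQLite file; the point is only the conditional update inside a transaction.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical stock and orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('red-tee-m', 2)")
    conn.commit()
    return conn

def buy(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Reserve stock and record the order atomically; True if the sale went through."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        cur = conn.execute(
            # The WHERE clause refuses the update if stock would go negative.
            "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount == 0:
            return False  # unknown SKU or not enough stock; nothing changed
        conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
        return True
```

Because the check and the decrement happen in one statement, many app servers can call this against the same database without overselling; the database, not the web tier, is the single source of truth for stock.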

But you're trying to run before you can even crawl.

If you're looking at a more realistic load scenario for the t-shirt site, in reality your first scaling step would be very different to what you're talking about.

Almost all scaling problems for the kind of t-shirt site you describe above can be handled with a couple of carefully chosen in-memory caches. This can literally be a global variable and it'd work fine. Most frameworks have a built-in memory cache that's better, obviously, and can handle asynchronous access/reset gracefully.

If you want to get really fancy, you can even use redis or memcached. Again, many frameworks have that baked in or easily added with a simple library.

In this case, it would be a cache of the t-shirt data for displaying the product page data, and for serving search results.

So instead of using an orm to get the data from the database, you'd keep that data loaded in memory to access it really cheap and fast, and reset the cache (or part of it) when something actually changes, like you add a new tshirt.
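A minimal sketch of that idea in Python. The dict standing in for the database, the read counter, and the class name are all assumptions for illustration; a framework's built-in cache would also handle concurrent access, which this does not.

```python
# Stand-in for the real database, plus a counter of how often it gets hit.
FAKE_DB = {1: {"name": "Red tee", "price_cents": 1500}}
READS = {"count": 0}

def load_product(product_id: int) -> dict:
    """Pretend ORM call: the expensive path we want to avoid repeating."""
    READS["count"] += 1
    return dict(FAKE_DB[product_id])

class ProductCache:
    """Keep product data in memory; reset it only when something changes."""

    def __init__(self, loader):
        self._loader = loader
        self._items = {}

    def get(self, product_id: int) -> dict:
        if product_id not in self._items:
            self._items[product_id] = self._loader(product_id)
        return self._items[product_id]

    def invalidate(self, product_id=None) -> None:
        """Drop one entry, or the whole cache when no id is given."""
        if product_id is None:
            self._items.clear()
        else:
            self._items.pop(product_id, None)
```

The catch is the usual one: until `invalidate` is called, readers see the old data, so whatever code edits a t-shirt also has to reset its cache entry.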

To reduce server load you use a CDN for static assets. You'd also pre-resize images (thumbnails for search results, the right size for the product page, etc.) and serve them separately from something like S3 so your webserver's not dealing with them. Sizing them correctly reduces your bandwidth use and makes your pages load faster.

You don't do image resizing on demand as it's a CPU intensive task.

The rest of the app can be a perfectly normal web app, or monolith as they're known.

The next step would be talking about scaling UP (making your server beefier) for the T-Shirt app, but for the YouTube app you'd be looking at scaling OUT (adding more servers).

Then a couple more steps down the line and you might be considering k8s.

Unless you've had the whole app over-engineered by a software architect who doesn't have any real experience. Then you'd have more microservices than customers.

Thank you for the fleshed out answer! These are the sorts of considerations and solutions I was looking for. The first scaling step is probably the most informative of all though, is it possible you could point me towards additional reading about how developers have handled that first scaling step/the decision making behind what should be prioritized to go in the limited cache space (naive example: localStorage in js). Even if you can't, I really do appreciate what you've shared already!
Depends on what exactly it wants to do. You could use S3 static site hosting behind CloudFront with Lambdas for the functions. Not sure if it would scale to that number of users though. My project gets very little traffic.
That could work for a smallish CRUD application I think (just using the function as a service approach to really squeeze efficiency out of the compute time), but yeah the specific numbers aren't quite as important as learning about how to acquire a scalable fraction of the power behind the massively distributed platforms that now dominate the world. This seems like one such approach that some devs have very definitely made working services from, based on a quick search.
Search "system design". These are pretty much standard questions asked at top-tier companies for SWE roles these days.

With that said, doing it is easy, but balancing it against a budget is the tricky part.

Ah that makes a bit more sense. Could you possibly point me in a more specific direction, eg a current system design guide aimed at developers just trying to get their feet wet with things more complex than an SPA? Yeah, that issue is one I've noticed; you're expected to sort of pick this knowledge up from a job but to get the job where you become acquainted with it you're likely gonna need at least some relevant experience with it.
Try reading Designing Data-Intensive Applications.
I've heard it recommended before, but some have mentioned it's overly theoretical and doesn't offer much in the way of helping you build working examples of the material (although I suppose the complexity of the material does make toy examples a bit tricky to come up with). Do you know of anything that does have such offerings?
It's not exactly what you're asking for but if you want to grok distributed systems and managing workloads then learning some Erlang/Elixir (OTP runtime) really helped me, as you can "code along" with your book of choice and they handle real-world situations like node failure and backpressure management.

Other topics that come to mind are books about building out microservice architectures. There are certainly plenty of war stories out there and micro-services seem to tend towards re-implementing OTP runtime primitives in arbitrary languages as design patterns, so you get even more of a feel for what's going on at a lower level of abstraction.

Read designing data intensive applications
I've heard this book (assuming it's https://dataintensive.net/) recommended before, but some have mentioned it's overly theoretical and doesn't offer much in the way of helping you build working examples of the material (although I suppose the complexity of the material does make toy examples a bit tricky to come up with). Do you know of anything that does have such offerings, or possibly if I'm thinking of a different book?