by atonse 1212 days ago
Even though I love their simplicity as an example of how to be pragmatic and not over-engineer, do remember that they’ve tuned their code to the point that they built an ORM that is one of the fastest in the .NET world. I used it and it was awesomely lightweight.

It’s as much an example of how far world class talent can go, as it is about doing more with less.


Right - Marc Gravell and Tim Craver, who worked on the core architecture of Stack Overflow, were both so obsessive about extracting performance from .net web applications that when they couldn’t do any more from the outside, they both quit and went to work for Microsoft on performance improvements in the framework itself.

I feel like it’s similar to how people point to Craigslist as evidence that you can still build sites in Perl - ignoring the fact that Craigslist has Larry Wall on a retainer.

Running highly scalable monoliths is easy! As long as you’re willing to hire some of the five to ten people in the world who are capable of advancing the state of the art of development on that technology stack…

Except that servers are literally 50-100x more powerful than they were when these sites were built. You just don't need legendary talent anymore to accomplish pretty reasonable scaling with a simple low server count architecture.
It's true! Hardware is powerful enough nowadays to run all those needless microservices and containers. ;-)
You don't, really. You can use Django or Perl today and just enable nginx caching for non authenticated users, for many applications.

Stack Overflow didn't need these optimizations. They could have just deployed 20 servers instead and still been profitable. People optimized just because they like to.

Yes, Microsoft SQL Server is famous for its ability to get faster just by adding more servers.
The discussion isn't really around the DB/SQL Server. As far as I could tell, we were discussing .NET and optimizations in its ORM.
Minor correction but that’s Nick Craver https://nickcraver.com/
> Running highly scalable monoliths is easy! As long as you’re willing to hire some of the five to ten people in the world who are capable of advancing the state of the art of development on that technology stack…

I truly believe that being able to design and run a modular monolith application effectively (not talking about the 'hyperscale' scenario here) should be a prerequisite for designing and running a set of interconnected microservices. The challenge is similar, but dealing with modular monoliths has the advantage of not having to deal with the uncertainty of network programming (i.e. remote calls, network error handling, distributed transactions).

I think the other point being - very few applications need this kind of scaling.
Dapper! I used it a while back and it was a single class that bundled query results straight into a list of objects by emitting low level CLR bytecode

Looks like it's expanded a little since then

https://github.com/DapperLib/Dapper
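
For anyone who hasn't used a micro-ORM, here's a rough sketch of the "query results straight into a list of objects" idea. It's in Python with sqlite3, not C#, and it is not how Dapper is implemented (Dapper generates the mapping code at runtime; this just matches column names to dataclass fields). The `Post` class, the `posts` table and the `query` helper are all made up for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Post:
    # Hypothetical row type; field names must match the column names.
    id: int
    title: str
    score: int


def query(conn, cls, sql, params=()):
    """Run hand-written SQL and map each row onto cls by column name."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    wanted = {f.name for f in fields(cls)}
    return [
        cls(**{c: v for c, v in zip(columns, row) if c in wanted})
        for row in cur
    ]


# Illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO posts (title, score) VALUES (?, ?)",
    [("How do I exit Vim?", 5000), ("What is a monad?", 1200)],
)

posts = query(
    conn,
    Post,
    "SELECT id, title, score FROM posts WHERE score > ? ORDER BY id",
    (1000,),
)
```

The point is that you still write the SQL yourself and the library only does the boring row-to-object step.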

You can also see this the other way around — it's a testament to how slow some other stuff is.

Which, to be clear, is not intended to be a negative statement about that "other stuff". It really depends. Some is. But I've also seen things just done poorly by applying tools wrong, e.g. ORM misuse leading to thousands of queries that should have been one OUTER JOIN.

But I don't think you need engineers of their unique calibre to get most of what they got. It's probably an exponential thing: with some merely good engineers you could maybe achieve 80% of their performance. The last 20% is just much more costly.
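
To make the ORM misuse point concrete, here's a minimal sketch of the "N+1" pattern next to a single OUTER JOIN. It uses Python's sqlite3 with a made-up `questions`/`answers` schema, and `CountingConnection` is a hypothetical wrapper I added only to count round trips. With three questions the loop issues four queries where the join issues one; scale that to a few thousand rows and you get the "thousands of queries" case.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts how many statements are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


# Illustrative schema and data, not from any real application.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE answers (id INTEGER PRIMARY KEY, question_id INTEGER, body TEXT);
INSERT INTO questions (id, title) VALUES (1, 'q1'), (2, 'q2'), (3, 'q3');
INSERT INTO answers (question_id, body) VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")


def n_plus_one(db):
    """One query for the parents, then one more query per parent row."""
    result = {}
    parents = db.execute("SELECT id, title FROM questions ORDER BY id").fetchall()
    for qid, title in parents:
        rows = db.execute(
            "SELECT body FROM answers WHERE question_id = ? ORDER BY id", (qid,)
        ).fetchall()
        result[title] = [body for (body,) in rows]
    return result


def single_join(db):
    """Same result from one LEFT OUTER JOIN, grouped in application code."""
    sql = """
        SELECT q.title, a.body
        FROM questions q
        LEFT OUTER JOIN answers a ON a.question_id = q.id
        ORDER BY q.id, a.id
    """
    result = {}
    for title, body in db.execute(sql):
        answers = result.setdefault(title, [])
        if body is not None:  # questions without answers yield a NULL body
            answers.append(body)
    return result
```

Wrapping the same connection in two `CountingConnection` objects and calling each function shows the difference in query count while both return identical data.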

Yep. Following some of the SO folks on Twitter a while back, I remember watching them do all sorts of things with .NET that didn’t feel remotely “necessary” for a Q&A website. It’s not like you can pull people off the street and have them get away with infrastructure this simple.
> It’s not like you can pull people off the street and have them get away with infrastructure this simple

I know that in many cases simple != easy but I can't help feeling sad while reading this.

When I started my career cloud wasn't yet mainstream, but as a beginner I was able to deploy and configure an nginx proxy and load balance between 2-3 backend servers without too much effort. It wasn't some kind of rocket science.

I guess the current issue is that cloud has been marketed so much that nobody who's just starting out in the industry even has a second thought about using it by default. What can I say, great job from the cloud providers in capturing their customers as soon as they get in front of the store.

Great, now you have an nginx reverse proxy as a load balancer in front of a few servers. Now sort out log storage, certificate expiry, access controls, patch management, health monitoring, and remote administration, update it whenever you add or remove backend servers for maintenance, and make sure to synch it up to DNS, and you’ve almost got the same capability as an AWS ELB. Except yours doesn’t have high availability or horizontal autoscaling.

Getting all of that stuff right actually kind of gets close to rocket science. Which can be worth doing… but just be aware that Amazon will happily sell you a rocket kit.

I'm not an "on-premise bare-metal server absolutist". Of course there are trade offs in terms of convenience but there are also trade offs in terms of cost and performance and vendor lock-in. It all depends on what you need and what your specific constraints are.

Is time to market critical? Will you have daily traffic fluctuation between 10 and 10k users? Will you lose a ton of money/customers for any service interruption? By all means use the latest version of managed kubernetes combined with whatever other cloud service tickles those itches. But don't forget to always keep an eye on your bills and think about how you can reduce them by simplifying your architecture.

But if you're just building a corporate intranet for a few dozen users who log in once a week I'm pretty sure a simple VM (even if managed in AWS) would make much more sense.

And if you really want to roll your own there are plenty of options to make your life much easier compared to sending a rocket into outer-space. Yes it's more work upfront but after you do the setup the first time there's little to do.

infra automation & templates:
- ansible, docker, etc

log storage:
- mount shared storage
- ELK
- use a paid LaaS or monitoring SaaS

certificate management (on LB machine only):
- certbot

access controls:
- linux user and groups management

patch management:
- enable unattended upgrades for security patches

health monitoring:
- in terms of lb nginx has that built in.
- for more advanced use cases use a paid service (new relic) or a free one (nagios)

remote administration:
- ansible, etc.

Don't get me wrong I use cloud on a daily basis for work, I'm just sad because most teams don't know how to use it effectively without jumping the gun.

> log storage, certificate expiry, access controls, patch management, health monitoring, and remote administration, etc

This is how you can satisfy those needs with stock Linux. Install Ubuntu then:

    apt-get install certbot unattended-upgrades systemd-journal-remote
    wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh

Remote admin and access controls are already handled via SSH and ordinary UNIX permissions. DNS editing is easy, just use your registrar's UI for it.

Oddly, the most painful part is uploading servers and making them properly start up, be backed up etc. You can use Docker but I've written a tool that does it without that, just using systemd and Debian packages. You can run it on Mac/Windows too and it'll build a package for your server, upload it, install it, start it up etc to a list of servers defined in the config. You can sandbox the server with an additional line of code, define cron jobs with a few others etc. It's a bit more direct than Docker, and gives you the traditional stuff like OS managed security updates (for the libraries the OS provides).

> Except yours doesn’t have high availability or horizontal autoscaling

HA: Some people have extremely distorted ideas of how reliable server-class hardware and datacenters can be. There was someone on Reddit commenting on the 37signals cloud exit who believed that normal datacenters have 99% availability! Actual figure for most well run commercial DCs: closer to five nines. Some datacenter providers like Deft (as used by 37signals) promise 100% availability and give SLA credits for literally any downtime at all, which they can do because they have so little.

Auto-scaling: this is often a requirement that comes from the high cost of cloud services. If you only need 9 servers you don't need to auto-scale, you can just buy the servers and leave them running 24/7. Yeah, there are definitely places for that like companies that need to occasionally run huge batch jobs where the cloud model of multi-tenant sharing makes total sense, but for a website like Stack Overflow it's just not needed. Remember that their hardware runs at low utilization despite not having any caching layer; they can absorb huge spikes in traffic without issue assuming they're provisioned with sufficient bandwidth.

> Getting all of that stuff right actually kind of gets close to rocket science ... Amazon will happily sell you a rocket kit

This makes me feel kinda old, but I can't grow a beard let alone a gray one :( It's a type of sysadmin skill that was once considered entry level and which could be readily found in any university IT department. Probably still can be. Yes, if you grew up with AWS writing nodejs apps on a MacBook, if you never installed Linux into a VM and played with it, then it may seem scary. But it's not really so bad. You should try it some time, it's a generic skill that can come in handy.

To add on to the HA comment: A lot of people have distorted ideas of how much availability they actually need. A lot, if not most, applications could probably get away with the absolutely abysmal 99% uptime, depending on how that downtime was distributed. 99% uptime could mean anything from ~3 days of downtime a year, 7 hours a month, 14 minutes a day, to half a second of unavailability a minute.

Like sure, it's not ideal, but real businesses almost never are. And, as you pointed out, most datacenters get dramatically better uptime than that.
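
If anyone wants to check the 99% numbers, the arithmetic is just (1 - availability) times the length of the period. A quick sketch, assuming a 365-day year and a 30-day month (the `downtime_seconds` helper and the `PERIODS` table are my own names):

```python
def downtime_seconds(availability, period_seconds):
    """Allowed downtime in seconds for a given availability fraction."""
    return (1.0 - availability) * period_seconds


# Period lengths in seconds; 365-day year and 30-day month are assumptions.
PERIODS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "day": 24 * 3600,
    "minute": 60,
}

for name, seconds in PERIODS.items():
    two_nines = downtime_seconds(0.99, seconds)
    five_nines = downtime_seconds(0.99999, seconds)
    print(f"per {name}: 99% allows {two_nines:.1f}s, 99.999% allows {five_nines:.3f}s")
```

That works out to about 3.65 days a year, 7.2 hours a month, 14.4 minutes a day, or 0.6 seconds a minute at 99%, versus a bit over 5 minutes a year at five nines.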

Not to take anything away from Dapper (it's an excellent library), but it isn't really that much faster than EntityFramework anymore.

> EF Core 6.0 performance is now 70% faster on the industry-standard TechEmpower Fortunes benchmark, compared to 5.0.

> This is the full-stack perf improvement, including improvements in the benchmark code, the .NET runtime, etc. EF Core 6.0 itself is 31% faster executing queries.

> Heap allocations have been reduced by 43%.

> At the end of this iteration, the gap between Dapper and EF Core in the TechEmpower Fortunes benchmark narrowed from 55% to around a little under 5%.

https://devblogs.microsoft.com/dotnet/announcing-entity-fram...

Again, this isn't to take anything away from Dapper. It's a wonderful query library that lets you just write SQL and map your objects in such a simple manner. It's going to be something that a lot of people want. Historically, Entity Framework performance wasn't great and that may have motivated StackOverflow in the past. At this point, I don't think EF's performance is really an issue.

If you look at the TechEmpower Framework Benchmarks, you can see that the Dapper and EF performance is basically identical now: https://www.techempower.com/benchmarks/#section=data-r21&l=z.... One fortunes test is 0.8% faster for Dapper and the other is 6.6% faster. For multiple queries, one is 5.6% faster and the other is 3.8% faster. For single queries, one is 12.2% faster and the other 12.9% faster. So yes Dapper is faster, but there isn't a huge advantage anymore - not to the point that one would say StackOverflow has tuned their code to such an amazing point that they need substantially less hardware. If they swapped EF in, they probably wouldn't notice much of a difference in performance. In fact, in real-world apps, the gap between them is probably going to end up being smaller.

If we look at some other benchmarks in the community, they tell a similar story: https://github.com/FransBouma/RawDataAccessBencher/blob/mast...

In some tests, EF actually edges past Dapper since it can compile queries in advance (which just means calling `EF.CompileQuery(myQuery)` and assigning the result to a static variable that will get reused).

Again, none of this is to take away from Dapper. Dapper is a wonderful, simple library. In a world where there's so many painful database libraries, Dapper is great. It shows wonderful care in its design. Entity Framework is great too and performance isn't really an interesting distinction. I love being able to use both EF and Dapper and having such amazing database access options.

Totally agree. To clarify, when I picked Dapper, it was 2014, when there was a huge difference.

No doubt EF has gotten to that level since MS has done a stellar job with .NET Core of relentlessly slimming things down and improving performance.