by bob1029 1537 days ago
I think the biggest problem for most developers is not understanding what one computer can actually do and how reliable they are in practice.

Additionally, understanding how tolerant 99% of businesses are to real-world problems that could hypothetically arise can help one not get frustrated over insane edge cases. I suspect a non-zero number of us have spent time thinking about how we could provide deterministic guarantees of uptime that even unstoppable cosmic radiation or regional nuclear war couldn't interrupt.

I genuinely hope that the recent reliability issues with cloud & SaaS providers have really driven home the point that a little bit of downtime is almost never a fatal issue for a business.

"Failover requires manual intervention" is a feature, not a caveat.


Some people don't even realise how much traffic a simple, decently written web app with server-side rendering, hosted on an average dedicated server, can handle... They don't need cloud, autoscaling, microservices, Kafka, event-driven architectures, etc.

We've lost our way in the masked marketing the cloud providers are creating to help us solve problems we will never encounter, unless we are building the next Netflix or Facebook.

If you want to get an idea of where things are at right now, this is a good place to start looking:

https://www.techempower.com/benchmarks

If you just need plaintext services, something like ~7 million requests per second is feasible at the moment.

By being clever with threading primitives, you can preserve that HTTP framework performance down through your business logic and persistence layers too.

Their benchmarks are also really cool because you can choose to filter down technologies by what you personally know or just want to compare, for example: https://www.techempower.com/benchmarks/#section=data-r20&hw=...

Thus, in my case those numbers might be closer to the following:

  - plaintext: up to 2'500'000 requests per second, most technologies go up to around 500'000
  - data updates: up to 14'000 requests per second (20 updates per request, so 280'000 updates per second)
  - fortunes: up to 300'000 requests per second (full CRUD and sorting)
  - multiple queries: up to 32'000 requests per second (20 queries per request, so 640'000 queries)
  - single query: up to 530'000 requests per second, most technologies go up to around 100'000
  - JSON serialization: up to 970'000 requests per second, most technologies go up to around 200'000
Of course, their setup also plays a part, since the VPSes that i'd go for probably wouldn't be comparable to a Dell R440 Xeon Gold.
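
For anyone who wants to redo the per-request multiplication in the list above, here is a minimal sketch. The function name `ops_per_second` is just made up for illustration; the input figures are the ones quoted in the list.

```python
def ops_per_second(requests_per_second: int, ops_per_request: int) -> int:
    """Database-level operation rate implied by a request rate."""
    return requests_per_second * ops_per_request


# "data updates": 14'000 requests/s with 20 updates per request
updates = ops_per_second(14_000, 20)

# "multiple queries": 32'000 requests/s with 20 queries per request
queries = ops_per_second(32_000, 20)

print(f"updates/s: {updates:,}")
print(f"queries/s: {queries:,}")
```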

It's really nice to have this data, but the code that's written also plays a really big factor - i've seen people who write code with N+1 problems in it and call ORMs in loops and adamantly defend that choice because "such code is easier to reason about" instead of a simple DB view that would be 20-100x faster. With such code, it'd be closer to the "multiple queries" test.
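
To make the N+1 pattern concrete, here is a rough sketch using Python's built-in `sqlite3`. The schema (`authors`, `posts`) and the view name `post_listing` are invented for the example, and it only counts queries rather than measuring the speed-up claimed above.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        CREATE VIEW post_listing AS
            SELECT p.id AS post_id, p.title, a.name AS author
            FROM posts p JOIN authors a ON a.id = p.author_id;
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, 1, "first"), (2, 2, "second"), (3, 1, "third")],
    )
    return conn


def listing_n_plus_one(conn):
    """One query for the posts, then one extra query per post (N+1 in total)."""
    queries = 1
    posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
    result = []
    for post_id, author_id, title in posts:
        # This lookup inside the loop is the "ORM call in a loop" problem.
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((post_id, title, name))
    return result, queries


def listing_view(conn):
    """The same listing from the view: a single query, joined in the database."""
    rows = conn.execute(
        "SELECT post_id, title, author FROM post_listing ORDER BY post_id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; the first one issues one query per post on top of the initial one, so the round trips grow with the size of the listing.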

Then again, these tests basically tell you that in 90% of the cases you should go for Java or .NET, abandoning Python, PHP and Ruby for them (though one could also introduce Rust into the mix and say the same), which realistically won't happen and people will use whatever technologies and practices that they feel comfortable with.

I've seen applications that work fine with hundreds of thousands of page loads per minute (multiple requests per load) and i've seen systems that roll over and die with 100 concurrent users, lots of variety out there.

Also, those complicated architectures are often quite unreliable anyway - just in ways that don't show in metrics. Slack comes to mind: not only is its functionality poor compared to e.g. IRC, but it fails in hilarious ways, e.g. showing duplicated messages, or not showing them at all. Another example is YouTube - the iOS app gets confused when displaying an ad, which results in starting the playback at a wrong time offset. I guess it's because companies like those don't care about actual reliability - what they do care about is availability.
Slack comically uses gigabytes of RAM and plenty of CPU time on the client side.
Wtf are you doing with it? My slack instance (on linux) is resting around 300 MB resident set size and 0% cpu. 300 MB is still a lot for a chat app, but it is definitely not gigabytes.
Make sure you're counting all the sub-processes it spawns (at least on Mac, don't know about linux)
If you just add up memory usage for subprocesses you are likely to over count due to shared memory. The number you typically want to add up in Linux is ‘proportional set size’ which is, I think, the sum over every page of the process’s memory of page_size / number of processes which can access the page. I don’t know what happens if you mmap some physical memory twice (I think some newish Java GC does this).
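
A toy model of that definition, as a sketch only: each page contributes page_size divided by the number of processes mapping it. The input shape (a dict from pid to a set of page ids) and the 4096-byte page size are assumptions for illustration, not how the kernel stores anything.

```python
def proportional_set_size(pages_by_process, page_size=4096):
    """Return {pid: PSS in bytes} for a toy mapping of pid -> set of page ids."""
    # Count how many processes map each page.
    sharers = {}
    for pages in pages_by_process.values():
        for page in pages:
            sharers[page] = sharers.get(page, 0) + 1
    # Each page is charged to a process in proportion to how widely it is shared.
    return {
        pid: sum(page_size / sharers[page] for page in pages)
        for pid, pages in pages_by_process.items()
    }


# Two processes, each with one private page, sharing page "b".
procs = {1: {"a", "b"}, 2: {"b", "c"}}
pss = proportional_set_size(procs)
naive_total = sum(len(pages) for pages in procs.values()) * 4096
```

Here the naive sum counts the shared page twice (4 pages), while the PSS values add up to the 3 distinct pages actually in use.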
I wonder who he stole that joke from.
I would assume bruised_blood, but I can't [easily] find the original, so I posted that.
No, sure. That’s fair enough.

My point simply being that iamdevloper is a notorious joke thief and is especially unsporting about it when it’s pointed out.

It's a nice demonstration of the efficiency of web apps vs. native apps.
It really has nothing to do with that. The slack client is just written poorly.
> The slack client is just written poorly.

Why yes, Slack does seem to be written poorly - just as poorly as every other web app that I use including VS Code, Discord, MS Teams, etc..

Maybe the Slack developers are just stupid, uneducated, malicious, poorly managed, or ambivalent (or all of the above) but the platform does seem to be conducive to creating clunky and bloated software.

While i agree with your overall point, i think that VS Code is one of the better (only?) examples of really good web technology based software. It's snappy, reliable and has very few bugs.

If you want bad examples of similar software, have a look at Brackets (https://brackets.io/index.html) and Atom (https://atom.io/).

Maybe things have improved in the past years, but last i checked Atom in particular was horrendously slow.

How could you say that Slack has poor functionality compared to IRC?
When you type something into IRC, that message shows up in the log and in every online user's client pretty reliably. Furthermore, the high degree of diversity among clients provides a pretty extreme amount of client-side functionality that Slack completely lacks (scripting is a huge one).
The versatility of clients is indeed a huge benefit of IRC. I used to use IRC at work and always had my Weechat window split with a small pane up top showing either my highlights or a channel I needed to monitor at the time. With Slack, you can’t do that, which means you have to repeatedly click between channels if you need to pay attention to multiple at a time.
You can use split view to keep an eye on another channel. But another window would be better.

https://slack.com/intl/en-gb/help/articles/4403608802963-Ope...

I love IRC, but this is just silly.

Slack has much better history because you don't need to have been online when messages are sent to log them. Slack is absolutely more reliable in this regard.

IRC is easy to script because the protocol is so simple. But you leave so much on the table for that cost.
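
As a rough illustration of how little there is to parse, here is a sketch of splitting a classic IRC line (optional `:prefix`, a command, space-separated parameters, and an optional trailing parameter after ` :`). The function name is made up, and the sketch deliberately ignores IRCv3 message tags and malformed input.

```python
def parse_irc_line(line: str):
    """Split one raw IRC line into (prefix, command, params)."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # Optional source prefix, e.g. "nick!user@host".
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    command, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)
    return prefix, command, params


msg = parse_irc_line(":nick!user@host PRIVMSG #chan :hello world\r\n")
ping = parse_irc_line("PING :irc.example.net")
```

A bot is then mostly a loop that reads lines from a socket, runs them through something like this, and answers `PING` with `PONG`.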

Obviously, if your use case is text only, you don't care about persistence, and you lean heavily on scripting to get things done, then IRC will do the trick. Otherwise it's such a crutch to do anything besides that.

Slack is not instantly "better" than IRC, it's just a different approach to the chat problem and it's arguably more approachable for people that don't want to learn about the chat space.

Logging is just different between the two.

For IRC, logging is outside the scope of the IRC protocol. Anyone can log anything anytime anywhere with whatever policies and procedures they want. This usually leads to each channel/project having some "official" log of the channel somewhere, using whatever they feel is good for them.

Slack, on the other hand, centralizes the logs, which hands a lot of control to the administrators/Slack developers.

So Slack's logs are likely easier to find, but that doesn't necessarily make them easier to use.

Persistence is also just different: IRC makes it your problem, but it's a solved problem if you care about it. irccloud.com and sr.ht both offer persistence, in different ways, as two examples of solutions to the problem.

Slack of course centralizes the problem and removes some control.

I personally think Slack and approaches like it (I prefer MatterMost) are great for internal things where administrators need central control of stuff for various reasons. For public things, I think Slack is a bad solution, and something like IRC or Matrix is a better solution to the problem of public chat.

> For IRC, logging is outside the scope of the IRC protocol.

Nope, the community has understood that server-side logging (and making it available to clients who missed stuff happening) is a useful thing. https://ircv3.net/specs/extensions/chathistory

IRC has logs for history, they're fast and you can run your own logger to control the retention policy if you want. These heavy weight IM tools have extremely short log retention (months) and searching through the logs is extremely slow and frustrating IME.
Not only that, but you can grep IRC logs using whatever Unix tools you want, which is much more powerful than anything Slack has to offer.
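
The usual tool here is plain `grep` on the log directory, but the same idea is easy to script. A minimal sketch, assuming logs are plain-text `*.log` files with one message per line in a single directory (the layout and the function name are assumptions, since every logger does this differently):

```python
import re
from pathlib import Path


def grep_logs(log_dir, pattern):
    """Return (file name, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

For example, `grep_logs("logs", r"deploy")` would list every mention of "deploy" across all log files in a hypothetical `logs` directory.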
> I think the biggest problem for most developers is...

... reading blogs and such where some loud mouth is telling them about so called "best practices" and so they bring that back to work with them.

There are not enough loud mouths telling people to keep it simple (until you can't or know better).

Past the proof of concept, "developers" should frankly not be making these decisions. People who understand systems and failure analysis should be. You might have devs with that experience, but they're comparatively rare.

As far as complexity... if you get big enough, you can't avoid it. My meta-rule is to only accept additional complexity if solving the issue some other way is impractical.

It is almost always far, far easier to add additional moving parts to your production environment than it is to remove them after they're in use.

These requirements don't come out of nowhere. Normally they come from:

1. CEOs/whoever that don't listen to how much additional complexity it is to build a system with extremely high uptime and demand it anyway.

2. Developers whose past experience is that systems going down means they get called in the middle of the night.

3. Industry expectations. Even if you're a small finance company where all your clients are 9-5 and you could go down for hours without any adverse impacts, regulators will still want to see your triple redundant, automated monitoring, high uptime, geographically distributed, tested fault tolerant systems. Clients will want to see it. Investors will check for it when they do due diligence.

Look at how developers build things for their own personal projects and you'll see that quite often they're just held together with duct tape running on a single DO instance. The difference is, if something goes wrong, nobody is going to be breathing down their neck about it and nobody is getting fired.

If the additional complexity is just "Use this premade thing" and it only adds a half hour of work here and there, while also giving you essentially a premade and pre-documented workflow that new people will instantly know (whatever your "bloated" tool tells you to do), then it might be a net win anyway.

If the extra complexity is microservices and containers you might have an issue, but microservices are kind of a UNIX philosophy derivative. I'm not sure the complexity is really intentionally added (like when someone uses an SPA framework or something); it just kind of shows up by itself when you pile on thousands of separate simple things without really realizing the big picture is a nightmare.

> "Failover requires manual intervention" is a feature, not a caveat.

I was scarred by the DDoS of Linode on Christmas Day 2015 (as a Linode customer at the time). I believe that was the only time my Christmas was ever interrupted by work. Of course, one might respond that being the one perpetually on-call sysadmin isn't ideal.