by lightbendover 1202 days ago
> No one is working with a huge amount of data in big loops using virtual methods to take every element out of a huge dataset like he is showing.

Things way worse than that exist. Replace "virtual method" with "service call."


> Things way worse than that exist.

Yeah. I opened Discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3 GHz per core. Assuming it's single-threaded (it wasn't), Discord is taking about 30 billion cycles to open. (Or around 50 network round trips at a 200 ms ping.)
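The back-of-envelope arithmetic can be checked in a few lines. A minimal sketch, using the comment's own simplifications: one core at a flat 3 GHz, and strictly sequential 200 ms round trips.

```python
# Figures taken from the comment (rough, not measured).
OPEN_TIME_S = 10.0   # observed time for Discord to open
CLOCK_HZ = 3e9       # ~3 GHz per core, treated as constant
PING_S = 0.200       # 200 ms round-trip time

# Cycles a single core would burn if the whole 10 s were CPU work.
cycles = OPEN_TIME_S * CLOCK_HZ

# How many strictly sequential round trips fit in the same 10 s.
round_trips = OPEN_TIME_S / PING_S

print(f"{cycles:.1e} cycles")             # 3.0e+10 cycles
print(f"{round_trips:.0f} round trips")   # 50 round trips
```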

Crimes against performance are everywhere.

Or as Casey would put it: Discord is taking 3.7 moo ("Moon Unit") to open. A Moon Unit is equal to ~2.7 seconds, the maximum ping time to the Moon. Therefore, if Discord had their servers on the Moon, nobody would know the difference.
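The ~2.7 s figure is roughly the time light takes to reach the Moon and come back when the Moon is at its farthest. A quick sketch, assuming an apogee of about 405,500 km (a rounded value; the real apogee varies from orbit to orbit) and ignoring any processing delay at the far end:

```python
# Speed of light in vacuum, km/s (exact by definition of the metre).
C_KM_PER_S = 299_792.458
# Approximate Earth-Moon distance at apogee (assumed round figure).
APOGEE_KM = 405_500

# One "Moon Unit": light travels to the Moon and back.
moon_unit_s = 2 * APOGEE_KM / C_KM_PER_S

discord_open_s = 10.0
print(f"1 moo ~ {moon_unit_s:.2f} s")                                # 1 moo ~ 2.71 s
print(f"Discord opens in ~{discord_open_s / moon_unit_s:.1f} moo")   # ~3.7 moo
```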
Have you measured CPU time? A very large factor will be disk I/O and network.
Exactly. The webpage is probably asking for resources from 10 different servers and one of them is a bit slower than the others, and the page rendering itself likely doesn't take very long.
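One way to answer "have you measured CPU time?" is to compare wall-clock time with CPU time for the same piece of work. A minimal Python sketch: `time.sleep` stands in for waiting on disk or network, and `busy_work` is a hypothetical CPU-bound stand-in. If wall-clock time dwarfs CPU time, the program is mostly waiting rather than computing.

```python
import time

def busy_work(n: int) -> int:
    # Pure CPU work: sum of squares.
    return sum(i * i for i in range(n))

def measure(fn, *args):
    """Return (wall_seconds, cpu_seconds) for one call of fn(*args)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args)
    return time.perf_counter() - wall0, time.process_time() - cpu0

# Sleeping uses wall-clock time but almost no CPU time, like blocking I/O.
wall_io, cpu_io = measure(time.sleep, 0.5)
# Computing uses roughly as much CPU time as wall-clock time.
wall_cpu, cpu_cpu = measure(busy_work, 2_000_000)

print(f"I/O-bound: wall {wall_io:.2f}s, cpu {cpu_io:.2f}s")
print(f"CPU-bound: wall {wall_cpu:.2f}s, cpu {cpu_cpu:.2f}s")
```

`time.process_time` counts CPU time for the whole process (all threads), so for a real app a profiler gives a finer breakdown; this only shows the basic distinction.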
No; I have no idea why it's so slow. It's kind of hard to tell; I guess I could use Wireshark to trace the packets. But who cares? At least one of these things is true:

- It makes horribly inefficient use of my CPU

- It needs an obscene number of network round-trips to load

- One of the network servers that discord needs to open takes seconds to respond to requests
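The second and third possibilities interact. If requests run one after another, their latencies add up; if they are all in flight at once, the total is bounded by the slowest server, which matches the "one of them is a bit slower" guess upthread. A minimal sketch with `asyncio.sleep` standing in for network requests; the latencies are hypothetical.

```python
import asyncio
import time

# Hypothetical per-server latencies in seconds; one server is slow.
LATENCIES = [0.05, 0.05, 0.05, 0.05, 0.40]

async def fetch(latency: float) -> float:
    # Stand-in for a network request: just wait out the latency.
    await asyncio.sleep(latency)
    return latency

async def sequential() -> None:
    # Each request waits for the previous one: total ~ sum of latencies.
    for latency in LATENCIES:
        await fetch(latency)

async def concurrent() -> None:
    # All requests in flight at once: total ~ the slowest single latency.
    await asyncio.gather(*(fetch(latency) for latency in LATENCIES))

def timed(coro_fn) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn())
    return time.perf_counter() - start

seq_s = timed(sequential)
con_s = timed(concurrent)
print(f"sequential: {seq_s:.2f}s (sum = {sum(LATENCIES):.2f}s)")
print(f"concurrent: {con_s:.2f}s (max = {max(LATENCIES):.2f}s)")
```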

This isn't a new problem. Discord always takes about 10 seconds to open on my computer. (Am I just on too many servers?)

It should open instantly. Everything on modern computers should happen basically instantly. The only reason most software runs slowly is because the developers involved don't care enough to make it run fast.

Except for a few exceptions like AI, scientific computing, 3D modelling and video editing, modern computers are fast enough for everything we want to do with them. Software seems to have higher requirements each year simply because the developers get faster computers each year and spend less effort keeping their software tight and lean.

> The only reason most software runs slowly is because the developers involved don't care enough to make it run fast.

There is truth to that, but also:

* some of them would care if they knew what was possible with reasonable effort (that's what Casey is trying to address. So far in the course I'm not really seeing much that I could apply to the kind of code I write, sadly, but I'm hoping to learn stuff.)

* it's very likely that making performance-aware or optimized code takes just a tad longer than not doing it, and time-to-ship is valued much higher than time-to-run in most industries (this is the point I think Casey is overlooking, or at least not addressing enough. I don't know if it's by design - maybe he disagrees with the trade-off entirely - or if he's biased towards one of the few industries where time-to-run is crucial.)

Right; most teams optimize for velocity before performance.

This makes sense when you're a shiny new startup. But seriously, 10 seconds for Discord to open? There's a point in every product's lifecycle where performance is a feature. Discord isn't a startup anymore. Why can't they fix these performance problems? At least Discord is pretty snappy once it's loaded. The new Reddit interface? It's a hog. Despite a massive outcry, why haven't they fixed it?

My pet theory is that they don't know how. And talking about velocity is just a smoke screen.

I think most professional engineers don't really understand the software stack well enough to be able to improve the performance of the software they write. It's pretty understandable: nobody asks about this stuff in job interviews. And the software stack only gets more complicated each year. If you follow React tutorials online, you can get pretty far adding features to a web app without ever needing to understand how React actually works. Or the web browser, Vite / webpack / whatever, and the operating system it all runs on top of.

And that's a pretty good deal! More engineers! So long as we don't mind the new Reddit site. And Electron apps that take seconds to load.

Of course Casey Muratori knows how to write performant code. He understands the whole stack. He knows how to read the assembly that the C++ compiler produces. That's something more of us should aspire towards.

I wonder if it would be valuable to make an online course about performance engineering. I feel like it's one of those things that has fallen by the wayside, and I think that's a massive pity.

Which is precisely the point made a couple of comments up. Calling a lot of virtual methods in the critical path is peanuts compared to making a lot of network requests in said critical path.

But hey, those network calls are fast on my loopback interface, or my company LAN, when I'm playing with the dev version, using a test set simulating 2 users and 5 posts for each. Surely it'll be just as fast for the real users, over the Internet, on channels with 1000 users and 5 posts per second.

"Don't make tons of RPCs" is a totally separate issue from "don't make subclasses because virtual methods cost a few extra cycles".
It's the same problem. Virtual calls are degenerate, in-process RPCs. Or put another way, the reason you make tons of RPCs is the same reason you make tons of virtual calls: you consider services or subclasses to be cheap, so you use them a lot to mold your systems to organizational/people problems instead of the thing the software is supposed to do.
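The "degenerate in-process RPC" framing can be made concrete by counting boundary crossings. A minimal sketch with a hypothetical `PriceSource` interface: the chatty version crosses the abstraction once per item, the batched version once per request. Behind a virtual call each crossing costs a few cycles; behind a network hop each one costs a round trip. The shape of the calling code is the same either way.

```python
from abc import ABC, abstractmethod

class PriceSource(ABC):
    """Hypothetical interface; could be a subclass or a remote service."""

    def __init__(self) -> None:
        self.calls = 0  # how many times the boundary was crossed

    @abstractmethod
    def price(self, item: str) -> float: ...

    @abstractmethod
    def prices(self, items: list[str]) -> dict[str, float]: ...

class InMemoryPrices(PriceSource):
    def __init__(self, table: dict[str, float]) -> None:
        super().__init__()
        self.table = table

    def price(self, item: str) -> float:
        self.calls += 1
        return self.table[item]

    def prices(self, items: list[str]) -> dict[str, float]:
        self.calls += 1
        return {item: self.table[item] for item in items}

def total_chatty(src: PriceSource, items: list[str]) -> float:
    # One dispatch (or one RPC) per element.
    return sum(src.price(item) for item in items)

def total_batched(src: PriceSource, items: list[str]) -> float:
    # One dispatch (or one RPC) for the whole batch.
    looked_up = src.prices(items)
    return sum(looked_up[item] for item in items)

items = [f"sku{i}" for i in range(1000)]
table = {item: 1.0 for item in items}

chatty = InMemoryPrices(table)
batched = InMemoryPrices(table)
print(total_chatty(chatty, items), chatty.calls)     # 1000.0 1000
print(total_batched(batched, items), batched.calls)  # 1000.0 1
```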
IMO the main difference is that for ~98% of people writing code, subclasses actually are very cheap. The performance losses (11 cycles per iteration?) aren't enough to dissuade me from organizing my code cleanly.
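Putting the tentative "11 cycles per iteration" figure next to the network numbers upthread shows why both sides have a point. A back-of-envelope sketch, assuming a 3 GHz clock, a hypothetical one-million-element loop, and a single 200 ms round trip:

```python
CLOCK_HZ = 3e9          # assumed ~3 GHz core (the M1 figure from upthread)
EXTRA_CYCLES = 11       # per-iteration overhead suggested in the comment
ITERATIONS = 1_000_000  # hypothetical: one million elements
RTT_S = 0.200           # one 200 ms round trip (figure from upthread)

# Total extra time spent on dispatch across the whole loop.
dispatch_overhead_s = EXTRA_CYCLES * ITERATIONS / CLOCK_HZ

print(f"virtual-call overhead: {dispatch_overhead_s * 1e3:.2f} ms")  # 3.67 ms
print(f"one round trip: {RTT_S * 1e3:.0f} ms")                       # 200 ms
print(f"ratio: {RTT_S / dispatch_overhead_s:.0f}x")                  # 55x
```

Under these assumptions a million extra dispatches cost far less than one round trip, but the dispatch overhead grows linearly with the element count, so it still shows up in a genuinely hot loop.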
It's more than performance. Some of the worst code I've worked on has had too many layers of sub-classes, making it difficult to navigate and a real loss to developer productivity. After a certain point, it becomes OO spaghetti or, more accurately, "lasagna." At more than 3 layers, you really need to stop and think if it's necessary.
Query loops are also just as problematic: looping through a result set, and making another query (or worse, N queries) per result, etc.
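This pattern is commonly called the "N+1 queries" problem. A minimal sketch using Python's built-in `sqlite3` with an in-memory database and a hypothetical channels/posts schema: the loop issues one query per channel, while the join issues a single query. In-memory SQLite makes each query cheap; against a database across the network, each counted query is at least one round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channels (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, channel_id INTEGER, body TEXT);
""")
conn.executemany("INSERT INTO channels (id, name) VALUES (?, ?)",
                 [(i, f"channel{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (channel_id, body) VALUES (?, ?)",
                 [(c, f"post {p}") for c in range(1, 101) for p in range(5)])

queries = 0

def run(sql: str, params: tuple = ()) -> list:
    """Execute one query and count it (a stand-in for one round trip)."""
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the list, then one more query per row.
queries = 0
n_plus_one = {}
for channel_id, name in run("SELECT id, name FROM channels"):
    rows = run("SELECT COUNT(*) FROM posts WHERE channel_id = ?", (channel_id,))
    n_plus_one[name] = rows[0][0]
n_plus_one_queries = queries

# Single query: let the database do the join and grouping.
queries = 0
joined = dict(run("""
    SELECT c.name, COUNT(p.id)
    FROM channels c LEFT JOIN posts p ON p.channel_id = c.id
    GROUP BY c.id
"""))
joined_queries = queries

print(n_plus_one_queries, joined_queries)  # 101 1
print(n_plus_one == joined)                # True
```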
As in, like, a microservice? Ahahaha. Our CTO just pushed for microservices everywhere and we're not even that far along and we're chasing all kinds of performance problems. Insanity.