by tluyben2 2353 days ago
But that is the case now too, and in my experience it has swung to paying through the nose for hardware in general. As more or less a side job, I take on projects where I optimise (mostly online) systems. Example: a few weeks ago a startup asked me to check out their setup, as they were spending almost 30k$/mo on AWS. I spent a few days optimising and now they are down to less than 10k$/mo. With some more work it will be a few thousand dollars a month; there is still so much wrong. But that is less low-hanging fruit, so it will be a lot more expensive to fix. Still well worth it imho.

People really bought into the ‘people are more expensive than hardware’ line as an excuse to get screwed like this. For $5k in human cost, these guys (and their investors) now save 200k/year in hosting. And this is not an isolated story; I am working on another one at this very moment. Programmers have become so incredibly sloppy with the ‘autoscaling’ and ‘serverless’ cloud ‘revolution’.
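
A back-of-the-envelope sketch of the arithmetic above, assuming the round figures from the comment (about 30k$/mo before, about 10k$/mo after, a one-off $5k fee). The function names are made up for illustration, and a month is assumed to have 30 days:

```python
def annual_savings(before_per_month: float, after_per_month: float) -> float:
    """Yearly hosting savings after the monthly bill has been reduced."""
    return (before_per_month - after_per_month) * 12


def payback_days(one_off_cost: float, before_per_month: float,
                 after_per_month: float, days_per_month: float = 30.0) -> float:
    """Days until a one-off optimisation fee is recovered by the lower bill."""
    monthly_saving = before_per_month - after_per_month
    return one_off_cost / monthly_saving * days_per_month


if __name__ == "__main__":
    # Round numbers taken from the comment; real bills will differ.
    print(annual_savings(30_000, 10_000))
    print(payback_days(5_000, 30_000, 10_000))
```

With these round inputs the yearly saving lands in the same ballpark as the 200k/year quoted, and the fee is recovered in roughly a week.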


I don’t know if you feel this way, but my complaint about the more hyped cloud services is that not only can they be expensive (fine) but the promised time-savings and simplicity of operating the system often don’t really materialize either, except in restricted circumstances that you don’t appreciate in advance and only find out about later, after you’ve already committed.

If it really did save time and were simpler, some companies would (quite reasonably) be willing to pay a premium for that - time is money and all that. In reality it seems like people often end up with the worst of both worlds - it’s expensive, complicated, still needs a huge staff to maintain, and doesn’t even work that well.

Well, it is better than building even trivial architectures with actual hardware (I have some pictures of me hauling servers over xmas a long time ago; that was cheap (in monthlies and hardware, not in hours!) but I would and do pay a premium to avoid that). Otherwise I do agree somewhat; most overbearing systems can be done much more simply, but we are all preparing (and thus paying) for eventualities that most likely will never occur or will not influence the bottom line.

Tech like AWS Lambda (of which I like the theoretical idea) is meant to remedy the issues with complexity for a premium. But that premium, personally, makes my eyes water. I cannot see any high volume operation justifying going live with it. Are there big examples of those? And how is it justified vs the alternatives (which are, apart from some programmer+admin time and scalability, far more efficient)?

There are some significant high volume cases. We work with companies doing billions of Lambda invocations per month and realising large cost savings. Lambda itself is usually the smallest part of the bill, as one of the advantages of building serverless applications is that you shift the responsibility for certain execution to specially designed managed services as opposed to code consuming CPU cycles; for example, API Gateway takes over routing, S3 takes over file system calls, etc. A large portion of the savings organisations see, though, is in time to production, as well as the overhead of managing servers and container clusters which is a lot more costly than you might think. Especially in the environment we are in now, where qualified DevOps talent is hard to come by and at a premium. Sure, a developer can take some time to try and learn how to put together some infrastructure, but that's time taken away from adding direct value to business needs, not to mention the fallout when things go pear-shaped later because it turns out a few hours of Googling doesn't turn someone into a DevOps expert.
You definitely know what you are doing then; I see mostly the negative cases... The abuses of things for which they are not made etc. Thanks for the insight!

> as well as the overhead of managing servers and container clusters which is a lot more costly than you might think

A lot of people underestimate that in my experience; I see a lot of people who find it cool setting them up (also, a large number are not scripting this but doing it via the web interface). My current case has a myriad of VPCs, container clusters, load balancers, auto scaling etc. and it looks really impressive, but it's very costly and their dev (who was also devops) disappeared as he buckled under the stress. Also, none of that is needed in this case (not saying there are not many cases where it is needed!).

Anyway, I will experiment more with Lambda; I think I'm tainted by the very costly abuse cases I had to move to normal Linux environments to make them affordable for the startup.

Thanks for sharing. I am aware there is no small number of cases where the cloud offerings do save money in total.

But to be fair, for most projects the complexity that Amazon's services carry with them is absolutely not justified. Sure, I can learn to work with 10-20 Amazon services, but even for me, a senior guy who knows his way around pretty much anything you throw at him, that's precious time spent not helping the direct business needs but basically making sure the house won't collapse.

And a lot of smaller companies like to merge the "programmer" and "DevOps" titles into one person because of course, that means one paycheck and not two. And as you said, they get angry that you can't become a pro sysadmin in an afternoon.

I suppose I am just trying to say yet again that many companies reach for BigCorp tools when they really ought to be fine with 2-3 DigitalOcean droplets and 1 dedicated DB droplet, plus 1 extra for backups.

But it does save an enormous amount of time. We have numerous customers using tools like the Serverless Framework to help put together sophisticated systems in days that would have traditionally taken months. I've experienced it myself personally and worked with multiple customers who see the same thing.

It's also not just the initial time saving. After implementation, infrastructure maintenance is almost non-existent because the services are all managed for you, and you can focus on providing direct value instead of worrying about whether your infrastructure can meet your needs.

> paying through the nose for hardware in general

You also have to consider that there are limits to how parallel an application can be (Amdahl's Law); at some point even throwing hardware at a scaling issue has its limits.

Of course, there's also a truism that the team who implemented the first pass won't have to support (financially or as a developer) the software when it no longer scales.

No, Amdahl's law is (roughly speaking) a limit to how parallel an algorithm can be. Applications (in the sense of web apps) generally have the potential to scale via Gustafson's law, but we are (IMO) largely held back by frameworks and old ways of programming. https://en.wikipedia.org/wiki/Gustafson's_law
So long as an application needs to share state between worker processes (database, Redis cluster, etc.), Amdahl’s law still applies. There are very few modern applications that can truly scale linearly.
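
For readers who want to see the two laws side by side, here is a minimal sketch of the standard textbook formulas. The 95% parallel fraction and the worker counts are illustrative assumptions, not figures from this thread:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl: fixed problem size; the serial part caps the speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Gustafson: the workload grows with the workers (scaled speedup)."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

Under Amdahl's formula a 5% serial share (for example, a shared database) keeps the speedup below 20x no matter how many workers are added, while Gustafson's formula keeps growing because it assumes more workers are used to handle more total work, which is closer to how a web app serving more users behaves.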
You're confusing something. Fully consistent databases are primarily limited by the speed of communication because they need to replicate writes and queries to all nodes and wait for a response even if a node is on the other side of the planet. Unless your CPU is extremely slow (clock frequencies of a few kilohertz) the speed of light is a significantly more important limit. This is actually a use case where modern CPUs are more than fast enough and we don't need a significant improvement in processing speed. Faster storage and networks are welcome though.
As a guy using a 10-core Xeon workstation with 64GB of 2666 MHz DDR4 ECC RAM and an NVMe SSD capable of 2.8 GB/s read and write... I have to tell you that I only partially agree.

I've noticed my compilation speeds got dramatically better (compared to a MacBook Pro and an old-ish i7-3770 desktop PC). And it can handle even the sluggishness of Slack just fine without you noticing a lag, which I view as a huge achievement.

However, one thing my very detailed system monitors are telling me every day is -- 99% of all software we use every day is not parallel enough. So I have this amazingly powerful CPU that only (1) Git garbage collection, (2) PostgreSQL restoring a big backup, (3) Rust compiler and (4) [partially] Elixir compiler can saturate to its full potential.

I'd say that if everybody buys the new AMD Threadrippers and PCIe 4.0 motherboards, RAMs, SSDs and GPUs, we'd all be collectively fine for like 10 years.

The software, however, badly needs more parallel processing baked into it.

Share consistent state. Eventually consistent models (most web apps) are generally okay.
(NOTE: I don't disagree with you, I am more like paraphrasing you and adding my take.)

In practice most software is light years away from this theoretical limit of "can't be parallelised any more". And I fully agree that throwing hardware at a problem indeed has limits, although they are financial and not technical IMO.

As mentioned in another comment down this tree of comments, my 10-core Xeon workstation almost never has its cores saturated, yet I have to sit through 5 seconds to 2 minutes of scripted tasks that could relatively easily be parallelised, and they aren't.
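
As a rough illustration of how little it can take to run independent scripted steps side by side, here is a minimal sketch using Python's standard `concurrent.futures`. The step names and the `fake_step` helper are hypothetical stand-ins; it assumes the steps do not depend on each other and mostly wait on I/O or subprocesses:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_in_parallel(tasks: Sequence[Callable[[], T]], max_workers: int = 4) -> List[T]:
    """Run independent tasks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def fake_step(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a scripted step that waits on I/O."""
    time.sleep(seconds)
    return name


if __name__ == "__main__":
    steps = [lambda n=n: fake_step(n, 0.2) for n in ("lint", "test", "docs", "package")]
    start = time.perf_counter()
    print(run_in_parallel(steps))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Threads fit steps that wait on disk, network or child processes. For CPU-bound pure-Python work, `ProcessPoolExecutor` from the same module would be the usual swap because of the interpreter lock.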

And let's not even mention how my NVMe SSD's lifetime saturation was 50% of its read/write limit...

There's a lot that can be improved still before we have to concern ourselves with how much more we can parallelise stuff. That's like worrying about when the Star Trek reality will come to pass.

You're quite right that there's plenty to optimize. It's not that there isn't money in optimizing. It's that there's often not _enough_ money in optimizing to rise to the level of the top N priorities for a business.
Agreed, until you raise it at the right level at the right time. People do not find me for nothing... Usually after the initial launch euphoria dies down and someone looks at the books and asks why such a large % of the expenditure goes there. People start looking around online and see things like ‘our application serves 200k requests/day with one 50$/mo server’ and compare that with their 30k/mo setup barely serving 50k/day and start poking around. It is usually apples and pears, but more often than not there are massive issues. Most of them I would consider beginner issues but they are not made by beginners; many senior programmers I meet simply do not know about normal forms, proper types (all are stringy), proper indexes, O(n^2) etc; they trust cloud scaling to solve it all. And it does! But it costs...
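
To make the O(n^2) point concrete, here is a small sketch of the kind of pattern meant: matching two lists with a nested loop versus building a lookup table once. The order and customer data shapes are invented for illustration, and customer ids are assumed to be unique:

```python
from typing import Dict, List, Tuple

Order = Tuple[int, int]      # (order_id, customer_id)
Customer = Tuple[int, str]   # (customer_id, name)


def match_quadratic(orders: List[Order], customers: List[Customer]) -> List[Tuple[int, str]]:
    """O(n*m): rescans the whole customer list for every order."""
    result = []
    for order_id, customer_id in orders:
        for cid, name in customers:
            if cid == customer_id:
                result.append((order_id, name))
                break
    return result


def match_indexed(orders: List[Order], customers: List[Customer]) -> List[Tuple[int, str]]:
    """O(n+m): builds a dict once, then does constant-time lookups."""
    by_id: Dict[int, str] = dict(customers)
    return [(oid, by_id[cid]) for oid, cid in orders if cid in by_id]


if __name__ == "__main__":
    customers = [(1, "ann"), (2, "bob")]
    orders = [(10, 2), (11, 1), (12, 3)]
    print(match_quadratic(orders, customers))
    print(match_indexed(orders, customers))
```

Both functions return the same pairs. The difference only shows up in the bill once the lists grow, which is when autoscaling quietly absorbs the nested loop.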
And of course, there is a limit to what you want to spend even if it might make some profit long term. You need to be able to find programmers to maintain things etc. as well. If I needed something that handles massive traffic and real business logic but is not allowed to cost more than a few bucks in hosting, I would use something like [0]. But that would be silly for maintenance reasons alone. Does anyone know a modern (well maintained I mean really) equivalent though? I played around with this a long time ago and it is incredibly efficient.

[0] http://datadraw.sourceforge.net/ (github; https://github.com/waywardgeek/datadraw as sourceforge seems down)

Edit; maybe I answered that last question by finding a github version: seems waywardgeek does maintain at least to keep it running.

> Does anyone know a modern (well maintained I mean really) equivalent though? I played around with this a long time ago and it is incredibly efficient.

https://diesel.rs ? Maybe https://tql.antoyo.xyz/ if you care more about ease of use.

Datadraw is not an ORM; it is more comparable to a statically compiled Redis. So it is far less flexible, but it is very efficient/fast.

One of the purposes of Datadraw is for instance to build SQL databases on top of.

> almost 30k$/mo

That's like, a couple full-time developers, AIUI? Maybe even less than that. Perhaps the people who say "people are more expensive than hardware" have a point - at least in the Bay Area. Or you can move to the Rust Belt if you'd like a change.

Sure, but my point was that they cut that bill by 20k PER month by giving me 5k one off... They gave me 10k runway to poke around but 5k was enough to fix it; it was simply that bad to start with. The low hanging fruit in most systems I see is really trivial to fix; they just have no one to do it... I bet other people here have seen that before when thrown into an existing project (and I read Spolsky at an impressionable point in my career so I am usually the one against rewriting the whole thing outright).
What you’re saying is that there were a handful of bottlenecks that you caught immediately or were found with some simple profiling, right? Not that they made the mistake of writing their app in Python instead of assembly, as the article seems to imply is now necessary.
> there were a handful of bottlenecks that you caught immediately

Exactly. I was responding mostly to the belief of most CTOs/management that you should just let hardware handle it while programmers should just deliver as fast as they can. He says it is always a balance; you cannot pay for optimized assembly when writing a crud application, but I claim we completely swung to the other side of the spectrum. For instance, a financial company I did work for had no database indices besides the primary key and left AWS to scale that for them. And then we are not even talking about Mongo (this was MySQL); Mongo is completely abused as it is famous for 'scaling' and 'no effort setup', so a lot of people don't think about performance or structure at all; people just dump data in it and query it in diabolical ways. They just trust the software/hardware to fix that for them. I recently tried to migrate a large one to MySQL, but it is pure hell because of its dynamic nature; the complete structure changed over time while the data from all the past is still in there; fields appeared, changed content type etc. and nothing is structured or documented. With 100s of GBs of that and no certainty that things were actually imported correctly, I gave up. They are still paying through the nose; I fixed some indexing in their setup (I am by no means a Mongo expert but some things are universal when you think about data structures, performance and data management) which made some difference, but MySQL or PostgreSQL would've saved them a lot of money in my opinion. Ah well; at least the development of the system was cheap...
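
A small sketch of the missing-index situation described above. It uses SQLite from Python's standard library as a stand-in (the system in the comment was MySQL), and the `payments` table, its columns and the index name are invented for illustration:

```python
import sqlite3
from typing import Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan text; the detail is the last column of each row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def demo() -> Tuple[str, str]:
    """Show the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO payments (account_id, amount) VALUES (?, ?)",
        [(i % 1000, i * 0.01) for i in range(10_000)],
    )
    sql = "SELECT amount FROM payments WHERE account_id = ?"
    before = query_plan(conn, sql, (42,))  # only the primary key exists: full scan
    conn.execute("CREATE INDEX idx_payments_account ON payments (account_id)")
    after = query_plan(conn, sql, (42,))   # planner can now search via the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, but the shape is the same: a scan of the whole table before, a search through the index after. MySQL and PostgreSQL expose the same information through their own `EXPLAIN` statements.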

But if they had hired you at the beginning, you wouldn't have been able to save enough money to actually justify your salary. I think they made the right decision, depending on how long they were burning the cash.
Seems like you deserve more of a cut than that.
Well, the premise going in after a quick (very quick) review of the system was: 'I will check what I can do in 5 days at $10k; I believe I can help, but if I cannot, you lose $10k. If I can help you in less time, you only pay for that time.' I do not think I can move that to some other deal with that premise. Maybe if I say, 'I will do this for 50% of the money you save in 12 months after I am done', that would work, but this is a side thing which I do because I like optimizing things; if I sell it in another way, it's not bound to time, which will make it a timesink and a risk. It is a choice.
I am curious how you even find a side job like that.

I am definitely a spiritual brother of yours because I love optimising things. But I am very unsure how I would even start a side career with that premise.

Any advice?

> Any advice?

Spend a lot of time with funded startups. Meetups, conferences etc. They will be happy to talk about this. But also online; you need to 'dox' nicks sometimes, but when you see quite broad questions in slack/reddit about performance of systems and you find out the poster is some (tech) (co-)founder, you can offer to help. I do no-cure-no-pay if the system is an MVP and crud; I do no-cure-still-pay if the system is larger and already live. That is not because I want to blackmail the company (and if I like the idea you can give me a % instead as well, all fun and games), but because 'wanting to help' is usually punished when it's 'free', as in no good deed goes unpunished. I did no-cure-no-pay optimising (and other services) on live systems in the past, but as soon as I touch it, people blame me for all kinds of data loss (while I'm very careful and always make (offsite) backups) and other misery. So basically what I do is connect with (co)founders who are in a jam; when they don't have production data yet, I will go no-cure-no-pay; when they have production data they need to keep, I will explore, but if I cannot do anything (for that price, mind you; there is always something to do), I still get paid.

There are probably literally 1m projects, and growing, at any time in this world that have serious issues, that are burning money, that will crash (all the time or sooner or later) and that need help. For instance, I know of a large state-owned postal/courier tracking system that crashes under load every 48 hours. We tried to help them but they are fine just rebooting (manually!). Fine, that happens too.

What sort of waste do you tend to see more, if you do this regularly? Is it the case that people are aware of the cost and “don’t care”, or is it a surprising/hidden cost?
There are 2 types: 1) they know the costs and thought it would scale infinitely with money, but it doesn't (crashes, hangs, etc.); 2) they knew it would cost more to scale, but they did not expect the cost to go up quite as fast as it does with more traffic (not linear).