by tluyben2 1537 days ago
I am old :) But experience is a big thing; just by skimming the table definitions I can sniff out where something is probably very wrong. At university in the 90s I studied both relational theory and formal methods, and I had to spend a lot of time figuring out and fixing complexity problems. If you take a university-level book on big O complexity and work through it, you will get a good feeling for what software can and cannot do, and in what way. That has not really changed: we have more efficient hardware and more cache, and we have improved algorithms, but anything that cannot be looked up in O(1) is still dangerous and can incur enormous IO even with only a few million records. Naive developers see that things are blazingly fast locally on their laptops and leave it at that. Especially in the last few years (in my bubble this is getting worse, quite fast), I have met quite a lot of lead devs who do not actually know what an index is for, so I see entire DBs without any indexes, or with one only on the id field. I know people (for some reason) do not like ORMs that create tables and indexes, but using them would prevent many rookie mistakes.
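
A minimal sketch of what a missing index means in practice, using Python's built-in `sqlite3` module. The `orders` table, its columns and the index name are hypothetical; the point is only that the query plan changes from a full table scan to an index search once the index exists. The exact plan wording differs between SQLite versions.

```python
import sqlite3

# Hypothetical table: 100,000 orders spread over 10,000 customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 10_000, i * 0.5) for i in range(100_000)),
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"

def query_plan(conn, sql, params):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = query_plan(conn, QUERY, (42,))  # a full-table SCAN: every row is read
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = query_plan(conn, QUERY, (42,))   # a SEARCH ... USING INDEX: only matching rows

print(before)
print(after)
```

The data and the query are identical in both runs; only the index was added, which is why this is usually the cheapest performance fix available.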

I can’t remember the last time I came across a non-COTS database schema with a significant number of secondary indexes, as in more than half a dozen for a hundred or more tables.

I’ve never seen a database use “advanced” features like clustered columnstore or even just page compression.

I have an email in my inbox from just this morning, from a small vendor that “doesn’t recommend” columnstore for a database containing 10 TB of numeric metrics in one table.

That would compress to a few gigabytes and query times would go from minutes to milliseconds.
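
Why numeric metrics compress so well in a column store: values of one column sit next to each other, so per-column encodings become very effective. The sketch below is not SQL Server's actual columnstore format; it only illustrates two encodings that columnar formats commonly use, delta encoding and run-length encoding, on a hypothetical timestamp column.

```python
from itertools import accumulate, groupby

def delta_encode(values):
    """Keep the first value, then store each value as the difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """A running sum undoes delta encoding."""
    return list(accumulate(deltas))

def rle_encode(values):
    """Collapse runs of identical values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical metric column: one reading per minute for 100,000 minutes.
timestamps = list(range(1_600_000_000, 1_600_000_000 + 60 * 100_000, 60))
encoded = rle_encode(delta_encode(timestamps))
print(len(timestamps), "values ->", len(encoded), "runs")
```

Regular timestamps collapse to a couple of runs. In a row store the same values are interleaved with the other columns of each row, so this kind of encoding is much harder to apply.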

But they “don’t support it”.

Which I now translate to: “we haven’t even flipped through the manual and when we googled it in a panic we didn’t understand it.”

This is how your data is being managed at huge enterprises and government agencies around the world.

Exactly. And what I find far, far worse is that this is bleeding into startups. I did not express myself clearly above: I don't consult for one client (I wrote 'gig' instead of 'gigs', which I can no longer edit); it's a different client every time. I make things fast(er) and then move on to the next one. I do this really fast (i.e. cheap) because I don't look at or care about the code (if I did, this would be a full-time job). I find problems by going over the code with grep and checking the data stores, and then I take a sledgehammer to beat things into performance.

I had a startup with around $1m in seed investment ask for help (actually it was one of the board members, who is a friend) because they were burning through the $1m too fast, and a very big cost was AWS. I wasn't allowed to make changes, but I recommended adding indexes to the database and adding caching in some places in the code. I also found some strange O(2^n) 'algorithms' in the code, but they weren't used much; I recommended not being clever and using libraries or the database instead (they all had to do with geo pathfinding; do people know how to use Google?). I estimate their AWS costs would drop dramatically if they did that. Instead, their investors are upping their investment so the company can keep iterating fast.
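
The comment doesn't say what the startup's pathfinding code did, so this is only an illustration of "not being clever": textbook Dijkstra on a hypothetical weighted graph using `heapq`, which runs in O((V + E) log V) instead of O(2^n), wrapped in `functools.lru_cache` as one simple form of the caching recommended above. `GRAPH`, `dijkstra` and `shortest_distance` are hypothetical names; in production a routing library or the database's own geo features would be the better choice, as the comment suggests.

```python
import heapq
from functools import lru_cache

# Hypothetical road graph: node -> list of (neighbour, distance_km).
GRAPH = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 1.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [],
}

def dijkstra(graph, source):
    """Shortest distance from source to every reachable node (non-negative weights)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist

@lru_cache(maxsize=4096)
def shortest_distance(source, target):
    # Cached per (source, target) pair; the cache must be cleared if GRAPH changes.
    return dijkstra(GRAPH, source).get(target, float("inf"))

print(shortest_distance("A", "D"))
```

Repeated queries for the same pair are answered from the cache without touching the graph again; `shortest_distance.cache_info()` shows the hit count.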

I kind of understand this to some extent; however, these things don't cost much time, and when you are building something for the first time they don't cost extra time at all, you just need to know them (yes, I'm trying to be polite and nice about people who create software and do not know about DB indexing). Some of these companies will grow to be the next something-you-use-every-day, and this is how their data is handled.

Maybe I should write a book of anonymised client misery stories. I have too many, and one day I will die and some people will never encounter this. Because I usually get my gigs via C-level execs, I see many layers of absolute garbage at the same time inside a company, especially inside big ones. People here and on Reddit who have never seen these things and think large enterprises are smoothly run places really should be exposed to the absolute chaos that goes on there.

The knee-jerk reaction I get to any proposed database tuning is: "That sounds expensive, let's just throw more cores at it to solve the scalability issues."

Of course, tuning is often a one-time activity and cores cost money monthly, but they ignore that.
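
The "one-time versus monthly" argument is simple arithmetic. A sketch with made-up numbers; the costs and the function names below are hypothetical, not taken from the thread.

```python
import math

def break_even_months(one_time_tuning_cost, monthly_cost_avoided):
    """Months until a one-time tuning effort pays for itself versus renting extra cores."""
    if monthly_cost_avoided <= 0:
        return math.inf  # tuning avoids no monthly spend, so it never pays back
    return math.ceil(one_time_tuning_cost / monthly_cost_avoided)

def cumulative_costs(one_time_tuning_cost, monthly_cost_avoided, months):
    """Total spend after `months` for each option: (tune once, add cores every month)."""
    return one_time_tuning_cost, monthly_cost_avoided * months

# Hypothetical figures: 5,000 for a tuning job vs. 1,200/month for extra cores.
print(break_even_months(5_000, 1_200))
print(cumulative_costs(5_000, 1_200, 24))
```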

They also ignore that if one user gets a poor experience, then it is by definition not caused by a lack of scale. Conversely, that inefficiency will cause scalability issues, but that's a side effect and not the root cause.

I'm starting to suspect that 90-98% of all "web-scale" architectures are compensating for errors like this. Nobody has tried to use the "release" build, add an index, or just use a binary data format.
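
A minimal sketch of the "just use a binary data format" point with Python's standard library: the same hypothetical (timestamp, value) rows serialized as JSON and as fixed-width binary records via `struct`. It only compares size; parsing cost would have to be measured separately.

```python
import json
import struct

# Hypothetical metric rows: (unix_timestamp, value), one per minute.
rows = [(1_600_000_000 + 60 * i, i * 0.25) for i in range(10_000)]

# Little-endian int64 + float64, no padding: exactly 16 bytes per record.
RECORD = struct.Struct("<qd")

def to_binary(rows):
    """Pack every row into one contiguous bytes object."""
    return b"".join(RECORD.pack(ts, value) for ts, value in rows)

def from_binary(blob):
    """Unpack fixed-width records back into (timestamp, value) tuples."""
    return list(RECORD.iter_unpack(blob))

as_json = json.dumps([{"ts": ts, "value": value} for ts, value in rows]).encode("utf-8")
as_binary = to_binary(rows)
print(len(as_json), "bytes as JSON vs", len(as_binary), "bytes as binary")
```

Fixed-width records also allow jumping straight to record `n` at byte offset `16 * n`, which a text format cannot offer.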

You, sir, must become a regular Daily WTF poster. Please! :-)