Hacker News new | ask | show | jobs
by mgkimsal 2547 days ago
postgres. I realize this is heresy to badmouth Postgres here, but it was Postgres (9.0 or 9.1 IIRC).

Had a query that selected from table X joining against view X which selected from view A, and view A joins against tables A,B and C, and table B also joins against a view which uses table A and C.

This was just bad. But trying to explain to non-tech people how bad it is, when "it used to work", is difficult. It used to work when you had 50-100 records. No one ever tested was this would be like with 30k records and 50 concurrent users all executing the same nested/circular view mess simultaneously.

But the fact that it was postgres is kind of beside the point. I don't know of any mainstream DB that would handle this well.

The short term fix was to do this large query once at the end of a process and cache the results; the set of queries in question were happening on a 'dashboard' view which everyone hit all the time. It would still cause problems with concurrency, because when 80 people would go through a process and get 'done' (think timed training exercises), the queries would still all be running more or less concurrently, and still cause timeouts, but it wasn't as frequent, because people tended to be staggered a bit more as they finished.

1 comments

PostgreSQL has progressively reduced the number of optimisation boundaries that are encountered with views-on-views situations, but you can still wind up telling the database to churn and slosh a whole bunch of data from which you only cherry-pick a tiny portion.

I have definitely been guilty of failing to test how my schemata behave with large data sets.

Oh then there's ORMs. I've seen ActiveRecord spit out some frankly batshit insane queries that would stump a room of Einsteins. But somehow PostgreSQL picked it up, chopped it into a plan and got to work plowing through an incredibly wasteful and repetitious query.

> I have definitely been guilty of failing to test how my schemata behave with large data sets.

As have I. It's the dangerous part of the "get it done" approach. And there's no perfect approach - everything is a tradeoff. How much time do you spend dealing with situations that might never happen?

Experience does give you some grounding when making those tradeoff decisions. No, we don't need the architect the application to scale up/down to handle 25k concurrent users in 5 minutes; that's unlikely going to happen. Yes, we do need to spend the extra 2 hours installing and learning a templating system to avoid common XSS pitfalls.