Hacker News new | ask | show | jobs
by dhuramas 2554 days ago
Out of curiosity - Which database is this? and is it really taboo to have views joining against multiple other views?

If the underlying views were performant- I'd assume the query optimizer would do the right thing(at least 90% of the time).

EDIT: I guess it depends - Just did more research and found this [1]. As long as the views don't do unnecessary heavy lifting or joining unnecessary tables, it should be fine.

[1] https://dba.stackexchange.com/questions/151169/are-views-har...

2 comments

postgres. I realize this is heresy to badmouth Postgres here, but it was Postgres (9.0 or 9.1 IIRC).

Had a query that selected from table X joining against view X which selected from view A, and view A joins against tables A,B and C, and table B also joins against a view which uses table A and C.

This was just bad. But trying to explain to non-tech people how bad it is, when "it used to work", is difficult. It used to work when you had 50-100 records. No one ever tested was this would be like with 30k records and 50 concurrent users all executing the same nested/circular view mess simultaneously.

But the fact that it was postgres is kind of beside the point. I don't know of any mainstream DB that would handle this well.

The short term fix was to do this large query once at the end of a process and cache the results; the set of queries in question were happening on a 'dashboard' view which everyone hit all the time. It would still cause problems with concurrency, because when 80 people would go through a process and get 'done' (think timed training exercises), the queries would still all be running more or less concurrently, and still cause timeouts, but it wasn't as frequent, because people tended to be staggered a bit more as they finished.

PostgreSQL has progressively reduced the number of optimisation boundaries that are encountered with views-on-views situations, but you can still wind up telling the database to churn and slosh a whole bunch of data from which you only cherry-pick a tiny portion.

I have definitely been guilty of failing to test how my schemata behave with large data sets.

Oh then there's ORMs. I've seen ActiveRecord spit out some frankly batshit insane queries that would stump a room of Einsteins. But somehow PostgreSQL picked it up, chopped it into a plan and got to work plowing through an incredibly wasteful and repetitious query.

> I have definitely been guilty of failing to test how my schemata behave with large data sets.

As have I. It's the dangerous part of the "get it done" approach. And there's no perfect approach - everything is a tradeoff. How much time do you spend dealing with situations that might never happen?

Experience does give you some grounding when making those tradeoff decisions. No, we don't need the architect the application to scale up/down to handle 25k concurrent users in 5 minutes; that's unlikely going to happen. Yes, we do need to spend the extra 2 hours installing and learning a templating system to avoid common XSS pitfalls.

I think SQL views are a bit of a code smell. They're really only useful in my opinion as a hack, where creating a rather gruesome abstraction within the database is easier than changing the query layer.

I'm currently working on a service written by a bunch of hipster developers who, wrote raw SQL in Python. And they wrote views, lots of really difficult to understand views that are self-joining (on JSON fields no less!). The performance is really unpredictable given the input sizes. It's the only time I'd call performance of a database chaotic because given an input size (beyond certain safe ranges) I have no idea what the performance is.

The problem is from my perspective a key bit of your application logic get hidden, you're then bound to migrations to change it.

> lots of really difficult to understand views that are self-joining (on JSON fields no less!)

HOLY TAMOLE! You might have my story beat there. That's one level I didn't have to deal with. My sympathies!