Hacker News new | ask | show | jobs
by m_mueller 2243 days ago
Your example illustrates perfectly how a descriptive query language simply can't work in the general case for something other than ad-hoc workflows where performance isn't a big concern. Why are we all doing this instead of engineering DB access procedures directly? This is one thing I really liked about CouchDB and its map/reduce based access - stop trying to be smart with queries, instead be dead simple to scale horizontally. Granted, it's hard to design the data structures in the design space of normalization (many joins) vs. read performance (few joins) - but at least it's all laid bare to reason about.
2 comments

I somewhat agree. Not with the sweeping that declarative query languages can't work, for every case where faulty statistics cause a bad plan there are likely to be many cases where a statistic based plan choice correctly switched plans due to data distribution changes.

But I do agree that PostgreSQL has way too little tools to nail down the performance even though they have downsides. Tools like pinning execution plans would be nice, as it has less severe worst case behaviors. As would be the ability to just pass the execution plan directly, although that would have severe cross-version compatibility implications and security will also be hard to nail down after the fact because of all the "can't happen" assumptions sprinkled around in executor code. And even just plain plan hints would be great to have, be it the heavy handed "join in this order", "use this index", or the more graceful "this clause is way less selective than you think", "this clause is functionally dependent on that one" or "assume there is correlation between ordering and predicates".

I'd put it this way: It could be made to work, but IMO all current RDBMS have the same problems with terrible worst case guarantees. Query planners have not enough degrees of freedom - maybe with a new concept like automated indices it could be made to work with acceptable average and worst case guarantees.
Sorry, but hard disagree. For all the failings of SQL systems, they work staggeringly well 99% of the time as long as the operator understands how they work. No-SQL systems do have their place, but all too often their choice stems from a failure to understand SQL-based databases. (And the hard truth: unless you’re doing something especially novel there’s a lot less difference between map/reduce and a bog standard table index than no-SQL proponents would have you believe.)

In my estimation, if you're not already intimately familiar with the intricacies of performant SQL, chances are you're not playing in a space where a no-SQL architecture is an appropriate fit anyway. And you're certainly not in a position to make an informed decision between SQL and no-SQL. It's a much better use of this hypothetical person's time to set up Mariadb or Sqlite and do a deep dive into the fundamentals query performance.

Just to clarify: I'm not saying there is a significant difference between map views and indices, I'm saying I prefer the directness of programming with them over trusting the query planner and table statistics. In couch, to do a table scan, I need to program it as such, otherwise I'm forced to use a view - things like forgotten indices for some edge cases are impossible. Sure takes longer to do things, but when performance issues arise it's easy to reason about while with SQL based DBs I've seen a lot of time being spent at that stage. They work well 99%, and then that remaining 1% takes 99% of your time.