by Tharkun 3714 days ago
Not every query is parallelizable. Maintaining performance is a lie. An easy-to-grasp example is computing a median. And I mean an exact median, not an approximation.
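To make the median point concrete, here is a minimal sketch with made-up shard contents (the shard layout and variable names are assumptions for illustration only). A sum or count can be combined from per-shard partial results, but combining per-shard medians does not generally give the exact median.

```python
from statistics import median

# Hypothetical data split across three shards (illustrative values only).
shards = [[1, 2, 9], [3, 4, 5], [6, 7, 8]]

# Exact median: needs a global view of all values.
all_values = [v for shard in shards for v in shard]
exact = median(all_values)

# Naive "parallel" attempt: median of the per-shard medians.
per_shard = [median(shard) for shard in shards]
median_of_medians = median(per_shard)

# Sums, by contrast, combine correctly from partial results.
partial_sums = [sum(shard) for shard in shards]
total = sum(partial_sums)

print(exact, median_of_medians, total == sum(all_values))
```

With this particular data the two medians differ (5 versus 4), while the combined sum matches the global sum.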
2 comments

@Tharkun: You are right that not every query is immediately parallelizable. Distinct counts are another example. In some cases the data can be re-partitioned so that we can calculate exact values and push the computation down in parallel. This may provide better performance than a single large table, so there are still benefits to it. Ultimately, though, there will be tradeoffs in moving to an entirely distributed environment, but depending on the use case the value may offset those.
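A rough sketch of the re-partitioning idea for distinct counts, assuming hash partitioning on the counted value (the function names and sample data here are hypothetical, not taken from any particular system). Once every occurrence of a value lives on exactly one partition, the per-partition distinct counts can be computed independently and simply added up.

```python
import zlib

def repartition(shards, n):
    """Route each value to a partition chosen by a stable hash of the value."""
    parts = [[] for _ in range(n)]
    for shard in shards:
        for value in shard:
            parts[zlib.crc32(value.encode()) % n].append(value)
    return parts

def sum_of_local_distinct(parts):
    """Count distinct values on each partition independently, then add."""
    return sum(len(set(part)) for part in parts)

# Hypothetical data as it originally arrived: values overlap across shards.
shards = [["a", "b", "c"], ["b", "c", "d"], ["a", "d", "e"]]

# Without re-partitioning, adding local distinct counts overcounts.
naive = sum_of_local_distinct(shards)

# After re-partitioning by value, the same computation is exact.
exact = sum_of_local_distinct(repartition(shards, 3))

print(naive, exact)
```

The tradeoff is the shuffle itself: every row may cross the network once before the parallel step can start.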
I'm not sure why folks are downvoting you, because most database systems that provide the full array of relational operations (joins, groupby, groupby cube, etc.) do not scale linearly (at least not past a handful of nodes). Mixing OLTP / OLAP using current technologies is hard.