by army 4079 days ago
Do you know for sure that RethinkDB can't work out what columns are being filtered by a function? In the examples it would certainly be possible with some analysis.

It can't work them out because you compose the query in a third-party scripting language.

RethinkDB has no access to the structure of the source in order to analyze it statically and work out an optimal I/O read plan. It interacts with the language runtime by providing an API and receiving callbacks to the API from the runtime.

SQL is parsed and analyzed statically at the server; a plan is created from that analysis and then executed. So with SQL this kind of optimization is possible.

With RethinkDB you compose your query in the script, basically, and all of the optimization opportunities end with the exposed API (no function source analysis).

It's not impossible to redesign the API to provide, or even mandate, static details like requested fields to RethinkDB, and it has a bit of that. But it allows freely mixing in client-side logic, and even the OP is confused about what it means to have a client-side mapping function.

If they allowed complex expressions to run on the server, it'd become quite verbose to compose them via an API in an introspectable way, to the point that it'd warrant a DSL in a string... and we're back to SQL again.

> RethinkDB has no access to the structure of the source in order to analyze it statically and work out an optimal I/O read plan.

Actually, this isn't true. One of the really cool things about RethinkDB is that, even though queries are specified in third-party scripting languages, they get compiled to an intermediate language that RethinkDB can understand.

That being said, AFAIK RethinkDB doesn't optimize selects the way columnar databases do. I believe it can only read from disk at per-document granularity. But the design leaves room to optimize this in the future.

I don't think that's true. From what I perused of the driver implementations, I think that as calls are made, the driver basically builds up an AST, and then when you call run() it compacts it and sends it over to the DB. I.e., when you call filter() you aren't actually filtering; you're adding a filter operation to the AST.
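
To make the "calls build an AST, run() ships it" idea concrete, here is a minimal sketch. It is not the RethinkDB driver or its wire format: `Query`, `table`, the node shapes, and the `send` callback are all hypothetical names invented for illustration.

```javascript
// Hypothetical sketch: NOT the RethinkDB driver or its wire format.
class Query {
  constructor(node) {
    this.node = node; // the AST built so far
  }

  // Each method wraps the current AST in a new node; no data is touched.
  filter(predicate) {
    return new Query({ op: "filter", input: this.node, predicate });
  }

  pluck(...fields) {
    return new Query({ op: "pluck", input: this.node, fields });
  }

  // Only run() serializes the tree and hands it to a transport function.
  run(send) {
    return send(JSON.stringify(this.node));
  }
}

function table(name) {
  return new Query({ op: "table", name });
}

// Nothing is filtered here; the chain only describes the work to be done.
const query = table("albums").filter({ genre: "jazz" }).pluck("title");

// A stand-in transport that returns the payload instead of using the network.
const wire = query.run((payload) => payload);
```

In this sketch the server would receive the whole tree in `wire`, so the filter and the plucked fields are plain data it can inspect before reading anything.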

I would think that would allow Rethink to analyze the structure of the query and perform appropriate optimizations.
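
As a rough illustration of what such analysis could look like, the sketch below walks a query tree and collects the fields read off each row. The AST shape and the `referencedFields` helper are assumptions made up for this example; whether RethinkDB does anything like this internally is not something this thread establishes.

```javascript
// Hypothetical AST shape for illustration; not RethinkDB's internal representation.
const mapNode = {
  op: "map",
  body: {
    op: "makeObject",
    fields: {
      artist: {
        op: "getField",
        field: "name",
        input: {
          op: "getField",
          field: "vendor",
          input: { op: "var", name: "row" },
        },
      },
    },
  },
};

// Walk the tree and collect every top-level field read directly off the row variable.
function referencedFields(node, found = new Set()) {
  if (node === null || typeof node !== "object") return found;
  if (node.op === "getField" && node.input && node.input.op === "var") {
    found.add(node.field);
  }
  for (const child of Object.values(node)) referencedFields(child, found);
  return found;
}

const fields = [...referencedFields(mapNode)];
```

Here `fields` ends up containing only `"vendor"`, which is the kind of fact a planner would need if it wanted to read less than the whole document.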

I'm talking about map(), and you're talking about filter().

Here's the code in question:

  .map(function(album){
    return {artist : album("vendor")("name")}
  })

If this is simply adding a node to an AST, it could be expressed without a function:

  .map({artist : ['album','vendor','name']})

Using a function for this would be quite superfluous.

You can express it both ways in RethinkDB, and they'd both do the same thing -- add a node to the AST. The function is just a convenience syntax.

> It can't work them out because you compose the query in a third party scripting language.

The restrictions on which language features you can use in lambdas inside queries exist because the query isn't executed on the client. The query written in the client language is converted into a client-language-independent query description, which is shipped to the server and executed there. So all the information about the query is available to the server. How much of it the server actually uses for optimization, I don't know, but the query is not opaque to it: what is composed in the scripting language has the same relation to what the server sees as when you use an SQL abstraction layer that builds SQL and sends it to an SQL database.
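
One way a driver can get a language-independent description out of a lambda is to call it once, on the client, with a symbolic placeholder instead of a real row. The sketch below assumes that approach; `term`, `toAst`, `map`, and the node shapes are hypothetical names for illustration, not RethinkDB's actual API or protocol.

```javascript
// Hypothetical sketch: names and node shapes are illustrative only.
// Wrap an AST node in a callable, so term("field") builds a field-access node.
function term(node) {
  const callable = (field) => term({ op: "getField", input: node, field });
  callable.node = node;
  return callable;
}

// Convert whatever the lambda returned into plain AST nodes.
function toAst(value) {
  if (typeof value === "function" && value.node) return value.node;
  if (value !== null && typeof value === "object") {
    const fields = {};
    for (const [key, v] of Object.entries(value)) fields[key] = toAst(v);
    return { op: "makeObject", fields };
  }
  return { op: "literal", value };
}

// map() calls the function exactly once with a symbolic row, never with data.
function map(fn) {
  const row = term({ op: "var", name: "row" });
  return { op: "map", body: toAst(fn(row)) };
}

const ast = map(function (album) {
  return { artist: album("vendor")("name") };
});
```

Under this scheme `album` is never a real document, so a native `if` or `===` inside the lambda would run once on the client against the placeholder rather than per row on the server. That would explain why only operations the placeholder knows how to record can appear in such a function.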