Hacker News
by BringTheTanks 4079 days ago
> Not earth-shattering, but with ReQL all I need to do is attach a function

Thing is, some SQL databases use columnar storage, and there, selecting everything and then filtering with an attached function would eliminate the performance benefit of not reading all of the fields.

This is why SELECT looks like it does. Not to mention it's much shorter than attaching a function for the purpose.

The author also himself acknowledges that:

> The downside is that your queries end up quite long and, for some, rather intimidating.

Ok, so they're "quite long" and offer less potential for performance optimization. Amazing?

His example of creating specific indexes and views is also not new to SQL.

> There are 3 official drivers: Python, Ruby and Node.

Amazing?

> The query itself didn’t change at all – I could copy and paste it right in. I had to wrap it with connection info and a run() function, but that’s it.

So just like an SQL query, except I can connect to an SQL RDBMS from virtually any language I can think of, and not just a narrow selection of 3 script languages.

I sympathize with the author's excitement, but from all his examples SQL feels like it has quite an edge, both in availability and in design and fit for the domain, over a bunch of JS functions composed together (as much as I like composing functions together in JS).

I realize how much hard work the folks at RethinkDB have put into creating their product. But technology adoption is not driven by pity, it's driven by benefits. For a new type of DB to not be a flash in the pan it needs a lot more than being "stable and fast". It needs to offer significant additional benefits when compared to existing DBs. And I ain't seeing it.

4 comments

ReQL is declarative. The queries are compiled into an AST in the client drivers, the AST is shipped to the database, and it is then analyzed and executed on the cluster. There is no information loss, and nothing runs on the client. The query language looks operational (as in, just run these commands in order), but it is declarative in the same way SQL is.

This blog post explains how this works: http://rethinkdb.com/blog/lambda-functions/

> So just like an SQL query, except I can connect to an SQL RDBMS from virtually any language I can think of

Well, sure, because (1) major SQL databases have DB-specific drivers for many languages (often third-party), and (2) SQL uses a well-established, common model for which generic connectivity tools exist (ODBC, JDBC, etc.) so even minor SQL-based databases can go pretty far if they've got just ODBC and JDBC drivers.

But while RethinkDB may only have official drivers for three languages, there are lots of third-party drivers, and there is documentation on the protocol and process for writing third-party drivers. Obviously, it kind of loses out where ODBC/JDBC and similar technologies are concerned (though you probably could build drivers for Rethink using them, you'd probably have to give up lots of Rethink's unique features -- particularly the push feed one -- when doing so).

> I realize how much hard work the folks at RethinkDB have put into creating their product. But technology adoption is not driven by pity, it's driven by benefits. For a new type of DB to not be a flash in the pan it needs a lot more than being "stable and fast". It needs to offer significant additional benefits when compared to existing DBs. And I ain't seeing it.

The key additional benefit compared to most better-established storage technologies seems to be the ability to simply set up push feeds from queries. I'd say the demand (or lack thereof) for that is likely to be the determining factor in whether resources (first- and third-party) get devoted to the RethinkDB ecosystem to bring the kind of conveniences that are seen with more established DBs.

Do you know for sure that RethinkDB can't work out what columns are being filtered by a function? In the examples it would certainly be possible with some analysis.

It can't work them out because you compose the query in a third party scripting language.

RethinkDB has no access to the structure of the source in order to analyze it statically and work out an optimal I/O read plan. It interacts with the language runtime by providing an API and receiving callbacks to the API from the runtime.

SQL is parsed & analyzed statically at the server, a plan is created based on that analysis and executed. So with SQL it is possible to do so.

With RethinkDB you compose your query in the script, basically, and all of the optimization opportunities end with the exposed API (no function source analysis).

It's not impossible to redesign the API to provide or even mandate static details like requested fields to RethinkDB, and it has a bit of that, but it allows freely mixing in client-side logic and even OP is confused about what it means to have a client-side mapping function.

If they would allow complex expressions to run on the server, it'd become quite verbose to compose that via an API in an introspective way, to the point it'd warrant a DSL in a string... and we're back to SQL again.

> RethinkDB has no access to the structure of the source in order to analyze it statically and work out an optimal I/O read plan.

Actually this isn't true. One of the really cool things about RethinkDB is that despite the fact that queries are specified in third party scripting languages they actually get compiled to an intermediate language that RethinkDB can understand.

That being said AFAIK RethinkDB doesn't optimize selects the way columnar databases do. I believe it can only read from disk at a per document granularity. But it does have the ability to optimize this in the future.

I don't think that's true. From what I perused of the driver implementations, I think that as calls are made, the driver basically builds up an AST, and then when you call run() it compacts it and sends it over to the DB. I.e., when you call filter() you aren't actually filtering, you're adding a filter operation to the AST.

I would think that would allow Rethink to analyze the structure of the query and perform appropriate optimizations.
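
To make that concrete, here is a minimal sketch in plain JavaScript. It is not the real driver: `Query`, `table`, `filter`, `pluck`, `referencedFields` and the node shapes are all hypothetical names, chosen only to show calls being recorded as a tree and the tree being inspected afterwards.

```javascript
// Hypothetical query builder: each call records a node instead of doing work.
class Query {
  constructor(node) {
    this.node = node;
  }
  filter(predicate) {
    // predicate is plain data here, e.g. { field: "year", eq: 1999 }
    return new Query({ op: "filter", predicate, input: this.node });
  }
  pluck(...fields) {
    return new Query({ op: "pluck", fields, input: this.node });
  }
  run(send) {
    // Only at this point does anything leave the client: the serialized tree.
    return send(JSON.stringify(this.node));
  }
}

const table = (name) => new Query({ op: "table", name });

// What a server could do with the tree: walk it and collect every field touched.
function referencedFields(node, found = new Set()) {
  if (node.op === "filter") found.add(node.predicate.field);
  if (node.op === "pluck") node.fields.forEach((f) => found.add(f));
  if (node.input) referencedFields(node.input, found);
  return found;
}

const query = table("albums").filter({ field: "year", eq: 1999 }).pluck("title");
const wire = query.run((payload) => payload); // stand-in for a network send
const fields = [...referencedFields(JSON.parse(wire))].sort();
console.log(fields);
```

Nothing is filtered when `filter()` is called; the receiving side gets the whole tree and can, in principle, see that only `title` and `year` are needed. Whether RethinkDB actually uses that information is a separate question.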

I'm talking about map(), and you're talking about filter().

Here's the code in question:

  .map(function(album){
    return {artist : album("vendor")("name")}
  })

If this is simply adding a node to an AST, it could be expressed without a function:

  .map({artist : ['album','vendor','name']})

Using a function for this would be quite superfluous.

You can express it both ways in RethinkDB, and they'd both do the same thing -- add a node to the AST. The function is just a convenience syntax.
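
A rough sketch of how a function can end up as an AST node, assuming the driver calls it once with a placeholder instead of a real row. The names `term`, `map`, `toNode` and the node shapes are hypothetical, not RethinkDB's actual internals or wire format.

```javascript
// Hypothetical term: a callable object wrapping an AST node.
function term(node) {
  // Calling a term with a field name yields a new term for that field access.
  const t = (field) => term({ op: "get_field", field, of: node });
  t.node = node;
  return t;
}

// Convert the callback's return value (terms nested in plain objects) to nodes.
function toNode(value) {
  if (typeof value === "function" && value.node) return value.node;
  if (value !== null && typeof value === "object") {
    const fields = {};
    for (const [k, v] of Object.entries(value)) fields[k] = toNode(v);
    return { op: "make_obj", fields };
  }
  return { op: "datum", value };
}

// Hypothetical map(): invokes the callback once, on the client, with a placeholder.
function map(fn) {
  const arg = term({ op: "var", id: 1 });
  return { op: "map", func: { params: [1], body: toNode(fn(arg)) } };
}

const ast = map(function (album) {
  return { artist: album("vendor")("name") };
});
console.log(JSON.stringify(ast, null, 2));
```

In this sketch the callback never sees an album. `album("vendor")("name")` just nests two field-access nodes around the placeholder, which is the same tree a function-free spelling would produce.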
> It can't work them out because you compose the query in a third party scripting language.

The restrictions on what language features you can use in lambdas inside queries exist because the query isn't executed on the client. The query written in the client language is parsed into a client-language-independent query description, which is shipped to the server and executed there. So all the information about the query is available to the server. How much of it the server actually uses for optimization, I don't know, but the query is not opaque to it: what you compose in the scripting language has the same relation to what the server sees as when you use an SQL abstraction layer that builds SQL and sends it to an SQL DB.
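
One way to see why such restrictions follow from this design is the sketch below. It assumes the lambda is run once against a placeholder while the query is being built; `Term`, `field`, `eq` and `filter` are hypothetical stand-ins, not the real driver API.

```javascript
// Hypothetical placeholder standing in for a row while the query is being built.
class Term {
  constructor(node) {
    this.node = node;
  }
  field(name) {
    return new Term({ op: "get_field", name, of: this.node });
  }
  eq(value) {
    return new Term({ op: "eq", left: this.node, right: value });
  }
}

// Hypothetical filter(): runs the callback once and keeps whatever it returns.
function filter(fn) {
  const result = fn(new Term({ op: "var" }));
  const body = result instanceof Term ? result.node : { op: "datum", value: result };
  return { op: "filter", body };
}

// Recorded: the comparison becomes a node the server can evaluate per row.
const recorded = filter((row) => row.field("year").eq(1999));

// Lost: === compares a Term object with a number immediately, on the client,
// so only the constant result of that comparison ends up in the tree.
const lost = filter((row) => row.field("year") === 1999);

console.log(recorded.body.op);
console.log(lost.body);
```

Anything the builder cannot intercept, such as a native operator or an `if`, is evaluated once at build time and baked in as a constant, so it has to be spelled with the query language's own operations to reach the server.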


> So just like an SQL query, except I can connect to an SQL RDBMS from virtually any language I can think of, and not just a narrow selection of 3 script languages.

http://rethinkdb.com/docs/install-drivers/