Hacker News new | ask | show | jobs
by uhoh-itsmaciek 2458 days ago
>Am I the only one that thinks rails/activerecord is crazy here for (often) preferring two trips to the db over a join on indexed primary keys?

You mean

  SELECT u.* , c.* FROM users u INNER JOIN comments c
    ON u.id = c.user_id WHERE u.id = <...>
?

If the average user record includes a non-trivial amount of data and there are many comments, you're repeating a lot of information in the query results.

disclaimer: I'm currently doing some work for pganalyze.com (the blog host)

2 comments

Yes.

I'm not sure about "non-trivial amount of data" - especially if we're talking about a single row (single user) here? I suppose there might be a few blolbs of large json documents?

And I'm not sure I understand "repeating information in query results" - surely we're talking about receiving a query, transforming it and responding with the data?

In a rails+graphql ideal world, that'd maybe mean Parsing the json query, generate some simple ruby (active record) code, fire it off to the db (let ar /arel do its job; generate sql) - serialize to json and write the response?

If there truly is non-trivial data in fields not requested, it might be worth it to naemrrow the columns selected. But I doubt it, in the general case.

>I'm not sure about "non-trivial amount of data" - especially if we're talking about a single row (single user) here? I suppose there might be a few blolbs of large json documents? > >And I'm not sure I understand "repeating information in query results" - surely we're talking about receiving a query, transforming it and responding with the data?

For a query like

  SELECT u.* , c.* FROM users u INNER JOIN comments c
    ON u.id = c.user_id WHERE u.id = <...>
above, the user information is repeated for each comment, e.g.,

  u.id, u.name, c.id, c.text
  1     Bob     1     you are right
  1     Bob     2     you are wrong
  1     Bob     3     you are right again
If you have a hundred comments, and many user fields, this can be a significant amount of data to pass around. Or maybe this is not what you meant? I feel like we may be talking past each other. Can you give an example of what you think the ideal sql should be for "give me user with id = 1 and all their comments"?
No, you're probably right - I suppose the database drivers might naively serialize that in a one-to-one representation (to be honest, I've never really looked into the protocols that postgres or Ms sql use for transferring data).

I didn't consider the kind of expansion you're alluding to - on the other hand i can't ever remember seeing an indication that the amount of data returned is an issue with app performance.

But I've certainly seen the number of round-trips result in real issues. And complicated joins, for that matter.

I've worked on a few Postgres drivers and the protocol is basically sending data row-by-row, column-by-column. Sure, it could be more clever, but that's time and cpu and memory spent being clever instead of moving data.

But you're right, there are always trade-offs, and there are some situations where the current AR approach is not ideal, and a single join would be more efficient. However, I suspect this is the way AR does it because it's right enough often enough that it works well as the default approach.

Is the concern over repeated information about [load on the DB], [size of data sent to the client], or something like [processing requirements on the client]? I'm assuming you're talking about [size of data sent to the client], but shouldn't that data be readily-compressible?
I mostly meant data size on the wire, yeah, but these are all potential issues. In any case, the performance difference between the two approaches is going to heavily depend on your schema and your data, and I suspect the AR works well enough in practice.

And in this example, the data is definitely compressible, but the Postgres wire protocol does not implement compression (it leaves that to tls right now, though CRIME makes that a problem). Adding native compression to the protocol has been on the TODO list for a while: https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes_...