Hacker News new | ask | show | jobs
by tschreiber 422 days ago
Verified:

  EXPLAIN plan actions = 1
  SELECT *
  FROM amazon.amazon_reviews
  WHERE helpful_votes > 0
  ORDER BY -log(1 - (rand() / 4294967296.0)) / helpful_votes
  LIMIT 3
Lazily read columns: review_body, review_headline, verified_purchase, vine, total_votes, marketplace, star_rating, product_category, customer_id, product_title, product_id, product_parent, review_date, review_id

Note that there is a setting query_plan_max_limit_for_lazy_materialization (default value 10) that controls the max n for which lm kicks in for LIMIT n.

2 comments

Sorry if this question exposes my naivety, why such a low default limit? What drawback does lazy materialization have that makes it good to have such a low limit?

Do you know any example query where lazy materialization is detrimental to performance?

My understanding is that with higher limit values you may end up doing lots of random I/O (for each granule the order in which you read it would be much less predictable than when ClickHouse normally reads it sequentially), essentially one I/O operation per LIMIT value. So larger default values would only be beneficial in pathological examples given in the article, but much less so in "real world".
Awesome! Thanks for checking :-)