Hacker News new | ask | show | jobs
by akalitenya 2634 days ago
Yes it should, there is special "unnest" parameter for fields in BlockML reference - https://mprove.io/docs/blockml/fields/dimension
1 comments

wow, impressive! how about symmetric aggregates (e.g. being able to do correct summation/aggregation on numeric values despite a one_to_many join)?
Yes, look at this page - https://mprove.io/docs/blockml/fields/measure. You need to specify measure "type" and "sql_key" that will be used to avoid counting duplicates.
How do the underlying queries look for symmetric aggregates? Sadly SQL never supported the ability to compute an aggregate based on the unique values of another column.
Mprove does it the way the Looker did it before:

   CREATE TEMPORARY FUNCTION mprove_array_sum(ar ARRAY<STRING>) AS
  ((SELECT SUM(CAST(REGEXP_EXTRACT(val, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64)) FROM UNNEST(ar) as val));
...

   SELECT COALESCE(mprove_array_sum(ARRAY_AGG(DISTINCT CONCAT(CONCAT(CAST(a.id AS STRING), '||'), CAST(a.population AS STRING)))), 0) as a_cohort_size
Recently, Looker began to do it differently, most likely to improve bigquery performance:

  COALESCE(ROUND(COALESCE(CAST( ( SUM(DISTINCT (CAST(ROUND(COALESCE(lesson_5_cohorts.population ,0)*(1/1000*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )  / (1/1000*1.0) AS FLOAT64), 0), 6), 0) AS lesson_5_cohorts_m_sum_distinct