Hacker News new | ask | show | jobs
by siculars 2635 days ago
Will this handle (repeatable) “record” (struct/array) data types natively?

/disclaimer: work in google cloud

1 comments

Yes it should, there is special "unnest" parameter for fields in BlockML reference - https://mprove.io/docs/blockml/fields/dimension
wow, impressive! how about symmetric aggregates (e.g. being able to do correct summation/aggregation on numeric values despite a one_to_many join)?
Yes, look at this page - https://mprove.io/docs/blockml/fields/measure. You need to specify measure "type" and "sql_key" that will be used to avoid counting duplicates.
How do the underlying queries look for symmetric aggregates? Sadly SQL never supported the ability to compute an aggregate based on the unique values of another column.
Mprove does it the way the Looker did it before:

   CREATE TEMPORARY FUNCTION mprove_array_sum(ar ARRAY<STRING>) AS
  ((SELECT SUM(CAST(REGEXP_EXTRACT(val, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64)) FROM UNNEST(ar) as val));
...

   SELECT COALESCE(mprove_array_sum(ARRAY_AGG(DISTINCT CONCAT(CONCAT(CAST(a.id AS STRING), '||'), CAST(a.population AS STRING)))), 0) as a_cohort_size
Recently, Looker began to do it differently, most likely to improve bigquery performance:

  COALESCE(ROUND(COALESCE(CAST( ( SUM(DISTINCT (CAST(ROUND(COALESCE(lesson_5_cohorts.population ,0)*(1/1000*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )  / (1/1000*1.0) AS FLOAT64), 0), 6), 0) AS lesson_5_cohorts_m_sum_distinct