| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by siculars 2635 days ago
	Will this handle (repeatable) “record” (struct/array) data types natively? /disclaimer: work in google cloud

1 comments

akalitenya 2635 days ago

Yes it should, there is special "unnest" parameter for fields in BlockML reference - https://mprove.io/docs/blockml/fields/dimension

link

segah 2634 days ago

wow, impressive! how about symmetric aggregates (e.g. being able to do correct summation/aggregation on numeric values despite a one_to_many join)?

link

akalitenya 2634 days ago

Yes, look at this page - https://mprove.io/docs/blockml/fields/measure. You need to specify measure "type" and "sql_key" that will be used to avoid counting duplicates.

link

infinite8s 2632 days ago

How do the underlying queries look for symmetric aggregates? Sadly SQL never supported the ability to compute an aggregate based on the unique values of another column.

link

akalitenya 2632 days ago

Mprove does it the way the Looker did it before:

   CREATE TEMPORARY FUNCTION mprove_array_sum(ar ARRAY<STRING>) AS
  ((SELECT SUM(CAST(REGEXP_EXTRACT(val, '\\|\\|(\\-?\\d+(?:.\\d+)?)$') AS FLOAT64)) FROM UNNEST(ar) as val));

...

   SELECT COALESCE(mprove_array_sum(ARRAY_AGG(DISTINCT CONCAT(CONCAT(CAST(a.id AS STRING), '||'), CAST(a.population AS STRING)))), 0) as a_cohort_size

Recently, Looker began to do it differently, most likely to improve bigquery performance:

  COALESCE(ROUND(COALESCE(CAST( ( SUM(DISTINCT (CAST(ROUND(COALESCE(lesson_5_cohorts.population ,0)*(1/1000*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(lesson_5_cohorts.id  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )  / (1/1000*1.0) AS FLOAT64), 0), 6), 0) AS lesson_5_cohorts_m_sum_distinct

link