| HN Mirror


by nikhilsimha 804 days ago

We haven’t tried Materialize. IIUC, Materialize is pure kappa. Since we need to correct upstream data errors and selectively forget data (GDPR) automatically, we need a lambda system.

We evaluated Tecton, but decided that its time-travel strategy wasn’t scalable for our needs at the time.

A philosophical difference with Tecton is that we believe the compute primitives (aggregation and enrichment) need to be composable. For that reason we don’t have a FeatureSet or a TrainingSet; we have GroupBy and Join instead.

This enables chaining or composition to handle normalization (think 3NF) / star-schema in the warehouse.
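
As a rough illustration of what composing aggregation and enrichment over normalized tables can look like, here is a minimal sketch in plain Python. It is not Chronon's API: `group_by`, `join`, the table contents, and the column names are all hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict

def group_by(rows, key, value, agg):
    """Aggregate `value` per `key`; returns {key: aggregated value}."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[value])
    return {k: agg(vs) for k, vs in buckets.items()}

def join(rows, key, lookup, out_col):
    """Enrich each row with lookup[row[key]] under `out_col` (None if missing)."""
    return [{**row, out_col: lookup.get(row[key])} for row in rows]

# Hypothetical normalized tables: purchases reference items, items reference merchants.
purchases = [
    {"user": "u1", "item": "i1", "amount": 10.0},
    {"user": "u1", "item": "i2", "amount": 5.0},
    {"user": "u2", "item": "i1", "amount": 7.0},
]
item_to_merchant = {"i1": "m1", "i2": "m2"}

# Step 1 (enrichment): resolve item -> merchant through the dimension table.
enriched = join(purchases, "item", item_to_merchant, "merchant")
# Step 2 (aggregation): total spend per merchant over the enriched rows.
spend_per_merchant = group_by(enriched, "merchant", "amount", sum)
# Step 3 (enrichment again): attach the aggregate back onto each purchase.
features = join(enriched, "merchant", spend_per_merchant, "merchant_total")
```

Because each primitive takes and returns plain rows or lookups, the output of one step can feed the next, which is the chaining property described above.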

A side benefit is that non-ML use cases can leverage functionality within Chronon.

4 comments

jamesblonde 804 days ago

FeatureSets are mutable data and TrainingSets are consistent snapshots of feature data (from FeatureSets). I fail to see what that has to do with composability. Join is still available for FeatureSets to enable composable feature views; a join is reuse of feature data. GroupBy is just an aggregation in a feature pipeline, so I'm not sure what your point is here. You can still do star schema (and even snowflake schema if you have the right abstractions).


jamesblonde 804 days ago

Normalization is a model-dependent transformation and happens after the feature store; it needs to be consistent between training and inference pipelines.
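
To make the consistency requirement concrete, here is a minimal sketch, assuming standard scaling as the model-dependent transformation. The function names and sample values are hypothetical; the point is only that the statistics are fitted once on training data and the same statistics are reused at inference.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Compute scaling statistics from training data only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return {"mean": mu, "std": sigma}

def transform(scaler, values):
    """Apply previously fitted statistics; never refit at inference time."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]

# Training pipeline: fit on the training snapshot, then transform it.
train = [2.0, 4.0, 6.0]
scaler = fit_scaler(train)
train_scaled = transform(scaler, train)

# Inference pipeline: reuse the same fitted scaler on a single serving value.
serving_scaled = transform(scaler, [4.0])
```

Refitting the scaler on serving data would change the mean and standard deviation, so the model would see differently scaled inputs than it was trained on.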


nikhilsimha 804 days ago

Normalization is overloaded. I was referring to schema normalization (3NF etc.), not feature normalization (standard scaling etc.).


jamesblonde 804 days ago

Ok, but star schema is denormalized. Snowflake is normalized.


nikhilsimha 804 days ago

To be pedantic, even in a star schema, the dim tables are denormalized; the fact tables are not.

I agree that my statement would have been much better if I had used snowflake schema instead.


throwaway2037 803 days ago