HN Mirror


by better365 804 days ago

For offline computation, a table with 5bn rows is fine. For online serving, though, it would be really challenging to serve the features within a few milliseconds.

But even for offline computation, the same computation logic ends up duplicated in lots of places. We have seen ML practitioners copy SQL queries all over. In the end, debugging, feature interpretability, and lineage tracking all become impossible.

Chronon abstracts all of that away so that ML practitioners can focus on the core problems they are dealing with, rather than spending time on ML Ops.

As an extreme example, one user defined 1000 features with 250 lines of code, which is definitely impossible with SQL queries, not to mention the extra work to serve those features.
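
The comment does not say how that user got from 250 lines to 1000 features. One plausible mechanism is a cross product of input columns, aggregation operations, and time windows. The sketch below is plain Python, not Chronon's API, and every name in it (`COLUMNS`, `OPERATIONS`, `WINDOWS_DAYS`, `expand_features`) is hypothetical.

```python
from itertools import product

# Hypothetical inputs for illustration only; this is not Chronon's API.
COLUMNS = ["purchase_amount", "session_length", "items_viewed", "refund_amount"]
OPERATIONS = ["sum", "avg", "min", "max", "count"]
WINDOWS_DAYS = [1, 3, 7, 14, 30]


def expand_features(columns, operations, windows_days):
    """Return one feature name per (column, operation, window) combination."""
    return [
        f"{col}_{op}_{win}d"
        for col, op, win in product(columns, operations, windows_days)
    ]


features = expand_features(COLUMNS, OPERATIONS, WINDOWS_DAYS)
print(len(features))  # 4 columns * 5 operations * 5 windows = 100
print(features[0])    # purchase_amount_sum_1d
```

Three short lists yield 100 features here. With hand-written SQL, each of those combinations would need its own aggregate expression.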

1 comments

mulmen 804 days ago

How does Chronon do this faster than the precomputed table? And in a single docker container? Is it doing logically similar operations but just automating the creation and orchestration of the aggregation tasks? How does it work?


better365 803 days ago

We use a lambda architecture, which incorporates the concept of precomputed tables as well. Those precomputed tables store an intermediate representation of the final results, and they can provide features with snapshot or daily accuracy. However, for real-time features that require point-in-time correctness, precomputed tables alone may present challenges.
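
A minimal sketch of the lambda idea, assuming the intermediate representation is a mergeable partial aggregate (here a sum and count for an average). It is not Chronon's implementation. `Partial` and `serve_average` are hypothetical names, and for brevity the sketch ignores the trailing edge of the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Mergeable intermediate representation of an average: (sum, count)."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "Partial":
        return Partial(self.total + value, self.count + 1)

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        return self.total / self.count if self.count else 0.0


def serve_average(batch_partial, batch_end_ts, recent_events, query_ts):
    """Merge the precomputed batch partial (events before batch_end_ts)
    with events in [batch_end_ts, query_ts) read from the real-time path."""
    tail = Partial()
    for ts, value in recent_events:
        if batch_end_ts <= ts < query_ts:
            tail = tail.add(value)
    return batch_partial.merge(tail).finalize()


# The batch job has already aggregated 4 events totalling 100.0 up to ts=1000.
batch = Partial(total=100.0, count=4)
recent = [(1000, 20.0), (1005, 30.0), (1010, 50.0)]

# The event at ts=1010 is excluded because it is not strictly before query_ts.
print(serve_average(batch, 1000, recent, query_ts=1010))  # 25.0
```

Filtering on `query_ts` is what gives point-in-time behavior in this sketch: the result reflects only events that had already happened at the moment of the query.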

For offline computations, we reuse those intermediate results to avoid recomputing everything from scratch, so the engine can actually scale sub-linearly.
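
To illustrate the reuse, here is a rough sketch that assumes the intermediate results are per-day partial sums. Raw events are scanned once, and every window is then answered from the daily partials. The function names and data are hypothetical. The sketch shows only the reuse, not the sub-linear scaling itself.

```python
from collections import defaultdict


def daily_partials(events):
    """Aggregate raw (day, value) events once into per-day sums."""
    sums = defaultdict(float)
    for day, value in events:
        sums[day] += value
    return dict(sums)


def windowed_sum(partials, end_day, window_days):
    """Sum the daily partials for the window ending at end_day (inclusive)."""
    first_day = end_day - window_days + 1
    return sum(partials.get(day, 0.0) for day in range(first_day, end_day + 1))


events = [(1, 10.0), (1, 5.0), (2, 7.0), (5, 3.0), (7, 1.0)]
partials = daily_partials(events)  # the single pass over raw events

# Both windows reuse the same partials instead of rescanning the events.
sum_3d = windowed_sum(partials, end_day=7, window_days=3)
sum_7d = windowed_sum(partials, end_day=7, window_days=7)
print(sum_3d, sum_7d)  # 4.0 26.0
```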


mulmen 803 days ago

Thanks. How does Chronon serve the real-time features without precomputed tables?
