by nikhilsimha 807 days ago
Snapshots can’t travel back with millisecond precision, or even minute-level precision. They are just full dumps taken at fixed intervals.
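To make the precision point concrete, here is a minimal sketch comparing an exact point-in-time lookup over an event log with a lookup against fixed-interval snapshots. The event log, the 100-second snapshot interval, and the function names are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical event log for one feature: (timestamp_seconds, value), sorted by time.
events = [(10, 1.0), (95, 2.0), (130, 3.0), (250, 4.0)]

# Hypothetical snapshot cadence: a full dump every 100 seconds.
SNAPSHOT_INTERVAL = 100

def value_as_of(log, t):
    """Exact point-in-time value: the latest event with timestamp <= t."""
    times = [ts for ts, _ in log]
    i = bisect_right(times, t)
    return log[i - 1][1] if i else None

def snapshot_value(log, t, interval=SNAPSHOT_INTERVAL):
    """Value visible if only the most recent snapshot at or before t exists."""
    snap_time = (t // interval) * interval
    return value_as_of(log, snap_time)

if __name__ == "__main__":
    # At t=140 the true value is 3.0, but the t=100 snapshot still says 2.0.
    print(value_as_of(events, 140), snapshot_value(events, 140))
```

Any query between two snapshot times silently gets the older value, which is exactly the staleness that matters when reconstructing features for training labels at arbitrary timestamps.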

https://en.wikipedia.org/wiki/Sixth_normal_form Basically, we've had time travel (via triggers, built-in temporal tables, or just writing the data that way) for a long time; it's just expensive to keep all of it in an OLTP database.
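A rough sketch of the "history table" flavour of time travel described above, assuming each version of a value is stored with a half-open validity interval [valid_from, valid_to). The `Version` type, `update` (which closes the current row the way a trigger might), and `as_of` are hypothetical names, not any particular database's API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Version:
    """One row of a hypothetical history table: value valid in [valid_from, valid_to)."""
    key: str
    value: str
    valid_from: int
    valid_to: Optional[int]  # None means "still current"

def update(history: List[Version], key: str, value: str, t: int) -> List[Version]:
    """Close the current version of key at time t and append the new version."""
    out = []
    for row in history:
        if row.key == key and row.valid_to is None:
            row = Version(row.key, row.value, row.valid_from, t)
        out.append(row)
    out.append(Version(key, value, t, None))
    return out

def as_of(history: List[Version], key: str, t: int) -> Optional[str]:
    """Return the value of key that was valid at time t, or None if none was."""
    for row in history:
        if row.key == key and row.valid_from <= t and (row.valid_to is None or t < row.valid_to):
            return row.value
    return None

history: List[Version] = []
history = update(history, "user_1.city", "Paris", 100)
history = update(history, "user_1.city", "Berlin", 200)
```

With this data, `as_of(history, "user_1.city", 150)` returns `"Paris"` and a query at 250 returns `"Berlin"`. Keeping every version is what makes the lookup possible, and also what makes it expensive at OLTP scale.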

We've also had slowly changing dimensions for a long time to solve this type of problem for the labels that sit on top of everything, though really these are just fact tables with a similar historical approach.

6NF works well for some temporal data, but I haven't seen it work well for windowed aggregations, because storing values with start/end times doesn't handle events "falling out of the window" very well. At least in the examples I've seen, values only change in response to explicit mutation events.
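The "falling out of the window" problem can be shown with a small sketch: a sliding-window count changes not only when an event arrives but also when an old event expires, even though nothing was written at that moment. The click timestamps, the 60-second window, and the helper names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical click timestamps (seconds) for one user, sorted.
clicks = [0, 10, 20, 100]
WINDOW = 60  # hypothetical sliding window length in seconds

def count_in_window(ts, t, window=WINDOW):
    """Number of events in the half-open window (t - window, t]."""
    return bisect_right(ts, t) - bisect_right(ts, t - window)

def change_points(ts, window=WINDOW):
    """Times at which the windowed count can change: every event arrival
    plus every moment an earlier event expires (its timestamp + window)."""
    return sorted(set(ts) | {x + window for x in ts})
```

The count drops at t=60, 70 and 80 with no new event. A start/end interval table that only writes a row on each mutation (here at 0, 10, 20 and 100) would need extra synthetic "expiry" rows to represent those drops.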
Agreed. You don't really want to pre-aggregate your temporal data: the aggregate then only exists at each row-time boundary, which is worth less than just keeping the individual calculations.
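A minimal sketch of why pre-aggregation loses information: once events are rolled up into fixed buckets, only windows aligned to bucket boundaries can be answered exactly. The events, the 60-second bucket size, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (timestamp_seconds, amount) events.
events = [(5, 1.0), (50, 2.0), (70, 4.0), (110, 8.0)]
BUCKET = 60  # hypothetical pre-aggregation granularity in seconds

def exact_sum(evts, start, end):
    """Sum over raw events with start <= ts < end."""
    return sum(v for ts, v in evts if start <= ts < end)

def pre_aggregate(evts, bucket=BUCKET):
    """Roll events up into fixed buckets keyed by bucket start time."""
    sums = defaultdict(float)
    for ts, v in evts:
        sums[(ts // bucket) * bucket] += v
    return dict(sums)

def bucketed_sum(buckets, start, end, bucket=BUCKET):
    """Best answer from buckets alone: only buckets fully inside [start, end)."""
    return sum(v for b, v in buckets.items() if b >= start and b + bucket <= end)

buckets = pre_aggregate(events)
```

For the aligned window [0, 120) both approaches give 15.0, but for [30, 90) the raw events give 6.0 while the buckets can only offer 0, because neither bucket lies entirely inside the window.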
Databases have had many forms of time travel for 30+ years now.
Not at the latency needed for feature serving, and most databases struggle with column limits.

But please enlighten us on which databases to use so Airbnb (and the rest of us) can stop wasting time.

Shameless plug, but XTDB v2 is being built for low-latency bitemporal queries over columnar storage and might be applicable: https://docs.xtdb.com/quickstart/query-the-past.html

We've not been developing v2 with ML feature serving in mind so far, but I would love to speak with anyone interested in this use case and figure out where the gaps are.
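For readers unfamiliar with the term, "bitemporal" means tracking two time axes: when a fact was true (valid time) and when the database learned it (system time). The toy model below illustrates the idea only; it is not XTDB's API, and the `Fact` type and `as_of` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Fact:
    """Hypothetical bitemporal record: value holds from valid_from onwards,
    and the database recorded it at recorded_at (system time)."""
    value: float
    valid_from: int
    recorded_at: int

def as_of(facts: List[Fact], valid_t: int, system_t: int) -> Optional[float]:
    """Value valid at valid_t, using only facts recorded at or before system_t.
    The latest valid_from wins; ties go to the most recent recording."""
    known = [f for f in facts if f.recorded_at <= system_t and f.valid_from <= valid_t]
    if not known:
        return None
    return max(known, key=lambda f: (f.valid_from, f.recorded_at)).value

# The value changed at t=150, but that change only arrived (late) at t=300.
facts = [Fact(10.0, 100, 100), Fact(12.0, 150, 300)]
```

Here `as_of(facts, 200, 200)` returns 10.0 (what was known at prediction time), while `as_of(facts, 200, 400)` returns the corrected 12.0. For training data, the first answer is usually the one you want, since using the second leaks information the model would not have had.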

Snapshots don’t have to be at regular intervals and can be at whatever resolution you choose. You could snapshot as the first step of training then keep that snapshot for the life of the resulting model. Or you could use some other time travel methodology. Snapshots are only one of many options.
These are reconstructions of features/columns that don’t exist yet.
I don’t understand what this means. How can something be reconstructed without first existing? Is this not just a caching exercise?