> What I find missing on the internet is that people who have been doing this for years are not writing about it. In my last job I implemented a feature store from scratch with ca. 500 hand-crafted features and ca. 2,500 features generated automatically by a code generator. It didn't only serve the current value of the features: the data scientists could manually populate an 'init' table with (customer_id, reference_date, target_value) tuples, and the pipeline recalculated the historic feature values for the given customer and reference_date. So if a data scientist came up with a new feature definition, after implementation (5 minutes to 2 hours per feature) he, and all other data scientists, immediately got access to the feature's history. We had so many features that I had to implement automatic feature pruning, otherwise the users got lost. We could train, test, validate and deploy models within 24 hours (model fitting ran overnight). When I left the company, we had ca. 40 models in production, managed by 1 person (me) part-time (3-4 hours a week). This was an offline business, so we didn't have to deal with latency in feature serving and didn't have to be able to change a feature's value during the day, so everything could run in batch overnight. Why I didn't write about it? Because it was implemented in PL/SQL running on Oracle ExaData and in SAS. No one cares about feature stores implemented with tech like that. People care about models trained in python, ported to Scala by Java devs, running in docker on k8s, features coming from HiveQL, sqoop, oozie or Spark and stored on cassandra, MySQL or Elasticsearch. But do they have a feature store with built in time-travel functionality?
> do they have a feature store with built in time-travel functionality?
My feature store doesn't support time-travel yet. But many SaaS implementations do, including Tecton, Hopsworks, Splice Machine, etc. Open source feature stores haven't implemented this critical feature AFAIK.
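To make "time-travel" concrete, here is a minimal sketch of the 'init'-table idea from the quoted comment, using Python's built-in `sqlite3` so it runs anywhere. It is not the original PL/SQL implementation: the `purchases` table, the `spend_30d` feature, and all the sample rows are made up for illustration. Only the (customer_id, reference_date, target_value) shape of the `init` table comes from the comment. The point is that each feature value is computed only from facts dated before the row's reference_date, so nothing leaks in from the future.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw, dated facts (hypothetical schema, not from the original system).
    CREATE TABLE purchases (customer_id INTEGER, purchase_date TEXT, amount REAL);
    -- The 'init' table: one row per (customer, reference date, target).
    CREATE TABLE init (customer_id INTEGER, reference_date TEXT, target_value INTEGER);
""")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", [
    (1, "2020-01-05", 10.0),
    (1, "2020-01-20", 25.0),
    (1, "2020-02-10", 40.0),
    (2, "2020-01-15", 5.0),
])
conn.executemany("INSERT INTO init VALUES (?, ?, ?)", [
    (1, "2020-02-01", 1),
    (1, "2020-03-01", 0),
    (2, "2020-02-01", 0),
])

# Hypothetical feature: total spend in the 30 days before the reference date.
# The join condition is the "time-travel" part: only purchases strictly
# before reference_date (and inside the 30-day window) are visible.
FEATURE_SQL = """
    SELECT i.customer_id, i.reference_date, i.target_value,
           COALESCE(SUM(p.amount), 0.0) AS spend_30d
    FROM init AS i
    LEFT JOIN purchases AS p
      ON p.customer_id = i.customer_id
     AND p.purchase_date <  i.reference_date
     AND p.purchase_date >= date(i.reference_date, '-30 days')
    GROUP BY i.customer_id, i.reference_date, i.target_value
    ORDER BY i.customer_id, i.reference_date
"""
rows = conn.execute(FEATURE_SQL).fetchall()
for row in rows:
    print(row)
```

Adding a new (customer_id, reference_date, target_value) row to `init` and re-running the query yields that row's feature value as of its reference date, which is roughly the workflow described in the quote. A real batch pipeline would presumably materialize the result per feature rather than recompute it on every read.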
> Why I didn't write about it? Because it was implemented in PL/SQL running on Oracle ExaData and in SAS. No one cares about feature stores implemented with tech like that.
Haha, I do. Actually I am thinking about implementing feature stores using some "old-school" DB technology. By the way, just curious since I've never used PL/SQL: Are you able to implement the time-travel functionality using pure SQL?