by simba-k 1454 days ago
Hey everyone,

I'm Simba Khadder, Co-Founder & CEO of Featureform. I'm super stoked to be sharing our open-source feature store with you all. At my last company, we were building models that served to <100M MAU. Most of our time was spent on feature engineering and using off-the-shelf model architectures. I remember having Google Docs that got shared around with useful SQL snippets, and digging through my file system to find untitled_128.ipynb, which had a super useful transformation. We built Featureform so no one would ever have to deal with that again.

Featureform is a virtual feature store. It enables data scientists to define, manage, and serve their ML model's features. Featureform sits atop your existing infrastructure and orchestrates it to work like a traditional feature store.

By using Featureform, a data science team can solve the following organizational problems:

- Enhance Collaboration - Featureform ensures that transformations, features, labels, and training sets are defined in a standardized form, so they can easily be shared, re-used, and understood across the team.

- Organize Experimentation - The days of untitled_128.ipynb are over. Transformations, features, and training sets can be pushed from notebooks to a centralized feature repository with metadata like name, variant, lineage, and owner.

- Facilitate Deployment - Once a feature is ready to be deployed, Featureform will orchestrate your data infrastructure to make it ready in production. Using the Featureform API, you won't have to worry about the idiosyncrasies of your heterogeneous infrastructure (beyond their transformation language).

- Increase Reliability - Featureform enforces that all features, labels, and training sets are immutable. This allows them to safely be re-used among data scientists without worrying about logic changing. Furthermore, Featureform's orchestrator will handle retry logic and attempt to resolve other common distributed system problems automatically. Finally, Featureform will monitor and notify you of infrastructure problems and data drift.

- Preserve Compliance - With built-in role-based access control, audit logs, and dynamic serving rules, your compliance logic can be enforced directly by Featureform.
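
To make the "name, variant, lineage, and owner" metadata and the immutability guarantee concrete, here is a minimal plain-Python sketch. It is not Featureform's API: `FeatureDef`, `FeatureRegistry`, and every field and value below are hypothetical names chosen for illustration. It only shows the general idea that a (name, variant) pair, once registered, can't silently change its logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical feature definition; not Featureform's actual schema."""
    name: str
    variant: str
    owner: str
    source: str      # lineage: the upstream table or transformation
    expression: str  # the transformation logic, e.g. a SQL snippet


class FeatureRegistry:
    """Toy in-memory registry keyed by (name, variant)."""

    def __init__(self):
        self._defs = {}

    def register(self, feature):
        key = (feature.name, feature.variant)
        existing = self._defs.get(key)
        # Immutability: the same key may never map to different logic.
        if existing is not None and existing != feature:
            raise ValueError(f"{key} already exists; register a new variant instead")
        self._defs[key] = feature
        return feature

    def get(self, name, variant):
        return self._defs[(name, variant)]


registry = FeatureRegistry()
v1 = FeatureDef("avg_purchase", "v1", "alice", "transactions", "AVG(amount)")
registry.register(v1)
registry.register(v1)  # re-registering the identical definition is a no-op

try:
    # Same name and variant, different logic: rejected.
    registry.register(
        FeatureDef("avg_purchase", "v1", "alice", "transactions", "SUM(amount)")
    )
    changed = True
except ValueError:
    changed = False
```

In this sketch, changing the logic means registering a new variant (say "v2"), so anything already built on "v1" keeps seeing the same definition.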

You can check out our repo: https://github.com/featureform/featureform

Our docs: https://docs.featureform.com

Our quickstart guide: https://docs.featureform.com/quickstart-local

Read more about feature stores: https://featureform.com/post/feature-stores-explained-the-th...

2 comments

Seems like this can help with ensuring features used in training are exactly identical to features used in serving? Recently I wanted to use “target encoding” of features with TF, but saw that TF does not have a “target encoding” layer that can be used in serving. Would this help with this scenario?
How do you solve the issue of applying a columnar transform from the feature engineering step as a row-major transformer at inference time?
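
For readers following the two questions above, here is a rough plain-Python sketch of the general pattern they describe. It is not Featureform's implementation and not a TensorFlow layer: the function names and the toy data are assumptions made up for illustration. The encoding is fit once over whole columns of training data, and serving then applies the same stored mapping to one row at a time, so training and inference share identical logic.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Columnar (batch) step: mean target per category plus a global fallback."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    mapping = {cat: sums[cat] / counts[cat] for cat in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean


def encode_row(row, column, mapping, global_mean):
    """Row-wise (serving) step: a lookup, with the global mean for unseen categories."""
    return mapping.get(row[column], global_mean)


# Toy training columns (made-up data).
cities = ["nyc", "nyc", "sf", "sf", "sf"]
clicked = [1, 0, 1, 1, 1]
mapping, global_mean = fit_target_encoding(cities, clicked)

# At inference time, a single row is encoded with the stored mapping.
encoded_known = encode_row({"city": "nyc"}, "city", mapping, global_mean)
encoded_unseen = encode_row({"city": "austin"}, "city", mapping, global_mean)
```

This version has no smoothing and no protection against target leakage, both of which a real target encoder would want. The point is only that the expensive columnar computation produces a small lookup table, and the row-major serving path reads from that table instead of re-running the batch logic.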