by maycotte 2079 days ago
We have been building a feature-first data store for seven years, and it feels like the feature store is about to become one of the more exciting ways to extract value from data. We see feature stores as much more than another silo for ML: they are a way to get a real-time, centralized view of fragmented data that would otherwise have to be federated or put in a data lake to be queried together. I share more of my thoughts in this blog post: https://www.molecula.com/why-moleculas-feature-based-approac... Molecula is based on the OSS platform Pilosa (https://www.pilosa.com/), and both Pilosa and Molecula will be repositioning as feature stores over the coming months. Doing machine-scale analytics and ML on the data itself will be a thing of the past.
2 comments

This looks like a highly specialised tool. How is it going to integrate with a Data Scientist's favourite tools, such as Jupyter notebooks, Pandas or Spark and especially ML frameworks like TensorFlow, SkLearn etc.?
Great question. Right now we have pushed hard to make SQL the primary interface to our feature store, and we can output Pandas and other formats at query time. We are also working on integrated/hosted Jupyter notebooks and are excited to continue collaborating with the community on better feature-first (feature-oriented) endpoints and interfaces. There is so much room to innovate here.
But is it open-source? Our Hopsworks Feature Store is (https://github.com/logicalclocks/hopsworks), as is GoJEK Feast (https://github.com/feast-dev/feast).

I really believe that all Enterprise platform software should have an open-source version if it is to have a meaningful effect on how people work (in this case, how Data Scientists and Data Engineers work together).

Yes, Pilosa is OSS under Apache 2.