by A-Train 1032 days ago
I'm a Data Scientist. For some time, I've been working on a library for feature engineering.

• GitHub: https://github.com/feature-express/feature-express
• Website: https://feature.express

It isn't yet complete, and I wouldn't consider it ready for production use or handling larger datasets.

Here are some of its characteristics:

• Event-based workflows: Initially, everything is converted to an event format, ingested into an event store, and processed from there.
• In-memory: Both the event store and evaluation have been built from scratch.
• Written in Rust, but there's a Python package available.
• A DSL (Domain Specific Language) for defining aggregations, similar to SQL.

Why am I developing this? I've always found it challenging to build models based on time. These models can be surprisingly tricky, and there's a high risk of accidentally using future data, which can lead to data leakage. FeatureExpress is designed to nearly eliminate such mistakes. Moreover, I believe that representing data as events is an intuitive approach.
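
To make the leakage point concrete, here is a minimal sketch in plain Python of the general idea: keep events in a time-ordered store and compute each feature only from events strictly before the observation time. This is not FeatureExpress's API or DSL. `Event`, `EventStore`, `before`, and `sum_amount` are hypothetical names made up for this illustration, and the data is a toy example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: who it happened to, when, and one numeric attribute.
    entity: str
    ts: datetime
    amount: float


class EventStore:
    """Hypothetical in-memory store that keeps events sorted by time per entity."""

    def __init__(self):
        self._events = {}

    def ingest(self, event):
        events = self._events.setdefault(event.entity, [])
        events.append(event)
        # Re-sorting on every ingest is simple but slow; fine for a sketch.
        events.sort(key=lambda e: e.ts)

    def before(self, entity, obs_time):
        """Return only the events that happened strictly before obs_time."""
        events = self._events.get(entity, [])
        # bisect_left finds the first timestamp >= obs_time, so the slice
        # excludes anything at or after the observation time.
        cut = bisect_left([e.ts for e in events], obs_time)
        return events[:cut]


def sum_amount(store, entity, obs_time):
    # The aggregation can only see past events, so it cannot leak future data.
    return sum(e.amount for e in store.before(entity, obs_time))


store = EventStore()
store.ingest(Event("user_1", datetime(2023, 1, 1), 10.0))
store.ingest(Event("user_1", datetime(2023, 1, 5), 20.0))
store.ingest(Event("user_1", datetime(2023, 1, 9), 40.0))  # in the future at Jan 6

feature = sum_amount(store, "user_1", datetime(2023, 1, 6))
print(feature)  # 30.0
```

The cutoff lives in the store rather than in each feature function, so every aggregation gets the same "no future data" guarantee without having to remember to filter by time.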