Hey HN, I’ve built Fahmatrix, a minimal, fast Java library for working with tabular data. It is inspired by Python’s pandas, but designed for performance and simplicity on the JVM.

After working extensively with Python’s data stack, I often ran into limitations related to speed, especially in larger or long-running data workflows. So I built Fahmatrix from scratch to offer similar APIs for manipulating CSVs, performing summary statistics, slicing rows/columns, and more, but all in Java.

Features:
- Lightweight and dependency-free
- CSV/TSV import with auto-headers
- Series/DataFrame structures (like pandas)
- describe(), mean(), stdDev(), percentile() and more
- Fast parallel operations on numeric columns
- Java 17+ support

Docs: https://moustafa-nasr.github.io/Fahmatrix/
GitHub: https://github.com/moustafa-nasr/fahmatrix

I’d love feedback from the Java and data communities, especially if you’ve ever wanted a simple dataframe utility in Java without needing full-scale ML libraries. Happy to answer any questions!
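For anyone wondering what the summary statistics in the feature list boil down to, here is a rough plain-Java sketch using only the standard library. To be clear, this is not Fahmatrix's API or implementation: the class name `ColumnStats` and its method signatures are made up for illustration, the standard deviation is assumed to be the sample version (n - 1 denominator), and the percentile is assumed to use linear interpolation between ranks, as pandas does by default.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Fahmatrix: summary statistics over one numeric column.
final class ColumnStats {

    // Arithmetic mean, computed with a parallel stream; NaN for an empty column.
    static double mean(double[] values) {
        return Arrays.stream(values).parallel().average().orElse(Double.NaN);
    }

    // Sample standard deviation (n - 1 denominator, an assumption); NaN for fewer than 2 values.
    static double stdDev(double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double m = mean(values);
        double sumSquares = Arrays.stream(values)
                .parallel()
                .map(v -> (v - m) * (v - m))
                .sum();
        return Math.sqrt(sumSquares / (values.length - 1));
    }

    // p-th percentile (0..100) with linear interpolation between the two closest ranks.
    static double percentile(double[] values, double p) {
        if (p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.parallelSort(sorted);
        double rank = p / 100.0 * (sorted.length - 1);
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

With this sketch, `ColumnStats.percentile(new double[] {1, 2, 3, 4}, 50)` interpolates halfway between 2 and 3. Parallel streams only tend to pay off on large columns; for small arrays the sequential version is usually just as fast.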
https://github.com/jtablesaw/tablesaw
https://github.com/dflib/dflib
My preferred way is to just use the DuckDB Java API. I haven't seen anything better in performance/efficiency. Also, a SQL query is often easier to write.
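A minimal sketch of what that can look like through DuckDB's JDBC driver, assuming the `org.duckdb:duckdb_jdbc` artifact is on the classpath. The file name `data.csv`, the column name `price`, and the helper `summarySql` are placeholders invented for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative only: summarize one CSV column with DuckDB over JDBC.
final class DuckDbCsvSummary {

    // Builds the summary query. Quotes are doubled so the path and the
    // column name stay valid SQL string/identifier literals.
    static String summarySql(String csvPath, String column) {
        String path = csvPath.replace("'", "''");
        String col = "\"" + column.replace("\"", "\"\"") + "\"";
        return "SELECT avg(" + col + ") AS mean, "
             + "stddev_samp(" + col + ") AS std_dev, "
             + "quantile_cont(" + col + ", 0.5) AS median "
             + "FROM read_csv_auto('" + path + "')";
    }

    public static void main(String[] args) throws SQLException {
        // "jdbc:duckdb:" with no path opens an in-memory database.
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement();
             // data.csv and price are placeholder names (assumptions).
             ResultSet rs = stmt.executeQuery(summarySql("data.csv", "price"))) {
            while (rs.next()) {
                System.out.printf("mean=%f stdDev=%f median=%f%n",
                        rs.getDouble("mean"),
                        rs.getDouble("std_dev"),
                        rs.getDouble("median"));
            }
        }
    }
}
```

The CSV is read directly by `read_csv_auto`, so nothing has to be loaded into Java objects first; the trade-off versus a dataframe library is that DuckDB is a native dependency rather than pure Java.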