Hacker News new | ask | show | jobs
by uwemaurer 395 days ago
Always great to see efforts to make working with data frames easier. Here are some similar data frame libraries for Java:

https://github.com/jtablesaw/tablesaw

https://github.com/dflib/dflib

My preferred way is just use duckdb java API. I didn't see anything better in performance/efficiency. Also a SQL query is often easier to write

2 comments

Thanks! I'm aware of those great projects. Fahmatrix aims to offer a lightweight, dependency-free alternative that’s easy to embed in any Java app. DuckDB is super impressive, especially for SQL-heavy tasks — but my goal is more about a native, fluent API for those who prefer direct Java code over SQL.
Yes. It has bothered me for a long time too. Maybe the best mix is a dataframe library with basic operations (column select, non-null etc), which also allows SQL for more complex stuff?
Totally agree that SQL can be the best tool for many jobs. My goal with Fahmatrix is to serve the opposite niche: where devs want something that's Java-native, procedural, and simple without reaching for an external engine. SQL support or DSL might come later though — I see the appeal.
Sure. So maybe notehr comment would be to make it (particularly the Series class), as compatible with Java Streams as possible.

Next step would likely be compatibility with popular libraries such as Apache Commons Math: https://commons.apache.org/proper/commons-math/userguide/sta...

Polars and duckdb interoperate nicely and can enable this flexibility
Does Polars have a Java library?