by mathisd 90 days ago
A few observations about data engineering in the context of a data warehouse: 1. Protocols and IRs (intermediate representations) have laid the groundwork for, and continue to enable, interoperability and composability between data tools (see Apache Arrow, Substrait, Catalog; there is a great introduction here: https://voltrondata.com/codex). 2. Current OSS data tooling is really good (except for user interfaces). 3. Agentic workflows work incredibly well for data-engineering tasks. 4. LLMs are pushing toward declarative tools and docs that live close to the code.

That's why I am working on an (early) project called Orca [1]. Orca is a template and a set of patterns for building a production-ready, agent-enabled data warehouse using entirely free and open-source tools. Check out the README for more info. I would be interested in getting feedback on it!

[1] Orca: https://github.com/mathisdrn/orca