Hacker News
by ramkiz 74 days ago
Very cool idea. The part I would love to hear more about is how you are thinking about the boundary between notebook/IDE convenience and actual data lake guarantees. For example, what exactly is versioned, how reproducible are transformations, and how much lineage visibility do I get once I start mixing SQL, PySpark, natural language queries, and imported web/DB data?
1 comment

Everything, including the actual data, the schema, and the transform, is versioned and tracked at the job-run level.

You get job-run-level lineage for any dataset created in the system.
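
To make "job-run-level lineage" concrete, here is a rough sketch of what such a record and a lineage lookup could look like. This is not the product's actual data model or API: the `JobRun` class, its fields, the `upstream_runs` helper, and the `name@version` dataset labels are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one job run; all field names are illustrative."""
    run_id: str
    transform_code: str                         # SQL or PySpark source the run executed
    inputs: tuple[str, ...]                     # dataset versions the run read
    output: str                                 # dataset version the run wrote
    output_schema: tuple[tuple[str, str], ...]  # (column, type) pairs

    @property
    def transform_hash(self) -> str:
        # Hashing the source makes "which transform produced this?" comparable.
        return hashlib.sha256(self.transform_code.encode()).hexdigest()[:12]


def upstream_runs(dataset_version: str, runs: list[JobRun]) -> list[str]:
    """Return the run_ids that contributed to dataset_version, in traversal order."""
    by_output = {r.output: r for r in runs}
    ordered, seen, stack = [], set(), [dataset_version]
    while stack:
        run = by_output.get(stack.pop())
        if run is None or run.run_id in seen:
            continue  # imported source data (no producing run) or already visited
        seen.add(run.run_id)
        ordered.append(run.run_id)
        stack.extend(run.inputs)
    return ordered


# Made-up example: a web import is cleaned with SQL, then joined with DB data in PySpark.
RUNS = [
    JobRun("r1", "SELECT * FROM web_import WHERE url IS NOT NULL",
           ("web_import@v1",), "clean@v1", (("url", "string"),)),
    JobRun("r2", "clean.join(db_import, 'url')",
           ("clean@v1", "db_import@v3"), "report@v1",
           (("url", "string"), ("owner", "string"))),
]

LINEAGE = upstream_runs("report@v1", RUNS)
```

Under these assumptions, each run pins its inputs, transform, and output schema together, so asking where `report@v1` came from is just a walk back through the runs that produced each input.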