HN Mirror



	by ramkiz 74 days ago
	Very cool idea. The part I would love to hear more about is how you are thinking about the boundary between notebook/IDE convenience and actual data lake guarantees. For example, what exactly is versioned, how reproducible are transformations, and how much lineage visibility do I get once I start mixing SQL, PySpark, natural language queries, and imported web/DB data?


Everything, including the actual data, schemas, and transforms, is versioned and tracked at the job-run level.

You get job-run-level lineage for any dataset created in the system.