


	by mateuszklimek 1570 days ago
	It's "inspired" in the dbt transformation part: it uses the same models and the same logic, and part of the code, for generating them. We, for example, had a funny approach of computing metrics in 4 threads via multiple dbt models, and elementary does this in a very similar way :) The lineage part is independent (re_data uses lineage from dbt), so I haven't looked into that much.

2 comments

Maayansa 1570 days ago

While developing Elementary's dbt project, we looked into more than 60 dbt projects to learn from prior work, and we were inspired by different things in different places. You're right that we were inspired by a couple of techniques you used, one being that creative way to improve performance (though the 4-thread setting itself is the dbt recommendation in their docs). Another is using z-score for anomaly detection, which we saw in a number of related projects and which is widely used in the industry.
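
For readers unfamiliar with the z-score technique mentioned above, here is a minimal sketch in plain Python. It is an illustration only, not code from Elementary or re_data: the function names, the sample row counts, and the threshold of 3.0 are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def zscore(history, value):
    """Z-score of `value` relative to a baseline of past observations."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # Flat baseline: no spread to measure against, so report 0.0.
        # (A real monitor might choose to flag any deviation here instead.)
        return 0.0
    return (value - mu) / sigma


def is_anomaly(history, value, threshold=3.0):
    """Flag `value` when it is more than `threshold` deviations from the mean."""
    return abs(zscore(history, value)) > threshold


# Hypothetical daily row counts for a table, followed by two new observations.
daily_row_counts = [100, 102, 98, 101, 99, 100]
print(is_anomaly(daily_row_counts, 101))  # within normal variation
print(is_anomaly(daily_row_counts, 500))  # far outside the baseline
```

Scoring the newest value against a separate history window, rather than against a series that already includes it, keeps a single large outlier from inflating the standard deviation it is being compared to.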

In terms of the lineage, you can see in the code that we mostly rely on the query and access history available in Snowflake and BigQuery, parsing the queries to learn about the connections between nodes in the graph. We use other Python libraries like sqlfluff and sqllineage as low-level parsers for some specific use cases, and we extend them and solve many things on top of them. Actually, we're heavy open source users, depending on around 20 libraries, all MIT or Apache.


mateuszklimek 1568 days ago

Okay, I'm happy that you admit the inspiration in this comment (in contrast to the previously deleted one).

Also, I think it's more than just following re_data in a couple of places. Elementary's whole data monitoring part started much later than your lineage part, and it seems to follow what re_data did there at the level of both idea and implementation. I'm sure the other 59 projects you mention were not dbt packages for data reliability (there was no other one in the dbt hub), which is what re_data is and what elementary now also tries to copy, seeing our traction.

As mentioned, it's open-source. You can use our code. But if you are doing that, state that clearly in the LICENSE.


nuclearnice1 1570 days ago

I think mateuszklimek is pointing out that the MIT license requires you to include the re_data copyright notice in your source.


DarthNebo 1569 days ago

Right on point, they don't even have a filled-out LICENSE in the repo:

> Copyright [yyyy] [name of copyright owner]

https://github.com/elementary-data/elementary/blob/master/LI...


windsquirrel 1570 days ago

Gotcha - I can see what you mean, appreciate the clarification
