by snidane
1460 days ago
The article doesn't explain the point, which I believe is: type your dicts if you want to give your downstream strict guarantees about data shape. If you know precisely what the data is used for, great, go ahead, the type system is your friend. If you don't know how the data will be used, it's often a different story.

Wrapping data in hand-typed classes is a terrible idea in typical data engineering scenarios, where there might be hundreds of these API endpoints, which might also change whenever the upstream sees fit. A perfect way to piss off your downstream users is to keep telling them "sorry, the data is not available because I overspecified the data type and now it failed on a TypeError again". Usually the downstream is the domain expert: they know which fields should be used, but they don't know which ones until they start using the data.

Typically the best way is to pass ALL the upstream data down, materialize extra fields, and NOT modify any existing field names, even when you think you're super smart and know better than the domain experts. Too often a "smart" engineer thought he knew better and included only some fields, only for it to turn out later that the data source contained many more gold nuggets, and that it was never documented that these had been cleverly dropped.
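To make the pass-through idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `enrich` function, the `price_cents` field, and the `derived_` prefix are hypothetical and not taken from the article. It copies every upstream key untouched and adds derived fields under new names.

```python
from typing import Any, Dict, Mapping


def enrich(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the upstream record with derived fields added.

    Every upstream key is passed through unchanged: nothing is dropped,
    renamed, or overwritten.
    """
    out = dict(record)  # keep ALL upstream fields, including unknown ones

    # Hypothetical derived field; "price_cents" is an assumed upstream key.
    price = record.get("price_cents")
    is_plain_int = isinstance(price, int) and not isinstance(price, bool)
    # Only add the derived key if upstream does not already use that name.
    if is_plain_int and "derived_price_dollars" not in out:
        out["derived_price_dollars"] = price / 100

    return out


upstream = {"id": 7, "price_cents": 1999, "undocumented_flag": True}
enriched = enrich(upstream)
```

An unexpected field such as `undocumented_flag` simply flows through, and a record without `price_cents` comes back unchanged instead of raising.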
Also great for property testing / fuzzing. And other fun meta data-model stuff like, e.g., inferring a schema from example data.
In general, programming language type systems are pretty weak in comparison because they're not very programmable. (In most languages, for most people, etc. There are fancier type systems approaching formal proof toolkits, but they're hard to use.)
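As a rough illustration of inferring a schema from example data, here is a small Python sketch. The `infer_schema` function and its output shape are assumptions made up for this example, not any particular library's API. It records which value types were seen for each key and whether the key appeared in every record.

```python
from typing import Any, Dict, Iterable, Mapping, Set


def infer_schema(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Infer a flat schema: observed type names per key, plus whether
    the key was present in every example record."""
    types: Dict[str, Set[str]] = {}
    counts: Dict[str, int] = {}
    total = 0
    for rec in records:
        total += 1
        for key, value in rec.items():
            types.setdefault(key, set()).add(type(value).__name__)
            counts[key] = counts.get(key, 0) + 1
    return {
        key: {"types": sorted(types[key]), "required": counts[key] == total}
        for key in types
    }


examples = [
    {"id": 1, "name": "a", "score": 1.5},
    {"id": 2, "name": None},
]
schema = infer_schema(examples)
```

Because the schema here is plain data, it can be diffed between runs or fed to a test-data generator, which is the kind of programmability the comment is pointing at.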