Let me preface by saying terms like "Dependency Inversion" are fuzzy and meant to convey general strategies that can be refined for specific cases, so my exact definition of dependency inversion here might be subtly different from others', and it certainly overlaps with other ideas that also have their own fuzzy names.

In its general form, Dependency Inversion says "don't depend on concrete things; depend on abstract things", with the corollary "don't go get something you need; ask for what you need in general terms and let someone else figure out which concrete things you actually get". Following this to its logical conclusion often ends up with exactly the kind of directed acyclic graph data structures they describe.

So look at their old vs. new example:

Old:

    df['COLUMN_C'] = df['COLUMN_A'] + df['COLUMN_B']

df is one extremely specific data frame. This line says "take this specific column of this exact data frame and add it to this other specific column of the same exact data frame". The result is a new specific column of exactly one data frame.

New:

    def COLUMN_C(COLUMN_A: pd.Series, COLUMN_B: pd.Series) -> pd.Series:
        return COLUMN_A + COLUMN_B

COLUMN_A and COLUMN_B are any series. The result is a strategy for creating new columns from existing ones. This is very general. From a dependency inversion perspective, COLUMN_A and COLUMN_B can be thought of as "requests".

A directed acyclic graph represents the result of finding the appropriate (concrete) dependencies for every (abstract) request and linking them together. This is essentially what "dependency injection" frameworks do. Often this is done via convention, including looking at the names of variables and functions. Hamilton is doing this. In my opinion, dependency injection frameworks should not rely on the names of variables and functions if it can be avoided, because it makes the code brittle in the face of what should be "safe" refactors (including minification and uglification). But it's possible it can't reasonably be avoided in Hamilton's case, and it may be the right choice.

Note that you don't need the injection framework (Hamilton) to benefit from using dependency inversion. This is often what people mean when they say dependency inversion makes your code more testable: you can test a function in isolation by just calling it directly, instead of depending on the injection framework to stitch things together for you. That's well and good, but testability is just a side benefit of the real win, which is cleaner and better organized code.

If you want an even more general pattern, it's a common and effective strategy to look at the structure of the code you're sick of writing and seek to encode that structure explicitly into a data structure that you can programmatically manipulate. Less code-as-code and more code-as-data. That used to be called "metaprogramming", but it lost its name because it's so general and ubiquitous now. This huge category of refactors covers things like iterables instead of loops, reactive streams instead of callbacks, dependency injection frameworks, expression trees, various forms of reflection, and more.
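Going back to the name-based wiring for a moment, here is a rough sketch of what an injection framework in this style might do internally. To be clear, this is not Hamilton's actual API or implementation: `resolve`, `COLUMN_D`, and the `Column` alias are names I made up for illustration, and plain lists stand in for pd.Series so the sketch runs with only the standard library.

```python
import inspect
from typing import Callable, Dict, List, Optional, Set

# Assumption: a plain list stands in for pd.Series to keep this dependency-free.
Column = List[float]


def COLUMN_C(COLUMN_A: Column, COLUMN_B: Column) -> Column:
    # The parameter names are the abstract "requests"; nothing is looked up here.
    return [a + b for a, b in zip(COLUMN_A, COLUMN_B)]


def COLUMN_D(COLUMN_C: Column) -> Column:
    # Hypothetical second node, added only so the graph has more than one level.
    return [2 * c for c in COLUMN_C]


def resolve(
    name: str,
    functions: Dict[str, Callable[..., Column]],
    inputs: Dict[str, Column],
    cache: Optional[Dict[str, Column]] = None,
    in_progress: Optional[Set[str]] = None,
) -> Column:
    """Hypothetical resolver: match each requested name to an input or a function."""
    cache = {} if cache is None else cache
    in_progress = set() if in_progress is None else in_progress
    if name in inputs:
        return inputs[name]
    if name in cache:
        return cache[name]
    if name not in functions:
        raise KeyError(f"nothing provides {name!r}")
    if name in in_progress:
        raise ValueError(f"cycle detected at {name!r}")  # the graph must stay acyclic
    in_progress.add(name)
    fn = functions[name]
    # Each parameter name becomes an edge in the graph: a request to be resolved.
    kwargs = {
        param: resolve(param, functions, inputs, cache, in_progress)
        for param in inspect.signature(fn).parameters
    }
    in_progress.discard(name)
    cache[name] = fn(**kwargs)
    return cache[name]


# The "framework" path: functions are registered by name and wired together.
functions = {fn.__name__: fn for fn in (COLUMN_C, COLUMN_D)}
inputs = {"COLUMN_A": [1.0, 2.0], "COLUMN_B": [10.0, 20.0]}
wired = resolve("COLUMN_D", functions, inputs)

# The "no framework" path: call the function directly, e.g. in a unit test.
direct = COLUMN_C([1.0, 2.0], [3.0, 4.0])
```

The last line is the testability point: `COLUMN_C` is an ordinary function you can call with any two columns. The resolver also shows the brittleness I mentioned, since renaming the `COLUMN_A` parameter changes which dependency gets requested.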
Can you recommend any noteworthy resources for a data scientist who is interested in learning more about software engineering patterns like this?