|
|
|
|
|
by joeharris76
3851 days ago
|
|
We've struggled with the same issues. We also keep a "base" schema in Redshift as a close mirror to source systems (tweaked a little for distribution keys). In our case we control the extracts but schema changes have to be manually added. If a new column is important and we miss it then we have to run time consuming backfills. Then only way we've been able to ameliorate it somewhat is to be more involved in the upstream release process but that is time consuming in itself. The solution I'd like to implement is having the extracts and loads be built dynamically whenever a change is detected. E.g. have a process that runs separately checking the schema of our sources for changes, dynamically updating the ETL processes and then publishing them to Azkaban. Looking forward though our biggest data sources are going non-relational so that approach will entail a great deal more complexity. Regarding schema knowledge / metadata, this is a perennial problem. I once worked on a huge Teradata DW system at a global bank who had a large internal website for just this purpose which was hugely out of date at all times. When I started building out the analytic capability at my current company I made a fork of "Rocco" (https://github.com/joeharris76/rocco) for doing inline documentation as Markdown inside SQL block comments. http://pow.cx/docs/ is a good example of what I was hoping to achieve. However even our smallest components have many, many files and the docs were too difficult to navigate using the "Jump To…" dropdown menu. So, while this produces some very nice docs, sadly they turned out to be largely unhelpful to anyone who didn't already understand the code. Our current approach is to try and keep individual components small, include a comprehensive README in each component and use Slack comms to fill in the gaps. |
|
What you described with documentation efforts is kind of scenario I fear. We could spend significant effort putting together a documentation system that simply doesn't meet a real need. It's good to know other folks are also struggling with that, though.