by void_mint
1859 days ago
> what's the relation to bugs in logging and analytics?

I'm not sure what you mean. Software has bugs, data has bugs, etc. Being able to fix a bug and rerun a solution is important in all areas of software; it has nothing to do with logs or analytics specifically (though data and data-model questions are usually important to those domains).

> also, is there a good resource on how to backfill?

Not really, because "backfill" means something different to everyone who holds data. As a starting point, I would ask "What do we do if a lot of our data shows up incorrect?" and "What do we do if lots of our data goes missing?", and then solve the problems that arise from those questions in your own data stack.

As an example, at a previous job our ETL/ELT system was kicked off by a file showing up in an S3 bucket. The code that ingested the contents of those files occasionally had bugs that required reingesting all the data that had been processed by that version of the code. Having tools to identify (at the data level) which data was affected by the bug, and then being able to delete that data from the datastore and reingest only those S3 files with a newer version of the ingestion code, made these types of bugs much easier to manage over time.
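To make that workflow concrete, here is a rough sketch of the idea, not the actual system from that job. It assumes every stored row is tagged with the source file it came from and the version of the ingestion code that wrote it. The table layout, the `SOURCE_FILES` dict standing in for an S3 bucket, the version strings, and all function names are hypothetical and only there for illustration.

```python
import sqlite3

# Hypothetical stand-in for an S3 bucket: object key -> raw "id,amount" lines.
SOURCE_FILES = {
    "2020/01/a.csv": ["1,10", "2,20"],
    "2020/01/b.csv": ["3,30"],
}


def parse_buggy(line):
    # Hypothetical bug: the amount is accidentally multiplied by 10.
    record_id, amount = line.split(",")
    return int(record_id), int(amount) * 10


def parse_fixed(line):
    record_id, amount = line.split(",")
    return int(record_id), int(amount)


def ingest(conn, key, parser, version):
    # Tag every row with its source file and the code version that wrote it,
    # so affected data can be found again later.
    rows = [(key, version, *parser(line)) for line in SOURCE_FILES[key]]
    conn.executemany(
        "INSERT INTO records (source_file, code_version, record_id, amount) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )


def backfill(conn, bad_version, parser, new_version):
    # 1. Identify, at the data level, which source files the bad version touched.
    affected = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT source_file FROM records "
            "WHERE code_version = ? ORDER BY source_file",
            (bad_version,),
        )
    ]
    # 2. Delete the bad rows and reingest only those files, in one transaction.
    with conn:
        conn.execute("DELETE FROM records WHERE code_version = ?", (bad_version,))
        for key in affected:
            ingest(conn, key, parser, new_version)
    return affected


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "source_file TEXT, code_version TEXT, record_id INTEGER, amount INTEGER)"
    )
    ingest(conn, "2020/01/a.csv", parse_buggy, "1.0")  # written by the buggy release
    ingest(conn, "2020/01/b.csv", parse_fixed, "1.1")  # written after the fix
    affected = backfill(conn, "1.0", parse_fixed, "1.1")
    rows = conn.execute(
        "SELECT source_file, code_version, record_id, amount "
        "FROM records ORDER BY record_id"
    ).fetchall()
    conn.close()
    return affected, rows
```

The part that matters is the lineage columns (`source_file`, `code_version`). Without something like them there is no cheap way to tell which rows a given bug touched, and the only option left is reprocessing everything.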
|
The reason I asked about resources is that I have data generated by a personal project. The initial data model was sloppy, so now I'm finding myself having to backfill to clean the data, and it's rather painful. I haven't come across anything that deals with the subject, though, so I'm just winging it on my own.