Hacker News new | ask | show | jobs
by Avalaxy 1074 days ago
Tools-wise there are many ways to do it. From simple custom scripts to more advanced solutions like a data lakehouse. It depends on the budget and the needs I suppose.

> but I was warned the files may randomly break spec (new/renamed fields, etc) due to errors in the process that generates them. Is there a standard way to handle this type of thing?

This is a political decision. I think you have two options:

- Set a clearly defined schema, and validate against that. Whenever the input doesn't match the schema, put it in a separate 'error folder' and notify someone to edit the data so it can be retried.

- Accept schema changes. Either by having some sort of auto migrations, or by simply not defining a schema at all (like a document store).

Imho, the last option is not a good one if you want to build analytics on top of it, because you need to have some cleaned up and structured version of the data that you can rely on.