by mb14 1796 days ago
1. How do you deal with late-arriving events if a Python transformation needs to join data across two different sources?

2. Is hotglue able to handle large backfill jobs or jobs with arbitrary input data sets?

1 comment

Great question! Instead of dealing with race conditions like this, we encourage users to use our webhooks to create a workflow.

For example, if I wanted to sync my user's sales data and then relate it to the invoices from the accounting data, I could kick off a sync for the sales, and save that data into what we call a snapshot. From there, I would get a webhook when the sales data was ready, at which point I could start the sync of the accounting data.
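
To make that sequencing concrete, here is a minimal Python sketch. It is not hotglue's actual API: the `SyncWorkflow` class, the `start_sync` callback, the webhook payload fields (`source`, `status`, `rows`) and the `order_id` join key are all hypothetical names chosen for illustration. The point is only that the accounting sync starts once the sales webhook reports completion, so the join never sees half-arrived data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SyncWorkflow:
    """Starts the accounting sync only after the sales sync has completed."""

    # Hypothetical callback that kicks off a sync for the named source.
    start_sync: Callable[[str], None]
    # Completed data per source, standing in for a "snapshot".
    snapshots: Dict[str, List[dict]] = field(default_factory=dict)
    started: List[str] = field(default_factory=list)

    def begin(self) -> None:
        self._start("sales")

    def _start(self, source: str) -> None:
        self.started.append(source)
        self.start_sync(source)

    def on_webhook(self, event: dict) -> None:
        # Assumed payload shape: {"source": ..., "status": ..., "rows": [...]}.
        if event.get("status") != "completed":
            return
        source = event["source"]
        self.snapshots[source] = event["rows"]
        # Chain the second sync off the first one's completion.
        if source == "sales" and "accounting" not in self.started:
            self._start("accounting")

    def joined(self) -> List[dict]:
        # Attach each sale's invoice; sales without an invoice are skipped.
        invoices = {
            inv["order_id"]: inv for inv in self.snapshots.get("accounting", [])
        }
        return [
            {**sale, "invoice": invoices[sale["order_id"]]}
            for sale in self.snapshots.get("sales", [])
            if sale["order_id"] in invoices
        ]
```

In a real setup `on_webhook` would be called from whatever HTTP handler receives the webhook, and `joined()` would be called once the accounting webhook has also arrived.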

Yes, we can handle large backfill jobs (i.e., all the data up to the present), and then we incrementally sync new data.
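
As a rough illustration of the backfill-then-incremental pattern (a generic cursor-based sketch, not a description of how hotglue implements it), the function below treats a missing cursor as a full backfill and otherwise returns only records updated after the cursor. The `sync` function and the `updated_at` field are assumptions, and timestamps are assumed to be ISO 8601 strings in one consistent format so that string comparison matches time order.

```python
from typing import Iterable, List, Optional, Tuple


def sync(
    records: Iterable[dict], cursor: Optional[str] = None
) -> Tuple[List[dict], Optional[str]]:
    """Return the records newer than `cursor` plus the advanced cursor.

    A cursor of None means "no previous sync", i.e. a full backfill.
    """
    batch = [r for r in records if cursor is None or r["updated_at"] > cursor]
    if batch:
        # Advance the cursor to the newest record seen in this run.
        cursor = max(r["updated_at"] for r in batch)
    return batch, cursor
```

The first call, made without a cursor, returns everything; later calls pass back the cursor from the previous run and receive only what changed since then.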

Hopefully that answers your questions – happy to clarify.