by vklmn 1686 days ago
The same way as Kafka. Jitsu nodes (= collection agents) write to a write-ahead log, and then either send data to the destination right away or send it in batches.

Thanks! I take it this file is where I can get started to learn more:

https://github.com/jitsucom/jitsu/blob/0aaa74b59eb9d8c885c80...

I see that it instantiates an "AsyncLogger" - does the service wait until data is written to the log prior to returning success to the client?

Is the WAL the same source used to feed both database storage destinations and other SaaS destinations?

Hi! My name is Sergey, and I'm a Jitsu product engineer. I'll gladly answer your question! AsyncLogger is asynchronous by design: events pass through a Go channel and are written to the log file as JSON. So, to answer your question: the service doesn't wait until data is written to the log before returning success to the client. The WAL is designed to keep JSON events across Jitsu instance restarts to prevent data loss. When you deploy your Jitsu application, it handles service restart signals (e.g. SIGTERM) and closes database connections and other resources. All incoming events are stored in the WAL during this time. After Jitsu starts again, all events from the WAL are passed to the main event pipeline and stored in the destinations.
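To make the "doesn't wait" part concrete, here is a minimal sketch of a channel-backed logger, followed by a replay step like the one described for restarts. It is not Jitsu's actual code: the AsyncLogger type below, its Consume and Close methods, the replay helper, and the in-memory buffer standing in for the log file are all assumptions made for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Event is a hypothetical stand-in for one incoming JSON event.
type Event map[string]any

// AsyncLogger is an illustrative type, not Jitsu's real implementation.
type AsyncLogger struct {
	events chan Event
	done   chan error
}

// NewAsyncLogger starts a goroutine that drains the channel into w,
// writing one JSON object per line.
func NewAsyncLogger(w io.Writer, buffer int) *AsyncLogger {
	l := &AsyncLogger{events: make(chan Event, buffer), done: make(chan error, 1)}
	go func() {
		enc := json.NewEncoder(w) // Encode appends a newline after each value
		var firstErr error
		for e := range l.events {
			if err := enc.Encode(e); err != nil && firstErr == nil {
				firstErr = err
			}
		}
		l.done <- firstErr
	}()
	return l
}

// Consume enqueues the event and returns without waiting for the write.
// It only blocks if the channel buffer is full.
func (l *AsyncLogger) Consume(e Event) { l.events <- e }

// Close stops accepting events and waits until everything queued is written.
func (l *AsyncLogger) Close() error {
	close(l.events)
	return <-l.done
}

// replay reads the log back line by line, as a restart might do.
func replay(r io.Reader) ([]Event, error) {
	var out []Event
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		var e Event
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			return nil, err
		}
		out = append(out, e)
	}
	return out, sc.Err()
}

// writeAndReplay logs events to an in-memory "WAL", then reads them back.
func writeAndReplay(events []Event) ([]Event, error) {
	var wal bytes.Buffer // assumption: a real WAL would be a file on disk
	l := NewAsyncLogger(&wal, 16)
	for _, e := range events {
		l.Consume(e)
	}
	if err := l.Close(); err != nil {
		return nil, err
	}
	return replay(&wal)
}

func main() {
	got, err := writeAndReplay([]Event{{"event": "pageview"}, {"event": "signup"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), "events replayed")
}
```

The trade-off in a design like this is that an event acknowledged to the client may still be sitting in the channel if the process dies abruptly. A graceful shutdown, as in Close here, drains the channel first.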
Is the WAL only used during restart, or also during normal operations? Trying to create a mental model of how data flows through the system and into destinations.
During normal operations as well. Jitsu supports destinations in two modes: stream and batch. In batch mode, all JSON events are written to the WAL asynchronously (the client doesn't have to wait), and the batch destination then processes the WAL files in the background and stores the data in batches. In stream mode, all JSON events are placed in a persistent queue, processed one by one, and stored in the data warehouse with insert statements.
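As a rough mental model of the two modes, the sketch below contrasts them by counting insert calls. Every name here is hypothetical (Destination, processBatch, processStream, the batch size), the WAL is modeled as a slice, and the queue is a plain in-memory channel rather than the persistent queue Jitsu uses.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for one incoming JSON event.
type Event map[string]any

// Destination is an illustrative warehouse that counts insert calls.
type Destination struct {
	Rows    []Event
	Inserts int
}

// Insert stores one or more events in a single call.
func (d *Destination) Insert(batch ...Event) {
	d.Rows = append(d.Rows, batch...)
	d.Inserts++
}

// processBatch models batch mode: events already sitting in the WAL
// are loaded into the destination in chunks of batchSize.
func processBatch(wal []Event, d *Destination, batchSize int) {
	for start := 0; start < len(wal); start += batchSize {
		end := start + batchSize
		if end > len(wal) {
			end = len(wal)
		}
		d.Insert(wal[start:end]...)
	}
}

// processStream models stream mode: events are taken off the queue
// one by one, each stored with its own insert.
// Assumption: an in-memory channel replaces the persistent queue.
func processStream(queue <-chan Event, d *Destination) {
	for e := range queue {
		d.Insert(e)
	}
}

func main() {
	events := []Event{{"n": 1}, {"n": 2}, {"n": 3}, {"n": 4}, {"n": 5}}

	var batchDst Destination
	processBatch(events, &batchDst, 2)

	queue := make(chan Event, len(events))
	for _, e := range events {
		queue <- e
	}
	close(queue)
	var streamDst Destination
	processStream(queue, &streamDst)

	fmt.Println("batch inserts:", batchDst.Inserts, "stream inserts:", streamDst.Inserts)
}
```

Both paths end with the same rows in the destination. The difference is how many round trips it takes to get them there and how soon each event becomes visible.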