by frankacter 1020 days ago
That depends a lot on your environment, but I can generalize a few scenarios that are more common.

Apache Kafka, for example, is an open-source distributed event streaming platform that, among other things, provides mechanisms for data integration to help ensure end-to-end data transfer.

If it is log data, Apache Flume aggregates and moves large amounts of log data efficiently, and its transactional channels help ensure data is not lost during transfer.

Apache Spark Structured Streaming handles stream processing and can provide end-to-end exactly-once semantics (given a replayable source, checkpointing, and an idempotent sink), so data is neither lost nor duplicated during transfer.
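
To make "exactly-once" a bit less magical: in practice it tends to come from replaying from a checkpointed offset (at-least-once delivery) combined with a sink where a repeated write is harmless. Here is a toy sketch of that idea in plain Python. It is not Spark's API, and every name in it (IdempotentSink, run, fail_at) is made up for illustration.

```python
class IdempotentSink:
    """Stores records keyed by offset, so a replayed write is a no-op."""

    def __init__(self):
        self.rows = {}

    def write(self, offset, record):
        # Writing the same offset twice leaves the sink unchanged.
        self.rows[offset] = record


def run(source, sink, checkpoint, fail_at=None):
    """Process `source` starting from the checkpointed offset.

    `checkpoint` is a dict holding the next offset to read. It is only
    advanced after the write succeeds, so a crash causes a replay of
    that record, never a skipped one.
    """
    offset = checkpoint.get("next", 0)
    while offset < len(source):
        sink.write(offset, source[offset])
        if fail_at == offset:
            raise RuntimeError("simulated crash before checkpoint")
        checkpoint["next"] = offset + 1
        offset += 1


source = ["a", "b", "c"]
sink, checkpoint = IdempotentSink(), {}

try:
    run(source, sink, checkpoint, fail_at=1)  # crashes after writing offset 1
except RuntimeError:
    pass

run(source, sink, checkpoint)  # restart: offset 1 is replayed, not duplicated
```

The record at offset 1 is written twice, but because the sink is keyed by offset the end result contains each record once.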

Apache NiFi is another open-source ETL tool that transfers data between systems reliably while ensuring integrity through versioning, provenance, etc.

Python libraries like Tenacity help make data transfers fault tolerant by retrying on failure (rollback is something you still implement yourself), and Fleep can confirm a received file's format from its signature. Integrity can be checked through hashes.
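
As a rough sketch of the retry-plus-hash idea, using only the standard library: fetch and transfer_with_retry are made-up names, and fetch stands in for whatever actually pulls the bytes. Tenacity's retry decorator (with stop_after_attempt and wait_exponential) could replace the hand-rolled loop.

```python
import hashlib
import time


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def transfer_with_retry(fetch, expected_sha256, attempts=3, base_delay=0.0):
    """Call fetch() until the payload's hash matches, up to `attempts` times.

    `fetch` is a hypothetical zero-argument callable returning bytes.
    Raises RuntimeError if no attempt yields a payload with the expected hash.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            payload = fetch()
            if sha256_hex(payload) == expected_sha256:
                return payload
            last_error = ValueError("checksum mismatch")
        except OSError as exc:  # covers ConnectionError, TimeoutError, etc.
            last_error = exc
        # Exponential backoff between attempts (no sleep after the last one).
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"transfer failed after {attempts} attempts") from last_error
```

The expected hash has to come from the sender over a channel you trust; comparing against a hash computed from the same possibly-corrupted payload proves nothing.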

In Node.js, streams (and libraries built on them, such as StreamData) allow building fault-tolerant data pipelines; integrity still has to be verified separately, for example with hashes.

Azure Data Factory provides reliable data transfer mechanisms such as replication, retries, and monitoring to help ensure end-to-end transfer without data loss.