What do you usually want to happen with late data? In DD you have the option to ignore it at the source but not to update already-emitted results. Is the latter important for you?
In DD, you could also use the `Product` timestamp combinator to track both the time an event came from and the time you ingested it.
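To make the idea concrete, here is a minimal plain-Python model of a two-coordinate timestamp. It is not the DD or timely API: `ProductTime`, `less_equal`, and `frontier_admits` are hypothetical names made up for this sketch. It only assumes that a product timestamp is ordered coordinate by coordinate, and that a time can still show up as long as some frontier element is less than or equal to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTime:
    """Hypothetical stand-in for a two-coordinate (product) timestamp."""
    origin: int  # time the event says it happened
    ingest: int  # time we ingested the event

    def less_equal(self, other: "ProductTime") -> bool:
        # Product partial order: both coordinates must be <=.
        return self.origin <= other.origin and self.ingest <= other.ingest


def frontier_admits(frontier, t):
    """Return True if time `t` could still appear given `frontier`.

    `frontier` is a list of mutually incomparable ProductTime values;
    `t` is still possible if some frontier element is <= t.
    """
    return any(f.less_equal(t) for f in frontier)


# The ingestion coordinate has moved on to 5, the origin coordinate is
# deliberately held back at 0 so that late events remain admissible.
frontier = [ProductTime(origin=0, ingest=5)]
late_event = ProductTime(origin=2, ingest=6)
```

With this frontier, `late_event` is still admissible even though its origin time is old, because only the ingestion coordinate has advanced. Holding the origin coordinate back is what leaves room for late data.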
You can then make use of the data as soon as the frontier says it's current for the relevant ingestion timestamp, and occasionally advance the frontier for the origin timestamp at the input, so that arrangements can compact historic data. One example where this matters is a query that counts "distinct within some time window". It only has to keep that window's `distinct on` values around as long as you can still feed in events with timestamps in that window.
Once you no longer can, the `distinct on` values become irrelevant for this operator, and only the count for that window needs to be retained.
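A rough plain-Python sketch of that windowed distinct count, to show where the memory saving comes from. This is an illustration only, not how DD arrangements are implemented: the class `WindowedDistinctCount`, its methods, and the tumbling-window assumption are all invented for the example.

```python
class WindowedDistinctCount:
    """Toy model: distinct count per tumbling window over origin time."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.frontier = 0   # lower bound on origin times that may still arrive
        self.open = {}      # window start -> set of distinct values seen
        self.closed = {}    # window start -> final count (values dropped)

    def insert(self, origin_time, value):
        # Events behind the origin frontier were promised never to arrive.
        if origin_time < self.frontier:
            raise ValueError("origin time is behind the frontier")
        start = origin_time - origin_time % self.window_size
        self.open.setdefault(start, set()).add(value)

    def advance_origin_frontier(self, frontier):
        # Once a window lies entirely behind the frontier, no event can
        # change it, so keep only its count and discard the values.
        self.frontier = max(self.frontier, frontier)
        for start in sorted(self.open):
            if start + self.window_size <= self.frontier:
                self.closed[start] = len(self.open.pop(start))

    def count(self, start):
        if start in self.closed:
            return self.closed[start]
        return len(self.open.get(start, ()))


w = WindowedDistinctCount(window_size=10)
w.insert(1, "a")
w.insert(3, "b")
w.insert(12, "a")
w.insert(4, "a")               # late for window [0, 10), still accepted
w.advance_origin_frontier(10)  # window [0, 10) can no longer change
```

After the frontier advances to 10, the first window holds just an integer count, while the second window still keeps its set of values because events for it can still arrive.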
If I have to report transactions (i.e. money), then yes, I need to update already-emitted results. If it's just a log-based metric for internal use, then no.
What I would like to have is a choice, and Apache Beam, for example, lets you choose this.