|
|
|
|
|
by pentium10
3097 days ago
|
|
During the 4 years we at REEA.net tried a lot of approaches, in different projects.
1. We stream every upate as a new row directly to the table. We have for every table a view that has a naming convention "table_latest" and it targets latest version of the record. We didn't see a big latency increase.
Deleted rows have a 'deleted' flag, we use EXCEPT in the query.
2. Since DML is available we started on newer projects dual aproach, having a dedicated streaming table, and rematerializing to a flat table. It adds developer complexity but it's a win, as the materialized table is easier to be used by our marketing team.
We added to the reports "Based on data 10 minutes ago" label, and a CTA "re-run using live data", which in background calls the script to rematerialize the table and rerun the report. Our markting team loves this, and they don't abuse this functionality. On some projcts we have remateralization every 1h, on others daily.
3. regarding partitioned tables where you can stream only 7 days of window into their respective partitions we use also a complementary load job. We stream in the 7 day window. And we have a heavy "future date" use case, where we use "load jobs" to a temp table, then query and write to their respective partitions. |
|