by kppullin
1371 days ago
We currently use the Kafka Connect BigQuery connector, along with Debezium, both to "stream" the "changelog"/"transaction log" and to "mirror" our RDBMS instances into BigQuery. While this works, it's taken a fair amount of effort to iron out issues over time. We've also had to work around BigQuery limits, including exceeding the concurrent query limit (we switched to batch mode, which has its own issues) and the frequency of writes (we've had to throttle to flushing every minute, which is good enough, though we did have a use case for faster updates). We also have issues related to partitioning and clustering, and more... So seeing this as a potential replacement for the Kafka Connect BigQuery connector looked appealing. However, according to the docs and listed limitations (https://cloud.google.com/datastream/docs/sources-postgresql), it handles neither schema changes nor Postgres array types well. Not that any of these tools handle this well, but since the BigQuery connector is open source, we've been able to work around this with customizations to the code. Hopefully they'll continue to iterate on the product, and I'll be keeping an eye out.
|
(case when string then varchar(4000/max))
and similar. It looks like a relatively easy thing to incrementally improve.
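To make the "case when string then varchar(4000/max)" idea concrete, here is a minimal sketch of that kind of type-mapping rule. Everything in it is an assumption for illustration: the function name `target_type`, the source type names, the 4000-character threshold, and the fallback choices are hypothetical and not taken from any actual product or connector.

```python
from typing import Optional

# Hypothetical mapping for non-string source types (illustrative only).
SIMPLE_TYPES = {
    "integer": "bigint",
    "boolean": "bit",
    "float": "float",
}


def target_type(source_type: str, max_length: Optional[int] = None) -> str:
    """Pick a target column type for a source type name.

    Strings with a known length of at most 4000 become varchar(4000);
    strings with an unknown or larger length become varchar(max).
    """
    if source_type == "string":
        if max_length is not None and max_length <= 4000:
            return "varchar(4000)"
        return "varchar(max)"
    # Unknown types fall back to the widest string type (an assumption).
    return SIMPLE_TYPES.get(source_type, "varchar(max)")
```

For example, `target_type("string", 255)` gives `varchar(4000)`, while `target_type("string")` gives `varchar(max)`. Handling one more type incrementally is then just one more entry or branch.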