by ashug 872 days ago
I'm not familiar with Kinesis's sink APIs, but yes, I'd imagine you'll have to write your own connector from scratch.

To answer your question, though, no: in the Kafka connector, the frequency of inserts into ClickHouse is configurable relatively independently of the batch size, so you don't need massive scale for real-time CH inserts. To save you a couple of hours, here's an example config for the connector:

  # Snippet from connect-distributed.properties

  # Max bytes per batch: 1 GB
  fetch.max.bytes=1000000000
  consumer.fetch.max.bytes=1000000000
  max.partition.fetch.bytes=1000000000
  consumer.max.partition.fetch.bytes=1000000000

  # Max age per batch: 2 seconds
  fetch.max.wait.ms=2000
  consumer.fetch.max.wait.ms=2000

  # Max records per batch: 1 million
  max.poll.records=1000000
  consumer.max.poll.records=1000000

  # Min bytes per batch: 500 MB
  fetch.min.bytes=500000000
  consumer.fetch.min.bytes=500000000

You also might need to increase `message.max.bytes` on the broker/cluster side.
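
To make the interaction between the "min bytes" and "max age" settings concrete, here's a rough back-of-the-envelope model in Python. It relies on the documented Kafka consumer behavior that a fetch is answered once `fetch.min.bytes` are available or `fetch.max.wait.ms` has elapsed, whichever comes first. The function name `fetch_wait_ms` and the throughput figures are made up for illustration, and the model ignores `max.poll.records`, per-partition limits, and any buffering inside the connector, so treat it as a sketch rather than a description of what the connector actually does.

```python
# Values copied from the config above. The function is a hypothetical model,
# not part of Kafka or the ClickHouse connector.
FETCH_MIN_BYTES = 500_000_000   # fetch.min.bytes
FETCH_MAX_WAIT_MS = 2_000       # fetch.max.wait.ms


def fetch_wait_ms(bytes_per_sec: float,
                  min_bytes: int = FETCH_MIN_BYTES,
                  max_wait_ms: int = FETCH_MAX_WAIT_MS) -> float:
    """Approximate how long a fetch is held before it is answered.

    The fetch is answered as soon as min_bytes are available, or when
    max_wait_ms has elapsed, whichever comes first.
    """
    if bytes_per_sec <= 0:
        # No data arriving: only the timeout can end the wait.
        return float(max_wait_ms)
    ms_to_fill = min_bytes / bytes_per_sec * 1000
    return min(float(max_wait_ms), ms_to_fill)


if __name__ == "__main__":
    # Assumed topic throughputs, purely for illustration.
    for rate in (1_000_000, 250_000_000, 1_000_000_000):
        print(f"{rate:>13,} B/s -> fetch answered after ~{fetch_wait_ms(rate):.0f} ms")
```

Under this model, a low-volume topic is bounded by the 2-second timeout rather than the 500 MB minimum, which is why small deployments can still get near-real-time inserts with these settings.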

If you're still deciding, I'd recommend Kafka over Kinesis because (1) it's open source, so you have more options, e.g. self-hosting, Confluent, or AWS MSK, and (2) it has a much bigger community, meaning better support, more Stack Overflow answers, a plug-and-play CH Kafka connector, etc.

1 comment

Thanks, these configs are helpful.