by alexatkeplar 3871 days ago
Pub/Sub is pretty meh. The Google Cloud team still don't seem to understand the need for a Kafka-like unified log service, unlike AWS (Kinesis is 2 years old this week) or even IBM Bluemix (who just launched Message Hub, which is true hosted Kafka).

Depending on the sort of logging you're looking for, there's a logging API (https://cloud.google.com/logging/docs/api/) and you can also stream to BigQuery.

(Disclaimer: I work on GCP)

This is great, and I keep expecting it to replace ELK for me, but so far I still need ELK. Could you build a dashboard system that runs off BigQuery?
One thing you gotta remember is that Pub/Sub charges by volume, regardless of speed (in other words, scaling is free). AWS will charge you amounts that vary by orders of magnitude depending on scale, in addition to volume.
What's meh about it versus Kinesis?
Cloud Pub/Sub is really a competitor to Amazon SQS, not Kinesis. It's more helpful to think of Kafka and Kinesis as databases containing first-class, immutable streams; writing to the streams and reading from the streams are completely decoupled, unlike in a traditional pub-sub system. Jay Kreps' blog post explains it better than I can:

https://engineering.linkedin.com/distributed-systems/log-wha...

In my limited Pub/Sub experience, this seems to be how it works. You publish to a topic (an immutable stream), and then create a decoupled subscription that reads messages from the topic. Am I missing something?
I think this sentence [1] helps to explain the difference:

> When you create a subscription, the system establishes a sync point. That is, your subscriber is guaranteed to receive any message published after this point.

[1] https://cloud.google.com/pubsub/subscriber

With Kafka or Kinesis, I can write events to a stream/topic completely independently of any consumer. I can then bring as many consumers online as I want, and they can start processing from the beginning of my stream if they want. If one of my consumers has a bug in it, I can ask it to go back and start again. That's what I mean by an immutable stream in Kafka or Kinesis.
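
To make that concrete, here is a toy in-memory sketch of the idea. It is not the Kafka or Kinesis API; `Log`, `Consumer`, `poll` and `rewind` are made-up names for illustration. The point it tries to show is that the log keeps every record regardless of who is reading, and each consumer only owns an offset it can move back.

```python
class Log:
    """Toy append-only log: readers never change or remove records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=100):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset; the log knows nothing about it."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def rewind(self, offset=0):
        # "Go back and start again", e.g. after fixing a consumer bug.
        self.offset = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)            # written before any consumer exists

late = Consumer(log)             # brought online afterwards
first_pass = late.poll()         # still reads from the beginning
late.rewind()
second_pass = late.poll()        # same records again
```

Both passes return the full stream, because reading never consumes anything.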

Cloud Pub/Sub engineer here. You can create as many consumers as you want. You can create them offline and bring them up and down whenever you want. Each consumer will receive a full copy of the stream, starting with its sync point (subscriber creation). Each message is delivered, and redelivered, to each consumer until that consumer acks that message.
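
For comparison, the behavior described here (sync point at subscription creation, redelivery until ack) could be modeled roughly as below. This is a toy sketch under those stated assumptions, not the Cloud Pub/Sub client library; `Topic`, `Subscription`, `pull` and `ack` are hypothetical names chosen for the example.

```python
class Subscription:
    """Holds a copy of every message published after it was created."""

    def __init__(self):
        self._next_id = 0
        self._unacked = {}  # ack_id -> message, in delivery order

    def _deliver(self, message):
        self._unacked[self._next_id] = message
        self._next_id += 1

    def pull(self):
        # Anything not yet acked is handed out again on every pull.
        return list(self._unacked.items())

    def ack(self, ack_id):
        self._unacked.pop(ack_id, None)


class Topic:
    def __init__(self):
        self._subscriptions = []

    def subscribe(self):
        sub = Subscription()          # this moment is the sync point
        self._subscriptions.append(sub)
        return sub

    def publish(self, message):
        # Only subscriptions that already exist receive the message.
        for sub in self._subscriptions:
            sub._deliver(message)


topic = Topic()
topic.publish("before")               # no subscription yet, so it is not kept
sub = topic.subscribe()
topic.publish("after-1")
topic.publish("after-2")

first_pull = sub.pull()
sub.ack(first_pull[0][0])             # ack only the first message
second_pull = sub.pull()              # the unacked one is redelivered
```

In this model the "before" message is unreachable for the new subscriber, and an acked message cannot be read again, which is the gap being discussed.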

If I understand your point correctly, the only expectation we haven't matched is the ability to "go back and start again". We hear you.

From your comment it sounds like you haven't used Kinesis or Kafka yourself - rather than take my word for it, I'd suggest your team give both of those platforms a serious try-out to really understand the capability gaps. I'd be surprised if a lot of your [prospective] customers weren't asking for these kinds of unified log capabilities in Cloud Pub/Sub.
I'm not familiar with Kafka, so two questions:

1. Can you direct a consumer to a point in the stream (ideally time-based, e.g. messages from 16 Nov UTC)?

2. Can old events be removed automatically according to rules?

I haven't played with Kafka in a while, but basically:

1. Each group id represents a point in the stream that a consumer is processing from. You could technically have multiple processes consuming under a single group id.

2. There was a configuration for how long to keep data, as well as for how much space to use, if I remember correctly. There has to be: there's a pretty hard limit on how much you can store on disk.
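
A rough sketch of what both answers amount to, as a toy in-memory model rather than Kafka's actual API or configuration: records expire by age or by count, and a timestamp can be mapped to an offset to start reading from. `RetainedLog`, `max_age_seconds`, `max_records` and `offset_for_time` are hypothetical names, and timestamps are assumed to be non-decreasing.

```python
import bisect


class RetainedLog:
    def __init__(self, max_age_seconds, max_records):
        self.max_age_seconds = max_age_seconds
        self.max_records = max_records
        self._timestamps = []
        self._records = []
        self._base_offset = 0  # offset of the oldest record still retained

    def append(self, timestamp, record):
        self._timestamps.append(timestamp)
        self._records.append(record)
        self._expire(now=timestamp)

    def _expire(self, now):
        # Drop from the head while a record is too old or the log is too big.
        drop = 0
        while drop < len(self._records) and (
            now - self._timestamps[drop] > self.max_age_seconds
            or len(self._records) - drop > self.max_records
        ):
            drop += 1
        del self._timestamps[:drop]
        del self._records[:drop]
        self._base_offset += drop

    def offset_for_time(self, timestamp):
        """Offset of the first retained record at or after `timestamp`."""
        return self._base_offset + bisect.bisect_left(self._timestamps, timestamp)

    def read_from(self, offset):
        start = max(offset - self._base_offset, 0)
        return self._records[start:]


retained = RetainedLog(max_age_seconds=60, max_records=3)
for ts, rec in [(0, "a"), (10, "b"), (20, "c"), (30, "d"), (80, "e")]:
    retained.append(ts, rec)

still_there = retained.read_from(0)        # "a" and "b" have expired
start = retained.offset_for_time(25)       # first record at or after t=25
from_time = retained.read_from(start)
```

Here "a" is dropped by the size rule and "b" by the age rule, while offsets of the surviving records stay stable.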

Edit: changed consumer id to group id. If you want more info about the ecosystem, feel free to ping me.

Take a look at the Pub/Sub + Google Cloud Dataflow combo.