Cloud messaging systems are an integral part of any organization’s communication ecosystem. They let different system components communicate in a decoupled manner, and they support the scalability and reliability that distributed, modern-day applications need to function seamlessly.
In today’s article we look at and compare two prominent cloud-based messaging systems, Apache Kafka and Google Pub/Sub: their key features, key differences, and use cases.
What is Apache Kafka
Apache Kafka is a distributed streaming platform, originally developed at LinkedIn, that is designed to handle high-throughput, real-time data feeds. It is based on a publish-subscribe model: publishers send messages to a topic, and subscribers receive messages from that topic. Kafka runs on a cluster of brokers, with each topic's partitions spread across the nodes in the cluster. Data streams are published to topics via APIs.
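To make the topic and partition idea concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the `Topic` class, its method names, and the CRC32-based key hashing are all assumptions chosen for illustration (real Kafka clients ship their own partitioner). It only shows how records with the same key land in the same partition and keep their order there.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash so the same key always maps to the same partition.
        # CRC32 is a stand-in here, not the hash a real Kafka client uses.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        """Append a record and return (partition, offset) where it was stored."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("page-views", num_partitions=3)
users = ["alice", "bob", "alice", "carol", "alice"]
placements = [topic.publish(user, f"view-{i}") for i, user in enumerate(users)]

# All of alice's records sit in one partition, in publish order.
alice_partition = topic.partition_for("alice")
alice_records = [v for k, v in topic.read(alice_partition, 0) if k == "alice"]
print(alice_records)
```

Ordering is only guaranteed within a partition in this model, which is why the key-to-partition mapping has to be deterministic.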
Key Features of Apache Kafka
- Handles high volumes of data efficiently
- Scalability and fault tolerance are provided by a cluster of servers
- Data is stored on disk and replicated within the cluster for reliability
- Supports a wide range of use cases and complex processing requirements
Use Cases for Apache Kafka
- Workloads that need advanced features such as stream processing, partitioning, and replication
- Distributed, real-time stream processing
- Long-term storage and replay of messages for analysis
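The storage-and-replay use case follows from one design choice: reading a message does not delete it, and each consumer tracks its own position. The sketch below models that with plain Python lists. The `Log` and `Consumer` classes and the `poll`/`seek` names are illustrative assumptions, not a real client library.

```python
class Log:
    """Append-only record log; reads never remove data."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


class Consumer:
    """Tracks its own read position (offset) independently of the log."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding the offset is all it takes to replay stored messages.
        self.offset = offset


log = Log()
for reading in [21.5, 22.0, 22.4, 23.1]:
    log.append(reading)

live = Consumer(log)
first_pass = live.poll()

live.seek(0)            # replay from the beginning for a later analysis
replayed = live.poll()

late_joiner = Consumer(log, start_offset=2)  # a second, independent reader
tail = late_joiner.poll()
```

Because positions live with the consumer, a new analysis job can start from any retained offset without affecting other readers.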
What is Google Pub/Sub
Google Pub/Sub is a messaging service from Google Cloud. It is a scalable, fully managed messaging system that enables asynchronous, decoupled communication between cloud applications. Pub/Sub is based on the publish-subscribe model and supports both push and pull message delivery. Messages are retained until they are acknowledged. Publishers and pull subscribers make HTTPS calls to Google APIs. The service scales automatically, load is distributed across Google data centers, and users are charged based on data volume.
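The "retained until acknowledged" behavior in pull delivery can be sketched with a small in-memory model. This is not the Google Cloud client library: the `Subscription` class, the integer `now` clock, and the 10-unit acknowledgement deadline are assumptions made for illustration. A pulled message is leased to the subscriber; if it is not acknowledged before the lease expires, it becomes eligible for delivery again.

```python
class Subscription:
    """Toy pull subscription: messages stay pending until acknowledged."""

    def __init__(self, ack_deadline=10):
        self.ack_deadline = ack_deadline
        self.pending = {}       # message_id -> payload, kept until acked
        self.leased_until = {}  # message_id -> time the current lease expires
        self.next_id = 0

    def publish(self, payload):
        message_id = self.next_id
        self.next_id += 1
        self.pending[message_id] = payload
        return message_id

    def pull(self, now, max_messages=10):
        """Return pending messages that are not currently leased."""
        delivered = []
        for message_id, payload in self.pending.items():
            if self.leased_until.get(message_id, 0) <= now:
                self.leased_until[message_id] = now + self.ack_deadline
                delivered.append((message_id, payload))
                if len(delivered) == max_messages:
                    break
        return delivered

    def ack(self, message_id):
        # Acknowledged messages are dropped and never delivered again.
        self.pending.pop(message_id, None)
        self.leased_until.pop(message_id, None)


sub = Subscription(ack_deadline=10)
a = sub.publish("order-created")
b = sub.publish("order-paid")

first = sub.pull(now=0)            # both messages delivered
sub.ack(a)                         # only the first one is acknowledged
during_lease = sub.pull(now=5)     # nothing: b is still leased
after_deadline = sub.pull(now=10)  # b is redelivered
sub.ack(b)
drained = sub.pull(now=20)
```

The redelivery at `now=10` is the reason a subscriber can receive the same message more than once.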
Key Features of Google Pub/Sub
- Fully managed service from Google, with no underlying infrastructure to manage
- Automatic scaling to meet application requirements
- Seamless integration with other Google services
- At-least-once message delivery
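At-least-once delivery means a subscriber may occasionally see the same message twice, so handlers are usually written to be idempotent. The following is a minimal sketch of one common approach, deduplicating on a message ID. The `IdempotentHandler` class, the message IDs, and the amounts are hypothetical, and a production system would keep the seen-ID set in durable storage rather than in memory.

```python
class IdempotentHandler:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.balance += amount
        return True


# A delivery stream in which m1 and m2 each arrive twice.
deliveries = [("m1", 50), ("m2", 25), ("m1", 50), ("m3", -10), ("m2", 25)]

handler = IdempotentHandler()
applied = [handler.handle(message_id, amount) for message_id, amount in deliveries]

print(applied)          # which deliveries changed state
print(handler.balance)  # total after duplicates are ignored
```

Without the ID check, the duplicate deliveries would be counted twice and the final balance would be wrong.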
Use Cases for Google Pub/Sub
- Fully managed messaging for asynchronous, decoupled communication
- Microservices architecture
- Event driven systems
- Simple and reliable communication system
Comparison: Apache Kafka vs Google Pub/Sub
Parameter | Apache Kafka | Google Pub/Sub |
Architecture | Distributed streaming platform | Fully managed messaging service |
Scalability | Designed for high-throughput, real-time data feeds; ideal for large-scale deployments | Designed for scalability; handles real-time data feeds and scales automatically as a managed service |
Persistence | Supports long-term storage of messages on disk | Retains messages only until they are acknowledged or the retention period expires; not intended for long-term storage |
Features | Rich feature set, including partitioning, replication, and stream processing | Focused on reliable delivery of messages |
Usage | Ideal for large-scale data processing, real-time data streaming, and data processing pipelines | Ideal for asynchronous, decoupled communication between cloud applications |
Application | Data analytics, log aggregation, and real-time monitoring | Microservices architectures, IoT applications, and event-driven applications |
Management | Requires you to manage a cluster | Fully managed Google service; you need not worry about the underlying infrastructure |
Messaging Guarantee | At least once with the normal connector; exactly once with the Spark direct connector | At least once |
Throughput | ~30,000 messages/sec | Default: 100 MB/s in, 200 MB/s out; maximum is quoted as unlimited |
Configurable Persistence Period | No maximum period is defined | Retained for up to 7 days (the default), or until the subscriber acknowledges the message |
Replication | Replicas are configurable. A publish can be acknowledged on send, on receipt, or after successful replication | A publish is acknowledged once half of the disks in the cluster have the message |
Languages Supported | Java, Go, Scala, Python, C++, .NET, .NET Core, Node.js, PHP, Ruby, Spark, etc. | Java, Go, .NET, .NET Core, Ruby, Python, Spark |
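The throughput figures in the table use different units (messages per second for Kafka, megabytes per second for Pub/Sub), so they cannot be compared directly. The short calculation below converts between the two under assumed message sizes. The message sizes and the decimal definition of a megabyte are assumptions for illustration only; the 30,000 messages/sec and 100 MB/s figures are taken from the table, not measured.

```python
MB = 1_000_000  # assumption: decimal megabyte

KAFKA_MESSAGES_PER_SEC = 30_000      # figure quoted in the table
PUBSUB_INBOUND_BYTES_PER_SEC = 100 * MB   # default inbound figure from the table


def messages_per_second(bytes_per_second, message_size_bytes):
    """Whole messages that fit into a byte-rate budget each second."""
    return bytes_per_second // message_size_bytes


def bytes_per_second(messages_per_sec, message_size_bytes):
    """Byte rate implied by a message rate at a fixed message size."""
    return messages_per_sec * message_size_bytes


# Assumed message sizes in bytes: small event, typical JSON record, large payload.
for size in (100, 1_000, 10_000):
    pubsub_rate = messages_per_second(PUBSUB_INBOUND_BYTES_PER_SEC, size)
    kafka_rate_mb = bytes_per_second(KAFKA_MESSAGES_PER_SEC, size) / MB
    print(f"{size:>6} B/msg: Pub/Sub inbound ~{pubsub_rate:,} msg/s, "
          f"Kafka figure ~{kafka_rate_mb:g} MB/s")
```

Which number looks larger depends entirely on the message size, so the two quoted figures are best read as rough indications rather than a like-for-like benchmark.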