Apache Kafka vs Google Pub/Sub: Understand the difference

Cloud messaging systems are an integral part of an organization’s communication ecosystem. They let different system components communicate in a decoupled manner, and they support the scalability and reliability that modern distributed applications depend on.

In this article we compare two prominent messaging systems, Apache Kafka and Google Pub/Sub: their key features, key differences, and use cases.

What is Apache Kafka  

Apache Kafka is a distributed streaming platform, originally developed at LinkedIn, that is designed to handle high-throughput, real-time data feeds. It is based on the publish-subscribe model: publishers send messages to a topic, and subscribers receive messages from that topic. Kafka runs on a cluster of brokers, with each topic's partitions spread across the nodes in the cluster. Data streams are published to topics via APIs.
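The topic and partition layout described above can be pictured with a toy model. The sketch below is not a Kafka client: `ToyTopic` is a hypothetical in-memory class, and the CRC32 hash is only a stand-in for whatever partitioner a real client uses. It shows one idea, namely that records with the same key land in the same partition and receive increasing offsets there.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic split into partitions (not a Kafka client)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hash the key to pick a partition, so equal keys share a partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        # The offset is the record's position within its partition.
        return index, len(self.partitions[index]) - 1


topic = ToyTopic(num_partitions=3)
keys = ["user-a", "user-b", "user-a", "user-c", "user-a"]
placements = [topic.publish(key, f"event-{i}") for i, key in enumerate(keys)]

for key, (partition, offset) in zip(keys, placements):
    print(f"{key} -> partition {partition}, offset {offset}")
```

In a real cluster each partition would live on a broker and be replicated to others; the toy keeps everything in one process to make the key-to-partition mapping visible.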

Key Features of Apache Kafka 

  • Handles high volumes of data efficiently
  • Scalability and fault tolerance are provided by a cluster of servers
  • Data is stored on disk and replicated within the cluster for reliability
  • Supports a wide range of use cases and complex processing requirements

Use Cases for Apache Kafka 

  • Workloads that need advanced features such as stream-based processing, partitioning, and replication
  • Distributed streaming and real-time data processing
  • Storage and replay of messages for long-term analysis
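Storage and replay can be illustrated with a small append-only log. This is a minimal sketch under stated assumptions: `ToyLog` and the sample readings are hypothetical, and a real Kafka consumer would track its position through the client library rather than a list slice. The point is that reading does not remove records, so a later job can start again from an earlier offset.

```python
class ToyLog:
    """Append-only in-memory log; records stay in place after being read."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading never deletes anything, so any consumer can re-read
        # from any earlier offset.
        return self.records[offset:]


log = ToyLog()
for reading in [21.5, 22.0, 22.4, 23.1]:
    log.append(reading)

live_view = log.read_from(3)    # a consumer that only wants the newest record
full_replay = log.read_from(0)  # a later analysis job replaying everything
```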

What is Google Pub/Sub  

Google Pub/Sub is a messaging service from Google Cloud. It is a scalable, fully managed messaging system that enables asynchronous, decoupled communication between cloud applications. Pub/Sub is based on the publish-subscribe model and supports both push and pull message delivery. Messages are retained until they are acknowledged. Publishers and pull subscribers interact with the service through Google API calls over HTTPS. Pub/Sub scales automatically, load is distributed across Google data centers, and users are charged based on the volume of data.
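The retain-until-acknowledged behavior can be sketched with a toy pull subscription. This is not the Google Cloud client library: `ToySubscription` and its methods are hypothetical, and the model leaves out push delivery and acknowledgement deadlines. It only shows that a message keeps being handed out until the subscriber acknowledges it.

```python
class ToySubscription:
    """In-memory stand-in for a pull subscription (not the Pub/Sub client)."""

    def __init__(self):
        self._next_id = 0
        self._unacked = {}  # message id -> payload, kept until acknowledged

    def publish(self, payload):
        message_id = self._next_id
        self._next_id += 1
        self._unacked[message_id] = payload
        return message_id

    def pull(self):
        # Every unacknowledged message is handed out again on each pull.
        return sorted(self._unacked.items())

    def ack(self, message_id):
        # Acknowledging a message removes it from the store.
        self._unacked.pop(message_id, None)


sub = ToySubscription()
sub.publish("order-created")
sub.publish("order-paid")

first_pull = sub.pull()
sub.ack(first_pull[0][0])  # only the first message is acknowledged
second_pull = sub.pull()   # the unacknowledged one is delivered again
```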

Key Features of Google Pub/Sub

  • Fully managed service from Google, with no underlying infrastructure to manage
  • Automatic scaling to meet application requirements
  • Seamless integration with other Google Cloud services
  • At-least-once message delivery
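Because delivery is at least once, a subscriber may receive the same message more than once and usually needs to handle duplicates itself. The sketch below shows one common approach, remembering which message ids have already been handled. The function name, the `(id, payload)` tuples, and the sample stream are all assumptions made for illustration.

```python
def process_once(deliveries, handler):
    """Run handler once per message id, skipping redelivered duplicates."""
    seen = set()
    for message_id, payload in deliveries:
        if message_id in seen:
            continue  # duplicate delivery: already handled
        seen.add(message_id)
        handler(payload)


# Hypothetical delivery stream in which message 2 arrives twice.
deliveries = [(1, "charge $10"), (2, "charge $5"), (2, "charge $5"), (3, "charge $7")]
handled = []
process_once(deliveries, handled.append)
print(handled)
```

In a long-running service the set of seen ids would need to live somewhere durable and be pruned over time; the in-memory set here only demonstrates the idea.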

Use Cases for Google Pub/Sub

  • Fully managed messaging for asynchronous, decoupled communication
  • Microservices architectures
  • Event-driven systems
  • Simple, reliable communication between applications

Comparison: Apache Kafka vs Google Pub/Sub

Parameter | Apache Kafka | Google Pub/Sub
Architecture | Distributed streaming platform | Fully managed messaging service
Scalability | Designed for high-throughput, real-time data feeds; ideal for large-scale deployments | Scales automatically and can handle real-time data feeds; capacity is governed by service quotas rather than cluster sizing
Persistence | Supports long-term storage of messages on disk | Retains messages only until they are acknowledged or the retention period expires; not intended as long-term storage
Features | Rich feature set, including partitioning, replication, and stream-based processing | Focused on reliable delivery of messages
Usage | Ideal for large-scale data processing, real-time data streaming, and data processing pipelines | Ideal for asynchronous, decoupled communication between applications in the cloud
Application | Data analytics, log aggregation, and real-time monitoring | Microservices architectures, IoT applications, and event-driven applications
Management | You manage the cluster yourself | Fully managed by Google; no underlying infrastructure to manage
Messaging guarantee | At least once with the normal connector; exactly once with the Spark direct connector | At least once
Throughput | ~30,000 messages/sec | Default: 100 MB/s in, 200 MB/s out; maximum is quoted as unlimited
Configurable persistence period | No maximum period defined | 7 days by default, or until the subscriber acknowledges the message
Replication | Replica count is configurable. A publish can be acknowledged on send, on receipt, or after successful replication | A publish is acknowledged once half of the disks in the cluster have the message
Languages supported | Java, Go, Scala, Python, C++, .NET, .NET Core, Node.js, PHP, Ruby, Spark, etc. | Java, Go, .NET, .NET Core, Ruby, Python, Spark
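The throughput figures in the table use different units (messages per second for Kafka, megabytes per second for Pub/Sub), so comparing them requires an assumption about message size. The sketch below does that conversion; the 1,000-byte message size is purely an assumed value, not a figure from the comparison.

```python
def megabytes_per_second(messages_per_second, message_size_bytes):
    """Convert a message rate to MB/s (1 MB = 1,000,000 bytes)."""
    return messages_per_second * message_size_bytes / 1_000_000


# 1,000-byte messages are an assumed size, not a figure from the table.
kafka_mb_per_s = megabytes_per_second(30_000, 1_000)
print(kafka_mb_per_s)  # 30.0
```

With larger or smaller messages the result scales proportionally, which is why a messages-per-second figure alone says little about data volume.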
