Modern cloud architectures decouple applications into smaller, independent modules (building blocks) that are easier to develop, deploy, and maintain. Publish/subscribe (pub/sub) messaging provides near-instant event notification for distributed applications hosted on cloud platforms. The model lets messages be broadcast asynchronously to different parts of a system.
In this article we will learn about the Google Cloud Pub/Sub architecture, how it works, and its main use cases.
What is Google Cloud Pub/Sub?
Pub/Sub allows asynchronous communication between services, with latencies on the order of 100 milliseconds. It is used in streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as message-oriented middleware for service integration or as a queue for parallelizing tasks.
Pub/Sub lets you build systems of event producers and consumers, known as publishers and subscribers. Publishers communicate with subscribers asynchronously by broadcasting events rather than through synchronous remote procedure calls (RPCs). Publishers send events to the Pub/Sub service, and Pub/Sub delivers them to all the services that need to react to them. Because publishers do not wait on subscribers, this improves the overall responsiveness and robustness of the system.
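The fan-out behaviour described above can be illustrated with a minimal, single-process sketch. It is a toy model rather than the Google Cloud client library: the class `InMemoryPubSub`, its methods, and the topic and subscription names are all hypothetical and exist only to show how a publisher hands a message to the service once while every attached subscription receives its own copy.

```python
from collections import defaultdict, deque


class InMemoryPubSub:
    """Toy, in-memory model of topic -> subscription fan-out (hypothetical)."""

    def __init__(self):
        # topic name -> list of subscription names attached to it
        self._subscriptions = defaultdict(list)
        # subscription name -> queue of messages not yet pulled
        self._queues = {}

    def subscribe(self, topic, subscription):
        self._subscriptions[topic].append(subscription)
        self._queues[subscription] = deque()

    def publish(self, topic, message):
        # The publisher never calls a subscriber directly. It hands the
        # message to the service, which copies it to every subscription.
        for subscription in self._subscriptions[topic]:
            self._queues[subscription].append(message)

    def pull(self, subscription):
        # Return the next pending message, or None if nothing is waiting.
        queue = self._queues[subscription]
        return queue.popleft() if queue else None


bus = InMemoryPubSub()
bus.subscribe("orders", "billing")
bus.subscribe("orders", "shipping")
bus.publish("orders", "order-42 created")

billing_msg = bus.pull("billing")
shipping_msg = bus.pull("shipping")
```

Both `billing_msg` and `shipping_msg` hold the same event, even though the publisher sent it once and knows nothing about either consumer.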
Pub/Sub Service Types
Pub/Sub is offered as two services:
- Pub/Sub service – The default choice for most users and applications. It offers the highest reliability and the largest set of integrations, along with automatic capacity management. It guarantees synchronous replication of all data to at least two zones and best-effort replication to a third zone.
- Pub/Sub Lite service – Built for lower cost, with lower reliability than the Pub/Sub service. It offers either zonal or regional topic storage: zonal Lite topics are stored in only one zone, while regional Lite topics are replicated to a second zone asynchronously. You must pre-provision and manage storage and throughput capacity yourself.
Google Cloud Pub/Sub Architecture
Pub/Sub servers run in all GCP regions, which allows fast, global data access, and users are given control over where messages are stored. Cloud Pub/Sub offers global data access in the sense that publisher and subscriber clients do not need to know the location of the servers they connect to or how the service routes data. Pub/Sub is divided into two parts: the data plane and the control plane.
- Data Plane – Handles moving messages between publishers and subscribers. When a publisher sends a message on a topic to Pub/Sub, the message is encrypted by a proxy layer and sent to the publishing forwarder that the publisher is connected to. The message is written to storage to ensure delivery. The publishing forwarder then acknowledges receipt of the message back to the publisher, at which point Pub/Sub guarantees that the message will be delivered to all subscriptions attached to the topic.
- Control Plane – Handles the assignment of clients to forwarders in a way that provides scalability, availability, and low latency for all clients. Any forwarder is capable of serving clients for any topic or subscription. When a client connects to Pub/Sub, the router decides which data centres the client should connect to based on the shortest network distance. The router also keeps the load uniform and the assignments stable. The client takes the resulting list of forwarders and connects to one or more of them.
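The two planes can be sketched together in a few lines. This is a simplified illustration under stated assumptions, not Pub/Sub's actual implementation: `Forwarder`, `choose_datacenter`, the data centre names, and the distances are hypothetical, encryption is omitted, and the router rule considers only network distance (the load-balancing and stability concerns mentioned above are left out).

```python
from dataclasses import dataclass, field


@dataclass
class Forwarder:
    """Hypothetical publishing forwarder with its own durable store."""
    name: str
    storage: list = field(default_factory=list)

    def publish(self, topic, payload):
        # Data plane: persist the message first, acknowledge second.
        # The returned ID stands in for the acknowledgement to the publisher.
        self.storage.append((topic, payload))
        return len(self.storage)


def choose_datacenter(distances_ms):
    """Control plane (assumed rule): pick the smallest network distance."""
    return min(distances_ms, key=distances_ms.get)


# Hypothetical network distances from one client to three data centres.
distances = {"dc-east": 42, "dc-west": 17, "dc-central": 30}
forwarders = {name: Forwarder(name) for name in distances}

nearest = choose_datacenter(distances)
ack_id = forwarders[nearest].publish("orders", b"order-42 created")
```

The point of the ordering inside `publish` is that the acknowledgement is produced only after the write to storage, which is what allows the delivery guarantee to start at the moment the publisher receives it.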
Use cases for Pub/Sub Services
- Ingesting user interaction and server events – User interaction events from end-user applications, or server events from your systems, are forwarded to Pub/Sub, and a stream processing tool then delivers them to BigQuery, Bigtable, Cloud Storage, and other databases. This lets you gather events from many clients simultaneously.
- Real-time event distribution – Events, raw or processed, can be made available to multiple applications across teams and the organization for real-time processing. Many Google systems export events to Pub/Sub, which makes integration with them straightforward.
- Replicating data among databases – Pub/Sub is used to distribute change events from databases. These events can be used to build a view of the database state and state history in BigQuery and other data storage systems.
- Parallel processing and workflows – Efficiently distribute a large number of tasks among multiple workers, such as compressing text files, sending email notifications, evaluating AI models, or reformatting images, by using Pub/Sub messages to connect to Cloud Functions.
- Enterprise event bus – Create an enterprise-wide, real-time data sharing bus that distributes business events, database updates, and analytics events across the organization.
- Data streaming from applications, services, or IoT devices – Real-time event feeds for use in Google Cloud products.
- Distributed cache refreshes – An application publishes invalidation events to update the IDs of objects that have changed.
- Load balancing for reliability – Service instances are deployed on Compute Engine in several zones but subscribe to a common topic. If the service fails in one zone, the others pick up the load automatically.
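The parallel-processing and load-balancing use cases above share one idea: each task goes to exactly one healthy worker, and the remaining workers absorb the load when one fails. The sketch below models that with a plain round-robin assignment. It is an assumption made for illustration only: `distribute` and the zone names are hypothetical, and the real service does not necessarily assign messages in round-robin order.

```python
def distribute(tasks, workers, failed=()):
    """Assign each task to exactly one healthy worker, round-robin."""
    healthy = [w for w in workers if w not in failed]
    if not healthy:
        raise RuntimeError("no healthy workers available")
    assignments = {w: [] for w in healthy}
    for index, task in enumerate(tasks):
        assignments[healthy[index % len(healthy)]].append(task)
    return assignments


tasks = [f"task-{n}" for n in range(6)]
workers = ["zone-a", "zone-b", "zone-c"]

# All three zones healthy: two tasks per worker.
normal = distribute(tasks, workers)

# zone-b is down: the two remaining zones take three tasks each.
degraded = distribute(tasks, workers, failed={"zone-b"})
```

In both runs every task is handled exactly once. Only the share taken by each worker changes when a zone drops out.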