Pub/Sub Subscription Models: Push vs Pull Explained
A practical guide to understanding push versus pull pub/sub subscription models, helping you choose the right pattern based on latency requirements, processing volume, and architectural constraints.
Choosing between pub/sub subscription models is one of the fundamental architectural decisions you'll face when building event-driven systems. Whether you're designing a real-time notification system or a batch data processing pipeline on Google Cloud, understanding the trade-offs between push and pull subscriptions directly impacts your application's latency, resource utilization, and operational complexity.
The question is simple: should your messaging system push data to subscribers proactively, or should subscribers pull data on their own schedule? This choice affects everything from how quickly messages get processed to how much code you need to write and maintain.
Understanding Pull Subscriptions
In a pull subscription model, the subscriber controls the flow of messages. Your application explicitly requests messages from the Pub/Sub service, typically by calling an API endpoint. The subscriber decides when to ask for messages, how many to retrieve at once, and when to acknowledge successful processing.
Think of pull subscriptions like checking your mailbox. You walk out to the mailbox when it's convenient for you, take whatever mail is there, and process it at your own pace. You're in complete control of the timing.
Consider a data analytics platform that processes sensor readings from manufacturing equipment. A robotics manufacturer collects temperature, vibration, and operational status data from thousands of machines across multiple factories. Their processing pipeline runs every 15 minutes, aggregating sensor data to identify maintenance needs and optimize production schedules.
from google.cloud import pubsub_v1
import time

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'sensor-data-sub')

def process_sensor_batch():
    response = subscriber.pull(
        request={
            "subscription": subscription_path,
            "max_messages": 1000
        },
        timeout=30
    )
    if not response.received_messages:
        return  # Nothing to process this cycle

    sensor_readings = []
    for received_message in response.received_messages:
        sensor_readings.append(received_message.message.data)

    # Process batch of sensor readings
    analyze_equipment_health(sensor_readings)

    # Acknowledge only after successful processing
    ack_ids = [msg.ack_id for msg in response.received_messages]
    subscriber.acknowledge(
        request={
            "subscription": subscription_path,
            "ack_ids": ack_ids
        }
    )

while True:
    process_sensor_batch()
    time.sleep(900)  # Wait 15 minutes
This pull-based approach gives the analytics platform several advantages. The processing system can batch messages together efficiently, retrieving 1,000 sensor readings at once rather than processing them individually. The application controls exactly when processing happens, which means the team can schedule intensive analytics jobs during off-peak hours when compute resources are cheaper.
Pull subscriptions also simplify error handling and retry logic. If processing fails, the application simply doesn't acknowledge the messages, and Google Cloud Pub/Sub automatically makes them available again. The subscriber can implement sophisticated retry logic, exponential backoff, or even route problematic messages to a dead letter queue after a certain number of attempts.
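As an illustration of that retry behavior, a subscriber-side backoff schedule mirroring Pub/Sub's configurable minimum and maximum retry delays might be sketched like this (the function and its defaults are illustrative, not part of the client library):

```python
def retry_schedule(min_delay=10, max_delay=600, attempts=8):
    """Exponential backoff delays in seconds, capped at max_delay."""
    return [min(max_delay, min_delay * 2 ** i) for i in range(attempts)]

print(retry_schedule())  # [10, 20, 40, 80, 160, 320, 600, 600]
```

Delays double on each attempt until they hit the cap, which is the same shape Pub/Sub applies between redelivery attempts when you configure minimum and maximum retry delays on the subscription.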
Limitations of Pull Subscriptions
The primary weakness of pull subscriptions is latency. Messages sit in the subscription until the subscriber requests them. If your application polls every 15 minutes, messages could wait nearly that long before processing begins. For workloads where every second counts, this delay is unacceptable.
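The arithmetic behind that delay is simple: with a fixed polling interval and uniformly arriving messages, a message waits on average half the interval before the next pull picks it up. A one-line sketch:

```python
def average_polling_delay(interval_seconds):
    # With a fixed polling interval and uniform message arrivals, a message
    # waits on average half the interval before the next pull retrieves it
    return interval_seconds / 2

# A 15-minute (900 s) poll means messages wait 7.5 minutes on average
```

This is why, later in this article, a 2-minute polling loop translates into roughly 60 seconds of average moderation delay.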
Pull subscriptions also place more implementation burden on your application. You need to write code that manages the polling loop, handles connection failures, implements acknowledgment logic, and scales the number of concurrent pullers based on message volume. This means more code to write, test, and maintain.
Consider what happens when message volume suddenly spikes. With a pull model, your application won't automatically scale to handle the increased load unless you've built autoscaling logic that monitors queue depth and spawns additional workers. You're responsible for detecting the backlog and responding appropriately.
# Pull subscriptions require explicit scaling logic. Note that backlog size is
# not a field on the Subscription resource itself; it is exposed through the
# Cloud Monitoring metric pubsub.googleapis.com/subscription/num_undelivered_messages.
def scale_workers_based_on_backlog():
    # Check number of undelivered messages via Cloud Monitoring (implementation specific)
    num_undelivered = query_undelivered_message_count(subscription_path)

    # Calculate required workers
    messages_per_worker = 1000
    required_workers = num_undelivered // messages_per_worker

    # Scale up if needed (implementation specific)
    if required_workers > current_workers:
        provision_additional_workers(required_workers - current_workers)
This additional complexity becomes technical debt. Every subscriber needs similar boilerplate code, and each implementation might handle edge cases differently. When you're running dozens of different services all consuming from Pub/Sub topics, this duplication adds up.
Understanding Push Subscriptions
Push subscriptions flip the control model entirely. Instead of your application requesting messages, Google Cloud Pub/Sub delivers messages to your application by making HTTP POST requests to a webhook endpoint you provide. The service takes responsibility for delivery, retries, and flow control.
This is like having mail delivered directly to your door. You don't need to check for it because the postal service brings it to you. You simply need to be ready to receive it when it arrives.
A telehealth platform handling real-time appointment notifications illustrates push subscriptions well. When a patient books an appointment, cancels, or when a doctor joins a video session, these events need immediate delivery to update patient and provider dashboards, send SMS notifications, and trigger automated workflows.
// Express.js webhook endpoint for push subscription
const express = require('express');
const app = express();

app.post('/pubsub/appointment-events', express.json(), async (req, res) => {
  const message = req.body.message;

  // Pub/Sub sends base64 encoded data
  const appointmentEvent = JSON.parse(
    Buffer.from(message.data, 'base64').toString()
  );

  try {
    // Process immediately
    if (appointmentEvent.type === 'appointment_booked') {
      await sendPatientConfirmation(appointmentEvent.patientId);
      await notifyProvider(appointmentEvent.providerId);
      await updateDashboards(appointmentEvent);
    }
    // Acknowledge success with 200 status
    res.status(200).send('OK');
  } catch (error) {
    // Return error status to trigger retry
    console.error('Failed to process event:', error);
    res.status(500).send('Processing failed');
  }
});

app.listen(8080);
The push model delivers messages within seconds of publication. There's no polling interval creating artificial delays. As soon as an appointment event occurs, the webhook receives it and processes it immediately. For a healthcare application where timely communication matters, this responsiveness is critical.
Push subscriptions also reduce the amount of code you need to write. Your application simply exposes an HTTP endpoint and processes messages as they arrive. Google Cloud handles the complexity of monitoring the topic, retrieving messages, and managing delivery. If your endpoint returns an error or times out, GCP automatically retries with exponential backoff.
The platform also handles flow control automatically. If your endpoint starts responding slowly or returning errors, Pub/Sub reduces the rate of message delivery. When your service recovers, delivery ramps back up. You get built-in backpressure without implementing it yourself.
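Pub/Sub does not publish the exact algorithm, but the behavior resembles AIMD-style congestion control: ramp delivery up gradually while the endpoint is healthy, cut it sharply when errors appear. An illustrative sketch (this is a conceptual model, not GCP's implementation):

```python
def adjust_rate(rate, endpoint_healthy, min_rate=1, max_rate=1000):
    """AIMD-style flow control sketch: additive increase while the endpoint
    is healthy, multiplicative decrease when it errors or slows down."""
    if endpoint_healthy:
        return min(max_rate, rate + 10)   # ramp delivery back up gradually
    return max(min_rate, rate // 2)       # back off quickly under errors
```

A few consecutive failures halve the delivery rate repeatedly, while recovery is deliberately slower, which protects a struggling endpoint from being overwhelmed the moment it comes back.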
Constraints of Push Subscriptions
Push subscriptions come with strict requirements. Your endpoint must be accessible via HTTPS, which means you need proper TLS certificates. The endpoint must also be publicly accessible so Google Cloud can reach it, or you need to configure VPC Service Controls to allow the connection.
This architectural constraint can be problematic. Some organizations run services in private networks without external access. Setting up ingress for push subscriptions might require load balancers, API gateways, or other infrastructure you wouldn't otherwise need. Each component adds cost and complexity.
# Creating a push subscription requires a publicly accessible HTTPS endpoint
gcloud pubsub subscriptions create patient-notifications-push \
  --topic=appointment-events \
  --push-endpoint=https://api.telehealth.example.com/pubsub/appointment-events \
  --push-auth-service-account=pubsub-invoker@project.iam.gserviceaccount.com
Push subscriptions also limit your batching opportunities. Messages arrive one at a time, which means your endpoint processes them individually. If you're performing operations that benefit from batching, such as bulk database inserts or aggregations, push subscriptions force you to implement your own buffering layer.
Consider a logistics company tracking package scans across their distribution network. A freight company processes millions of scan events daily as packages move through sorting facilities, trucks, and delivery routes. Their analytics system benefits from inserting scan records in batches of 10,000 to minimize database overhead.
With push subscriptions, each scan event triggers a separate webhook call. To achieve efficient batching, they'd need to build an intermediate layer that accumulates events in memory or temporary storage before writing to the database. This adds complexity that pull subscriptions avoid by naturally batching message retrieval.
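Such a buffering layer might look like the following sketch. The class and parameter names are invented for illustration; a production version would also flush on a timer and persist buffered events, since events acknowledged to Pub/Sub but still sitting in memory are lost if the process crashes:

```python
class ScanBuffer:
    """Accumulate events in memory and flush them to a sink in batches."""

    def __init__(self, flush_size=10000, sink=None):
        self.flush_size = flush_size
        self.sink = sink or (lambda batch: None)  # e.g. a bulk database insert
        self.items = []

    def add(self, event):
        self.items.append(event)
        if len(self.items) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.items:
            self.sink(self.items)
            self.items = []
```

Each webhook invocation would call `add()`, and the database only sees one write per 10,000 events, which is exactly the batching that pull subscriptions provide for free.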
How Google Cloud Pub/Sub Handles Subscription Models
Google Cloud Pub/Sub treats push and pull as first-class subscription types with different optimization strategies. When you create a subscription on GCP, you're choosing a complete delivery and flow control mechanism that the platform manages differently under the hood.
For pull subscriptions, Pub/Sub maintains a server-side buffer of messages and serves them in response to pull requests. The service tracks which messages each subscriber has pulled but not yet acknowledged, maintaining delivery guarantees even if your subscriber crashes. GCP allows you to configure acknowledgment deadlines and retry policies specific to your processing latency requirements.
# Creating a pull subscription with custom retry policy
gcloud pubsub subscriptions create sensor-analytics-pull \
  --topic=sensor-readings \
  --ack-deadline=60 \
  --min-retry-delay=10s \
  --max-retry-delay=600s
For push subscriptions, Google Cloud Pub/Sub implements sophisticated delivery logic including automatic retries, exponential backoff, and congestion control. The platform monitors your endpoint's response times and error rates, dynamically adjusting delivery rates to match your capacity. This adaptive behavior means GCP automatically backs off when your service is struggling and increases throughput when your service is healthy.
One unique capability in GCP is the ability to configure push subscriptions with authentication. You can specify a service account that Pub/Sub uses to generate OIDC identity tokens (signed JWTs) when invoking your webhook. This means your endpoint can verify that requests genuinely come from your Pub/Sub subscription rather than from an attacker who discovered your URL.
# Push subscription with OIDC authentication
gcloud pubsub subscriptions create secure-webhook-push \
  --topic=sensitive-events \
  --push-endpoint=https://internal-api.example.com/events \
  --push-auth-service-account=pubsub-authenticated@project.iam.gserviceaccount.com \
  --push-auth-token-audience=https://internal-api.example.com
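On the receiving side, the webhook should validate that token before trusting the request. A minimal sketch of the first step, extracting the bearer token from the request header (the actual OIDC verification, e.g. with `google.oauth2.id_token.verify_oauth2_token`, is omitted because it fetches Google's public certificates over the network):

```python
def extract_bearer_token(auth_header):
    """Pull the bearer token from a push request's Authorization header.

    The returned token should then be verified as an OIDC token, for example
    with google.oauth2.id_token.verify_oauth2_token, checking that the
    audience matches the --push-auth-token-audience configured above.
    """
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected 'Authorization: Bearer <token>'")
    return token
```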
Google Cloud also offers a hybrid approach through Cloud Run and Cloud Functions integration. When you deploy serverless functions on GCP, you can configure them to be triggered directly by Pub/Sub topics. This gives you the immediate delivery characteristics of push subscriptions with the operational simplicity of serverless platforms. GCP handles scaling, retries, and infrastructure management completely automatically.
This tight integration changes the calculus for many workloads. Instead of choosing between managing your own pull-based workers or exposing public webhooks, you can deploy a Cloud Function that processes messages with zero infrastructure overhead. For a mobile game studio processing player analytics events, this could mean deploying an event handler in minutes rather than days.
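As a sketch of that serverless path, a Pub/Sub-triggered function using the 1st gen background function signature might look like this (the function name and payload fields are hypothetical; Pub/Sub passes the message with its payload base64 encoded under `data`):

```python
import base64
import json

def handle_player_event(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function (1st gen background
    signature). The payload structure here is a hypothetical example."""
    # Pub/Sub delivers the message payload base64 encoded under "data"
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Route on a hypothetical event type field
    return payload.get("type", "unknown")
```

Returning normally acknowledges the message; raising an exception triggers a retry, so the function body replaces both the polling loop of a pull worker and the HTTP plumbing of a push webhook.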
Realistic Scenario: Video Streaming Platform
A video streaming service processes two distinct event streams that illustrate when each subscription model makes sense. The platform serves millions of viewers across web, mobile, and TV applications.
The first stream handles playback quality telemetry. Every few seconds, video players report buffering events, bitrate changes, and error conditions. These events flow into BigQuery for quality analysis and alerting. The service collects roughly 50 million telemetry events per hour during peak viewing times.
For this workload, they use pull subscriptions. A fleet of Dataflow workers pulls batches of 5,000 telemetry events, performs light transformation, and streams them into BigQuery using batch inserts. This batching approach reduces BigQuery insert operations by several orders of magnitude compared to individual inserts, directly impacting their data ingestion costs.
# Dataflow pipeline using pull subscription
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.io.gcp.bigquery import WriteToBigQuery

def run_telemetry_pipeline():
    # Reading from Pub/Sub requires a streaming pipeline
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | 'Read from Pub/Sub' >> ReadFromPubSub(
                subscription='projects/myproject/subscriptions/telemetry-pull'
            )
            | 'Parse JSON' >> beam.Map(parse_telemetry_event)
            | 'Batch Window' >> beam.WindowInto(
                beam.window.FixedWindows(30)  # 30 second windows
            )
            | 'Write to BigQuery' >> WriteToBigQuery(
                table='myproject:analytics.playback_quality',
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
            )
        )
The second stream handles content moderation events. When viewers flag inappropriate content or automated systems detect potential violations, these events need immediate review. Speed matters because harmful content should be removed quickly, and false positives frustrate users if their legitimate content stays unavailable too long.
For moderation events, they use push subscriptions that invoke a Cloud Run service. The service receives flagged content metadata, retrieves the relevant video segment from Cloud Storage, runs it through their moderation models, and updates the content status within seconds. This push-based architecture keeps median moderation latency under 10 seconds.
The business impact is measurable. Before implementing push subscriptions for moderation, their pull-based system checked for flagged content every 2 minutes, resulting in average moderation delays of 60 seconds. The push model reduced this to 10 seconds, improving both safety outcomes and user experience.
Their cost structure also reflects these choices. The pull-based telemetry pipeline processes 1.2 billion events daily but generates only around 8,000 BigQuery insert operations per hour due to batching. At BigQuery's pricing for streaming inserts, batching saves them approximately $15,000 monthly compared to unbatched inserts. The push-based moderation flow handles far fewer events, around 100,000 daily, but values speed over batch efficiency, making the individual processing overhead worthwhile.
Choosing Between Push and Pull Subscriptions
The decision framework comes down to a few key factors. Latency requirements drive the choice in many cases. If you need sub-second response times and can't tolerate polling delays, push subscriptions deliver messages faster. If you can accept delays measured in seconds or minutes, pull subscriptions work fine and might offer other advantages.
Processing patterns matter significantly. When your application benefits from batching operations together, such as database writes, file processing, or aggregations, pull subscriptions let you retrieve hundreds or thousands of messages at once. When messages require individual processing and batching provides no benefit, push subscriptions work well.
Infrastructure constraints sometimes make the decision for you. If your services run in private networks without external access, push subscriptions require additional networking infrastructure. If you can't easily expose HTTPS endpoints, pull subscriptions avoid that requirement entirely. Conversely, if you're already using Cloud Run or Cloud Functions, push subscriptions integrate smoothly with zero additional infrastructure.
| Factor | Pull Subscription | Push Subscription |
|---|---|---|
| Latency | Seconds to minutes depending on polling interval | Sub-second delivery after publication |
| Batching | Natural batching by retrieving multiple messages | Individual message delivery requires buffering layer |
| Infrastructure | Requires worker processes or compute instances | Requires HTTPS endpoint (webhook or serverless) |
| Code Complexity | More code for polling, ack, and scaling logic | Simpler code, just process incoming requests |
| Flow Control | Subscriber controls rate explicitly | GCP manages rate based on endpoint health |
| Retry Logic | Implement explicitly in subscriber code | Automatic with exponential backoff |
| Best For | Batch processing, analytics, scheduled jobs | Real-time notifications, webhooks, immediate actions |
Operational expertise also influences the choice. Teams comfortable managing worker fleets and implementing distributed systems patterns might prefer the control that pull subscriptions provide. Teams that value operational simplicity and prefer managed services might lean toward push subscriptions, especially when combined with Cloud Run or Cloud Functions.
Cost considerations can tip the balance in either direction. Pull subscriptions running on Compute Engine give you fine-grained control over compute costs but require you to manage capacity. Push subscriptions to Cloud Run only charge when actually processing messages but at a higher per-invocation cost. Running the numbers based on your message volume and processing requirements reveals the more economical choice for your specific workload.
Exam Preparation and Practical Takeaways
Understanding pub/sub subscription models appears frequently in Google Cloud certification scenarios, particularly for the Professional Data Engineer and Professional Cloud Architect exams. Exam questions often present a business requirement and ask you to choose the appropriate subscription type or identify problems with a proposed architecture.
Watch for scenarios that emphasize latency requirements. Phrases like "real-time processing," "immediate notification," or "within seconds" typically point toward push subscriptions. Language like "batch processing," "scheduled analytics," or "efficient bulk loading" usually indicates pull subscriptions.
Pay attention to infrastructure constraints mentioned in questions. If a scenario describes services in a private VPC without external access, pushing directly to those services creates architectural challenges. If the scenario mentions serverless deployments or Cloud Run, push subscriptions integrate naturally.
Also recognize that you can use both subscription types with the same topic. A common pattern involves one push subscription for real-time alerting and separate pull subscriptions for batch analytics. The exam sometimes tests whether you understand that multiple subscriptions operate independently, each receiving a copy of every message.
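That fan-out pattern can be sketched with two subscriptions on one topic (the topic, subscription names, and endpoint below are illustrative):

```shell
# Both subscriptions independently receive a copy of every message
# published to content-events
gcloud pubsub subscriptions create moderation-push \
  --topic=content-events \
  --push-endpoint=https://moderation.example.com/events

gcloud pubsub subscriptions create analytics-pull \
  --topic=content-events \
  --ack-deadline=120
```

Each subscription tracks its own acknowledgments, so a slow batch consumer on the pull side never delays delivery to the real-time push side.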
The key to answering these questions correctly is thinking through the entire data flow. Consider where messages come from, what processing they need, where results go, and what latency the business can tolerate. The right subscription model emerges from understanding the complete picture.
Building real-world intuition for these patterns takes practice with actual implementations. You need to feel the difference between writing polling code and handling webhook requests, experience how batching affects throughput, and observe how latency changes with different approaches. For readers looking for comprehensive exam preparation that includes hands-on labs and detailed scenarios covering pub/sub subscription models and other critical Google Cloud patterns, check out the Professional Data Engineer course.
Thoughtful engineering means recognizing that neither push nor pull subscriptions are universally superior. Each model represents different trade-offs between latency, complexity, and operational requirements. Success comes from understanding these trade-offs deeply enough to match the subscription model to your specific needs, infrastructure, and constraints.