How to Replay Messages in Pub/Sub Using Snapshots
A practical guide to replaying Pub/Sub messages using snapshots and the seek feature, essential for data recovery and reprocessing scenarios.
Learning how to replay messages in Pub/Sub using snapshots and seek is essential for anyone preparing for the Professional Data Engineer exam. This Google Cloud capability allows you to capture subscription states and replay messages that have already been acknowledged, providing a critical recovery mechanism for data processing pipelines. By the end of this tutorial, you'll understand how to create snapshots as recovery points, use the seek feature to reprocess messages, and implement these techniques in production scenarios.
The ability to replay messages becomes crucial when processing logic changes introduce bugs, when downstream systems fail, or when you need to reprocess data with updated transformations. Google Cloud Pub/Sub provides snapshots and the seek feature specifically to address these recovery needs, giving you control over message acknowledgment states even after messages have been processed.
Why Message Replay Matters
In production data pipelines, things can go wrong. A code deployment might contain a bug that corrupts data. A downstream database might become unavailable, causing message processing to fail silently. Or you might need to apply new business logic to historical messages. Without the ability to replay messages, you would lose data or need complex workarounds to recover from these situations.
Pub/Sub snapshots and seek provide a time machine for your message streams. You can capture the exact state of a subscription at any moment, then return to that state later if needed. This capability appears frequently on the Professional Data Engineer exam because it represents a key operational skill for managing reliable data pipelines on GCP.
Prerequisites and Requirements
Before starting this tutorial, you need a Google Cloud project with billing enabled, the gcloud CLI installed and configured on your local machine, and appropriate IAM permissions (Pub/Sub Editor or roles with pubsub.snapshots.create, pubsub.subscriptions.seek, and pubsub.subscriptions.get). You should have basic familiarity with Pub/Sub concepts including topics and subscriptions. Plan for approximately 30 minutes to complete the tutorial.
If you need to set up the gcloud CLI, visit the Google Cloud SDK documentation to download and initialize the tool for your operating system.
Understanding Snapshots and Seek
A snapshot in Pub/Sub captures the acknowledgment state of all messages in a subscription at a specific point in time. Think of it as a bookmark that records which messages have been acknowledged and which remain unacknowledged. Snapshots don't store the actual message data. Instead, they reference messages that already exist in the Pub/Sub retention window.
The seek feature changes the acknowledgment state of messages in bulk. When you seek to a snapshot, Pub/Sub resets the subscription to match the acknowledgment state captured in that snapshot. When you seek to a specific timestamp, all messages received before that time are marked as acknowledged, and messages received after that time become unacknowledged and available for reprocessing.
Message retention duration works hand in hand with these features. By default, Pub/Sub retains messages for seven days; topic message retention can be configured for up to 31 days. Snapshots and seek only work within this retention window. You can't replay messages that have aged out of retention.
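Since seek only works inside the retention window, it's worth sanity-checking a replay target before attempting one. Here is a minimal sketch in plain Python (a local helper, not a Pub/Sub API call; the function name and the seven-day default are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_replayable(target: datetime, retention_days: int = 7,
                  now: Optional[datetime] = None) -> bool:
    """Check whether a seek target still falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    oldest_available = now - timedelta(days=retention_days)
    return oldest_available <= target <= now

now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(is_replayable(now - timedelta(days=3), 7, now))   # True
print(is_replayable(now - timedelta(days=10), 7, now))  # False
```

Running a check like this before issuing a seek avoids silently replaying from the oldest retained message when your intended target has already aged out.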
Step 1: Create a Topic and Subscription
First, set up a Pub/Sub topic and subscription for testing. These resources will let you publish messages and then practice creating snapshots and seeking.
Create a topic named replay-demo:
```
gcloud pubsub topics create replay-demo
```

Create a subscription to this topic with an extended retention period:
```
gcloud pubsub subscriptions create replay-demo-sub \
  --topic=replay-demo \
  --message-retention-duration=7d \
  --ack-deadline=60
```

The message retention duration of seven days ensures messages remain available for replay. The acknowledgment deadline of 60 seconds gives subscribers enough time to process messages before they become available for redelivery.
Verify the subscription was created successfully:
```
gcloud pubsub subscriptions describe replay-demo-sub
```

You should see output showing the subscription configuration, including the topic, retention duration, and acknowledgment deadline.
Step 2: Publish Test Messages
Publish several test messages to create data for replay scenarios. You'll use these messages to demonstrate how snapshots and seek work.
Publish a batch of messages with timestamps:
```
for i in {1..10}; do
  gcloud pubsub topics publish replay-demo \
    --message="Message $i published at $(date +%H:%M:%S)"
done
```

This command publishes ten messages, each with a sequence number and timestamp. In a real scenario, these would be business events like customer orders, sensor readings, or transaction records.
Wait a few seconds, then publish another batch to create a time gap:
```
sleep 10

for i in {11..20}; do
  gcloud pubsub topics publish replay-demo \
    --message="Message $i published at $(date +%H:%M:%S)"
done
```

This gap will help demonstrate seeking to a specific timestamp later in the tutorial.
Step 3: Pull and Acknowledge Messages
Pull messages from the subscription and acknowledge them. This simulates normal message processing where a subscriber receives and processes messages successfully.
Pull five messages without acknowledging them immediately:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=5 \
  --format="table(message.data,message.messageId,ackId)"
```

You'll see a table showing the message data, message IDs, and acknowledgment IDs. The messages remain unacknowledged, so they would be redelivered after the acknowledgment deadline expires.
Now pull and automatically acknowledge messages:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=10 \
  --auto-ack
```

The --auto-ack flag tells Pub/Sub to immediately acknowledge these messages after pulling them. In a normal processing pipeline, your application would acknowledge messages only after successfully processing them. Once acknowledged, these messages would typically not be redelivered. However, the seek feature allows you to override this behavior and replay acknowledged messages.
Step 4: Create a Snapshot
Create a snapshot to capture the current acknowledgment state of the subscription. This snapshot becomes a recovery point you can return to later.
Create a snapshot with a descriptive name:
```
gcloud pubsub snapshots create known-good-state \
  --subscription=replay-demo-sub
```

The snapshot named known-good-state now records which messages have been acknowledged and which remain unacknowledged at this exact moment. You might create such a snapshot before deploying new processing logic, allowing you to roll back if the deployment causes problems.
Verify the snapshot was created:
```
gcloud pubsub snapshots describe known-good-state
```

The output shows the snapshot name, its topic, and the expiration time. Snapshots expire no later than seven days after creation.
Step 5: Process More Messages and Simulate a Problem
Continue processing messages to simulate a scenario where something goes wrong after creating the snapshot. This will demonstrate why snapshots are valuable for recovery.
Pull and acknowledge the remaining messages:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=20 \
  --auto-ack
```

At this point, all messages have been acknowledged. In a real scenario, imagine that the processing logic contained a bug that corrupted data in your downstream data warehouse. You've just discovered the problem and need to reprocess messages with corrected logic.
Step 6: Seek to a Snapshot
Use the seek feature to return the subscription to the state captured in your snapshot. This marks previously acknowledged messages as unacknowledged, making them available for reprocessing.
Seek to the snapshot you created earlier:
```
gcloud pubsub subscriptions seek replay-demo-sub \
  --snapshot=known-good-state
```

This command resets the subscription acknowledgment state to match the snapshot. Messages that were acknowledged after the snapshot was created are now unacknowledged and ready for redelivery.
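The mechanics of this reset can be illustrated with a toy in-memory model. This is not the Pub/Sub client library, just a sketch of the acknowledgment-state semantics described above:

```python
class FakeSubscription:
    """Toy model of a subscription's ack state to illustrate snapshot/seek."""
    def __init__(self):
        self.published = []   # message IDs in publish order
        self.acked = set()

    def publish(self, msg_id):
        self.published.append(msg_id)

    def ack(self, msg_id):
        self.acked.add(msg_id)

    def snapshot(self):
        # A snapshot records which messages were acked at this moment.
        # Messages published later are, by definition, not in it.
        return frozenset(self.acked)

    def seek(self, snap):
        # Restore the captured ack state: anything acked after the
        # snapshot (or published after it) becomes unacked again.
        self.acked = set(snap)

    def unacked(self):
        return [m for m in self.published if m not in self.acked]

sub = FakeSubscription()
for i in range(1, 6):
    sub.publish(i)
sub.ack(1); sub.ack(2)           # messages 1-2 processed
snap = sub.snapshot()            # capture: 3, 4, 5 still unacked
for i in range(3, 6):
    sub.ack(i)                   # buggy deploy processes 3-5
sub.publish(6)                   # a message arrives after the snapshot
sub.seek(snap)                   # roll back to the snapshot
print(sub.unacked())             # [3, 4, 5, 6]
```

Note how message 6, published after the snapshot, is also redelivered after the seek: a snapshot covers both the messages unacknowledged at its creation and everything published afterward.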
Verify that messages are available again:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=5
```

You should see messages that were previously acknowledged. These messages can now be reprocessed with corrected logic. The seek operation effectively turned back time for your subscription, giving you a second chance to process the data correctly.
Step 7: Seek to a Specific Timestamp
The seek feature also allows you to seek to a specific point in time rather than a named snapshot. This approach is useful when you know exactly when a problem started and want to reprocess messages from that moment forward.
First, publish more test messages to create fresh data:
```
for i in {21..30}; do
  gcloud pubsub topics publish replay-demo \
    --message="Message $i published at $(date +%H:%M:%S)"
done
```

Pull and acknowledge all messages:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=30 \
  --auto-ack
```

Now seek to a timestamp from five minutes ago. Calculate the timestamp in RFC 3339 format:
```
SEEK_TIME=$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
echo "Seeking to: $SEEK_TIME"

gcloud pubsub subscriptions seek replay-demo-sub \
  --time="$SEEK_TIME"
```

Note that this uses GNU date syntax; on macOS (BSD date), use date -u -v-5M +%Y-%m-%dT%H:%M:%SZ instead. The seek command marks all messages received after the specified time as unacknowledged. Messages received before that time remain acknowledged. This granular control lets you replay exactly the messages you need without reprocessing data that was already handled correctly.
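Because the date command differs between GNU and BSD systems, a portable way to produce the same RFC 3339 timestamp is a few lines of Python (the helper name here is illustrative):

```python
from datetime import datetime, timedelta, timezone

def seek_time(minutes_ago: int) -> str:
    """RFC 3339 UTC timestamp suitable for the --time flag of
    gcloud pubsub subscriptions seek."""
    t = datetime.now(timezone.utc) - timedelta(minutes=minutes_ago)
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")

print(seek_time(5))  # e.g. 2024-01-15T10:25:00Z
```

Pass the printed value to the --time flag exactly as in the shell example.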
Pull messages to verify the seek operation worked:
```
gcloud pubsub subscriptions pull replay-demo-sub \
  --limit=10
```

You should see messages that were published within the last five minutes.
Real-World Application Examples
Consider a climate modeling research institute that processes weather station telemetry through Pub/Sub. The institute deploys a new data transformation algorithm that inadvertently drops temperature readings below freezing. Before deployment, the operations team created a snapshot called pre-deployment-v2. When scientists notice missing data in their models, the team uses seek to revert to the snapshot, fixes the algorithm bug, and reprocesses the affected messages. The research continues without losing critical climate data.
A mobile game studio publishes player achievement events to Pub/Sub for analytics processing. During a major game update, the analytics pipeline experiences a critical failure that prevents proper processing of player progression data for two hours. The data engineering team identifies the exact timestamp when the failure began. They use seek with a timestamp to mark all messages from that point as unacknowledged, then restart the pipeline with fixes applied. Players' achievements are correctly recorded, maintaining trust in the game's progression system.
A freight logistics company uses Pub/Sub to track shipping container movements across ports. When integrating a new warehouse management system, the company creates a snapshot before cutover. The integration initially appears successful, but auditors later discover that customs documentation is being incorrectly associated with shipments due to a timezone handling bug. The company seeks back to the pre-cutover snapshot, corrects the timezone logic, and reprocesses all shipment events. Customs compliance is maintained, avoiding regulatory penalties.
Understanding Snapshot Lifecycle and Retention
Snapshots in GCP have important lifecycle characteristics that affect their usability. When you create a snapshot, it captures references to messages within the current retention window of the topic. A snapshot expires no later than seven days after creation. More precisely, its lifetime is seven days minus the age of the oldest unacknowledged message in the subscription at creation time: if the oldest unacknowledged message is three days old when you create the snapshot, the snapshot expires in four days.

You can't seek to an expired snapshot. When planning snapshot strategies for production systems, consider your recovery time objectives. If you might need to replay messages from more than seven days ago, snapshots alone won't cover you: configure longer message retention on your topics and seek by timestamp instead.
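That expiration rule can be expressed as a small calculation. This sketch simply encodes the formula discussed above; it doesn't query the Pub/Sub API:

```python
from datetime import datetime, timedelta, timezone

def snapshot_expiry(created_at: datetime,
                    oldest_unacked_age: timedelta) -> datetime:
    """Approximate snapshot expiry: seven days minus the age of the
    oldest unacknowledged message at snapshot creation time."""
    lifetime = timedelta(days=7) - oldest_unacked_age
    return created_at + lifetime

created = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
# If the oldest unacked message was already 2 days old, the snapshot
# lives for only 5 more days.
print(snapshot_expiry(created, timedelta(days=2)))  # 2024-01-20 12:00:00+00:00
```

This is why a snapshot created over a subscription with a large, old backlog can expire much sooner than the seven-day maximum.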
Common Issues and Troubleshooting
One frequent issue occurs when attempting to seek beyond the message retention window. If you try to seek to a timestamp older than the retention duration, Pub/Sub will seek to the oldest available message instead. Always verify your topic's retention configuration before relying on snapshots for long-term recovery.
Permission errors often appear when creating snapshots or seeking. The service account or user performing these operations needs specific IAM roles. Ensure you've granted pubsub.snapshots.create for snapshot creation and pubsub.subscriptions.seek for seek operations. The Pub/Sub Editor role includes both permissions.
Another common problem involves seeking to a snapshot from a different subscription. Snapshots are tied to specific subscriptions and capture their acknowledgment state. You can't seek one subscription to a snapshot created from another subscription, even if both subscribe to the same topic. Create separate snapshots for each subscription you want to protect.
If messages don't appear after seeking, check whether the subscription has active subscribers pulling messages. Seeking changes the acknowledgment state, but it doesn't automatically deliver messages. Your application must pull messages or have push subscriptions configured to receive the unacknowledged messages.
Integration with Other GCP Services
Pub/Sub snapshots and seek integrate naturally with broader Google Cloud data architectures. When using Dataflow to process Pub/Sub messages, you can create snapshots before deploying pipeline updates. If the updated pipeline produces incorrect results in BigQuery or Cloud Storage, seek back to the snapshot and rerun the corrected pipeline. This pattern provides a safety net for complex streaming transformations.
Cloud Functions triggered by Pub/Sub can benefit from snapshot protection. Before deploying a new function version, create a snapshot of the triggering subscription. If the new function version contains bugs, seek to the snapshot and redeploy a corrected version. The messages will invoke the fixed function, maintaining data integrity.
When building event-driven architectures with Cloud Run services consuming Pub/Sub messages, snapshots protect against service bugs. A Cloud Run service might process payment events and write to Cloud SQL. If database constraint violations indicate processing errors, seek to a pre-deployment snapshot and reprocess with corrected SQL statements.
Pub/Sub works alongside Cloud Monitoring and Cloud Logging to provide visibility into snapshot and seek operations. Set up alerts on subscription age metrics to detect when subscriptions fall behind. Create log-based metrics to track seek operations, helping you understand recovery patterns and improve deployment processes.
Best Practices for Production Use
Create snapshots as part of your deployment pipeline. Before releasing changes to message processing logic, automatically create a snapshot with a name that includes the deployment version and timestamp. This practice ensures you always have a recovery point tied to known-good code versions.
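A small helper can generate such names and catch invalid ones before the gcloud call. This is a sketch: the validation reflects Pub/Sub's documented resource ID rules (3 to 255 characters, starting with a letter, not starting with goog), and the naming scheme itself is just one possible convention:

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Pub/Sub resource IDs: start with a letter; letters, digits, and
# - . _ ~ + % allowed; 3-255 characters total.
_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9._~+%\-]{2,254}$")

def snapshot_name(version: str, now: Optional[datetime] = None) -> str:
    """Build a deployment snapshot name like 'pre-v2.1-deploy-20240115-1030'."""
    now = now or datetime.now(timezone.utc)
    name = f"pre-{version}-deploy-{now.strftime('%Y%m%d-%H%M')}"
    if not _ID_RE.match(name) or name.startswith("goog"):
        raise ValueError(f"invalid snapshot ID: {name}")
    return name

ts = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(snapshot_name("v2.1", ts))  # pre-v2.1-deploy-20240115-1030
```

A deployment script can then pass the generated name straight to gcloud pubsub snapshots create.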
Implement snapshot retention policies that align with your business requirements. If your organization has compliance requirements to replay data for 30 days, configure topic retention accordingly and create snapshots at regular intervals. Document snapshot names and purposes so operations teams understand which snapshots to use for specific recovery scenarios.
Test your snapshot and seek procedures regularly. Create runbooks that document exact commands for common recovery scenarios. Practice seeking to snapshots in non-production environments to verify the process works as expected. During incidents, you want tested procedures, not experimentation.
Monitor snapshot age and expiration. Set up Cloud Monitoring alerts when snapshots approach expiration. If a snapshot represents an important recovery point, create a new snapshot to extend protection before the old one expires. Consider labeling snapshots with metadata indicating their purpose and criticality.
Use descriptive snapshot names that convey meaning. Instead of generic names like snapshot-1 or backup-tuesday, use names like pre-v2-deployment-2024-01-15 or known-good-before-schema-change. Clear naming helps teams make correct decisions during time-sensitive recovery operations.
Cost Considerations
Snapshots in Google Cloud Pub/Sub don't incur additional storage costs beyond the standard message retention charges. You pay for message storage based on the retention duration configured on your topics. Creating multiple snapshots doesn't multiply storage costs because snapshots store references to messages, not duplicate copies.
However, seeking to snapshots or timestamps can affect costs indirectly. When you seek a subscription, previously acknowledged messages become unacknowledged and available for redelivery. Reprocessing these messages counts toward your Pub/Sub throughput and may trigger additional processing costs in downstream services like Dataflow, Cloud Functions, or Cloud Run.
Factor replay costs into your disaster recovery planning. If you might need to replay a full day of messages through an expensive processing pipeline, understand the financial impact before executing the seek operation.
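A rough estimate of that impact can be sketched as arithmetic. The per-TiB rate below is a placeholder parameter, not a quoted Google Cloud price; the 1,000-byte minimum reflects Pub/Sub's minimum billable size per request, which equals per message when messages aren't batched:

```python
def replay_cost_estimate(messages: int, avg_bytes: int,
                         price_per_tib: float) -> float:
    """Rough throughput cost of redelivering `messages` of `avg_bytes`
    each, assuming a 1,000-byte billable minimum per message."""
    billable = max(avg_bytes, 1000) * messages
    tib = billable / (1024 ** 4)
    return tib * price_per_tib

# Replaying one day of 500-byte messages at 2,000 msg/s, with a
# hypothetical $40/TiB rate:
msgs = 2000 * 86400
print(round(replay_cost_estimate(msgs, 500, 40.0), 2))
```

The Pub/Sub delivery charge itself is often small; the dominant cost is usually the downstream reprocessing in Dataflow, Cloud Functions, or Cloud Run, which this estimate does not include.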
Advanced Snapshot Strategies
For high-value data pipelines, consider implementing a rolling snapshot strategy. Create snapshots on a schedule, such as every hour or after processing specific message volumes. Retain multiple snapshots to provide recovery points at different granularities. This approach gives you flexibility to seek to the most appropriate recovery point based on when problems are detected.
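In code, the rolling strategy reduces to keeping snapshots sorted by creation time and, during an incident, choosing the newest one taken before the problem began. A minimal sketch (the snapshot records here are illustrative tuples, not API objects):

```python
from bisect import bisect_left
from datetime import datetime, timezone

def best_recovery_point(snapshots, incident_time):
    """Given [(created_at, name), ...] sorted by creation time, return the
    name of the newest snapshot taken strictly before the incident, or None."""
    times = [t for t, _ in snapshots]
    i = bisect_left(times, incident_time)
    return snapshots[i - 1][1] if i > 0 else None

snaps = [
    (datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc), "hourly-0900"),
    (datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc), "hourly-1000"),
    (datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc), "hourly-1100"),
]
incident = datetime(2024, 1, 15, 10, 40, tzinfo=timezone.utc)
print(best_recovery_point(snaps, incident))  # hourly-1000
```

Seeking to the chosen snapshot replays everything from that point forward, so a tighter snapshot cadence means less duplicate reprocessing after recovery.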
Combine snapshots with dead letter topics for comprehensive error handling. Configure subscriptions with dead letter policies to capture messages that repeatedly fail processing. Create snapshots before draining dead letter subscriptions, allowing you to replay failed messages after fixing processing bugs. This combination ensures no messages are lost even in complex failure scenarios.
In multi-region architectures, coordinate snapshots across regional subscriptions. If you have subscriptions in multiple Google Cloud regions consuming from the same topic, create snapshots simultaneously across regions. This coordination ensures consistent replay behavior if you need to fail over to a different region during recovery.
Verification and Testing Procedures
After seeking to a snapshot, verify the operation succeeded. Note that the describe command does not report unacknowledged message counts; for that, check the subscription/num_undelivered_messages metric in Cloud Monitoring. You can still confirm the subscription configuration:

```
gcloud pubsub subscriptions describe replay-demo-sub \
  --format="value(pushConfig.pushEndpoint, ackDeadlineSeconds)"
```

Pull a small number of messages to confirm they match your expectations. Check message timestamps and content to ensure you've rewound to the correct point in the message stream.
Monitor subscription backlog after seeking. A sudden increase in unacknowledged messages indicates the seek operation successfully marked messages for reprocessing. Use Cloud Monitoring dashboards to track the backlog as your processing pipeline consumes the replayed messages.
Summary
You now know how to replay messages in Pub/Sub using snapshots and seek. You've created snapshots to capture subscription states, used seek to revert to those states, and practiced seeking to specific timestamps. These capabilities provide essential recovery mechanisms for production data pipelines, protecting against processing bugs, system failures, and deployment issues.
The skills you've built apply directly to real-world scenarios where data integrity and reliability matter. Whether you're processing financial transactions, IoT telemetry, or user events, snapshots and seek give you confidence to make changes knowing you can recover from problems. These features appear regularly on the Professional Data Engineer exam because they represent critical operational knowledge for managing Google Cloud data platforms.
Readers looking for comprehensive exam preparation covering these topics and many other GCP data engineering concepts can check out the Professional Data Engineer course. You've built practical experience with an important Pub/Sub capability that will serve you in both certification exams and production operations.