How to Set Up Log Sinks in Cloud Logging

A comprehensive guide to configuring log sinks in Google Cloud Logging for exporting logs to BigQuery, Cloud Storage, and Pub/Sub for analysis and archival.

Introduction

Learning how to set up log sinks in Cloud Logging is a fundamental skill for managing data at scale on Google Cloud Platform. Log sinks allow you to export logs from various GCP services to different destinations based on filters you define. This capability becomes essential when you need to retain logs for compliance requirements, perform deep analysis on operational data, or trigger automated workflows based on specific log events.

By the end of this tutorial, you'll have configured working log sinks that export logs from Google Cloud services to BigQuery for analysis, Cloud Storage for long-term archiving, and Pub/Sub for real-time processing. These skills are directly applicable to the Professional Data Engineer exam, where understanding log management and data pipeline architectures is critical.

The process involves identifying log sources, creating filters to select specific log entries, and configuring export destinations. You'll work with actual GCP commands and see exactly what successful configurations look like in production environments.

Prerequisites and Requirements

Before you begin setting up log sinks in Cloud Logging, ensure you have the following:

A Google Cloud Platform project with billing enabled. You'll need Owner or Editor role on the project, or specific IAM permissions: logging.sinks.create, logging.sinks.update, and permissions to create resources in the destination services. Install and configure the gcloud CLI on your local machine.

You'll need at least one GCP service running that generates logs, such as Compute Engine, Cloud Run, or Cloud Composer. Prepare a destination: a BigQuery dataset, Cloud Storage bucket, or Pub/Sub topic. Plan for 30 to 45 minutes to complete this tutorial.

If you need to install the gcloud CLI, visit the Google Cloud SDK documentation. Make sure you authenticate using gcloud auth login and set your project with gcloud config set project PROJECT_ID.
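As a quick reference, the initial setup commands look like this (PROJECT_ID is a placeholder for your own project; enabling the Logging API is a no-op if it is already on):

gcloud auth login
gcloud config set project PROJECT_ID
gcloud services enable logging.googleapis.com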

Understanding Log Sinks Architecture

Log sinks function as export pipelines within Google Cloud Logging. When you create a log sink, you define three key components: the source of your logs, a filter that determines which logs to export, and a destination where those logs will be sent.

Log sources can be any GCP service including Compute Engine virtual machines, Cloud SQL databases, Cloud Composer workflows, Cloud Run services, or Dataflow pipelines. These services automatically send their logs to Cloud Logging, where they become available for routing.

The destinations for log sinks include BigQuery for running SQL queries against your log data, Cloud Storage for cost-effective long-term retention, Pub/Sub for streaming logs to other systems, and even other Google Cloud projects. Each destination serves different operational needs and compliance requirements.

Step 1: Identify Your Log Sources and Requirements

Before creating log sinks, determine what logs you need to export and why. For this tutorial, we'll work through three common scenarios that represent different business needs.

First, view the logs currently available in your project to understand what you're working with:

gcloud logging logs list --limit=20

This command displays the log names available in your project. You'll see entries like projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity for Admin Activity audit logs or projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access for Data Access audit logs.

Next, examine some actual log entries to understand their structure:

gcloud logging read "resource.type=gce_instance" --limit=5 --format=json

This shows you sample log entries from Compute Engine instances. Look at the fields available, such as severity, timestamp, resource.labels, and jsonPayload. These fields will be important when creating filters for your log sinks.
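If you prefer a more readable summary than full JSON, you can project a few fields into a table instead (a convenience variation on the command above; adjust the filter to match logs your project actually produces):

gcloud logging read 'logName:"cloudaudit.googleapis.com" AND severity>=NOTICE' \
  --limit=5 \
  --format="table(timestamp, severity, logName)"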

Step 2: Create a BigQuery Dataset for Log Analysis

For a freight logistics company tracking delivery vehicle performance, storing logs in BigQuery enables SQL-based analysis of fleet operations. Create a BigQuery dataset to receive exported logs.

Create the dataset using the following command:

bq mk --dataset \
  --location=US \
  --description="Log sink destination for vehicle telemetry analysis" \
  YOUR_PROJECT_ID:fleet_logs

Replace YOUR_PROJECT_ID with your actual Google Cloud project ID. The dataset name fleet_logs will store tables that Cloud Logging automatically creates as logs are exported.

Verify the dataset was created successfully:

bq ls --project_id=YOUR_PROJECT_ID

You should see fleet_logs listed among your datasets. BigQuery will automatically create tables within this dataset as logs arrive, with names derived from the log names being exported, such as run_googleapis_com_stderr_YYYYMMDD for Cloud Run error logs or cloudaudit_googleapis_com_activity_YYYYMMDD for Admin Activity audit logs.

Step 3: Configure a Log Sink to BigQuery

Now you'll create your first log sink. This sink will export application logs with ERROR severity from Cloud Run services to BigQuery, allowing the freight company to analyze delivery application failures.

Create the log sink with this command:

gcloud logging sinks create fleet-app-errors-sink \
  bigquery.googleapis.com/projects/YOUR_PROJECT_ID/datasets/fleet_logs \
  --log-filter='resource.type="cloud_run_revision" AND severity="ERROR"'

This command does several things. The sink name fleet-app-errors-sink identifies this export configuration. The destination URL points to your BigQuery dataset. The filter selects only ERROR level logs from Cloud Run services, reducing costs by exporting only relevant data.

After creating the sink, Cloud Logging returns a service account email address. This account needs permission to write to BigQuery. Copy the service account email from the output, which looks like service-123456789@gcp-sa-logging.iam.gserviceaccount.com.
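If you didn't capture the output, you can retrieve the sink's writer identity at any time; the email in your project will differ from the example above:

gcloud logging sinks describe fleet-app-errors-sink \
  --format='value(writerIdentity)'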

Grant the necessary permissions:

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:service-123456789@gcp-sa-logging.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

Replace the service account email with the one provided in your sink creation output. This permission allows Cloud Logging to create tables and insert rows into your BigQuery dataset.

Step 4: Create a Cloud Storage Bucket for Log Archival

A telehealth platform needs to retain audit logs for seven years to meet healthcare compliance requirements. Cloud Storage provides cost-effective long-term storage. Set up a bucket for this purpose.

Create the storage bucket:

gsutil mb -c STANDARD -l US -b on gs://telehealth-audit-logs-YOUR_PROJECT_ID/

The -b on flag enables uniform bucket-level access, simplifying permission management. Use a unique bucket name by including your project ID.

Configure lifecycle management to automatically move logs to cheaper storage classes over time:

cat > lifecycle.json << EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
        "condition": {"age": 365}
      }
    ]
  }
}
EOF

gsutil lifecycle set lifecycle.json gs://telehealth-audit-logs-YOUR_PROJECT_ID/

This configuration moves logs to NEARLINE after 30 days, COLDLINE after 90 days, and ARCHIVE after one year, dramatically reducing storage costs while maintaining compliance.
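You can confirm the rules took effect by reading the lifecycle configuration back from the bucket:

gsutil lifecycle get gs://telehealth-audit-logs-YOUR_PROJECT_ID/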

Step 5: Configure a Log Sink to Cloud Storage

Create a log sink that exports all Cloud Audit logs to your Cloud Storage bucket:

gcloud logging sinks create telehealth-audit-archive-sink \
  storage.googleapis.com/telehealth-audit-logs-YOUR_PROJECT_ID \
  --log-filter='logName:"cloudaudit.googleapis.com"'

This filter captures all audit logs including admin activity, data access, and system events. The broad filter ensures complete audit trails for compliance.

Grant the Cloud Logging service account permission to write to your bucket:

gsutil iam ch serviceAccount:service-123456789@gcp-sa-logging.iam.gserviceaccount.com:objectCreator \
  gs://telehealth-audit-logs-YOUR_PROJECT_ID/

Cloud Logging will now create dated directories and JSON files in your bucket containing exported logs. The structure follows the pattern cloudaudit.googleapis.com/activity/YYYY/MM/DD/.
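If you'd rather not copy the service account email by hand, one option is to capture this sink's writer identity into a shell variable and pass it straight to gsutil (a small convenience sketch; the writer identity already carries the serviceAccount: prefix that gsutil expects):

LOG_SA=$(gcloud logging sinks describe telehealth-audit-archive-sink \
  --format='value(writerIdentity)')
gsutil iam ch "${LOG_SA}:objectCreator" \
  gs://telehealth-audit-logs-YOUR_PROJECT_ID/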

Step 6: Create a Pub/Sub Topic for Real-Time Processing

A solar farm monitoring system needs to trigger immediate alerts when sensor errors occur. Pub/Sub enables real-time log processing. Create a topic to receive these logs:

gcloud pubsub topics create solar-sensor-alerts

Create a subscription to consume messages from this topic:

gcloud pubsub subscriptions create solar-alerts-processor \
  --topic=solar-sensor-alerts \
  --ack-deadline=60

This subscription allows downstream systems like Cloud Functions or Dataflow to process log messages as they arrive. The 60-second acknowledgment deadline gives processors time to handle each message.
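As an optional sanity check before wiring up the sink, you can publish a test message and pull it back to confirm the topic and subscription are connected:

gcloud pubsub topics publish solar-sensor-alerts --message='{"test": "connectivity check"}'
gcloud pubsub subscriptions pull solar-alerts-processor --limit=1 --auto-ack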

Step 7: Configure a Log Sink to Pub/Sub

Create a log sink that streams sensor error logs to Pub/Sub:

gcloud logging sinks create solar-error-stream-sink \
  pubsub.googleapis.com/projects/YOUR_PROJECT_ID/topics/solar-sensor-alerts \
  --log-filter='resource.type="gce_instance" AND resource.labels.instance_id:"sensor-" AND severity>=ERROR'

This filter targets Compute Engine instances whose names contain "sensor-", matching on the compute.googleapis.com/resource_name label that the logging agent attaches to each entry, and captures logs at ERROR severity and above. The real-time nature of Pub/Sub means alerts can trigger within seconds of an error occurring.

Grant the logging service account permission to publish to your topic:

gcloud pubsub topics add-iam-policy-binding solar-sensor-alerts \
  --member="serviceAccount:service-123456789@gcp-sa-logging.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher"

Your log sink is now active and will begin streaming matching logs immediately.

Step 8: Verify Log Sink Operation

After creating log sinks, verify they're working correctly. List all sinks in your project:

gcloud logging sinks list

You should see your three sinks listed with their destinations and filters. Check the status of a specific sink:

gcloud logging sinks describe fleet-app-errors-sink

For the BigQuery sink, verify that tables are being created in your dataset. Wait a few minutes for logs to flow, then check:

bq ls fleet_logs

You should see tables appearing with names based on the log types being exported. Query one of these tables:

bq query --use_legacy_sql=false '
SELECT 
  timestamp,
  severity,
  resource.labels.service_name,
  jsonPayload.message
FROM `YOUR_PROJECT_ID.fleet_logs.run_googleapis_com_stderr_*`
WHERE severity = "ERROR"
ORDER BY timestamp DESC
LIMIT 10'

For Cloud Storage, check that log files are appearing in your bucket:

gsutil ls -r gs://telehealth-audit-logs-YOUR_PROJECT_ID/ | head -20

You should see directories organized by date with JSON log files inside. For Pub/Sub, pull messages from your subscription to verify logs are flowing:

gcloud pubsub subscriptions pull solar-alerts-processor --limit=5 --auto-ack

If logs matching your filter exist, you'll see them displayed as Pub/Sub messages.

Advanced Filter Techniques

Google Cloud Logging supports sophisticated filters that allow precise control over which logs are exported. Understanding these techniques helps optimize costs and focus on relevant data.

Use negative filters to exclude certain logs. For example, exclude health check logs that add noise without value:

gcloud logging sinks create app-logs-no-healthchecks \
  bigquery.googleapis.com/projects/YOUR_PROJECT_ID/datasets/app_logs \
  --log-filter='resource.type="cloud_run_revision" AND NOT jsonPayload.requestUrl=~"/health"'

Combine multiple conditions with AND and OR operators for complex scenarios. A video streaming service might want to capture both high-severity errors and specific user actions:

gcloud logging sinks create streaming-critical-events \
  storage.googleapis.com/streaming-analysis-YOUR_PROJECT_ID \
  --log-filter='(severity>=ERROR) OR (jsonPayload.event_type="subscription_cancelled")'

Filter based on specific label values to segment logs by environment or team. A mobile game studio running on GKE might separate production from staging logs by Kubernetes namespace:

gcloud logging sinks create game-production-logs \
  bigquery.googleapis.com/projects/YOUR_PROJECT_ID/datasets/prod_logs \
  --log-filter='resource.type="k8s_container" AND resource.labels.namespace_name="production"'

These filtering techniques ensure you export only the logs that provide value, controlling storage costs while maintaining necessary visibility.
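Filters rarely come out right on the first try, so it helps to iterate: preview what a candidate filter matches with gcloud logging read, then apply it to an existing sink with the update command (the sink name here is just the example from above):

gcloud logging read 'resource.type="cloud_run_revision" AND NOT httpRequest.requestUrl=~"/health"' --limit=5

gcloud logging sinks update app-logs-no-healthchecks \
  --log-filter='resource.type="cloud_run_revision" AND NOT httpRequest.requestUrl=~"/health"'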

Real-World Application Examples

Understanding how different organizations use log sinks helps contextualize this GCP feature within practical operations.

A payment processor handling millions of transactions daily uses log sinks to maintain comprehensive audit trails. They export all Cloud SQL query logs to BigQuery for fraud analysis, searching for unusual patterns like rapid successive queries from the same IP address or queries accessing unusually large datasets. Simultaneously, they sink API Gateway logs to Cloud Storage, retaining them for three years to meet financial services regulations. Critical payment failures trigger immediate alerts through a Pub/Sub sink connected to their incident management system.

An agricultural monitoring company uses sensor data from thousands of IoT devices deployed across farms. They configure log sinks to separate operational logs from sensor telemetry. Device health logs flow to BigQuery where data scientists run queries identifying failing sensors before they impact crop monitoring. Weather sensor readings go to a Pub/Sub topic that feeds a Dataflow pipeline, which processes the data and stores it in a time-series optimized format. Audit logs showing configuration changes to sensor networks archive to Cloud Storage with lifecycle policies moving data to Archive storage after six months.

A university research computing platform supports hundreds of researchers running computational workloads on Compute Engine. They use log sinks to track resource usage and cost allocation. VM creation and deletion logs export to BigQuery with tables partitioned by date, enabling monthly cost analysis queries grouped by research department. Application logs from research workloads sink to separate Cloud Storage buckets per research group, giving each team access to only their logs while maintaining centralized compliance controls. Security logs flow through Pub/Sub to a Cloud Function that checks for suspicious activity patterns and posts alerts to a Slack channel monitored by the security team.

Common Issues and Troubleshooting

Several issues commonly arise when setting up log sinks in Cloud Logging. Understanding these problems and their solutions saves debugging time.

If logs aren't appearing in your BigQuery dataset, first verify the service account has the correct permissions. Run this command to check IAM bindings:

gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:service-*@gcp-sa-logging.iam.gserviceaccount.com"

If the service account is missing or lacks roles/bigquery.dataEditor, add the permission as shown in Step 3. Also check that your filter isn't too restrictive. Test your filter by running a logging query:

gcloud logging read 'resource.type="cloud_run_revision" AND severity="ERROR"' --limit=10

If this returns no results, your filter doesn't match any existing logs. Adjust the filter criteria or wait for matching logs to be generated.

For Cloud Storage sinks not writing files, verify bucket permissions using:

gsutil iam get gs://YOUR_BUCKET_NAME/

Look for the logging service account with the roles/storage.objectCreator role. Bucket configuration issues sometimes prevent writes even with correct permissions. Check that the bucket exists and is in a supported location.
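To check the bucket's existence, location, and storage class in one step, list its metadata:

gsutil ls -L -b gs://telehealth-audit-logs-YOUR_PROJECT_ID/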

When Pub/Sub sinks aren't delivering messages, confirm the logging service account has publisher permissions on the topic:

gcloud pubsub topics get-iam-policy solar-sensor-alerts

Check that nothing else is pulling from the same subscription before you verify delivery. Each subscription receives its own copy of every message, but within a subscription an acknowledged message is gone, so if another consumer acknowledges messages automatically they will disappear before you can inspect them.

Quota errors can prevent log export when dealing with high-volume logging. Check your quota usage in the Google Cloud Console under IAM & Admin > Quotas. Look for Logging API write quotas and Pub/Sub publish quotas. Request quota increases if you're hitting limits.

Cost Optimization for Log Sinks

Managing costs is essential when exporting logs, especially for high-volume systems. Google Cloud charges for log storage in destinations and for the volume of logs ingested by Cloud Logging.

Use exclusion filters to keep low-value logs out of your log buckets. Exclusion filters differ from sink filters: a sink filter selects which logs a sink exports, while an exclusion (applied to the _Default sink) stops matching logs from being stored in Cloud Logging, avoiding ingestion charges. Create an exclusion filter for health check logs:

gcloud logging exclusions create exclude-healthchecks \
  --log-filter='resource.type="cloud_run_revision" AND jsonPayload.requestUrl=~"/health"'

This prevents health check logs from being stored in the _Default log bucket and counting toward ingestion costs. Excluded logs still pass through the Log Router, so other sinks can continue to export them, but they won't be available in the Logs Explorer.
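You can review the exclusions active in a project, or remove one that turns out to filter too aggressively:

gcloud logging exclusions list
gcloud logging exclusions delete exclude-healthchecks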

For BigQuery destinations, use partitioned tables and clustering to optimize query costs. Cloud Logging automatically creates date-partitioned tables, but you can create custom schemas with clustering for better performance on large datasets.

Implement retention policies so old log data is deleted automatically. Because the sink creates a new date-sharded table each day, a default table expiration on the dataset effectively enforces retention:

bq update --default_table_expiration=2592000 \
  YOUR_PROJECT_ID:fleet_logs

This gives each newly created log table a 30-day lifetime (2,592,000 seconds), automatically deleting older data. If you create the sink with the --use-partitioned-tables option instead, set --default_partition_expiration in the same way. Adjust these values based on your retention requirements and compliance needs.

For Cloud Storage, the lifecycle policies shown in Step 4 automatically reduce costs by moving logs through storage classes. Monitor your storage distribution using:

gsutil du -s -h gs://telehealth-audit-logs-YOUR_PROJECT_ID/

Review this periodically to ensure lifecycle policies are working as expected and logs are transitioning to cheaper storage classes appropriately.

Integration with Other GCP Services

Log sinks integrate with the broader Google Cloud ecosystem, enabling powerful data workflows and operational patterns.

When you export logs to BigQuery, you can join log data with other datasets for comprehensive analysis. A retail company might join application error logs with sales transaction data to identify whether errors correlate with lost revenue:

SELECT
  e.error_date,
  e.error_count,
  t.daily_revenue
FROM (
  SELECT DATE(timestamp) AS error_date, COUNT(*) AS error_count
  FROM `project.fleet_logs.run_googleapis_com_stderr_*`
  WHERE severity = 'ERROR'
  GROUP BY error_date
) e
LEFT JOIN (
  SELECT DATE(transaction_time) AS transaction_date, SUM(transaction_amount) AS daily_revenue
  FROM `project.sales.transactions`
  GROUP BY transaction_date
) t ON e.error_date = t.transaction_date
ORDER BY e.error_date DESC

This query reveals whether error spikes coincide with revenue drops, informing prioritization of bug fixes.

Pub/Sub sinks enable event-driven architectures where logs trigger automated responses. Connect a Pub/Sub subscription to a Cloud Function that processes each log message. A climate modeling research platform might trigger automated email alerts when long-running simulations encounter errors:

import base64
import json
import os

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

def process_log_alert(event, context):
    # Pub/Sub delivers the LogEntry as a base64-encoded JSON string.
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    log_entry = json.loads(pubsub_message)

    if log_entry.get('severity') == 'ERROR':
        instance_id = log_entry.get('resource', {}).get('labels', {}).get('instance_id')
        error_message = log_entry.get('jsonPayload', {}).get('message')
        message = Mail(
            from_email='alerts@research.example.com',
            to_emails='oncall@research.example.com',
            subject=f'Simulation Error: {instance_id}',
            html_content=f'<p>Error detected: {error_message}</p>'
        )
        sg = SendGridAPIClient(os.environ.get('SENDGRID_API_KEY'))
        sg.send(message)
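A sketch of deploying this function as the subscriber, assuming a 1st gen Cloud Function (which matches the event, context signature above); the region, runtime, and API key handling are placeholders to adapt:

gcloud functions deploy process_log_alert \
  --no-gen2 \
  --runtime=python311 \
  --region=us-central1 \
  --trigger-topic=solar-sensor-alerts \
  --entry-point=process_log_alert \
  --set-env-vars=SENDGRID_API_KEY=YOUR_SENDGRID_KEY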

Cloud Storage log exports integrate with BigQuery through external tables, allowing SQL queries without importing data. Generate a table definition with bq mkdef, then create an external table pointing to your Cloud Storage logs:

bq mkdef --source_format=NEWLINE_DELIMITED_JSON --autodetect \
  "gs://telehealth-audit-logs-YOUR_PROJECT_ID/*.json" > audit_logs_def.json
bq mk --external_table_definition=audit_logs_def.json \
  YOUR_PROJECT_ID:fleet_logs.storage_audit_logs

This creates a BigQuery table that queries JSON files in Cloud Storage directly, useful for ad-hoc analysis of archived logs without paying for BigQuery storage.

Dataflow pipelines can consume logs from Pub/Sub sinks for complex stream processing. An ISP monitoring network performance might use Dataflow to aggregate latency measurements from logs, calculate rolling averages, and write results to BigQuery for dashboard visualization. This creates a real-time analytics pipeline powered by log data.

Best Practices for Production Deployments

Successful production implementations of log sinks follow several important practices that ensure reliability, security, and maintainability.

Organize log sinks by purpose and team ownership. Create separate sinks for security logs, application logs, and infrastructure logs rather than one sink that exports everything. This separation allows different teams to manage their own log analysis without interfering with compliance or security workflows. Use clear naming conventions like security-audit-archive, app-errors-analysis, or infra-performance-metrics.

Implement monitoring for your log sinks so export problems don't go unnoticed. One option is a log-based metric that counts failed sink configuration changes recorded in the audit logs:

gcloud logging metrics create sink_config_errors \
  --description="Count of failed log sink configuration changes" \
  --log-filter='protoPayload.methodName="google.logging.v2.ConfigServiceV2.UpdateSink" AND protoPayload.status.code!=0'

Set up alerting policies based on this metric, and also watch the built-in logging.googleapis.com/exports/error_count metric in Cloud Monitoring, which reports entries that a sink failed to deliver, so export failures don't become silent data loss.

Document your log retention strategy clearly. Different log types have different retention requirements based on compliance needs, operational value, and cost considerations. Create a retention policy document that specifies how long each log type is retained and where it's stored. Implement this policy through BigQuery partition expiration, Cloud Storage lifecycle rules, and Pub/Sub message retention settings.

Use Infrastructure as Code tools like Terraform to manage log sink configurations. This approach provides version control, peer review, and reproducibility. A Terraform configuration for a log sink looks like this:

resource "google_logging_project_sink" "fleet_app_errors" {
  name        = "fleet-app-errors-sink"
  destination = "bigquery.googleapis.com/projects/${var.project_id}/datasets/fleet_logs"
  filter      = "resource.type=\"cloud_run_revision\" AND severity=\"ERROR\""
  unique_writer_identity = true
}

resource "google_bigquery_dataset_iam_member" "sink_writer" {
  dataset_id = google_bigquery_dataset.fleet_logs.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.fleet_app_errors.writer_identity
}

This configuration creates the sink and grants permissions automatically, ensuring consistency across environments.

Regularly audit your log exports to verify they contain expected data. Schedule monthly reviews where you sample exported logs and confirm they match your filter criteria. This catches configuration drift or changes in log formats that might break your filters. Run periodic test queries against your BigQuery log tables to ensure data quality remains high.

Next Steps and Advanced Topics

After mastering basic log sink configuration, several advanced topics expand your capabilities with Google Cloud Logging.

Explore aggregated exports that combine logs from multiple projects into a single destination. This becomes important for organizations with many projects that need centralized log analysis. Create an aggregated sink at the organization or folder level by running gcloud logging sinks create with the --organization or --folder flag and --include-children, rather than at the project level.
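A hedged example of an organization-level aggregated sink that archives audit logs from every project under the organization (the sink name, bucket, and ORGANIZATION_ID are placeholders):

gcloud logging sinks create org-audit-archive \
  storage.googleapis.com/central-audit-archive-YOUR_ORG \
  --organization=ORGANIZATION_ID \
  --include-children \
  --log-filter='logName:"cloudaudit.googleapis.com"'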

Investigate log-based metrics that extract numeric values from logs and expose them as Cloud Monitoring metrics. These metrics enable alerting on patterns found in logs without exporting data. For example, create a metric counting API errors per endpoint, then alert when any endpoint exceeds a threshold error rate.
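For instance, a simple counter metric for Cloud Run errors might look like the following; the metric name and filter are illustrative, and the alerting policy itself is configured separately in Cloud Monitoring:

gcloud logging metrics create cloud_run_error_count \
  --description="Count of Cloud Run error log entries" \
  --log-filter='resource.type="cloud_run_revision" AND severity>=ERROR'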

Study the Cloud Logging API for programmatic log sink management. The API allows automated creation and modification of sinks based on application deployment patterns. A continuous deployment system might automatically create log sinks for each new microservice deployed, ensuring consistent log collection across your infrastructure.

Learn about log sampling and exclusion strategies for extremely high-volume logging scenarios. When dealing with millions of log entries per second, exporting everything becomes prohibitively expensive. Sampling techniques allow statistical analysis while dramatically reducing costs.

Examine cross-project and cross-organization log aggregation patterns for large enterprises. These architectures centralize log management while maintaining appropriate access controls and cost attribution per team or business unit.

Summary

You have successfully learned how to set up log sinks in Cloud Logging to export GCP logs to multiple destinations. You created log sinks that export to BigQuery for analysis, Cloud Storage for long-term archival, and Pub/Sub for real-time processing. You configured appropriate filters to control which logs are exported, granted necessary permissions to Cloud Logging service accounts, and verified that logs flow correctly to each destination.

These skills enable you to meet compliance requirements through appropriate log retention, perform deep analysis on operational data to improve system reliability, and build event-driven architectures that respond to log patterns in real time. The practical experience with gcloud commands, IAM permissions, and filter syntax prepares you for real-world data engineering challenges on Google Cloud Platform.

Understanding log sinks is fundamental for the Professional Data Engineer certification, appearing in questions about data pipeline architectures, compliance implementations, and operational monitoring. Readers looking for comprehensive exam preparation that covers log management along with the full range of Google Cloud data engineering topics can check out the Professional Data Engineer course. You now have working implementations you can adapt for your own projects and the knowledge to design sophisticated log management strategies for organizations of any size.