Cloud Monitoring Metrics: Built-in, Custom, and External

Understanding the differences between built-in, custom, and external cloud monitoring metrics helps you make better decisions about observability strategy, cost management, and operational complexity.

When you start monitoring applications and infrastructure on Google Cloud, you immediately face a fundamental question: which cloud monitoring metrics should you track? The platform offers three distinct types: built-in metrics that come automatically with GCP services, custom metrics you define yourself, and external metrics pulled from sources outside Google Cloud. Each approach involves specific trade-offs in cost, implementation effort, and operational value that can significantly impact your observability strategy.

This decision matters because monitoring is rarely free. Every metric you collect consumes storage, generates API calls, and potentially triggers alerts that require human attention. Choose poorly, and you might miss critical signals while drowning in noise and unexpected bills. Choose wisely, and you build a monitoring foundation that scales efficiently while surfacing exactly the insights your team needs.

Built-in Metrics: The Foundation Layer

Built-in metrics are automatically collected and provided by Google Cloud services without any configuration on your part. When you launch a Compute Engine instance, start a Cloud SQL database, or deploy a Cloud Run service, Google Cloud immediately begins tracking dozens of predefined metrics specific to that service.

These metrics fall into three categories. Infrastructure metrics include CPU utilization, disk I/O rates, network throughput, and memory consumption. Application metrics capture response times, error rates, request counts, and latency percentiles. System metrics track details like load averages and process counts that reveal overall system health.

For example, a Cloud SQL instance automatically reports metrics like database connections, transaction counts, CPU seconds used, and storage consumption. You access these through Cloud Monitoring without writing a single line of instrumentation code. A furniture retailer running an inventory management system on Cloud SQL can immediately see query latency spikes or connection pool exhaustion without custom instrumentation.
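Reading these metrics programmatically comes down to naming them in a filter. The sketch below builds a time series filter for the standard Cloud SQL CPU utilization metric; the database id is a hypothetical placeholder, and the commented-out call shows where the google-cloud-monitoring client library would consume the filter.

```python
# Build a Cloud Monitoring filter for a built-in Cloud SQL metric.
import time

METRIC = "cloudsql.googleapis.com/database/cpu/utilization"
DATABASE_ID = "my-project:inventory-db"  # hypothetical project:instance id

# Filters are plain strings combining metric type and resource labels
metric_filter = (
    f'metric.type = "{METRIC}" AND '
    f'resource.labels.database_id = "{DATABASE_ID}"'
)

# Query window: the last hour, as unix timestamps
end_seconds = int(time.time())
start_seconds = end_seconds - 3600

# With the client library installed, the filter feeds straight into
# MetricServiceClient.list_time_series:
#
#   from google.cloud import monitoring_v3
#   client = monitoring_v3.MetricServiceClient()
#   series = client.list_time_series(
#       name="projects/my-project",
#       filter=metric_filter,
#       interval=monitoring_v3.TimeInterval(
#           {"end_time": {"seconds": end_seconds},
#            "start_time": {"seconds": start_seconds}}
#       ),
#       view=monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
#   )

print(metric_filter)
```

The same filter syntax works in the Metrics Explorer UI, which is often the quickest way to confirm a metric type and its labels before writing any code.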

The strengths here are compelling. Built-in metrics require zero setup effort. They follow consistent naming conventions across GCP services, making them predictable once you learn the patterns. Google optimizes their collection and storage, so they perform efficiently even at high cardinality. For services covered by service level objectives, built-in metrics provide the foundation for reliability tracking. The data flows automatically into Cloud Monitoring where you can build dashboards, configure alerts, and analyze trends.

When Built-in Metrics Fall Short

The limitation becomes apparent when you need business context. Built-in metrics tell you that your Cloud Run service is processing 500 requests per second with 99th percentile latency of 240 milliseconds. They can't tell you how many of those requests resulted in completed purchases, abandoned carts, or feature activations.

A subscription box service might see perfectly healthy infrastructure metrics while customer churn accelerates because a recommendation algorithm is returning poor results. The API responds quickly and error rates stay low, but the business outcome suffers. Built-in metrics lack this semantic layer.

Another gap appears in hybrid or multi-cloud architectures. If your payment processing runs partially on Google Cloud and partially on AWS, built-in GCP metrics only cover one side of the system. You lose the unified view needed to understand end-to-end transaction flows.

Custom Metrics: Business and Application Context

Custom metrics let you define and track measurements specific to your application logic or business operations. You instrument your code to emit these metrics, sending them to Cloud Monitoring through the Cloud Monitoring API or through libraries like OpenTelemetry.

These metrics capture what matters to your specific context. A mobile game studio might track metrics like daily active users, session duration, in-game currency spent, level completion rates, and matchmaking wait times. A telehealth platform might measure appointment booking conversion rates, video call quality scores, prescription fulfillment time, and patient satisfaction ratings.

Here's a practical example using Python with the Google Cloud Monitoring client library:


from google.cloud import monitoring_v3
import time

project_id = "your-project-id"  # replace with your project ID
client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{project_id}"

# Describe the custom metric and the resource it is attached to
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/ecommerce/cart_abandonment_rate"
series.resource.type = "global"

# Build the point through the constructors; the client library exposes
# timestamps as immutable values, so they cannot be assigned field by field
now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 10**9)}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"double_value": 0.23}})
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])

This code sends a custom metric tracking cart abandonment rate directly to Cloud Monitoring. The metric becomes available for dashboards, alerting, and analysis alongside built-in metrics.

Custom metrics shine when you need application-specific insights. They bridge the gap between technical performance and business outcomes. When your alert fires because the order_completion_rate metric drops below threshold, you know immediately that revenue is at risk, not just that some technical component is unhealthy.

The Cost and Complexity Trade-off

Custom metrics introduce both financial and operational overhead. Google Cloud charges for custom metric ingestion beyond the free tier (currently 150 MiB of chargeable metric data per month per billing account). A high-cardinality custom metric with many unique label combinations can generate substantial costs.

Consider a logistics company tracking package delivery times with labels for origin city, destination city, carrier, and service level. If you have 100 origin cities, 100 destination cities, 5 carriers, and 3 service levels, you potentially create 150,000 unique time series. At 8 bytes per data point written once per minute, this generates roughly 52 GB per month just for this single metric family.
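That arithmetic is worth sanity-checking. A small helper, assuming Cloud Monitoring's convention of counting a scalar data point as 8 bytes for billing, makes the cardinality math explicit:

```python
# Estimate monthly ingestion for a labeled custom metric, assuming
# 8 bytes per scalar data point (Cloud Monitoring's billing convention)
# and a 30-day month.
def monthly_bytes(label_counts, points_per_minute=1, bytes_per_point=8):
    """Worst case: every label combination produces an active time series."""
    series = 1
    for count in label_counts:
        series *= count
    points_per_month = points_per_minute * 60 * 24 * 30
    return series, series * points_per_month * bytes_per_point

# 100 origins x 100 destinations x 5 carriers x 3 service levels
series, volume = monthly_bytes([100, 100, 5, 3])
print(series)                  # 150000 unique time series
print(round(volume / 1e9, 1))  # 51.8 GB per month
```

The worst-case assumption matters: in practice only label combinations that actually occur become time series, but a billing estimate should start from the full Cartesian product.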

Beyond cost, custom metrics require instrumentation work. Your development team must identify what to measure, add the code to collect and emit metrics, handle errors gracefully when the monitoring API is unavailable, and maintain this instrumentation as the application evolves. This ongoing maintenance burden compounds across teams and services.

There are also performance considerations. Every API call to write custom metrics adds latency to your application path. If not implemented carefully with batching and async operations, metric collection can slow down request processing.
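One common pattern is to buffer points in memory and write them off the request path in a single batched call. The sketch below is illustrative rather than a production library: the flush_fn argument stands in for whatever actually writes the batch, such as a create_time_series call.

```python
# Sketch of a batching metric emitter. Points are buffered and flushed
# in one call, so the request path never blocks on the monitoring API.
# flush_fn stands in for a real write such as create_time_series.
import threading

class BatchingEmitter:
    def __init__(self, flush_fn, max_batch=200, interval_seconds=60.0):
        self._flush_fn = flush_fn
        self._max_batch = max_batch
        self._interval = interval_seconds
        self._buffer = []
        self._lock = threading.Lock()
        self._timer = None

    def emit(self, metric_type, value):
        """Record a point; cheap enough to call on the request path."""
        with self._lock:
            self._buffer.append((metric_type, value))
            if len(self._buffer) >= self._max_batch:
                self._flush_locked()
            elif self._timer is None:
                # Guarantee the buffer drains even under low traffic
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self._buffer:
            batch, self._buffer = self._buffer, []
            try:
                self._flush_fn(batch)  # one API call for the whole batch
            except Exception:
                pass  # drop on failure; monitoring must never break serving

sent = []
emitter = BatchingEmitter(sent.extend, max_batch=2)
emitter.emit("custom.googleapis.com/orders/completed", 1)
emitter.emit("custom.googleapis.com/orders/completed", 1)
print(len(sent))  # 2: the batch flushed once it reached max_batch
```

Swallowing flush errors is a deliberate choice here: losing a data point is almost always preferable to failing a customer request because the monitoring backend was briefly unreachable.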

External Metrics: The Multi-Cloud and Hybrid Perspective

External metrics allow you to pull monitoring data from sources outside Google Cloud into Cloud Monitoring. This creates a unified observability platform regardless of where your infrastructure lives.

The most common scenarios involve multi-cloud deployments and hybrid architectures. A financial services company might run core transaction processing on Google Cloud while maintaining legacy systems on-premises. A media streaming platform might use Google Cloud for content delivery but AWS for video transcoding. External metrics let you monitor all these components in one place.

Google Cloud integrates with Prometheus for external metric collection, including Managed Service for Prometheus, and you can use the Cloud Monitoring API to push metrics from anywhere. For multi-cloud scenarios, you might run a Prometheus server in each environment and forward its data into Cloud Monitoring, or route metrics through a centralized collector.

The benefit is operational consolidation. Rather than jumping between the Google Cloud Console, AWS CloudWatch, and on-premises Grafana dashboards, your team works from a single pane of glass. Alert correlation becomes possible across environments. You can track end-to-end flows that span multiple clouds.

Integration Overhead and Data Gravity

External metrics come with integration complexity. You need to set up and maintain the collection infrastructure. Authentication and network connectivity between environments require careful configuration. Each external source may have its own metric naming conventions and data formats that need translation.

Data egress costs can surprise you. If your external metrics originate in AWS and you're pulling significant volumes into Google Cloud, you pay AWS egress charges. A freight company tracking shipments across a hybrid infrastructure might generate hundreds of gigabytes of metrics monthly, leading to substantial cross-cloud data transfer costs.

There's also a question of latency. External metrics travel further and through more systems before reaching Cloud Monitoring. This can delay alert firing compared to built-in or custom metrics generated directly on GCP.

How Cloud Monitoring Handles Metric Types

Cloud Monitoring in Google Cloud provides a unified platform for all three metric types, but the implementation differs meaningfully from traditional monitoring systems and gives you specific architectural advantages.

The service automatically scales metric ingestion without requiring you to provision capacity. Whether you're sending 100 time series or 100,000, Cloud Monitoring adjusts transparently. This differs from self-hosted Prometheus or InfluxDB where you must plan for storage and query performance as metric volume grows.

Built-in metrics benefit from zero-configuration automatic collection. When you create a BigQuery dataset, Cloud Monitoring immediately begins tracking slot usage, query execution times, and storage consumption without any agent installation or API calls. This happens because the metric collection is embedded in the service implementation itself.

For custom and external metrics, Cloud Monitoring offers the Ops Agent for Compute Engine instances. This unified agent collects both system metrics and application logs, replacing the older separate Monitoring and Logging agents. The Ops Agent supports third-party applications through built-in receivers for popular tools like Apache, MySQL, and Nginx. You configure what to collect through a YAML file, and the agent handles batching, retry logic, and efficient transmission to Cloud Monitoring.
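A minimal Ops Agent configuration along these lines might look like the following sketch. Field names follow the agent's documented mysql and nginx receivers; the receiver IDs, endpoints, and credentials are placeholders.

```yaml
# /etc/google-cloud-ops-agent/config.yaml (illustrative sketch)
metrics:
  receivers:
    mysql_local:
      type: mysql
      endpoint: localhost:3306   # placeholder connection details
      username: monitoring
      password: change-me
    nginx_local:
      type: nginx
      stub_status_url: http://localhost/status   # requires stub_status enabled
  service:
    pipelines:
      third_party:
        receivers: [mysql_local, nginx_local]
```

After editing the file, the agent is restarted (for example with sudo systemctl restart google-cloud-ops-agent) to pick up the new pipeline.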

Monitoring Query Language supports filtering, aggregating, and joining across all metric types. You can create a dashboard chart that displays built-in Cloud Run request latency alongside a custom metric for order processing time and an external metric from your payment gateway. The language treats all metric types uniformly once they're ingested.

One architectural detail matters for cost management. Cloud Monitoring charges based on the volume of metric data ingested, specifically chargeable custom and external metric data beyond the free tier. Built-in metrics are free. This pricing structure incentivizes you to use built-in metrics wherever possible and only create custom metrics when the built-in options don't cover your needs.

The platform also handles metric retention automatically. Built-in and custom metrics are retained for different periods based on the metric type and aggregation level. Recent data is available at full granularity while older data is automatically downsampled. You don't manage this retention or downsampling yourself.
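The effect of downsampling is easy to picture with a toy example. The helper below mean-aggregates per-minute points into hourly buckets, purely to illustrate the kind of server-side aggregation Cloud Monitoring applies as data ages; the platform's actual retention tiers and alignment periods are fixed by the service, not managed by you.

```python
# Illustrative only: mean-downsample (unix_seconds, value) points from
# one-minute resolution into one-hour buckets, mimicking the server-side
# aggregation Cloud Monitoring applies to aging data.
from collections import defaultdict

def downsample_hourly(points):
    """points: iterable of (unix_seconds, value) at any resolution."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)  # snap to the hour boundary
    return sorted(
        (bucket, sum(vals) / len(vals)) for bucket, vals in buckets.items()
    )

minute_points = [(i * 60, float(i)) for i in range(120)]  # two hours of data
hourly = downsample_hourly(minute_points)
print(hourly)  # [(0, 29.5), (3600, 89.5)]
```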

A Realistic Decision Scenario

Consider a solar energy company that operates solar farms across multiple regions. They use Google Cloud for data analytics and monitoring while their solar inverters and sensors run on-premises at each farm location.

They start with built-in metrics from their BigQuery datasets where they analyze production data and from Cloud Functions that process incoming sensor readings. These metrics immediately show them query performance, function execution times, and resource utilization. This covers their Google Cloud infrastructure health.

They add custom metrics for business-specific measurements. A custom metric tracks energy production efficiency calculated as actual output divided by theoretical maximum based on solar irradiance. Another tracks maintenance prediction scores generated by their machine learning model. They implement these using Python code that runs in Cloud Functions, writing approximately 50 unique time series with one data point per minute per solar farm. With 20 farms, this generates about 0.35 GB of custom metric data monthly, staying within reasonable cost boundaries.

For external metrics, they configure Prometheus collectors at each solar farm to gather hardware metrics from inverters, battery systems, and weather sensors. These Prometheus instances expose metrics endpoints that a centralized collector running on GKE scrapes and forwards to Cloud Monitoring. This adds approximately 200 time series per farm, generating another 1.4 GB monthly but providing complete visibility into on-premises hardware.

Their monitoring dashboard combines all three types. A single view shows built-in Cloud Functions error rates, custom business metrics for production efficiency, and external hardware metrics for inverter status. When efficiency drops at a specific farm, the operations team sees both the business impact through custom metrics and the root cause through external hardware metrics, all without switching between tools.

The cost breakdown helps illustrate the trade-offs. Built-in metrics are free. At Cloud Monitoring's published ingestion rate of roughly $0.26 per MiB, custom metrics at about 0.35 GB monthly cost on the order of $45 beyond the free tier, and external metrics at 1.4 GB roughly $340. A total around $390 per month is modest compared to the operational value of unified visibility, but the company remains mindful about metric cardinality. They avoid adding labels that would explode time series counts unnecessarily.

Comparing Your Metric Strategy Options

When deciding which metric types to use, several factors should guide your thinking.

| Factor | Built-in Metrics | Custom Metrics | External Metrics |
|---|---|---|---|
| Implementation effort | Zero configuration required | Requires code instrumentation | Requires integration setup |
| Cost | Free in Cloud Monitoring | Charged beyond free tier | Charged, plus potential egress fees |
| Coverage | Infrastructure and service health | Business logic and application events | Multi-cloud and on-premises systems |
| Maintenance | Automatic updates by Google | Your team maintains instrumentation | Your team maintains collection infrastructure |
| Latency | Immediate collection | Depends on implementation | Higher due to data travel distance |
| Cardinality control | Predefined by service | Your responsibility to manage | Depends on external source |

Start with built-in metrics as your foundation. They provide comprehensive infrastructure and service-level monitoring without cost or effort. Use them for alerting on resource exhaustion, service degradation, and technical issues.

Add custom metrics selectively when you need business context or application-specific insights that built-in metrics can't provide. Prioritize metrics that directly inform operational decisions or business outcomes. Be disciplined about label cardinality to control costs.

Incorporate external metrics when you have meaningful infrastructure outside Google Cloud that needs unified monitoring. Evaluate whether the integration overhead and ongoing costs justify the operational benefits of consolidated visibility.

Building Your Monitoring Foundation

Understanding the trade-offs between built-in, custom, and external cloud monitoring metrics helps you construct an observability strategy that balances coverage, cost, and operational complexity. Built-in metrics give you free, comprehensive infrastructure monitoring. Custom metrics add critical business context at modest cost when implemented thoughtfully. External metrics unify multi-cloud and hybrid environments when integration overhead is justified.

The key insight is that you don't choose just one approach. Effective monitoring combines all three types strategically based on your specific operational needs and architectural constraints. Start with built-in metrics everywhere possible, layer in custom metrics where business visibility matters, and integrate external metrics when you need cross-environment observability.

For those preparing for Google Cloud certifications, understanding these metric types and their trade-offs appears frequently in exam scenarios. Questions often present monitoring challenges and ask you to recommend the appropriate metric strategy considering cost, complexity, and coverage. The Professional Cloud Architect and Professional Cloud DevOps Engineer exams particularly emphasize these observability decisions. Readers looking for comprehensive exam preparation can check out the Professional Data Engineer course, which covers Cloud Monitoring extensively along with other essential GCP services.

The engineering discipline comes from recognizing that more metrics don't automatically mean better observability. Thoughtful selection based on operational value, combined with awareness of cost and complexity implications, builds monitoring systems that genuinely improve reliability and performance rather than simply generating data.