How to Use UNNEST in BigQuery for Nested Fields

Master BigQuery's UNNEST function to work with nested and repeated fields. This hands-on tutorial shows you how to query denormalized data structures efficiently.

If you're preparing for the Google Cloud Professional Data Engineer exam, understanding how to use UNNEST in BigQuery is essential. This tutorial walks you through querying nested and repeated fields, a critical skill for working with denormalized data in Google Cloud's data warehouse. By the end of this guide, you'll be able to write queries that efficiently extract data from complex nested structures without relying on expensive JOIN operations.

BigQuery's approach to data storage differs from traditional relational databases. The platform encourages denormalization, storing related data within single records using nested and repeated fields. This design takes full advantage of BigQuery's columnar storage architecture, reducing query costs and improving performance. The UNNEST function is your tool for flattening these nested structures when you need to analyze the data inside them.

Why BigQuery Uses Nested and Repeated Fields

Traditional normalized databases spread related data across multiple tables, requiring JOIN operations to bring information together. These joins become computationally expensive as datasets grow larger.

BigQuery solves this problem through denormalization. By storing related data as nested structures within a single table, you eliminate the need for many joins. The columnar storage format means BigQuery only scans the specific columns your query needs, making denormalized tables surprisingly efficient.

Nested structures in BigQuery use the RECORD or STRUCT data type. Think of these as tables within tables. A single row can contain multiple subrows of related information, keeping everything together in a compact format. This is where the UNNEST function becomes crucial for your GCP data engineering work.

Prerequisites and Requirements

Before starting this tutorial, you'll need access to a Google Cloud Platform project with BigQuery enabled, BigQuery Data Viewer role or higher permissions, and basic SQL query knowledge. You should also be familiar with the BigQuery console or command line interface. Estimated time is 30 to 45 minutes.

You don't need to create any tables for this tutorial. We'll use public datasets provided by Google Cloud to demonstrate UNNEST functionality.

Understanding the Structure of Nested Data

A single row in a denormalized weather table contains geographic coordinates (latitude and longitude), a forecast column containing a nested table, and within the forecast, multiple records for different hours, each with timestamp and temperature data.

This structure means one row represents a location with all its hourly forecasts stored internally. Without UNNEST, you cannot directly query the data inside those nested forecast records.

Step 1: Query a Nested Field Without UNNEST

First, let's see what happens when you query a table with nested fields without using UNNEST. Open the BigQuery console in your Google Cloud project and run this query:

SELECT
  geography,
  forecast
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`
LIMIT 10;

This query returns rows where the forecast column appears as a complex object. You'll see the nested structure displayed, but you cannot filter or aggregate the data inside those nested records directly. To work with the individual hourly forecasts, you need UNNEST.

Step 2: Use UNNEST to Flatten Nested Data

Now let's flatten the nested forecast data using UNNEST. This function expands the nested array into individual rows, one for each element in the array:

SELECT
  geography,
  forecast_data.hours,
  forecast_data.time AS forecast_time,
  forecast_data.temperature_2m_above_ground
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data
LIMIT 20;

This query does several important things. The UNNEST(forecast) function expands the nested forecast array. We assign an alias (forecast_data) to reference the unnested elements. We can now access individual fields within the nested structure using dot notation. Each nested record becomes its own row in the result set.

Run this query in the BigQuery console. You'll see the nested data flattened into a traditional tabular format where each hour of forecast data appears as a separate row.

Step 3: Filter Unnested Data for Specific Values

The real power of UNNEST becomes apparent when you need to filter nested data. Let's retrieve only the 10:00 AM forecasts across all locations:

SELECT
  geography.point.latitude,
  geography.point.longitude,
  forecast_data.time AS forecast_time,
  forecast_data.temperature_2m_above_ground AS temp_celsius
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data
WHERE
  forecast_data.hours = 10
LIMIT 100;

This query demonstrates how UNNEST enables filtering on nested fields. Without denormalization and UNNEST, you would need to join a locations table with a separate forecasts table, scanning both tables completely. With this approach, BigQuery only processes the relevant columns and can apply the filter efficiently.

The WHERE clause filters after unnesting, selecting only records where the hours field equals 10. This gives you a clean result set showing 10:00 AM forecasts for various geographic points.

Step 4: Aggregate Data from Nested Fields

You can also perform aggregations on unnested data. Let's calculate the average temperature across all hours for each location:

SELECT
  geography.point.latitude,
  geography.point.longitude,
  AVG(forecast_data.temperature_2m_above_ground) AS avg_temp,
  COUNT(forecast_data.hours) AS forecast_count
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data
GROUP BY
  geography.point.latitude,
  geography.point.longitude
LIMIT 50;

This query unnests the forecast data and then groups by location, calculating average temperature and counting forecast hours. The UNNEST operation happens before the aggregation, allowing BigQuery to treat each nested record as a separate row for calculation purposes.

Real-World Application Examples

Understanding UNNEST is crucial for many practical Google Cloud data engineering scenarios. Here are concrete examples across different industries.

Mobile Game Studio Analytics

A mobile game studio stores player session data with nested arrays of in-game events. Each player record contains their profile information plus an array of actions (purchases, level completions, achievements) during gameplay sessions. Using UNNEST, data engineers can analyze which specific game events correlate with player retention:

SELECT
  player_id,
  event.event_type,
  event.timestamp,
  event.value
FROM
  game_analytics.player_sessions,
  UNNEST(events) AS event
WHERE
  event.event_type = 'level_completed'
  AND DATE(event.timestamp) = CURRENT_DATE();

This pattern avoids creating separate tables for players and events, reducing join operations and query costs in the GCP environment.

Telehealth Platform Patient Monitoring

A telehealth platform stores patient vital signs with nested repeated measurements. Each patient visit record includes an array of blood pressure readings taken throughout the appointment. Medical analysts can use UNNEST to identify patients with concerning trends:

SELECT
  patient_id,
  visit_date,
  reading.measurement_time,
  reading.systolic,
  reading.diastolic
FROM
  healthcare.patient_visits,
  UNNEST(bp_readings) AS reading
WHERE
  reading.systolic > 140
  OR reading.diastolic > 90;

The denormalized structure keeps all visit-related data together while UNNEST enables analysis of individual measurements.

Smart Building IoT Sensor Data

A smart building management system collects sensor data with nested arrays of measurements. Each sensor device record contains metadata plus an array of hourly readings for temperature, humidity, and occupancy. Facility managers can query specific sensor readings using UNNEST to optimize HVAC systems:

SELECT
  building_id,
  floor,
  sensor_id,
  measurement.hour,
  measurement.temperature,
  measurement.occupancy_count
FROM
  facilities.sensor_data,
  UNNEST(measurements) AS measurement
WHERE
  measurement.temperature > 24
  AND measurement.occupancy_count > 10;

This Google Cloud implementation processes sensor data efficiently without maintaining separate tables for devices and readings.

Verification and Testing

After running queries with UNNEST, verify your results are correct. Check that the unnested query returns more rows than the original nested table if you're expanding arrays. Ensure all nested records appear in your results. Verify that WHERE clauses applied to unnested fields return expected values.

You can validate row multiplication by comparing these two queries:

-- Count of parent records
SELECT COUNT(*) FROM `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`;

-- Count after unnesting
SELECT COUNT(*)
FROM `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data;

The unnested count should be significantly higher, reflecting the expansion of nested arrays into individual rows.

Common Issues and Troubleshooting

When working with UNNEST in BigQuery, you may encounter these challenges.

Issue: Query Returns No Results After UNNEST

If your query returns empty results after adding UNNEST, the nested array might be empty or null for some records. Use a LEFT JOIN variation with UNNEST to preserve parent rows:

SELECT
  geography,
  forecast_data.hours
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`
LEFT JOIN
  UNNEST(forecast) AS forecast_data;

This syntax keeps parent records even when their nested arrays are empty.

Issue: Cannot Access Nested Field Names

If you get errors about unrecognized field names, check the schema carefully. In the BigQuery console, expand the table schema to see the exact field names within nested structures. Nested fields require dot notation: forecast_data.hours not just hours.

Issue: Query Performance Degradation

Unnesting very large arrays can multiply row counts significantly, impacting performance. Add WHERE clauses that filter on parent table columns before unnesting when possible. This reduces the number of arrays BigQuery needs to expand.

Working with Multiple Nested Levels

Sometimes you'll encounter data with multiple levels of nesting. You can chain UNNEST operations to flatten deeply nested structures. Suppose you have a table where each forecast contains nested hourly data, and each hour contains nested location-specific adjustments:

SELECT
  location,
  hour_data.hour,
  adjustment.factor,
  adjustment.reason
FROM
  weather.complex_forecasts,
  UNNEST(forecasts) AS hour_data,
  UNNEST(hour_data.adjustments) AS adjustment
WHERE
  adjustment.factor > 1.5;

Each UNNEST operation flattens one level of the nested structure. The order matters because later UNNEST operations reference aliases from earlier ones.

Best Practices for UNNEST in Production

When implementing UNNEST queries in production Google Cloud environments, follow these recommendations.

Use partitioning on date or timestamp fields to reduce the data scanned. UNNEST queries can become expensive on large tables, and partitioning helps BigQuery skip irrelevant data.

Select only needed columns. BigQuery's columnar storage is most efficient when you select specific columns rather than using SELECT *. This is especially important with nested data where you might only need a few fields from large nested structures.

Use clustering on frequently filtered fields. If you regularly filter unnested results by certain fields, cluster your table on those fields to improve query performance and reduce costs.

Consider materialized views. If you frequently unnest the same data, create a materialized view with the unnested structure. This preprocesses the unnesting operation and can significantly speed up queries.

Monitor query costs. Use the query validator in the BigQuery console to check how much data your UNNEST queries will process before running them. Denormalization saves costs compared to joins, but inefficient UNNEST queries can still be expensive.

Integration with Other GCP Services

UNNEST functionality integrates naturally with other Google Cloud Platform services in data pipeline architectures.

When streaming data into BigQuery from Dataflow, you can create nested structures in your pipeline code before inserting records. This preprocessing in Dataflow reduces the need for complex transformations in BigQuery later.

Trigger Cloud Functions based on BigQuery query results that use UNNEST. For example, a scheduled query could unnest IoT sensor data, identify anomalies, and trigger a Cloud Function to send alerts.

Build dashboards in Data Studio using BigQuery views that include UNNEST operations. This allows business users to visualize nested data without writing complex SQL themselves.

Orchestrate data pipelines with Cloud Composer (managed Apache Airflow) that include BigQuery queries with UNNEST as transformation steps. This is common in ETL workflows where source data arrives in nested formats.

Advanced UNNEST Techniques

As you become more comfortable with UNNEST in BigQuery, consider these advanced patterns.

Using UNNEST with ARRAY_AGG

You can unnest arrays and then re-aggregate them in different ways. This is useful for reshaping data:

SELECT
  geography,
  ARRAY_AGG(STRUCT(forecast_data.hours, forecast_data.temperature_2m_above_ground)
    ORDER BY forecast_data.hours) AS sorted_forecasts
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data
WHERE
  forecast_data.temperature_2m_above_ground > 20
GROUP BY
  geography;

This query unnests forecasts, filters for warm temperatures, and rebuilds a nested array sorted by hour.

Conditional UNNEST with CASE Statements

Apply conditional logic to unnested data for complex transformations:

SELECT
  geography,
  forecast_data.hours,
  CASE
    WHEN forecast_data.temperature_2m_above_ground > 30 THEN 'hot'
    WHEN forecast_data.temperature_2m_above_ground > 20 THEN 'warm'
    WHEN forecast_data.temperature_2m_above_ground > 10 THEN 'mild'
    ELSE 'cold'
  END AS temp_category
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data;

This categorizes temperatures after unnesting, useful for creating derived fields in denormalized GCP data warehouses.

Performance Comparison: UNNEST vs JOIN

To understand why Google Cloud recommends denormalization with UNNEST, consider the performance difference. In a normalized schema, you might have a locations table and a separate forecasts table with a foreign key relationship.

A typical JOIN query would look like:

SELECT
  l.geography,
  f.hours,
  f.temperature
FROM
  locations l
JOIN
  forecasts f ON l.location_id = f.location_id
WHERE
  f.hours = 10;

BigQuery must scan both tables completely, match records based on the join key, and then filter. With large datasets, this becomes expensive.

The equivalent denormalized query with UNNEST:

SELECT
  geography,
  forecast_data.hours,
  forecast_data.temperature_2m_above_ground
FROM
  `bigquery-public-data.noaa_global_forecast_system.NOAA_GFS0P25`,
  UNNEST(forecast) AS forecast_data
WHERE
  forecast_data.hours = 10;

BigQuery scans one table, expands the nested array in memory, and applies the filter. The columnar storage means only the necessary columns are read. This typically processes less data and costs less to run.

Next Steps and Enhancements

Now that you understand how to use UNNEST in BigQuery, explore these related capabilities.

Learn to create nested structures using ARRAY_AGG, STRUCT, and related functions. This lets you build denormalized tables from normalized sources.

Combine UNNEST with window functions like ROW_NUMBER() and LAG() to perform advanced analytics on nested time series data.

Use procedural language features to build dynamic queries that unnest different nested fields based on runtime conditions.

Study the query execution plan in BigQuery to understand how UNNEST operations affect performance and cost in your specific use cases.

The Google Cloud documentation on arrays and structs provides additional examples and reference material for working with nested data types.

Cost Optimization for UNNEST Queries

While UNNEST queries are generally more efficient than joins, you can further optimize costs in your GCP environment.

Use query result caching. BigQuery caches query results for 24 hours. Identical UNNEST queries run by different users or processes will use cached results at no cost.

Partition by date. If your nested data includes timestamps, partition tables by date. Filter on the partition column before unnesting to minimize data processed.

Set up slot reservations. For predictable workloads with many UNNEST queries, flat-rate pricing through slot reservations can be more economical than on-demand pricing.

Use incremental queries. Instead of unnesting entire tables daily, query only new or changed records using timestamps or incremental load patterns.

Summary

You've learned how to use UNNEST in BigQuery to work with nested and repeated fields effectively. This tutorial covered the fundamentals of denormalized data structures in Google Cloud, walked through practical examples of flattening nested arrays, and showed you how to filter and aggregate unnested data. You can now query complex nested structures efficiently, understanding why BigQuery's approach differs from traditional relational databases.

The skills you've developed here are essential for the Professional Data Engineer exam and for real-world data engineering work in GCP. You've seen how denormalization with nested fields reduces query costs compared to normalized schemas with joins, and you understand when and how to apply UNNEST to extract insights from complex data structures.

Readers looking for comprehensive exam preparation covering UNNEST, BigQuery optimization, and all other Professional Data Engineer topics can check out the Professional Data Engineer course. Keep practicing with public datasets in BigQuery to reinforce these concepts and explore more advanced nested data patterns in your Google Cloud projects.