Cross-Project Data Sharing in BigQuery: When and How

This guide explains the key architectural decisions for cross-project data sharing in BigQuery, comparing authorized datasets against direct project access with real-world scenarios and certification exam insights.

When you work with BigQuery in a multi-team organization, you quickly encounter a fundamental question: how should you share data across different Google Cloud projects? Cross-project data sharing in BigQuery isn't just about granting access. It's about choosing the right pattern that balances security, governance, cost visibility, and operational simplicity. This decision shapes how your teams collaborate, how you track spending, and how you maintain control over sensitive information.

The core challenge comes down to this: should you grant users direct access to datasets in other projects, or should you create authorized views and datasets that act as controlled gateways? Each approach carries distinct implications for your data architecture, and understanding these trade-offs helps you build systems that scale properly as your organization grows.

Direct Cross-Project Access

The simplest way to enable cross-project data sharing in BigQuery is through direct access. With this approach, you grant Identity and Access Management (IAM) permissions at the dataset level, allowing users in one project to query tables that physically reside in another project.

When a data analyst in Project A needs to query tables from Project B, you add their user identity or group to the dataset's IAM policy in Project B with the BigQuery Data Viewer role. The analyst can then reference those tables directly using the fully qualified table name format.

Here's what that looks like in practice:


SELECT 
  customer_id,
  order_date,
  total_amount
FROM `production-data-project.sales.orders`
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC;

The query runs from the analyst's own project, but it directly accesses the table in the production data project. BigQuery charges the query costs to whichever project executes the query, which in this case would be the analyst's project.

This approach works well when you have straightforward access requirements. A financial analytics team might need read access to the entire sales dataset, and there's no need for complex filtering or transformation logic. Direct access is transparent, easy to understand, and simple to audit through Cloud IAM permissions.

When Direct Access Makes Sense

Direct cross-project access shines in scenarios where trust boundaries align with project boundaries. If your data engineering team maintains a central data warehouse project and various analytical teams need full visibility into specific datasets, granting direct access reduces operational overhead.

Consider a logistics company that operates a fleet management system. The central data platform team maintains all sensor readings, GPS coordinates, and maintenance records in a dedicated GCP project. The route optimization team, the predictive maintenance team, and the customer service team all need access to this raw data. Each team works in their own project with their own budgets and quotas.

Granting direct dataset access means each team can write queries that fit their specific needs without requiring the central team to anticipate every use case. The route optimization team might join vehicle location data with weather information, while the maintenance team correlates sensor readings with repair history. Both teams query the same underlying tables but in completely different ways.

Drawbacks of Direct Access

Direct access creates challenges when you need fine-grained control over what data users can see. Suppose your sales dataset includes customer credit card information, discount codes negotiated with specific enterprise customers, or internal cost structures that should remain confidential. With direct dataset access, you face an all-or-nothing proposition. Users either see everything in the table or nothing.

You might try to solve this by creating separate tables for sensitive and non-sensitive data, but this fragments your data model and creates maintenance burden. Every time the schema evolves, you need to update multiple tables. When analysts need to join data across these split tables, they're forced to write more complex queries.

Cost allocation also becomes murky. When Project A queries data stored in Project B, BigQuery charges the query processing costs to Project A. The storage costs remain with Project B. This split creates accounting complexity, especially when you need to justify infrastructure spending across departments. The team that maintains the data doesn't see the full cost of providing that data to downstream consumers.

Here's an example that illustrates the security limitation:


-- An analyst with direct access sees ALL columns
SELECT 
  customer_id,
  customer_name,
  email,
  credit_card_last_four,  -- Sensitive field
  internal_discount_tier,  -- Sensitive field
  lifetime_value
FROM `production-data-project.crm.customers`
WHERE region = 'US-WEST';

If you grant BigQuery Data Viewer on this dataset, users can run that query and see the sensitive columns. You cannot selectively hide columns with IAM policies alone.

Authorized Datasets and Views

The alternative approach uses authorized views (or authorized datasets) to create controlled access points. With this pattern, you define views in a separate dataset that expose only the columns and rows that users should see. You then authorize those views to access the underlying sensitive tables, but users only receive access to the views themselves.

Think of authorized views as a security layer that sits between users and your raw data tables. The view definition determines exactly what data gets exposed. Users query the view without needing any permissions on the underlying base tables.

Here's how you might structure this for the customer data scenario:


-- View created in shared-analytics-project.public_views.customers
CREATE VIEW `shared-analytics-project.public_views.customers` AS
SELECT 
  customer_id,
  customer_name,
  email,
  lifetime_value,
  region
FROM `production-data-project.crm.customers`
WHERE region IN ('US-WEST', 'US-EAST');

The view excludes the sensitive credit_card_last_four and internal_discount_tier columns entirely. It also filters rows to only US customers, hiding international customer records. Analysts receive BigQuery Data Viewer permissions on the shared-analytics-project.public_views dataset but have zero access to the underlying production-data-project.crm dataset.

For this to work, you must authorize the view to access the base table. In the BigQuery console or through the API, you add the view to the base dataset's authorized views list. BigQuery then allows the view to query the base table even when the end user cannot.

Benefits Beyond Security

Authorized views solve the column-level and row-level security problems, but they offer additional advantages. They provide a stable interface for downstream users even as the underlying schema evolves. If you need to rename columns or restructure tables, you can update the view definition without breaking every query that depends on that data.

They also enable cost consolidation. When users query authorized views, you can structure your project architecture so that query costs get charged to a centralized analytics project rather than scattering costs across every consuming team. This gives finance teams clearer visibility into data platform spending.

Consider a healthcare network that operates multiple hospitals, each with its own Google Cloud project for operational systems. A central research project needs to aggregate patient outcomes data for clinical studies, but privacy regulations require strict de-identification. Authorized views in a shared research project can expose de-identified patient records without giving researchers any access to the production hospital systems.

How BigQuery's Architecture Supports Cross-Project Sharing

BigQuery's separation of storage and compute fundamentally enables flexible cross-project data sharing. Unlike traditional database systems where the data and the query engine are tightly coupled to a single instance, BigQuery stores data in Google Cloud's distributed storage layer while query execution happens in a separate, automatically scaled compute layer.

When you query a table in another project, BigQuery doesn't copy data or establish network connections between project-specific database instances. Instead, the query engine in your project reads directly from the shared storage layer where the data physically resides. This architecture makes cross-project queries just as efficient as same-project queries. There's no performance penalty for referencing tables across project boundaries.

The project that executes the query pays for the compute resources (the amount of data scanned or the slot time consumed), while the project that owns the table pays for storage. This clean separation means you can organize projects by business unit, cost center, or security boundary without worrying about data movement overhead.

BigQuery also integrates cross-project access with Google Cloud's IAM system rather than maintaining a separate permission model. This means your existing organizational policies, groups, and service accounts work seamlessly across projects. When you grant a Google Group access to a dataset, any member of that group can query the data regardless of which project they're working in.

The authorized dataset feature extends this further by allowing entire datasets to act as trusted execution contexts. When you authorize Dataset B to access Dataset A's tables, any view defined in Dataset B can query Dataset A, even if the users querying those views have no direct permissions on Dataset A. This enables sophisticated data governance patterns that would require complex middleware in traditional database systems.

Detailed Scenario: A Subscription Box Service

Let's walk through a concrete example with a subscription box service that delivers curated beauty products monthly. The company operates three GCP projects:

  • production-commerce: Contains operational data including customer subscriptions, payment processing, and fulfillment tracking
  • marketing-analytics: Used by the marketing team to analyze campaign performance and customer behavior
  • finance-reporting: Used by the finance team for revenue recognition and forecasting

The company faces a common challenge. Marketing needs to understand which campaigns drive the highest lifetime value customers. Finance needs accurate revenue data but should not see individual customer names or email addresses. Both teams need access to subscription data, but with different columns and aggregation levels.

The initial approach uses direct access. The data engineering team grants the marketing and finance teams BigQuery Data Viewer on the production-commerce.subscriptions dataset. Both teams can now query subscription tables directly.

Problems emerge quickly. Marketing analysts accidentally query the entire customer table without proper WHERE clauses, generating thousands of dollars in unnecessary query costs charged to their project. Finance team members can see customer email addresses in their reports, creating compliance concerns. When the data team needs to refactor the subscription schema to support a new product line, they break queries in both downstream projects.

The team rebuilds the architecture using authorized views. They create two new datasets:

  • shared-analytics.marketing_views
  • shared-analytics.finance_views

For marketing, they create this view:


CREATE VIEW `shared-analytics.marketing_views.customer_subscription_summary` AS
SELECT 
  c.customer_id,
  c.acquisition_channel,
  c.signup_date,
  COUNT(DISTINCT s.subscription_id) as total_subscriptions,
  SUM(s.box_value) as lifetime_value,
  MAX(s.subscription_end_date) as last_subscription_date,
  DATE_DIFF(CURRENT_DATE(), c.signup_date, DAY) as customer_age_days
FROM `production-commerce.core.customers` c
LEFT JOIN `production-commerce.core.subscriptions` s 
  ON c.customer_id = s.customer_id
WHERE c.signup_date >= '2023-01-01'
GROUP BY c.customer_id, c.acquisition_channel, c.signup_date;

This view aggregates data at the customer level, eliminating the need for marketing analysts to write complex joins. It excludes personal identifiable information like names and email addresses. It also pre-filters to recent customers, reducing the data volume that marketing queries touch.

For finance, they create a different view:


CREATE VIEW `shared-analytics.finance_views.monthly_revenue` AS
SELECT 
  DATE_TRUNC(s.billing_date, MONTH) as revenue_month,
  s.subscription_tier,
  COUNT(DISTINCT s.subscription_id) as active_subscriptions,
  SUM(s.box_value) as gross_revenue,
  SUM(s.shipping_cost) as shipping_costs,
  SUM(s.box_value - s.shipping_cost) as net_revenue
FROM `production-commerce.core.subscriptions` s
WHERE s.subscription_status = 'ACTIVE'
GROUP BY revenue_month, subscription_tier;

Finance gets pre-aggregated monthly data with no customer-level details at all. This satisfies their reporting needs while ensuring they cannot access individual customer information.

Both views are authorized to access the base tables in production-commerce. Marketing and finance teams receive permissions only on their respective view datasets. Query costs now flow to the shared-analytics project, giving the central data team visibility into total analytics spending. When the schema changes, the data team updates the view definitions, and downstream queries continue working without modification.

Comparison: Choosing Your Approach

The decision between direct access and authorized views depends on several factors:

FactorDirect AccessAuthorized Views
Security granularityDataset and table level onlyColumn and row level control
Setup complexitySimple IAM grantsRequires view creation and authorization
Schema evolutionBreaking changes impact all usersViews provide interface stability
Cost visibilityCosts scatter across consuming projectsCan consolidate to shared project
Query flexibilityUsers write any queryLimited to view definition
Best forTrusted teams needing full dataset accessControlled access to sensitive data

Use direct access when you have strong trust boundaries and minimal need for fine-grained security. A central data platform team sharing curated datasets with a small number of trusted analytical teams fits this pattern well. The operational simplicity outweighs the limited security controls.

Choose authorized views when you need to share subsets of data, hide sensitive columns, or maintain a stable interface despite underlying schema changes. Any scenario involving external partners, regulatory compliance requirements, or cost allocation across business units pushes you toward authorized views.

Many organizations use both patterns simultaneously. Highly trusted internal teams get direct access to raw data for exploratory analysis, while broader populations access the same data through carefully designed views.

Relevance to Google Cloud Certification Exams

Cross-project data sharing in BigQuery appears in the Professional Data Engineer certification and sometimes in the Professional Cloud Architect exam. You might encounter scenarios that test your understanding of when to recommend each access pattern.

A typical exam question might present this scenario: A financial services company maintains customer transaction data in a production project. The compliance team requires that data analysts can query transaction amounts and timestamps but cannot see customer names or account numbers. The analytics team needs to join transaction data with marketing campaign data stored in a separate project. Which approach should you recommend?

The correct answer involves authorized views. You would create views in a shared analytics project that expose only the permitted columns from the transaction table, authorize those views to access the production dataset, and grant the analytics team access to the views. Direct access would fail the security requirement because BigQuery Data Viewer grants cannot hide specific columns.

Exam questions sometimes test cost implications too. If a question asks how to ensure that query costs for shared data appear in the central data platform budget rather than individual team budgets, you need to recognize that query costs go to the project executing the query. Consolidating queries through views in a centrally managed project shifts those costs appropriately.

The Professional Cloud Architect exam may test cross-project data sharing in the context of organizational structure and project hierarchy. You might need to recommend how to organize projects across business units while enabling data sharing, balancing security isolation with analytical collaboration.

Conclusion

Cross-project data sharing in BigQuery forces you to think carefully about the boundaries between convenience and control. Direct access offers simplicity and flexibility, making it ideal when trust levels are high and security requirements are straightforward. Authorized views add complexity but unlock fine-grained security, interface stability, and better cost management.

The architecture you choose shapes how your teams collaborate and how your data platform scales. As organizations grow, the authorized view pattern tends to win out because the benefits of controlled access and stable interfaces outweigh the operational overhead. But smaller teams or tightly integrated organizations often find that direct access serves them well, at least initially.

The key is recognizing that this decision isn't permanent. BigQuery's flexible architecture lets you start with direct access and migrate to authorized views as security or governance requirements evolve. Understanding both patterns and their trade-offs helps you make informed decisions that serve your organization's current needs while leaving room to adapt as those needs change.