BigQuery
Updated: September 10, 2025
Categories: Query, Analytics
BigQuery SQL Cheatsheet: Petabyte-Scale Analytics
1. Connection and Basic Setup
gcloud CLI Authentication
Bash
# Install gcloud CLI
curl https://sdk.cloud.google.com | bash
# Initialize and authenticate
gcloud init
gcloud auth login
# Set active project
gcloud config set project YOUR_PROJECT_ID
Web UI Access
- Navigate to https://console.cloud.google.com/bigquery
- Select project from dropdown
- Use web interface for query writing, table management
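Once authenticated, queries can also be run from the terminal with the bq command-line tool (installed with the gcloud SDK). A minimal sanity check, assuming the project set above:

```bash
# Run a quick test query in standard SQL mode
bq query --use_legacy_sql=false 'SELECT CURRENT_TIMESTAMP() AS now'
```

If this returns a single-row result, authentication and project configuration are working.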
2. Project, Dataset, and Table Operations
Create Dataset
SQL
-- Standard SQL
CREATE SCHEMA `project.dataset`
OPTIONS (
description = 'My analytics dataset',
labels = [('environment', 'production')]
);
Create Table
SQL
-- Create table with schema
CREATE TABLE `project.dataset.table` (
id INT64,
name STRING,
created_at TIMESTAMP
);
-- Create table from query
CREATE TABLE `project.dataset.derived_table` AS
SELECT column1, column2
FROM `source_table`
WHERE condition;
3. BigQuery-Specific Data Types
STRUCT
SQL
-- Creating a STRUCT
SELECT STRUCT(
'John' AS first_name,
'Doe' AS last_name
) AS person;
-- Accessing STRUCT fields
SELECT person.first_name
FROM table;
ARRAY
SQL
-- Creating an ARRAY
SELECT ['apple', 'banana', 'cherry'] AS fruits;
-- Unnesting ARRAY
SELECT fruit
FROM table, UNNEST(fruits) AS fruit;
GEOGRAPHY
SQL
-- Creating a geographic point from WKT
SELECT ST_GEOGFROMTEXT('POINT(-122.4194 37.7749)') AS san_francisco;
-- Calculating distance
SELECT ST_DISTANCE(point1, point2) AS distance
FROM geo_table;
4. Standard SQL Operations
SELECT with Advanced Filtering
SQL
SELECT
column1,
column2
FROM `project.dataset.table`
WHERE
column1 IS NOT NULL
AND column2 BETWEEN 10 AND 100
ORDER BY column1 DESC
LIMIT 1000;
INSERT (Streaming)
SQL
INSERT INTO `project.dataset.table`
(id, name, created_at)
VALUES
(1, 'Test', CURRENT_TIMESTAMP());
UPDATE
SQL
UPDATE `project.dataset.table`
SET
status = 'active',
updated_at = CURRENT_TIMESTAMP()
WHERE id = 123;
DELETE
SQL
DELETE FROM `project.dataset.table`
WHERE status = 'inactive'
AND created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
5. Analytics Functions
Window Functions
SQL
-- Ranking within partitions
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
Statistical Functions
SQL
-- Advanced statistical analysis
SELECT
AVG(value) AS mean,
STDDEV(value) AS standard_deviation,
APPROX_QUANTILES(value, 4) AS quartiles
FROM metrics;
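APPROX_QUANTILES(value, 4) returns an array of 5 elements (minimum, three quartile boundaries, maximum); individual quantiles can be pulled out with OFFSET. A small sketch against the same hypothetical metrics table:

```sql
-- OFFSET(2) of the 5-element quartile array is the median
SELECT APPROX_QUANTILES(value, 4)[OFFSET(2)] AS approx_median
FROM metrics;
```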
6. Array and Struct Operations
Array Aggregation
SQL
-- Transform rows to array
SELECT
user_id,
ARRAY_AGG(DISTINCT product) AS purchased_products
FROM purchases
GROUP BY user_id;
Struct Manipulation
SQL
-- Complex struct operations
SELECT
ARRAY(
SELECT AS STRUCT
name,
age,
ARRAY(SELECT score FROM UNNEST(scores)) AS test_scores
FROM users
) AS user_details;
7. Date and Time Functions
Date Manipulation
SQL
-- Date arithmetic and parsing
SELECT
DATE_ADD(current_date, INTERVAL 30 DAY) AS future_date,
EXTRACT(YEAR FROM timestamp) AS year,
FORMAT_DATE('%Y-%m', current_date) AS month_year
FROM table;
8. Geographic Functions
Spatial Queries
SQL
-- Find points within a region
SELECT
location,
ST_WITHIN(location, region) AS is_inside
FROM geo_data
WHERE ST_WITHIN(location,
ST_GEOGFROMTEXT('POLYGON((...))'));
9. Machine Learning Integration (BigQuery ML)
Create and Train Model
SQL
CREATE OR REPLACE MODEL `project.dataset.model`
OPTIONS(
model_type='linear_reg',
input_label_cols=['target']
) AS
SELECT * FROM training_data;
Prediction
SQL
SELECT *
FROM ML.PREDICT(
MODEL `project.dataset.model`,
(SELECT * FROM prediction_data)
);
10. Performance Optimization
Partitioning
SQL
-- Create partitioned table
CREATE TABLE `project.dataset.partitioned_table`
PARTITION BY DATE(timestamp)
AS SELECT * FROM source_table;
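Partition pruning only applies when the query filters on the partitioning column; a sketch, assuming the table above is partitioned on DATE(timestamp):

```sql
-- Scans only the single matching partition
SELECT id, name
FROM `project.dataset.partitioned_table`
WHERE DATE(timestamp) = '2025-01-01';
```

Without the filter on the partitioning column, BigQuery scans every partition and the cost advantage is lost.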
Clustering
SQL
-- Create clustered table
CREATE TABLE `project.dataset.clustered_table`
CLUSTER BY user_id, country
AS SELECT * FROM source_table;
11. Cost Optimization Strategies
- Use LIMIT to restrict result size (note that LIMIT alone does not reduce bytes scanned, and so does not reduce cost)
- Leverage partitioning and clustering
- Use dry_run=True (or bq --dry_run) to estimate query cost before running it
- Utilize materialized views
- Set up cost controls and budgets
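A dry run reports the bytes a query would process without executing it or incurring charges. A hedged sketch with the bq CLI, reusing the hypothetical table from section 2:

```bash
# Prints estimated bytes processed; the query itself is never run
bq query --use_legacy_sql=false --dry_run \
  'SELECT name FROM `project.dataset.table` WHERE id = 123'
```

Client libraries expose the same capability (e.g. dry_run=True in the Python client's QueryJobConfig).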
12. External Data Sources
External Table
SQL
CREATE EXTERNAL TABLE `project.dataset.external_table`
OPTIONS (
format = 'CSV',
uris = ['gs://bucket/path/*.csv']
);
13. Best Practices
- Always use standard SQL mode
- Leverage LIMIT for testing queries
- Use approximate aggregation functions
- Partition large tables
- Use clustering for frequent filter columns
- Avoid SELECT *
- Use STRUCT and ARRAY for complex data
- Monitor and optimize query performance
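Several of these practices combine naturally in a single query. A sketch against the hypothetical clustered table from section 10, assuming it also has a timestamp column:

```sql
-- Explicit columns instead of SELECT *, a filter on the date,
-- and approximate aggregation for a high-cardinality count
SELECT
  country,
  APPROX_COUNT_DISTINCT(user_id) AS unique_users
FROM `project.dataset.clustered_table`
WHERE DATE(timestamp) >= '2025-01-01'
GROUP BY country;
```

Because the table is clustered by user_id and country, the filter and grouping here also let BigQuery skip unrelated blocks.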