Redshift
Updated: September 10, 2025Categories: Query
Printed from:
Amazon Redshift SQL Cheatsheet
1. Connection and Basic Setup
psql Connection
Bash
12psql -h <cluster-endpoint> -p 5439 -U <username> -d <database>
JDBC Connection String
jdbc:redshift://endpoint:5439/database
AWS CLI Cluster Management
Bash
123456# List clusters
aws redshift describe-clusters
# Describe a specific cluster
aws redshift describe-clusters --cluster-identifier <cluster-name>
2. Database, Schema, and Table Operations
Create Database
SQL
12345CREATE DATABASE warehouse
WITH
OWNER = admin
ENCODING = 'UTF8';
Create Schema
SQL
123CREATE SCHEMA analytics
AUTHORIZATION admin;
Create Table (with Distribution and Sort Keys)
SQL
1234567891011CREATE TABLE sales (
sale_id INTEGER ENCODE DELTA,
product_id INTEGER DISTKEY,
sale_date DATE SORTKEY,
amount DECIMAL(12,2) ENCODE ZSTD,
region VARCHAR(50)
)
DISTSTYLE KEY
DISTKEY (product_id)
SORTKEY (sale_date);
3. Redshift-Specific Data Types and Encodings
Data Types
INTEGER,BIGINT,SMALLINTDECIMAL(precision, scale)VARCHAR(n),CHAR(n)DATE,TIMESTAMP,TIMESTAMPTZBOOLEAN
Compression Encodings
ZSTD: Best general-purpose compressionDELTA: Optimal for sequential or incremental dataLZO: Good for mixed workloadsRAW: No compressionBYTEDICT: Small dictionaries of unique values
4. Standard SQL Operations
SELECT with Advanced Features
SQL
123456789101112131415161718-- Window functions
SELECT
product_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) as cumulative_sales
FROM sales;
-- Top N with window function
SELECT * FROM (
SELECT
product_id,
amount,
RANK() OVER (ORDER BY amount DESC) as sales_rank
FROM sales
) ranked
WHERE sales_rank <= 10;
COPY Command (Data Loading)
SQL
12345COPY sales
FROM 's3://mybucket/sales_data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
JSON 'auto';
UNLOAD Command (Data Export)
SQL
12345UNLOAD ('SELECT * FROM sales WHERE sale_date > GETDATE() - INTERVAL 30 DAYS')
TO 's3://mybucket/sales_export/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
PARALLEL ON;
5. Distribution Styles
Distribution Styles
EVEN: Distributes rows across slices uniformlyKEY: Distributes based on a specific columnALL: Replicates entire table to all compute nodes
Example Distribution Selection
SQL
1234567891011121314-- Good for fact tables with foreign key joins
CREATE TABLE fact_sales (
sale_id INT DISTKEY,
product_id INT,
customer_id INT
)
DISTSTYLE KEY;
-- Good for small dimension tables
CREATE TABLE dim_products (
product_id INT PRIMARY KEY
)
DISTSTYLE ALL;
6. Compression and Encoding Strategies
Best Practices
- Use
ANALYZE COMPRESSIONto get encoding recommendations - Choose encoding based on data characteristics
- Minimize column width
- Use appropriate encoding for data type
SQL
123-- Analyze compression recommendations
ANALYZE COMPRESSION sales;
7. Analytics and Window Functions
Window Function Examples
SQL
1234567891011121314-- Running total
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) as running_total
FROM sales;
-- Percentile calculation
SELECT
product_id,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) as median_sale
FROM sales
GROUP BY product_id;
8. Performance Optimization
Query Tuning Tips
- Use
EXPLAINto analyze query execution plan - Minimize data movement
- Use sort and dist keys effectively
- Avoid unnecessary joins and subqueries
SQL
123456-- Explain query plan
EXPLAIN
SELECT product_id, SUM(amount)
FROM sales
GROUP BY product_id;
Workload Management (WLM)
- Configure query queues
- Set concurrency and memory allocation
- Prioritize critical queries
9. User and Security Management
Create User
SQL
1234CREATE USER data_analyst
WITH PASSWORD 'SecurePass123'
IN GROUP reporting_group;
Grant Permissions
SQL
1234GRANT SELECT
ON TABLE sales
TO data_analyst;
10. Maintenance Operations
VACUUM and ANALYZE
SQL
12345678910-- Reclaim space and resort table
VACUUM sales;
-- Update table statistics
ANALYZE sales;
-- Comprehensive maintenance
VACUUM FULL;
ANALYZE;
11. Best Practices
Columnar Storage Optimization
- Use narrow, compressed columns
- Leverage sort and distribution keys
- Minimize data type sizes
- Use appropriate encoding
Performance Checklist
- Use COPY for bulk loading
- Implement appropriate distribution strategy
- Use compression
- Regularly run VACUUM and ANALYZE
- Monitor query performance
- Use short, focused queries
- Leverage materialized views for complex aggregations
12. Monitoring and Troubleshooting
Key System Tables
STL_QUERY: Query execution historySTL_WLMQOSCLASS: Workload management detailsSTL_ALERT_EVENT_LOG: Performance alerts
SQL
12345678910-- Check long-running queries
SELECT
query,
starttime,
endtime,
elapsed_time
FROM STL_QUERY
WHERE elapsed_time > 60000 -- Over 1 minute
ORDER BY elapsed_time DESC;
Conclusion
Amazon Redshift is a powerful columnar database designed for analytics and business intelligence. By understanding its architecture, distribution strategies, and optimization techniques, you can build high-performance data warehouses.
Key Takeaways:
- Leverage MPP architecture
- Use appropriate distribution and sort keys
- Implement smart compression
- Regularly maintain and optimize
Continue Learning
Discover more cheatsheets to boost your productivity