Ceph Cheatsheet: Distributed Storage System

Overview

Ceph is an open-source distributed storage system designed to provide high performance, reliability, and scalability through a unified platform for object, block, and file storage.

Core Concepts

Distributed Object Storage
RADOS (Reliable, Autonomous, Distributed Object Store)
Horizontal Scalability
Self-healing Architecture
CRUSH Algorithm for deterministic data placement
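
To illustrate what "deterministic data placement" means, here is a toy sketch: a client hashes the object name and derives a placement group from it, with no central lookup table involved. It is not Ceph's algorithm. The sketch uses cksum and a plain modulo as stand-ins, whereas Ceph uses its own hash and then CRUSH to map each placement group to OSDs. The helper name toy_pg_for_object is hypothetical.

```shell
# Toy illustration only: Ceph does NOT use cksum or a plain modulo.
# toy_pg_for_object is a hypothetical helper name.
toy_pg_for_object() {
    # $1 = object name, $2 = number of placement groups in the pool
    toy_hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $(( toy_hash % $2 ))
}

# The same name always yields the same PG, so every client can compute
# the location independently.
toy_pg_for_object "invoice-2024.pdf" 128
toy_pg_for_object "invoice-2024.pdf" 128
```

Both calls print the same number, which is the property that lets clients talk to OSDs directly instead of asking a metadata service where an object lives.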

Key Components

Monitor Nodes (MONs)
- Cluster state management (cluster map)
- Coordinate cluster operations
- Maintain consensus via Paxos
Manager Daemons (MGRs)
- Required since Luminous; usually co-located with MONs, with one active and at least one standby
- Host pluggable modules (dashboard, prometheus, balancer, orchestrator)
- Expose cluster metrics and management APIs
Object Storage Daemons (OSDs)
- Store data on devices (BlueStore is the default backend)
- Handle data replication, recovery, and scrubbing
- One OSD per storage device is typical
Metadata Servers (MDSs)
- Manage CephFS file system metadata
- Support POSIX file system semantics
- Scale-out via multiple active MDS ranks
RADOS Gateway (RGW)
- S3 and Swift-compatible object storage endpoint

Installation Prerequisites

Modern Linux distribution (RHEL/CentOS Stream, Ubuntu LTS, Debian, openSUSE)
Container runtime (Podman or Docker) for cephadm-based deployments
Python 3, systemd, chrony/ntpd for time sync
At least 3 MONs for a fault-tolerant quorum; at least 3 OSD hosts for production
Dedicated public and cluster networks recommended
Hardware requirements:
- 64-bit processors
- 4 GB RAM per OSD (more for large devices)
- SSD/NVMe preferred; HDDs supported with SSD/NVMe DB/WAL
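
The software prerequisites above can be checked with a small pre-flight script before bootstrapping. This is a minimal sketch: the helper names (have_cmd, preflight_report) are hypothetical, and the command names it probes are assumptions about a typical host. Daemons such as chronyd often live in /usr/sbin, so run it as root or with a PATH that includes the sbin directories.

```shell
# Hypothetical pre-flight helper; command names are assumptions about a
# typical cephadm host and may differ between distributions.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

preflight_report() {
    # Container runtime: either podman or docker is enough.
    if have_cmd podman || have_cmd docker; then
        echo "container runtime: ok"
    else
        echo "container runtime: MISSING (install podman or docker)"
    fi
    # Python 3 and systemd.
    for tool in python3 systemctl; do
        if have_cmd "$tool"; then echo "$tool: ok"; else echo "$tool: MISSING"; fi
    done
    # Time synchronization: chronyd or ntpd.
    if have_cmd chronyd || have_cmd ntpd; then
        echo "time sync daemon: ok"
    else
        echo "time sync daemon: MISSING (install chrony or ntp)"
    fi
}

preflight_report
```

The script only reports on the local host; it does not check hardware, networks, or whether the time daemon is actually synchronized.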

Deployment Workflow (cephadm — current recommended method)

Bash

# Bootstrap a new cluster on the first host
CEPH_RELEASE=<x.y.z>   # the release to install; the download path is versioned
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm
./cephadm bootstrap --mon-ip <first-mon-ip>

# Enter the cephadm shell
./cephadm shell   # plain "cephadm shell" works once the cephadm package is installed

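# Add additional hosts. Each host first needs the cluster's public SSH key, e.g.
#   ssh-copy-id -f -i /etc/ceph/ceph.pub root@<host2-ip>   (run on the bootstrap host)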
ceph orch host add host2 <host2-ip>
ceph orch host add host3 <host3-ip>

# Deploy OSDs on all available devices
ceph orch apply osd --all-available-devices

# Deploy additional services
ceph orch apply mon --placement="3 host1 host2 host3"
ceph orch apply mgr --placement="2 host1 host2"
ceph orch apply rgw myrgw --placement="2 host1 host2"

Note: ceph-deploy is no longer maintained and should not be used for new clusters. Rook is the recommended orchestrator for Kubernetes deployments.

Key CLI Commands

Bash

# Cluster Management
ceph -s                       # Cluster status
ceph health detail            # Detailed health report
ceph mon stat                 # Monitor status
ceph mgr services             # Endpoints published by manager modules
ceph osd tree                 # OSD topology
ceph df                       # Cluster and pool usage
ceph versions                 # Daemon version overview

# Orchestrator (cephadm)
ceph orch ls                  # List deployed services
ceph orch ps                  # List daemon instances
ceph orch device ls           # List devices across hosts
ceph orch upgrade start --ceph-version <x.y.z>

# Pool Operations
ceph osd pool create <pool> [pg_num] [pgp_num] [replicated|erasure]
ceph osd pool application enable <pool> <rbd|rgw|cephfs>
ceph osd pool set <pool> size 3
ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it   # requires mon_allow_pool_delete=true

# Placement Group / Balancer
ceph pg stat
ceph balancer status
ceph mgr module enable pg_autoscaler   # Already always-on in recent releases
ceph osd pool autoscale-status         # Per-pool autoscaler view

# Data Management
rados -p <pool> put <object> /path/to/file
rados -p <pool> get <object> /path/to/destination
rbd create <pool>/<image> --size 10G
rbd map <pool>/<image>

Storage Architectures

Block Storage (RBD)
- Thin-provisioned volumes
- Snapshots, clones, and mirroring (rbd-mirror)
- Integration with KVM/libvirt, OpenStack Cinder, Kubernetes CSI
Object Storage (RADOS Gateway)
- S3 and Swift compatible
- Multi-tenant, multi-site replication
- STS, bucket policies, lifecycle, and object lock support
File Storage (CephFS)
- POSIX-compliant distributed file system
- Dynamic subtree partitioning across multiple active MDSs
- Snapshots, quotas, and per-directory pinning
- NFS export via NFS-Ganesha integration

Storage Backend

BlueStore is the default and only supported OSD backend
It manages raw devices directly and supports placing DB/WAL on faster media
FileStore was deprecated in Quincy and is no longer supported as of Reef
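
In cephadm-managed clusters, placing the BlueStore DB on faster media is usually expressed as an OSD service specification. The sketch below writes one that uses rotational devices for data and non-rotational devices for the DB. The service_id and file name are placeholders chosen for this example, and the device filters should be checked against the output of ceph orch device ls before applying anything.

```shell
# Sketch: write an OSD service spec that puts data on rotational devices (HDD)
# and the BlueStore DB on non-rotational devices (SSD/NVMe).
# The service_id and file name are placeholders for this example.
spec_file="osd_hdd_with_fast_db.yaml"
cat > "$spec_file" <<'EOF'
service_type: osd
service_id: hdd_with_fast_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF

# Preview what the orchestrator would do (needs a running cluster):
#   ceph orch apply -i osd_hdd_with_fast_db.yaml --dry-run
echo "wrote $spec_file"
```

When no separate WAL device is given, the WAL is stored alongside the DB, so a single fast device class is often sufficient.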

Security Best Practices

Enable cephx authentication (default; do not disable)
Use messenger v2 with ms_*_mode = secure for on-wire encryption
Enable at-rest encryption with dm-crypt on OSD devices
Segment public and cluster networks
Apply least-privilege CephX capabilities per client
Restrict dashboard with TLS and strong auth (SSO/OAuth2 supported)
Patch promptly and follow Ceph security advisories
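
Two of the items above translate directly into commands. The sketch below prints them rather than running them, so they can be reviewed first. The function names are hypothetical, and client.app and the pool name vms are made-up examples. Secure mode requires messenger v2 on all clients, so older clients should be verified before enforcing it; the monitor-specific ms_mon_*_mode options are not covered here.

```shell
# Dry-run helpers: they only print commands. Function names are hypothetical.

# On-wire encryption: set the three general messenger modes to "secure".
print_secure_msgr_cmds() {
    for opt in ms_cluster_mode ms_service_mode ms_client_mode; do
        echo "ceph config set global $opt secure"
    done
}

# Least privilege: an RBD client restricted to a single pool.
# $1 = client name, $2 = pool name
print_rbd_client_cmd() {
    echo "ceph auth get-or-create client.$1 mon 'profile rbd' osd 'profile rbd pool=$2'"
}

print_secure_msgr_cmds
print_rbd_client_cmd app vms
```

To apply the commands, run the printed lines from a node that holds an admin keyring.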

Performance Optimization

Use NVMe for BlueStore DB/WAL when OSDs are on HDD
Tune osd_memory_target based on available RAM
Enable the balancer (upmap mode) and pg_autoscaler
Use jumbo frames and dedicated cluster network
Use erasure coding for capacity-optimized workloads
Match CRUSH rules to failure domains (host, rack, room)
Monitor with the built-in Prometheus module and Grafana dashboards
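
As a rough way to size osd_memory_target, the sketch below divides the RAM left after a reservation for the OS and other daemons evenly across the OSDs on a host. The helper name and the 16 GiB reservation are assumptions for illustration, not recommendations.

```shell
# osd_memory_target_bytes is a hypothetical helper.
# $1 = host RAM in GiB, $2 = GiB reserved for OS and other daemons, $3 = OSD count
osd_memory_target_bytes() {
    echo $(( ($1 - $2) * 1024 * 1024 * 1024 / $3 ))
}

# Example: 96 GiB host, 16 GiB reserved, 10 OSDs -> 8 GiB per OSD
target=$(osd_memory_target_bytes 96 16 10)
echo "ceph config set osd osd_memory_target $target"
# prints: ceph config set osd osd_memory_target 8589934592
```

The value is a target rather than a hard limit, so leave headroom for recovery and backfill. cephadm can also tune it automatically through the osd_memory_target_autotune option.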

Troubleshooting Techniques

Start with ceph health detail, then review daemon crashes with ceph crash ls and ceph crash info <id>
Inspect daemon logs via cephadm logs --name <daemon> or journalctl
Watch PG states: active+clean, peering, degraded, remapped, inconsistent
Validate time sync, MTU, and network reachability between nodes
Use ceph tell <daemon> (over the network) and ceph daemon <daemon> (local admin socket) for runtime diagnostics
Collect support bundles with ceph-post-file or orchestrator log collection
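
When watching PG states, it helps to filter out everything that is already active+clean. The sketch below does this for "PGID STATE" lines. The sample input is made up; on a live cluster, similar input could come from ceph pg dump pgs_brief. The helper name list_unhealthy_pgs is hypothetical.

```shell
# list_unhealthy_pgs is a hypothetical helper. It reads lines that start with
# "PGID STATE", skips header lines, and prints PGs that are not exactly
# active+clean.
list_unhealthy_pgs() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && $2 != "active+clean" { print $1, $2 }'
}

# Made-up sample data for illustration.
list_unhealthy_pgs <<'EOF'
PG_STAT STATE
1.0 active+clean
1.1 active+undersized+degraded
1.2 active+clean
2.0 active+remapped+backfilling
2.1 active+clean+inconsistent
EOF
```

The filter is deliberately strict: it also lists PGs in states such as active+clean+scrubbing, which are normal during routine scrubs.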

Common Use Cases

Cloud Storage
- Scalable, resilient backend for OpenStack and Kubernetes
- Unified block, object, and file from a single cluster
Big Data and AI/ML
- High-throughput object storage for data lakes
- S3 endpoint for Spark, Trino, and ML training pipelines
Media Storage and Streaming
- Large-scale media repositories
- Content distribution and archive tiers
Backup and Archive
- Erasure-coded pools for cost-efficient long-term retention
- S3 Object Lock for immutable backups
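
The cost argument for erasure coding comes down to simple arithmetic: a replicated pool stores every byte size times, while an erasure-coded pool with k data chunks and m coding chunks stores it (k+m)/k times. The helpers below are hypothetical and ignore full ratios, metadata, and allocation overhead.

```shell
# Hypothetical helpers; integer arithmetic, results rounded down.
# $1 = raw capacity (any unit), $2 = replica count
usable_replicated() { echo $(( $1 / $2 )); }
# $1 = raw capacity, $2 = k (data chunks), $3 = m (coding chunks)
usable_erasure() { echo $(( $1 * $2 / ($2 + $3) )); }

echo "3x replication, 600 TiB raw: $(usable_replicated 600 3) TiB usable"   # 200
echo "EC 4+2, 600 TiB raw: $(usable_erasure 600 4 2) TiB usable"            # 400
```

Both layouts in the example survive the loss of two copies or chunks, but the erasure-coded pool yields twice the usable space, at the price of more CPU and slower small writes.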

Scaling Strategies

Add OSDs and hosts horizontally; let the balancer redistribute
Keep MON quorum at an odd number (typically 3 or 5)
Use multiple active MDS ranks with directory pinning for CephFS
Use erasure coding for capacity, replication for performance
Deploy multi-site RGW for geographic distribution
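
The advice to keep an odd number of monitors follows from majority quorum arithmetic: monitors can only make progress while more than half of them are reachable. The hypothetical helpers below show that a fourth monitor adds no fault tolerance over three.

```shell
# Hypothetical helpers illustrating majority quorum arithmetic.
# $1 = number of monitors
quorum_size()        { echo $(( $1 / 2 + 1 )); }          # smallest majority
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }   # monitors that may fail

for n in 3 4 5; do
    echo "$n MONs: quorum=$(quorum_size "$n"), tolerated failures=$(tolerated_failures "$n")"
done
# 3 MONs: quorum=2, tolerated failures=1
# 4 MONs: quorum=3, tolerated failures=1
# 5 MONs: quorum=3, tolerated failures=2
```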

Storage Pools Configuration

Replicated and erasure-coded pool types
Per-pool size, min_size, and CRUSH rule
Placement groups managed by pg_autoscaler
Application tags (rbd, rgw, cephfs) required on new pools
Per-pool quotas and compression settings
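
Although pg_autoscaler normally manages placement group counts, it helps to know the rule of thumb behind them: roughly (OSDs x target PGs per OSD) / pool size, rounded to a power of two. The sketch below is a simplification that always rounds up and assumes a single pool spanning the whole cluster. The helper name suggest_pg_num is hypothetical, and the target of 100 PGs per OSD is a commonly cited starting point rather than a fixed rule.

```shell
# suggest_pg_num is a hypothetical helper implementing a common rule of thumb.
# $1 = OSD count, $2 = pool size (replicas, or k+m for EC), $3 = target PGs per OSD
suggest_pg_num() {
    pg_raw=$(( $1 * $3 / $2 ))
    pg_pow=1
    # Round up to the next power of two.
    while [ "$pg_pow" -lt "$pg_raw" ]; do
        pg_pow=$(( pg_pow * 2 ))
    done
    echo "$pg_pow"
}

suggest_pg_num 10 3 100   # 10 OSDs, 3 replicas -> 333 -> 512
```

With several pools sharing the same OSDs, the PG budget has to be split between them, which is the bookkeeping the autoscaler takes over; ceph osd pool autoscale-status shows its current view.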

Release Cadence

Ceph follows a stable release cycle with named versions (e.g., Quincy, Reef, Squid, Tentacle)
Squid (v19) and Tentacle (v20) are recent stable releases; check the official release notes for the current supported versions
In cephadm-managed clusters, upgrades are performed in place via ceph orch upgrade start

Integration Ecosystem

Kubernetes via Rook and the Ceph CSI driver
OpenStack (Cinder, Glance, Nova, Manila, and the Swift API via RGW)
KVM/libvirt and Proxmox VE
OpenShift Data Foundation (built on Rook-Ceph)
NFS-Ganesha, Samba, and iSCSI gateways

Recommended Learning Resources

Official Ceph Documentation (docs.ceph.com)
Ceph Community Slack and mailing lists
Ceph GitHub repository (ceph/ceph)
Ceph Days and Cephalocon conference talks
Rook documentation for Kubernetes-native deployments

Recommended Learning Path

Distributed systems and consensus fundamentals
Linux system administration and networking
Storage architecture and CRUSH/placement concepts
Hands-on cephadm deployment in a lab
Advanced topics: erasure coding, multi-site RGW, CephFS tuning, Rook on Kubernetes
