Partitioning
What is Partitioning?
Partitioning in databases is about splitting a large table into smaller, more manageable pieces, while still letting users query it as if it's a single table.
- Instead of one huge table with millions/billions of rows, you divide it into partitions.
- Queries only scan the relevant partitions, improving performance.
- Admin tasks (vacuuming, indexing, archiving) become more efficient.
Benefits:
- Faster queries (partition pruning).
- Smaller indexes per partition.
- Easier maintenance (drop/archive a partition instead of deleting rows).
- Can align with data lifecycle (e.g., yearly partitions).
Trade-offs:
- Extra complexity in schema design.
- Each insert must be routed to the matching partition → a small per-row overhead, and a row with no matching partition is rejected unless a DEFAULT partition exists.
- Joins across partitions may be more expensive.
- Too many partitions increase planning time and memory use.
In PostgreSQL specifically:
- Native declarative partitioning (PARTITION BY) has been available since version 10.
- It supports range, list, and hash partitioning (hash since version 11).
- Older versions used table inheritance + CHECK constraints with constraint_exclusion; a sketch of that legacy approach follows below.
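For context, a minimal sketch of that legacy approach, using an illustrative measurements table:
-- Pre-10 style: a child table INHERITS from the parent and carries a
-- CHECK constraint describing its slice of the data.
CREATE TABLE measurements (
logdate DATE NOT NULL,
value NUMERIC
);
CREATE TABLE measurements_2023 (
CHECK (logdate >= DATE '2023-01-01' AND logdate < DATE '2024-01-01')
) INHERITS (measurements);
-- With constraint_exclusion enabled, the planner skips children whose CHECK
-- contradicts the WHERE clause. Inserts still had to be routed into the right
-- child manually, typically via triggers.
SET constraint_exclusion = partition;
SELECT * FROM measurements WHERE logdate >= '2023-06-01';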
Types of partitioning
Horizontal Partitioning (most common)
- Split rows into different partitions based on a rule.
- Example: A sales table partitioned by year:
sales_2023 → rows with sale_date in 2023
sales_2024 → rows with sale_date in 2024
PostgreSQL syntax:
CREATE TABLE sales (
id SERIAL,
sale_date DATE NOT NULL,
amount NUMERIC,
PRIMARY KEY (id, sale_date) -- the primary key must include the partition key
) PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
Query:
SELECT SUM(amount) FROM sales WHERE sale_date >= '2024-05-01';
PostgreSQL only scans sales_2024 (partition pruning).
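Partition pruning can be verified with EXPLAIN (the exact plan shape depends on the PostgreSQL version and settings):
-- Only sales_2024 should appear in the plan; sales_2023 is pruned at plan time.
EXPLAIN (COSTS OFF)
SELECT SUM(amount) FROM sales WHERE sale_date >= '2024-05-01';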
Vertical Partitioning
- Split a table by columns, often to move rarely used or large columns aside.
- Example:
- users_core(id, name, email)
- users_extra(id, bio, profile_pic)
- Helps reduce row size and I/O for hot queries; a sketch of the split follows below.
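A minimal sketch of such a split; PostgreSQL has no declarative vertical partitioning, so the "partitions" are simply two tables sharing a key (column types here are illustrative assumptions):
-- Narrow, frequently read table: small rows keep hot queries cheap.
CREATE TABLE users_core (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL
);
-- Wide, rarely read columns live in a side table sharing the same key.
CREATE TABLE users_extra (
id BIGINT PRIMARY KEY REFERENCES users_core (id),
bio TEXT,
profile_pic BYTEA
);
-- The full row is reassembled with a join only when actually needed.
SELECT c.name, e.bio
FROM users_core c
LEFT JOIN users_extra e ON e.id = c.id
WHERE c.id = 42;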
Hash Partitioning
- Rows distributed across partitions using a hash function.
- Useful for evenly distributing data (e.g., user_id spread across N partitions).
CREATE TABLE users (
id INT,
name TEXT
) PARTITION BY HASH (id);
-- With MODULUS 4, create a partition for each remainder 0–3, otherwise
-- inserts hashing to a missing remainder will fail.
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);
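A quick way to see where rows land, assuming the four partitions above exist:
-- Each row is routed by hashing id; tableoid reveals the partition it landed in.
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
SELECT tableoid::regclass AS partition, id, name FROM users ORDER BY id;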
List Partitioning
- Split rows based on a list of discrete values.
- Example: partition by region.
CREATE TABLE customers (
id INT,
country TEXT
) PARTITION BY LIST (country);
CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('FR', 'DE', 'IT');
CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US');
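Since PostgreSQL 11, a DEFAULT partition can catch values not listed anywhere, instead of rejecting the insert:
-- Rows whose country matches no listed partition land here.
CREATE TABLE customers_other PARTITION OF customers DEFAULT;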
Partitioning vs Sharding
- Partitioning = one database, sliced into pieces (performance & manageability).
- Sharding = many databases, each holding part of the data (scalability).
| Feature | Partitioning | Sharding |
| --- | --- | --- |
| Location | One database server | Multiple servers / DB instances |
| Purpose | Optimize query performance & manageability | Scale beyond one machine |
| Query transparency | Transparent to apps | App/middleware must handle routing |
| Cross-partition join | Supported within same DB | Difficult/expensive |
| Maintenance | Easier (single DB) | Harder (many DBs to manage) |
| Scale | Vertical (optimize single DB) | Horizontal (spread across DBs) |
Partitioning
- Scope: Within a single database instance.
- Goal: Make large tables faster and easier to manage.
- How it works: A big table is split into smaller pieces (partitions), but all partitions live in the same DB server (same PostgreSQL instance).
- Queries: Database engine automatically chooses the correct partition(s) → transparent to applications.
Example:
- PostgreSQL sales table partitioned by year.
- All partitions (sales_2023, sales_2024) live on the same PostgreSQL server.
Benefits:
- Efficient queries (partition pruning).
- Easier maintenance (drop old partitions).
- Still just one database to manage.
Sharding
- Scope: Across multiple database instances (servers).
- Goal: Handle data volume or traffic beyond what one server can handle.
- How it works: The dataset is split into "shards", each stored in a separate DB server. Together, shards form the full dataset.
- Queries: Application or middleware must know which shard to query (sometimes with help of a "routing key").
Example: A user database sharded by user_id % 4 (a minimal routing sketch follows the list):
- Shard 1 → PostgreSQL server A (user_id % 4 = 0)
- Shard 2 → PostgreSQL server B (user_id % 4 = 1)
- Shard 3 → PostgreSQL server C (user_id % 4 = 2)
- Shard 4 → PostgreSQL server D (user_id % 4 = 3)
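A minimal sketch of the routing step, assuming a hypothetical shard_map lookup table and host names; in practice this logic usually lives in application code or middleware such as Citus:
-- Hypothetical catalog mapping a shard number to the server that owns it.
CREATE TABLE shard_map (
shard_no INT PRIMARY KEY,
host TEXT NOT NULL
);
INSERT INTO shard_map VALUES
(0, 'pg-server-a.internal'),
(1, 'pg-server-b.internal'),
(2, 'pg-server-c.internal'),
(3, 'pg-server-d.internal');
-- The routing key picks the target server for user_id = 123456.
SELECT host FROM shard_map WHERE shard_no = 123456 % 4;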
Benefits:
- Scalability: horizontal scaling across many machines.
- Each shard smaller and faster.
- Can serve massive datasets and high request loads.
Challenges:
- Cross-shard queries are hard (joins/transactions often limited).
- Requires custom routing logic or middleware (e.g., Citus for PostgreSQL, Vitess for MySQL).
- Operationally more complex (multiple DBs to maintain).
Example of partitioning and sharding combined
Partitioning and sharding can be combined to handle both scalability and performance. Consider a large e-commerce platform: an online store with hundreds of millions of orders worldwide.
Concept:
- Sharding = splitting data across servers (Europe vs. US vs. Asia).
- Partitioning = splitting tables within each server (orders by year/month).
- Together, they give horizontal scale across servers and fast, manageable tables within each server.
Trade-offs:
- Pros:
- Extreme scalability (global distribution + per-shard performance).
- Maintenance tasks (vacuum, index rebuilds, backups) manageable at shard+partition granularity.
- Transparent partition pruning inside shards.
- Cons:
- Cross-shard queries (e.g., global reporting) require aggregation across shards.
- Higher operational complexity: need orchestration for schema management and shard routing.
Step 1: Sharding by Region
The company splits the database into regional shards:
- Shard 1 → Europe (Postgres server A)
- Shard 2 → North America (Postgres server B)
- Shard 3 → Asia (Postgres server C)
Each shard contains only the users/orders for that region. This distributes the load across multiple servers → horizontal scalability.
Step 2: Partitioning Inside Each Shard
Inside each shard, the orders table is massive (tens of millions of rows). So they use table partitioning by order date:
CREATE TABLE orders (
order_id BIGSERIAL,
user_id BIGINT,
order_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC,
PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);
-- Create partitions per year
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
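On PostgreSQL 11 and later, an index created on the partitioned parent is automatically created on every existing and future partition, so per-partition indexes stay in sync:
-- One statement on the parent; each partition gets its own (smaller) index.
CREATE INDEX idx_orders_user_id ON orders (user_id);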
- Query for 2024 orders → PostgreSQL automatically prunes partitions → only scans orders_2024.
- Dropping old data (say orders_2020) is a single statement:
DROP TABLE orders_2020;
which is nearly instant compared to deleting rows one by one.
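If old data should be archived rather than deleted, the partition can instead be detached, which turns it into an ordinary standalone table:
-- orders_2023 keeps its rows but is no longer part of the partitioned table.
ALTER TABLE orders DETACH PARTITION orders_2023;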
Step 3: Combined Effect
- Sharding handles scale across regions → no single DB is overwhelmed.
- Partitioning inside each shard handles scale within a region → queries faster, maintenance easier.