Partitioning

What is Partitioning?

Partitioning in databases is about splitting a large table into smaller, more manageable pieces, while still letting users query it as if it's a single table.

  • Instead of one huge table with millions/billions of rows, you divide it into partitions.
  • Queries only scan the relevant partitions, improving performance.
  • Admin tasks (vacuuming, indexing, archiving) become more efficient.

Benefits:

  • Faster queries (partition pruning).
  • Smaller indexes per partition.
  • Easier maintenance (drop/archive a partition instead of deleting rows).
  • Can align with data lifecycle (e.g., yearly partitions).

Trade-offs:

  • Extra complexity in schema design.
  • Inserts must be routed to the matching partition, adding a small per-write overhead.
  • Joins across partitions may be more expensive.
  • Too many partitions can hurt performance.

In PostgreSQL specifically:

  • Native declarative partitioning (PARTITION BY) has been available since version 10.
  • It supports range, list, and hash partitions.
  • Old versions used table inheritance + check constraints with constraint_exclusion.
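
For comparison, a minimal sketch of the legacy inheritance-based approach (pre-PostgreSQL 10); table names are illustrative:

CREATE TABLE sales_legacy (
    id SERIAL,
    sale_date DATE,
    amount NUMERIC
);

-- A child table inherits the parent's columns; a CHECK constraint
-- declares which rows it holds so the planner can skip it.
CREATE TABLE sales_legacy_2023 (
    CHECK (sale_date >= '2023-01-01' AND sale_date < '2024-01-01')
) INHERITS (sales_legacy);

-- Lets the planner exclude children whose CHECK constraint
-- contradicts the query's WHERE clause.
SET constraint_exclusion = partition;

Unlike declarative partitioning, inserts into the parent were not routed automatically; that required triggers or rules, which is one reason this approach was superseded.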

Types of partitioning

Horizontal Partitioning (most common)

  • Split rows into different partitions based on a rule.
  • Example: A sales table partitioned by year:
sales_2023 → rows with sale_date in 2023
sales_2024 → rows with sale_date in 2024

PostgreSQL syntax:

CREATE TABLE sales (
    id SERIAL,
    sale_date DATE NOT NULL,
    amount NUMERIC,
    -- the primary key of a partitioned table must include the partition key
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Query:

SELECT SUM(amount) FROM sales WHERE sale_date >= '2024-05-01';

PostgreSQL only scans sales_2024 (partition pruning).
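
You can confirm pruning by inspecting the query plan; partitions that were pruned simply do not appear in it:

EXPLAIN SELECT SUM(amount) FROM sales WHERE sale_date >= '2024-05-01';
-- The plan should reference only sales_2024.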

Vertical Partitioning

  • Split a table by columns, often to move rarely used or large columns aside.
  • Example:
    • users_core(id, name, email)
    • users_extra(id, bio, profile_pic)

Helps reduce row size in hot queries.
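
PostgreSQL has no declarative syntax for vertical partitioning; it is done by hand with two tables sharing a key. A minimal sketch (table and column names are illustrative):

CREATE TABLE users_core (
    id BIGINT PRIMARY KEY,
    name TEXT,
    email TEXT
);

CREATE TABLE users_extra (
    id BIGINT PRIMARY KEY REFERENCES users_core (id),
    bio TEXT,
    profile_pic BYTEA
);

-- Hot-path queries touch only the narrow table:
SELECT name, email FROM users_core WHERE id = 42;

-- The wide columns are joined in only when actually needed:
SELECT c.name, e.bio
FROM users_core c
JOIN users_extra e ON e.id = c.id
WHERE c.id = 42;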

Hash Partitioning

  • Rows distributed across partitions using a hash function.
  • Useful for evenly distributing data (e.g., user_id spread across N partitions).

CREATE TABLE users (
    id INT,
    name TEXT
) PARTITION BY HASH (id);

-- With MODULUS 4, all four remainders need a partition;
-- otherwise inserts hashing to a missing remainder fail.
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);

List Partitioning

  • Split rows based on a list of discrete values.
  • Example: partition by region.

CREATE TABLE customers (
    id INT,
    country TEXT
) PARTITION BY LIST (country);

CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('FR', 'DE', 'IT');
CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US');
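
With list partitioning, an insert whose value matches no list fails. Since PostgreSQL 11 you can add a DEFAULT partition to catch everything else:

-- Catches any country not covered by an explicit list partition.
CREATE TABLE customers_other PARTITION OF customers DEFAULT;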

Partitioning vs Sharding

  • Partitioning = one database, sliced into pieces (performance & manageability).
  • Sharding = many databases, each holding part of the data (scalability).
Feature              | Partitioning                               | Sharding
---------------------|--------------------------------------------|-----------------------------------
Location             | One database server                        | Multiple servers / DB instances
Purpose              | Optimize query performance & manageability | Scale beyond one machine
Query transparency   | Transparent to apps                        | App/middleware must handle routing
Cross-partition join | Supported within same DB                   | Difficult/expensive
Maintenance          | Easier (single DB)                         | Harder (many DBs to manage)
Scale                | Vertical (optimize single DB)              | Horizontal (spread across DBs)

Partitioning

  • Scope: Within a single database instance.
  • Goal: Make large tables faster and easier to manage.
  • How it works: A big table is split into smaller pieces (partitions), but all partitions live in the same DB server (same PostgreSQL instance).
  • Queries: Database engine automatically chooses the correct partition(s) → transparent to applications.

Example:

  • PostgreSQL sales table partitioned by year.
  • All partitions (sales_2023, sales_2024) live on the same PostgreSQL server.

Benefits:

  • Efficient queries (partition pruning).
  • Easier maintenance (drop old partitions).
  • Still just one database to manage.

Sharding

  • Scope: Across multiple database instances (servers).
  • Goal: Handle data volume or traffic beyond what one server can handle.
  • How it works: The dataset is split into "shards", each stored in a separate DB server. Together, shards form the full dataset.
  • Queries: Application or middleware must know which shard to query (sometimes with help of a "routing key").

Example: A user database sharded by user_id % 4:

  • Shard 1 → PostgreSQL server A (user_id % 4 = 0)
  • Shard 2 → PostgreSQL server B (user_id % 4 = 1)
  • Shard 3 → PostgreSQL server C (user_id % 4 = 2)
  • Shard 4 → PostgreSQL server D (user_id % 4 = 3)
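
The routing itself lives in the application or middleware, not the database: compute the shard from the routing key, then send the query to the matching server. The modulo expression can be sanity-checked in SQL (the user_id value here is illustrative):

-- The app computes user_id % 4 and routes to the matching server.
SELECT 1234567 % 4 AS shard;  -- remainder 3 → shard 4 on server D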

Benefits:

  • Scalability: horizontal scaling across many machines.
  • Each shard smaller and faster.
  • Can serve massive datasets and high request loads.

Challenges:

  • Cross-shard queries are hard (joins/transactions often limited).
  • Requires custom routing logic or middleware (e.g., Citus for PostgreSQL, Vitess for MySQL).
  • Operationally more complex (multiple DBs to maintain).

Example of partitioning and sharding combined

Partitioning and sharding can be combined to get both scalability and performance. Consider a large e-commerce platform: an online store with hundreds of millions of orders worldwide.

Concept:

  • Sharding = splitting data across servers (Europe vs. US vs. Asia).
  • Partitioning = splitting tables within each server (orders by year/month).
  • Together, they give scale-out across servers plus fast, manageable tables within each server.

Trade-offs:

  • Pros:
    • Extreme scalability (global distribution + per-shard performance).
    • Maintenance tasks (vacuum, index rebuilds, backups) manageable at shard+partition granularity.
    • Transparent partition pruning inside shards.
  • Cons:
    • Cross-shard queries (e.g., global reporting) require aggregation across shards.
    • Higher operational complexity: need orchestration for schema management and shard routing.

Step 1: Sharding by Region

The company splits the database into regional shards:

  • Shard 1 → Europe (Postgres server A)
  • Shard 2 → North America (Postgres server B)
  • Shard 3 → Asia (Postgres server C)

Each shard contains only the users/orders for that region. This distributes the load across multiple servers → horizontal scalability.

Step 2: Partitioning Inside Each Shard

Inside each shard, the orders table is massive (tens of millions of rows). So they use table partitioning by order date:

CREATE TABLE orders (
    order_id BIGSERIAL,
    user_id BIGINT,
    order_date DATE NOT NULL,
    region TEXT NOT NULL,
    amount NUMERIC,
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions per year
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

  • Query for 2024 orders → PostgreSQL automatically prunes partitions → only scans orders_2024.
  • Dropping old data (say orders_2020) is just DROP TABLE orders_2020; which is very fast.
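
If the old data should be archived rather than deleted, a partition can also be detached and kept as a standalone table:

-- Detach instead of drop: orders_2023 becomes a regular table,
-- still queryable, but no longer part of the partitioned orders table.
-- (Since PostgreSQL 14, DETACH PARTITION ... CONCURRENTLY avoids
-- blocking concurrent queries.)
ALTER TABLE orders DETACH PARTITION orders_2023;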

Step 3: Combined Effect

  • Sharding handles scale across regions → no single DB is overwhelmed.
  • Partitioning inside each shard handles scale within a region → queries faster, maintenance easier.