Database Partitioning Strategies — Sharding, Horizontal/Vertical Splitting

## Structure at a Glance

[Single DB]
    |
    +--> [Partitioning]
    |       same DB / same schema / split by key or range
    |
    +--> [Sharding]
            multiple DBs / split by shard key

Partitioning divides physical or logical pieces within a single database, while sharding splits the database itself across multiple nodes. Both share the idea of “splitting data,” but their operational complexity and scalability limits are completely different, so it helps to keep that structural distinction in mind before going deeper.

Partitioning vs. Sharding

Category	Partitioning	Sharding
Scope	Split tables within a single DB	Distribute across multiple DB servers
Complexity	Low	High
Scalability	Vertical scaling	Horizontal scaling

MySQL Partitioning

RANGE Partitioning (Date-Based)

CREATE TABLE orders (
    id BIGINT,
    created_at DATE,
    amount INT
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

LIST Partitioning (Status-Based)

PARTITION BY LIST (status) (
    PARTITION p_pending  VALUES IN (1),
    PARTITION p_paid     VALUES IN (2),
    PARTITION p_shipped  VALUES IN (3),
    PARTITION p_done     VALUES IN (4)
);

HASH Partitioning (Even Distribution)

PARTITION BY HASH(user_id) PARTITIONS 8;

Partition Management

-- Check partition information
SELECT * FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'orders';

-- Query a specific partition (partition pruning)
SELECT * FROM orders PARTITION (p2024);

-- Add a partition
ALTER TABLE orders ADD PARTITION (
    PARTITION p2026 VALUES LESS THAN (2027)
);

-- Drop an old partition (fast data cleanup)
ALTER TABLE orders DROP PARTITION p2023;

Sharding Strategies

Range Sharding

user_id 1~1000000  → Shard 1
user_id 1000001~   → Shard 2

Hash Sharding

int shardIndex = userId % NUM_SHARDS;
DataSource ds = shards.get(shardIndex);

Vertical Partitioning

Split out large columns that are not queried often:

-- Original: users (id, name, email, profile_image_blob, bio)
-- After split:
-- users (id, name, email)
-- user_profiles (user_id, profile_image_blob, bio)

What Gets Hard in Production

Partitioning helps only when data volume, retention, and query routing actually justify the extra operational complexity.
Teams often partition too early and discover that their real bottleneck was indexing, query shape, or hardware limits.
Once partitioning exists, maintenance, pruning, and backfill paths become part of daily operations.

Architecture Decisions That Matter

Choose the partition key based on retention and access path, not just on what column grows fastest.
Design application queries so partition pruning can actually happen.
Plan archive, drop, and rebalance procedures before the first partition is created.

Practical Example

Time-based partitioning is common because it aligns query range and retention operations:

events_2026_01
events_2026_02
events_2026_03
...

Anti-Patterns to Avoid

Partitioning tables whose workload rarely benefits from pruning.
Choosing a key that scatters common queries across too many partitions.
Ignoring operational tooling for late-arriving data and repartitioning.

Operational Checklist

Verify pruning with real execution plans.
Automate partition creation and retirement.
Test backfill and historical query behavior.
Watch metadata overhead and maintenance runtime as partition count grows.

Final Judgment

Partitioning is justified when it simplifies scale and retention operations together. If it only adds complexity around ordinary queries, it is usually the wrong first move.

🗄️ Database

Turn AI service development and operations into one improvement loop