Database Partitioning Strategies — Sharding, Horizontal/Vertical Splitting
[Single DB]
|
+--> [Partitioning]
| same DB / same schema / split by key or range
|
+--> [Sharding]
multiple DBs / split by shard key
Partitioning divides physical or logical pieces within a single database, while sharding splits the database itself across multiple nodes. Both share the idea of “splitting data,” but their operational complexity and scalability limits are completely different, so it helps to keep that structural distinction in mind before going deeper.
Partitioning vs. Sharding
| Category | Partitioning | Sharding |
|---|---|---|
| Scope | Split tables within a single DB | Distribute across multiple DB servers |
| Complexity | Low | High |
| Scalability | Vertical scaling | Horizontal scaling |
MySQL Partitioning
RANGE Partitioning (Date-Based)
CREATE TABLE orders (
id BIGINT,
created_at DATE,
amount INT
)
PARTITION BY RANGE (YEAR(created_at)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
LIST Partitioning (Status-Based)
PARTITION BY LIST (status) (
PARTITION p_pending VALUES IN (1),
PARTITION p_paid VALUES IN (2),
PARTITION p_shipped VALUES IN (3),
PARTITION p_done VALUES IN (4)
);
HASH Partitioning (Even Distribution)
PARTITION BY HASH(user_id) PARTITIONS 8;
Partition Management
-- Check partition information
SELECT * FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'orders';
-- Query a specific partition (partition pruning)
SELECT * FROM orders PARTITION (p2024);
-- Add a partition
ALTER TABLE orders ADD PARTITION (
PARTITION p2026 VALUES LESS THAN (2027)
);
-- Drop an old partition (fast data cleanup)
ALTER TABLE orders DROP PARTITION p2023;
Sharding Strategies
Range Sharding
user_id 1~1000000 → Shard 1
user_id 1000001~ → Shard 2
Hash Sharding
int shardIndex = userId % NUM_SHARDS;
DataSource ds = shards.get(shardIndex);
Vertical Partitioning
Split out large columns that are not queried often:
-- Original: users (id, name, email, profile_image_blob, bio)
-- After split:
-- users (id, name, email)
-- user_profiles (user_id, profile_image_blob, bio)
What Gets Hard in Production
- Partitioning helps only when data volume, retention, and query routing actually justify the extra operational complexity.
- Teams often partition too early and discover that their real bottleneck was indexing, query shape, or hardware limits.
- Once partitioning exists, maintenance, pruning, and backfill paths become part of daily operations.
Architecture Decisions That Matter
- Choose the partition key based on retention and access path, not just on what column grows fastest.
- Design application queries so partition pruning can actually happen.
- Plan archive, drop, and rebalance procedures before the first partition is created.
Practical Example
Time-based partitioning is common because it aligns query range and retention operations:
events_2026_01
events_2026_02
events_2026_03
...
Anti-Patterns to Avoid
- Partitioning tables whose workload rarely benefits from pruning.
- Choosing a key that scatters common queries across too many partitions.
- Ignoring operational tooling for late-arriving data and repartitioning.
Operational Checklist
- Verify pruning with real execution plans.
- Automate partition creation and retirement.
- Test backfill and historical query behavior.
- Watch metadata overhead and maintenance runtime as partition count grows.
Final Judgment
Partitioning is justified when it simplifies scale and retention operations together. If it only adds complexity around ordinary queries, it is usually the wrong first move.
Continue Reading
Related posts
MySQL Index Optimization Strategies — Query Analysis with EXPLAIN
This post explains how MySQL indexes work and how to optimize them using EXPLAIN analysis. It covers composite indexes, covering indexes, and index hints with practical examples.
🗄️ DatabaseA Complete Guide to Database Transactions and Isolation Levels
This post summarizes ACID properties, transaction isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable), and the problems that occur at each level with practical examples.
📈 TrendsPostgreSQL 18 Trends: What Actually Matters in Practice
PostgreSQL 18 is more than an upgrade headline. AIO, skip scan, better post-upgrade recovery, OAuth, and generated columns all point to a release focused on operational cost reduction.
🚀 DevOpsKubernetes Advanced Operations — HPA, Resource Management, and Pod Scheduling
This article explains Kubernetes operations not as a collection of settings but from the perspective of resource placement and resilience. It covers when and how to use requests/limits, HPA, affinity, taints, PDBs, and probes in real environments.
Next Path