AI DevOps Korea

Aidevops.kr covers LLMOps, RAG, agents, observability, evaluation, and cost-performance optimization for production AI services.

A Practical Guide to CQRS and Event Sourcing

Updated Apr 22
[Figure: CQRS and Event Sourcing diagram showing commands, aggregates, event store, projections, read model, and queries]
CQRS separates write protection from read optimization, while Event Sourcing turns durable events into the backbone for projections and recovery.

CQRS and Event Sourcing are not architecture badges. They are ways to make business rules, state transitions, and read-side requirements explicit when a CRUD model starts hiding too much complexity.

Used well, they help a team answer hard questions with confidence:

  • Which business rule rejected this change?
  • What exactly happened before the incident?
  • How do we build multiple read models without loading the write model with query concerns?
  • Can we rebuild derived state after a bug, schema change, or reporting requirement?

Used poorly, they create a system that is harder to reason about than the original problem. The practical question is not whether the pattern is advanced but whether the domain earns the extra operational cost.

What CQRS actually changes

CQRS splits the write side from the read side because they optimize for different concerns.

  • The write side protects invariants and business rules.
  • The read side optimizes for shape, latency, and query flexibility.
  • The two models can evolve independently as long as their contracts are clear.

That does not automatically mean separate databases, separate services, or a microservice architecture. A useful first step is often much smaller:

  • commands and queries use different handlers
  • aggregates only exist on the write side
  • read models are denormalized for the screens or APIs that need them

This smaller interpretation already removes a lot of accidental complexity from a traditional layered application.
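Here is a minimal Python sketch of that smaller first step. It assumes a single process, with in-memory dicts standing in for storage. The names (`PlaceOrder`, `GetCustomerOrders`, and the two handlers) are hypothetical. The sketch shows only that commands and queries go through different handlers and that the query side reads a denormalized view. It needs no separate database or service.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class PlaceOrder:            # command: an intent to change state (hypothetical)
    order_id: str
    customer_id: str
    total_cents: int

@dataclass(frozen=True)
class GetCustomerOrders:     # query: a request for data, no side effects (hypothetical)
    customer_id: str

class PlaceOrderHandler:
    """Write side: enforces rules, then records the change."""

    def __init__(self, orders: Dict[str, dict], orders_by_customer: Dict[str, List[dict]]) -> None:
        self.orders = orders                          # write model
        self.orders_by_customer = orders_by_customer  # denormalized read model

    def handle(self, cmd: PlaceOrder) -> None:
        if cmd.total_cents <= 0:
            raise ValueError("order total must be positive")
        if cmd.order_id in self.orders:
            raise ValueError(f"order {cmd.order_id} already exists")
        self.orders[cmd.order_id] = {"customer_id": cmd.customer_id, "total_cents": cmd.total_cents}
        # Updated synchronously in the same process here; moving this into an
        # asynchronous projection is a later, separate decision.
        self.orders_by_customer.setdefault(cmd.customer_id, []).append(
            {"order_id": cmd.order_id, "total_cents": cmd.total_cents})

class GetCustomerOrdersHandler:
    """Read side: reads only the denormalized view, never the write model."""

    def __init__(self, orders_by_customer: Dict[str, List[dict]]) -> None:
        self.orders_by_customer = orders_by_customer

    def handle(self, query: GetCustomerOrders) -> List[dict]:
        return list(self.orders_by_customer.get(query.customer_id, []))
```

The two handlers share storage but not responsibilities. Later, either side can change its storage independently without touching the other's code.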

What Event Sourcing actually changes

Event Sourcing stores state transitions as an ordered stream of domain events instead of storing only the latest row state.

Instead of persisting account.balance = 150, you persist facts such as:

  • AccountOpened
  • FundsDeposited
  • FundsWithdrawn
  • OverdraftLimitRaised

Current state is reconstructed by replaying those events from the start of the stream, or by loading a snapshot and replaying only the events recorded after it.
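To make replay concrete, here is a minimal Python sketch that folds the account events above into current state. It assumes events arrive in stream order and use integer amounts. The dataclass shapes are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Iterable, Optional

@dataclass(frozen=True)
class AccountOpened:
    account_id: str

@dataclass(frozen=True)
class FundsDeposited:
    amount: int

@dataclass(frozen=True)
class FundsWithdrawn:
    amount: int

@dataclass(frozen=True)
class OverdraftLimitRaised:
    new_limit: int

@dataclass(frozen=True)
class AccountState:
    account_id: Optional[str] = None
    balance: int = 0
    overdraft_limit: int = 0
    version: int = 0  # number of events applied so far

def apply(state: AccountState, event) -> AccountState:
    """Return the state after one event. Pure: no I/O, no validation, no mutation."""
    if isinstance(event, AccountOpened):
        state = replace(state, account_id=event.account_id)
    elif isinstance(event, FundsDeposited):
        state = replace(state, balance=state.balance + event.amount)
    elif isinstance(event, FundsWithdrawn):
        state = replace(state, balance=state.balance - event.amount)
    elif isinstance(event, OverdraftLimitRaised):
        state = replace(state, overdraft_limit=event.new_limit)
    else:
        raise TypeError(f"unknown event: {event!r}")
    return replace(state, version=state.version + 1)

def rehydrate(events: Iterable) -> AccountState:
    """Current state is a left fold of apply over the ordered event stream."""
    return reduce(apply, events, AccountState())

history = [AccountOpened("acc-1"), FundsDeposited(200), FundsWithdrawn(80),
           OverdraftLimitRaised(100), FundsDeposited(30)]
state = rehydrate(history)
print(state.balance, state.overdraft_limit, state.version)  # prints: 150 100 5
```

`apply` performs no validation. Events record facts that already happened, so rules are checked before an event is written, not while it is being replayed.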

That design gives you valuable properties:

  • a durable audit trail of domain facts
  • a natural input stream for projections
  • the ability to rebuild read models after code or schema changes
  • a clearer model of how state changes over time

It also introduces serious responsibilities:

  • event schema evolution
  • replay performance
  • projection idempotency
  • monitoring lag between write and read sides
  • operational tooling for rebuilds and backfills

When CQRS and Event Sourcing are worth it

These patterns are usually justified when several of the following are true at the same time:

  • write-side business rules are dense and expensive to validate incorrectly
  • the same source of truth must feed many different read models
  • audit history is a business requirement, not a logging nice-to-have
  • time-travel, replay, and retroactive correction matter
  • teams need a stable domain event stream for downstream workflows

Good examples include:

  • financial ledgers and payment workflows
  • order lifecycles with compensation and fulfillment states
  • inventory domains where reservation, allocation, and release have business meaning
  • compliance-heavy back-office systems
  • collaborative systems where change history is part of the product value

Less convincing cases include:

  • straightforward admin CRUD
  • low-risk internal tools with simple reporting needs
  • domains where eventual consistency is unacceptable but distributed read models are still desired

If a normalized transactional model plus a few materialized views solves the problem, that is usually the better engineering decision.

Key decisions

  • events should be domain facts, not generic logs
  • aggregates should guard invariants, not become huge state holders
  • projections should be treated as replayable pipelines
  • eventual consistency must be acceptable
  • snapshots and versioning must be planned early

Those principles sound simple, but each one changes implementation strategy in a meaningful way.

Model events as business facts

Events should describe something the business would recognize as having happened. That usually means past-tense, domain-specific names and payloads that preserve business intent.

Better event names:

  • OrderPlaced
  • PaymentAuthorized
  • ShipmentDispatched

Weaker event names:

  • OrderUpdated
  • RowChanged
  • StatusSet

Generic events are tempting because they look flexible, but they move meaning out of the model and into scattered application code. A stream full of vague mutation events becomes difficult to replay, reason about, and integrate with safely.

As a rule of thumb:

  • event names should reflect domain language
  • payloads should contain the information required to understand the fact later
  • events should be immutable once published
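Here is a minimal Python sketch of those rules. It assumes frozen dataclasses as the event representation, and the payload fields are hypothetical. Each event carries enough business context to be understood later. The type name carries the meaning, rather than a generic `status` field.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    line_items: tuple          # tuple of (sku, quantity) pairs, kept immutable
    total_cents: int
    occurred_at: datetime

@dataclass(frozen=True)
class PaymentAuthorized:
    order_id: str
    payment_id: str
    amount_cents: int
    occurred_at: datetime

@dataclass(frozen=True)
class ShipmentDispatched:
    order_id: str
    carrier: str
    tracking_number: str
    occurred_at: datetime

def to_record(event) -> dict:
    """Serialize with the domain event name so consumers see the business fact."""
    payload = asdict(event)
    payload["occurred_at"] = event.occurred_at.isoformat()
    return {"type": type(event).__name__, "payload": payload}

placed = OrderPlaced("o-1", "c-1", (("sku-1", 2),), 3000,
                     datetime(2024, 1, 1, tzinfo=timezone.utc))
try:
    placed.total_cents = 1     # published facts cannot be edited in place
except FrozenInstanceError:
    print("immutable")
```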

Keep aggregates narrow and defensive

Aggregates are transaction boundaries for enforcing invariants. They are not object graphs designed to satisfy every navigation path in the UI.

A strong aggregate design typically has these traits:

  • it loads only the information required to validate a command
  • it enforces invariants synchronously before producing events
  • it does not reach into external read models to decide critical rules

If an aggregate keeps growing to answer read concerns, the design is drifting back toward a mixed model. In CQRS, reads should move to projections and purpose-built query models, not into the aggregate.
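The sketch below shows a narrow aggregate in Python under two assumptions. First, the event types and fields are hypothetical. Second, command methods return new events rather than persisting them, which leaves the append to the command handler. The aggregate decides using only its own replayed state.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total_cents: int

@dataclass(frozen=True)
class PaymentAuthorized:
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class ShipmentDispatched:
    order_id: str
    tracking_number: str

class DomainError(Exception):
    """Raised when a command would violate an invariant."""

class Order:
    """Write-side aggregate holding only what its invariants need:
    no customer profile, no display data, no read-model lookups."""

    def __init__(self) -> None:
        self.order_id: Optional[str] = None
        self.total_cents = 0
        self.paid = False
        self.dispatched = False

    @classmethod
    def from_events(cls, events) -> "Order":
        order = cls()
        for event in events:
            order._apply(event)
        return order

    def _apply(self, event) -> None:
        if isinstance(event, OrderPlaced):
            self.order_id, self.total_cents = event.order_id, event.total_cents
        elif isinstance(event, PaymentAuthorized):
            self.paid = True
        elif isinstance(event, ShipmentDispatched):
            self.dispatched = True

    # Command methods validate synchronously against local state and return
    # new events; they do not call HTTP services or query read models.
    def authorize_payment(self, amount_cents: int) -> List[PaymentAuthorized]:
        if self.order_id is None:
            raise DomainError("order does not exist")
        if self.paid:
            raise DomainError("payment already authorized")
        if amount_cents != self.total_cents:
            raise DomainError("payment amount must equal order total")
        return [PaymentAuthorized(self.order_id, amount_cents)]

    def dispatch(self, tracking_number: str) -> List[ShipmentDispatched]:
        if not self.paid:
            raise DomainError("cannot dispatch an unpaid order")
        if self.dispatched:
            raise DomainError("order already dispatched")
        return [ShipmentDispatched(self.order_id, tracking_number)]
```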

Design projections as disposable but trustworthy

A projection is not just a background consumer. It is a replayable pipeline that turns event streams into query-friendly views.

Reliable projections usually need:

  • idempotent handlers
  • deterministic ordering rules
  • clear checkpointing
  • rebuild tooling
  • visible lag metrics

A projection should be safe to rebuild from scratch. If rebuilding is risky, manual, or inconsistent across environments, the operational value of Event Sourcing drops sharply.
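Here is a minimal Python sketch of an idempotent, checkpointed projection. It assumes each stored event carries a global, monotonically increasing `position` and that events are delivered in position order. `position` is a hypothetical field name; real event stores expose something similar under various names. Rebuilding means starting a fresh projection from position zero.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass(frozen=True)
class StoredEvent:
    position: int        # hypothetical global position assigned by the store
    order_id: str
    type: str
    amount_cents: int = 0

class OrderStatusProjection:
    """Builds a query-friendly view: order_id -> total and status."""

    def __init__(self) -> None:
        self.checkpoint = 0               # position of the last event applied
        self.view: Dict[str, dict] = {}

    def handle(self, event: StoredEvent) -> None:
        # Idempotency: anything at or before the checkpoint was already applied,
        # so redelivered events are skipped instead of being applied twice.
        if event.position <= self.checkpoint:
            return
        if event.type == "OrderPlaced":
            self.view[event.order_id] = {"total_cents": event.amount_cents, "status": "placed"}
        elif event.type == "PaymentAuthorized":
            self.view[event.order_id]["status"] = "paid"
        # Other event types are ignored by this particular projection.
        self.checkpoint = event.position

    def catch_up(self, events: Iterable[StoredEvent]) -> None:
        # Deterministic ordering: always apply in store position order.
        for event in sorted(events, key=lambda e: e.position):
            self.handle(event)

def rebuild(events: Iterable[StoredEvent]) -> OrderStatusProjection:
    """Rebuild from scratch: a fresh projection replays the full history."""
    fresh = OrderStatusProjection()
    fresh.catch_up(events)
    return fresh
```

In a real store, the view update and the checkpoint write should commit atomically. Otherwise, a crash between them can reapply or skip an event.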

Consistency is a product decision

Most CQRS systems accept eventual consistency between the write model and read model. That is not only a technical tradeoff. It is a product and UX tradeoff.

Teams should decide explicitly:

  • how stale can the read model be?
  • what should the user see immediately after a successful command?
  • which workflows require read-your-own-write behavior?
  • how will support teams diagnose projection lag?

Common patterns include:

  • returning command results directly from the write side for immediate confirmation
  • polling until a read model version catches up
  • pushing updates via WebSocket or SSE when projections complete
  • showing explicit processing states in the UI

If the business cannot tolerate delayed visibility, do not assume CQRS is the right fit.
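The polling pattern from the list above can be sketched in Python as follows. The sketch makes two assumptions. First, the command side returns the stream version it just wrote. Second, the projection exposes the last version it applied through a hypothetical `read_version` callable. The timeout and interval values are arbitrary defaults.

```python
import time
from typing import Callable

class ReadModelLagTimeout(Exception):
    """The read model did not catch up within the allowed time."""

def wait_for_version(read_version: Callable[[], int], expected: int,
                     timeout_s: float = 2.0, interval_s: float = 0.05) -> int:
    """Poll until the projection has applied at least `expected`, then return
    the observed version; raise if the deadline passes first."""
    deadline = time.monotonic() + timeout_s
    while True:
        current = read_version()
        if current >= expected:
            return current
        if time.monotonic() >= deadline:
            raise ReadModelLagTimeout(f"read model at {current}, expected {expected}")
        time.sleep(interval_s)
```

On timeout, the API can fall back to an explicit processing state instead of returning stale data as if it were current.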

Snapshots are a performance tool, not a shortcut

As event streams grow, rebuilding aggregate state on every command can become expensive. Snapshots reduce replay cost by storing derived state at a known version and replaying only later events.

Useful snapshot practices include:

  • snapshot by stream version or event count thresholds
  • store snapshot schema version explicitly
  • treat snapshots as disposable caches derived from canonical events
  • verify replay logic still works without snapshots

The critical mental model is this: events are the source of truth; snapshots are optimization artifacts.
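Here is a minimal Python sketch of the snapshot practices above. It assumes a stream of `(event_type, amount)` tuples and a hypothetical threshold of 100 events. Each snapshot records its own schema version and the stream version it covers. A snapshot with an unknown schema version is simply ignored, which is safe only because full replay still works without it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Event = Tuple[str, int]            # (event_type, amount), a simplified encoding
SNAPSHOT_SCHEMA_VERSION = 1        # bump when Snapshot's shape changes
SNAPSHOT_EVERY = 100               # hypothetical event-count threshold

@dataclass(frozen=True)
class Snapshot:
    schema_version: int
    stream_version: int            # number of events already folded in
    balance: int

def apply(balance: int, event: Event) -> int:
    kind, amount = event
    if kind == "FundsDeposited":
        return balance + amount
    if kind == "FundsWithdrawn":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")

def load_balance(events: List[Event], snapshot: Optional[Snapshot] = None) -> int:
    """Start from a compatible snapshot if present, else from zero, then replay the rest."""
    balance, start = 0, 0
    if snapshot is not None and snapshot.schema_version == SNAPSHOT_SCHEMA_VERSION:
        balance, start = snapshot.balance, snapshot.stream_version
    for event in events[start:]:
        balance = apply(balance, event)
    return balance

def maybe_snapshot(events: List[Event], current: Optional[Snapshot]) -> Optional[Snapshot]:
    """Take a new snapshot once enough events have accumulated since the last one."""
    last = current.stream_version if current is not None else 0
    if len(events) - last >= SNAPSHOT_EVERY:
        return Snapshot(SNAPSHOT_SCHEMA_VERSION, len(events), load_balance(events, current))
    return current
```

A useful test is to assert that loading with and without a snapshot gives the same result. That keeps the snapshot honest as a cache.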

Event versioning must be planned early

Once events are consumed by projections, integrations, or analytics pipelines, changing them becomes expensive. Event versioning deserves an explicit strategy from the beginning.

Typical options are:

  • append-only event type evolution with new event names
  • versioned payload contracts such as CustomerAddressChangedV2
  • upcasters that transform old event payloads into the latest in-memory shape during replay

The best choice depends on how many consumers you have and how long old streams remain active, but the worst choice is having no plan at all.
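Here is an upcaster sketch in Python. It assumes events are stored with a type name plus a separate integer schema version. It also assumes a hypothetical history: version 1 of `CustomerAddressChanged` stored one free-text address, and version 2 splits it into `street` and `city`. Stored events are left untouched; the upcaster runs in memory whenever an old event is read.

```python
from typing import Callable, Dict, Tuple

Upcaster = Callable[[dict], dict]

def customer_address_changed_v1_to_v2(payload: dict) -> dict:
    # Hypothetical V1 format: "street, city" in a single field.
    street, _, city = payload["address"].partition(", ")
    return {"customer_id": payload["customer_id"], "street": street, "city": city}

# (event_type, from_version) -> function producing the from_version + 1 payload
UPCASTERS: Dict[Tuple[str, int], Upcaster] = {
    ("CustomerAddressChanged", 1): customer_address_changed_v1_to_v2,
}

def upcast(event_type: str, version: int, payload: dict) -> Tuple[int, dict]:
    """Chain upcasters until no newer version is registered; return (version, payload)."""
    while (event_type, version) in UPCASTERS:
        payload = UPCASTERS[(event_type, version)](payload)
        version += 1
    return version, payload
```

Chaining one-step upcasters keeps each transformation small. Replay code then only ever sees the latest shape.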

A practical flow in production

In a typical implementation, the request flow looks like this:

  1. A command reaches an application service or command handler.
  2. The handler loads the aggregate from its event stream.
  3. The aggregate validates business rules and emits one or more domain events.
  4. The event store appends the events with optimistic concurrency checks.
  5. Projection workers consume the new events and update read models.
  6. APIs and UI screens query those read models.

That flow makes the concurrency model explicit. It also shows where failures tend to happen:

  • optimistic locking conflicts on the write side
  • projection lag on the read side
  • poison events that break one consumer while others continue
  • mismatched assumptions between command responses and UI queries
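Steps 2 to 4 of the flow above can be sketched with a toy in-memory event store in Python, including the optimistic concurrency check that produces write-side conflicts. `InMemoryEventStore`, `ConcurrencyConflict`, and `handle_withdraw` are hypothetical names. A production store would enforce the same expected-version rule in its database.

```python
import threading
from typing import Dict, List, Tuple

class ConcurrencyConflict(Exception):
    """The stream changed since the caller loaded it."""

class InMemoryEventStore:
    """Toy store: one list per stream plus a global log for projections."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}
        self._global: List[Tuple[str, dict]] = []
        self._lock = threading.Lock()

    def read(self, stream_id: str) -> List[dict]:
        with self._lock:
            return list(self._streams.get(stream_id, []))

    def append(self, stream_id: str, expected_version: int, events: List[dict]) -> int:
        """Append only if the stream still has `expected_version` events."""
        with self._lock:
            stream = self._streams.setdefault(stream_id, [])
            if len(stream) != expected_version:
                raise ConcurrencyConflict(
                    f"{stream_id}: expected {expected_version}, actual {len(stream)}")
            stream.extend(events)
            self._global.extend((stream_id, e) for e in events)
            return len(stream)

def handle_withdraw(store: InMemoryEventStore, account_id: str, amount: int) -> int:
    history = store.read(account_id)                  # step 2: load the stream
    # Assumes the stream holds only deposit and withdrawal events.
    balance = sum(e["amount"] if e["type"] == "FundsDeposited" else -e["amount"]
                  for e in history)
    if amount > balance:                              # step 3: validate the rule
        raise ValueError("insufficient funds")
    new_event = {"type": "FundsWithdrawn", "amount": amount}
    return store.append(account_id, len(history), [new_event])  # step 4: conditional append
```

A caller that hits `ConcurrencyConflict` typically reloads the stream and retries the command, because the rule it validated may no longer hold.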

Operational traps teams underestimate

The hardest part of Event Sourcing is usually not writing the aggregate. It is operating the system over time.

Watch for these failure modes:

  • projections that are not idempotent and duplicate side effects during replay
  • events that leak internal implementation details instead of business facts
  • rebuilding read models without a tested backfill procedure
  • streams that become too broad because aggregate boundaries were chosen poorly
  • support teams lacking visibility into event history, projection lag, and dead-letter queues
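The sketch below shows, in Python, one way to keep a poison event from stalling a projection while still making it visible. It assumes a simple list as the global log and a hypothetical `ProjectionRunner`. Retrying a few times and then parking the event in a dead-letter list is one policy choice. Some teams prefer to halt the projection instead, so the read model never silently skips a fact.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class ProjectionRunner:
    """Runs one projection over the global log with retries and a dead-letter list."""
    name: str
    handler: Callable[[dict], None]
    max_attempts: int = 3
    checkpoint: int = 0                      # index of the next log entry to process
    dead_letters: List[Tuple[int, dict, str]] = field(default_factory=list)

    def run(self, log: List[dict]) -> None:
        for position in range(self.checkpoint, len(log)):
            event = log[position]
            for attempt in range(1, self.max_attempts + 1):
                try:
                    self.handler(event)
                    break
                except Exception as exc:
                    if attempt == self.max_attempts:
                        # Park the poison event with its error for later inspection/replay.
                        self.dead_letters.append((position, event, repr(exc)))
            self.checkpoint = position + 1

    def lag(self, log: List[dict]) -> int:
        """Events written but not yet processed; a natural metric to export."""
        return len(log) - self.checkpoint
```

Exporting `lag` as a metric and exposing the dead-letter list as an inspectable queue gives support teams visibility into projection lag and dead-letter queues.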

Before adopting the pattern broadly, make sure the team can answer:

  • How do we replay a single projection safely?
  • How do we backfill a new read model in production?
  • How do we inspect an event stream during an incident?
  • How do we roll out an event contract change without breaking consumers?

Spring Boot implementation notes

In a Spring Boot system, a practical design often looks like this:

  • command handlers coordinate transactions and aggregate loading
  • aggregates emit domain events rather than publishing infrastructure events directly
  • the event store append is the write-side commit boundary
  • outbox or streaming integration bridges domain events to external systems when needed
  • projections run as separate consumers with their own retry and observability policies

One important boundary is keeping domain decisions inside the aggregate and infrastructure concerns outside it. If an aggregate starts calling HTTP services, query repositories, or analytics systems to decide whether a command is valid, the model is becoming fragile.

Decision checklist

Adopt CQRS and Event Sourcing when most of these are true:

  • domain history has long-term value
  • read and write concerns diverge materially
  • replayable derived models are useful
  • the team can invest in operational tooling
  • eventual consistency is acceptable in user-facing workflows

Avoid or delay adoption when most of these are true:

  • the domain is simple
  • the team mainly needs CRUD plus reporting
  • operational maturity is still low
  • projections and replay would be difficult to support
  • transactional consistency across read and write paths is mandatory

Wrap-up

CQRS and Event Sourcing are powerful when they make business behavior easier to model, explain, and operate. They are harmful when they are introduced mainly to look scalable or sophisticated.

The right evaluation standard is simple: does the additional complexity buy clarity, auditability, and adaptability that the domain genuinely needs? If yes, the pattern can be transformative. If not, a well-designed transactional model will usually deliver more value with less risk.
