Mock, Stub, and Spy Test Double Design Guide

Mocks, stubs, and spies are useful only when they help control the right boundary. They become harmful when they turn tests into a mirror of internal implementation rather than a check of meaningful behavior.

The practical skill is not memorizing the vocabulary. It is knowing why a given boundary should be controlled at all.

Test doubles exist to control uncertainty

The main reason to introduce a test double is to isolate a dependency that is:

slow
nondeterministic
expensive
external to the unit under test

That usually includes things like:

email gateways
payment clients
file systems
clocks
random number generation

If the dependency is actually simple, in-memory, and stable, replacing it may reduce confidence rather than improve it.

Stubs provide controlled inputs

A stub is useful when the test needs a dependency to return known data so the unit under test can be evaluated deterministically.

Typical use:

a user repository returns a predefined user
a pricing API returns a known exchange rate
a clock returns a fixed point in time

The point of a stub is not interaction verification. It is predictable input control.
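As a minimal sketch of the clock and exchange-rate cases above, the following Python example uses two hand-written stubs. The names FixedClock, StubRateClient, Quote, and quote_in_eur are hypothetical and exist only for illustration; the rate of 0.5 is an arbitrary canned value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class FixedClock:
    """Stub clock (hypothetical): always returns the same instant."""

    def __init__(self, instant: datetime) -> None:
        self._instant = instant

    def now(self) -> datetime:
        return self._instant


class StubRateClient:
    """Stub pricing API (hypothetical): returns one canned exchange rate."""

    def __init__(self, rate: float) -> None:
        self._rate = rate

    def get_rate(self, base: str, quote: str) -> float:
        return self._rate


@dataclass
class Quote:
    amount: float
    issued_at: datetime


def quote_in_eur(usd_amount: float, rates, clock) -> Quote:
    """Unit under test: converts USD to EUR and timestamps the quote."""
    rate = rates.get_rate("USD", "EUR")
    return Quote(amount=round(usd_amount * rate, 2), issued_at=clock.now())


def test_quote_uses_known_rate_and_time() -> None:
    fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
    quote = quote_in_eur(100.0, StubRateClient(0.5), FixedClock(fixed))
    # Only the result is asserted; no call counts are checked on the stubs.
    assert quote.amount == 50.0
    assert quote.issued_at == fixed
```

Neither stub records how it was called. Their only job is to make the inputs predictable.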

Mocks verify important interaction contracts

Mocks are strongest when the interaction itself matters.

Examples:

an email should be sent after successful registration
a payment gateway should be called once with a specific request
an audit event should be emitted when a privileged action succeeds

The key word is "important". If every internal call is treated as an interaction contract, tests become brittle and refactor-hostile.
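The registration email case could be sketched with Python's standard unittest.mock as follows. RegistrationService and its send_welcome collaborator method are hypothetical names assumed for this illustration, not part of any real library.

```python
from unittest.mock import Mock


class RegistrationService:
    """Hypothetical unit under test."""

    def __init__(self, users: dict, mailer) -> None:
        self._users = users
        self._mailer = mailer

    def register(self, email: str) -> bool:
        if email in self._users:
            return False
        self._users[email] = {"active": True}
        # The interaction that matters: notify the new user.
        self._mailer.send_welcome(email)
        return True


def test_welcome_email_sent_once_after_registration() -> None:
    # spec limits the mock to the one method the contract is about.
    mailer = Mock(spec=["send_welcome"])
    service = RegistrationService(users={}, mailer=mailer)

    assert service.register("ada@example.com") is True
    mailer.send_welcome.assert_called_once_with("ada@example.com")


def test_no_email_when_registration_is_rejected() -> None:
    mailer = Mock(spec=["send_welcome"])
    existing = {"ada@example.com": {"active": True}}
    service = RegistrationService(users=existing, mailer=mailer)

    assert service.register("ada@example.com") is False
    mailer.send_welcome.assert_not_called()
```

The assertions cover one outward-facing call. Internal steps, such as how the user dictionary is updated, are deliberately left unverified.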

Spies observe calls without replacing the whole design

A spy is useful when the test needs to observe whether and how something was called while the dependency's real behavior stays in place.

They work well when:

the dependency is mostly acceptable as-is
one aspect of usage needs observation
replacing the whole object with a full mock would be heavier than necessary

Used carefully, spies can preserve realism better than full mocking.
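One way to build a spy in Python is the wraps argument of unittest.mock.Mock, which records calls and forwards them to a real object. The sketch below assumes a hypothetical AuditLog class and delete_account function.

```python
from unittest.mock import Mock


class AuditLog:
    """Hypothetical real dependency that is acceptable to use as-is."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def record(self, action: str) -> int:
        self.entries.append(action)
        return len(self.entries)


def delete_account(user_id: str, audit) -> bool:
    """Hypothetical unit under test."""
    audit.record(f"delete:{user_id}")
    return True


def test_delete_account_records_audit_entry() -> None:
    real_log = AuditLog()
    spy = Mock(wraps=real_log)  # calls are recorded and forwarded

    assert delete_account("u1", spy) is True

    # Observation: the call happened with the expected argument.
    spy.record.assert_called_once_with("delete:u1")
    # Realism: the real object still did its work.
    assert real_log.entries == ["delete:u1"]
```

State is checked on real_log rather than on the spy, because attribute access on the wrapping Mock returns another Mock instead of the underlying list.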

State verification is often stronger than interaction verification

Many teams overuse mocks because interaction assertions feel precise. In practice, state verification is often more robust.

Stronger example:

after registration, the user record exists and is active

Weaker example:

repository method A was called once, then method B was called once

When a test verifies the resulting state or observable outcome, it usually survives refactors better.
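The contrast can be sketched in Python with two tests for the same hypothetical register function: one asserts on the resulting state through a small in-memory repository, the other asserts on the call made to a mock. InMemoryUserRepo and register are names assumed for illustration.

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Hypothetical in-memory repository used for state checks."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def save(self, email: str, active: bool) -> None:
        self._rows[email] = {"email": email, "active": active}

    def find(self, email: str):
        return self._rows.get(email)


def register(email: str, repo) -> None:
    """Hypothetical unit under test."""
    repo.save(email, active=True)


def test_state_user_exists_and_is_active() -> None:
    # Stronger: asserts the observable outcome.
    repo = InMemoryUserRepo()
    register("ada@example.com", repo)
    user = repo.find("ada@example.com")
    assert user is not None
    assert user["active"] is True


def test_interaction_save_called_once() -> None:
    # Weaker: asserts how the outcome was produced.
    repo = Mock()
    register("ada@example.com", repo)
    repo.save.assert_called_once_with("ada@example.com", active=True)
```

If register were later refactored to write through a different repository method, the first test would keep passing as long as the user ends up stored and active, while the second would fail even though behavior is unchanged.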

Too much mocking creates false confidence

Heavy mocking tends to create tests that keep passing even after the real dependencies behind the boundary have changed their contracts.

Watch for these patterns:

every dependency mocked by default
most assertions are call-count checks
stubbed values no longer resemble real contracts
tests break on internal refactors more than on real bugs

These are signs that the test is proving the shape of the implementation, not the value of the behavior.
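For the contract-drift pattern in particular, Python's unittest.mock offers create_autospec, which builds a mock that enforces the real class's method signatures. The sketch below is one possible mitigation, not a complete answer to over-mocking; PaymentClient and checkout are hypothetical names, and checkout contains a deliberate bug.

```python
from unittest.mock import Mock, create_autospec


class PaymentClient:
    """Hypothetical real client; the signature is the contract."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        raise NotImplementedError("would call the network in real use")


def checkout(client, customer_id: str, amount_cents: int):
    # Deliberate bug: passes a third argument the real client rejects.
    return client.charge(customer_id, amount_cents, "USD")


def loose_mock_hides_drift() -> bool:
    client = Mock()  # accepts any call shape
    checkout(client, "c1", 500)
    return True  # no error, so the mismatch goes unnoticed


def autospec_detects_drift() -> bool:
    client = create_autospec(PaymentClient, instance=True)
    try:
        checkout(client, "c1", 500)
    except TypeError:
        return True  # call did not match PaymentClient.charge
    return False
```

An unconstrained Mock accepts the wrong call silently, which is the false confidence described above. The autospec version fails as soon as the call no longer matches the real signature.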

Fakes are often underrated

In many cases, a fake is a better choice than a mock.

Examples:

in-memory repository
fake message bus
temporary file store

Fakes are useful because they keep behavior realistic while still being fast and controllable. When a fake is cheap to build and maintain, it can offer a better balance of speed and trust than a mock.
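A fake message bus might look like the following Python sketch. It is a working, in-process substitute that actually delivers messages to subscribers. FakeMessageBus, complete_order, and the topic name "order.completed" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable


class FakeMessageBus:
    """Hypothetical fake: a real, synchronous, in-memory publish/subscribe."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.published: list[tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        self.published.append((topic, payload))
        # Deliver immediately so tests stay deterministic.
        for handler in self._handlers[topic]:
            handler(payload)


def complete_order(order_id: str, bus) -> None:
    """Hypothetical unit under test."""
    bus.publish("order.completed", {"order_id": order_id})


def test_subscriber_receives_completed_order() -> None:
    bus = FakeMessageBus()
    received: list[dict] = []
    bus.subscribe("order.completed", received.append)

    complete_order("o-42", bus)

    assert received == [{"order_id": "o-42"}]
    assert bus.published == [("order.completed", {"order_id": "o-42"})]
```

Because the fake really routes messages, the test exercises publisher and subscriber together without a broker, and no expectations have to be scripted in advance.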

Choosing the right double

A practical heuristic:

use a stub when you need known input
use a mock when interaction is the thing being proven
use a spy when observation matters but realism should remain higher
use a fake when a lightweight working substitute improves confidence

Let the choice be driven by the defect the test most needs to catch.

Review checklist

Before accepting a test-double-heavy design, ask:

Why is this dependency being replaced?
Is the test checking behavior or just implementation structure?
Would a stub or fake be simpler than a mock?
Would the test still be valuable after internal refactoring?
Are interaction assertions limited to truly important protocol behavior?

Closing judgment

Mocks, stubs, and spies are most useful when their roles stay distinct. Good tests do not maximize doubles; they justify exactly which boundary is being controlled and why.
