Mock, Stub, and Spy Test Double Design Guide
The practical skill is not memorizing the vocabulary. It is knowing why a particular boundary should be controlled at all.
Test doubles exist to control uncertainty
The main reason to introduce a test double is to isolate a dependency that is:
- slow
- nondeterministic
- expensive
- external to the unit under test
That usually includes things like:
- email gateways
- payment clients
- file systems
- clocks
- random number generation
If the dependency is actually simple, in-memory, and stable, replacing it may reduce confidence rather than improve it.
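A minimal Python sketch of giving a nondeterministic dependency a controllable seam. The raffle function `pick_winner` is hypothetical. The point is that the random source is passed in rather than reached for globally, so a test can supply a seeded `random.Random` and get a reproducible result.

```python
import random
from typing import Sequence


def pick_winner(entries: Sequence[str], rng: random.Random) -> str:
    """Hypothetical unit under test: choose a raffle winner.

    Taking `rng` as a parameter (instead of calling the module-level
    `random.choice`) is the seam that lets a test control nondeterminism.
    """
    if not entries:
        raise ValueError("no entries")
    return rng.choice(list(entries))


# In production an unseeded generator would be passed:
#   winner = pick_winner(entries, random.Random())

# In a test, the same seed yields the same sequence, so the outcome repeats.
entries = ["ana", "bo", "cy"]
first = pick_winner(entries, random.Random(42))
second = pick_winner(entries, random.Random(42))
assert first == second
assert first in entries
```

No double is needed here: injecting the real generator with a fixed seed is enough to remove the uncertainty. The same parameter could later accept a stub if a test needs one specific winner.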
Stubs provide controlled inputs
A stub is useful when the test needs a dependency to return known data so the unit under test can be evaluated deterministically.
Typical use:
- a user repository returns a predefined user
- a pricing API returns a known exchange rate
- a clock returns a fixed point in time
The point of a stub is not interaction verification. It is predictable input control.
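A sketch of stubs as input control, assuming a hypothetical `is_trial_active` function that reads a user from a repository and compares the trial end against the current time. Both stubs (`StubUserRepository`, `StubClock`, also hypothetical) only return fixed values; neither records or asserts anything about how it was called.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    user_id: int
    name: str
    trial_ends_at: datetime


class StubUserRepository:
    """Stub: always returns the same predefined user."""

    def __init__(self, user: User) -> None:
        self._user = user

    def get(self, user_id: int) -> User:
        return self._user


class StubClock:
    """Stub: always returns a fixed point in time."""

    def __init__(self, fixed: datetime) -> None:
        self._fixed = fixed

    def now(self) -> datetime:
        return self._fixed


def is_trial_active(user_id: int, repo, clock) -> bool:
    """Unit under test: depends on a repository and a clock."""
    user = repo.get(user_id)
    return clock.now() < user.trial_ends_at


trial_end = datetime(2024, 1, 31, tzinfo=timezone.utc)
repo = StubUserRepository(User(1, "ana", trial_end))

before = StubClock(datetime(2024, 1, 15, tzinfo=timezone.utc))
after = StubClock(datetime(2024, 2, 1, tzinfo=timezone.utc))

# Known inputs make both branches deterministic.
assert is_trial_active(1, repo, before) is True
assert is_trial_active(1, repo, after) is False
```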
Mocks verify important interaction contracts
Mocks are strongest when the interaction itself matters.
Examples:
- an email should be sent after successful registration
- a payment gateway should be called once with a specific request
- an audit event should be emitted when a privileged action succeeds
The key word is "important." If every internal call is treated as an interaction contract, tests become brittle and hostile to refactoring.
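A sketch of a mock used for an interaction that genuinely matters, using Python's `unittest.mock.Mock`. `RegistrationService` and `send_welcome` are hypothetical names. The only assertion on the mock is the welcome email itself; how the service stores users is left free to change.

```python
from unittest.mock import Mock


class RegistrationService:
    """Hypothetical service: sends a welcome email only after a successful save."""

    def __init__(self, users: dict, mailer) -> None:
        self._users = users
        self._mailer = mailer

    def register(self, email: str) -> bool:
        if email in self._users:
            return False  # duplicate: nothing saved, no email sent
        self._users[email] = {"active": True}
        self._mailer.send_welcome(email)
        return True


mailer = Mock()
service = RegistrationService(users={}, mailer=mailer)

assert service.register("ana@example.com") is True
# The interaction is the contract: exactly one welcome email to this address.
mailer.send_welcome.assert_called_once_with("ana@example.com")

# A duplicate registration must not trigger a second email.
assert service.register("ana@example.com") is False
mailer.send_welcome.assert_called_once_with("ana@example.com")
```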
Spies observe calls without replacing the whole design
Spies are useful when a test needs to observe whether, and how, something was called while keeping most of the original behavior in place.
They work well when:
- the dependency is mostly acceptable as-is
- one aspect of usage needs observation
- replacing the whole object with a full mock would be heavier than necessary
Used carefully, spies can preserve realism better than full mocking.
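A minimal spy sketch using `unittest.mock.Mock(wraps=...)`, which forwards each call to the real object and records it. `PriceCalculator` and `checkout` are hypothetical. The test checks both the real computed result and the observed call.

```python
from unittest.mock import Mock


class PriceCalculator:
    """Real dependency whose behavior the test wants to keep."""

    def total(self, prices: list[float], tax_rate: float) -> float:
        return round(sum(prices) * (1 + tax_rate), 2)


def checkout(cart: list[float], calc) -> float:
    """Hypothetical unit under test."""
    return calc.total(cart, tax_rate=0.1)


real = PriceCalculator()
spy = Mock(wraps=real)  # delegates calls to `real` and records them

result = checkout([10.0, 5.0], spy)

assert result == 16.5  # real behavior preserved
spy.total.assert_called_once_with([10.0, 5.0], tax_rate=0.1)  # call observed
```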
State verification is often stronger than interaction verification
Many teams overuse mocks because interaction assertions feel precise. In practice, state verification is often more robust.
Stronger example:
- after registration, the user record exists and is active
Weaker example:
- repository method A was called once, then method B was called once
When a test verifies the resulting state or observable outcome, it usually survives refactors better.
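A sketch of state verification against a small in-memory store. `InMemoryUsers`, `UserRecord`, and `register` are hypothetical. The test inspects the resulting record instead of asserting which store methods were called, or in what order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRecord:
    email: str
    active: bool


class InMemoryUsers:
    """Minimal in-memory store so the test can inspect resulting state."""

    def __init__(self) -> None:
        self._rows: dict[str, UserRecord] = {}

    def add(self, record: UserRecord) -> None:
        self._rows[record.email] = record

    def activate(self, email: str) -> None:
        self._rows[email].active = True

    def find(self, email: str) -> Optional[UserRecord]:
        return self._rows.get(email)


def register(users: InMemoryUsers, email: str) -> None:
    """Unit under test: creates the user, then activates it."""
    users.add(UserRecord(email=email, active=False))
    users.activate(email)


users = InMemoryUsers()
register(users, "ana@example.com")

# State verification: assert the outcome, not the sequence of calls.
record = users.find("ana@example.com")
assert record is not None
assert record.active is True
```

If `register` is later refactored to call `add` once with `active=True`, this test still passes, whereas an assertion that `add` and then `activate` were each called once would fail.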
Too much mocking creates false confidence
Heavy mocking tends to create tests that pass even while the real system boundaries drift.
Watch for these patterns:
- every dependency mocked by default
- most assertions are call-count checks
- stubbed values no longer resemble real contracts
- tests break on internal refactors more than on real bugs
These are signs that the test is proving the shape of the implementation, not the value of the behavior.
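One partial guard against contract drift in Python is `unittest.mock.create_autospec`, which builds a mock that enforces the real class's method signatures. In this sketch `PaymentClient` and `pay` are hypothetical, and `pay` contains a deliberate bug. Autospec catches only signature drift; it does not check that stubbed return values still resemble real responses.

```python
from unittest.mock import Mock, create_autospec


class PaymentClient:
    """Hypothetical real client. Its method signature is the contract."""

    def charge(self, amount_cents: int, currency: str) -> str:
        raise NotImplementedError("talks to the network in production")


def pay(client, amount_cents: int) -> str:
    # Deliberate bug: the real `charge` also requires `currency`.
    return client.charge(amount_cents)


# A bare Mock accepts any call, so the bug goes unnoticed.
loose = Mock()
loose.charge.return_value = "txn-1"
assert pay(loose, 500) == "txn-1"

# An autospecced mock checks calls against the real signature.
strict = create_autospec(PaymentClient, instance=True)
strict.charge.return_value = "txn-1"
try:
    pay(strict, 500)
    raised = False
except TypeError:
    raised = True  # missing `currency` is reported as a TypeError
assert raised
```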
Fakes are often underrated
In many cases, a fake is a better choice than a mock.
Examples:
- in-memory repository
- fake message bus
- temporary file store
Fakes are useful because they preserve realistic behavior while staying fast and controllable. Where a faithful fake is available, it often strikes a better balance between speed and trust than a mock does.
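A sketch of a fake message bus. `FakeMessageBus` and its `subscribe`/`publish` interface are hypothetical, not a real library's API. Unlike a mock, it actually delivers each message to its subscribers (synchronously, in process), so the test exercises the real handler and checks its effect.

```python
from collections import defaultdict
from typing import Callable


class FakeMessageBus:
    """In-memory fake: publish() really delivers to subscribed handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.published: list[tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        self.published.append((topic, message))
        for handler in self._handlers[topic]:
            handler(message)  # synchronous delivery, in subscription order


# Two hypothetical components wired together through the fake bus.
audit_log: list[str] = []


def on_role_granted(message: dict) -> None:
    audit_log.append(f"{message['user']} granted {message['role']}")


def grant_role(bus: FakeMessageBus, user: str, role: str) -> None:
    bus.publish("role.granted", {"user": user, "role": role})


bus = FakeMessageBus()
bus.subscribe("role.granted", on_role_granted)
grant_role(bus, "ana", "admin")

assert audit_log == ["ana granted admin"]
```

A synchronous fake like this does not model delivery delays, ordering guarantees, or retries of a real broker, so it complements rather than replaces integration tests.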
Choosing the right double
A practical heuristic:
- use a stub when you need known input
- use a mock when interaction is the thing being proven
- use a spy when observation matters but realism should remain higher
- use a fake when a lightweight working substitute improves confidence
This choice should always be driven by the defect you want the test to catch first.
Review checklist
Before accepting a test-double-heavy design, ask:
- Why is this dependency being replaced?
- Is the test checking behavior or just implementation structure?
- Would a stub or fake be simpler than a mock?
- Would the test still be valuable after internal refactoring?
- Are interaction assertions limited to truly important protocol behavior?
Closing judgment
Mocks, stubs, and spies are most useful when their roles stay distinct. Good tests do not maximize doubles; they justify exactly which boundary is being controlled and why.