
AI DevOps Korea

Turn AI service development and operations into one improvement loop

Aidevops.kr covers LLMOps, RAG, agents, observability, evaluation, and cost-performance optimization for production AI services.

Java 21 Virtual Threads: A Practical Concurrency Guide

Updated Apr 21
This diagram helps separate the benefit of virtual threads from the downstream limits that still cap throughput in real services.
Virtual threads became generally available in Java 21, but the release itself does not answer the main production question: should a given service stay with the classic request-per-thread model, move to reactive I/O, or adopt virtual threads as the middle path?

For many backend teams, virtual threads are attractive because they preserve the readability of blocking code while allowing much higher concurrency. That benefit is real, but only when the bottleneck is waiting on I/O. If the real limit is a database pool, a remote API quota, CPU saturation, or lock contention, virtual threads can increase pressure without improving throughput.

What Virtual Threads Actually Change

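A virtual thread is scheduled by the JVM rather than being bound one-to-one to an operating-system thread. The JVM mounts it on a small pool of carrier (platform) threads and unmounts it when it blocks, so creating, parking, and resuming a virtual thread costs far less than doing the same with a platform thread. As a result, running thousands of them at once is practical.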
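To make the cost difference concrete, here is a minimal sketch that uses only JDK 21 APIs. The class name ManyVirtualThreads and the 10,000-task count are illustrative assumptions, not a benchmark. Each task blocks in Thread.sleep, which parks the virtual thread and frees its carrier for other work, so the batch runs without needing 10,000 operating-system threads.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ManyVirtualThreads {

    // Starts `count` virtual threads that each block for `sleep`, then waits for all of them.
    // Returns how many tasks observed that they were running on a virtual thread.
    static int run(int count, Duration sleep) throws InterruptedException {
        AtomicInteger ranVirtual = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                if (Thread.currentThread().isVirtual()) {
                    ranVirtual.incrementAndGet();
                }
                try {
                    // Blocking here parks the virtual thread and unmounts it from its carrier.
                    Thread.sleep(sleep);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        return ranVirtual.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10_000, Duration.ofMillis(50))); // prints 10000
    }
}
```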

In practical terms, the feature is not about “making Java async.” It is about keeping synchronous code readable when a service spends much of its time waiting.

This is the useful mental model:

  • platform threads are expensive enough that teams usually cap them aggressively
  • virtual threads are cheap enough to model one unit of work per task
  • cheap threads do not remove downstream limits such as pools, sockets, or rate limits

The third point is where many rollouts go wrong.

When Virtual Threads Are a Good Fit

Virtual threads work best when all of the following are true:

  • the request path spends significant time waiting on network or storage I/O
  • the codebase is easier to maintain in straightforward blocking style than in reactive chains
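  • the libraries in use do not pin carrier threads for long periods (on Java 21, a virtual thread that blocks inside a synchronized block or a native call stays pinned to its carrier)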
  • the team is ready to review concurrency through metrics, not just benchmark screenshots

Typical wins show up in services that aggregate several remote calls, internal APIs that fan out to multiple dependencies, and migration projects where a full reactive rewrite would add more risk than value.
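The fan-out win can be sketched with simulated dependencies. This is an illustration, not the author's code. fakeRemoteCall is a hypothetical stand-in that sleeps instead of doing network I/O, and the dependency names are made up. Each call waits on its own virtual thread, so the aggregate latency tracks the slowest call rather than the sum of all calls.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutTiming {

    // Hypothetical stand-in for a remote call: blocks for the given latency, then returns a label.
    static String fakeRemoteCall(String name, Duration latency) throws InterruptedException {
        Thread.sleep(latency);
        return name;
    }

    // Calls every dependency concurrently, one virtual thread each, and returns
    // the wall-clock time in milliseconds for the whole aggregation.
    static long aggregateMillis(List<String> dependencies, Duration latency)
            throws InterruptedException, ExecutionException {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = dependencies.stream()
                .map(name -> executor.submit(() -> fakeRemoteCall(name, latency)))
                .toList();
            for (Future<String> f : futures) {
                f.get();
            }
        }
        return Duration.ofNanos(System.nanoTime() - start).toMillis();
    }

    public static void main(String[] args) throws Exception {
        // Three 300 ms calls: a sequential loop would take about 900 ms.
        long ms = aggregateMillis(List.of("pricing", "inventory", "profile"), Duration.ofMillis(300));
        System.out.println("fan-out took ~" + ms + " ms");
    }
}
```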

When They Do Not Solve the Real Problem

Virtual threads should not be treated as a concurrency shortcut for every performance issue.

They are usually a poor primary lever when:

  • CPU is already saturated
  • heavy synchronized blocks serialize the hot path
  • the database pool is the real bottleneck
  • a legacy driver blocks in ways that pin carrier threads
  • the team has weak timeout, cancellation, or backpressure discipline
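For the pinning risk, one common mitigation on Java 21 is to avoid blocking while holding a monitor. Guard the blocking section with java.util.concurrent.locks.ReentrantLock instead of synchronized. The sketch below assumes a hypothetical TokenCache whose refresh does blocking I/O, simulated here with Thread.sleep. The lock still serializes the critical section. It only lets a virtual thread that blocks inside the lock, or waits for it, unmount from its carrier. To find pinning in a running JDK 21 service, start with the jdk.tracePinnedThreads=full system property and the jdk.VirtualThreadPinned JFR event.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class TokenCache {

    private final ReentrantLock lock = new ReentrantLock();
    private int refreshCount = 0;

    // Hypothetical blocking refresh (e.g. a network call) guarded by a ReentrantLock.
    // On JDK 21, doing the same blocking work inside `synchronized` would pin the carrier thread.
    void refresh(Duration ioLatency) throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(ioLatency); // simulated I/O while holding the lock
            refreshCount++;
        } finally {
            lock.unlock();
        }
    }

    int refreshCount() {
        lock.lock();
        try {
            return refreshCount;
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` refreshes on virtual threads and returns how many completed.
    static int runConcurrentRefreshes(int tasks, Duration latency) {
        TokenCache cache = new TokenCache();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    cache.refresh(latency);
                    return null;
                });
            }
        } // close() blocks until every submitted task has finished
        return cache.refreshCount();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentRefreshes(50, Duration.ofMillis(5))); // prints 50
    }
}
```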

If the database can only sustain 100 active connections, switching to 20,000 virtual threads does not create capacity. It mostly creates a larger queue and a harder failure mode.
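A small simulation shows why. It is a hypothetical sketch, not a model of any specific driver. A fair Semaphore stands in for the connection pool, and Thread.sleep stands in for query time. However many virtual threads are submitted, the number of connections in use never exceeds the pool size. The extra threads simply wait in the semaphore's queue.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class PoolBottleneck {

    // Simulates `tasks` virtual threads sharing a pool of `poolSize` connections,
    // modeled as a fair (FIFO) Semaphore. Returns the peak number of connections in use.
    static int peakInUse(int poolSize, int tasks, Duration queryTime) {
        Semaphore pool = new Semaphore(poolSize, true);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    pool.acquire(); // tasks beyond poolSize wait here: a queue, not capacity
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(queryTime); // simulated query holding the connection
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();
                    }
                    return null;
                });
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 200 virtual threads, 10 "connections": prints a value of at most 10.
        System.out.println(peakInUse(10, 200, Duration.ofMillis(20)));
    }
}
```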

A Safe Adoption Boundary

The cleanest rollout is usually at the executor boundary, not through scattered one-off usage.

Good teams define three things early:

  1. Which workloads are allowed to run on virtual threads.
  2. Which calls must keep strict timeout and bulkhead limits.
  3. Which metrics prove the rollout is healthy.

That keeps the feature from becoming an ad hoc style preference.
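One way to make the first rule concrete is a single place that owns the executors, so workload placement becomes a code-review decision rather than a per-call habit. WorkloadExecutors is a hypothetical name. Sizing the CPU pool to availableProcessors() is a common default rather than a rule. The sketch just shows I/O-bound work going to virtual threads and CPU-bound work staying on a bounded platform pool.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder that makes the executor boundary explicit:
// I/O-bound work gets virtual threads, CPU-bound work gets a small platform pool.
final class WorkloadExecutors implements AutoCloseable {

    // One virtual thread per blocking, I/O-bound task.
    private final ExecutorService io = Executors.newVirtualThreadPerTaskExecutor();

    // CPU-bound work stays on a fixed pool sized to the available cores.
    private final ExecutorService cpu =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    ExecutorService io() {
        return io;
    }

    ExecutorService cpu() {
        return cpu;
    }

    @Override
    public void close() {
        io.close();  // ExecutorService.close() (Java 19+) waits for submitted tasks to finish
        cpu.close();
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        try (WorkloadExecutors executors = new WorkloadExecutors()) {
            boolean ioVirtual = executors.io().submit(() -> Thread.currentThread().isVirtual()).get();
            boolean cpuVirtual = executors.cpu().submit(() -> Thread.currentThread().isVirtual()).get();
            System.out.println(ioVirtual + " " + cpuVirtual); // prints "true false"
        }
    }
}
```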

For a Spring Boot service, a sensible first target is a read-heavy endpoint that fans out to a few remote dependencies and already has good tracing. A bad first target is a large endpoint with hidden blocking calls, unclear pool limits, and weak observability.

Example: Fan-Out With Explicit Limits

The example below shows a realistic pattern: use virtual threads for readable fan-out logic, but keep downstream limits and timeouts explicit.

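import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Quote, QuoteRequest, and QuoteClient are application types assumed to exist.
public class QuoteService {
    // Caps in-flight calls to the quote provider at 40, however many virtual threads exist.
    private static final Semaphore BULKHEAD = new Semaphore(40);
    private final QuoteClient quoteClient;

    public QuoteService(QuoteClient quoteClient) {
        this.quoteClient = quoteClient;
    }

    public List<Quote> fetchQuotes(List<QuoteRequest> requests)
            throws InterruptedException, ExecutionException {
        // One virtual thread per request; close() waits for every submitted task.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Quote>> futures = requests.stream()
                .map(request -> executor.submit(() -> fetchOneQuote(request)))
                .toList();
            List<Quote> result = new ArrayList<>(futures.size());
            try {
                for (Future<Quote> future : futures) {
                    result.add(future.get());
                }
            } catch (ExecutionException | InterruptedException e) {
                // Cancel (interrupt) the remaining calls; close() still waits for any that ignore it.
                futures.forEach(f -> f.cancel(true));
                throw e;
            }
            return result;
        }
    }

    private Quote fetchOneQuote(QuoteRequest request) throws Exception {
        // Fail fast rather than queue indefinitely when the provider's bulkhead is full.
        if (!BULKHEAD.tryAcquire(1, TimeUnit.SECONDS)) {
            throw new TimeoutException("quote provider bulkhead is full");
        }
        try {
            // quoteClient is assumed to enforce its own connect and read timeouts.
            return quoteClient.fetch(request);
        } finally {
            BULKHEAD.release();
        }
    }
}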

The important part is not newVirtualThreadPerTaskExecutor() by itself. The important part is that concurrency against the remote provider is still bounded.

Spring Boot Rollout Notes

Teams often ask whether enabling virtual threads in Spring Boot (spring.threads.virtual.enabled=true, available since Spring Boot 3.2) is enough. It is not.

A sound rollout also checks:

  • servlet container configuration
  • JDBC driver behavior
  • connection pool sizing
  • timeout defaults for HTTP and database clients
  • trace and metric visibility for blocked work

If those pieces are unclear, the rollout may look successful in functional tests and still fail under real concurrency.
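As one example of explicit client timeouts, the JDK's own java.net.http.HttpClient applies no connect timeout and no per-request timeout unless you configure them. The sketch below sets both. The 2-second and 3-second values and the example.com URL are placeholder assumptions. A service using a different HTTP client or a JDBC pool would configure the equivalent settings there.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class ExplicitTimeouts {

    // Placeholder connection-establishment timeout for the shared client.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(2);
    // Placeholder per-request timeout; exceeding it fails the call with HttpTimeoutException.
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(3);

    static HttpClient buildClient() {
        return HttpClient.newBuilder()
            .connectTimeout(CONNECT_TIMEOUT)
            .build();
    }

    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
            .timeout(REQUEST_TIMEOUT)
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpClient client = buildClient();
        // Hypothetical endpoint; nothing is sent in this sketch.
        HttpRequest request = buildRequest(URI.create("https://quotes.example.com/v1/quote"));
        // prints "PT2S PT3S"
        System.out.println(client.connectTimeout().orElseThrow() + " " + request.timeout().orElseThrow());
    }
}
```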

Operational Signals to Watch

Before and after rollout, compare the same workload on:

  • p50, p95, and p99 latency
  • request throughput at a fixed error budget
  • database pool wait time
  • external API saturation
  • heap growth under peak concurrency
  • carrier-thread pinning or blocked-thread symptoms

A healthy migration usually improves throughput and keeps latency stable. A misleading migration often increases throughput briefly while tail latency and downstream queueing get worse.
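Tail percentiles are easy to misreport, so it helps to agree on one definition before comparing runs. The sketch below computes percentiles from raw latency samples in milliseconds using the nearest-rank method. This is only one common definition; metrics libraries often use interpolation or histograms instead, and LatencyPercentiles is a hypothetical name.

```java
import java.util.Arrays;

class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least p% of samples
    // are less than or equal to it. `p` must be in (0, 100].
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latencies = new long[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1; // 1..100 ms
        }
        // prints "p50=50 p95=95 p99=99"
        System.out.printf("p50=%d p95=%d p99=%d%n",
            percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99));
    }
}
```

Running the same computation on before and after samples from an identical workload keeps the comparison honest.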

Common Mistakes

The most frequent production mistakes are predictable:

  • replacing platform threads without revisiting pool limits
  • assuming every blocking library is virtual-thread friendly
  • using virtual threads to compensate for poor timeout design
  • mixing CPU-heavy tasks into the same executor strategy
  • rolling out without observability for queueing and saturation

The language feature is rarely the source of failure. The missing operating model is.

Review Checklist

  • Is the current bottleneck actually thread scarcity, or something downstream?
  • Are timeouts, semaphores, pool limits, and retries still explicit?
  • Do we know which libraries may pin carrier threads?
  • Can the team observe latency, queueing, and blocked work in production?
  • Is the simpler blocking model truly worth more than a reactive alternative here?

Closing Judgment

Virtual threads are best treated as an architecture simplifier, not a magic throughput button. They are excellent when they let a team keep readable blocking code for I/O-heavy workloads while preserving hard limits at dependency boundaries. They are disappointing when used to hide bottlenecks that were never about threads in the first place.
