
AI DevOps Korea

Turn AI service development and operations into one improvement loop

Aidevops.kr covers LLMOps, RAG, agents, observability, evaluation, and cost-performance optimization for production AI services.

Java Memory Leak Hunting Playbook

Updated Apr 28

Most Java memory incidents do not start as obvious crashes. They start as a service that restarts more often, pauses longer, or slowly loses headroom after every release.

Start with the shape of growth

Before blaming a leak, ask:

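  • does heap usage return to the same baseline after each GC, or does the post-GC floor keep rising?
  • does retained memory grow with each deployment or with a specific traffic pattern?
  • is the growth actually off-heap, in direct buffers or thread stacks, rather than on the heap?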

The first job is classification, not guesswork.
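As a rough illustration of that classification step, the sketch below reads the three signals from the JVM's standard management beans: heap still in use after the last collection, NIO buffer memory, and live thread count. The class name MemoryShapeSnapshot is hypothetical, and this is only one way to sample these values, assuming the service can call java.lang.management in-process.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: one snapshot of the signals used to classify memory growth. */
final class MemoryShapeSnapshot {
    final long heapUsedAfterLastGc; // bytes still in use after the most recent collection
    final long offHeapBufferBytes;  // direct and mapped NIO buffer memory
    final int liveThreads;          // every live thread also holds a native stack

    MemoryShapeSnapshot(long heapUsedAfterLastGc, long offHeapBufferBytes, int liveThreads) {
        this.heapUsedAfterLastGc = heapUsedAfterLastGc;
        this.offHeapBufferBytes = offHeapBufferBytes;
        this.liveThreads = liveThreads;
    }

    static MemoryShapeSnapshot capture() {
        long heap = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getCollectionUsage() is null for pools that do not report post-GC usage.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                heap += afterGc.getUsed();
            }
        }
        long offHeap = 0;
        for (BufferPoolMXBean buffers : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // getMemoryUsed() returns -1 when the value is unknown, so clamp it to zero.
            offHeap += Math.max(0, buffers.getMemoryUsed());
        }
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return new MemoryShapeSnapshot(heap, offHeap, threads);
    }

    @Override
    public String toString() {
        return String.format("heapAfterGc=%d bytes, offHeapBuffers=%d bytes, threads=%d",
                heapUsedAfterLastGc, offHeapBufferBytes, liveThreads);
    }

    public static void main(String[] args) {
        System.out.println(capture());
    }
}
```

Logging a snapshot like this on a schedule makes the questions above answerable: a flat post-GC heap with growing buffer bytes or thread count points away from the heap entirely.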

Common leak sources

  • unbounded in-memory caches
  • listeners or callbacks that are never deregistered
  • request context stored in static references
  • large collections attached to long-lived singleton objects

These are usually lifecycle design bugs, not language flaws.
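To make the listener case concrete, here is a minimal sketch in which subscribing returns a handle that must be closed, so deregistration is part of the API rather than an afterthought. EventBus and Subscription are hypothetical names used only for illustration, not a real library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical event bus whose subscriptions are explicit, closeable handles. */
final class EventBus {
    /** Handle returned by subscribe(); closing it removes the listener. */
    interface Subscription extends AutoCloseable {
        @Override
        void close(); // narrowed so callers do not have to handle a checked exception
    }

    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    Subscription subscribe(Consumer<String> listener) {
        listeners.add(listener);
        // The bus holds a strong reference to the listener until this handle is closed.
        return () -> listeners.remove(listener);
    }

    void publish(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }

    int listenerCount() {
        return listeners.size();
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        List<String> received = new ArrayList<>();
        // try-with-resources ties the listener's lifetime to this scope.
        try (Subscription subscription = bus.subscribe(received::add)) {
            bus.publish("order-created");
        }
        System.out.println("listeners after scope: " + bus.listenerCount());
    }
}
```

The design choice is that the owner of the listener also owns the handle, so forgetting to deregister shows up as an unclosed resource instead of a slow leak.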

Use dumps to find retention owners

Heap dumps matter because they show which objects are still strongly reachable. Focus on:

  • retained size in the dominator tree
  • suspicious collections
  • classloader retention after redeploy
  • large strings, byte arrays, and serialization buffers

The question is not which object is large. It is why it still has an owner.
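For capturing a dump from inside the process, the sketch below uses HotSpotDiagnosticMXBean.dumpHeap, which is specific to HotSpot-based JVMs. The HeapDumper class and the file naming are assumptions for illustration. Passing true for the second argument dumps only reachable objects, which matches the focus on what still has an owner.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of reachable objects (HotSpot JVMs only). */
final class HeapDumper {
    static Path dumpLiveObjects(Path directory) throws IOException {
        // The target file must not exist yet, and recent JDKs require the .hprof suffix.
        Path target = directory.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live = true restricts the dump to objects that are still reachable.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("heap-dumps");
        Path dump = dumpLiveObjects(directory);
        System.out.println("wrote " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting file can be opened in a heap analyzer to inspect the dominator tree and the paths from GC roots. Dumping pauses the application and the file can be as large as the heap, so in production this is usually triggered deliberately rather than on a timer.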

Pair runtime metrics with code boundaries

Memory debugging becomes faster when you line up memory graphs with release markers and with the endpoints or jobs that drive traffic. Watch:

  • old gen occupancy
  • allocation rate spikes
  • full GC frequency
  • endpoints or jobs correlated with growth

That creates a shortlist before you open the dump.
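One simple way to turn an occupancy graph into a number is a least-squares slope over post-GC samples. The sketch below assumes the samples (timestamps in seconds, old gen bytes after GC) are already exported from your metrics system; HeapTrend and its method names are hypothetical, and any leak threshold is something you would tune per service.

```java
/** Hypothetical helper: fits a straight line to post-GC old gen samples. */
final class HeapTrend {
    /**
     * Returns the least-squares slope in bytes per second.
     * A slope near zero means the post-GC floor is stable; a persistent positive
     * slope over a long window is a reason to investigate retention.
     */
    static double slopeBytesPerSecond(long[] epochSeconds, long[] usedAfterGcBytes) {
        if (epochSeconds.length != usedAfterGcBytes.length || epochSeconds.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        int n = epochSeconds.length;
        double meanTime = 0;
        double meanUsed = 0;
        for (int i = 0; i < n; i++) {
            meanTime += epochSeconds[i];
            meanUsed += usedAfterGcBytes[i];
        }
        meanTime /= n;
        meanUsed /= n;

        double covariance = 0;
        double timeVariance = 0;
        for (int i = 0; i < n; i++) {
            double dt = epochSeconds[i] - meanTime;
            covariance += dt * (usedAfterGcBytes[i] - meanUsed);
            timeVariance += dt * dt;
        }
        if (timeVariance == 0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        return covariance / timeVariance;
    }

    public static void main(String[] args) {
        long[] seconds = {0, 60, 120, 180};
        long[] usedBytes = {100_000_000L, 100_600_000L, 101_200_000L, 101_800_000L};
        System.out.println("slope (bytes/s): " + slopeBytesPerSecond(seconds, usedBytes));
    }
}
```

Computing the slope per release, or per traffic window, gives a comparable figure for each candidate before anyone opens a dump.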

Prevention matters more than heroics

Strong teams add guardrails:

  • bounded caches with eviction
  • explicit lifecycle cleanup
  • load tests that watch memory trend, not only latency
  • dashboards that compare post-GC heap across releases
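As a minimal sketch of the first guardrail, a bounded cache can be built on the JDK's LinkedHashMap by enabling access order and overriding removeEldestEntry. BoundedLruCache is a hypothetical name, and this version is not thread-safe; a shared cache would need external synchronization or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache: evicts the least recently used entry past a fixed size. */
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true makes get() move an entry to the "most recent" end.
        super(16, 0.75f, true);
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, Integer> cache = new BoundedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so "b" becomes the eldest entry
        cache.put("c", 3); // exceeds the bound, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

The point of the bound is that the cache's worst-case footprint becomes a design decision instead of a function of uptime.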

The best leak investigation ends with a design rule that prevents the same class of failure from returning.
