Runbook Quality for On-Call Teams
What makes an operational runbook actually usable during incidents instead of just technically complete.
AI DevOps Korea
Aidevops.kr covers LLMOps, RAG, agents, observability, evaluation, and cost-performance optimization for production AI services.
This group collects posts from the DevOps category that are best read together, so they form a deliberate learning path.
This group currently contains 3 posts.
Start Here
What makes an operational runbook actually usable during incidents instead of just technically complete.
Group Archive
What makes an operational runbook actually usable during incidents instead of just technically complete.
A practical guide to treating observability as an incident response system, covering correlation across metrics, logs, and traces, alert quality, runbooks, SLOs, dashboards, and postmortem feedback loops.
This article explains Prometheus and Grafana from an observability design perspective rather than as an installation guide. Using Spring Boot as the baseline, it covers metric collection, label strategy, PromQL, dashboards, alerting, and common anti-patterns.