threshold
- is a trigger point: when a metric crosses it, we take action (alert, block, degrade, retry, etc.).
- Example:
- Latency threshold
- Meaning: If p95/p99 latency > X ms, user experience is bad → alert or degrade.
- Better wording: “We alert when p95 or p99 crosses threshold, not just average.
- Error-rate threshold
- Meaning: If errors exceed X%, service is failing.
- Better wording: We track error budget with an SLO.
- Alert threshold
- Meaning: It’s the number/condition that triggers an alert.
- Better wording: “We reduce noise using duration, aggregation, and severity levels.”
- Decision threshold (ML)
- Meaning: If model probability > T, predict positive.
- Better wording: We pick T based on precision/recall tradeoff and business cost.