Guide Monitoring • detections • runbooks
From logs to signal: detections that stick.
Most alert programs fail because they start too big and have no ownership. This guide shows a practical way to build coverage, ship a small set of high-value detections, and measure response.
A detection engineering playbook
Monitoring built for humans: clear alerts, clear actions, clear ownership.
1. Define what “success” means
Pick a few measurable outcomes and align everyone on them.
- Mean time to acknowledge (MTTA) and mean time to contain (MTTC)
- Coverage for identity, privilege, and data access events
- False positive rate and alert fatigue (measured as pages per week)
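As a rough illustration of how these outcomes could be computed, here is a minimal Python sketch. The incident records and their field names (`fired`, `acked`, `contained`, `false_positive`) are assumptions made for this example, not the schema of any particular tool.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; the field names are assumptions for illustration.
incidents = [
    {"fired": datetime(2024, 1, 1, 9, 0), "acked": datetime(2024, 1, 1, 9, 10),
     "contained": datetime(2024, 1, 1, 10, 0), "false_positive": False},
    {"fired": datetime(2024, 1, 2, 14, 0), "acked": datetime(2024, 1, 2, 14, 20),
     "contained": datetime(2024, 1, 2, 16, 0), "false_positive": False},
    {"fired": datetime(2024, 1, 3, 3, 0), "acked": datetime(2024, 1, 3, 3, 30),
     "contained": None, "false_positive": True},
]


def mean_minutes(records, start_key, end_key):
    """Average minutes between two timestamps, skipping records missing either one."""
    deltas = [
        (r[end_key] - r[start_key]).total_seconds() / 60
        for r in records
        if r.get(start_key) and r.get(end_key)
    ]
    return mean(deltas) if deltas else None


def false_positive_rate(records):
    """Share of alerts that triage marked as false positives."""
    if not records:
        return None
    return sum(1 for r in records if r["false_positive"]) / len(records)


mtta = mean_minutes(incidents, "fired", "acked")      # mean time to acknowledge
mttc = mean_minutes(incidents, "fired", "contained")  # mean time to contain
fp_rate = false_positive_rate(incidents)
```

Tracking the same three numbers every month matters more than the exact definitions, as long as everyone agrees on when the clock starts and stops.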
2. Create a use-case backlog
Start with the threats that actually happen in your environment and the data you can realistically collect.
- Identity anomalies: MFA bypass indicators, impossible travel, new device logins
- Privilege changes: admin role grants, new service principals, policy changes
- Data access: mass export, unusual queries, new integrations pulling data
- App abuse: brute force, credential stuffing, token replay, admin workflow abuse
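To make one backlog item concrete, here is a minimal sketch of an impossible-travel check in Python. The `Login` structure, the 900 km/h speed ceiling, and the 100 km minimum distance are all assumptions chosen for illustration. Real deployments also have to deal with GeoIP error, VPNs, and shared egress points.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Login:
    # Hypothetical login event; the field names are assumptions.
    user: str
    time: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=900.0, min_distance_km=100.0):
    """Flag two logins whose implied travel speed exceeds max_speed_kmh."""
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if distance < min_distance_km:
        return False  # too close to distinguish from geolocation noise
    hours = abs((curr.time - prev.time).total_seconds()) / 3600
    if hours == 0:
        return True  # same instant, far apart
    return distance / hours > max_speed_kmh


london = Login("alice", datetime(2024, 1, 1, 9, 0), 51.5074, -0.1278)
new_york_soon = Login("alice", datetime(2024, 1, 1, 10, 0), 40.7128, -74.0060)
new_york_later = Login("alice", datetime(2024, 1, 1, 21, 0), 40.7128, -74.0060)
```

Here `is_impossible_travel(london, new_york_soon)` flags the pair, while the same trip spread over twelve hours does not.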
3. Map log coverage before you write alerts
Most “detections” fail because the underlying telemetry is missing or inconsistent.
- Identity provider logs (SSO, MFA, user lifecycle)
- Cloud audit logs (API calls, IAM, network changes)
- Application audit logs for sensitive actions
- Endpoint and container runtime signals where applicable
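A coverage map can start as a simple lookup from each use case to the log sources it depends on. The sketch below is one possible shape in Python. The use-case names and source labels are hypothetical placeholders, not a standard taxonomy.

```python
# Hypothetical mapping from use case to the log sources it needs.
REQUIRED_SOURCES = {
    "impossible_travel": {"idp_sso"},
    "mfa_bypass": {"idp_sso", "idp_mfa"},
    "admin_role_grant": {"cloud_audit"},
    "mass_export": {"app_audit"},
}

# Sources that are actually collected today (assumed for the example).
available_sources = {"idp_sso", "cloud_audit"}


def coverage_gaps(required, available):
    """Return {use_case: [missing sources]} for use cases that cannot be built yet."""
    gaps = {}
    for use_case, sources in required.items():
        missing = sources - available
        if missing:
            gaps[use_case] = sorted(missing)
    return gaps


def ready_use_cases(required, available):
    """Use cases whose required telemetry is fully available."""
    return sorted(uc for uc, sources in required.items() if sources <= available)


gaps = coverage_gaps(REQUIRED_SOURCES, available_sources)
ready = ready_use_cases(REQUIRED_SOURCES, available_sources)
```

The gap list doubles as a prioritized onboarding request for the teams that own each log source.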
4. Build alerts that include context
Every alert should answer: what happened, why it matters, and what to do next.
- Include actor, target, time, and evidence links (log IDs)
- Assign severity based on impact, not on how rare the event is
- Write one primary action and one escalation action
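One way to enforce this is to make the context fields part of the alert's structure, so an alert cannot ship without them. The following Python sketch assumes a hypothetical `Alert` type. The field names, the severity label, and the example values are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Alert:
    # Hypothetical alert schema; every field name here is an assumption.
    title: str
    actor: str
    target: str
    occurred_at: datetime
    why_it_matters: str
    severity: str  # assigned from impact, not rarity
    primary_action: str
    escalation_action: str
    evidence_log_ids: list = field(default_factory=list)

    def render(self) -> str:
        """Format the alert so it answers what happened, why it matters, and what to do."""
        evidence = ", ".join(self.evidence_log_ids) or "none attached"
        return "\n".join([
            f"[{self.severity.upper()}] {self.title}",
            f"What happened: {self.actor} -> {self.target} at {self.occurred_at.isoformat()}",
            f"Why it matters: {self.why_it_matters}",
            f"Do next: {self.primary_action}",
            f"Escalate: {self.escalation_action}",
            f"Evidence: {evidence}",
        ])


def missing_context(alert: Alert) -> list:
    """Names of fields left empty; a non-empty result means the alert is not ready to page."""
    return [name for name, value in vars(alert).items() if not value]


example = Alert(
    title="Admin role granted",
    actor="user:jdoe",
    target="role:org-admin",
    occurred_at=datetime(2024, 1, 1, 9, 0),
    why_it_matters="New admin can change policies and read sensitive data",
    severity="high",
    primary_action="Confirm the grant with the approver",
    escalation_action="Revoke the role and page the security lead",
    evidence_log_ids=["log-123"],
)
```

Running `missing_context` in a pre-deployment check is a cheap way to reject detections that would page someone without evidence links or a next step.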
5. Add runbooks and ownership
A detection without a runbook is just noise.
- Who is responsible for triage during business hours and after hours?
- What is the first containment action (disable token, block IP, revoke role)?
- What evidence should be preserved, and where?
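The answers to these questions can live next to the detection as data, which makes ownership checkable. Below is a minimal Python sketch. The runbook fields, team names, evidence path, and the Monday to Friday 09:00 to 18:00 business-hours window are all assumptions for illustration.

```python
from datetime import datetime, time

# Assumed business-hours window; adjust to the team's real schedule and time zone.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(18, 0)

# Hypothetical runbook registry keyed by detection name.
RUNBOOKS = {
    "admin_role_grant": {
        "business_hours_owner": "secops-day",
        "after_hours_owner": "oncall-primary",
        "first_containment": "revoke role",
        "evidence_location": "case-archive/admin_role_grant",
    },
}


def is_business_hours(when: datetime) -> bool:
    """True on weekdays between BUSINESS_START (inclusive) and BUSINESS_END (exclusive)."""
    return when.weekday() < 5 and BUSINESS_START <= when.time() < BUSINESS_END


def triage_owner(detection: str, when: datetime) -> str:
    """Pick the owner responsible for triage at the given time."""
    runbook = RUNBOOKS[detection]
    key = "business_hours_owner" if is_business_hours(when) else "after_hours_owner"
    return runbook[key]
```

A detection with no entry in the registry raises `KeyError`, which is a reasonable signal that it should not be paging anyone yet.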
6. Tune relentlessly
Good detections get better with feedback, not with more rules.
- Record why each alert was or was not useful after every incident
- Reduce scope (and increase accuracy) before you expand coverage
- Move “expected noisy events” into dashboards instead of pages
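The feedback loop above can be sketched in a few lines of Python: record whether each alert was useful, compute a per-rule usefulness ratio, and route consistently noisy rules to a dashboard. The rule names, the 0.5 threshold, and the three-sample minimum are assumptions for illustration, not recommended values.

```python
from collections import Counter

# Hypothetical triage feedback: (rule name, was the alert useful?).
feedback = [
    ("admin_role_grant", True),
    ("admin_role_grant", True),
    ("admin_role_grant", False),
    ("admin_role_grant", True),
    ("new_device_login", False),
    ("new_device_login", False),
    ("new_device_login", False),
    ("mass_export", False),
]


def usefulness_by_rule(records):
    """Return {rule: (useful_count, total_count)} from triage feedback."""
    total = Counter()
    useful = Counter()
    for rule, was_useful in records:
        total[rule] += 1
        if was_useful:
            useful[rule] += 1
    return {rule: (useful[rule], total[rule]) for rule in total}


def routing(records, min_ratio=0.5, min_samples=3):
    """Send a rule to the dashboard only when enough feedback shows it is mostly noise."""
    decisions = {}
    for rule, (useful, total) in usefulness_by_rule(records).items():
        noisy = total >= min_samples and useful / total < min_ratio
        decisions[rule] = "dashboard" if noisy else "page"
    return decisions


decisions = routing(feedback)
```

The minimum sample count keeps a single bad alert from demoting a rule that has barely been exercised.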
7. Review monthly and ship improvements
Security monitoring is a product. Give it a cadence.
- Monthly review: top alerts, top gaps, top false positives
- Quarterly tabletop drill to validate response and comms
- Post-incident: convert lessons into new detections or guardrails
Want detections tuned for your stack?
We can build a coverage map, ship initial detections with runbooks, and hand off ownership cleanly. Lex supports teams globally from India.