Dead Man's Switch Monitoring

Q: What is a dead man's switch in monitoring?

A monitoring pattern where a job reports in on success, and the monitor alerts if that report fails to arrive on schedule. The alarm is triggered by silence rather than by an error, so it catches failures that never produce an error message.

Q: How is a dead man's switch different from a normal uptime check?

An uptime check polls a service and expects a response. A dead man's switch waits for the job to call in and alerts on its absence. Polling suits always-on services; the switch suits scheduled or background work that only runs briefly.

Q: How does Sandglass fit the advice in this guide?

Sandglass handles the continuous side: it runs the checks, opens incidents, routes alerts, and hosts a public status page, so the decisions in this guide turn into monitoring you can rely on.

Alert when an expected success signal never arrives — the absence is the alarm.

Dead man's switch monitoring explained: alert when an expected success signal never arrives. How the pattern works, where to place the heartbeat, and what it catches.

By H. Marcell, Freelance Software Developer

Updated July 17, 2026

H. Marcell is a freelance software developer who builds and runs web services and APIs, and writes about uptime monitoring, incident response, and status-page communication.

What this guide covers

A dead man's switch flips the usual monitoring logic: instead of alerting when something reports an error, it alerts when something stops reporting success. This guide explains the pattern, why it is the right model for scheduled and background work, where to place the signal, and what failure modes it uniquely catches.

The alert fires on silence, not on an error message.
Place the heartbeat after the work that can fail.
It catches jobs that never ran at all.

Why absence beats error-watching

Error-based monitoring can only alert on failures it observes. But the most dangerous failures are the silent ones: the cron entry that got removed, the server that never booted, the job that hung before it could log anything. None of these produce an error to catch. A dead man's switch catches them all with one rule — "I expected to hear from you by now, and I did not" — which is exactly the class of failure that otherwise goes unnoticed for days.
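A minimal sketch of that one rule from the monitor's side, in Python. The class name, the in-memory state, and the choice to start the clock when the check is created are illustrative assumptions, not a description of how Sandglass works internally. Starting the clock at creation is what lets the switch flag a job that never ran at all.

```python
from datetime import datetime, timedelta


class DeadMansSwitch:
    """Monitor-side state for one scheduled job (illustrative sketch)."""

    def __init__(self, interval: timedelta, grace: timedelta, created_at: datetime):
        self.interval = interval
        self.grace = grace
        # Start the clock when the check is created, so a job that never
        # runs still goes overdue instead of waiting forever for a first ping.
        self.last_seen = created_at

    def ping(self, at: datetime) -> None:
        """Record a success signal from the job."""
        self.last_seen = at

    def deadline(self) -> datetime:
        """Latest time the next success signal may arrive."""
        return self.last_seen + self.interval + self.grace

    def is_overdue(self, now: datetime) -> bool:
        """True when no success arrived by the deadline: the silence is the alarm."""
        return now > self.deadline()


if __name__ == "__main__":
    start = datetime(2026, 7, 17, 0, 0)
    switch = DeadMansSwitch(timedelta(hours=1), timedelta(minutes=10), created_at=start)
    switch.ping(start + timedelta(hours=1))  # the 01:00 run reported success
    print(switch.is_overdue(start + timedelta(hours=2, minutes=5)))   # False: still within grace
    print(switch.is_overdue(start + timedelta(hours=2, minutes=11)))  # True: the 02:00 run never reported
```

Nothing in this logic inspects errors or logs; a removed cron entry, a dead host, and a hung job all look the same to it, because each one simply fails to call `ping`.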

What it catches that nothing else does

The pattern is uniquely good at detecting: a job that was never scheduled or whose schedule was deleted; a worker or host that is down entirely; a job that hangs indefinitely without erroring; and an environment where the whole scheduler stopped. In each case there is no error, no log line, no exception — just silence where a success should have been. That silence is precisely what the switch is listening for.

Jobs that were silently unscheduled or misconfigured.
Hosts or workers that are completely down.
Jobs that hang and never complete or error.
Whole schedulers that stopped running.

How Sandglass supports the practice

Have the job request a Sandglass heartbeat URL after it successfully completes, and set the expected interval plus a grace buffer. If the signal does not arrive on time, Sandglass alerts you. Because the alarm is triggered by absence, it catches the failures that error-based monitoring cannot see — including the job never starting.
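A job-side sketch in Python, assuming the heartbeat is accepted as a plain HTTP GET to a per-check URL. `HEARTBEAT_URL` is a hypothetical placeholder rather than Sandglass's real URL format, and `run_job` is an illustrative helper; the `notify` parameter exists only so the ping can be swapped out in tests.

```python
import urllib.request
from typing import Callable

# Hypothetical placeholder: use the heartbeat URL shown for your own check.
HEARTBEAT_URL = "https://heartbeat.example.invalid/YOUR-CHECK-ID"


def send_heartbeat(url: str) -> None:
    """GET the heartbeat URL; the timeout stops a slow network from hanging the job."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def run_job(work: Callable[[], None], notify: Callable[[str], None] = send_heartbeat) -> None:
    """Run the critical work first and signal success only after it returns."""
    work()  # If this raises, the line below never runs and no heartbeat is sent.
    notify(HEARTBEAT_URL)
```

A cron entry would then invoke a script that calls `run_job` with the real work, for example a hypothetical `nightly_backup` function. If the ping itself fails, the job exits with an error and the monitor sees silence, which errs on the side of alerting.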

Back the practices here with HTTP, ping, TCP, content, SSL certificate, and heartbeat checks.
Route incidents to email, Slack webhook channels, and generic webhooks so the right people respond fast.
Use a public status page to keep customers informed while the team works the incident.

Common mistakes to avoid

A heartbeat sent at job start only proves the job began; it reports success even if the job crashes on the next line. Send the signal after the critical work so that a failure in that work shows up as a missing heartbeat. Also size the grace period to the job's real run-time variability, or normal jitter will page you.
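One way to size the grace period, sketched in Python as a heuristic rather than a rule. It assumes the job starts on schedule and pings when it finishes, so the gap between two pings is the interval plus the difference between consecutive run durations; the worst case is the slowest run following the fastest. The function name, the 1.5 safety factor, and the one-minute floor are illustrative choices, and start-time jitter from the scheduler is not modeled.

```python
from datetime import timedelta
from typing import Sequence


def suggest_grace(
    durations: Sequence[timedelta],
    safety_factor: float = 1.5,
    minimum: timedelta = timedelta(minutes=1),
) -> timedelta:
    """Suggest a grace period from observed run durations (heuristic sketch).

    With the heartbeat sent at the end of each run, the gap between pings is
    interval + (this run's duration - previous run's duration), so the grace
    period must cover at least the spread between the slowest and fastest runs.
    """
    if not durations:
        raise ValueError("need at least one observed run")
    spread = max(durations) - min(durations)
    # Scale the spread for headroom, but never go below a fixed floor.
    return max(spread * safety_factor, minimum)
```

For runs of 12, 14, 16, and 13 minutes, the spread is 4 minutes and the suggested grace is 6 minutes.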

Implementation checklist

Step 1: Start from customer impact

Decide which silent failures of scheduled and background jobs actually reach customers before adding any monitoring.

Step 2: Choose one signal per risk

Match each risk to a single HTTP, content, TCP, SSL certificate, or heartbeat check instead of stacking duplicates.

Step 3: Assign an owner and a channel

Give each alert one owner and one destination — email, a Slack webhook, or a generic webhook.

Step 4: Review after real incidents

Revisit intervals, thresholds, and ownership once a real incident shows what was missing.


Set up dead man's switch monitoring with Sandglass


Free plan, no credit card required.