Dead Man's Switch Monitoring

A practical reliability guide for teams with cron jobs.

Dead Man's Switch Monitoring from Sandglass: practical guidance for alerting when an expected success signal never arrives.

What this guide covers

This guide explains how to alert when a scheduled job's expected success signal never arrives. The goal is to settle what is monitored, who is alerted, and through which channel before a stressful incident forces the team to improvise.

  • The alert fires on silence, not on an error message.
  • Place the heartbeat after the work that can fail.
  • An alert based on silence also catches jobs that never ran at all.
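
The idea of alerting on silence can be sketched in a few lines. This is a minimal illustration of the general technique, not Sandglass's implementation; the function name `is_overdue` and the `interval` and `grace` parameters are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_seen: datetime, interval: timedelta,
               grace: timedelta, now: datetime) -> bool:
    """Return True when the next expected heartbeat is late.

    last_seen: time of the most recent heartbeat (or when the check was created).
    interval:  how often the job is expected to report success.
    grace:     extra slack allowed before the silence counts as a failure.
    """
    deadline = last_seen + interval + grace
    return now > deadline


# Example: a nightly job that last reported at 02:00 UTC on 1 January.
last_seen = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
interval = timedelta(hours=24)
grace = timedelta(minutes=30)

on_time = datetime(2024, 1, 2, 2, 15, tzinfo=timezone.utc)
too_late = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)

print(is_overdue(last_seen, interval, grace, on_time))   # False
print(is_overdue(last_seen, interval, grace, too_late))  # True
```

Note that nothing in this check depends on the job reporting an error. A job that crashed, hung, or was never scheduled produces the same result: no new heartbeat, and eventually a passed deadline.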

How Sandglass supports the practice

Make the job call a heartbeat URL after successful completion and configure Sandglass to alert when the next expected signal is late. Sandglass supports the continuous side of this work with checks, incidents, alert routing, and public status visibility.

  • Back the practices here with HTTP, ping, TCP, content, SSL certificate, and heartbeat checks.
  • Route incidents to email, Slack webhook channels, and generic webhooks so the right people respond fast.
  • Use a public status page to keep customers informed while the team works the incident.

Common mistakes to avoid

A heartbeat at job start only proves the job began. Put the signal after the critical work so failures are visible.
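
As a rough sketch of that placement, the wrapper below runs the work first and sends the heartbeat only if the work finished without raising. The URL is a hypothetical placeholder, the names `send_heartbeat` and `run_job` are invented for this example, and the use of a plain HTTP GET is an assumption; use the heartbeat URL and method shown for your own check.

```python
import urllib.request
from typing import Callable

# Hypothetical placeholder; replace with the heartbeat URL of your own check.
HEARTBEAT_URL = "https://example.invalid/heartbeat/your-check-id"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    """Signal success with a plain HTTP GET (assumed request method)."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass  # The response body is not needed.


def run_job(work: Callable[[], None],
            ping: Callable[[], None] = send_heartbeat) -> bool:
    """Run the work, then ping only if it completed without raising.

    Returns True when the work succeeded, False when it failed.
    """
    try:
        work()
    except Exception as exc:
        # No heartbeat is sent, so the monitor sees silence and alerts.
        print(f"job failed, heartbeat withheld: {exc}")
        return False

    try:
        ping()
    except OSError as exc:
        # A lost ping shows up as a late heartbeat; log it and move on.
        print(f"heartbeat could not be sent: {exc}")
    return True
```

In a cron script this might be called as `sys.exit(0 if run_job(nightly_backup) else 1)`, where `nightly_backup` stands in for the real work. Moving the `ping()` call above `work()` would reproduce the mistake described above: the signal would arrive even when the backup itself failed.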

Implementation checklist

Step 1: Start from customer impact

Before adding any monitoring, decide which missed or failed scheduled jobs would actually reach customers.

Step 2: Choose one signal per risk

Match each risk to a single HTTP, content, TCP, SSL certificate, or heartbeat check instead of stacking duplicates.

Step 3: Assign an owner and a channel

Give each alert one owner and one destination: email, a Slack webhook, or a generic webhook.
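
Steps 2 and 3 can be checked mechanically if the team keeps its alert inventory as data. The sketch below is one possible way to do that; the `AlertPlan` structure, the field names, and the check and channel labels are all hypothetical and are not a Sandglass configuration format.

```python
from collections import Counter
from dataclasses import dataclass

# Labels are illustrative, taken from the check types and channels named above.
ALLOWED_CHECKS = {"http", "content", "tcp", "ssl_certificate", "heartbeat"}
ALLOWED_CHANNELS = {"email", "slack_webhook", "generic_webhook"}


@dataclass(frozen=True)
class AlertPlan:
    risk: str     # The customer-facing failure being guarded against.
    check: str    # The single check type chosen for that risk.
    owner: str    # The one person or team responsible for the alert.
    channel: str  # The one destination the alert is routed to.


def find_problems(plans: list[AlertPlan]) -> list[str]:
    """Return human-readable problems; an empty list means the plan is clean."""
    problems = []

    # Step 2: one signal per risk, no stacked duplicates.
    for risk, count in Counter(p.risk for p in plans).items():
        if count > 1:
            problems.append(f"{risk}: {count} checks for one risk")

    # Step 3: every alert has an owner and a known destination.
    for p in plans:
        if p.check not in ALLOWED_CHECKS:
            problems.append(f"{p.risk}: unknown check type {p.check!r}")
        if not p.owner.strip():
            problems.append(f"{p.risk}: no owner")
        if p.channel not in ALLOWED_CHANNELS:
            problems.append(f"{p.risk}: unknown channel {p.channel!r}")

    return problems


plans = [
    AlertPlan("nightly backup never ran", "heartbeat", "data-team", "slack_webhook"),
    AlertPlan("invoice export never ran", "heartbeat", "", "email"),
]
for problem in find_problems(plans):
    print(problem)
```

Running a check like this in review keeps unowned or duplicated alerts from accumulating unnoticed.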

Step 4: Review after real incidents

Revisit intervals, thresholds, and ownership once a real incident shows what was missing.

Set up dead man's switch monitoring with Sandglass

Start free

Free plan, no credit card required.