Cron Job Monitoring Guide

Q: How do I monitor a cron job that has no web interface?

Use a heartbeat: have the job send a request to a monitoring URL as its final successful step. The monitor alerts when the expected request does not arrive, so you do not need the job to expose any endpoint of its own.

Q: What is a good grace period for a cron heartbeat?

Set it longer than the job's worst realistic runtime plus normal scheduling delay. If a job usually runs in two minutes but can take ten under load, a grace period of fifteen minutes avoids false alarms while still catching a genuinely stuck run.
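
As a rough illustration of that arithmetic, the sketch below derives a grace period from observed runtimes. The function name `suggest_grace`, the one-minute scheduling delay, and the four-minute margin are assumptions chosen to reproduce the fifteen-minute figure above; they are not values prescribed by any monitoring tool.

```python
from datetime import timedelta


def suggest_grace(observed_runtimes: list[timedelta],
                  scheduling_delay: timedelta,
                  margin: timedelta) -> timedelta:
    """Worst observed runtime + scheduling delay + a safety margin."""
    if not observed_runtimes:
        raise ValueError("need at least one observed runtime")
    return max(observed_runtimes) + scheduling_delay + margin


# Illustrative history: usually about two minutes, ten under load.
runtimes = [timedelta(minutes=2), timedelta(minutes=3), timedelta(minutes=10)]
grace = suggest_grace(runtimes,
                      scheduling_delay=timedelta(minutes=1),  # assumed value
                      margin=timedelta(minutes=4))            # assumed value
print(grace)  # 0:15:00
```

Basing the estimate on the slowest realistic run rather than the average is what keeps a slow but healthy run from paging anyone.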

Q: Should the heartbeat be at the start or end of the job?

At the end, after the critical work completes. A heartbeat at the start only proves the job launched — it cannot tell you whether the work actually finished.

Q: How does Sandglass fit the advice in this guide?

Sandglass handles the continuous side — checks, incidents, alert routing, and a public status page — so the decisions in this guide turn into monitoring you can rely on.

How to monitor cron jobs and scheduled tasks with heartbeats: detect skipped, failed, or hung jobs that never report success, before anyone downstream notices.

By H. Marcell, Freelance Software Developer

Updated July 17, 2026

H. Marcell is a freelance software developer who builds and runs web services and APIs, and writes about uptime monitoring, incident response, and status-page communication.

What this guide covers

Cron jobs and scheduled tasks fail quietly. Nobody is watching a nightly backup or an hourly billing sync the moment it breaks, so the first sign is often a missing report or an angry customer days later. This guide covers monitoring the one thing that matters — whether the job actually finished its work — using heartbeats and grace periods rather than watching the scheduler itself.

Send the heartbeat after the work succeeds, not when the job starts.
A grace period absorbs normal scheduling jitter.
A missing signal, not an error message, is what should page you.

Why scheduled jobs need different monitoring

Uptime checks poll a service and expect a response. Scheduled jobs invert that: there is nothing to poll, because the job runs briefly and disappears. You cannot ask a nightly backup "are you up?" — it is not up most of the time by design. Instead, the job reports in when it succeeds, and monitoring watches for the report to go missing. This is the "dead man's switch" pattern: silence is the alarm.

Heartbeats and grace periods

A heartbeat is a URL the job requests when it finishes. You tell the monitor how often to expect it (every hour, or every day at 02:00) plus a grace period to absorb normal variation in start time and runtime. If an hourly job usually finishes in two minutes but occasionally takes eight, a grace period somewhat longer than eight minutes prevents false alarms while still catching a genuinely stuck or skipped run.

Set the interval to the job's schedule (hourly, daily, etc.).
Add a grace period longer than the worst normal runtime.
Alert when the expected heartbeat does not arrive within interval + grace.
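
The three rules above can be sketched as a small deadline check. This is a minimal illustration, not how any particular monitor is implemented: the function names are hypothetical, and it assumes the deadline is measured from the last received heartbeat, whereas a real service might measure from the schedule instead.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_deadline(last_heartbeat: datetime,
                       interval: timedelta,
                       grace: timedelta) -> datetime:
    """Latest moment the next heartbeat may arrive before we alert."""
    return last_heartbeat + interval + grace


def is_overdue(last_heartbeat: datetime,
               interval: timedelta,
               grace: timedelta,
               now: datetime) -> bool:
    """True when no heartbeat has arrived within interval + grace."""
    return now > heartbeat_deadline(last_heartbeat, interval, grace)


# Hourly job with a 15-minute grace period (illustrative values).
last_seen = datetime(2026, 7, 17, 2, 0, tzinfo=timezone.utc)
interval = timedelta(hours=1)
grace = timedelta(minutes=15)

on_time = is_overdue(last_seen, interval, grace,
                     now=datetime(2026, 7, 17, 3, 10, tzinfo=timezone.utc))
late = is_overdue(last_seen, interval, grace,
                  now=datetime(2026, 7, 17, 3, 16, tzinfo=timezone.utc))
print(on_time, late)  # False True
```

With these values the deadline falls at 03:15, so a check at 03:10 stays quiet and a check at 03:16 raises the alert.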

Placing the heartbeat call correctly

Position matters. A heartbeat at the start of the job proves only that it launched — it will happily report success right before crashing. Call it as the last step, after the backup is uploaded, the emails are sent, or the sync is committed. For multi-stage jobs where partial completion is meaningful, you can use separate heartbeats per stage, but the simplest reliable pattern is one call at the very end of the happy path.
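
One way to keep the call on the happy path is a small wrapper that sends the heartbeat only when the work raises no exception. The sketch below is an assumption-laden example: `HEARTBEAT_URL` is a placeholder, and `run_with_heartbeat` and `send_heartbeat` are hypothetical helper names, not part of any monitoring product's API.

```python
import sys
import urllib.request
from typing import Callable

# Placeholder only: substitute the URL your monitor gives you.
HEARTBEAT_URL = "https://monitor.example.com/heartbeat/your-check-id"


def send_heartbeat(url: str, timeout: float = 10.0) -> None:
    """Request the heartbeat URL; network and HTTP errors propagate."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_with_heartbeat(work: Callable[[], None],
                       ping: Callable[[], None]) -> bool:
    """Run the job, then ping only if the work raised no exception."""
    try:
        work()
    except Exception as exc:
        # No heartbeat on failure: the missing signal triggers the alert.
        print(f"job failed, heartbeat withheld: {exc}", file=sys.stderr)
        return False
    ping()  # last step of the happy path
    return True


def failing_job() -> None:
    """Stand-in for a job that crashes partway through."""
    raise RuntimeError("disk full")


# Demo with a fake ping that records calls instead of using the network.
pings: list[str] = []
run_with_heartbeat(lambda: None, lambda: pings.append("ping"))
run_with_heartbeat(failing_job, lambda: pings.append("ping"))
print(pings)  # ['ping']
```

In a real cron script you would pass the actual work function and `lambda: send_heartbeat(HEARTBEAT_URL)` as the ping, and exit non-zero when the wrapper returns `False`.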

How Sandglass supports the practice

Add a heartbeat check in Sandglass and have the job call its URL as the final step after its real work completes. Set the expected interval plus a grace period longer than the job's worst normal runtime. If the next heartbeat does not arrive in time, Sandglass alerts you, so you learn about a skipped or hung job from the absence of success, not from a downstream failure.

Back the practices here with HTTP, ping, TCP, content, SSL certificate, and heartbeat checks.
Route incidents to email, Slack webhook channels, and generic webhooks so the right people respond fast.
Use a public status page to keep customers informed while the team works the incident.

Common mistakes to avoid

Confirming the scheduler is running is not the same as confirming the job worked. A cron daemon can be perfectly healthy while the job it launches crashes on the first line. Put the success signal after the critical work, or you are monitoring the alarm clock instead of whether anyone got up.

Implementation checklist

Step 1: Start from customer impact

Decide which scheduled-job failures actually reach customers, such as a missed nightly backup or a stalled billing sync, before adding any monitoring.

Step 2: Choose one signal per risk

Match each risk to a single HTTP, content, TCP, SSL certificate, or heartbeat check instead of stacking duplicates.

Step 3: Assign an owner and a channel

Give each alert one owner and one destination — email, a Slack webhook, or a generic webhook.

Step 4: Review after real incidents

Revisit intervals, grace periods, and ownership once a real incident shows what was missing.

