Cron Monitoring Guide from Sandglass: practical guidance for catching background jobs that stop reporting success even though no customer is watching the job directly.
A background job can stop completing without anyone noticing, because no customer watches it directly. The goal here is to settle how the team detects and responds to that failure before a stressful incident forces it to improvise.
Add a heartbeat at the end of each scheduled job, set the expected interval with a small delay buffer, and alert when the heartbeat goes missing. Sandglass supports the continuous side of this work with checks, incidents, alert routing, and public status visibility.
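As a rough illustration of the interval-plus-buffer rule, the sketch below flags a heartbeat as missing once the time since the last one exceeds the expected interval plus the delay buffer. The function name, the parameter names, and the example values (an hourly job with a 10-minute buffer) are assumptions made for this sketch, not Sandglass settings.

```python
from datetime import datetime, timedelta, timezone


def is_heartbeat_missing(last_seen, interval, grace, now):
    """Return True when no heartbeat has arrived within interval + grace.

    All names here are illustrative; a monitoring service applies the
    same comparison on its own side.
    """
    return now - last_seen > interval + grace


# Assumed example: an hourly job with a 10-minute delay buffer.
last_seen = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
interval = timedelta(hours=1)
grace = timedelta(minutes=10)

# 65 minutes after the last heartbeat is still inside the buffer.
late_but_ok = is_heartbeat_missing(
    last_seen, interval, grace, last_seen + timedelta(minutes=65)
)
# 71 minutes after the last heartbeat is past interval + buffer.
overdue = is_heartbeat_missing(
    last_seen, interval, grace, last_seen + timedelta(minutes=71)
)
```

The buffer is there so that a job that merely runs a little long does not page anyone, while a job that never finishes still does.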
Checking that a scheduler exists is not enough. The useful signal is whether the actual job completed and reported back after its critical work finished.
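One way to make sure the heartbeat reflects completed work is to send it only after the job function returns. The sketch below assumes the heartbeat is a plain HTTP GET to a per-job URL; the URL, the function names, and the injectable ping parameter are hypothetical and stand in for whatever your monitoring tool issues.

```python
import urllib.request

# Hypothetical heartbeat URL; substitute the one issued for your job.
HEARTBEAT_URL = "https://example.invalid/heartbeat/nightly-export"


def send_heartbeat(url=HEARTBEAT_URL, timeout=10):
    """Report success with an HTTP GET; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_with_heartbeat(job, ping=send_heartbeat):
    """Run job() and ping only if it returns without raising."""
    result = job()  # any exception propagates, so no heartbeat is sent
    ping()          # reached only after the critical work has finished
    return result
```

A scheduled script would call run_with_heartbeat(export_orders), where export_orders is your own job function. Because a crashed or hung job never reaches the ping, the monitor sees the heartbeat go missing even though the scheduler itself is still running.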
Decide which job failures would actually reach customers before adding any monitoring.
Match each risk to a single HTTP, content, TCP, SSL certificate, or heartbeat check instead of stacking duplicates.
Give each alert one owner and one destination — email, a Slack webhook, or a generic webhook.
Revisit intervals, thresholds, and ownership once a real incident shows what was missing.
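The checklist above can be kept honest with a small plan file that is validated before checks are created. The sketch below is only one possible shape: the dictionary keys, the type and destination labels, and the validate_plan function are assumptions for illustration, not a Sandglass configuration format.

```python
# Labels chosen for this sketch; they mirror the check types and alert
# destinations named in the checklist.
ALLOWED_CHECK_TYPES = {"http", "content", "tcp", "ssl", "heartbeat"}
ALLOWED_DESTINATIONS = {"email", "slack_webhook", "webhook"}


def validate_plan(plan):
    """Return a list of problems: duplicate checks, missing owners, bad values."""
    problems = []
    seen_risks = set()
    for entry in plan:
        risk = entry["risk"]
        if risk in seen_risks:
            problems.append(f"{risk}: more than one check")
        seen_risks.add(risk)
        if entry.get("check") not in ALLOWED_CHECK_TYPES:
            problems.append(f"{risk}: unknown check type")
        if not entry.get("owner"):
            problems.append(f"{risk}: no owner")
        if entry.get("destination") not in ALLOWED_DESTINATIONS:
            problems.append(f"{risk}: unknown destination")
    return problems


# Hypothetical plan: one risk, one check, one owner, one destination.
plan = [
    {"risk": "nightly export stops", "check": "heartbeat",
     "owner": "data-team", "destination": "slack_webhook"},
    {"risk": "certificate expires", "check": "ssl",
     "owner": "platform-team", "destination": "email"},
]
problems = validate_plan(plan)
```

An empty problems list means every risk maps to a single check with a named owner and a single destination. Rerunning the validation after each incident review is a cheap way to revisit ownership.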