Uptime Monitoring Guide from Sandglass: practical guidance for choosing the smallest set of checks that proves a service is available from the outside.
The aim is to settle that choice before a stressful incident forces the team to improvise.
Start with HTTP status checks for web surfaces, add SSL certificate checks for HTTPS risk, and use ping or TCP checks only where they answer a different operational question. Sandglass supports the continuous side of this work with checks, incidents, alert routing, and public status visibility.
More checks do not automatically mean better monitoring. Duplicate checks create alert noise and make ownership harder during real incidents.
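To make the distinction between check types concrete, here is a minimal sketch of what each one asks, written with the Python standard library only. It is not Sandglass's implementation or API. The function names, default timeouts, and the choice to treat only 2xx responses as healthy are assumptions made for illustration.

```python
import http.client
import socket
import ssl
import urllib.error
import urllib.request
from datetime import datetime, timezone


def http_status_ok(url: str, timeout: float = 10.0) -> bool:
    """HTTP status check: does the web surface answer with a 2xx status?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # 4xx/5xx responses, DNS failures, refused connections, and timeouts
        # all count as "not available from the outside".
        return False


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and a certificate's notAfter value."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def cert_days_remaining(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """SSL certificate check: how long until the served certificate expires?"""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """TCP check: does the port accept connections at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Each function answers a different question. A successful `http_status_ok` call already implies the port is reachable, so a `tcp_port_open` check against the same web endpoint would only duplicate it. The TCP check earns its place on surfaces that do not speak HTTP, such as a database or mail port.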
Decide which failures actually reach customers before adding any monitoring.
Match each risk to a single HTTP, content, TCP, SSL certificate, or heartbeat check instead of stacking duplicates.
Give each alert one owner and one destination: email, a Slack webhook, or a generic webhook.
Revisit intervals, thresholds, and ownership once a real incident shows what was missing.
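The steps above can be sketched as a small check plan that is reviewed before anything is configured. This is an illustration under stated assumptions, not a Sandglass configuration format: the `Check` record, its field names, and the `plan_problems` helper are hypothetical, and the allowed check types and destinations are simply the ones named in this guide.

```python
from collections import Counter
from dataclasses import dataclass

# Check types and alert destinations named in this guide.
CHECK_TYPES = {"http", "content", "tcp", "ssl", "heartbeat"}
DESTINATIONS = {"email", "slack_webhook", "webhook"}


@dataclass(frozen=True)
class Check:
    risk: str         # the customer-facing failure this check proves or disproves
    type: str         # one of CHECK_TYPES
    target: str       # URL, host:port, or job name
    owner: str        # the single person or team who responds
    destination: str  # one of DESTINATIONS


def plan_problems(checks: list[Check]) -> list[str]:
    """Return human-readable problems; an empty list means the plan is clean."""
    problems = []
    for c in checks:
        if c.type not in CHECK_TYPES:
            problems.append(f"{c.risk}: unknown check type {c.type!r}")
        if c.destination not in DESTINATIONS:
            problems.append(f"{c.risk}: unknown destination {c.destination!r}")
        if not c.owner.strip():
            problems.append(f"{c.risk}: no owner")
    # The same check type pointed at the same target twice is a duplicate.
    for (ctype, target), n in Counter((c.type, c.target) for c in checks).items():
        if n > 1:
            problems.append(f"duplicate {ctype} check on {target} ({n} copies)")
    # Each risk should be matched by exactly one check.
    for risk, n in Counter(c.risk for c in checks).items():
        if n > 1:
            problems.append(f"risk {risk!r} is covered by {n} checks")
    return problems
```

Calling `plan_problems` on a list with one `http` check and one `ssl` check for the same site returns an empty list. Adding a second `http` check on the same URL, or a check with an empty owner, produces a message for each problem, which makes the plan easy to revisit after an incident.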