
Introduction

This guide covers everything you need to know about monitoring your services effectively with StatusStack. Whether you’re tracking third-party services, monitoring your own infrastructure, or managing status for multiple clients, this guide will help you build a robust monitoring strategy.

Getting Started with Monitoring

The Monitoring Workflow

  1. Plan Your Stack Structure: Decide how to organize your monitoring (by environment, service type, or client).
  2. Create Stacks: Set up logical groupings for your services.
  3. Add Components: Include third-party services, custom monitors, and manual components.
  4. Configure Notifications: Set up multi-channel alerts for status changes.
  5. Monitor & Respond: Watch dashboards, respond to alerts, and maintain uptime.

Stack Organization Strategies

By Environment

Separate your monitoring by deployment environment:
📦 Production Stack
  ├─ Production API
  ├─ Production Database
  ├─ Production CDN
  └─ Production Payment Gateway

📦 Staging Stack
  ├─ Staging API
  ├─ Staging Database
  └─ Staging CDN

📦 Development Stack
  └─ Dev Services
Benefits:
  • Clear separation of concerns
  • Different notification rules per environment
  • Easy to identify which environment has issues
  • Prevents mixing prod and dev alerts
Best For: Development teams with multiple environments

By Service Type

Group services by functional area:
📦 Core Infrastructure
  ├─ AWS EC2
  ├─ AWS S3
  ├─ Cloudflare CDN
  └─ DNS Provider

📦 Application Services
  ├─ Web Application
  ├─ API Server
  ├─ Background Workers
  └─ Database

📦 External Dependencies
  ├─ Payment Gateway (Stripe)
  ├─ Email Service (SendGrid)
  ├─ SMS Provider (Twilio)
  └─ Analytics (Google Analytics)
Benefits:
  • Logical grouping by function
  • Easy to see which service layer is affected
  • Helps identify dependency chains
  • Better for incident response
Best For: SaaS companies with complex infrastructure

For MSP Clients

Create client-specific monitoring:
📦 Client A - Production
  ├─ Client A Website
  ├─ Client A API
  ├─ Client A Database
  └─ Client A Email

📦 Client A - Staging
  └─ Client A Test Environment

📦 Client B - Production
  ├─ Client B Website
  └─ Client B Services

📦 Internal Infrastructure
  ├─ MSP Dashboard
  ├─ Monitoring System
  └─ Backup Services
Benefits:
  • Complete client isolation
  • Client-specific notification rules
  • Individual client status pages
  • Billing per client
Best For: MSPs and agencies managing multiple clients
See the MSP Setup Guide for detailed MSP configuration.
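The three strategies differ only in which attribute decides a component's Stack. The Python sketch below groups the same components by environment, by service type, and MSP-style by client plus environment. The `Component` fields and the `build_stacks` helper are hypothetical and exist only to illustrate the idea; in StatusStack you create Stacks in the dashboard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Component:
    # Hypothetical fields, used only to illustrate the three strategies.
    name: str
    environment: str  # e.g. "production", "staging"
    layer: str        # e.g. "infrastructure", "application", "external"
    client: str       # e.g. "Client A", "Internal"

def build_stacks(components: List[Component], key: Callable[[Component], str]) -> Dict[str, List[str]]:
    """Group component names into Stacks using a single organizing key."""
    stacks: Dict[str, List[str]] = defaultdict(list)
    for component in components:
        stacks[key(component)].append(component.name)
    return dict(stacks)

components = [
    Component("Client A API", "production", "application", "Client A"),
    Component("Client A Database", "production", "infrastructure", "Client A"),
    Component("Client A Test Environment", "staging", "application", "Client A"),
    Component("Client B Website", "production", "application", "Client B"),
]

by_environment = build_stacks(components, lambda c: c.environment)
by_service_type = build_stacks(components, lambda c: c.layer)
# MSP style: one Stack per client and environment, e.g. "Client A - Production".
by_client = build_stacks(components, lambda c: f"{c.client} - {c.environment.title()}")
```

Picking the key up front is the real decision: once notification rules are scoped to Stacks, changing the grouping later means revisiting those rules too.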

Adding Third-Party Services

Choosing Services to Monitor

StatusStack supports 5,200+ sources. Add the services your infrastructure depends on:
Cloud Infrastructure:
  • AWS, Google Cloud, Azure, DigitalOcean
  • Monitor core services: compute, storage, networking
Content Delivery:
  • Cloudflare, Fastly, Akamai
  • Track CDN and DNS availability
Developer Tools:
  • GitHub, GitLab, Bitbucket
  • Monitor repository and CI/CD availability
Payment Processing:
  • Stripe, PayPal, Square
  • Ensure payment infrastructure is operational
Communication:
  • SendGrid, Twilio, Slack, Discord
  • Track email, SMS, and chat services

Adding Source Components

  1. Open Your Stack: Navigate to the Stack where you want to add components.
  2. Click Add Component: Click the “Add Component” button.
  3. Choose Source: Browse or search for a service (e.g., “AWS”, “Cloudflare”).
  4. Select Components: Choose which components of that service to monitor:
     • For AWS: EC2, S3, Lambda, RDS, etc.
     • For Cloudflare: CDN, DNS, etc.
  5. Add to Stack: Click “Add to Stack” to start monitoring.
Automatic Updates:
  • Status updates every 1-5 minutes
  • No configuration required
  • Reflects official status page data

Custom Website Monitoring

When to Use Custom Monitors

Use custom monitors for:
  • Your own websites and APIs
  • Internal services without public status pages
  • Services not in our third-party integrations
  • Specific endpoints requiring health checks

Creating Effective Monitors

Monitor Configuration Example:
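{
  "name": "Production API Health Check",
  "url": "https://api.yoursite.com/health",
  "interval": 60,
  "timeout": 10,
  "alert_threshold": 3,
  "expected_status": 200,
  "expected_content": "\"status\":\"ok\"",
  "headers": {
    "Authorization": "Bearer health-check-token"
  },
  "ssl_check": true
}
Units: interval and timeout are in seconds; alert_threshold is the number of consecutive failed checks before an alert fires. (Standard JSON does not allow inline comments, so the units are listed here instead.)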
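Before saving a monitor, it can help to sanity-check the values against the guidance in this guide. The Python sketch below takes a config dict shaped like the example above and returns a list of problems. The rules it applies (which fields it requires, timeout shorter than interval, threshold of at least 1, HTTPS when `ssl_check` is on) are assumptions drawn from this guide, not StatusStack's own validation logic.

```python
from typing import Any, Dict, List

def validate_monitor(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a monitor config (empty list = looks sane).

    These rules are assumptions based on this guide, not StatusStack's own validation.
    """
    problems: List[str] = []
    for field in ("name", "url", "interval", "timeout"):
        if field not in config:
            problems.append(f"missing field: {field}")
    interval = config.get("interval")
    timeout = config.get("timeout")
    # A check allowed to run longer than the interval would overlap the next check.
    if interval is not None and timeout is not None and timeout >= interval:
        problems.append("timeout should be shorter than interval")
    if config.get("alert_threshold", 1) < 1:
        problems.append("alert_threshold must be at least 1")
    if config.get("ssl_check") and not str(config.get("url", "")).startswith("https://"):
        problems.append("ssl_check needs an https:// URL")
    return problems

monitor = {
    "name": "Production API Health Check",
    "url": "https://api.yoursite.com/health",
    "interval": 60,
    "timeout": 10,
    "alert_threshold": 3,
    "expected_status": 200,
    "ssl_check": True,
}
print(validate_monitor(monitor))  # []
```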

Best Practices for Custom Monitors

Check Interval:
  • 30 seconds: Critical production services
  • 60 seconds: Standard production services (recommended)
  • 2-5 minutes: Non-critical or staging services
Considerations:
  • More frequent checks mean faster detection but higher costs
  • Balance responsiveness against efficiency
  • Most services don’t need sub-minute monitoring
Timeout:
Rule of Thumb: Set the timeout to the expected response time plus a buffer of at least 50%
Examples:
  • API typically responds in 200ms → Set 500ms timeout
  • Database query takes 2s → Set 3s timeout
  • Long-running endpoint (10s) → Set 15s timeout
Why: Prevents false positives from occasional slowness
Alert Threshold:
  • 1 failure: Ultra-critical services (use sparingly)
  • 3 failures: Production services (recommended)
  • 5 failures: Staging or dev environments
Why: Reduces false alarms from transient network issues
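The alert threshold behaves like a counter of consecutive failures that resets on any success. The Python sketch below shows that logic; it fires once when the streak reaches the threshold and reports recovery once. The `FailureTracker` class is hypothetical, and StatusStack's internal alerting state machine may differ in details.

```python
from typing import List, Optional

class FailureTracker:
    """Alert only after `threshold` consecutive failed checks; report recovery once."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.alerting = False

    def record(self, success: bool) -> Optional[str]:
        if success:
            was_alerting = self.alerting
            self.consecutive_failures = 0
            self.alerting = False
            return "recovered" if was_alerting else None
        self.consecutive_failures += 1
        # Fire exactly once when the streak reaches the threshold, not on every later failure.
        if self.consecutive_failures == self.threshold:
            self.alerting = True
            return "alert"
        return None

tracker = FailureTracker(threshold=3)
results: List[Optional[str]] = [
    tracker.record(ok) for ok in [False, False, True, False, False, False, False, True]
]
# [None, None, None, None, None, 'alert', None, 'recovered']
```

The success in the third position resets the streak, which is exactly how a threshold of 3 absorbs a transient blip.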
Expected Content:
Use When:
  • You need to verify response content, not just the HTTP status
  • The API returns 200 but with an error payload
  • You are checking for specific success indicators
Examples:
"expected_content": "\"healthy\":true"
"expected_content": "OK"
"expected_content": "status.*operational"
Regex Support: Use regex patterns for flexible matching
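Content matching is what catches the "200 with an error payload" case. The Python sketch below treats `expected_content` as a regular expression searched anywhere in the body, using the standard `re` module. That search semantics is an assumption based on the Regex Support note; StatusStack's exact matching rules may differ. Note that the JSON string `"\"healthy\":true"` is the plain text `"healthy":true` once decoded.

```python
import re
from typing import Optional

def content_matches(body: str, expected: str) -> bool:
    """Assumed semantics: expected_content is a regex searched anywhere in the body."""
    return re.search(expected, body) is not None

def check_passes(status: int, body: str, expected_status: int = 200,
                 expected_content: Optional[str] = None) -> bool:
    # Status code first: a wrong code fails the check regardless of the body.
    if status != expected_status:
        return False
    return expected_content is None or content_matches(body, expected_content)

# 200 with an error payload is still a failure when content matching is on.
print(check_passes(200, '{"healthy":false}', expected_content='"healthy":true'))  # False
```

Because every pattern is a regex, characters such as `.`, `(` or `?` act as metacharacters; to match them literally, escape them (in Python, `re.escape` does this).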
Custom Headers:
Protected Endpoints:
"headers": {
  "Authorization": "Bearer your-token",
  "X-API-Key": "your-api-key",
  "X-Custom-Header": "value"
}
Security:
  • Use dedicated health check tokens (not admin tokens)
  • Rotate tokens regularly
  • Limit token permissions to health check endpoint only
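If you want to reproduce a monitor's request by hand (for example, to test a protected endpoint before configuring it), the Python sketch below builds the same kind of GET request with the standard `urllib.request` module and reads a dedicated token from the environment instead of hard-coding it. `HEALTH_CHECK_TOKEN` is a hypothetical variable name; use whatever secret store you already have.

```python
import os
import urllib.request
from typing import Optional

def build_health_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request carrying a dedicated health-check token (not an admin token)."""
    if token is None:
        # Hypothetical variable name; keeps the token out of source and shared configs.
        token = os.environ.get("HEALTH_CHECK_TOKEN", "")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )

# To actually send it:
# with urllib.request.urlopen(build_health_request("https://api.yoursite.com/health"), timeout=10) as resp:
#     print(resp.status)
```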
SSL Monitoring:
Enable SSL Checks:
"ssl_check": true
What It Monitors:
  • Certificate expiration (warns 7 days before)
  • Certificate validity
  • Trust chain verification
Benefits:
  • Prevent unexpected certificate expirations
  • Catch SSL configuration issues
  • Maintain security compliance
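To see what an SSL check looks at, the Python sketch below fetches a server's certificate with the standard `ssl` module (a default context verifies the trust chain and hostname, raising an error if either fails) and computes days until expiry from the certificate's `notAfter` field. The 7-day warning window mirrors the one described above; the function names are hypothetical, and StatusStack's own implementation is not documented here.

```python
import socket
import ssl
import time
from typing import Optional

def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the leaf certificate's notAfter string, e.g. 'Jan 15 00:00:00 2030 GMT'."""
    ctx = ssl.create_default_context()  # verifies the trust chain and hostname
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400

def should_warn(not_after: str, now: Optional[float] = None, warn_days: int = 7) -> bool:
    return days_until_expiry(not_after, now) <= warn_days

# Live usage (needs network access):
# print(should_warn(fetch_not_after("api.yoursite.com")))
```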

Health Check Endpoint Design

If you’re creating custom health check endpoints, follow these patterns:
Simple Health Check:
// GET /health
{
  "status": "ok",
  "timestamp": "2025-01-19T10:30:00Z"
}
Detailed Health Check:
// GET /health/detailed
{
  "status": "ok",
  "timestamp": "2025-01-19T10:30:00Z",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "queue": "ok",
    "storage": "ok"
  },
  "uptime": 3600000
}
Response Codes:
  • 200 OK: All systems operational
  • 503 Service Unavailable: System degraded or down
  • 5xx errors: Treat as outage
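The Python sketch below serves both endpoint shapes with the standard library's `http.server`: 200 with `"status": "ok"` when every dependency check passes, 503 otherwise. The `run_checks` placeholders, the `"degraded"`/`"error"` labels for failures, and the omission of the `uptime` field are assumptions for illustration. Compact JSON separators are used so the body contains `"status":"ok"` with no space, which is what the `expected_content` value in the monitor example earlier in this guide looks for.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

def run_checks() -> Dict[str, bool]:
    # Hypothetical placeholders: replace with real pings to your database, Redis, queue and storage.
    return {"database": True, "redis": True, "queue": True, "storage": True}

def build_health_response(checks: Dict[str, bool]) -> Tuple[int, dict]:
    all_ok = all(checks.values())
    body = {
        "status": "ok" if all_ok else "degraded",  # "degraded" is an assumed label
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": {name: "ok" if ok else "error" for name, ok in checks.items()},
    }
    return (200 if all_ok else 503), body

def encode(body: dict) -> bytes:
    # Compact separators emit "status":"ok" without a space after the colon.
    return json.dumps(body, separators=(",", ":")).encode("utf-8")

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path not in ("/health", "/health/detailed"):
            self.send_error(404)
            return
        code, body = build_health_response(run_checks())
        if self.path == "/health":
            body = {"status": body["status"], "timestamp": body["timestamp"]}
        payload = encode(body)
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To serve locally: HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Returning 503 rather than 200 with a failure payload means the monitor catches the problem from `expected_status` alone, even without content matching.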

Manual Component Management

When to Use Manual Components

Manual components are ideal for:
  • Internal services without automated monitoring
  • Third-party services not in our integrations
  • Scheduled maintenance windows
  • Services with manual status updates

Creating Manual Components

  1. Open Stack: Navigate to your Stack.
  2. Add Component: Click “Add Component” → “Manual Component”.
  3. Configure:
     • Name: e.g., “Internal Database Cluster”
     • Description: Optional details
     • Initial Status: Set the starting status
  4. Save: The component is added to your Stack.

Updating Manual Components

Update status when needed:
  1. Open the Stack
  2. Click on the manual component
  3. Select new status (operational, degraded, outage, maintenance)
  4. Add optional message describing the change
  5. Click Update Status
Use Cases:
  • Start of scheduled maintenance: Set to “maintenance”
  • Detected issue: Set to “degraded” or “outage”
  • Issue resolved: Set back to “operational”

Status Monitoring Dashboard

Understanding Stack Status

Each Stack displays an overall status based on its components:
  • 🟢 Operational (Green): All components healthy
  • 🟡 Degraded (Yellow): Some components degraded
  • 🔴 Critical (Red): One or more components in outage
  • 🔵 Maintenance (Blue): Scheduled maintenance
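Status Calculation (an outage outranks degraded, which outranks maintenance):
function stackStatus(componentStatuses) {
  // componentStatuses: array of component statuses, e.g. ['operational', 'degraded']
  if (componentStatuses.includes('outage')) return 'CRITICAL'
  if (componentStatuses.includes('degraded')) return 'DEGRADED'
  if (componentStatuses.includes('maintenance')) return 'MAINTENANCE'
  return 'OPERATIONAL'
}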

Component Details

Click any component to view:
  • Current Status: Real-time status indicator
  • Last Updated: When status was last checked
  • Uptime: Percentage uptime (last 30 days)
  • Status History: Timeline of all status changes
  • Active Incidents: Ongoing issues affecting this component

Real-Time Updates

The dashboard automatically refreshes:
  • Auto-Refresh: Every 30 seconds
  • Manual Refresh: Click refresh button anytime
  • Live Indicators: Shows when data is updating

Notification Configuration

Creating Notification Rules

  1. Navigate to Notifications: Go to Settings → Notifications.
  2. Create Rule: Click “Create Notification Rule”.
  3. Name Rule: Use descriptive names:
     • ✅ “Production Outage Alerts - SMS”
     • ✅ “All Stacks - Team Slack”
     • ❌ “Rule 1”
  4. Select Alert Levels:
     • Info: Status improvements
     • Warning: Degraded performance
     • Alert: Outages
  5. Choose Scope:
     • All Stacks
     • Specific Stacks
     • Specific Components
  6. Configure Channels: Add notification destinations:
     • Discord webhook
     • Slack webhook
     • Teams webhook
     • Email addresses
     • Phone numbers (SMS)
     • Custom webhooks
  7. Save & Test: Save the rule, then test it by changing the status of a manual component.

Notification Best Practices

Layered Alerting Strategy:
┌─────────────────────────────────────────┐
│         Critical Production             │
│  Alert Level: ALERT (outages) only      │
│  Channels: SMS + Slack + Email          │
│  Recipients: On-call engineer           │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│         Production Warnings             │
│  Alert Level: WARNING (degraded)        │
│  Channels: Slack + Email                │
│  Recipients: DevOps team                │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│         All Status Changes              │
│  Alert Level: INFO + WARNING + ALERT    │
│  Channels: Slack                        │
│  Recipients: #status-updates channel    │
└─────────────────────────────────────────┘
Channel Selection:
  • SMS: Critical outages only (expensive, high urgency)
  • Slack/Discord: Team awareness, all severity levels
  • Email: Record keeping, escalations
  • Webhooks: Integration with other tools (PagerDuty, etc.)
See the Notifications documentation for detailed channel setup.
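The layered strategy amounts to evaluating every rule against each status change and sending to the union of matching channels. The Python sketch below encodes the three boxes above as data. The mapping of component statuses to alert levels follows the Info/Warning/Alert definitions earlier in this section (treating a return to operational as Info is an assumption), and the `Rule` class and channel labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Info = status improvements, Warning = degraded, Alert = outages.
LEVEL_FOR_STATUS = {"operational": "INFO", "degraded": "WARNING", "outage": "ALERT"}

@dataclass
class Rule:
    name: str
    levels: Set[str]
    channels: Set[str]               # hypothetical "type:destination" labels
    stacks: Optional[Set[str]] = None  # None means "All Stacks"

RULES: List[Rule] = [
    Rule("Critical Production", {"ALERT"}, {"sms:on-call", "slack:#on-call", "email:on-call"}, {"Production"}),
    Rule("Production Warnings", {"WARNING"}, {"slack:#devops", "email:devops"}, {"Production"}),
    Rule("All Status Changes", {"INFO", "WARNING", "ALERT"}, {"slack:#status-updates"}),
]

def route(stack: str, new_status: str, rules: List[Rule] = RULES) -> Set[str]:
    """Return every destination that should receive this status change."""
    level = LEVEL_FOR_STATUS.get(new_status)  # e.g. "maintenance" maps to no level
    channels: Set[str] = set()
    for rule in rules:
        if level in rule.levels and (rule.stacks is None or stack in rule.stacks):
            channels |= rule.channels
    return channels
```

An outage in Production reaches the on-call engineer by SMS, Slack and email and also lands in #status-updates, while the same outage in Staging only reaches #status-updates.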

Monitoring Workflows

Daily Monitoring Routine

Incident Response Workflow

When you receive an alert:
  1. Acknowledge Alert: Open the dashboard and locate the affected Stack/component.
  2. Assess Severity:
     • Outage: Immediate action required
     • Degraded: Investigate and monitor
     • Info: Acknowledge and document
  3. Investigate Root Cause:
     • Check component status history
     • Review related services for correlation
     • Check third-party status pages
     • Investigate custom monitor failures
  4. Take Action:
     • Fix issues if within your control
     • Contact third-party support if needed
     • Update manual components if applicable
  5. Document Resolution:
     • Update component status when resolved
     • Add notes to status history
     • Document root cause and fix
  6. Post-Mortem:
     • Review incident timeline
     • Identify improvements
     • Update monitoring if needed

Maintenance Window Management

Before Maintenance:
  1. Plan Timing: Schedule during low-traffic periods.
  2. Notify Users: Update the public status page if applicable.
  3. Set Component Status: Change affected components to “maintenance”.
  4. Adjust Notifications: Temporarily disable alerts for the maintenance window.
During Maintenance:
  • Keep status updated if timeline changes
  • Monitor related components for unexpected issues
  • Document any problems encountered
After Maintenance:
  • Return components to “operational”
  • Re-enable notifications
  • Verify all services healthy
  • Update status page with completion
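A manual "disable, then re-enable" step is easy to forget after maintenance. One alternative is to suppress alerts based on a scheduled time window, so notifications resume on their own when the window ends. The Python sketch below shows that check; `MaintenanceWindow` and `should_notify` are hypothetical names, not StatusStack features.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

@dataclass
class MaintenanceWindow:
    component: str
    start: datetime
    end: datetime

def should_notify(component: str, at: datetime, windows: List[MaintenanceWindow]) -> bool:
    """Suppress alerts for a component only while it is inside a scheduled window."""
    in_window = any(w.component == component and w.start <= at < w.end for w in windows)
    return not in_window

windows = [
    MaintenanceWindow(
        "Production Database",
        datetime(2025, 1, 19, 2, 0, tzinfo=timezone.utc),
        datetime(2025, 1, 19, 4, 0, tzinfo=timezone.utc),
    )
]
```

Because suppression ends at `end`, a window that overruns starts alerting again, which is usually what you want; if the timeline changes, extend `end` rather than disabling alerts outright.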

Advanced Monitoring Techniques

Dependency Mapping

Organize Stacks to reflect service dependencies:
📦 Frontend Services (depends on Backend + Infrastructure)
  ├─ Web Application
  └─ CDN

    ↓ depends on

📦 Backend Services (depends on Infrastructure)
  ├─ API Server
  ├─ Background Workers
  └─ Queue System

    ↓ depends on

📦 Core Infrastructure (foundational)
  ├─ Database
  ├─ Redis Cache
  └─ AWS Services
Benefits:
  • Understand impact when infrastructure fails
  • Prioritize fixes based on dependency chain
  • Communicate outage scope to users
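Working out impact from a dependency map is a reverse graph traversal: when a Stack fails, everything that depends on it, directly or transitively, is potentially affected. The Python sketch below encodes the diagram above and uses a breadth-first search to find the affected Stacks. The `DEPENDS_ON` map and `impacted_by` function are illustrative, not a StatusStack API.

```python
from collections import deque
from typing import Dict, List, Set

# Edges point from a Stack to the Stacks it depends on, mirroring the diagram above.
DEPENDS_ON: Dict[str, List[str]] = {
    "Frontend Services": ["Backend Services", "Core Infrastructure"],
    "Backend Services": ["Core Infrastructure"],
    "Core Infrastructure": [],
}

def impacted_by(failed: str, depends_on: Dict[str, List[str]] = DEPENDS_ON) -> Set[str]:
    """Return every Stack that directly or transitively depends on the failed Stack."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for stack, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(stack)
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

A Core Infrastructure failure reports both Backend and Frontend Services as impacted, while a Frontend failure affects nothing below it, which matches the "prioritize fixes based on dependency chain" advice.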

Uptime Tracking

Monitor uptime percentages over time.
Uptime Targets (based on a 30-day month):
  • 99.9% (“Three Nines”): ~43 minutes downtime/month
  • 99.95%: ~22 minutes downtime/month
  • 99.99% (“Four Nines”): ~4 minutes downtime/month
How to Track:
  1. View component details in dashboard
  2. Check uptime percentage (last 30 days)
  3. Review status history for downtime duration
  4. Export reports for stakeholders
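The targets above come from simple arithmetic: a 30-day month has 30 × 24 × 60 = 43,200 minutes, and the downtime budget is that total multiplied by (1 − target). The Python sketch below computes the budget and converts total outage minutes from status history back into an uptime percentage. The 30-day period is an assumption; pass a different `period_minutes` for other reporting windows.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200

def downtime_budget_minutes(target_pct: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Allowed downtime for an uptime target, e.g. 99.9 -> about 43.2 minutes."""
    return period_minutes * (1 - target_pct / 100)

def uptime_pct(downtime_minutes: float, period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Uptime percentage given the total outage minutes observed in the period."""
    return 100 * (1 - downtime_minutes / period_minutes)

for target in (99.9, 99.95, 99.99):
    print(target, round(downtime_budget_minutes(target), 2))
```

For example, 30 minutes of outage in a month works out to roughly 99.93% uptime: inside a three-nines target, but already past the four-nines budget.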

Multi-Region Monitoring

For services deployed in multiple regions:
📦 Production - US East
  ├─ API Server (us-east-1)
  ├─ Database (us-east-1)
  └─ CDN Edge (US East)

📦 Production - Europe
  ├─ API Server (eu-west-1)
  ├─ Database (eu-west-1)
  └─ CDN Edge (Europe)
Benefits:
  • Detect region-specific outages
  • Track regional performance differences
  • Better incident response for distributed systems

Troubleshooting Common Issues

False Positives

Problem: Monitor reports outages when the service is actually operational
Solutions:
  1. Increase Alert Threshold: Change from 1 to 3 consecutive failures
  2. Adjust Timeout: Give service more time to respond
  3. Check Network Path: Verify monitoring from correct region
  4. Review Expected Content: Ensure regex/content match is correct

Missing Alerts

Problem: Service went down but no notification was received
Checklist:

Delayed Status Updates

Problem: Dashboard shows an old status that doesn’t reflect the current state
Solutions:
  1. Manual Refresh: Click refresh button in dashboard
  2. Check Update Frequency: Third-party sources update every 1-5 minutes
  3. Custom Monitor Interval: Verify check interval is appropriate
  4. Cache Issue: Clear browser cache and reload

Component Stuck in Maintenance

Problem: Component still shows maintenance after the work is complete
Solution:
  1. Click on the component
  2. Manually update status to “operational”
  3. Verify status change reflects immediately

Monitoring Limits & Performance

Current Limits

Plan           Stacks      Components/Stack   Custom Monitors   Update Frequency
Free           3           10                 5                 5 minutes
Starter        10          20                 25                1 minute
Professional   50          50                 100               30 seconds
Enterprise     Unlimited   Unlimited          Unlimited         30 seconds

Optimizing Performance

Tips for Large Deployments:
  • Use Stack organization to group related services
  • Limit components per Stack to 20-30 for best dashboard performance
  • Use appropriate check intervals (not everything needs 30s checks)
  • Archive old/unused Stacks
  • Consolidate duplicate monitors

Next Steps

  • Customer Dashboard: Explore all dashboard features and shortcuts
  • MSP Setup: Set up multi-client monitoring
  • Stacks Concept: Deep dive into Stack organization
  • Notifications: Advanced notification configuration

Effective monitoring requires planning, appropriate tooling, and responsive workflows. Use this guide to build a robust monitoring strategy that keeps your services reliable and your team informed.