Status Monitoring Guide - StatusStack Documentation

Introduction

This guide covers everything you need to know about monitoring your services effectively with StatusStack. Whether you’re tracking third-party services, monitoring your own infrastructure, or managing status for multiple clients, this guide will help you build a robust monitoring strategy.

Getting Started with Monitoring

The Monitoring Workflow

Plan Your Stack Structure

Decide how to organize your monitoring (by environment, service type, or client)

Create Stacks

Set up logical groupings for your services

Add Components

Include third-party services, custom monitors, and manual components

Configure Notifications

Set up multi-channel alerts for status changes

Monitor & Respond

Watch dashboards, respond to alerts, and maintain uptime

Stack Organization Strategies

By Environment

Separate your monitoring by deployment environment:

📦 Production Stack
  ├─ Production API
  ├─ Production Database
  ├─ Production CDN
  └─ Production Payment Gateway

📦 Staging Stack
  ├─ Staging API
  ├─ Staging Database
  └─ Staging CDN

📦 Development Stack
  └─ Dev Services

Benefits:

Clear separation of concerns
Different notification rules per environment
Easy to identify which environment has issues
Prevents mixing prod and dev alerts

Best For: Development teams with multiple environments

By Service Type

Group services by functional area:

📦 Core Infrastructure
  ├─ AWS EC2
  ├─ AWS S3
  ├─ Cloudflare CDN
  └─ DNS Provider

📦 Application Services
  ├─ Web Application
  ├─ API Server
  ├─ Background Workers
  └─ Database

📦 External Dependencies
  ├─ Payment Gateway (Stripe)
  ├─ Email Service (SendGrid)
  ├─ SMS Provider (Twilio)
  └─ Analytics (Google Analytics)

Benefits:

Logical grouping by function
Easy to see which service layer is affected
Helps identify dependency chains
Better for incident response

Best For: SaaS companies with complex infrastructure

For MSP Clients

Create client-specific monitoring:

📦 Client A - Production
  ├─ Client A Website
  ├─ Client A API
  ├─ Client A Database
  └─ Client A Email

📦 Client A - Staging
  └─ Client A Test Environment

📦 Client B - Production
  ├─ Client B Website
  └─ Client B Services

📦 Internal Infrastructure
  ├─ MSP Dashboard
  ├─ Monitoring System
  └─ Backup Services

Benefits:

Complete client isolation
Client-specific notification rules
Individual client status pages
Billing per client

Best For: MSPs and agencies managing multiple clients

See the MSP Setup Guide for detailed MSP configuration.

Adding Third-Party Services

Choosing Services to Monitor

StatusStack supports 5,200+ sources. Add the services your infrastructure depends on:

Cloud Infrastructure:

AWS, Google Cloud, Azure, DigitalOcean
Monitor core services: compute, storage, networking

Content Delivery:

Cloudflare, Fastly, Akamai
Track CDN and DNS availability

Developer Tools:

GitHub, GitLab, Bitbucket
Monitor repository and CI/CD availability

Payment Processing:

Stripe, PayPal, Square
Ensure payment infrastructure is operational

Communication:

SendGrid, Twilio, Slack, Discord
Track email, SMS, and chat services

Adding Source Components

Open Your Stack

Navigate to the Stack where you want to add components

Click Add Component

Click the “Add Component” button

Choose Source

Browse or search for a service (e.g., “AWS”, “Cloudflare”)

Select Components

Choose which components of that service to monitor:

For AWS: EC2, S3, Lambda, RDS, etc.
For Cloudflare: CDN, DNS, etc.

Add to Stack

Click “Add to Stack” to start monitoring

Automatic Updates:

Status updates every 1-5 minutes
No configuration required
Reflects official status page data

Custom Website Monitoring

When to Use Custom Monitors

Use custom monitors for:

Your own websites and APIs
Internal services without public status pages
Services not in our third-party integrations
Specific endpoints requiring health checks

Creating Effective Monitors

Monitor Configuration Example (the // comments annotate units and are not valid in strict JSON):

{
  "name": "Production API Health Check",
  "url": "https://api.yoursite.com/health",
  "interval": 60, // seconds
  "timeout": 10, // seconds
  "alert_threshold": 3, // consecutive failures
  "expected_status": 200,
  "expected_content": "\"status\":\"ok\"",
  "headers": {
    "Authorization": "Bearer health-check-token"
  },
  "ssl_check": true
}

Best Practices for Custom Monitors

Check Interval Selection

30 seconds: Critical production services
60 seconds: Standard production services (recommended)
2-5 minutes: Non-critical or staging services

Considerations:

More frequent checks = faster detection but higher costs
Balance between responsiveness and efficiency
Most services don’t need sub-minute monitoring

Timeout Configuration

Rule of Thumb: Set timeout to the expected response time plus a buffer of at least 50%

Examples:

API typically responds in 200ms → Set 500ms timeout
Database query takes 2s → Set 3s timeout
Long-running endpoint (10s) → Set 15s timeout

Why: Prevents false positives from occasional slowness
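The rule of thumb above is simple arithmetic and can be sketched as a small helper. This is a minimal illustration, not a StatusStack API: `suggestedTimeoutMs` and its `buffer` parameter are hypothetical names.

```javascript
// Hypothetical helper: expected response time plus a proportional buffer.
// buffer = 0.5 means "add 50% on top of the expected response time".
function suggestedTimeoutMs(expectedMs, buffer = 0.5) {
  if (expectedMs <= 0 || buffer < 0) {
    throw new RangeError("expectedMs must be positive and buffer non-negative");
  }
  return Math.ceil(expectedMs * (1 + buffer));
}

console.log(suggestedTimeoutMs(2000));  // 3000
console.log(suggestedTimeoutMs(10000)); // 15000
```

With the default buffer, a 200ms API would get a 300ms timeout. The 500ms figure in the first example is more generous (a 150% buffer), which is reasonable for very fast endpoints where network jitter dominates.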

Alert Thresholds

1 failure: Ultra-critical services (use sparingly)
3 failures: Production services (recommended)
5 failures: Staging or dev environments

Why: Reduces false alarms from transient network issues
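To make the threshold behaviour concrete, here is a rough sketch of how consecutive-failure counting might work. It assumes the threshold counts only the most recent unbroken run of failures. `shouldAlert` is a hypothetical helper, not part of StatusStack.

```javascript
// Hypothetical helper: alert only after `threshold` consecutive failed checks.
// `results` is an array of booleans, oldest first; true means the check passed.
function shouldAlert(results, threshold = 3) {
  let consecutiveFailures = 0;
  for (const passed of results) {
    // Any successful check resets the run of failures.
    consecutiveFailures = passed ? 0 : consecutiveFailures + 1;
  }
  return consecutiveFailures >= threshold;
}

console.log(shouldAlert([true, false, false, false])); // true
console.log(shouldAlert([false, true, false, false])); // false
```

A single transient failure between successful checks never reaches the threshold, which is why higher values suit noisy staging environments.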

Expected Content Matching

Use When:

You need to verify response content, not just HTTP status
API returns 200 but with error payload
Checking for specific success indicators

Examples:

"expected_content": "\"healthy\":true"
"expected_content": "OK"
"expected_content": "status.*operational"

Regex Support: Use regex patterns for flexible matching
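As an illustration of how such patterns behave, the sketch below tests the three example values against sample response bodies using JavaScript's built-in `RegExp`. The regex flavor StatusStack uses is not stated in this guide, so treat this as an approximation. `contentMatches` is a hypothetical helper.

```javascript
// Hypothetical helper: treat `expectedContent` as a regular expression
// and test it against the response body.
function contentMatches(body, expectedContent) {
  return new RegExp(expectedContent).test(body);
}

console.log(contentMatches('{"healthy":true}', '"healthy":true'));             // true
console.log(contentMatches('{"status":"operational"}', "status.*operational")); // true
console.log(contentMatches('{"status":"error"}', "status.*operational"));       // false
```

When a pattern is meant as a literal string, characters such as `.`, `*`, `(` and `[` carry special meaning in a regex and would need escaping.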

Authentication Headers

Protected Endpoints:

"headers": {
  "Authorization": "Bearer your-token",
  "X-API-Key": "your-api-key",
  "X-Custom-Header": "value"
}

Security:

Use dedicated health check tokens (not admin tokens)
Rotate tokens regularly
Limit token permissions to health check endpoint only

SSL Certificate Monitoring

Enable SSL Checks:

"ssl_check": true

What It Monitors:

Certificate expiration (warns 7 days before)
Certificate validity
Trust chain verification

Benefits:

Prevent unexpected certificate expirations
Catch SSL configuration issues
Maintain security compliance
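The expiration warning comes down to a date comparison. A minimal sketch, assuming the certificate's expiry date is already available as a `Date`. `daysUntilExpiry` and `shouldWarn` are hypothetical names, and the 7-day default mirrors the warning window mentioned above.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days left before a certificate's expiry date.
function daysUntilExpiry(notAfter, now = new Date()) {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// Warn when the certificate expires within the warning window.
function shouldWarn(notAfter, now = new Date(), warnDays = 7) {
  return daysUntilExpiry(notAfter, now) <= warnDays;
}

const now = new Date("2025-01-19T00:00:00Z");
console.log(shouldWarn(new Date("2025-01-24T00:00:00Z"), now)); // true (5 days left)
console.log(shouldWarn(new Date("2025-03-01T00:00:00Z"), now)); // false (41 days left)
```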

Health Check Endpoint Design

If you’re creating custom health check endpoints, follow these patterns:

Simple Health Check:

// GET /health
{
  "status": "ok",
  "timestamp": "2025-01-19T10:30:00Z"
}

Detailed Health Check:

// GET /health/detailed
{
  "status": "ok",
  "timestamp": "2025-01-19T10:30:00Z",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "queue": "ok",
    "storage": "ok"
  },
  "uptime": 3600000
}

Response Codes:

200 OK: All systems operational
503 Service Unavailable: System degraded or down
5xx errors: Treat as outage
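One way to produce the detailed response and the matching status code is to derive both from the individual dependency checks. The sketch below is framework-agnostic and makes some assumptions: `buildHealthResponse` is a hypothetical helper, and the `"degraded"` status string is illustrative, since this guide only shows the `"ok"` case.

```javascript
// Hypothetical helper: build a detailed health response from per-dependency checks.
// `checks` maps a dependency name to "ok", or to any other string on failure.
function buildHealthResponse(checks, now = new Date(), uptimeMs = 0) {
  const allOk = Object.values(checks).every((state) => state === "ok");
  return {
    statusCode: allOk ? 200 : 503, // 503 signals degraded or down to the monitor
    body: {
      status: allOk ? "ok" : "degraded",
      timestamp: now.toISOString(),
      checks,
      uptime: uptimeMs,
    },
  };
}

const healthy = buildHealthResponse({ database: "ok", redis: "ok" });
console.log(healthy.statusCode); // 200

const unhealthy = buildHealthResponse({ database: "ok", redis: "error" });
console.log(unhealthy.statusCode); // 503
```

Returning 503 whenever any dependency fails keeps the endpoint compatible with monitors that only check `expected_status`, while the body still shows which check failed.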

Manual Component Management

When to Use Manual Components

Manual components are ideal for:

Internal services without automated monitoring
Third-party services not in our integrations
Scheduled maintenance windows
Services with manual status updates

Creating Manual Components

Open Stack

Navigate to your Stack

Add Component

Click “Add Component” → “Manual Component”

Configure

Name: “Internal Database Cluster”
Description: Optional details
Initial Status: Set starting status

Save

Component is added to your Stack

Updating Manual Components

Update status when needed:

Open the Stack
Click on the manual component
Select new status (operational, degraded, outage, maintenance)
Add optional message describing the change
Click Update Status

Use Cases:

Start of scheduled maintenance: Set to “maintenance”
Detected issue: Set to “degraded” or “outage”
Issue resolved: Set back to “operational”

Status Monitoring Dashboard

Understanding Stack Status

Each Stack displays an overall status based on its components:

Status	Indicator	Meaning
🟢 Operational	Green	All components healthy
🟡 Degraded	Yellow	Some components degraded
🔴 Critical	Red	One or more outages
🔵 Maintenance	Blue	Scheduled maintenance

Status Calculation:

if (any_component === 'outage') return 'CRITICAL'
if (any_component === 'degraded') return 'DEGRADED'
if (any_component === 'maintenance') return 'MAINTENANCE'
return 'OPERATIONAL'
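The calculation above can be written as a runnable function over a Stack's component statuses. This is a sketch of the precedence shown above (outage, then degraded, then maintenance). The function name and array input are assumptions for illustration.

```javascript
// Component statuses as used in this guide: operational, degraded, outage, maintenance.
// The most severe status present wins.
function stackStatus(componentStatuses) {
  if (componentStatuses.includes("outage")) return "CRITICAL";
  if (componentStatuses.includes("degraded")) return "DEGRADED";
  if (componentStatuses.includes("maintenance")) return "MAINTENANCE";
  return "OPERATIONAL";
}

console.log(stackStatus(["operational", "degraded", "outage"])); // CRITICAL
console.log(stackStatus(["operational", "maintenance"]));        // MAINTENANCE
console.log(stackStatus(["operational", "operational"]));        // OPERATIONAL
```

Note that a degraded component takes precedence over one in maintenance, so a Stack only shows Maintenance when nothing else is unhealthy.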

Component Details

Click any component to view:

Current Status: Real-time status indicator
Last Updated: When status was last checked
Uptime: Percentage uptime (last 30 days)
Status History: Timeline of all status changes
Active Incidents: Ongoing issues affecting this component

Real-Time Updates

The dashboard automatically refreshes:

Auto-Refresh: Every 30 seconds
Manual Refresh: Click refresh button anytime
Live Indicators: Shows when data is updating

Notification Configuration

Creating Notification Rules

Navigate to Notifications

Go to Settings → Notifications

Create Rule

Click “Create Notification Rule”

Name Rule

Use descriptive names:

✅ “Production Outage Alerts - SMS”
✅ “All Stacks - Team Slack”
❌ “Rule 1”

Select Alert Levels

Info: Status improvements
Warning: Degraded performance
Alert: Outages

Choose Scope

All Stacks
Specific Stacks
Specific Components

Configure Channels

Add notification destinations:

Discord webhook
Slack webhook
Teams webhook
Email addresses
Phone numbers (SMS)
Custom webhooks

Save & Test

Save rule and test with a manual component

Notification Best Practices

Layered Alerting Strategy:

┌─────────────────────────────────────────┐
│         Critical Production              │
│  Alert Level: OUTAGE only               │
│  Channels: SMS + Slack + Email          │
│  Recipients: On-call engineer           │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│         Production Warnings              │
│  Alert Level: DEGRADED                  │
│  Channels: Slack + Email                │
│  Recipients: DevOps team                │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│         All Status Changes               │
│  Alert Level: INFO + WARNING + ALERT    │
│  Channels: Slack                        │
│  Recipients: #status-updates channel    │
└─────────────────────────────────────────┘

Channel Selection:

SMS: Critical outages only (expensive, high urgency)
Slack/Discord: Team awareness, all severity levels
Email: Record keeping, escalations
Webhooks: Integration with other tools (PagerDuty, etc.)
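The layered strategy can be modelled as a list of rules that are each matched against an incoming status event. The rule shape below (`levels`, `stacks`, `channels`) is a hypothetical structure for illustration, not StatusStack's actual schema. It maps OUTAGE to the Alert level and DEGRADED to the Warning level, as described under Select Alert Levels.

```javascript
// Hypothetical rule shape; the field names are illustrative only.
const rules = [
  { name: "Critical Production", levels: ["alert"], stacks: ["Production"], channels: ["sms", "slack", "email"] },
  { name: "Production Warnings", levels: ["warning"], stacks: ["Production"], channels: ["slack", "email"] },
  { name: "All Status Changes", levels: ["info", "warning", "alert"], stacks: null, channels: ["slack"] },
];

// Return the de-duplicated, sorted channels that should fire for an event.
// A rule with `stacks: null` applies to all Stacks.
function channelsFor(event, ruleList = rules) {
  const channels = new Set();
  for (const rule of ruleList) {
    const levelMatches = rule.levels.includes(event.level);
    const scopeMatches = rule.stacks === null || rule.stacks.includes(event.stack);
    if (levelMatches && scopeMatches) {
      rule.channels.forEach((channel) => channels.add(channel));
    }
  }
  return [...channels].sort();
}

console.log(channelsFor({ level: "alert", stack: "Production" })); // [ 'email', 'slack', 'sms' ]
console.log(channelsFor({ level: "info", stack: "Staging" }));     // [ 'slack' ]
```

De-duplicating channels matters once rules overlap: a production outage matches both the critical rule and the catch-all rule, but each channel is listed only once in the result.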

See the Notifications documentation for detailed channel setup.

Monitoring Workflows

Daily Monitoring Routine

Incident Response Workflow

When you receive an alert:

Acknowledge Alert

Open dashboard and locate affected Stack/component

Assess Severity

Outage: Immediate action required
Degraded: Investigate and monitor
Info: Acknowledge and document

Investigate Root Cause

Check component status history
Review related services for correlation
Check third-party status pages
Investigate custom monitor failures

Take Action

Fix issues if within your control
Contact third-party support if needed
Update manual components if applicable

Document Resolution

Update component status when resolved
Add notes to status history
Document root cause and fix

Post-Mortem

Review incident timeline
Identify improvements
Update monitoring if needed

Maintenance Window Management

Before Maintenance:

Plan Timing

Schedule during low-traffic periods

Notify Users

Update public status page if applicable

Set Component Status

Change affected components to “maintenance”

Adjust Notifications

Temporarily disable alerts for maintenance window

During Maintenance:

Keep status updated if timeline changes
Monitor related components for unexpected issues
Document any problems encountered

After Maintenance:

Return components to “operational”
Re-enable notifications
Verify all services healthy
Update status page with completion
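If alerts are routed through your own webhook handler, suppression during a maintenance window can be automated rather than toggled by hand. The sketch below assumes a hypothetical `maintenance` object with `start`, `end`, and `components` fields. None of these names come from StatusStack.

```javascript
// Hypothetical helper: true when `now` falls inside [start, end) of a maintenance window.
function inMaintenanceWindow(now, maintenance) {
  return now >= maintenance.start && now < maintenance.end;
}

// Suppress alerts for components under maintenance while the window is open.
function shouldNotify(event, maintenance, now) {
  const suppressed =
    maintenance.components.includes(event.component) &&
    inMaintenanceWindow(now, maintenance);
  return !suppressed;
}

const maintenance = {
  start: new Date("2025-01-19T02:00:00Z"),
  end: new Date("2025-01-19T04:00:00Z"),
  components: ["Production Database"],
};

console.log(shouldNotify({ component: "Production Database" }, maintenance, new Date("2025-01-19T03:00:00Z"))); // false
console.log(shouldNotify({ component: "Production API" }, maintenance, new Date("2025-01-19T03:00:00Z")));      // true
```

Alerts for components outside the maintenance list still go out, which supports the advice to monitor related components for unexpected issues.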

Advanced Monitoring Techniques

Dependency Mapping

Organize Stacks to reflect service dependencies:

📦 Frontend Services (depends on Backend + Infrastructure)
  ├─ Web Application
  └─ CDN

    ↓ depends on

📦 Backend Services (depends on Infrastructure)
  ├─ API Server
  ├─ Background Workers
  └─ Queue System

    ↓ depends on

📦 Core Infrastructure (foundational)
  ├─ Database
  ├─ Redis Cache
  └─ AWS Services

Benefits:

Understand impact when infrastructure fails
Prioritize fixes based on dependency chain
Communicate outage scope to users
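To show how a dependency map helps scope an outage, the sketch below walks the example above and lists every Stack affected when one fails. The Stack names come from the diagram. The `dependsOn` structure and `impactedStacks` function are assumptions for illustration.

```javascript
// Stack names come from the diagram above; the data structure is an assumption.
const dependsOn = {
  "Frontend Services": ["Backend Services", "Core Infrastructure"],
  "Backend Services": ["Core Infrastructure"],
  "Core Infrastructure": [],
};

// Return every Stack that directly or transitively depends on `failedStack`.
function impactedStacks(failedStack, graph = dependsOn) {
  const impacted = new Set();
  let changed = true;
  // Repeat until a full pass adds nothing new (a simple fixed-point iteration).
  while (changed) {
    changed = false;
    for (const [stack, deps] of Object.entries(graph)) {
      if (impacted.has(stack) || stack === failedStack) continue;
      if (deps.some((dep) => dep === failedStack || impacted.has(dep))) {
        impacted.add(stack);
        changed = true;
      }
    }
  }
  return [...impacted].sort();
}

console.log(impactedStacks("Core Infrastructure")); // [ 'Backend Services', 'Frontend Services' ]
console.log(impactedStacks("Backend Services"));    // [ 'Frontend Services' ]
```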

Uptime Tracking

Monitor uptime percentages over time:

Uptime Targets:

99.9% (“Three Nines”): 43 minutes downtime/month
99.95%: 22 minutes downtime/month
99.99% (“Four Nines”): 4 minutes downtime/month

How to Track:

View component details in dashboard
Check uptime percentage (last 30 days)
Review status history for downtime duration
Export reports for stakeholders
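The downtime budgets for each target follow from the length of the measurement window. A short sketch of the arithmetic, assuming a 30-day window to match the dashboard's uptime figure. The function names are illustrative.

```javascript
const MINUTES_PER_30_DAYS = 30 * 24 * 60; // 43,200 minutes

// Allowed downtime in minutes for an uptime target over a 30-day window.
function allowedDowntimeMinutes(targetPercent) {
  return ((100 - targetPercent) / 100) * MINUTES_PER_30_DAYS;
}

// Observed uptime percentage given total downtime in minutes.
function uptimePercent(downtimeMinutes) {
  return (1 - downtimeMinutes / MINUTES_PER_30_DAYS) * 100;
}

console.log(Math.round(allowedDowntimeMinutes(99.9)));  // 43
console.log(Math.round(allowedDowntimeMinutes(99.95))); // 22
console.log(Math.round(allowedDowntimeMinutes(99.99))); // 4
```

The unrounded budgets are about 43.2, 21.6, and 4.32 minutes, so a single 5-minute incident is already enough to miss a four-nines target for the month.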

Multi-Region Monitoring

For services deployed in multiple regions:

📦 Production - US East
  ├─ API Server (us-east-1)
  ├─ Database (us-east-1)
  └─ CDN Edge (US East)

📦 Production - Europe
  ├─ API Server (eu-west-1)
  ├─ Database (eu-west-1)
  └─ CDN Edge (Europe)

Benefits:

Detect region-specific outages
Track regional performance differences
Better incident response for distributed systems

Troubleshooting Common Issues

False Positives

Problem: Monitor reports outages when the service is actually operational

Solutions:

Increase Alert Threshold: Change from 1 to 3 consecutive failures
Adjust Timeout: Give service more time to respond
Check Network Path: Verify monitoring from correct region
Review Expected Content: Ensure regex/content match is correct

Missing Alerts

Problem: Service went down but no notification was received

Checklist:

Delayed Status Updates

Problem: Dashboard shows an old status that does not reflect the current state

Solutions:

Manual Refresh: Click refresh button in dashboard
Check Update Frequency: Third-party sources update every 1-5 minutes
Custom Monitor Interval: Verify check interval is appropriate
Cache Issue: Clear browser cache and reload

Component Stuck in Maintenance

Problem: Component still shows maintenance after completion

Solution:

Click on the component
Manually update status to “operational”
Verify status change reflects immediately

Monitoring Limits & Performance

Current Limits

Plan	Stacks	Components/Stack	Custom Monitors	Update Frequency
Free	3	10	5	5 minutes
Starter	10	20	25	1 minute
Professional	50	50	100	30 seconds
Enterprise	Unlimited	Unlimited	Unlimited	30 seconds
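When planning a deployment, it can help to check a proposed layout against these limits before building it. The sketch below copies the numbers from the table above. `PLAN_LIMITS` and `exceededLimits` are hypothetical names, and `Infinity` stands in for "Unlimited".

```javascript
// Limits copied from the table above; Infinity stands in for "Unlimited".
const PLAN_LIMITS = {
  Free: { stacks: 3, componentsPerStack: 10, customMonitors: 5 },
  Starter: { stacks: 10, componentsPerStack: 20, customMonitors: 25 },
  Professional: { stacks: 50, componentsPerStack: 50, customMonitors: 100 },
  Enterprise: { stacks: Infinity, componentsPerStack: Infinity, customMonitors: Infinity },
};

// Hypothetical helper: list which limits a planned setup would exceed.
function exceededLimits(plan, usage) {
  const limits = PLAN_LIMITS[plan];
  if (!limits) throw new Error(`Unknown plan: ${plan}`);
  return Object.keys(limits).filter((key) => usage[key] > limits[key]);
}

console.log(exceededLimits("Free", { stacks: 4, componentsPerStack: 10, customMonitors: 2 }));    // [ 'stacks' ]
console.log(exceededLimits("Starter", { stacks: 4, componentsPerStack: 10, customMonitors: 2 })); // []
```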

Optimizing Performance

Tips for Large Deployments:

Use Stack organization to group related services
Limit components per Stack to 20-30 for best dashboard performance
Use appropriate check intervals (not everything needs 30s checks)
Archive old/unused Stacks
Consolidate duplicate monitors

Next Steps

Customer Dashboard

Explore all dashboard features and shortcuts

MSP Setup

Set up multi-client monitoring

Stacks Concept

Deep dive into Stack organization

Notifications

Advanced notification configuration

Effective monitoring requires planning, appropriate tooling, and responsive workflows. Use this guide to build a robust monitoring strategy that keeps your services reliable and your team informed.