Introduction
This guide covers everything you need to know about monitoring your services effectively with StatusStack. Whether you’re tracking third-party services, monitoring your own infrastructure, or managing status for multiple clients, this guide will help you build a robust monitoring strategy.
Getting Started with Monitoring
The Monitoring Workflow
Plan Your Stack Structure
Decide how to organize your monitoring (by environment, service type, or client)
Stack Organization Strategies
By Environment
Separate your monitoring by deployment environment:
- Clear separation of concerns
- Different notification rules per environment
- Easy to identify which environment has issues
- Prevents mixing prod and dev alerts
By Service Type
Group services by functional area:
- Logical grouping by function
- Easy to see which service layer is affected
- Helps identify dependency chains
- Better for incident response
For MSP Clients
Create client-specific monitoring:
- Complete client isolation
- Client-specific notification rules
- Individual client status pages
- Billing per client
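The three strategies above amount to grouping the same service inventory by a different key. The sketch below illustrates that idea only; the `env`, `kind`, and `client` fields and the sample services are hypothetical and are not StatusStack fields or API objects.

```python
from collections import defaultdict

# Hypothetical service inventory; field names are illustrative only.
SERVICES = [
    {"name": "api", "env": "production", "kind": "backend", "client": "acme"},
    {"name": "web", "env": "production", "kind": "frontend", "client": "acme"},
    {"name": "api-staging", "env": "staging", "kind": "backend", "client": "acme"},
    {"name": "shop", "env": "production", "kind": "frontend", "client": "globex"},
]


def plan_stacks(services, key):
    """Group service names into candidate Stacks keyed by the given field."""
    stacks = defaultdict(list)
    for service in services:
        stacks[service[key]].append(service["name"])
    return dict(stacks)


by_environment = plan_stacks(SERVICES, "env")
by_service_type = plan_stacks(SERVICES, "kind")
by_client = plan_stacks(SERVICES, "client")
```

Comparing the three results for your own inventory is a quick way to see which layout keeps alerts and ownership cleanest before you create any Stacks.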
Adding Third-Party Services
Choosing Services to Monitor
StatusStack supports 5,200+ sources. Add services that your infrastructure depends on:
Cloud Infrastructure:
- AWS, Google Cloud, Azure, DigitalOcean
- Monitor core services: compute, storage, networking
CDN and DNS:
- Cloudflare, Fastly, Akamai
- Track CDN and DNS availability
Source Control and CI/CD:
- GitHub, GitLab, Bitbucket
- Monitor repository and CI/CD availability
Payments:
- Stripe, PayPal, Square
- Ensure payment infrastructure is operational
Communication:
- SendGrid, Twilio, Slack, Discord
- Track email, SMS, and chat services
Adding Source Components
Select Components
Choose which components of that service to monitor:
- For AWS: EC2, S3, Lambda, RDS, etc.
- For Cloudflare: CDN, DNS, etc.
- Status updates every 1-5 minutes
- No configuration required
- Reflects official status page data
Custom Website Monitoring
When to Use Custom Monitors
Use custom monitors for:
- Your own websites and APIs
- Internal services without public status pages
- Services not in our third-party integrations
- Specific endpoints requiring health checks
Creating Effective Monitors
Monitor Configuration Example:
Best Practices for Custom Monitors
Check Interval Selection
30 seconds: Critical production services
60 seconds: Standard production services (recommended)
2-5 minutes: Non-critical or staging services
Considerations:
- More frequent checks = faster detection but higher costs
- Balance between responsiveness and efficiency
- Most services don’t need sub-minute monitoring
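To make the trade-off concrete, the sketch below counts how many checks a single monitor performs over 30 days at each interval. It is plain arithmetic; how check volume maps to cost depends on your plan and is not modeled here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_month(interval_seconds, days=30):
    """Number of checks one monitor runs over the given number of days."""
    return (days * SECONDS_PER_DAY) // interval_seconds


for label, interval in [("30 seconds", 30), ("60 seconds", 60), ("5 minutes", 300)]:
    print(f"{label}: {checks_per_month(interval):,} checks per 30 days")
```

A 30-second interval works out to 86,400 checks per monitor over 30 days, compared with 43,200 at 60 seconds and 8,640 at 5 minutes.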
Timeout Configuration
Rule of Thumb: Set timeout to expected response time + 50% buffer
Examples:
- API typically responds in 200ms → Set 300ms timeout
- Database query takes 2s → Set 3s timeout
- Long-running endpoint (10s) → Set 15s timeout
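The rule of thumb is a one-line calculation. The helper below is a minimal sketch; the function name and the `buffer` parameter are illustrative, not part of StatusStack.

```python
def recommended_timeout(expected_seconds, buffer=0.5):
    """Expected response time plus a proportional buffer (50% by default)."""
    return expected_seconds * (1 + buffer)


database_timeout = recommended_timeout(2)       # 2s query
long_running_timeout = recommended_timeout(10)  # 10s endpoint
```

Measure the expected response time from real traffic (for example a high percentile rather than the average) before applying the buffer.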
Alert Thresholds
1 failure: Ultra-critical services (use sparingly)
3 failures: Production services (recommended)
5 failures: Staging or dev environments
Why: Reduces false alarms from transient network issues
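The sketch below shows one way a failure threshold can work, assuming the threshold counts consecutive failed checks and that a single passing check resets the count. The class is hypothetical and only illustrates the idea; it is not StatusStack's implementation.

```python
class FailureTracker:
    """Fire an alert once the failure count reaches the threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_passed):
        """Record one check result; return True when an alert should fire."""
        if check_passed:
            # Any success resets the streak, so isolated blips never alert.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, at the moment the threshold is reached.
        return self.consecutive_failures == self.threshold


tracker = FailureTracker(threshold=3)
results = [False, True, False, False, False, False]
alerts = [tracker.record(passed) for passed in results]
```

In this run the first isolated failure is ignored, and the alert fires only on the third failure in a row.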
Expected Content Matching
Use When:
- You need to verify response content, not just HTTP status
- API returns 200 but with error payload
- Checking for specific success indicators
Regex Support: Use regex patterns for flexible matching
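As an illustration of why content matching matters, the sketch below treats a response as healthy only when the status is 200 and the body matches a pattern. The JSON shape and the pattern are assumptions for the example; adapt them to what your endpoint actually returns.

```python
import re

# Hypothetical success indicator: a JSON body containing "status": "ok" or "healthy".
SUCCESS_PATTERN = r'"status"\s*:\s*"(ok|healthy)"'


def content_ok(status_code, body, pattern=SUCCESS_PATTERN):
    """Healthy only if the status is 200 AND the body matches the pattern."""
    return status_code == 200 and re.search(pattern, body) is not None


healthy = content_ok(200, '{"status": "ok"}')
masked_error = content_ok(200, '{"status": "error", "detail": "db down"}')
```

The second call is the case a status-only check would miss: the server answers 200 but reports an error in the payload.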
Authentication Headers
Protected Endpoints:
Security:
- Use dedicated health check tokens (not admin tokens)
- Rotate tokens regularly
- Limit token permissions to health check endpoint only
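For reference, a health check request with an authentication header might look like the sketch below. The URL, the token value, and the `Bearer` scheme are placeholders; use whatever header your endpoint expects. The request is only constructed here, not sent.

```python
import urllib.request


def build_health_request(url, token):
    """Build a GET request that carries a dedicated health check token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # assumed scheme
        method="GET",
    )


# Placeholder URL and token for illustration only.
request = build_health_request("https://api.example.com/health", "HEALTH_CHECK_TOKEN")
```

Passing the token in as an argument keeps it out of source code, so it can be loaded from a secret store and rotated without changing the check itself.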
SSL Certificate Monitoring
Enable SSL Checks:
What It Monitors:
- Certificate expiration (warns 7 days before)
- Certificate validity
- Trust chain verification
Why:
- Prevent unexpected certificate expirations
- Catch SSL configuration issues
- Maintain security compliance
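The expiration warning comes down to comparing the certificate's `notAfter` date with the current time. The sketch below uses Python's standard `ssl.cert_time_to_seconds` to parse the date format returned by `SSLSocket.getpeercert()`; the function names are illustrative, and the 7-day window is the one stated above.

```python
import ssl

WARNING_DAYS = 7  # warning window described above
SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now_epoch):
    """Days between now and the certificate's notAfter timestamp."""
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now_epoch) / SECONDS_PER_DAY


def should_warn(not_after, now_epoch, warning_days=WARNING_DAYS):
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now_epoch) <= warning_days
```

In practice `now_epoch` would be `time.time()` and `not_after` would come from the `notAfter` field of the peer certificate.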
Health Check Endpoint Design
If you’re creating custom health check endpoints, follow these patterns:
Simple Health Check:
- 200 OK: All systems operational
- 503 Service Unavailable: System degraded or down
- 5xx errors: Treat as outage
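The status code convention above can be sketched with Python's standard `http.server`. The `/health` path, the `run_checks` function, and the response body shape are assumptions for illustration; substitute real dependency probes and your own framework.

```python
import json
from http.server import BaseHTTPRequestHandler


def run_checks():
    """Hypothetical dependency checks; replace with real probes."""
    return {"database": True, "cache": True}


def health_response(checks):
    """Map check results to a status code and a JSON body."""
    healthy = all(checks.values())
    status_code = 200 if healthy else 503
    body = {"status": "ok" if healthy else "degraded", "checks": checks}
    return status_code, json.dumps(body)


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status_code, body = health_response(run_checks())
        payload = body.encode("utf-8")
        self.send_response(status_code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, serve the handler with `HTTPServer(("", 8080), HealthHandler).serve_forever()` from `http.server`. Keeping the status logic in `health_response` makes it easy to unit test without starting a server.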
Manual Component Management
When to Use Manual Components
Manual components are ideal for:
- Internal services without automated monitoring
- Third-party services not in our integrations
- Scheduled maintenance windows
- Services with manual status updates
Creating Manual Components
Configure
- Name: “Internal Database Cluster”
- Description: Optional details
- Initial Status: Set starting status
Updating Manual Components
Update status when needed:
- Open the Stack
- Click on the manual component
- Select new status (operational, degraded, outage, maintenance)
- Add optional message describing the change
- Click Update Status
- Start of scheduled maintenance: Set to “maintenance”
- Detected issue: Set to “degraded” or “outage”
- Issue resolved: Set back to “operational”
Status Monitoring Dashboard
Understanding Stack Status
Each Stack displays an overall status based on its components:

| Status | Indicator | Meaning |
|---|---|---|
| 🟢 Operational | Green | All components healthy |
| 🟡 Degraded | Yellow | Some components degraded |
| 🔴 Critical | Red | One or more outages |
| 🔵 Maintenance | Blue | Scheduled maintenance |
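
One way to picture how an overall status is derived is a simple precedence rule over component statuses. The table does not say how mixed states are resolved, so the ordering below (outage, then degraded, then maintenance, then operational) is an assumption, and the function is a sketch rather than StatusStack's actual logic.

```python
# Assumed precedence, highest first; not confirmed by the table above.
PRECEDENCE = ["outage", "degraded", "maintenance", "operational"]

STACK_LABEL = {
    "outage": "Critical",
    "degraded": "Degraded",
    "maintenance": "Maintenance",
    "operational": "Operational",
}


def stack_status(component_statuses):
    """Return the overall Stack label for a list of component statuses."""
    for status in PRECEDENCE:
        if status in component_statuses:
            return STACK_LABEL[status]
    return "Operational"  # an empty Stack has nothing unhealthy


overall = stack_status(["operational", "degraded", "operational"])
```
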
Component Details
Click any component to view:
- Current Status: Real-time status indicator
- Last Updated: When status was last checked
- Uptime: Percentage uptime (last 30 days)
- Status History: Timeline of all status changes
- Active Incidents: Ongoing issues affecting this component
Real-Time Updates
The dashboard automatically refreshes:
- Auto-Refresh: Every 30 seconds
- Manual Refresh: Click refresh button anytime
- Live Indicators: Shows when data is updating
Notification Configuration
Creating Notification Rules
Name Rule
Use descriptive names:
- ✅ “Production Outage Alerts - SMS”
- ✅ “All Stacks - Team Slack”
- ❌ “Rule 1”
Configure Channels
Add notification destinations:
- Discord webhook
- Slack webhook
- Teams webhook
- Email addresses
- Phone numbers (SMS)
- Custom webhooks
Notification Best Practices
Layered Alerting Strategy:
- SMS: Critical outages only (expensive, high urgency)
- Slack/Discord: Team awareness, all severity levels
- Email: Record keeping, escalations
- Webhooks: Integration with other tools (PagerDuty, etc.)
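The layered strategy can be expressed as a small routing function. This is a sketch under assumptions: the severity names (`info`, `degraded`, `outage`) and the choice to email only on degraded and outage events are illustrative, not StatusStack rule syntax.

```python
def channels_for(severity):
    """Pick notification channels for a severity level."""
    # Chat and webhooks get everything: team awareness and tool integration.
    channels = ["slack", "webhook"]
    if severity in ("degraded", "outage"):
        channels.append("email")  # record keeping and escalation (assumed cutoff)
    if severity == "outage":
        channels.append("sms")  # reserved for critical outages
    return channels


info_channels = channels_for("info")
outage_channels = channels_for("outage")
```

Keeping SMS behind the strictest condition limits both cost and alert fatigue, while chat channels stay a complete record of what happened.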
Monitoring Workflows
Daily Monitoring Routine
Incident Response Workflow
When you receive an alert:
Assess Severity
- Outage: Immediate action required
- Degraded: Investigate and monitor
- Info: Acknowledge and document
Investigate Root Cause
- Check component status history
- Review related services for correlation
- Check third-party status pages
- Investigate custom monitor failures
Take Action
- Fix issues if within your control
- Contact third-party support if needed
- Update manual components if applicable
Document Resolution
- Update component status when resolved
- Add notes to status history
- Document root cause and fix
Maintenance Window Management
Before Maintenance:
During Maintenance:
- Keep status updated if timeline changes
- Monitor related components for unexpected issues
- Document any problems encountered
After Maintenance:
- Return components to “operational”
- Re-enable notifications
- Verify all services healthy
- Update status page with completion
Advanced Monitoring Techniques
Dependency Mapping
Organize Stacks to reflect service dependencies:
- Understand impact when infrastructure fails
- Prioritize fixes based on dependency chain
- Communicate outage scope to users
Uptime Tracking
Monitor uptime percentages over time.
Uptime Targets:
- 99.9% (“Three Nines”): 43 minutes downtime/month
- 99.95%: 22 minutes downtime/month
- 99.99% (“Four Nines”): 4 minutes downtime/month
- View component details in dashboard
- Check uptime percentage (last 30 days)
- Review status history for downtime duration
- Export reports for stakeholders
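The downtime figures for each uptime target follow from the number of minutes in a 30-day window (43,200). The sketch below reproduces that arithmetic and the reverse conversion from observed downtime to an uptime percentage; the function names are illustrative.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(uptime_percent):
    """Allowed downtime in a 30-day window for a given uptime target."""
    return MINUTES_PER_30_DAYS * (100 - uptime_percent) / 100


def uptime_percent(downtime_minutes):
    """Uptime percentage implied by observed downtime in a 30-day window."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_30_DAYS)


for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes per 30 days")
```

Summing outage durations from a component's status history and passing the total to `uptime_percent` gives a figure you can compare against the target.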
Multi-Region Monitoring
For services deployed in multiple regions:
- Detect region-specific outages
- Track regional performance differences
- Better incident response for distributed systems
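As a sketch of how per-region results can be interpreted, the function below distinguishes a regional outage from a global one. It assumes you have one pass/fail result per region (for example from one monitor per regional endpoint); the region names are placeholders.

```python
def classify_outage(region_results):
    """Classify per-region check results.

    region_results maps a region name to True if its check passed.
    Returns (classification, sorted list of failing regions).
    """
    failing = sorted(region for region, ok in region_results.items() if not ok)
    if not failing:
        return "operational", failing
    if len(failing) == len(region_results):
        return "global outage", failing
    return "regional outage", failing


status, failing_regions = classify_outage(
    {"us-east": True, "eu-west": False, "ap-south": True}
)
```
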
Troubleshooting Common Issues
False Positives
Problem: Monitor reports outages when service is actually operational
Solutions:
- Increase Alert Threshold: Change from 1 to 3 consecutive failures
- Adjust Timeout: Give service more time to respond
- Check Network Path: Verify monitoring from correct region
- Review Expected Content: Ensure regex/content match is correct
Missing Alerts
Problem: Service went down but no notification received
Checklist:
Delayed Status Updates
Problem: Dashboard shows old status, not reflecting current state
Solutions:
- Manual Refresh: Click refresh button in dashboard
- Check Update Frequency: Third-party sources update every 1-5 minutes
- Custom Monitor Interval: Verify check interval is appropriate
- Cache Issue: Clear browser cache and reload
Component Stuck in Maintenance
Problem: Component still shows maintenance after completion
Solution:
- Click on the component
- Manually update status to “operational”
- Verify that the status change is reflected immediately
Monitoring Limits & Performance
Current Limits
| Plan | Stacks | Components/Stack | Custom Monitors | Update Frequency |
|---|---|---|---|---|
| Free | 3 | 10 | 5 | 5 minutes |
| Starter | 10 | 20 | 25 | 1 minute |
| Professional | 50 | 50 | 100 | 30 seconds |
| Enterprise | Unlimited | Unlimited | Unlimited | 30 seconds |
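
The limits table can be turned into a quick sizing check. The sketch below encodes the table as data and returns the smallest plan that fits a planned deployment; the function and key names are illustrative, and update frequency is left out for brevity.

```python
import math

UNLIMITED = math.inf

# Values copied from the limits table above, smallest plan first.
PLAN_LIMITS = {
    "Free": {"stacks": 3, "components_per_stack": 10, "custom_monitors": 5},
    "Starter": {"stacks": 10, "components_per_stack": 20, "custom_monitors": 25},
    "Professional": {"stacks": 50, "components_per_stack": 50, "custom_monitors": 100},
    "Enterprise": {
        "stacks": UNLIMITED,
        "components_per_stack": UNLIMITED,
        "custom_monitors": UNLIMITED,
    },
}


def smallest_plan(stacks, components_per_stack, custom_monitors):
    """Return the first plan whose limits cover every requirement."""
    needs = {
        "stacks": stacks,
        "components_per_stack": components_per_stack,
        "custom_monitors": custom_monitors,
    }
    for plan, limits in PLAN_LIMITS.items():
        if all(needs[key] <= limits[key] for key in needs):
            return plan
    return None


plan = smallest_plan(stacks=4, components_per_stack=10, custom_monitors=5)
```
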
Optimizing Performance
Tips for Large Deployments:
- Use Stack organization to group related services
- Limit components per Stack to 20-30 for best dashboard performance
- Use appropriate check intervals (not everything needs 30s checks)
- Archive old/unused Stacks
- Consolidate duplicate monitors
Next Steps
Customer Dashboard
Explore all dashboard features and shortcuts
MSP Setup
Set up multi-client monitoring
Stacks Concept
Deep dive into Stack organization
Notifications
Advanced notification configuration
Effective monitoring requires planning, appropriate tooling, and responsive workflows. Use this guide to build a robust monitoring strategy that keeps your services reliable and your team informed.

