Operations Runbook
This guide covers operational procedures for running Armada in production environments.
Pre-Deployment Checklist
Environment Requirements
- Jira Cloud instance (Jira Software, Jira Service Management)
- Node.js 18+ (for local development)
- Docker (for containerized deployment)
- Atlassian Forge CLI installed
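
The requirements above can be checked from a shell before a deployment. The following is a minimal sketch, not part of Armada: the function names `node_major_ok` and `check_prereqs` are hypothetical, and it assumes `node --version` prints a string of the form `v18.17.0`.

```shell
# Succeeds when a Node.js version string such as "v18.17.0" has a major version >= 18.
node_major_ok() {
  version="${1#v}"        # drop the leading "v"
  major="${version%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # not a plain number
  esac
  [ "$major" -ge 18 ]
}

# Reports required tools that are not on PATH; returns non-zero if anything is missing.
check_prereqs() {
  missing=0
  for tool in node docker forge; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  if command -v node >/dev/null 2>&1 && ! node_major_ok "$(node --version)"; then
    echo "Node.js 18+ required, found $(node --version)"
    missing=1
  fi
  return "$missing"
}
```

Running `check_prereqs && echo "environment OK"` prints one line per missing tool, or the confirmation when everything is present.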
Configuration Verification
    # Verify Forge credentials
    forge settings list

    # Check app installation status
    forge app list

    # Verify environment configuration
    forge variables list --environment production

Security Requirements
- Jira admin permissions for app installation
- API token generated from Atlassian account
- Storage quotas reviewed and allocated
- User permissions configured appropriately
Deployment Procedures
Development Environment
    # Deploy to development
    forge deploy --environment development

    # Verify deployment
    forge logs --environment development

    # Install on test site
    forge install --upgrade --site your-site.atlassian.net --product Jira

Production Deployment
- Pre-deployment validation

      npm run validate  # Run typecheck, lint, and tests
      npm run build     # Build all resources

- Deploy to production

      forge deploy --environment production

- Post-deployment verification

      # Check app status
      forge app status

      # View production logs
      forge logs --environment production

- Update Marketplace listing (if version changed)
  - Increment version in manifest.yml
  - Update release notes
  - Publish to marketplace
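
The validate, build, deploy, and log-check steps can be chained so that a failure stops the sequence before anything reaches production. This is a sketch under stated assumptions: `run_step`, `deploy_production`, and the `DRY_RUN` variable are hypothetical names introduced here, and only commands already listed in this runbook are used.

```shell
# Runs one command, or only prints it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs the production steps in order; "&&" stops the chain at the first failure.
deploy_production() {
  run_step npm run validate &&
  run_step npm run build &&
  run_step forge deploy --environment production &&
  run_step forge logs --environment production
}
```

Calling `DRY_RUN=1 deploy_production` prints the four commands without running them, which is a cheap way to review the sequence before a maintenance window.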
Monitoring & Observability
Key Metrics to Track
| Metric | Description | Alert Threshold |
|---|---|---|
| Campaign success rate | % of campaigns completing successfully | < 95% |
| Average campaign duration | Time from launch to completion | > 72 hours |
| Auto-nudge effectiveness | % of nudged issues updated | < 50% |
| API error rate | % of Jira API calls that fail | > 1% |
| Storage utilization | % of campaign state storage in use | > 80% |
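
The thresholds in the table can be evaluated with a small helper once the metric values are available. This is a sketch only: `check_threshold` is a hypothetical name, and how the metric values are collected is outside its scope. `awk` handles the comparison because the values may be fractional.

```shell
# Prints "ALERT" when a metric crosses its threshold, otherwise "OK".
# $1 = metric value, $2 = comparison ("lt" or "gt"), $3 = threshold
check_threshold() {
  awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
    breach = (op == "lt") ? (v < t) : (v > t)
    print (breach ? "ALERT" : "OK")
  }'
}

check_threshold 93.5 lt 95   # campaign success rate (%): prints ALERT
check_threshold 48 gt 72     # average campaign duration (hours): prints OK
check_threshold 1.4 gt 1     # API error rate (%): prints ALERT
```

The comparisons are strict, so a value exactly at the threshold (for example a 95% success rate) does not raise an alert, matching the `<` and `>` used in the table.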
Log Analysis
    # View recent logs
    forge logs --environment production --tail 100

    # Filter by severity
    forge logs --environment production --filter error

    # Search for a specific campaign
    forge logs --environment production --search PROJ-123

Health Checks
    # Check app health
    curl https://your-site.atlassian.net/rest/forge-Health/1.0/health

    # Expected response
    {
      "status": "UP",
      "components": {
        "storage": "UP",
        "jiraApi": "UP",
        "compassApi": "UP"
      }
    }

Incident Response
High Error Rate
- Immediate actions

      # View detailed errors
      forge logs --environment production --filter error --tail 500

      # Check Jira API status
      # Verify app permissions

- Common causes
  - Jira API rate limiting → Implement exponential backoff
  - Permission changes → Re-install app
  - Storage quota exceeded → Archive old campaigns
- Resolution steps
  - Clear campaign state cache
  - Restart affected workflows
  - Notify affected users
Storage Issues
    # Check storage utilization
    # Navigate to Fleet > Settings in the Armada panel

    # Archive old campaigns
    # 1. Export campaign data to JSON
    # 2. Delete old campaign state
    # 3. Keep parent issue for audit trail

Permission Problems
- Re-install the app:

      forge install --upgrade --site your-site.atlassian.net --product Jira

- Verify that the scopes in manifest.yml match the required permissions
- Check that the user has appropriate Jira admin rights
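
The scope verification step can be partly automated by searching the manifest for each scope the app is expected to declare. This is a sketch only: `missing_scopes` is a hypothetical helper, and the scope names in the usage note are placeholders, not Armada's actual required scopes. It assumes each scope sits on its own YAML list line with no trailing comment.

```shell
# Prints every required scope that is absent from the manifest.
# $1 = path to manifest.yml, remaining arguments = required scopes
missing_scopes() {
  manifest="$1"
  shift
  for scope in "$@"; do
    # Match a YAML list entry such as "    - read:jira-work" (optionally quoted).
    if ! grep -Eq "^[[:space:]]*-[[:space:]]*['\"]?${scope}['\"]?[[:space:]]*$" "$manifest"; then
      echo "$scope"
    fi
  done
}
```

For example, `missing_scopes manifest.yml read:jira-work write:jira-work storage:app` prints nothing when all three scopes are declared, and one line per scope that is not.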
Backup & Recovery
Campaign Data Backup
Campaign data is stored in:
- Jira issue properties (campaign state)
- Forge Storage (fleet config, templates)

    # Export fleet configuration
    # Available in Fleet > Settings > Export

    # Export campaign data
    # Available in Campaign Status > Export

Recovery Procedures
- Fleet configuration loss
  - Import from a previously exported JSON file
  - Recreate teams and settings from documentation
- Campaign state corruption
  - Delete corrupted state via the Armada UI
  - Re-sync by fetching current issue status
  - Lost history cannot be recovered (stateless by design)
Capacity Planning
Storage Limits
| Component | Limit | Scaling Strategy |
|---|---|---|
| Campaign children | 400 per campaign | Use sub-campaigns |
| Fleet teams | 50 per installation | Multiple fleets |
| Mission templates | 100 per installation | Archive old templates |
| Storage per property | 32 KB | Chunked storage (enabled) |
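
The limits in the table translate into simple ceiling divisions when sizing a campaign. The helper names below (`ceil_div`, `sub_campaigns_needed`, `chunks_needed`) are hypothetical, and the sketch assumes 32 KB means 32768 bytes.

```shell
# Integer ceiling division: containers of size $2 needed to hold $1 items.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Campaigns needed for $1 child issues at 400 children per campaign.
sub_campaigns_needed() { ceil_div "$1" 400; }

# Chunks needed to store $1 bytes at 32768 bytes per property (assumed value of 32 KB).
chunks_needed() { ceil_div "$1" 32768; }
```

For instance, `sub_campaigns_needed 1000` prints `3`, and `chunks_needed 100000` prints `4`.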
Rate Limiting
- Jira REST API: 100 requests/minute (adjustable)
- Batch size: 10 issues per request
- Retry with exponential backoff: 1s, 2s, 4s, 8s
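
For operational scripts that call the API, the retry schedule above can be sketched as a wrapper around any command. This is an illustration, not Armada's own retry logic: `retry_with_backoff` and the `SLEEP_CMD` override are hypothetical names introduced here.

```shell
# Runs a command, retrying after 1s, 2s, 4s, and 8s if it fails.
# SLEEP_CMD can be set (for example to "true") to skip real waiting in tests.
retry_with_backoff() {
  for delay in 1 2 4 8; do
    "$@" && return 0
    echo "attempt failed, retrying in ${delay}s" >&2
    ${SLEEP_CMD:-sleep} "$delay"
  done
  "$@"  # final attempt; its exit status becomes the function's result
}
```

A call such as `retry_with_backoff curl -fsS https://your-site.atlassian.net/rest/forge-Health/1.0/health` makes up to five attempts in total. At 10 issues per request, a full 400-issue campaign needs 40 requests, which fits within the 100 requests/minute budget.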
Maintenance Windows
Recommended Schedule
- Weekly: Log review and cleanup
- Monthly: Capacity assessment
- Quarterly: Security audit and updates
Update Procedures
- Review changelog for breaking changes
- Test in development environment
- Schedule maintenance window
- Deploy during low-usage period
- Verify all features are working
- Monitor for 24 hours post-update
Support Contacts
| Role | Contact | Response Time |
|---|---|---|
| Technical Support | [email protected] | 24 hours |
| Security Issues | [email protected] | 4 hours |
| Enterprise Support | [email protected] | 1 hour |
| Sales | [email protected] | 24 hours |
Escalation Path
- Level 1: Check documentation and logs
- Level 2: Submit support ticket
- Level 3: Contact account manager
- Level 4: Emergency escalation to engineering lead