AI-powered Site Reliability Engineering for GitHub Actions
An intelligent agent that monitors GitHub Actions workflows, analyzes failures, and takes automated remediation actions using the GitHub Copilot SDK.
The GitHub SRE Agent is an autonomous AI agent that acts as your on-call Site Reliability Engineer for GitHub Actions. When a workflow fails, the agent:
- Analyzes the failure - Fetches logs, checks GitHub status, searches for known issues
- Makes intelligent decisions - Determines if it's a transient failure (retry) or a code bug (create issue)
- Takes action automatically - Retries workflows, creates detailed issues, or skips if appropriate
- Tracks resolution - When a tracked workflow succeeds, automatically closes the related issue
| Capability | Description |
|---|---|
| GitHub MCP Integration | Uses GitHub's Model Context Protocol for Actions, Issues, and Repository operations |
| Exa AI Web Search | Searches the web for error messages, Stack Overflow solutions, and documentation |
| Workflow Tracking | Tracks failed workflows and auto-closes issues when they're fixed |
| Persistent Memory | Maintains notes and context across workflow runs |
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Workflow Failure Flow                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   1. Workflow Fails ──▶ 2. Agent Analyzes ──▶ 3. Decision                   │
│         │                       │                  │                        │
│         │                ┌──────┴──────┐    ┌──────┴──────┐                 │
│         │                │ • Fetch logs│    │ • RETRY     │                 │
│         │                │ • Check GH  │    │ • CREATE    │                 │
│         │                │   status    │    │   ISSUE     │                 │
│         │                │ • Search web│    │ • SKIP      │                 │
│         │                └─────────────┘    └──────┬──────┘                 │
│         │                                          │                        │
│         │                              ┌───────────┴───────────┐            │
│         │                              ▼                       ▼            │
│         │                       [Create Issue]          [Retry Workflow]    │
│         │                              │                       │            │
│         │                              ▼                       │            │
│         │                      [Track Workflow] ◀──────────────┘            │
│         │                              │                                    │
└─────────┼──────────────────────────────┼────────────────────────────────────┘
          │                              │
          ▼                              ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Workflow Success Flow                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Workflow Succeeds  ──▶  Check if Tracked  ──▶  Yes  ──▶  Close Issue      │
│                                  │                              │           │
│                                  ▼                              ▼           │
│                                  No                     Untrack Workflow    │
│                                  │                              │           │
│                                  ▼                              ▼           │
│                                [Skip]                 [Add Comment & Close] │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
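The two flows above can be sketched with a small in-memory tracker. This is only an illustration: `InMemoryWorkflowTracker`, `onWorkflowSuccess`, and the repo-plus-workflow key are hypothetical names and choices, and the project's own `WorkflowTracker` (which persists to `data/`) may be structured differently.

```typescript
// Hypothetical sketch of the tracking logic; not the project's WorkflowTracker.
type TrackedEntry = { issueNumber: number; trackedAt: Date };

class InMemoryWorkflowTracker {
  private entries = new Map<string, TrackedEntry>();

  // Assumption: a workflow is identified by repository plus workflow name.
  private key(repo: string, workflow: string): string {
    return `${repo}#${workflow}`;
  }

  // Failure flow: remember which issue was opened for this workflow.
  track(repo: string, workflow: string, issueNumber: number): void {
    this.entries.set(this.key(repo, workflow), { issueNumber, trackedAt: new Date() });
  }

  // Success flow: return the issue to close and untrack, or undefined if not tracked.
  resolve(repo: string, workflow: string): number | undefined {
    const k = this.key(repo, workflow);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    this.entries.delete(k);
    return entry.issueNumber;
  }
}

type SuccessOutcome =
  | { action: "skip" }
  | { action: "close_issue"; issueNumber: number };

function onWorkflowSuccess(
  tracker: InMemoryWorkflowTracker,
  repo: string,
  workflow: string,
): SuccessOutcome {
  const issueNumber = tracker.resolve(repo, workflow);
  return issueNumber === undefined
    ? { action: "skip" }
    : { action: "close_issue", issueNumber };
}
```

A second success for the same workflow resolves to `skip`, because the entry is removed when the issue is closed.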
┌──────────────────────────────────────────────────────────────────────────────┐
│                               GitHub SRE Agent                               │
│                                                                              │
│ ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────────┐      │
│ │ Webhook Server  │────▶│  Event Handler  │────▶│   SRE Agent Core    │      │
│ │     (Hono)      │     │                 │     │   (Copilot SDK)     │      │
│ └─────────────────┘     └─────────────────┘     └───────┬─────────────┘      │
│                                                         │                    │
│                       ┌─────────────────────────────────┼─────────────┐      │
│                       │  MCP Servers                    │             │      │
│                       │     ┌──────────────┐     ┌──────────────┐     │      │
│                       │     │  GitHub MCP  │     │  Exa AI MCP  │     │      │
│                       │     │  • Actions   │     │  • Web Search│     │      │
│                       │     │  • Issues    │     │  • Research  │     │      │
│                       │     │  • Repos     │     │  • Crawling  │     │      │
│                       │     └──────────────┘     └──────────────┘     │      │
│                       └───────────────────────────────────────────────┘      │
│                                                         │                    │
│         ┌───────────────────────┬───────────────────────┼─────────────┐      │
│         │     Custom Tools      │                       │             │      │
│         ▼                       ▼                       ▼             ▼      │
│  ┌──────────────┐        ┌──────────────┐         ┌────────────┐ ┌─────────┐ │
│  │ check_github │        │ manage_notes │         │ track_     │ │Workflow │ │
│  │ _status      │        │              │         │  workflow  │ │ Tracker │ │
│  └──────────────┘        └──────────────┘         └────────────┘ └─────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
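To give a feel for what a custom tool such as `check_github_status` might do, here is a minimal sketch. It assumes the public GitHub status page endpoint `https://www.githubstatus.com/api/v2/status.json`, whose response carries a `status.indicator` and `status.description`. The function names are hypothetical, and the project's `StatusService` may query a different endpoint or return a richer result.

```typescript
// Subset of the status page response used by this sketch.
interface StatusPayload {
  status: { indicator: string; description: string };
}

interface StatusSummary {
  degraded: boolean;
  summary: string;
}

// Pure helper: an indicator of "none" means no incident is reported.
function summarizeGitHubStatus(payload: StatusPayload): StatusSummary {
  const { indicator, description } = payload.status;
  return {
    degraded: indicator !== "none",
    summary: `${description} (indicator: ${indicator})`,
  };
}

// Network wrapper; relies on the global fetch available in Node.js 18+.
async function checkGitHubStatus(): Promise<StatusSummary> {
  const res = await fetch("https://www.githubstatus.com/api/v2/status.json");
  if (!res.ok) {
    throw new Error(`Status request failed with HTTP ${res.status}`);
  }
  return summarizeGitHubStatus((await res.json()) as StatusPayload);
}
```

Keeping the parsing separate from the network call makes the decision input (degraded or not) easy to test without hitting the status page.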
- Node.js 18.0.0 or higher
- GitHub Copilot CLI installed and authenticated (gh copilot)
- ngrok (for local development)
# Clone the repository
git clone https://github.com/htekdev/github-sre-agent.git
cd github-sre-agent
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
Edit .env with your credentials:
# Server
PORT=3000
NODE_ENV=development
# GitHub (webhook secret only - auth handled by Copilot SDK)
GITHUB_WEBHOOK_SECRET=your_webhook_secret
# Exa AI (optional - enables web search)
EXA_API_KEY=your_exa_api_key
# Copilot SDK
COPILOT_MODEL=Claude Sonnet 4
# Logging
LOG_LEVEL=info
Note: No GITHUB_TOKEN needed! The Copilot SDK handles authentication automatically via GitHub MCP.
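As a rough sketch of how these variables could be read at startup, the helper below validates them in one place. `AgentEnvConfig` and `loadEnvConfig` are hypothetical names, and treating the sample values above as defaults is an assumption; the project's `src/config/` module may behave differently.

```typescript
// Hypothetical shape for the settings shown in the .env sample.
interface AgentEnvConfig {
  port: number;
  nodeEnv: string;
  webhookSecret: string;
  exaApiKey: string | undefined; // optional: web search is disabled without it
  copilotModel: string;
  logLevel: string;
}

function loadEnvConfig(env: Record<string, string | undefined>): AgentEnvConfig {
  const webhookSecret = env.GITHUB_WEBHOOK_SECRET;
  if (!webhookSecret) {
    throw new Error("GITHUB_WEBHOOK_SECRET is required");
  }

  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    port,
    nodeEnv: env.NODE_ENV ?? "development",
    webhookSecret,
    exaApiKey: env.EXA_API_KEY || undefined,
    // Assumption: fall back to the sample values when a variable is unset.
    copilotModel: env.COPILOT_MODEL ?? "Claude Sonnet 4",
    logLevel: env.LOG_LEVEL ?? "info",
  };
}
```

In a Node.js process this would typically be called as `loadEnvConfig(process.env)` after the `.env` file has been loaded.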
# Start the development server
npm run dev
# In another terminal, start ngrok tunnel
npx ngrok http 3000
Then configure your GitHub repository webhook:
- Go to Settings → Webhooks → Add webhook
- Set Payload URL to your ngrok URL + /webhook
- Set Content type to application/json
- Enter your Secret
- Select Let me select individual events → ✅ Workflow runs
- Click Add webhook
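With only Workflow runs selected, the server receives `workflow_run` deliveries, with the event name in the `X-GitHub-Event` header. A minimal sketch of routing such a delivery into the failure or success flow is shown below. The payload fields are a subset of GitHub's `workflow_run` event; `routeWorkflowRun` and the `Route` names are hypothetical, and ignoring conclusions other than `failure` and `success` is an assumption about the agent's behavior.

```typescript
// Subset of GitHub's `workflow_run` webhook payload used by this sketch.
interface WorkflowRunEvent {
  action: string; // "requested" | "in_progress" | "completed"
  workflow_run: {
    id: number;
    name: string;
    head_branch: string | null;
    conclusion: string | null; // null until the run completes
    run_attempt: number;
  };
  repository: { full_name: string };
}

type Route = "failure_flow" | "success_flow" | "ignore";

// eventName is the value of the X-GitHub-Event request header.
function routeWorkflowRun(eventName: string, payload: WorkflowRunEvent): Route {
  if (eventName !== "workflow_run") return "ignore";
  if (payload.action !== "completed") return "ignore";

  switch (payload.workflow_run.conclusion) {
    case "failure":
      return "failure_flow";
    case "success":
      return "success_flow";
    default:
      // Assumption: cancelled, skipped, and similar conclusions are not handled.
      return "ignore";
  }
}
```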
Create .github/sre-agent.yml in your repository to customize the agent's behavior:
version: 1
enabled: true
# Custom instructions for the AI agent
instructions: |
  - This repo uses pnpm, not npm
  - Always check if tests pass before suggesting retry
  - Create issues with label "ci-failure" for tracking
# Action-specific settings
actions:
  retry:
    enabled: true
    maxAttempts: 3
  createIssue:
    enabled: true
    labels:
      - sre-agent
      - automated
      - ci-failure
    assignees: []
# Only monitor specific workflows (empty = all)
workflows: []
# Ignore patterns
ignore:
  conclusions:
    - cancelled # Don't process cancelled workflows
  branches:
    - "dependabot/*" # Ignore dependabot branches
github-sre-agent/
├── src/
│ ├── index.ts # Entry point
│ ├── config/ # Configuration management
│ ├── server/ # Hono web server
│ │ └── routes/ # API routes
│ ├── agent/ # SRE Agent implementation
│ │ ├── SREAgent.ts # Main agent with MCP config
│ │ └── tools/ # Custom tools (status, notes, tracking)
│ ├── services/ # Service integrations
│ │ ├── StatusService.ts # GitHub status checker
│ │ ├── NoteStore.ts # Notes persistence
│ │ └── WorkflowTracker.ts # Workflow tracking for auto-close
│ ├── handlers/ # Event handlers
│ └── types/ # TypeScript types
├── data/ # Local storage (notes, tracked workflows)
├── prompts/ # Prompt files for agent operations
└── package.json
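To illustrate how the `enabled`, `workflows`, and `ignore` settings from `.github/sre-agent.yml` might be applied to an incoming run, here is a small sketch. `shouldProcess` and `globToRegExp` are hypothetical helpers, and the glob semantics (a `*` matching any characters, including `/`) are an assumption rather than the project's documented behavior.

```typescript
// Subset of the .github/sre-agent.yml settings relevant to filtering.
interface SreAgentConfig {
  enabled: boolean;
  workflows: string[]; // empty = monitor all workflows
  ignore: { conclusions: string[]; branches: string[] };
}

interface RunInfo {
  workflow: string;
  branch: string;
  conclusion: string;
}

// Assumption: `*` matches any run of characters; everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function shouldProcess(config: SreAgentConfig, run: RunInfo): boolean {
  if (!config.enabled) return false;
  if (config.workflows.length > 0 && !config.workflows.includes(run.workflow)) {
    return false;
  }
  if (config.ignore.conclusions.includes(run.conclusion)) return false;
  if (config.ignore.branches.some((pattern) => globToRegExp(pattern).test(run.branch))) {
    return false;
  }
  return true;
}
```

With the sample configuration, a failed run on `main` is processed, while a cancelled run or any run on a `dependabot/...` branch is not.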
npm run dev # Start development server with hot reload
npm run build # Build for production
npm run start # Start production server
Use the included test workflows:
- CI Build (.github/workflows/test.yml) - Simulates a failing/passing CI
- Flaky Test (.github/workflows/flaky-test.yml) - Succeeds on 3rd attempt
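The Flaky Test workflow is a natural fit for the `actions.retry.maxAttempts: 3` setting. The sketch below shows one way such a cap could be enforced, assuming the agent compares it against the 1-based `run_attempt` field of the `workflow_run` payload. In the real agent the retry decision is made by the AI model; `decideAfterTransientFailure` is a hypothetical helper, and falling back to creating an issue once the budget is spent is an assumption.

```typescript
type Decision = "retry" | "create_issue";

// runAttempt is 1 for the first run, 2 after the first retry, and so on.
function decideAfterTransientFailure(
  runAttempt: number,
  maxAttempts: number,
  retryEnabled: boolean = true,
): Decision {
  if (!retryEnabled) return "create_issue";
  // Retry only while another attempt still fits within the budget.
  return runAttempt < maxAttempts ? "retry" : "create_issue";
}
```

Under this rule, failures on attempts 1 and 2 are retried, which lets a workflow that passes on its third attempt succeed without an issue being opened.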
Reset experiment state:
# Use the reset prompt with Copilot
# Or manually delete issues and clear data/
- No Token Storage: GitHub authentication handled by Copilot SDK OAuth
- Webhook Signature Verification: All webhooks verified using HMAC-SHA256
- MCP Security: GitHub MCP uses Copilot's authenticated session
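For reference, HMAC-SHA256 webhook verification generally looks like the sketch below. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=` followed by the hex digest of the raw request body, keyed with the webhook secret. `verifyGitHubSignature` is a hypothetical helper name; the project's own verification code may be organized differently.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// rawBody must be the exact bytes GitHub sent, not a re-serialized object.
function verifyGitHubSignature(
  secret: string,
  rawBody: string,
  signatureHeader: string | undefined,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) {
    return false;
  }

  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");

  const expectedBuf = Buffer.from(expected, "utf8");
  const receivedBuf = Buffer.from(signatureHeader, "utf8");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expectedBuf.length === receivedBuf.length &&
    timingSafeEqual(expectedBuf, receivedBuf)
  );
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.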
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using GitHub Copilot SDK and GitHub MCP