GitHub Availability Report: May 2024
In May, we experienced one incident that resulted in degraded performance across GitHub services.
In May, we experienced one incident that resulted in significant degraded performance across GitHub services.
May 21 11:40 UTC (lasting 7 hours 26 minutes)
On May 21, various GitHub services experienced latency due to a configuration change in an upstream cloud provider. GitHub Copilot Chat experienced p50 latency of up to 2.5s and p95 latency of up to 6s, GitHub Actions was degraded with 20 60 minute delays for workflow run updates, and GitHub Enterprise Importer customers experienced longer migration run times due to Actions delays.
Actions users experienced their runs stuck in stale states for some time even if the underlying runner was completed successfully, and Copilot Chat users experienced delays in receiving responses to their requests. Billing related metrics for budget notifications and UI reporting were also delayed, leading to outdated billing details. No data was lost and reporting was restored after mitigation.
We determined that the issue was caused by a scheduled operating system upgrade that resulted in unintended and uneven distribution of traffic within the cluster. A short- term strategy of increasing the number of network routes between our data centers and cloud provider helped mitigate the incident.
To prevent recurrence of the incidents, we have identified and are fixing gaps in our monitoring and alerting for load thresholds to improve both detection and mitigation time.
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.
Tags:
Written by
Related posts
Survey: The AI wave continues to grow on software development teams
We surveyed 2,000 people on software development teams at enterprises in the U.S., Brazil, India, and Germany about the use, experience, and expectations around generative AI tools in software development.
GitHub Availability Report: July 2024
In July, we experienced four incidents that resulted in degraded performance across GitHub services.
Found means fixed: Secure code more than three times faster with Copilot Autofix
With Copilot Autofix, developers and security teams can keep new vulnerabilities out of code and confidently remediate their backlog security debt.

