GitHub April 2026 Availability: Key Incidents and Lessons Learned

Last updated: 2026-05-19 12:27:30

In April 2026, GitHub experienced ten incidents that led to degraded performance across its services. This report highlights the most significant events—including a prolonged code search outage and a brief audit log disruption—and outlines the steps GitHub is taking to enhance reliability. Below, we answer key questions about these incidents and GitHub's ongoing improvements.

What happened to GitHub's services in April 2026?

Throughout April 2026, GitHub recorded ten separate incidents that caused degraded performance. The most impactful were a code search outage on April 1 that took about nine hours to resolve fully, including 2 hours and 20 minutes of complete unavailability, and an audit log failure the same day that lasted 28 minutes. Later in the month, two additional major incidents occurred on April 23 and April 27, which GitHub detailed in a dedicated blog post. To improve transparency, GitHub also began providing more granular status updates on its status page. The company emphasized that no data was permanently lost in any of these events.

Source: github.blog

What caused the code search outage on April 1?

The code search outage stemmed from a routine infrastructure upgrade to the messaging system that supports search indexing. An automated change was applied too aggressively, causing a coordination failure between internal services. This halted indexing and made search results stale. While the team worked to recover the messaging infrastructure, an unintended service deployment cleared internal routing state, escalating the staleness into a complete outage. As a result, all search queries failed for about 2 hours and 20 minutes. After partial restoration, results remained stale until re-indexing completed at 23:45 UTC, roughly nine hours after queries first began to fail.

How severe was the code search outage?

Between 14:40 and 17:00 UTC on April 1, 2026, 100% of code search requests failed, a full outage lasting 2 hours and 20 minutes. After initial recovery at 17:00 UTC, search returned results, but they were stale: they did not reflect any repository changes made after roughly 07:00 UTC that day. Full indexing and current results were restored by 23:45 UTC, so the degraded state persisted for a further 6 hours and 45 minutes. Importantly, no Git repository data was affected; the search index is a secondary cache, and all source code remained intact.
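The durations quoted above follow directly from the reported timestamps. The short Python sketch below redoes that arithmetic; the helper names (`utc`, `minutes_between`, `fmt`) are illustrative and not taken from GitHub's report.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on April 1, 2026 as a UTC datetime."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2026, 4, 1, hour, minute, tzinfo=timezone.utc)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two same-day HH:MM times."""
    return int((utc(end) - utc(start)).total_seconds() // 60)


def fmt(minutes: int) -> str:
    """Render a minute count as 'Xh YYm'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


# Timestamps as reported in the text above.
full_outage = minutes_between("14:40", "17:00")           # all queries failing
stale_after_recovery = minutes_between("17:00", "23:45")  # results served but stale
total_impact = minutes_between("14:40", "23:45")          # first failure to full recovery

print(fmt(full_outage), fmt(stale_after_recovery), fmt(total_impact))
```

The three values correspond to the complete outage, the stale period after partial recovery, and the end-to-end impact window.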

How did GitHub resolve the code search issue?

The team restored the messaging infrastructure by performing a controlled restart, which reestablished coordination between the affected services. They then reset the search index to a point in time just before the disruption began and re-indexed forward from there. Because the underlying Git repositories were completely unaffected, no data was lost; the index only needed to catch up with the changes made since that point. Re-indexing took several hours but ultimately brought all search results back to the current state of the repositories. No manual recovery of Git data was required.

What was the audit log incident on April 1?

On April 1, between 15:34 and 16:02 UTC, GitHub's audit log service lost connection to its backing data store due to a failed credential rotation. For 28 minutes, audit log history was unavailable via both the API and the web UI, causing 5xx errors for over 4,000 API actors and 127 web users. Events generated during this window were delayed by up to 29 minutes but were eventually written and streamed successfully—no events were lost. Customers using GitHub Enterprise Cloud with data residency were not impacted. The incident was detected within six minutes thanks to automated alerts.
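The report does not describe how events were held during the outage, so the following is only a minimal sketch of one common pattern that produces "delayed but not lost" behavior: buffer writes while the store is unreachable and replay them in order once it returns. `BufferedAuditWriter` and `FlakyStore` are hypothetical names invented for this illustration and do not reflect GitHub's implementation.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferedAuditWriter:
    """Hypothetical writer that queues events while the data store is unreachable."""

    def __init__(self, write: Callable[[str], None]) -> None:
        self._write = write
        self._pending: Deque[str] = deque()

    def record(self, event: str) -> None:
        """Queue the event, then try to drain the queue."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Write pending events in order; stop at the first failure.

        Returns the number of events written in this call.
        """
        written = 0
        while self._pending:
            try:
                self._write(self._pending[0])
            except ConnectionError:
                break  # store still unreachable; keep the event queued
            self._pending.popleft()
            written += 1
        return written

    @property
    def pending(self) -> int:
        return len(self._pending)


class FlakyStore:
    """Stand-in data store that can be toggled unavailable (illustrative only)."""

    def __init__(self) -> None:
        self.available = True
        self.rows: List[str] = []

    def write(self, event: str) -> None:
        if not self.available:
            raise ConnectionError("store unreachable")
        self.rows.append(event)


store = FlakyStore()
writer = BufferedAuditWriter(store.write)

writer.record("repo.create")      # written immediately
store.available = False           # simulate the lost connection
writer.record("org.add_member")   # queued
writer.record("repo.destroy")     # queued
queued_during_outage = writer.pending

store.available = True            # connection restored
writer.flush()                    # queued events are replayed in order
```

In this sketch, ordering is preserved because the queue is drained from the front and an event is removed only after its write succeeds.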

What improvements is GitHub making to prevent future incidents?

GitHub has outlined several upgrades to its infrastructure and processes. These include more gradual upgrades with better health checks to catch problems before they cascade; deployment safeguards to prevent unintended changes during active incidents; faster recovery tooling to reduce the time needed to restore service; and better traffic isolation to limit cascading impact from unexpected spikes. The company is also investing in long-term reliability and transparency, such as providing more detailed status page updates and publishing post-incident analyses for major events like those on April 23 and 27.
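Two of these ideas, gradual upgrades gated by health checks and a safeguard that blocks changes during an active incident, can be sketched together. The code below is a simplified illustration under stated assumptions: `staged_rollout`, its parameters, and the `broker-N` host names are all hypothetical, and GitHub's actual tooling is not described in the source.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    hosts: List[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_sizes: Iterable[int] = (1, 2, 4),
    incident_active: Callable[[], bool] = lambda: False,
) -> List[str]:
    """Apply a change in growing batches and return the hosts that were updated.

    The rollout halts when a batch fails its health check, or before any
    batch if an incident is active. Assumes batch_sizes is non-empty; the
    last size is reused once the schedule is exhausted.
    """
    sizes = list(batch_sizes)
    remaining = list(hosts)
    updated: List[str] = []
    step = 0
    while remaining:
        if incident_active():
            break  # deployment safeguard: no changes during an incident
        size = sizes[min(step, len(sizes) - 1)]
        batch, remaining = remaining[:size], remaining[size:]
        for host in batch:
            apply_change(host)
            updated.append(host)
        if not all(is_healthy(host) for host in batch):
            break  # stop before the problem spreads to more hosts
        step += 1
    return updated


hosts = [f"broker-{i}" for i in range(8)]  # hypothetical host names
unhealthy = {"broker-2"}                   # pretend this host fails after the change

halted = staged_rollout(
    hosts,
    apply_change=lambda host: None,
    is_healthy=lambda host: host not in unhealthy,
)

frozen = staged_rollout(
    hosts,
    apply_change=lambda host: None,
    is_healthy=lambda host: True,
    incident_active=lambda: True,
)
```

Here `halted` stops after the second batch because `broker-2` fails its check, leaving five hosts untouched, while `frozen` applies nothing because an incident is flagged as active.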