Partial Actions Cache degradation

Incident Report for GitHub

Resolved

On June 18, 2025, between 08:21 UTC and 18:47 UTC, some Actions jobs experienced intermittent failures downloading from the Actions Cache service. During the incident, 17% of workflow runs experienced cache download failures, resulting in a warning message in the logs and performance degradation. The disruption was caused by a network issue in our database systems that led to a database replica getting out of sync with the primary. We mitigated the incident by routing cache download url requests to bypass the out-of-sync replica until it was fully restored.

To prevent this class of incidents, we are developing capability in our database system to more robustly bypass out-of-sync replicas. We are also implementing improved monitoring to help us detect similar issues more quickly going forward.

Posted Jun 18, 2025 - 18:47 UTC

Update

We are continuing to rollout a mitigation and are progressing towards having this rolled out for all customers.

Posted Jun 18, 2025 - 18:11 UTC

Update

We are currently deploying a mitigation for this issue and will be rolling it out shortly. We will update our progress as we monitor the deployment.

Posted Jun 18, 2025 - 17:22 UTC

Update

We are actively investigating and working on a mitigation for database instability leading to replication lag in the Actions Cache service. We will continue to post updates on progress towards mitigation.

Posted Jun 18, 2025 - 17:03 UTC

Update

The actions cache service is experiencing degradation in a number of regions causing cache misses when attempting to download cache entries. This is not causing workflow failures, but workflow runtime might be elevated for certain runs.

Posted Jun 18, 2025 - 16:46 UTC

Investigating

We are currently investigating this issue.

Posted Jun 18, 2025 - 16:46 UTC