On June 18, 2025, between 08:21 UTC and 18:47 UTC, some Actions jobs experienced intermittent failures downloading from the Actions Cache service. During the incident, 17% of workflow runs experienced cache download failures, resulting in a warning message in the logs and performance degradation. The disruption was caused by a network issue in our database systems that led to a database replica getting out of sync with the primary. We mitigated the incident by routing cache download URL requests to bypass the out-of-sync replica until it was fully restored.
To prevent this class of incidents, we are developing capability in our database system to more robustly bypass out-of-sync replicas. We are also implementing improved monitoring to help us detect similar issues more quickly going forward.
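As an illustration of the general idea of bypassing out-of-sync replicas, the following is a minimal sketch of lag-aware read routing. It is not a description of our actual implementation: the `Replica` type, the `pick_read_target` function, and the 5-second threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    # Measured replication lag in seconds; None means the lag could not be measured.
    lag_seconds: Optional[float]


def pick_read_target(replicas: List[Replica], max_lag_seconds: float = 5.0) -> str:
    """Return the name of the least-lagged in-sync replica, or "primary" if none qualify."""
    # Treat replicas with unknown lag, or lag above the threshold, as out of sync.
    in_sync = [
        r for r in replicas
        if r.lag_seconds is not None and r.lag_seconds <= max_lag_seconds
    ]
    if not in_sync:
        # No trustworthy replica: fall back to reading from the primary.
        return "primary"
    return min(in_sync, key=lambda r: r.lag_seconds).name


if __name__ == "__main__":
    replicas = [Replica("replica-a", 0.2), Replica("replica-b", 120.0)]
    print(pick_read_target(replicas))
```

In this sketch a replica whose lag is unknown is treated the same as one that is out of sync, so reads fall back to the primary rather than risk returning stale cache metadata.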
Posted Jun 18, 2025 - 18:47 UTC
Update
We are continuing to roll out a mitigation and are progressing towards having it deployed for all customers.
Posted Jun 18, 2025 - 18:11 UTC
Update
We are currently deploying a mitigation for this issue and will be rolling it out shortly. We will post updates on our progress as we monitor the deployment.
Posted Jun 18, 2025 - 17:22 UTC
Update
We are actively investigating and working on a mitigation for database instability leading to replication lag in the Actions Cache service. We will continue to post updates on progress towards mitigation.
Posted Jun 18, 2025 - 17:03 UTC
Update
The Actions Cache service is experiencing degradation in a number of regions, causing cache misses when attempting to download cache entries. This is not causing workflow failures, but workflow runtime might be elevated for certain runs.