Sporadic DeadlineExceeded gRPC errors when queuing tasks to Cloud Tasks #13938

Open
@smallwat3r

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

Context

We have a service that uses Cloud Tasks to dispatch and run long-running tasks. It ran well for months, but since 15 May we have been seeing sporadic DeadlineExceeded errors raised from gRPC. We have not changed anything in our system or in the way we enqueue tasks.

Roughly 20% of the tasks we try to enqueue fail with DeadlineExceeded.

After seeing these errors, we added retries and a timeout when creating tasks. Here is a configuration similar to the one we used:

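import requests
from google.api_core import exceptions
from google.api_core.retry import Retry, if_exception_type
from google.auth import exceptions as auth_exceptions

retry = Retry(
    predicate=if_exception_type(
        exceptions.TooManyRequests,
        exceptions.ServiceUnavailable,
        requests.exceptions.ConnectionError,
        requests.exceptions.ChunkedEncodingError,
        auth_exceptions.TransportError,
        exceptions.DeadlineExceeded,  # exception we started noticing
    ),
    initial=1.0,
    maximum=2.0,
    multiplier=2.0,
    timeout=15.0,  # how long to keep retrying
)
# client, parent, and task are created elsewhere in our service
client.create_task(
    request={"parent": parent, "task": task}, timeout=5.0, retry=retry
)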
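
As a rough illustration of the retry budget in the configuration above, the sketch below counts how many attempts can start within the 15-second retry window. It is a simplified model, not the google-api-core implementation: it assumes every attempt runs for the full 5-second call timeout and that every sleep takes the full, un-jittered backoff delay (the real library may sleep for less). `worst_case_attempts` is a hypothetical helper name.

```python
def worst_case_attempts(initial, maximum, multiplier, retry_timeout, attempt_timeout):
    """Count attempts that can start before the overall retry window closes.

    Simplifying assumptions (not taken from the library): each attempt uses
    its entire per-call timeout, and each sleep lasts the full backoff delay.
    """
    elapsed = 0.0
    delay = initial
    attempts = 0
    while elapsed < retry_timeout:
        attempts += 1
        elapsed += attempt_timeout                # the attempt itself times out
        elapsed += delay                          # back off before the next attempt
        delay = min(delay * multiplier, maximum)  # exponential growth, capped
    return attempts


# Values from the configuration above: initial=1.0, maximum=2.0,
# multiplier=2.0, Retry timeout=15.0, create_task timeout=5.0.
print(worst_case_attempts(1.0, 2.0, 2.0, 15.0, 5.0))
```

Under these assumptions only three attempts fit in the window, so if every attempt times out the retries are exhausted after roughly 15 to 20 seconds.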

The failing calls were retried, but none of them succeeded before the retry timeout expired, and DeadlineExceeded was re-raised.

The calls that succeed return very quickly, so the configured timeouts should have been sufficient.

After these failed attempts, the only change that got 100% of our tasks onto the queue was switching the transport from gRPC to HTTP (REST):

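from google.cloud.tasks_v2 import CloudTasksClient
from google.cloud.tasks_v2.services.cloud_tasks.transports import CloudTasksRestTransport

client = CloudTasksClient(
    transport=CloudTasksRestTransport(credentials=...)
)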
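
A shorter way to select the REST transport, sketched below, is to pass the transport name as a string and let the client build the transport itself. This assumes the installed google-cloud-tasks release registers a "rest" transport and that application default credentials can be resolved when `credentials` is left as None. `make_rest_client` is a hypothetical helper name.

```python
def make_rest_client(credentials=None):
    """Build a Cloud Tasks client that talks HTTP/REST instead of gRPC.

    Assumption: the installed google-cloud-tasks version accepts
    transport="rest". The import is deferred so that defining this helper
    does not itself require the library to be installed.
    """
    from google.cloud import tasks_v2

    return tasks_v2.CloudTasksClient(credentials=credentials, transport="rest")
```

Calling `make_rest_client()` should then be a drop-in replacement for `CloudTasksClient()` in code that calls `create_task`.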

Is there anything we are missing, or has anything changed in the way the library uses gRPC that would explain why we started seeing this behaviour?

Expected Behavior:
All our tasks should be queued successfully when retrying on sporadic DeadlineExceeded errors.

Actual Behavior:
We get sporadic DeadlineExceeded errors, and the affected task never gets queued, even on retry.

API client name and version

google-cloud-tasks 2.19.2

Reproduction steps: code

This is pseudocode using the gRPC transport, from before we switched to HTTP. Note that DeadlineExceeded is raised for only ~20% of the tasks we queue; the rest are queued successfully.

file: main.py

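from google.api_core import exceptions
from google.api_core.retry import Retry, if_exception_type
from google.cloud.tasks_v2 import CloudTasksClient

# parent and task are placeholders for our queue path and task payload
client = CloudTasksClient()
retry = Retry(
    predicate=if_exception_type(
        exceptions.DeadlineExceeded,  # exception we started noticing
    ),
    initial=1.0,
    maximum=2.0,
    multiplier=2.0,
    timeout=15.0,  # how long to keep retrying
)
client.create_task(
    request={"parent": parent, "task": task}, timeout=5.0, retry=retry
)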

Reproduction steps: supporting files

No response

Reproduction steps: actual results

No response

Reproduction steps: expected results

OS & version + platform

Debian Bookworm (12) on App Engine flex

Python environment

Python 3.12.10

Python dependencies

Here is the list of our gRPC-related dependencies:

grpc-google-iam-v1                       0.14.2
grpcio                                   1.71.0
grpcio-status                            1.62.3

Additional context

Here is a sample of the traceback of the actual exception we've been seeing:

  File "/opt/.venv/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2025-05-21T14:14:37.173146012+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/.venv/lib/python3.12/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
             ^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded

Labels

  • needs more info (This issue needs more information from the customer to proceed.)
  • type: question (Request for information or clarification. Not an issue.)
