Shutdown gets hung permanently on waiting for websockets #12494

Closed
@deansheather

Description

A customer's replica terminated due to the pubsub watchdog, but it hung and never shut down properly because (*API).Close() blocked permanently waiting for websockets to drain.

We should:

  • Have a shutdown timeout. We don't have one because we assume all customers run in Kubernetes, which kills pods that don't shut down in time. That assumption fails when Kubernetes didn't initiate the shutdown in the first place. This customer was using a custom solution with systemd.
  • Kick agent RPC connections as soon as the API closes so they can find a new home quickly.
  • Critical shutdowns due to the watchdog should kick all websockets immediately. Almost every websocket on Coder relies on pubsub to work properly.
