Expose client-to-agent disconnection reasons in prometheus metric #18601

@dannykopping

Related to #16482

Why
Customers need proactive alerts when users experience unexpected workspace disconnects (for example, network blips or agent crashes). The current Prometheus metrics (coderd_agentstats_connection_count, coderd_agents_connections{status="disconnected"}) only expose agent ↔ coderd state, so they cannot distinguish intentional session closures from unexpected client ↔ agent drops.

Proposal

  1. Emit Prometheus metrics that increment on:
    • graceful (user-initiated) disconnects
    • ungraceful (timeout/crash/network) disconnects
  2. Document new metrics in docs/admin/integrations/prometheus with example alert rules.
  3. Ensure metrics are exposed per workspace and, if feasible, per user for targeted alerting.

Labels: networking, observability
