feat: ensure coder remains healthy with single degraded DERP server #10813

Merged 4 commits on Nov 21, 2023

@mtojek mtojek commented Nov 21, 2023

Fixes: #8966

This PR modifies DERP healthcheck logic to consider a region with a single degraded DERP to be healthy. I will add severity levels in #9754.
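
To illustrate the rule described above, here is a minimal sketch of how a region-level verdict could tolerate one degraded node. It is not the code from this PR: the `NodeReport` type, the `RegionHealthy` function, and the exact threshold (at least one healthy node, at most one degraded node) are assumptions made for illustration only.

```go
package main

import "fmt"

// NodeReport is a hypothetical, simplified stand-in for a per-node DERP
// health result. The real report type in coder carries more fields.
type NodeReport struct {
	Name    string
	Healthy bool
}

// RegionHealthy applies the assumed rule: a region counts as healthy when
// at least one node is healthy and no more than one node is degraded.
func RegionHealthy(nodes []NodeReport) bool {
	unhealthy := 0
	for _, n := range nodes {
		if !n.Healthy {
			unhealthy++
		}
	}
	healthy := len(nodes) - unhealthy
	return healthy > 0 && unhealthy <= 1
}

func main() {
	// One degraded node out of two: the region is still reported healthy.
	region := []NodeReport{{"derp-a", true}, {"derp-b", false}}
	fmt.Println(RegionHealthy(region))
}
```

Under this assumed rule, a region whose only node is degraded is still reported unhealthy, because no healthy node remains to carry traffic.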

@mtojek mtojek self-assigned this Nov 21, 2023
@mtojek mtojek changed the title feat: coder is healthy with single degraded DERP server feat: ensure coder remains healthy with single degraded DERP server Nov 21, 2023
@mtojek mtojek requested a review from johnstcn November 21, 2023 11:19
@mtojek mtojek marked this pull request as ready for review November 21, 2023 11:19

@johnstcn johnstcn left a comment

Potentially out of scope of this PR, but do we need to handle the case where there are no nodes?

Co-authored-by: Cian Johnston <cian@coder.com>
mtojek commented Nov 21, 2023

> Potentially out of scope of this PR, but do we need to handle the case where there are no nodes?

I just checked. Tailscale does not handle regions without nodes: it panics in netcheck before our healthcheck logic runs. I suppose we're good then.

"runtime error: integer divide by zero"
Stack:
	 2  0x00000001005f469c in testing.tRunner.func1.2
	     at /opt/homebrew/Cellar/go/1.20.1/libexec/src/testing/testing.go:1526
	 3  0x00000001005f3f90 in testing.tRunner.func1
	     at /opt/homebrew/Cellar/go/1.20.1/libexec/src/testing/testing.go:1529
	 6  0x0000000100a1f4b4 in tailscale.com/net/netcheck.makeProbePlanInitial
	     at /Users/mtojek/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20231106123012-ba3acaa26275/net/netcheck/netcheck.go:503
	 7  0x0000000100a1e550 in tailscale.com/net/netcheck.makeProbePlan
	     at /Users/mtojek/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20231106123012-ba3acaa26275/net/netcheck/netcheck.go:408
	 8  0x0000000100a22d44 in tailscale.com/net/netcheck.(*Client).GetReport
	     at /Users/mtojek/go/pkg/mod/github.com/coder/tailscale@v1.1.1-0.20231106123012-ba3acaa26275/net/netcheck/netcheck.go:981
	 9  0x0000000100b2cd18 in github.com/coder/coder/v2/coderd/healthcheck/derphealth.(*Report).Run
	     at /Users/mtojek/code/coder/coderd/healthcheck/derphealth/derp.go:139
	10  0x0000000100dc3b64 in github.com/coder/coder/v2/coderd/healthcheck/derphealth_test.TestDERP.func3
	     at /Users/mtojek/code/coder/coderd/healthcheck/derphealth/derp_test.go:151
	11  0x00000001005f398c in testing.tRunner
	     at /opt/homebrew/Cellar/go/1.20.1/libexec/src/testing/testing.go:1576
	12  0x00000001005f4fb8 in testing.(*T).Run.func1
	     at /opt/homebrew/Cellar/go/1.20.1/libexec/src/testing/testing.go:1629

@mtojek mtojek merged commit 048dc04 into main Nov 21, 2023
@mtojek mtojek deleted the 8966-partially-healthy branch November 21, 2023 11:58
@github-actions github-actions bot locked and limited conversation to collaborators Nov 21, 2023
Linked issue: healthcheck: DERP servers partially unhealthy