fix: accumulate agentstats until reported and fix insights DAU offset #15832
a1757f0 to e97f3a9
@@ -89,7 +89,7 @@ func (api *API) returnDAUsInternal(rw http.ResponseWriter, r *http.Request, temp
 	}
 	for _, row := range rows {
 		resp.Entries = append(resp.Entries, codersdk.DAUEntry{
-			Date: row.StartTime.Format(time.DateOnly),
+			Date: row.StartTime.In(loc).Format(time.DateOnly),
Review: Drive-by fix, the date was off-by-one depending on timezone.
 	} else {
 		s.networkStats = maps.Clone(virtual)
 		s.unreported = true
 	}
Review: If the callback was called multiple times before reporting, we lost data, because each update is a snapshot of the activity since the previous one.
This can happen if:
- The interval is short (tests)
- Report takes a long time
I believe the assumption was that the "ConnStatsCallback" reports a realistic count for "now". What it actually returns is closer to an additive diff between this report and the previous one. Thus, if two callbacks happen in quick succession, we're effectively zeroing the actual data.
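A rough sketch of the accumulate-until-reported idea, assuming each callback delivers a delta rather than a running total. This is not the code in agent/stats.go: `conn`, `counts`, `collector`, `update`, and `report` are hypothetical stand-ins (for example, for `netlogtype.Connection` and `netlogtype.Counts`), and locking is left out.

```go
package main

import (
	"fmt"
	"maps"
)

// conn and counts are hypothetical stand-ins for the real connection key
// and counter types; they exist only to keep this sketch self-contained.
type conn string

type counts struct {
	TxBytes, RxBytes uint64
}

// collector holds stats that have not been reported yet.
type collector struct {
	networkStats map[conn]counts
	unreported   bool
}

// update handles one callback. If the previous snapshot has not been
// reported yet, the new delta is added to it instead of replacing it.
func (c *collector) update(virtual map[conn]counts) {
	if c.unreported {
		if c.networkStats == nil && virtual != nil {
			c.networkStats = make(map[conn]counts, len(virtual))
		}
		for k, v := range virtual {
			cur := c.networkStats[k]
			cur.TxBytes += v.TxBytes
			cur.RxBytes += v.RxBytes
			c.networkStats[k] = cur
		}
		return
	}
	c.networkStats = maps.Clone(virtual)
	c.unreported = true
}

// report hands back everything accumulated so far and resets the state.
func (c *collector) report() map[conn]counts {
	out := c.networkStats
	c.networkStats = nil
	c.unreported = false
	return out
}

func main() {
	c := &collector{}
	// Two callbacks arrive before any report goes out.
	c.update(map[conn]counts{"a": {TxBytes: 10, RxBytes: 5}})
	c.update(map[conn]counts{"a": {TxBytes: 1, RxBytes: 2}, "b": {TxBytes: 3, RxBytes: 4}})

	stats := c.report()
	fmt.Println(stats["a"], stats["b"])
}
```

With plain replacement, the second callback would have overwritten the first and the 10/5 bytes for "a" would never be reported; here they are summed into the next report.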
Great catch!
@@ -76,7 +86,7 @@ func TestDeploymentInsights(t *testing.T) {
 	workspace := coderdtest.CreateWorkspace(t, client, template.ID)
 	coderdtest.AwaitWorkspaceBuildJobCompleted(t, client, workspace.LatestBuild.ID)

-	ctx := testutil.Context(t, testutil.WaitLong)
+	ctx := testutil.Context(t, testutil.WaitSuperLong)
Review: In race mode, propagating the agent connection stats can take a while.
LGTM
agent/stats.go
Outdated
 	// Accumulate stats until they've been reported.
 	if s.unreported {
 		if s.networkStats == nil && virtual != nil {
 			s.networkStats = make(map[netlogtype.Connection]netlogtype.Counts)
Nit: let's save some allocations.
-			s.networkStats = make(map[netlogtype.Connection]netlogtype.Counts)
+			s.networkStats = make(map[netlogtype.Connection]netlogtype.Counts, len(virtual))
I've never actually benchmarked how much of a difference a size hint makes for maps, especially ones that don't hold a lot of data. Is there a significant difference?
Your suggestion made me realize this had a better fix 😄.
Co-authored-by: Danny Kopping <danny@coder.com>
LGTM
This PR addresses a flake in TestDeploymentInsights caused by missing agent network stats. It also fixes the assumption that we should discard and not accumulate agent network stats if we can't keep up. Without accumulation we risk losing data.
Fixes coder/internal#259