-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add allocation write load stats to write thread pool #130373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add allocation write load stats to write thread pool #130373
Conversation
Instrument the WRITE thread pool to collect: 1) pool utilization EWMA 2) queue latency EWMA Relates ES-12233
4a7987e
to
4037ccf
Compare
if (queueLatencyMillis > 0) { | ||
queueLatencyMillisEWMA.addValue(queueLatencyMillis); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 0 is meaningful for queue time and should be included?
if (trackUtilizationEWMA) { | ||
percentPoolUtilizationEWMA.addValue(utilizationSinceLastPoll); | ||
// Test only tracking. | ||
assert setUtilizationSinceLastPoll(utilizationSinceLastPoll); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels odd to tie utilization polling to APM. I think they can be separate. For example, we can have a separate set of lastTotalExecutionTime
and lastPollTime
which should allow us compute the utlization in a different frequency than the APM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the decision originally was because it avoided the need for an extra thread etc. given that we're now extending it's use, I think it makes sense to rethink all of that.
it sounds like we want a more sophisticated average utilisation number.
If we're willing to wear the cost of a thread to do the polling, I think it makes sense for that thread to maintain THE average (which will become an EWMA) and publish that as the es.thread_pool.{name}.threads.utilization.current
value.
public double getPercentPoolUtilizationEWMA() { | ||
if (trackUtilizationEWMA == false) { | ||
return 0; | ||
} | ||
return this.percentPoolUtilizationEWMA.getAverage(); | ||
} | ||
|
||
public double getQueuedTaskLatencyMillisEWMA() { | ||
if (trackQueueLatencyEWMA == false) { | ||
return 0; | ||
} | ||
return queueLatencyMillisEWMA.getAverage(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some service needs to call these methods. Is it going to be AverageWriteLoadSampler
or is it still in discussion?
/** | ||
* If the queue latency reaches a high value (e.g. 10-30 seconds), then this thread pool is overwhelmed. It may be temporary, but that | ||
* spike warrants the allocation balancer adjusting some number of shards, if possible. Therefore, it is alright to react quickly. | ||
* | ||
* As an example, suppose the EWMA is 10_000ms, i.e. 10 seconds. | ||
* A single task in the queue that takes 30_000ms, i.e. 30 seconds, would result in a new EWMA of ~12_000ms | ||
* 0.1 x 30_000ms + 0.9 x 10_000 = 3_000ms + 9_000ms = 12_000ms | ||
*/ | ||
public static final double DEFAULT_WRITE_THREAD_POOL_QUEUE_LATENCY_EWMA_ALPHA = 0.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intuitively, I feel this value could use some relationship with the one for executionEWMA
since they are updated with the same fequency before and after the task execution, respectively. If the node has N
write threads and we observe N
large execution times in a row, it feels roughly a similar effect as seeing a single large queue latency, i.e. all threads were busy executing tasks that took long to complete. This makes me think whether this value should be N * executionEWMA_alpha
. I could be be wrong on this. Just an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling a bit with the ewma used in this way. We get a value weighted towards the last N values, those could have arrived in the last 0.5ms or the last 30seconds, so it seems difficult to reason about. I wonder if it'd be easier to reason about if the queue were polled at a fixed interval (by the same regular thread that we introduce to poll the ulitlization) and ask the task at the front of the queue how long it'd been waiting?
That way I think the utilisation and latency would reflect observations over the same fixed time period and we could tweak the alpha values to find a period that was meaningful.
Instrument the WRITE thread pool to collect:
Relates ES-12233
See the ES ticket for details. I would like to collect some agreement on the approach. For example, taking an EWMA of the queue latencies; and taking the EWMA of the thread utilization metric, which is sampled periodically by the APM system. This approach is notably attempting to be 'good enough' for a first implementation of the write load balancing.
I still need to work out the thread pool utilization EWMA's alpha seed. I believe it must be tied to the APM period in order to be sensible. Unfortunately the APM collection frequency is configurable as well. Unless I make a separate and parallel utilization sampler, we're stuck with a dependency on APM's frequency. Anyway, still working on that bit.