Add allocation write load stats to write thread pool #130373

DiannaHohensee · 2025-07-01T04:41:14Z

Instrument the WRITE thread pool to collect:

pool utilization EWMA
queue latency EWMA

Relates ES-12233

See the ES ticket for details. I would like to collect some agreement on the approach. For example, taking an EWMA of the queue latencies; and taking the EWMA of the thread utilization metric, which is sampled periodically by the APM system. This approach is notably attempting to be 'good enough' for a first implementation of the write load balancing.

I still need to work out the thread pool utilization EWMA's alpha seed. I believe it must be tied to the APM period in order to be sensible. Unfortunately the APM collection frequency is configurable as well. Unless I make a separate and parallel utilization sampler, we're stuck with a dependency on APM's frequency. Anyway, still working on that bit.

Instrument the WRITE thread pool to collect: 1) pool utilization EWMA 2) queue latency EWMA Relates ES-12233

ywangd · 2025-07-01T05:51:36Z

.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java

+            if (queueLatencyMillis > 0) {
+                queueLatencyMillisEWMA.addValue(queueLatencyMillis);
+            }


I think 0 is meaningful for queue time and should be included?

ywangd · 2025-07-01T06:18:25Z

.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java

+        if (trackUtilizationEWMA) {
+            percentPoolUtilizationEWMA.addValue(utilizationSinceLastPoll);
+            // Test only tracking.
+            assert setUtilizationSinceLastPoll(utilizationSinceLastPoll);
+        }


It feels odd to tie utilization polling to APM. I think they can be separate. For example, we can have a separate set of lastTotalExecutionTime and lastPollTime which should allow us compute the utlization in a different frequency than the APM?

Yeah the decision originally was because it avoided the need for an extra thread etc. given that we're now extending it's use, I think it makes sense to rethink all of that.

it sounds like we want a more sophisticated average utilisation number.

If we're willing to wear the cost of a thread to do the polling, I think it makes sense for that thread to maintain THE average (which will become an EWMA) and publish that as the es.thread_pool.{name}.threads.utilization.current value.

ywangd · 2025-07-01T06:20:11Z

.../org/elasticsearch/common/util/concurrent/TaskExecutionTimeTrackingEsThreadPoolExecutor.java

+    public double getPercentPoolUtilizationEWMA() {
+        if (trackUtilizationEWMA == false) {
+            return 0;
+        }
+        return this.percentPoolUtilizationEWMA.getAverage();
+    }
+
+    public double getQueuedTaskLatencyMillisEWMA() {
+        if (trackQueueLatencyEWMA == false) {
+            return 0;
+        }
+        return queueLatencyMillisEWMA.getAverage();
+    }


Some service needs to call these methods. Is it going to be AverageWriteLoadSampler or is it still in discussion?

ywangd · 2025-07-01T06:24:03Z

server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java

+    /**
+     * If the queue latency reaches a high value (e.g. 10-30 seconds), then this thread pool is overwhelmed. It may be temporary, but that
+     * spike warrants the allocation balancer adjusting some number of shards, if possible. Therefore, it is alright to react quickly.
+     *
+     * As an example, suppose the EWMA is 10_000ms, i.e. 10 seconds.
+     * A single task in the queue that takes 30_000ms, i.e. 30 seconds, would result in a new EWMA of ~12_000ms
+     * 0.1 x 30_000ms + 0.9 x 10_000 = 3_000ms + 9_000ms = 12_000ms
+     */
+    public static final double DEFAULT_WRITE_THREAD_POOL_QUEUE_LATENCY_EWMA_ALPHA = 0.1;


Intuitively, I feel this value could use some relationship with the one for executionEWMA since they are updated with the same fequency before and after the task execution, respectively. If the node has N write threads and we observe N large execution times in a row, it feels roughly a similar effect as seeing a single large queue latency, i.e. all threads were busy executing tasks that took long to complete. This makes me think whether this value should be N * executionEWMA_alpha. I could be be wrong on this. Just an idea.

I'm struggling a bit with the ewma used in this way. We get a value weighted towards the last N values, those could have arrived in the last 0.5ms or the last 30seconds, so it seems difficult to reason about. I wonder if it'd be easier to reason about if the queue were polled at a fixed interval (by the same regular thread that we introduce to poll the ulitlization) and ask the task at the front of the queue how long it'd been waiting?

That way I think the utilisation and latency would reflect observations over the same fixed time period and we could tweak the alpha values to find a period that was meaningful.

DiannaHohensee self-assigned this Jul 1, 2025

elasticsearchmachine added the v9.2.0 label Jul 1, 2025

DiannaHohensee added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team and removed v9.2.0 labels Jul 1, 2025

Add allocation write load stats to write thread pool

4037ccf

Instrument the WRITE thread pool to collect: 1) pool utilization EWMA 2) queue latency EWMA Relates ES-12233

DiannaHohensee force-pushed the 2025/06/30/add-node-level-write-load-stats branch from 4a7987e to 4037ccf Compare July 1, 2025 04:43

DiannaHohensee requested review from nicktindall and ywangd July 1, 2025 04:48

ywangd reviewed Jul 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add allocation write load stats to write thread pool #130373

Add allocation write load stats to write thread pool #130373

DiannaHohensee commented Jul 1, 2025 •

edited

Loading

Uh oh!

ywangd Jul 1, 2025

Uh oh!

ywangd Jul 1, 2025

Uh oh!

nicktindall Jul 1, 2025

Uh oh!

ywangd Jul 1, 2025

Uh oh!

ywangd Jul 1, 2025

Uh oh!

nicktindall Jul 1, 2025

Uh oh!

Uh oh!

Add allocation write load stats to write thread pool #130373

Are you sure you want to change the base?

Add allocation write load stats to write thread pool #130373

Conversation

DiannaHohensee commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ywangd Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

ywangd Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

nicktindall Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

ywangd Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

ywangd Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

nicktindall Jul 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

DiannaHohensee commented Jul 1, 2025 •

edited

Loading