[BUG] BlockingBuffer.bufferUsage metric does not include records in-flight #3936
Comments
An alternative solution is to also update the … Or we could add a new metric that expresses both: …
I think there is still value in knowing the number of …
As discussed offline, I think it is better to have a metric that directly corresponds to the semaphore count that controls the capacity.
I'm thinking of making two changes:
…between the bufferCapacity and the available permits in the semaphore. Adds a new capacityUsed metric which tracks the actual capacity used by the semaphore which blocks. Resolves #3936. (#3937) Signed-off-by: David Venable
…between the bufferCapacity and the available permits in the semaphore. Adds a new capacityUsed metric which tracks the actual capacity used by the semaphore which blocks. Resolves #3936. (#3937) (#3940) Signed-off-by: David Venable (cherry picked from commit d61b0c5) Co-authored-by: David Venable
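The fix described in the commit message derives usage from the semaphore rather than from the queue. Below is a minimal sketch, assuming the truncated message means "capacity used = bufferCapacity - availablePermits". The class and method names (`SemaphoreCapacitySketch`, `tryWrite`, `release`) are hypothetical and are not Data Prepper's API.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch: usage derived from a capacity semaphore.
 * A permit is held from the moment a record is written until it is
 * fully processed, so held permits cover waiting AND in-flight records.
 */
public final class SemaphoreCapacitySketch {
    private final int bufferCapacity;
    private final Semaphore capacitySemaphore;

    SemaphoreCapacitySketch(int bufferCapacity) {
        this.bufferCapacity = bufferCapacity;
        this.capacitySemaphore = new Semaphore(bufferCapacity);
    }

    /** Takes one permit per record; returns false if not enough capacity is free. */
    boolean tryWrite(int records) {
        return capacitySemaphore.tryAcquire(records);
    }

    /** Returns permits once records are fully processed (assumed: on checkpoint). */
    void release(int records) {
        capacitySemaphore.release(records);
    }

    /** Permits currently held, i.e. records waiting plus records in flight. */
    int capacityUsed() {
        return bufferCapacity - capacitySemaphore.availablePermits();
    }

    /** Usage as a percentage of the configured capacity. */
    double bufferUsagePercent() {
        return 100.0 * capacityUsed() / bufferCapacity;
    }

    public static void main(String[] args) {
        SemaphoreCapacitySketch buffer = new SemaphoreCapacitySketch(10);
        buffer.tryWrite(6);  // 6 records written, 6 permits held
        buffer.release(2);   // 2 records finished processing
        // The other 4 may be waiting or in flight; either way they hold permits.
        System.out.println(buffer.capacityUsed());       // 4
        System.out.println(buffer.bufferUsagePercent()); // 40.0
    }
}
```

Because the metric reads the same counter that blocks writers, it reaches 100% exactly when writes start to block, which is the property the discussion above asks for.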
Describe the bug
The `BlockingBuffer.bufferUsage` metric is inaccurate. It is described as the percentage of the buffer used, but it only counts the messages that are waiting in the buffer. It does not include in-flight messages.
Expected behavior
I expect this metric to include both the records waiting and the records in flight when calculating the usage.
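To make the difference concrete, here is a small sketch contrasting the two calculations. It assumes the current metric is `recordsInBuffer / bufferCapacity` and that the expected one is `(recordsInBuffer + recordsInFlight) / bufferCapacity`; the helper names are hypothetical, and the sample values are illustrative rather than read from the CloudWatch data.

```java
/**
 * Hypothetical helpers contrasting the reported usage with the
 * usage this issue expects. Not Data Prepper's API.
 */
public final class BufferUsageSketch {

    private BufferUsageSketch() {}

    /** Usage counting only records waiting in the queue (the behavior reported here). */
    static double waitingOnlyUsagePercent(long recordsInBuffer, long bufferCapacity) {
        return 100.0 * recordsInBuffer / bufferCapacity;
    }

    /** Usage counting records waiting plus records in flight (the expected behavior). */
    static double totalUsagePercent(long recordsInBuffer, long recordsInFlight, long bufferCapacity) {
        return 100.0 * (recordsInBuffer + recordsInFlight) / bufferCapacity;
    }

    public static void main(String[] args) {
        long capacity = 1_000_000;
        long waiting = 700_000;  // illustrative values only
        long inFlight = 290_000;
        System.out.println(waitingOnlyUsagePercent(waiting, capacity));     // 70.0
        System.out.println(totalUsagePercent(waiting, inFlight, capacity)); // 99.0
    }
}
```

With numbers like these, which roughly match the situation in the Screenshots section, the waiting-only metric reports 70% while the buffer is in fact almost full.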
Screenshots
We sent these metrics to AWS CloudWatch. You can see in the screenshot below that the sum of `recordsInBuffer` and `recordsInFlight` (the line in blue) nearly reaches the maximum defined size of 1,000,000 records. However, the `bufferUsage` metric is around 70% at the time that happens.
Additional context
This can lead to confusion when trying to see why events are not being written to the buffer. The metrics indicate that there is capacity, but writes to the buffer fail with a timeout exception.
You can see that below.
data-prepper/data-prepper-plugins/blocking-buffer/src/main/java/org/opensearch/dataprepper/plugins/buffer/blockingbuffer/BlockingBuffer.java
Lines 120 to 125 in b0d253c
This code appears to be incorrect. It does not account for the records in flight.
data-prepper/data-prepper-plugins/blocking-buffer/src/main/java/org/opensearch/dataprepper/plugins/buffer/blockingbuffer/BlockingBuffer.java
Lines 202 to 205 in b0d253c
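The timeout described under Additional context can be reproduced with a simplified model. This is a hypothetical sketch, not Data Prepper's implementation: it assumes that a write acquires a semaphore permit with a timeout, that a read removes a record from the queue while its permit stays held, and that the permit is returned only on a later checkpoint.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical model of a bounded buffer where in-flight records
 * (read but not yet checkpointed) still hold capacity permits.
 */
public final class InFlightTimeoutSketch {
    private final int capacity;
    private final Semaphore permits;
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

    InFlightTimeoutSketch(int capacity) {
        this.capacity = capacity;
        this.permits = new Semaphore(capacity);
    }

    /** Blocks up to timeoutMillis for a permit, then fails with a TimeoutException. */
    void write(String record, long timeoutMillis) throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(1, timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("No buffer capacity available");
        }
        queue.add(record);
    }

    /** Removes a record from the queue; it is now in flight and keeps its permit. */
    String read() {
        return queue.poll();
    }

    /** Returns one permit after an in-flight record has been fully processed. */
    void checkpoint() {
        permits.release();
    }

    /** Queue-only usage: ignores in-flight records. */
    double waitingOnlyUsagePercent() {
        return 100.0 * queue.size() / capacity;
    }

    /** Permit-based usage: counts waiting and in-flight records. */
    double permitUsagePercent() {
        return 100.0 * (capacity - permits.availablePermits()) / capacity;
    }

    public static void main(String[] args) throws InterruptedException, TimeoutException {
        InFlightTimeoutSketch buffer = new InFlightTimeoutSketch(2);
        buffer.write("a", 10);
        buffer.write("b", 10);
        buffer.read(); // "a" leaves the queue but keeps its permit
        System.out.println(buffer.waitingOnlyUsagePercent()); // 50.0
        System.out.println(buffer.permitUsagePercent());      // 100.0
        try {
            buffer.write("c", 10);
        } catch (TimeoutException e) {
            System.out.println("write timed out: " + e.getMessage());
        }
    }
}
```

In this model the queue-only metric reports 50% while every permit is taken, so the next write times out; that is the same mismatch between `bufferUsage` and the timeout exceptions described in this issue.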