Error in getting metrics & prometheus plugin after bumping to the 3.7.1 release #14160
Comments
@rodolfobrunner Does this issue happen while using the same deployment as #14144? I'm trying to reproduce it.
Hey @ProBrian, I am part of the same team as @rodolfobrunner. Yes, it's the same deployment.
Hello @ProBrian, some additional info on what we're seeing. At some point we added some debug instructions to figure out what was being stored. Something like:
-- Adapted from the prometheus metric_data function
-- Assumed requires/locals for the identifiers used below (buffer, exporter,
-- table_sort, DATA_BUFFER_SIZE_HINT, node_id); adjust to your environment.
local buffer = require("string.buffer")
local exporter = require("kong.plugins.prometheus.exporter")
local table_sort = table.sort
local DATA_BUFFER_SIZE_HINT = 4096      -- size hint for the output buffer
local node_id = kong.node.get_id()      -- assumed source of the node id

local function collect()
  ngx.header["Content-Type"] = "text/plain; charset=UTF-8"
  ngx.header["Kong-NodeId"] = node_id

  local prometheus = exporter.get_prometheus()
  local write_fn = ngx.print
  -- prometheus.dict is the ngx.shared["prometheus_metrics"] shared dict
  local keys = prometheus.dict:get_keys(0)
  local count = #keys
  table_sort(keys)

  local output = buffer.new(DATA_BUFFER_SIZE_HINT)
  local output_count = 0

  local function buffered_print(fmt, ...)
    if fmt then
      output_count = output_count + 1
      output:putf(fmt, ...)
    end
    if output_count >= 100 or not fmt then
      write_fn(output:get())  -- consume the whole buffer
      output_count = 0
    end
  end

  for i = 1, count do
    local key = keys[i]
    -- read through the shared dict API; plain indexing (prometheus.dict[key])
    -- always yields nil on an ngx.shared dict
    local value = prometheus.dict:get(key)
    buffered_print("%s: %s\n", key, tostring(value))  -- tostring so nil prints safely
  end

  buffered_print(nil)
  output:free()
end

... which outputs (when the error occurs):
How can a dictionary support duplicate keys? Even if it's a shared dictionary?
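For context on the shared dict semantics (an observation, not a confirmed root cause): an ngx.shared dict keys its values uniquely, so it cannot genuinely hold duplicate keys, but get_keys() only returns a snapshot. A key that was listed can expire or be evicted under memory pressure before the later get(), which then returns nil with a nil error string, matching the "Error getting '...': nil" log lines. A minimal sketch, assuming it runs in a Kong/nginx Lua context that can see the same prometheus_metrics shared dict:

-- Minimal sketch: compare the get_keys() snapshot with later get() reads.
local dict = ngx.shared.prometheus_metrics

local keys = dict:get_keys(0)           -- snapshot of all keys at this instant
for i = 1, #keys do
  local value, err = dict:get(keys[i])  -- may observe a later state of the dict
  if value == nil then
    -- The key was listed a moment ago but reads back nil now: it either
    -- expired, was evicted when the dict ran out of memory, or the read
    -- itself failed (err is non-nil only in that last case, which is why
    -- the Kong log shows "...: nil").
    ngx.log(ngx.WARN, "key listed but unreadable: ", keys[i],
            ", err: ", tostring(err))
  end
end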
Is there an existing issue for this?
Kong version ($ kong version)
3.7.1 / 3.9.0
Current Behavior
I am having problems with metrics and the Prometheus plugin after bumping to the 3.7.1 release. (I have already upgraded Kong up to 3.9.0 and the issue still persists.)
I have the following entry in my logs:
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
Interesting facts:
I already tried:
One pod contains:
While another is missing the le="80" bucket.
We are running Kong in AWS EKS, upgraded from 3.6.1.
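One way to narrow down the per-pod difference described above (illustrative only; the debug location is hypothetical and the key string is copied from the error log, not hand-built) is to read the suspect bucket key directly from the shared dict on each pod:

-- Hypothetical one-off check, e.g. wired into a custom debug location on
-- the status listener.
local dict = ngx.shared.prometheus_metrics
local key = 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}'

local value, err = dict:get(key)
-- value == nil and err == nil  -> the key is simply absent from this pod's dict
-- value == nil and err ~= nil  -> the read itself failed
ngx.say("value: ", tostring(value), ", err: ", tostring(err))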
Expected Behavior
The bucket should not disappear, but if it does for any reason, I would expect Kong to be able to recover from the inconsistent state (maybe a metrics reset?).
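For completeness, a blunt programmatic version of the "metrics reset" idea would be to flush the shared dict that backs the plugin. This is only a sketch: it wipes every accumulated metric, and since the plugin may also keep per-worker state on top of the dict, a pod restart or Kong reload is the safer way back to a clean state.

-- Illustrative manual reset of the metrics store (drops all metrics,
-- not just the inconsistent histogram bucket).
local dict = ngx.shared.prometheus_metrics

dict:flush_all()                     -- mark every stored item as expired
local freed = dict:flush_expired(0)  -- free the memory; 0 means no limit

ngx.say("freed ", freed, " expired items")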
Steps To Reproduce
No response
Anything else?
No response