PrometheusMetric class (#770)
WindzCUHK authored and havetisyan committed Sep 8, 2019
1 parent 6e96d48 commit 3274dbc
Showing 17 changed files with 1,464 additions and 0 deletions.
27 changes: 27 additions & 0 deletions contributions/metric/prometheus/.gitignore
@@ -0,0 +1,27 @@
# Compiled class file
target/*
*.class

# Log file
*.log

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

# hidden files
.*
86 changes: 86 additions & 0 deletions contributions/metric/prometheus/README.md
@@ -0,0 +1,86 @@
<a id="markdown-athenz-metric-for-prometheus" name="athenz-metric-for-prometheus"></a>
# Athenz metric for Prometheus
A Prometheus implementation of the Athenz server metrics interface.

<!-- TOC -->

- [Athenz metric for Prometheus](#athenz-metric-for-prometheus)
- [Usage](#usage)
- [Build](#build)
- [Integrate with Athenz](#integrate-with-athenz)
- [For developer](#for-developer)
- [Test coverage](#test-coverage)
- [Performance test result](#performance-test-result)
- [Design concerns](#design-concerns)
- [example main for integration test](#example-main-for-integration-test)

<!-- /TOC -->

<a id="markdown-usage" name="usage"></a>
## Usage

<a id="markdown-build" name="build"></a>
### Build
```bash
mvn clean package
ls ./target/athenz_metrics_prometheus-*.jar
```

<a id="markdown-integrate-with-athenz" name="integrate-with-athenz"></a>
### Integrate with Athenz
1. Add `athenz_metrics_prometheus-*.jar` to the Athenz server's classpath.
1. Override the existing system property:
    ```properties
    # ZMS server
    athenz.zms.metric_factory_class=com.yahoo.athenz.common.metrics.impl.prometheus.PrometheusMetricFactory

    # ZTS server
    athenz.zts.metric_factory_class=com.yahoo.athenz.common.metrics.impl.prometheus.PrometheusMetricFactory
    ```
1. Add the system properties for `PrometheusMetric`:
    ```properties
    # enable PrometheusMetric class
    athenz.metrics.prometheus.enable=true
    # export JVM metrics
    athenz.metrics.prometheus.jvm.enable=true
    # the Prometheus /metrics endpoint
    athenz.metrics.prometheus.http_server.enable=true
    athenz.metrics.prometheus.http_server.port=8181
    # Prometheus metric prefix
    athenz.metrics.prometheus.namespace=athenz_zms
    # for dev. env. ONLY, record Athenz domain data as label
    athenz.metrics.prometheus.label.request_domain_name.enable=false
    athenz.metrics.prometheus.label.principal_domain_name.enable=false
    ```
1. Verify the setup: `curl localhost:8181/metrics`
1. Add a job to your Prometheus server:
    ```yaml
    scrape_configs:
      - job_name: 'athenz-server'
        scrape_interval: 10s
        honor_labels: true
        static_configs:
          - targets: ['athenz.server.domain:8181']
    ```
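
The properties in the steps above are ordinary JVM system properties, so they can be passed as `-Dkey=value` flags or set in code before the metric factory is created. The following is a minimal sketch under assumptions: `PrometheusMetricProps` is a hypothetical helper (not part of Athenz), and how `PrometheusMetricFactory` itself reads the values is not shown here, so the read-back uses only standard JDK calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Athenz: sets the README's properties in code.
final class PrometheusMetricProps {

    // Property names and values copied from the list above.
    static Map<String, String> defaults() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("athenz.metrics.prometheus.enable", "true");
        props.put("athenz.metrics.prometheus.jvm.enable", "true");
        props.put("athenz.metrics.prometheus.http_server.enable", "true");
        props.put("athenz.metrics.prometheus.http_server.port", "8181");
        props.put("athenz.metrics.prometheus.namespace", "athenz_zms");
        return props;
    }

    // Same effect as passing -Dkey=value for every entry on the command line.
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }

    public static void main(String[] args) {
        apply(defaults());
        // Standard JDK accessors for boolean and integer system properties.
        System.out.println(Boolean.getBoolean("athenz.metrics.prometheus.enable"));
        System.out.println(Integer.getInteger("athenz.metrics.prometheus.http_server.port", -1));
    }
}
```

The two domain-label properties are left out on purpose: they default to disabled and are meant for development environments only.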

<a id="markdown-for-developer" name="for-developer"></a>
## For developer

<a id="markdown-test-coverage" name="test-coverage"></a>
### Test coverage
```bash
mvn clover:instrument clover:aggregate clover:clover clover:check
open ./target/site/clover/index.html
```

<a id="markdown-performance-test-result" name="performance-test-result"></a>
### Performance test result
- [performance.md](./doc/performance.md)

<a id="markdown-design-concerns" name="design-concerns"></a>
### Design concerns
- [design-concerns.md](./doc/design-concerns.md)

<a id="markdown-example-main-for-integration-test" name="example-main-for-integration-test"></a>
### example main for integration test
- [example-main.md](./doc/example-main.md)
@@ -0,0 +1,3 @@
Label,# Samples,Average,Min,Max,Std. Dev.,Error %,Throughput,Received KB/sec,Sent KB/sec,Avg. Bytes
get metrics,537,223,122,501,46.75,0.000%,4.47139,2819.47,0.62,645691.9
TOTAL,537,223,122,501,46.75,0.000%,4.47139,2819.47,0.62,645691.9
@@ -0,0 +1,3 @@
Label,# Samples,Average,Min,Max,Std. Dev.,Error %,Throughput,Received KB/sec,Sent KB/sec,Avg. Bytes
get metrics,5372,22,18,646,10.21,0.000%,44.76443,421.94,6.25,9651.9
TOTAL,5372,22,18,646,10.21,0.000%,44.76443,421.94,6.25,9651.9
@@ -0,0 +1,5 @@
Label,# Samples,Average,Min,Max,Std. Dev.,Error %,Throughput,Received KB/sec,Sent KB/sec,Avg. Bytes
get user token,48,429,162,707,153.93,0.000%,30.88803,22.50,6.94,745.9
get domain list,61130,26,10,333,13.96,0.026%,510.33953,75.11,392.08,150.7
get role list,61113,66,33,576,22.58,0.034%,510.39361,57.79,401.56,115.9
TOTAL,122291,46,10,707,28.67,0.030%,1018.69268,132.88,791.83,133.6
@@ -0,0 +1,5 @@
Label,# Samples,Average,Min,Max,Std. Dev.,Error %,Throughput,Received KB/sec,Sent KB/sec,Avg. Bytes
get user token,48,399,125,988,199.86,0.000%,27.11864,19.75,6.09,745.9
get domain list,89780,27,10,642,16.25,0.012%,499.12717,73.27,383.53,150.3
get role list,89769,68,32,536,23.14,0.035%,499.19923,56.53,392.76,116.0
TOTAL,179597,47,10,988,29.35,0.023%,997.47295,129.85,775.48,133.3
20 changes: 20 additions & 0 deletions contributions/metric/prometheus/doc/design-concerns.md
@@ -0,0 +1,20 @@
# Design concerns

1. metric name format
    1. `{namespace}_{metric}_{unit}`
        1. namespace = set by system properties
        1. metric = hard-coded inside Athenz
        1. unit = `total` or `seconds`
    1. reference: [Metric and label naming | Prometheus](https://prometheus.io/docs/practices/naming/#metric-names)
1. labels for `requestDomainName` and `principalDomainName`
    1. disabled by default
    1. reasons
        1. not a recommended practice in Prometheus
            - [Instrumentation#Use labels | Prometheus](https://prometheus.io/docs/practices/instrumentation/#use-labels)
            - [Instrumentation#Do not overuse labels | Prometheus](https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels)
        1. the `/metrics` response becomes very large, causing bandwidth and latency problems on the Prometheus side
            - [performance test result](./performance.md#without-domain-vs-with-2000-domain-prometheus-endpoint)
1. Prometheus pull as default
    1. requires the Prometheus server and the Athenz server to be on the same network
    1. the suggested deployment for Prometheus
    1. open a firewall port so that Grafana can query the Prometheus server
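
The `{namespace}_{metric}_{unit}` format can be sketched as plain string composition. `MetricNames` is a hypothetical helper, not the code used inside `PrometheusMetric`; the mapping of counters to `total` and timers to `seconds` is an assumption based on the sample output in `example-main.md`.

```java
// Hypothetical helper, not Athenz code: composes names as {namespace}_{metric}_{unit}.
final class MetricNames {

    static String of(String namespace, String metric, String unit) {
        return namespace + "_" + metric + "_" + unit;
    }

    // Assumption: counters carry the unit "total".
    static String counter(String namespace, String metric) {
        return of(namespace, metric, "total");
    }

    // Assumption: timers carry the unit "seconds".
    static String timer(String namespace, String metric) {
        return of(namespace, metric, "seconds");
    }

    public static void main(String[] args) {
        System.out.println(counter("athenz_server", "request01"));
        System.out.println(timer("athenz_server", "timer_test"));
    }
}
```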
70 changes: 70 additions & 0 deletions contributions/metric/prometheus/doc/example-main.md
@@ -0,0 +1,70 @@
# Example main

`Main.java`
```java
package com.yahoo.athenz.common.metrics;

import com.yahoo.athenz.common.metrics.Metric;
import com.yahoo.athenz.common.metrics.impl.prometheus.PrometheusMetricFactory;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("PrometheusMetric start");

        PrometheusMetricFactory pmf = new PrometheusMetricFactory();
        Metric pm = pmf.create();

        // counter
        pm.increment("request_no_label");
        pm.increment("request01", null, 5);
        pm.increment("request01", "domain01", 10);
        pm.increment("request01", "domain02", 20);

        // timer
        Object timer = pm.startTiming("timer_test", null);
        Thread.sleep(99L);
        pm.stopTiming(timer);

        Object timerD = pm.startTiming("timer_test_domain", "domain01");
        Thread.sleep(111L);
        pm.stopTiming(timerD);

        // flush
        System.out.println("before flush...");
        pm.flush();
        System.out.println("If you are using pull exporter, run 'curl localhost:8181/metrics' to verify");

        // quit
        System.out.println("wait 1 min, before quit...");
        Thread.sleep(1L * 1000 * 60);
        pm.quit();
    }
}
```

## Run
```bash
cat > "$(git rev-parse --show-toplevel)/contributions/metric/prometheus/src/main/java/com/yahoo/athenz/common/metrics/Main.java"
# copy and paste the Main.java's content
cd "$(git rev-parse --show-toplevel)/contributions/metric/prometheus"
mvn package exec:java -Dexec.mainClass="com.yahoo.athenz.common.metrics.Main"
```

## sample output (with default values)
```bash
$ curl localhost:8181/metrics
# HELP athenz_server_request_no_label_total request_no_label_total
# TYPE athenz_server_request_no_label_total counter
athenz_server_request_no_label_total{domain="",principal="",} 1.0
# HELP athenz_server_request01_total request01_total
# TYPE athenz_server_request01_total counter
athenz_server_request01_total{domain="",principal="",} 35.0
# HELP athenz_server_timer_test_domain_seconds timer_test_domain_seconds
# TYPE athenz_server_timer_test_domain_seconds summary
athenz_server_timer_test_domain_seconds_count{domain="",principal="",} 1.0
athenz_server_timer_test_domain_seconds_sum{domain="",principal="",} 0.113545231
# HELP athenz_server_timer_test_seconds timer_test_seconds
# TYPE athenz_server_timer_test_seconds summary
athenz_server_timer_test_seconds_count{domain="",principal="",} 1.0
athenz_server_timer_test_seconds_sum{domain="",principal="",} 0.101996235
```
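
To check such output in an automated test rather than by eye, the text can be reduced to a name-to-value map. This is a rough sketch, not a full parser for the Prometheus exposition format: `MetricsText` is a hypothetical helper, and it assumes one sample per metric name, no trailing timestamps, and no spaces inside label values, which holds for the sample above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Athenz or the Prometheus client library.
final class MetricsText {

    // Maps each sample's metric name (labels stripped) to its value.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String line : body.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            int space = line.lastIndexOf(' ');
            if (space < 0) {
                continue; // not a "series value" line
            }
            String series = line.substring(0, space);
            int brace = series.indexOf('{');
            String name = brace < 0 ? series : series.substring(0, brace);
            samples.put(name, Double.parseDouble(line.substring(space + 1)));
        }
        return samples;
    }

    public static void main(String[] args) {
        String body = "# TYPE athenz_server_request01_total counter\n"
                + "athenz_server_request01_total{domain=\"\",principal=\"\",} 35.0\n";
        System.out.println(parse(body));
    }
}
```

The `35.0` for `request01` is consistent with the three increments in `Main.java` (5 + 10 + 20) landing in a single series, since the domain labels are disabled by default.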
17 changes: 17 additions & 0 deletions contributions/metric/prometheus/doc/performance.md
@@ -0,0 +1,17 @@
# Test Summary

## NoOps V.S. Prometheus (Athenz endpoint)
- [Using NoOpMetric](./assets/jmeter/no_ops/summary.csv)
- [Using PrometheusMetric](./assets/jmeter/prometheus/summary.csv)

### Conclusion
- Throughput (requests/sec, rounded): (499-510)/510 * 100% = `-2.16%`
- **little performance impact on the existing API**

## without domain V.S. with 2000 domain (prometheus endpoint)
- [label disabled](./assets/jmeter/metric-no-label/summary.csv)
- [label enabled, with 2000 domain as label](./assets/jmeter/metric-2000-domain/summary.csv)

### Conclusion
- Throughput (requests/sec): (4.47-44.76)/44.76 * 100% ≈ `-90.0%`
- **do not enable the metric labels for Athenz domains**
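
The relative changes can be recomputed from the unrounded throughput columns of the summary CSVs. The sketch below assumes that the 510.34 and 499.13 figures are the "get domain list" rows of the NoOpMetric and PrometheusMetric runs, and that 44.76 and 4.47 are the "get metrics" rows with labels disabled and enabled; `ThroughputChange` is a hypothetical helper. With unrounded inputs the results differ slightly from figures computed on rounded values.

```java
// Hypothetical helper for the arithmetic above; inputs come from the summary CSVs.
final class ThroughputChange {

    // Relative change from baseline to candidate, in percent.
    static double percent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // "get domain list" throughput: NoOpMetric vs. PrometheusMetric
        System.out.printf("%.2f%%%n", percent(510.33953, 499.12717));
        // "get metrics" throughput: labels disabled vs. 2000 domains as labels
        System.out.printf("%.2f%%%n", percent(44.76443, 4.47139));
    }
}
```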