Commit eef702e

Implement self-monitoring for BanyanDB via OAP Server and UI (#13527)
1 parent eabddf2 commit eef702e

19 files changed

+1696
-34
lines changed

.github/workflows/skywalking.yaml

Lines changed: 2 additions & 0 deletions
@@ -366,6 +366,8 @@ jobs:
   config: test/e2e-v2/cases/storage/banyandb/e2e.yaml
 - name: BanyanDB TLS
   config: test/e2e-v2/cases/storage/banyandb/tls/e2e.yaml
+- name: BanyanDB monitoring
+  config: test/e2e-v2/cases/banyandb/e2e.yaml
 - name: Storage MySQL
   config: test/e2e-v2/cases/storage/mysql/e2e.yaml
 - name: Storage PostgreSQL
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
# BanyanDB self observability dashboard

[BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/), as an observability database, aims to ingest, analyze, and store metrics, tracing, and logging data. It is designed to handle observability data generated by **Apache SkyWalking**. A dashboard is also provided to visualize its self-observability metrics.

## Data flow
1. [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) collects metrics data internally and exposes a Prometheus HTTP endpoint to retrieve the metrics.
2. OpenTelemetry Collector fetches metrics from BanyanDB and pushes them to SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expression with [MAL](../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results.

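Steps 1 and 2 of the data flow can be sketched as an OpenTelemetry Collector pipeline. This is a minimal sketch, not the configuration shipped with this commit: the host names `banyandb` and `oap`, the metrics port `2121`, the `banyandb-monitoring` job name, and the 15-second scrape interval are all assumptions to adapt to your deployment.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Assumed job name; it must match whatever the OAP rule filter expects.
        - job_name: "banyandb-monitoring"
          scrape_interval: 15s
          static_configs:
            # Assumed host and port of the BanyanDB Prometheus HTTP endpoint.
            - targets: ["banyandb:2121"]

processors:
  batch:

exporters:
  otlp:
    # Assumed address of the SkyWalking OAP gRPC endpoint.
    endpoint: oap:11800
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
```

The `otlp` exporter speaks gRPC, which matches the "OpenTelemetry gRPC exporter" named in step 2. Plain-text transport (`insecure: true`) is only suitable for local or test setups.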
## Set up
1. Start [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/), which supports both [Standalone Mode](https://skywalking.apache.org/docs/skywalking-banyandb/next/installation/standalone/) and [Cluster Mode](https://skywalking.apache.org/docs/skywalking-banyandb/next/installation/cluster/).
2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#docker). For an example Prometheus Receiver configuration in OpenTelemetry Collector, refer to [this file](../../../test/e2e-v2/cases/banyandb/otel-collector-config.yaml).
3. Configure the SkyWalking [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/).

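Step 3 usually comes down to enabling the BanyanDB rule files in the OAP `application.yml`. The fragment below is a sketch under assumptions: it presumes the rules under `otel-rules/banyandb` are selected with the pattern `banyandb/*`, and that only the metrics handler is needed. Check both values against the default `application.yml` of your OAP version.

```yaml
receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    # Only the OTLP metrics handler is required for this dashboard.
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics"}
    # Assumed pattern; it should resolve to the rule files in otel-rules/banyandb.
    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"banyandb/*"}
```

Because each value is wrapped in `${ENV_VAR:default}`, the same settings can be overridden through environment variables without editing the file.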
## BanyanDB monitoring
Self observability monitoring reports the status and resource usage of the [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) server itself. `banyandb-server` is a `Service` in BanyanDB and lands on `Layer: BANYANDB`.

### Self observability metrics

| Unit  | Metric Name                                | Description                                      | Data Source |
|-------|--------------------------------------------|--------------------------------------------------|-------------|
| o/s   | meter_banyandb_write_rate                  | Write Rate (Operations per Second)               | BanyanDB    |
| GiB   | meter_banyandb_total_memory                | Total Memory                                     | BanyanDB    |
| GiB   | meter_banyandb_disk_usage                  | Disk Usage                                       | BanyanDB    |
| r/s   | meter_banyandb_query_rate                  | Query Rate (Requests per Second)                 | BanyanDB    |
| Count | meter_banyandb_total_cpu                   | Total CPU Cores                                  | BanyanDB    |
| c/m   | meter_banyandb_write_and_query_errors_rate | Write and Query Errors Rate (Counts per Minute)  | BanyanDB    |
| c/s   | meter_banyandb_etcd_operation_rate         | Etcd Operation Rate (Counts per Second)          | BanyanDB    |
| Count | meter_banyandb_active_instance             | Active Instances                                 | BanyanDB    |
| %     | meter_banyandb_cpu_usage                   | CPU Usage Percentage                             | BanyanDB    |
| %     | meter_banyandb_rss_memory_usage            | RSS Memory Usage Percentage                      | BanyanDB    |
| %     | meter_banyandb_disk_usage_all              | Disk Usage Percentage                            | BanyanDB    |
| KiB/s | meter_banyandb_network_usage_recv          | Network Receive Rate                             | BanyanDB    |
| KiB/s | meter_banyandb_network_usage_sent          | Network Send Rate                                | BanyanDB    |
| o/s   | meter_banyandb_storage_write_rate          | Storage Write Rate (Operations per Second)       | BanyanDB    |
| s     | meter_banyandb_query_latency               | Query Latency (s)                                | BanyanDB    |
| Count | meter_banyandb_total_data                  | Total Data Elements                              | BanyanDB    |
| r/m   | meter_banyandb_merge_file_data             | Merge File Data Rate (per Minute)                | BanyanDB    |
| s     | meter_banyandb_merge_file_latency          | Merge File Latency (s)                           | BanyanDB    |
| Count | meter_banyandb_merge_file_partitions       | Merge File Partitions                            | BanyanDB    |
| o/s   | meter_banyandb_series_write_rate           | Series Write Rate (Operations per Second)        | BanyanDB    |
| o/s   | meter_banyandb_series_term_search_rate     | Series Term Search Rate (Operations per Second)  | BanyanDB    |
| Count | meter_banyandb_total_series                | Total Series Count                               | BanyanDB    |
| o/s   | meter_banyandb_stream_write_rate           | Stream Write Rate (Operations per Second)        | BanyanDB    |
| o/s   | meter_banyandb_term_search_rate            | Term Search Rate (Operations per Second)         | BanyanDB    |
| Count | meter_banyandb_total_document              | Total Document Count                             | BanyanDB    |

## Customizations
You can customize your own metrics, expressions, and dashboard panels. The metric definitions and expression rules are found in `/config/otel-rules/banyandb`. The [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) dashboard panel configurations are found in `/config/ui-initialized-templates/banyandb`.
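
As an illustration of what a custom rule could look like, the sketch below follows the general layout of SkyWalking OTel rule files (`filter`, `expSuffix`, `metricPrefix`, `metricsRules`). It is not copied from the shipped rules: the job name, the `host_name` tag, the rule name `example_write_rate`, and the raw metric `banyandb_example_write_total` are hypothetical placeholders.

```yaml
# Only process samples scraped by the assumed collector job.
filter: "{ tags -> tags.job_name == 'banyandb-monitoring' }"
# Bind every metric to a service on the BANYANDB layer (assumes a host_name tag exists).
expSuffix: service(['host_name'], Layer.BANYANDB)
# Produces metric names such as meter_banyandb_example_write_rate.
metricPrefix: meter_banyandb
metricsRules:
  # Hypothetical rule: per-second rate of a hypothetical counter, averaged over one minute.
  - name: example_write_rate
    exp: banyandb_example_write_total.sum(['host_name']).rate('PT1M')
```

A new rule only appears on the dashboard once a widget in the UI template references the resulting metric name.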

docs/en/changes/changes.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@

 #### OAP Server

+* Implement self-monitoring for BanyanDB via OAP Server.
 * BanyanDB: Support `hot/warm/cold` stages configuration.
 * Fix query continues profiling policies error when the policy is already in the cache.
 * Support `hot/warm/cold` stages TTL query in the status API and graphQL API.
@@ -116,6 +117,7 @@

 #### UI

+* Implement self-monitoring for BanyanDB via UI.
 * Enhance the trace `List/Tree/Table` graph to support displaying multiple refs of spans and distinguishing different parents.
 * Fix: correct the same labels for metrics.
 * Refactor: use the Fetch API to instead of Axios.
