diff --git a/.github/workflows/skywalking.yaml b/.github/workflows/skywalking.yaml index 0bd11fca045c..0850b6750dc8 100644 --- a/.github/workflows/skywalking.yaml +++ b/.github/workflows/skywalking.yaml @@ -366,6 +366,8 @@ jobs: config: test/e2e-v2/cases/storage/banyandb/e2e.yaml - name: BanyanDB TLS config: test/e2e-v2/cases/storage/banyandb/tls/e2e.yaml + - name: BanyanDB monitoring + config: test/e2e-v2/cases/banyandb/e2e.yaml - name: Storage MySQL config: test/e2e-v2/cases/storage/mysql/e2e.yaml - name: Storage PostgreSQL diff --git a/docs/en/banyandb/dashboards-banyandb.md b/docs/en/banyandb/dashboards-banyandb.md new file mode 100644 index 000000000000..8c5f8bdfc058 --- /dev/null +++ b/docs/en/banyandb/dashboards-banyandb.md @@ -0,0 +1,49 @@ +# BanyanDB self observability dashboard + +[BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/), as an observability database, aims to ingest, analyze, and store Metrics, Tracing, and Logging data. It is designed to handle observability data generated by **Apache SkyWalking**, and SkyWalking provides a dashboard to visualize BanyanDB's own self-observability metrics. + +## Data flow +1. [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) collects metrics internally and exposes them through a Prometheus HTTP endpoint. +2. The OpenTelemetry Collector scrapes metrics from BanyanDB and pushes them to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter. +3. The SkyWalking OAP Server parses the expressions with [MAL](../concepts-and-designs/mal.md) to filter, calculate, and aggregate the metrics, then stores the results. + +## Set up +1. Start [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) in either [Standalone Mode](https://skywalking.apache.org/docs/skywalking-banyandb/next/installation/standalone/) or [Cluster Mode](https://skywalking.apache.org/docs/skywalking-banyandb/next/installation/cluster/). +2. 
Set up the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#docker). For an example of the Prometheus receiver configuration in the OpenTelemetry Collector, see [otel-collector-config.yaml](../../../test/e2e-v2/cases/banyandb/otel-collector-config.yaml). +3. Configure the SkyWalking [OpenTelemetry receiver](https://skywalking.apache.org/docs/main/next/en/setup/backend/opentelemetry-receiver/). + +## BanyanDB monitoring +Self-observability monitoring tracks the status and resource usage of the [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) server itself. `banyandb-server` is cataloged as a `Service` in OAP and lands on `Layer: BANYANDB`. + +### Self observability metrics + +| Unit | Metric Name | Description | Data Source | +|------|---------------------------------------------------|-------------|-------------| +| o/s | meter_banyandb_write_rate | Write Rate (Operations per Second) | BanyanDB | +| GiB | meter_banyandb_total_memory | Total Memory | BanyanDB | +| GiB | meter_banyandb_disk_usage | Disk Usage | BanyanDB | +| r/s | meter_banyandb_query_rate | Query Rate (Requests per Second) | BanyanDB | +| Count | meter_banyandb_total_cpu | Total CPU Cores | BanyanDB | +| c/m | meter_banyandb_write_and_query_errors_rate | Write and Query Error Rate (Counts per Minute) | BanyanDB | +| c/s | meter_banyandb_etcd_operation_rate | Etcd Operation Rate (Counts per Second) | BanyanDB | +| Count | meter_banyandb_active_instance | Active Instances | BanyanDB | +| % | meter_banyandb_cpu_usage | CPU Usage Percentage | BanyanDB | +| % | meter_banyandb_rss_memory_usage | RSS Memory Usage Percentage | BanyanDB | +| % | meter_banyandb_disk_usage_all | Disk Usage Percentage | BanyanDB | +| KiB/s | meter_banyandb_network_usage_recv | Network Receive Rate | BanyanDB | +| KiB/s | meter_banyandb_network_usage_sent | Network Send Rate | BanyanDB | +| o/s | meter_banyandb_storage_write_rate | Storage Write Rate (Operations per Second) | BanyanDB | +| s | 
meter_banyandb_query_latency | Query Latency (s) | BanyanDB | +| Count | meter_banyandb_total_data | Total Data Elements | BanyanDB | +| r/m | meter_banyandb_merge_file_data | Merge File Data Rate (Runs per Minute) | BanyanDB | +| s | meter_banyandb_merge_file_latency | Merge File Latency (s) | BanyanDB | +| Count | meter_banyandb_merge_file_partitions | Merge File Partitions | BanyanDB | +| o/s | meter_banyandb_series_write_rate | Series Write Rate (Operations per Second) | BanyanDB | +| o/s | meter_banyandb_series_term_search_rate | Series Term Search Rate (Operations per Second) | BanyanDB | +| Count | meter_banyandb_total_series | Total Series Count | BanyanDB | +| o/s | meter_banyandb_stream_write_rate | Stream Write Rate (Operations per Second) | BanyanDB | +| o/s | meter_banyandb_term_search_rate | Term Search Rate (Operations per Second) | BanyanDB | +| Count | meter_banyandb_total_document | Total Document Count | BanyanDB | + +## Customizations +You can customize your own metrics, expressions, and dashboard panels. The metric definitions and expression rules are in `/config/otel-rules/banyandb`. The [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/next/readme/) dashboard panel configurations are in `/config/ui-initialized-templates/banyandb`. diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md index e2bd5937b8bc..caa250e63cdc 100644 --- a/docs/en/changes/changes.md +++ b/docs/en/changes/changes.md @@ -10,6 +10,7 @@ #### OAP Server +* Implement self-monitoring for BanyanDB in the OAP Server. * BanyanDB: Support `hot/warm/cold` stages configuration. * Fix query continues profiling policies error when the policy is already in the cache. * Support `hot/warm/cold` stages TTL query in the status API and graphQL API. @@ -116,6 +117,7 @@ #### UI +* Implement self-monitoring for BanyanDB in the UI. 
* Enhance the trace `List/Tree/Table` graph to support displaying multiple refs of spans and distinguishing different parents.
* Fix: correct the same labels for metrics. * Refactor: use the Fetch API to instead of Axios. diff --git a/docs/en/setup/backend/opentelemetry-receiver.md b/docs/en/setup/backend/opentelemetry-receiver.md index 6e60c866b2e0..e17e309b87cc 100644 --- a/docs/en/setup/backend/opentelemetry-receiver.md +++ b/docs/en/setup/backend/opentelemetry-receiver.md @@ -28,39 +28,41 @@ for identification of the metric data. **Notice:** In the resource scope, dots (.) in the attributes' key names are converted to underscores (_), whereas in the metrics scope, they are not converted. -| Description | Configuration File | Data Source | -|-----------------------------------------|-----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------| -| Metrics of Istio Control Plane | otel-rules/istio-controlplane.yaml | Istio Control Plane -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of SkyWalking OAP server itself | otel-rules/oap.yaml | SkyWalking OAP Server(SelfObservability) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Linux OS | otel-rules/vm.yaml | prometheus/node_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Windows OS | otel-rules/windows.yaml | prometheus-community/windows_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of K8s cluster | otel-rules/k8s/k8s-cluster.yaml | K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of K8s cluster | otel-rules/k8s/k8s-node.yaml | cAdvisor & K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of K8s cluster | otel-rules/k8s/k8s-service.yaml | cAdvisor & K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of 
MYSQL | otel-rules/mysql/mysql-instance.yaml | prometheus/mysqld_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of MYSQL | otel-rules/mysql/mysql-service.yaml | prometheus/mysqld_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of PostgreSQL | otel-rules/postgresql/postgresql-instance.yaml | prometheus-community/postgres_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of PostgreSQL | otel-rules/postgresql/postgresql-service.yaml | prometheus-community/postgres_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Apache APISIX | otel-rules/apisix.yaml | apisix prometheus plugin -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-cluster.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-service.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-node.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-cluster.yaml | prometheus-community/elasticsearch_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-index.yaml | prometheus-community/elasticsearch_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-node.yaml | prometheus-community/elasticsearch_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Redis | otel-rules/redis/redis-service.yaml | 
oliver006/redis_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of Redis | otel-rules/redis/redis-instance.yaml | oliver006/redis_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RabbitMQ | otel-rules/rabbitmq/rabbitmq-cluster.yaml | rabbitmq-prometheus -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RabbitMQ | otel-rules/rabbitmq/rabbitmq-node.yaml | rabbitmq-prometheus -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of MongoDB | otel-rules/mongodb/mongodb-cluster.yaml | percona/mongodb_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of MongoDB | otel-rules/mongodb/mongodb-node.yaml | percona/mongodb_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Description | Configuration File | Data Source | +|-----------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------| +| Metrics of Istio Control Plane | otel-rules/istio-controlplane.yaml | Istio Control Plane -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of SkyWalking OAP server itself | otel-rules/oap.yaml | SkyWalking OAP Server(SelfObservability) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Linux OS | otel-rules/vm.yaml | prometheus/node_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Windows OS | otel-rules/windows.yaml | prometheus-community/windows_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of K8s cluster | otel-rules/k8s/k8s-cluster.yaml | K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of K8s 
cluster | otel-rules/k8s/k8s-node.yaml | cAdvisor & K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of K8s cluster | otel-rules/k8s/k8s-service.yaml | cAdvisor & K8s kube-state-metrics -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of MYSQL | otel-rules/mysql/mysql-instance.yaml | prometheus/mysqld_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of MYSQL | otel-rules/mysql/mysql-service.yaml | prometheus/mysqld_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of PostgreSQL | otel-rules/postgresql/postgresql-instance.yaml | prometheus-community/postgres_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of PostgreSQL | otel-rules/postgresql/postgresql-service.yaml | prometheus-community/postgres_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Apache APISIX | otel-rules/apisix.yaml | apisix prometheus plugin -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-cluster.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-service.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of AWS Cloud EKS | otel-rules/aws-eks/eks-node.yaml | AWS Container Insights Receiver -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-cluster.yaml | prometheus-community/elasticsearch_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-index.yaml | prometheus-community/elasticsearch_exporter 
-> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Elasticsearch | otel-rules/elasticsearch/elasticsearch-node.yaml | prometheus-community/elasticsearch_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Redis | otel-rules/redis/redis-service.yaml | oliver006/redis_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Redis | otel-rules/redis/redis-instance.yaml | oliver006/redis_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of RabbitMQ | otel-rules/rabbitmq/rabbitmq-cluster.yaml | rabbitmq-prometheus -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of RabbitMQ | otel-rules/rabbitmq/rabbitmq-node.yaml | rabbitmq-prometheus -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of MongoDB | otel-rules/mongodb/mongodb-cluster.yaml | percona/mongodb_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of MongoDB | otel-rules/mongodb/mongodb-node.yaml | percona/mongodb_exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of Kafka | otel-rules/kafka/kafka-cluster.yaml | prometheus/jmx_exporter/jmx_prometheus_javaagent -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of Kafka | otel-rules/kafka/kafka-broker.yaml | prometheus/jmx_exporter/jmx_prometheus_javaagent -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of ClickHouse | otel-rules/clickhouse/clickhouse-instance.yaml | ClickHouse(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of ClickHouse | otel-rules/clickhouse/clickhouse-service.yaml | ClickHouse(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RocketMQ | 
otel-rules/rocketmq/rocketmq-cluster.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-broker.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-topic.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of ClickHouse | otel-rules/clickhouse/clickhouse-instance.yaml | ClickHouse(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of ClickHouse | otel-rules/clickhouse/clickhouse-service.yaml | ClickHouse(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-cluster.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-broker.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-topic.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of Flink | otel-rules/flink/flink-jobManager.yaml | flink jobManager -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of Flink | otel-rules/flink/flink-taskManager.yaml | flink taskManager -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of Flink | otel-rules/flink/flink-job.yaml | flink jobManager & flink taskManager-> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of BanyanDB | otel-rules/banyandb/banyandb-instance.yaml | BanyanDB(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of BanyanDB | 
otel-rules/banyandb/banyandb-service.yaml | BanyanDB(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | \ No newline at end of file diff --git a/docs/menu.yml b/docs/menu.yml index 702fd1923453..f2a2fe425797 100644 --- a/docs/menu.yml +++ b/docs/menu.yml @@ -210,6 +210,8 @@ catalog: path: "/en/banyandb/ttl" - name: "Data Lifecycle Stages(Hot/Warm/Cold)" path: "/en/banyandb/stages" + - name: "BanyanDB self observability dashboard" + path: "/en/banyandb/dashboards-banyandb" - name: "Tracing" catalog: - name: "Trace Sampling" diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java index 6e7380d409a5..487b88627897 100644 --- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java +++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java @@ -256,7 +256,12 @@ public enum Layer { /** * Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams */ - FLINK(42, true); + FLINK(42, true), + + /** + * BanyanDB is a distributed time-series database with built-in self-monitoring for real-time tracking of system health, performance, and resource utilization.
+ */ + BANYANDB(43, true); private final int value; /** diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java index c859c0de2133..dd01e97c19d8 100644 --- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java +++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java @@ -80,6 +80,7 @@ public class UITemplateInitializer { Layer.KONG.name(), Layer.SO11Y_GO_AGENT.name(), Layer.FLINK.name(), + Layer.BANYANDB.name(), "custom" }; private final UITemplateManagementService uiTemplateManagementService; diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml index 7f082649425d..fd6356f20374 100644 --- a/oap-server/server-starter/src/main/resources/application.yml +++ b/oap-server/server-starter/src/main/resources/application.yml @@ -383,7 +383,7 @@ receiver-otel: selector: ${SW_OTEL_RECEIVER:default} default: enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"} - enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*"} + enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*,banyandb/*"} 
receiver-zipkin: selector: ${SW_RECEIVER_ZIPKIN:-} diff --git a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml new file mode 100644 index 000000000000..21955331f317 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-instance.yaml @@ -0,0 +1,86 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This will parse a textual representation of a duration. The formats +# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS} +# with days considered to be exactly 24 hours. +#
+# Examples: +#
+#    "PT20.345S" -- parses as "20.345 seconds"
+#    "PT15M"     -- parses as "15 minutes" (where a minute is 60 seconds)
+#    "PT10H"     -- parses as "10 hours" (where an hour is 3600 seconds)
+#    "P2D"       -- parses as "2 days" (where a day is 24 hours or 86400 seconds)
+#    "P2DT3H4M"  -- parses as "2 days, 3 hours and 4 minutes"
+#    "P-6H3M"    -- parses as "-6 hours and +3 minutes"
+#    "-P6H3M"    -- parses as "-6 hours and -3 minutes"
+#    "-P-6H+3M"  -- parses as "+6 hours and -3 minutes"
+# 
+filter: "{ tags -> tags.job_name == 'banyandb-monitoring' }" +expSuffix: tag({tags -> tags.host_name = 'banyandb::' + tags.host_name}).service(['host_name'] , Layer.BANYANDB).instance(['host_name'], ['service_instance_id'], Layer.BANYANDB) +metricPrefix: meter_banyandb +metricsRules: + - name: instance_write_rate + exp: banyandb_measure_total_written.rate('PT15S')+banyandb_stream_tst_total_written.rate('PT15S') + - name: instance_total_memory + exp: banyandb_system_memory_state.tagEqual('kind','total') + - name: instance_disk_usage + exp: banyandb_system_disk.tagEqual('kind','used').sum(['host_name','service_instance_id']) + - name: instance_query_rate + exp: banyandb_liaison_grpc_total_started.sum(['method','host_name','service_instance_id']) + - name: instance_total_cpu + exp: banyandb_system_cpu_num + - name: instance_write_and_query_errors_rate + exp: banyandb_liaison_grpc_total_err.tagEqual('method','query').sum(['method','host_name','service_instance_id']).rate('PT15S')*60 + banyandb_liaison_grpc_total_stream_msg_sent_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + banyandb_queue_sub_total_msg_sent_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + - name: instance_etcd_operation_rate + exp: banyandb_liaison_grpc_total_registry_started.sum(['host_name','service_instance_id']).rate('PT15S') + banyandb_liaison_grpc_total_started.sum(['host_name','service_instance_id']).rate('PT15S') + - name: instance_active_instance + exp: up.sum(['host_name','service_instance_id']).downsampling(MIN) + - name: instance_cpu_usage + exp: (((process_cpu_seconds_total.sum(['host_name','service_instance_id']).rate('PT15S') / banyandb_system_cpu_num.sum(['host_name','service_instance_id']))).max(['host_name','service_instance_id']))*1000 + - name: instance_rss_memory_usage + exp: 
((process_resident_memory_bytes.sum(['host_name','service_instance_id']).downsampling(MAX) / banyandb_system_memory_state.tagEqual('kind','total').sum(['host_name','service_instance_id'])).max(['host_name','service_instance_id']))*1000 + - name: instance_disk_usage_all + exp: ((banyandb_system_disk.tagEqual('kind','used').sum(['host_name','service_instance_id']) / banyandb_system_memory_state.tagEqual('kind','total').sum(['host_name','service_instance_id'])).max(['host_name','service_instance_id']))*1000 + - name: instance_network_usage_recv + exp: banyandb_system_net_state.tagEqual('kind','bytes_recv').sum(['host_name','service_instance_id']).rate('PT15S') + - name: instance_network_usage_sent + exp: banyandb_system_net_state.tagEqual('kind','bytes_sent').sum(['host_name','service_instance_id']).rate('PT15S') + - name: instance_storage_write_rate + exp: banyandb_measure_total_written.sum(['group','host_name','service_instance_id']).rate('PT15S')*1000 + - name: instance_query_latency + exp: (banyandb_liaison_grpc_total_latency.tagEqual('method','query').sum(['group','host_name','service_instance_id']).rate('PT15S') / banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: instance_total_data + exp: banyandb_measure_total_file_elements.sum(['group','host_name','service_instance_id']) + - name: instance_merge_file_data + exp: banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S') * 60 *1000 + - name: instance_merge_file_latency + exp: (banyandb_measure_total_merge_latency.tagEqual('type','file').sum(['group','host_name','service_instance_id']).rate('PT15S') / banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: instance_merge_file_partitions + exp: 
(banyandb_measure_total_merged_parts.tagEqual('type','file').sum(['group','host_name','service_instance_id']).rate('PT15S') / banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: instance_series_write_rate + exp: (banyandb_measure_inverted_index_total_updates.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: instance_series_term_search_rate + exp: banyandb_stream_storage_inverted_index_total_term_searchers_started.sum(['group','host_name','service_instance_id']).rate('PT15S') + - name: instance_total_series + exp: banyandb_measure_inverted_index_total_doc_count.sum(['group','host_name','service_instance_id']) + - name: instance_stream_write_rate + exp: banyandb_stream_tst_inverted_index_total_updates.sum(['group','host_name','service_instance_id']).rate('PT15S') + - name: instance_term_search_rate + exp: banyandb_stream_tst_inverted_index_total_term_searchers_started.sum(['group','host_name','service_instance_id']).rate('PT15S')* 1000 + - name: instance_total_document + exp: banyandb_stream_tst_inverted_index_total_doc_count.sum(['group','host_name','service_instance_id']) + + diff --git a/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml new file mode 100644 index 000000000000..566f893cc4a5 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/otel-rules/banyandb/banyandb-service.yaml @@ -0,0 +1,86 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This will parse a textual representation of a duration. The formats +# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS} +# with days considered to be exactly 24 hours. +#
+# Examples: +#
+#    "PT20.345S" -- parses as "20.345 seconds"
+#    "PT15M"     -- parses as "15 minutes" (where a minute is 60 seconds)
+#    "PT10H"     -- parses as "10 hours" (where an hour is 3600 seconds)
+#    "P2D"       -- parses as "2 days" (where a day is 24 hours or 86400 seconds)
+#    "P2DT3H4M"  -- parses as "2 days, 3 hours and 4 minutes"
+#    "P-6H3M"    -- parses as "-6 hours and +3 minutes"
+#    "-P6H3M"    -- parses as "-6 hours and -3 minutes"
+#    "-P-6H+3M"  -- parses as "+6 hours and -3 minutes"
+# 
+filter: "{ tags -> tags.job_name == 'banyandb-monitoring' }" +expSuffix: tag({tags -> tags.host_name = 'banyandb::' + tags.host_name}).service(['host_name'] , Layer.BANYANDB) +metricPrefix: meter_banyandb +metricsRules: + - name: write_rate + exp: (banyandb_measure_total_written.sum(['host_name','service_instance_id']).rate('PT15S') + banyandb_stream_tst_total_written.sum(['host_name','service_instance_id']).rate('PT15S')) + - name: total_memory + exp: banyandb_system_memory_state.tagEqual('kind','total').sum(['host_name']) + - name: disk_usage + exp: banyandb_system_disk.tagEqual('kind','used').sum(['host_name','service_instance_id']) + - name: query_rate + exp: banyandb_liaison_grpc_total_started.sum(['method','host_name','service_instance_id']) + - name: total_cpu + exp: banyandb_system_cpu_num.sum(['method','host_name','service_instance_id']) + - name: write_and_query_errors_rate + exp: banyandb_liaison_grpc_total_err.tagEqual('method','query').sum(['method','host_name','service_instance_id']).rate('PT15S')*60 + banyandb_liaison_grpc_total_stream_msg_sent_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + banyandb_liaison_grpc_total_stream_msg_received_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + banyandb_queue_sub_total_msg_sent_err.sum(['host_name','service_instance_id']).rate('PT15S')*60 + - name: etcd_operation_rate + exp: banyandb_liaison_grpc_total_registry_started.sum(['host_name','service_instance_id']).rate('PT15S') + banyandb_liaison_grpc_total_started.sum(['host_name','service_instance_id']).rate('PT15S') + - name: active_instance + exp: up.sum(['host_name','service_instance_id']).downsampling(MIN) + - name: cpu_usage + exp: (((process_cpu_seconds_total.sum(['host_name','service_instance_id']).rate('PT15S') / banyandb_system_cpu_num.sum(['host_name','service_instance_id']))).max(['host_name','service_instance_id']))*1000 + - name: rss_memory_usage + exp: 
((process_resident_memory_bytes.sum(['host_name','service_instance_id']).downsampling(MAX) / banyandb_system_memory_state.tagEqual('kind','total').sum(['host_name','service_instance_id'])).max(['host_name','service_instance_id']))*1000 + - name: disk_usage_all + exp: ((banyandb_system_disk.tagEqual('kind','used').sum(['host_name','service_instance_id']) / banyandb_system_memory_state.tagEqual('kind','total').sum(['host_name','service_instance_id'])).max(['host_name','service_instance_id']))*1000 + - name: network_usage_recv + exp: banyandb_system_net_state.tagEqual('kind','bytes_recv').sum(['host_name','service_instance_id']).rate('PT15S') + - name: network_usage_sent + exp: banyandb_system_net_state.tagEqual('kind','bytes_sent').sum(['host_name','service_instance_id']).rate('PT15S') + - name: storage_write_rate + exp: banyandb_measure_total_written.sum(['group','host_name','service_instance_id']).rate('PT15S')*1000 + - name: query_latency + exp: (banyandb_liaison_grpc_total_latency.tagEqual('method','query').sum(['group','host_name','service_instance_id']).rate('PT15S') / banyandb_liaison_grpc_total_started.tagEqual('method','query').sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: total_data + exp: banyandb_measure_total_file_elements.sum(['group','host_name','service_instance_id']) + - name: merge_file_data + exp: banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S') * 60 *1000 + - name: merge_file_latency + exp: (banyandb_measure_total_merge_latency.tagEqual('type','file').sum(['group','host_name','service_instance_id']).rate('PT15S') / banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: merge_file_partitions + exp: (banyandb_measure_total_merged_parts.tagEqual('type','file').sum(['group','host_name','service_instance_id']).rate('PT15S') / 
banyandb_measure_total_merge_loop_started.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: series_write_rate + exp: (banyandb_measure_inverted_index_total_updates.sum(['group','host_name','service_instance_id']).rate('PT15S'))*1000 + - name: series_term_search_rate + exp: banyandb_stream_storage_inverted_index_total_term_searchers_started.sum(['group','host_name','service_instance_id']).rate('PT15S') + - name: total_series + exp: banyandb_measure_inverted_index_total_doc_count.sum(['group','host_name','service_instance_id']) + - name: stream_write_rate + exp: banyandb_stream_tst_inverted_index_total_updates.sum(['group','host_name','service_instance_id']).rate('PT15S') + - name: term_search_rate + exp: banyandb_stream_tst_inverted_index_total_term_searchers_started.sum(['group','host_name','service_instance_id']).rate('PT15S')* 1000 + - name: total_document + exp: banyandb_stream_tst_inverted_index_total_doc_count.sum(['group','host_name','service_instance_id']) + + diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-instance.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-instance.json new file mode 100644 index 000000000000..82d4c2c24645 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-instance.json @@ -0,0 +1,561 @@ +[ + { + "id": "BanyanDB-Instance", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 11, + "i": "0", + "type": "Widget", + "widget": { + "title": "Write Rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_write_rate)" + ], + "metricConfig": [ + { + "label": "Write Rate" + } + ] + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 11, + "i": "1", + "type": "Widget", + "widget": { + "title": "Total memory" + }, + "graph": { + "type": "Card", 
+ "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_total_memory/1024/1024/1024)" + ], + "metricConfig": [ + { + "unit": "GiB" + } + ] + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 11, + "i": "2", + "type": "Widget", + "widget": { + "title": "Disk usage" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_disk_usage/1024/1024/1024)" + ], + "metricConfig": [ + { + "unit": "GiB" + } + ] + }, + { + "x": 0, + "y": 11, + "w": 8, + "h": 11, + "i": "3", + "type": "Widget", + "widget": { + "title": "Query rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_query_rate{method=\"query\"})" + ], + "metricConfig": [ + { + "unit": "rps" + } + ] + }, + { + "x": 8, + "y": 11, + "w": 8, + "h": 11, + "i": "4", + "type": "Widget", + "widget": { + "title": "Total cpu" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_total_cpu)" + ] + }, + { + "x": 16, + "y": 11, + "w": 8, + "h": 11, + "i": "5", + "type": "Widget", + "widget": { + "title": "Write and query errors rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_write_and_query_errors_rate)" + ] + }, + { + "x": 8, + "y": 22, + "w": 16, + "h": 11, + "i": "6", + "type": "Widget", + "widget": { + "title": "Active Instances" + }, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_banyandb_instance_active_instance" + ] + }, + { + "x": 0, + "y": 22, + "w": 8, + "h": 11, + "i": "7", + "type": "Widget", + "widget": { + "title": "Etcd
operation rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_instance_etcd_operation_rate)" + ], + "metricConfig": [ + { + "unit": "c/s" + } + ] + }, + { + "x": 0, + "y": 33, + "w": 12, + "h": 12, + "i": "8", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_cpu_usage/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Cpu Usage" + } + }, + { + "x": 12, + "y": 33, + "w": 12, + "h": 12, + "i": "9", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_rss_memory_usage/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "RSS memory usage all" + } + }, + { + "x": 0, + "y": 45, + "w": 12, + "h": 13, + "i": "10", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_disk_usage_all/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Disk Usage(ALL)" + } + }, + { + "x": 12, + "y": 45, + "w": 12, + "h": 13, + "i": "11", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_network_usage_recv/1024", + "meter_banyandb_instance_network_usage_sent/1024" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Network Usage(ALL)" + }, + "metricConfig": [ + { + "unit": "KiB/s" + }, + { + "unit": "KiB/s" + } + ] + }, + { + "x": 8, + "y": 69, + "w": 8, + "h": 11, + "i": "12", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_merge_file_latency/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + 
"showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Latency" + } + }, + { + "x": 16, + "y": 58, + "w": 8, + "h": 11, + "i": "13", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_total_data" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Data" + } + }, + { + "x": 16, + "y": 80, + "w": 8, + "h": 11, + "i": "14", + "type": "Widget", + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Series" + }, + "expressions": [ + "meter_banyandb_instance_total_series" + ] + }, + { + "x": 8, + "y": 80, + "w": 8, + "h": 11, + "i": "15", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_merge_file_data" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Data" + } + }, + { + "x": 16, + "y": 69, + "w": 8, + "h": 11, + "i": "16", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_merge_file_partitions" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Partitions" + } + }, + { + "x": 0, + "y": 91, + "w": 8, + "h": 12, + "i": "17", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_stream_write_rate" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Write Rate For Streams" + } + }, + { + "x": 0, + "y": 58, + "w": 8, + "h": 11, + "i": "18", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_storage_write_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, 
+ "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Write Rate By Group" + } + }, + { + "x": 0, + "y": 80, + "w": 8, + "h": 11, + "i": "19", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_series_write_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Series Write Rate" + } + }, + { + "x": 8, + "y": 58, + "w": 8, + "h": 11, + "i": "20", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_query_latency/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Query Latency" + } + }, + { + "x": 16, + "y": 91, + "w": 8, + "h": 12, + "i": "21", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_total_document" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Document" + } + }, + { + "x": 0, + "y": 69, + "w": 8, + "h": 11, + "i": "22", + "type": "Widget", + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Data" + }, + "expressions": [ + "meter_banyandb_instance_merge_file_data/1000" + ] + }, + { + "x": 8, + "y": 91, + "w": 8, + "h": 12, + "i": "23", + "type": "Widget", + "expressions": [ + "meter_banyandb_instance_term_search_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Term Search Rate" + } + } + ], + "layer": "BANYANDB", + "entity": "ServiceInstance", + "name": "BanyanDB-Instance", + "isRoot": false + } + } +] \ No newline at end of file diff --git 
a/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-root.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-root.json new file mode 100644 index 000000000000..9b712ec49f66 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-root.json @@ -0,0 +1,45 @@ +[ + { + "id": "BanyanDB-root", + "configuration": { + "children": [ + { + "x": 0, + "y": 3, + "w": 24, + "h": 43, + "i": "0", + "type": "Widget", + "graph": { + "type": "ServiceList", + "dashboardName": "BanyanDB-Service", + "fontSize": 12, + "showXAxis": false, + "showYAxis": false, + "showGroup": true + } + }, + { + "x": 0, + "y": 0, + "w": 24, + "h": 3, + "i": "1", + "type": "Text", + "graph": { + "fontColor": "theme", + "backgroundColor": "theme", + "content": "BanyanDB is a distributed time-series database with built-in self-monitoring for real-time tracking of system health, performance, and resource utilization.", + "fontSize": 14, + "textAlign": "left", + "url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-banyandb/" + } + } + ], + "layer": "BANYANDB", + "entity": "All", + "name": "BanyanDB-Root", + "isRoot": true + } + } +] \ No newline at end of file diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-service.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-service.json new file mode 100644 index 000000000000..bc171cbda5e0 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/banyandb/banyandb-service.json @@ -0,0 +1,601 @@ +[ + { + "id": "BanyanDB-Service", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 33, + "i": "0", + "type": "Tab", + "children": [ + { + "name": "Inspections", + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 11, + "i": "0", + "type": "Widget", + "widget": { + 
"title": "Write Rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_write_rate)" + ], + "metricConfig": [ + { + "label": "Write Rate" + } + ] + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 11, + "i": "1", + "type": "Widget", + "widget": { + "title": "Total memory" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_total_memory/1024/1024/1024)" + ], + "metricConfig": [ + { + "unit": "GiB" + } + ] + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 11, + "i": "2", + "type": "Widget", + "widget": { + "title": "Disk usage" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_disk_usage/1024/1024/1024)" + ], + "metricConfig": [ + { + "unit": "GiB" + } + ] + }, + { + "x": 0, + "y": 11, + "w": 8, + "h": 11, + "i": "3", + "type": "Widget", + "widget": { + "title": "Query rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_query_rate{method=\"query\"})" + ], + "metricConfig": [ + { + "unit": "rps" + } + ] + }, + { + "x": 8, + "y": 11, + "w": 8, + "h": 11, + "i": "4", + "type": "Widget", + "widget": { + "title": "Total cpu" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_total_cpu)" + ] + }, + { + "x": 16, + "y": 11, + "w": 8, + "h": 11, + "i": "5", + "type": "Widget", + "widget": { + "title": "Write and query errors rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_write_and_query_errors_rate)" + ] + }, + { + "x": 8, + "y": 22, + "w": 16, + "h": 11, + "i": "6", + "type": "Widget", + "widget": {
"title": "Active Instances" + }, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_banyandb_active_instance" + ] + }, + { + "x": 0, + "y": 22, + "w": 8, + "h": 11, + "i": "7", + "type": "Widget", + "widget": { + "title": "Etcd operation rate" + }, + "graph": { + "type": "Card", + "fontSize": 50, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_banyandb_etcd_operation_rate)" + ], + "metricConfig": [ + { + "unit": "c/s" + } + ] + }, + { + "x": 0, + "y": 33, + "w": 12, + "h": 12, + "i": "8", + "type": "Widget", + "expressions": [ + "meter_banyandb_cpu_usage/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Cpu Usage" + } + }, + { + "x": 12, + "y": 33, + "w": 12, + "h": 12, + "i": "9", + "type": "Widget", + "expressions": [ + "meter_banyandb_rss_memory_usage/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "RSS memory usage all" + } + }, + { + "x": 0, + "y": 45, + "w": 12, + "h": 13, + "i": "10", + "type": "Widget", + "expressions": [ + "meter_banyandb_disk_usage_all/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Disk Usage(ALL)" + } + }, + { + "x": 12, + "y": 45, + "w": 12, + "h": 13, + "i": "11", + "type": "Widget", + "expressions": [ + "meter_banyandb_network_usage_recv/1024", + "meter_banyandb_network_usage_sent/1024" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Network Usage(ALL)" + }, + "metricConfig": [ + { + "unit": "KiB/s" + }, + { + 
"unit": "KiB/s" + } + ] + }, + { + "x": 8, + "y": 69, + "w": 8, + "h": 11, + "i": "12", + "type": "Widget", + "expressions": [ + "meter_banyandb_merge_file_latency/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Latency" + } + }, + { + "x": 16, + "y": 58, + "w": 8, + "h": 11, + "i": "13", + "type": "Widget", + "expressions": [ + "meter_banyandb_total_data" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Data" + } + }, + { + "x": 16, + "y": 80, + "w": 8, + "h": 11, + "i": "14", + "type": "Widget", + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Series" + }, + "expressions": [ + "meter_banyandb_total_series" + ] + }, + { + "x": 8, + "y": 80, + "w": 8, + "h": 11, + "i": "15", + "type": "Widget", + "expressions": [ + "meter_banyandb_merge_file_data" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Data" + } + }, + { + "x": 16, + "y": 69, + "w": 8, + "h": 11, + "i": "16", + "type": "Widget", + "expressions": [ + "meter_banyandb_merge_file_partitions" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Partitions" + } + }, + { + "x": 0, + "y": 91, + "w": 8, + "h": 12, + "i": "17", + "type": "Widget", + "expressions": [ + "meter_banyandb_stream_write_rate" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Write Rate For Streams" + } + }, + 
{ + "x": 0, + "y": 58, + "w": 8, + "h": 11, + "i": "18", + "type": "Widget", + "expressions": [ + "meter_banyandb_storage_write_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Write Rate By Group" + } + }, + { + "x": 0, + "y": 80, + "w": 8, + "h": 11, + "i": "19", + "type": "Widget", + "expressions": [ + "meter_banyandb_series_write_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Series Write Rate" + } + }, + { + "x": 8, + "y": 58, + "w": 8, + "h": 11, + "i": "20", + "type": "Widget", + "expressions": [ + "meter_banyandb_query_latency/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Query Latency" + } + }, + { + "x": 16, + "y": 91, + "w": 8, + "h": 12, + "i": "21", + "type": "Widget", + "expressions": [ + "meter_banyandb_total_document" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Total Document" + } + }, + { + "x": 0, + "y": 69, + "w": 8, + "h": 11, + "i": "22", + "type": "Widget", + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Merge File Data" + }, + "expressions": [ + "meter_banyandb_merge_file_data/1000" + ] + }, + { + "x": 8, + "y": 91, + "w": 8, + "h": 12, + "i": "23", + "type": "Widget", + "expressions": [ + "meter_banyandb_term_search_rate/1000" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "Term Search Rate" + } + } + ] + }, + { + 
"name": "Instances", + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 29, + "i": "0", + "type": "Widget", + "graph": { + "type": "InstanceList", + "dashboardName": "BanyanDB-Instance", + "fontSize": 12 + }, + "expressions": [ + "" + ], + "subExpressions": [ + "" + ] + } + ] + } + + ] + } + ], + "layer": "BANYANDB", + "entity": "Service", + "name": "BanyanDB-Service", + "isRoot": false + } + } +] \ No newline at end of file diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml index eec56f04268d..d3eaa65bccf5 100644 --- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml @@ -257,6 +257,11 @@ menus: description: "Satellite: an open-source agent designed for the cloud-native infrastructures, which provides a low-cost, high-efficient, and more secure way to collect telemetry data. It is the recommended load balancer for telemetry collecting." documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-load-balancer/ i18nKey: self_observability_satellite + - title: BanyanDB Server + layer: BANYANDB + description: BanyanDB is a time-series database designed for observability data storage and analysis. + documentLink: https://skywalking.apache.org/docs/main/next/en/banyandb/dashboards-banyandb/ + i18nKey: self_observability_banyandb - title: SkyWalking Java Agent layer: SO11Y_JAVA_AGENT description: The Java Agent for Apache SkyWalking, which provides the native tracing/metrics/logging/event/profiling abilities for Java projects. @@ -266,4 +271,4 @@ menus: layer: SO11Y_GO_AGENT description: The Go Agent for Apache SkyWalking, which provides the native tracing/metrics/logging abilities for Golang projects. 
documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-go-agent/ - i18nKey: self_observability_go_agent \ No newline at end of file + i18nKey: self_observability_go_agent diff --git a/test/e2e-v2/cases/banyandb/banyandb-cases.yaml b/test/e2e-v2/cases/banyandb/banyandb-cases.yaml new file mode 100644 index 000000000000..11bc0042fb7b --- /dev/null +++ b/test/e2e-v2/cases/banyandb/banyandb-cases.yaml @@ -0,0 +1,42 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# This file contains BanyanDB instance metrics queries, referencing +# oap-server/server-starter/src/main/resources/otel-rules/banyandb.yaml + +cases: + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_total_memory + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_write_rate + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_total_memory + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_total_cpu + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_etcd_operation_rate + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_active_instance + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_cpu_usage + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_rss_memory_usage + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_disk_usage_all + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec 
--expression=meter_banyandb_instance_network_usage_recv + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_banyandb_instance_network_usage_sent + expected: expected/metrics-has-value.yml + diff --git a/test/e2e-v2/cases/banyandb/docker-compose.yml b/test/e2e-v2/cases/banyandb/docker-compose.yml new file mode 100644 index 000000000000..c95c595a7577 --- /dev/null +++ b/test/e2e-v2/cases/banyandb/docker-compose.yml @@ -0,0 +1,51 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +version: '2.1' + +services: + oap: + extends: + file: ../../script/docker-compose/base-compose.yml + service: oap + expose: + - 11800 + ports: + - "11800:11800" + - "12800:12800" + networks: + - e2e + banyandb: + extends: + file: ../../script/docker-compose/base-compose.yml + service: banyandb + ports: + - "17913:17913" + - "2121:2121" + + otel-collector: + image: otel/opentelemetry-collector:${OTEL_COLLECTOR_VERSION} + networks: + - e2e + command: [ "--config=/etc/otel-collector-config.yaml" ] + volumes: + - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml + expose: + - 55678 + +networks: + e2e: + + diff --git a/test/e2e-v2/cases/banyandb/e2e.yaml b/test/e2e-v2/cases/banyandb/e2e.yaml new file mode 100644 index 000000000000..ce742441ceed --- /dev/null +++ b/test/e2e-v2/cases/banyandb/e2e.yaml @@ -0,0 +1,39 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This file is used to show how to write configuration files and can be used to test. 
+ +setup: + env: compose + file: docker-compose.yml + timeout: 20m + init-system-environment: ../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + +verify: + retry: + count: 60 + interval: 3s + cases: + - includes: + - ./banyandb-cases.yaml + + diff --git a/test/e2e-v2/cases/banyandb/expected/metrics-has-value.yml b/test/e2e-v2/cases/banyandb/expected/metrics-has-value.yml new file mode 100644 index 000000000000..e071b36c2224 --- /dev/null +++ b/test/e2e-v2/cases/banyandb/expected/metrics-has-value.yml @@ -0,0 +1,35 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/banyandb/otel-collector-config.yaml b/test/e2e-v2/cases/banyandb/otel-collector-config.yaml new file mode 100644 index 000000000000..356c5d0920dd --- /dev/null +++ b/test/e2e-v2/cases/banyandb/otel-collector-config.yaml @@ -0,0 +1,48 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +receivers: + prometheus: + config: + scrape_configs: + - job_name: "banyandb-monitoring" + scrape_interval: 5s + static_configs: + - targets: ["banyandb:2121"] + labels: + host_name: root[root] + +processors: + batch: + +exporters: + otlp: + endpoint: "oap:11800" + tls: + insecure: true + debug: + verbosity: detailed + +service: + pipelines: + metrics: + receivers: + - prometheus + processors: + - batch + exporters: + - otlp + +
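The `rate('PT15S')` windows in the BanyanDB MAL rules and the examples in that rule file's header comment are ISO-8601 duration strings. The comment's wording mirrors the Javadoc of `java.time.Duration.parse`, so this sketch assumes MAL hands such strings to that JDK method. It exercises only the JDK, and the class and method names (`DurationExamples`, `describe`) are hypothetical. It prints the normalized value of each sample and shows that hour and minute fields are rejected when the `T` separator is missing.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;

/**
 * Hypothetical helper: shows how java.time.Duration.parse interprets
 * ISO-8601 duration strings like the rate('PT15S') window. Assumption:
 * MAL delegates to Duration.parse; this class only calls the JDK.
 */
public class DurationExamples {

    /** Returns Duration.toString() for a parseable string, otherwise "rejected". */
    static String describe(String iso) {
        try {
            return Duration.parse(iso).toString();
        } catch (DateTimeParseException e) {
            // H/M/S fields are only accepted after the 'T' separator.
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // Samples: the 15-second rate window, the header-comment examples,
        // and a variant without 'T' that Duration.parse refuses.
        String[] samples = {"PT15S", "PT-6H3M", "-PT6H3M", "-PT-6H+3M", "-P6H3M"};
        for (String s : samples) {
            System.out.printf("%-10s -> %s%n", s, describe(s));
        }
    }
}
```

`Duration.toString()` normalizes the result, so `-PT-6H+3M` (minus, applied to "-6 hours plus 3 minutes") is reported as a positive 5 hours 57 minutes.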