Support the telegraf receiver plugin module (apache#9620)
* The telegraf receiver plugin module development.

* Refactored code of converting telegraf data and fixed some errors.

* Support Telegraf receiver plugin module.

* Add telegraf-receiver.md, update changes.md and vm-monitoring.md.

* Rename YAML file and change code style.

* Change receiver-telegraf of application.yml.

* The receiver-telegraf e2e test.

* The telegraf receiver e2e test.

* Fix an issue and delete redundant configs.

* Add Unit Test about converting Telegraf metrics.

* Adjust Unit Test about converting Telegraf metrics.

* Add License Header to Unit Test and change binary.xml.

* Change module provider's config initialization mechanism.

* Exclude the telegraf-rules in server-starter pom.xml.

* Fix telegraf e2e test issues.

* Change telegraf e2e test.

* Fix issues

* Fix issues.

* Add Sample convert Unit Test.

* Change vm.yaml, related documents and fix some issues.

* Change backend-vm-monitoring.md.

* Fix SampleConvertTest checkstyle issue.

* Change vm.yaml swap MAL.

* Update menu.yml and binary.xml.

* Update Telegraf Unit test.

* Delete telegraf config package, use meter.analyzer.prometheus package to load config file.

* Reorder telegraf metrics in menu.yml.

* Change e2e, vm, config, linux-service and vm.md.

* Update telegraf.conf file.

* Change url of telegraf.conf file.

* Update e2e.yaml, conf file, menu.yml and delete useless code of provider.

* Update grouping sampleFamily by timestamp and name, and add new UTs.

* Update vm-monitoring.md and .asf.yaml.

Co-authored-by: Superskyyy (ONLINE) <[email protected]>
Co-authored-by: 吴晟 Wu Sheng <[email protected]>
3 people authored Nov 5, 2022
1 parent d32a318 commit 42f3396
Showing 26 changed files with 1,561 additions and 16 deletions.
1 change: 1 addition & 0 deletions .asf.yaml
@@ -33,6 +33,7 @@ github:
- open-telemetry
- zabbix
- ebpf
- telegraf
enabled_merge_buttons:
squash: true
merge: false
2 changes: 2 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -549,6 +549,8 @@ jobs:
config: test/e2e-v2/cases/vm/zabbix/e2e.yaml
- name: VM Prometheus
config: test/e2e-v2/cases/vm/prometheus-node-exporter/e2e.yaml
- name: VM Telegraf
config: test/e2e-v2/cases/vm/telegraf/e2e.yaml
- name: So11y
config: test/e2e-v2/cases/so11y/e2e.yaml
- name: MySQL Prometheus and slowsql
1 change: 1 addition & 0 deletions apm-dist/src/main/assembly/binary.xml
@@ -69,6 +69,7 @@
<include>ui-initialized-templates/*/*.json</include>
<include>lal/*</include>
<include>log-mal-rules/*</include>
<include>telegraf-rules/*</include>
</includes>
<outputDirectory>config</outputDirectory>
</fileSet>
48 changes: 32 additions & 16 deletions docs/en/setup/backend/backend-vm-monitoring.md
@@ -3,38 +3,54 @@ SkyWalking leverages Prometheus node-exporter to collect metrics data from the V
[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/meter.md).
VM entity as a `Service` in OAP and on the `Layer: OS_LINUX`.

SkyWalking can also receive VM metrics from InfluxDB Telegraf through the [Telegraf receiver](./telegraf-receiver.md).
The Telegraf receiver plugin receives, processes, and converts the metrics, then sends the converted metrics to the [Meter System](./../../concepts-and-designs/meter.md).
The VM entity is treated as a `Service` in OAP, on the `Layer: OS_LINUX`.

## Data flow
**For OpenTelemetry receiver:**
1. The Prometheus node-exporter collects metrics data from the VMs.
2. The OpenTelemetry Collector fetches metrics from node-exporter via Prometheus Receiver and pushes metrics to the SkyWalking OAP Server via the OpenCensus gRPC Exporter or OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results.

**For Telegraf receiver:**
1. The InfluxDB Telegraf [input plugins](https://docs.influxdata.com/telegraf/v1.24/plugins/) collect various metrics data from the VMs.
2. The cpu, mem, system, disk and diskio input plugins should be enabled in the telegraf.conf file.
3. InfluxDB Telegraf sends the metrics in `JSON` format over `HTTP` to the Telegraf receiver, which pushes the converted metrics to the SkyWalking OAP Server [Meter System](./../../concepts-and-designs/meter.md).
4. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results.
5. For the Telegraf receiver, the `meter_vm_cpu_average_used` metric indicates the average usage of each CPU core.

## Setup

**For OpenTelemetry receiver:**
1. Setup [Prometheus node-exporter](https://prometheus.io/docs/guides/node-exporter/).
2. Setup [OpenTelemetry Collector ](https://opentelemetry.io/docs/collector/). This is an example for OpenTelemetry Collector configuration [otel-collector-config.yaml](../../../../test/e2e-v2/cases/vm/prometheus-node-exporter/otel-collector-config.yaml).
3. Config SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md).

**For Telegraf receiver:**
1. Set up InfluxDB Telegraf's `telegraf.conf` file according to the [Telegraf official documentation](https://docs.influxdata.com/telegraf/v1.24/).
2. Apply the receiver-specific rules to the `telegraf.conf` file according to the [Telegraf receiver document](telegraf-receiver.md).
3. Configure the SkyWalking [Telegraf receiver](telegraf-receiver.md).

## Supported Metrics

| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|-----|-----|-----|-----|-----|
| CPU Usage | % | cpu_total_percentage | The total percentage usage of the CPU core. If there are 2 cores, the maximum usage is 200%. | Prometheus node-exporter |
| Memory RAM Usage | MB | meter_vm_memory_used | The total RAM usage | Prometheus node-exporter |
| Memory Swap Usage | % | meter_vm_memory_swap_percentage | The percentage usage of swap memory | Prometheus node-exporter |
| CPU Average Used | % | meter_vm_cpu_average_used | The percentage usage of the CPU core in each mode | Prometheus node-exporter |
| CPU Load | | meter_vm_cpu_load1<br />meter_vm_cpu_load5<br />meter_vm_cpu_load15 | The CPU 1m / 5m / 15m average load | Prometheus node-exporter |
| Memory RAM | MB | meter_vm_memory_total<br />meter_vm_memory_available<br />meter_vm_memory_used | The RAM statistics, including Total / Available / Used | Prometheus node-exporter |
| Memory Swap | MB | meter_vm_memory_swap_free<br />meter_vm_memory_swap_total | Swap memory statistics, including Free / Total | Prometheus node-exporter |
| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | The percentage usage of the file system at each mount point | Prometheus node-exporter |
| Disk R/W | KB/s | meter_vm_disk_read,meter_vm_disk_written | The disk read and written | Prometheus node-exporter |
| Network Bandwidth Usage | KB/s | meter_vm_network_receive<br />meter_vm_network_transmit | The network receive and transmit | Prometheus node-exporter |
| Network Status | | meter_vm_tcp_curr_estab<br />meter_vm_tcp_tw<br />meter_vm_tcp_alloc<br />meter_vm_sockets_used<br />meter_vm_udp_inuse | The number of TCPs established / TCP time wait / TCPs allocated / sockets in use / UDPs in use | Prometheus node-exporter |
| Filefd Allocated | | meter_vm_filefd_allocated | The number of file descriptors allocated | Prometheus node-exporter |
| Monitoring Panel | Unit | Metric Name | Description | Data Source |
|------------------------------|------|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| CPU Usage | % | meter_vm_cpu_total_percentage | The total percentage usage of the CPU core. If there are 2 cores, the maximum usage is 200%. | Prometheus node-exporter<br />Telegraf input plugin |
| Memory RAM Usage | MB | meter_vm_memory_used | The total RAM usage | Prometheus node-exporter<br />Telegraf input plugin |
| Memory Swap Usage | % | meter_vm_memory_swap_percentage | The percentage usage of swap memory | Prometheus node-exporter<br />Telegraf input plugin |
| CPU Average Used | % | meter_vm_cpu_average_used | The percentage usage of the CPU core in each mode | Prometheus node-exporter<br />Telegraf input plugin |
| CPU Load | | meter_vm_cpu_load1<br />meter_vm_cpu_load5<br />meter_vm_cpu_load15 | The CPU 1m / 5m / 15m average load | Prometheus node-exporter<br />Telegraf input plugin |
| Memory RAM | MB | meter_vm_memory_total<br />meter_vm_memory_available<br />meter_vm_memory_used | The RAM statistics, including Total / Available / Used | Prometheus node-exporter<br />Telegraf input plugin |
| Memory Swap | MB | meter_vm_memory_swap_free<br />meter_vm_memory_swap_total | Swap memory statistics, including Free / Total | Prometheus node-exporter<br />Telegraf input plugin |
| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | The percentage usage of the file system at each mount point | Prometheus node-exporter<br />Telegraf input plugin |
| Disk R/W | KB/s | meter_vm_disk_read,meter_vm_disk_written | The disk read and written | Prometheus node-exporter<br />Telegraf input plugin |
| Network Bandwidth Usage | KB/s | meter_vm_network_receive<br />meter_vm_network_transmit | The network receive and transmit | Prometheus node-exporter<br />Telegraf input plugin |
| Network Status | | meter_vm_tcp_curr_estab<br />meter_vm_tcp_tw<br />meter_vm_tcp_alloc<br />meter_vm_sockets_used<br />meter_vm_udp_inuse | The number of TCPs established / TCP time wait / TCPs allocated / sockets in use / UDPs in use | Prometheus node-exporter<br />Telegraf input plugin |
| Filefd Allocated | | meter_vm_filefd_allocated | The number of file descriptors allocated | Prometheus node-exporter |

## Customizing
You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found in `/config/otel-rules/vm.yaml`.
The metrics definition and expression rules are found in `/config/otel-rules/vm.yaml` and `/config/telegraf-rules/vm.yaml`.
The dashboard panel configurations are found in `/config/ui-initialized-templates/os_linux`.

## Blog
45 changes: 45 additions & 0 deletions docs/en/setup/backend/telegraf-receiver.md
@@ -0,0 +1,45 @@
# Telegraf receiver

The Telegraf receiver receives InfluxDB Telegraf's metrics and feeds them into the meter system.
The OAP loads the rule configuration at bootstrap. The files are located at `$CLASSPATH/telegraf-rules`.
If a new configuration is not well-formed, the OAP may fail to start up.

See the [InfluxDB Telegraf](https://docs.influxdata.com/telegraf/v1.24/) documentation for details on Telegraf itself.
The Telegraf receiver can handle Telegraf's [CPU Input Plugin](https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/cpu/README.md)
and [Memory Input Plugin](https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/mem/README.md).

There are many other Telegraf input plugins; users can write their own rule files for them.
The rule file should be in YAML format, defined by the scheme described in [MAL](../../concepts-and-designs/mal.md).
Please see the [telegraf plugin directory](https://docs.influxdata.com/telegraf/v1.24/plugins/) for more information on input plugins.

**Notice:**
* The Telegraf receiver module uses `HTTP` to receive Telegraf's metrics,
so the output must be configured as `[[outputs.http]]` in the telegraf.conf file.
Please see the [http outputs](https://github.com/influxdata/telegraf/blob/release-1.24/plugins/outputs/http/README.md)
for more details.

* The Telegraf receiver module **only** processes Telegraf's `JSON` metrics format,
so `data_format = "json"` must be set in the telegraf.conf file.
Please see the [JSON data format](https://docs.influxdata.com/telegraf/v1.24/data_formats/output/json/)
for more details.

* The default `json_timestamp_units` in JSON output is seconds,
and the Telegraf receiver module **only** processes timestamps in seconds.
If users set `json_timestamp_units` explicitly in the telegraf.conf file, it should be `json_timestamp_units = "1s"`.
Please see the [JSON data format](https://docs.influxdata.com/telegraf/v1.24/data_formats/output/json/)
for more details.
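
As a rough illustration of the notices above, a `telegraf.conf` fragment might look like the following sketch. The `url` value (host, port and path) is an assumption about where the OAP's HTTP receiver listens and is not taken from this document, so check it against your own deployment. Only the two input plugins named above are shown.

```toml
# Sketch only, not a complete telegraf.conf.
[[inputs.cpu]]
[[inputs.mem]]

[[outputs.http]]
  # Assumption: address of the OAP HTTP endpoint serving the Telegraf receiver.
  url = "http://localhost:12800/telegraf"
  # The receiver only processes the JSON metrics format.
  data_format = "json"
  # The receiver only processes timestamps in seconds (this is also the default).
  json_timestamp_units = "1s"
```
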

The following is the default Telegraf receiver configuration in `application.yml`.
Set `SW_RECEIVER_TELEGRAF:default` through the system environment, or change `SW_RECEIVER_TELEGRAF_ACTIVE_FILES:vm`,
to activate the Telegraf receiver with `vm.yaml` in telegraf-rules.
```yaml
receiver-telegraf:
  selector: ${SW_RECEIVER_TELEGRAF:default}
  default:
    activeFiles: ${SW_RECEIVER_TELEGRAF_ACTIVE_FILES:vm}
```
| Rule Name | Description | Configuration File | Data Source |
|-----------|----------------|------------------------|-------------------------------------------------------------------------|
| vm | Metrics of VMs | telegraf-rules/vm.yaml | Telegraf inputs plugins --> Telegraf Receiver --> SkyWalking OAP Server |
2 changes: 2 additions & 0 deletions docs/menu.yml
@@ -119,6 +119,8 @@ catalog:
path: "/en/setup/backend/backend-zabbix"
- name: "Meter Analysis"
path: "/en/setup/backend/backend-meter"
- name: "Telegraf Metrics"
path: "/en/setup/backend/telegraf-receiver"
- name: "Apdex Threshold"
path: "/en/setup/backend/apdex-threshold"
- name: "Spring Sleuth Metrics Analysis"
1 change: 1 addition & 0 deletions oap-server/server-receiver-plugin/pom.xml
@@ -46,6 +46,7 @@
<module>skywalking-event-receiver-plugin</module>
<module>skywalking-zabbix-receiver-plugin</module>
<module>skywalking-ebpf-receiver-plugin</module>
<module>skywalking-telegraf-receiver-plugin</module>
</modules>

<dependencies>
@@ -0,0 +1,44 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>server-receiver-plugin</artifactId>
        <groupId>org.apache.skywalking</groupId>
        <version>9.3.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>skywalking-telegraf-receiver-plugin</artifactId>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>org.apache.skywalking</groupId>
            <artifactId>agent-analyzer</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.skywalking</groupId>
            <artifactId>skywalking-sharing-server-plugin</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>

</project>
@@ -0,0 +1,33 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package org.apache.skywalking.oap.server.receiver.telegraf.module;

import org.apache.skywalking.oap.server.library.module.ModuleDefine;

public class TelegrafReceiverModule extends ModuleDefine {

    public TelegrafReceiverModule() {
        super("receiver-telegraf");
    }

    @Override
    public Class[] services() {
        return new Class[0];
    }
}
@@ -0,0 +1,37 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package org.apache.skywalking.oap.server.receiver.telegraf.provider;

import lombok.Getter;
import lombok.Setter;
import org.apache.skywalking.oap.server.core.Const;
import org.apache.skywalking.oap.server.library.module.ModuleConfig;

@Setter
@Getter
public class TelegrafModuleConfig extends ModuleConfig {

    public static final String CONFIG_PATH = "telegraf-rules";

    /**
     * active receive configs, files split by ","
     */
    private String activeFiles = Const.EMPTY_STRING;

}