Telegraf Configuration - Recommended approach for configuring multiple independent silos of input-output plugins #16328

Open
keshav613 opened this issue Dec 18, 2024 · 1 comment

Comments


keshav613 commented Dec 18, 2024

Requirement:
I have to read data from Kafka and send it to Datadog. The catch is that each customer has its own Kafka topic and its own Datadog endpoint. With 5000 customers, that means 5000 Kafka topics and 5000 Datadog output plugins in total.

When the scale was low, I ran one Telegraf pod per customer, reading from that customer's Kafka topic and sending to Datadog. Now that the scale has reached 5000, Ops is worried about resource constraints and about monitoring all 5000 Telegraf pods. Together, the Kafka topics will receive 5000 * 1KB of data every 10 seconds, and the volume may grow in the future.

Is there a more efficient way to handle this? After some research I came across two approaches:

  1. Put the input plugins for all 5000 Kafka topics and all the output plugins in the same telegraf.conf file. By default Telegraf sends data from every input plugin to every output plugin, but with tagpass (and a unique tag per customer) we can restrict the metrics from one topic to its corresponding Datadog output plugin. I doubt a single Telegraf node can handle this at the scale of 5000 customers, though, because the time complexity becomes O(N^2), and I am not sure how much CPU and memory that single Telegraf pod would need.

  2. Run individual Telegraf services in the same pod ... as discussed in Telegraf Configuration - Recommended approach for multiple .conf files? #6334 (comment). But that won't be feasible at the scale of 5000 customers.
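A minimal sketch of what approach 1 could look like, assuming the config is generated rather than written by hand. The script renders one `inputs.kafka_consumer` / `outputs.datadog` pair per customer, tags each input with a `customer` tag, and uses `tagpass` on the matching output. The customer names, broker address, topic naming scheme, API keys, Datadog URL and the `influx` data format are all placeholders, not values from this issue.

```python
import json


def toml_str(value):
    # JSON string escaping is also valid for the simple TOML basic strings used here.
    return json.dumps(value)


def render_customer_block(customer, topic, api_key, brokers, datadog_url):
    """Render one Kafka input and one Datadog output, linked by a 'customer' tag."""
    broker_list = ", ".join(toml_str(b) for b in brokers)
    return "\n".join([
        "[[inputs.kafka_consumer]]",
        f"  brokers = [{broker_list}]",
        f"  topics = [{toml_str(topic)}]",
        '  data_format = "influx"  # assumption: adjust to the actual payload format',
        "  # Tag every metric from this topic so outputs can filter on it.",
        "  [inputs.kafka_consumer.tags]",
        f"    customer = {toml_str(customer)}",
        "",
        "[[outputs.datadog]]",
        f"  apikey = {toml_str(api_key)}",
        f"  url = {toml_str(datadog_url)}",
        "  # Only accept metrics tagged for this customer (filters go last in the plugin).",
        "  [outputs.datadog.tagpass]",
        f"    customer = [{toml_str(customer)}]",
        "",
    ])


def render_config(customers, brokers):
    """customers: list of dicts with hypothetical keys name, topic, api_key, url."""
    return "\n".join(
        render_customer_block(c["name"], c["topic"], c["api_key"], brokers, c["url"])
        for c in customers
    )


if __name__ == "__main__":
    # Hypothetical sample data for two customers.
    sample = [
        {"name": f"customer-{n}", "topic": f"metrics-customer-{n}",
         "api_key": f"KEY-{n}", "url": "https://app.datadoghq.com/api/v1/series"}
        for n in (1, 2)
    ]
    print(render_config(sample, ["kafka:9092"]))
```

The output can be written to telegraf.conf or to one file per customer in a config directory. Secrets would normally come from environment variables or a secret store rather than being rendered inline as shown here.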

I understand that Telegraf might not be built to handle such a use case, and that I should probably write a microservice for it, but I would love to know whether this can be achieved with Telegraf.

Contributor

Hipska commented Jan 15, 2025

I would go for option 1 and, if the resource usage (memory, CPU) requires it, split it into multiple instances (maybe one per 1000?).
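To illustrate why splitting helps with the O(N^2) concern raised in the question, here is a rough sketch. It assumes a simplified cost model in which every metric is tested against the tagpass filter of every output in the same instance, and that each customer produces one metric per interval. Both are assumptions for illustration, not measurements of Telegraf.

```python
def shard(customers, per_instance):
    """Split the customer list into consecutive groups, one group per Telegraf instance."""
    return [customers[i:i + per_instance] for i in range(0, len(customers), per_instance)]


def filter_checks_per_interval(shards, metrics_per_customer=1):
    """Count metric-vs-output filter checks under the simplified model described above."""
    total = 0
    for group in shards:
        metrics = len(group) * metrics_per_customer
        outputs = len(group)  # one Datadog output per customer in this instance
        total += metrics * outputs
    return total


if __name__ == "__main__":
    customers = [f"customer-{n:04d}" for n in range(5000)]  # hypothetical names

    single = shard(customers, 5000)
    split = shard(customers, 1000)

    print(len(single), filter_checks_per_interval(single))  # 1 25000000
    print(len(split), filter_checks_per_interval(split))    # 5 5000000
```

Under this model, five instances of 1000 customers each do one fifth of the filter work of a single instance with 5000 customers, at the cost of running and monitoring five pods instead of one.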

This is not really an issue, but more of a support question. It would be better placed on the Discourse or Slack channels.
