6 changes: 6 additions & 0 deletions README.md
@@ -44,6 +44,12 @@ Then, apply the Kubernetes manifests directly from this repo:
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest/k8s/base | envsubst | kubectl apply -f -
```

If you prefer a DaemonSet instead of a Deployment, apply the following manifests:

```console
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest/k8s/daemonset | envsubst | kubectl apply -f -
```

(Remember to set the `GOOGLE_CLOUD_PROJECT` environment variable.)

### [Optional] Run the OpenTelemetry demo application alongside the collector
1 change: 1 addition & 0 deletions k8s/base/3_service.yaml
@@ -24,6 +24,7 @@ spec:
selector:
app: opentelemetry-collector
internalTrafficPolicy: Cluster
trafficDistribution: PreferClose
Contributor:
This is probably a good option for some users, but I'm not sure we want to document this as a best practice for everyone. I realized this means it won't distribute traffic equally cluster-wide anymore: https://kubernetes.io/docs/reference/networking/virtual-ips/#considerations-for-using-traffic-distribution-control. HPA scaling won't interact properly with this, because traffic will still prefer a node-local collector even if that collector is overloaded and others are underutilized.

Contributor:

Prefer zonal might make sense, but that also assumes that traffic is distributed equally across zones. I wonder if HPA can independently scale the deployment in each zone to match the zonal traffic...

Author:

We can also change the HPA to scale zonally. I believe a multi-zone GKE cluster is the standard for most installations, though I may be wrong about this. This configuration prefers same-zone communication and should also work in a single-zone cluster. Equal distribution of traffic across zones is also standard, to my knowledge. However, we could also add these configurations as an overlay.
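The overlay idea mentioned above could look something like the following kustomize overlay. This is an illustrative sketch only, not part of this PR; the directory name and patch target are hypothetical. It reuses the base manifests but drops the zonal preference for clusters where cluster-wide balancing is desired:

```yaml
# Hypothetical k8s/overlays/cluster-wide/kustomization.yaml (not in this PR).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Remove the zonal routing preference so the Service balances
  # traffic across all ready endpoints cluster-wide again.
  - target:
      kind: Service
      name: opentelemetry-collector
    patch: |-
      - op: remove
        path: /spec/trafficDistribution
```

Applying it with `kubectl kustomize <overlay-dir> | envsubst | kubectl apply -f -` would follow the same pattern as the README commands.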

ports:
- name: otel-grpc
protocol: TCP
9 changes: 8 additions & 1 deletion k8s/base/4_deployment.yaml
@@ -20,7 +20,7 @@ metadata:
labels:
app: opentelemetry-collector
spec:
replicas: 2
replicas: 1
selector:
matchLabels:
app: opentelemetry-collector
@@ -32,6 +32,13 @@ spec:
serviceAccountName: opentelemetry-collector
securityContext:
{}
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: opentelemetry-collector
containers:
- name: opentelemetry-collector
imagePullPolicy: Always
18 changes: 18 additions & 0 deletions k8s/daemonset/0_namespace.yaml
@@ -0,0 +1,18 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Namespace
metadata:
name: opentelemetry
234 changes: 234 additions & 0 deletions k8s/daemonset/1_configmap.yaml
@@ -0,0 +1,234 @@
apiVersion: v1
data:
collector.yaml: |
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exporters:
googlecloud:
log:
default_log_name: opentelemetry-collector
user_agent: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64)
googlemanagedprometheus:
user_agent: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64)
# The otlphttp exporter is used to send traces to Google Cloud Trace using OTLP http/proto
# The otlp exporter could also be used to send them using OTLP grpc
otlphttp:
encoding: proto
endpoint: https://telemetry.googleapis.com
# Use the googleclientauth extension to authenticate with Google credentials
auth:
authenticator: googleclientauth


extensions:
health_check:
endpoint: ${env:MY_POD_IP}:13133
googleclientauth:


processors:
filter/self-metrics:
metrics:
include:
match_type: strict
metric_names:
- otelcol_process_uptime
- otelcol_process_memory_rss
- otelcol_grpc_io_client_completed_rpcs
- otelcol_googlecloudmonitoring_point_count
batch:
send_batch_max_size: 200
send_batch_size: 200
timeout: 5s

k8sattributes:
Contributor:

Please follow https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor#deployment-scenarios to set the

k8sattributes:
  filter:
    node_from_env_var: KUBE_NODE_NAME # this should be same as the var name used in previous step

in DS mode so the collector won't buffer all pods in the cluster.

filter:
node_from_env_var: KUBE_NODE_NAME
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.statefulset.name
- k8s.daemonset.name
- k8s.cronjob.name
- k8s.job.name
- k8s.replicaset.name
- k8s.node.name
- k8s.pod.name
- k8s.pod.uid
- k8s.pod.start_time
passthrough: false
pod_association:
- sources:
- from: resource_attribute
name: k8s.pod.ip
- sources:
- from: resource_attribute
name: k8s.pod.uid
- sources:
- from: connection

memory_limiter:
check_interval: 1s
limit_percentage: 65
spike_limit_percentage: 20

metricstransform/self-metrics:
transforms:
- action: update
include: otelcol_process_uptime
operations:
- action: add_label
new_label: version
new_value: Google-Cloud-OTLP manifests:0.5.0 OpenTelemetry Collector Built By Google/0.131.0 (linux/amd64)

resourcedetection:
detectors: [gcp]
timeout: 10s

transform/collision:
metric_statements:
- context: datapoint
statements:
- set(attributes["exported_location"], attributes["location"])
- delete_key(attributes, "location")
- set(attributes["exported_cluster"], attributes["cluster"])
- delete_key(attributes, "cluster")
- set(attributes["exported_namespace"], attributes["namespace"])
- delete_key(attributes, "namespace")
- set(attributes["exported_job"], attributes["job"])
- delete_key(attributes, "job")
- set(attributes["exported_instance"], attributes["instance"])
- delete_key(attributes, "instance")
- set(attributes["exported_project_id"], attributes["project_id"])
- delete_key(attributes, "project_id")

# The relative ordering of statements between ReplicaSet & Deployment and Job & CronJob are important.
# The ordering of these controllers is decided based on the k8s controller documentation available at
# https://kubernetes.io/docs/concepts/workloads/controllers.
# The relative ordering of the other controllers in this list is inconsequential since they directly
# create pods.
transform/aco-gke:
metric_statements:
- context: datapoint
statements:
- set(attributes["top_level_controller_type"], "ReplicaSet") where resource.attributes["k8s.replicaset.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.replicaset.name"]) where resource.attributes["k8s.replicaset.name"] != nil
- set(attributes["top_level_controller_type"], "Deployment") where resource.attributes["k8s.deployment.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil
- set(attributes["top_level_controller_type"], "DaemonSet") where resource.attributes["k8s.daemonset.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.daemonset.name"]) where resource.attributes["k8s.daemonset.name"] != nil
- set(attributes["top_level_controller_type"], "StatefulSet") where resource.attributes["k8s.statefulset.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.statefulset.name"]) where resource.attributes["k8s.statefulset.name"] != nil
- set(attributes["top_level_controller_type"], "Job") where resource.attributes["k8s.job.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.job.name"]) where resource.attributes["k8s.job.name"] != nil
- set(attributes["top_level_controller_type"], "CronJob") where resource.attributes["k8s.cronjob.name"] != nil
- set(attributes["top_level_controller_name"], resource.attributes["k8s.cronjob.name"]) where resource.attributes["k8s.cronjob.name"] != nil

# When sending telemetry to the GCP OTLP endpoint, the gcp.project_id resource attribute is required to be set to your project ID.
resource/gcp_project_id:
attributes:
- key: gcp.project_id
# MAKE SURE YOU REPLACE THIS WITH YOUR PROJECT ID
value: ${GOOGLE_CLOUD_PROJECT}
action: insert
# The metricstarttime processor is important to include if you are using the prometheus receiver to ensure the start time is set properly.
# It is a no-op otherwise.
metricstarttime:
strategy: subtract_initial_point

receivers:
# This collector is configured to accept OTLP metrics, logs, and traces, and is designed to receive OTLP from workloads running in the cluster.
otlp:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:4317
http:
cors:
allowed_origins:
- http://*
- https://*
endpoint: ${env:MY_POD_IP}:4318
otlp/self-metrics:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:14317

service:
extensions:
- health_check
- googleclientauth
pipelines:
logs:
exporters:
- googlecloud
processors:
- k8sattributes
- resourcedetection
- memory_limiter
- batch
receivers:
- otlp
metrics/otlp:
exporters:
- googlemanagedprometheus
processors:
- k8sattributes
- memory_limiter
- metricstarttime
- resourcedetection
- transform/collision
- transform/aco-gke
- batch
receivers:
- otlp
metrics/self-metrics:
exporters:
- googlemanagedprometheus
processors:
- filter/self-metrics
- metricstransform/self-metrics
- k8sattributes
- memory_limiter
- resourcedetection
- batch
receivers:
- otlp/self-metrics
traces:
exporters:
- otlphttp
processors:
- k8sattributes
- memory_limiter
- resource/gcp_project_id
- resourcedetection
- batch
receivers:
- otlp
telemetry:
logs:
encoding: json
metrics:
readers:
- periodic:
exporter:
otlp:
protocol: grpc
endpoint: ${env:MY_POD_IP}:14317
kind: ConfigMap
metadata:
creationTimestamp: null
name: collector-config
namespace: opentelemetry
57 changes: 57 additions & 0 deletions k8s/daemonset/2_rbac.yaml
@@ -0,0 +1,57 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ServiceAccount
metadata:
name: opentelemetry-collector
namespace: opentelemetry
labels:
app.kubernetes.io/name: google-built-opentelemetry-collector
app.kubernetes.io/version: "0.131.0"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: opentelemetry-collector
namespace: opentelemetry
labels:
app.kubernetes.io/name: google-built-opentelemetry-collector
app.kubernetes.io/version: "0.131.0"
rules:
- apiGroups: [""]
resources: ["pods", "namespaces", "nodes"]
verbs: ["get", "watch", "list"]
- apiGroups: ["apps"]
resources: ["replicasets"]
verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
resources: ["replicasets"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: opentelemetry-collector
labels:
app.kubernetes.io/name: google-built-opentelemetry-collector
app.kubernetes.io/version: "0.131.0"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: opentelemetry-collector
subjects:
- kind: ServiceAccount
name: opentelemetry-collector
namespace: opentelemetry
36 changes: 36 additions & 0 deletions k8s/daemonset/3_service.yaml
@@ -0,0 +1,36 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Service
metadata:
name: opentelemetry-collector
namespace: opentelemetry
labels:
app: opentelemetry-collector
spec:
type: ClusterIP
selector:
app: opentelemetry-collector
internalTrafficPolicy: Cluster
trafficDistribution: PreferClose
ports:
- name: otel-grpc
protocol: TCP
port: 4317
targetPort: 4317
- name: otlp-http
port: 4318
targetPort: 4318
protocol: TCP