diff --git a/README.md b/README.md
index e6b64a8..b2116e8 100644
--- a/README.md
+++ b/README.md
@@ -118,6 +118,7 @@ operator and auto-instrumentation along with setup guides for each recipe. Curre
 * [Trace enhancements](recipes/trace-enhancements)
 * [Cloud Trace integration](recipes/cloud-trace)
 * [Resource detection](recipes/resource-detection)
+* [Kubernetes cluster metrics](recipes/k8s-cluster-receiver)
 * [Daemonset and Deployment](recipes/daemonset-and-deployment)
 * [eBPF HTTP Golden Signals with Beyla](recipes/beyla-golden-signals)
 * [eBPF HTTP Service Graph with Beyla](recipes/beyla-service-graph)
diff --git a/recipes/README.md b/recipes/README.md
index 237df44..422dbf6 100644
--- a/recipes/README.md
+++ b/recipes/README.md
@@ -9,6 +9,7 @@ operator and auto-instrumentation. See below to get started:
 * [Trace enhancements](trace-enhancements)
 * [Cloud trace integration](cloud-trace)
 * [Resource detection](resource-detection)
+* [Kubernetes cluster metrics](k8s-cluster-receiver)
 * [Daemonset and Deployment](daemonset-and-deployment)
 * [eBPF HTTP Golden Signals with Beyla](beyla-golden-signals)
 * [eBPF HTTP Service Graph with Beyla](beyla-service-graph)
diff --git a/recipes/k8s-cluster-receiver/README.md b/recipes/k8s-cluster-receiver/README.md
new file mode 100644
index 0000000..54adb58
--- /dev/null
+++ b/recipes/k8s-cluster-receiver/README.md
@@ -0,0 +1,93 @@
+# Kubernetes cluster metrics
+
+This recipe demonstrates how to configure the OpenTelemetry Collector
+(as deployed by the Operator) with the
+[`k8sclusterreceiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver).
+
+The receiver watches the Kubernetes API server and emits cluster-level metrics
+similar to kube-state-metrics. This recipe exports those metrics through
+Google Managed Service for Prometheus and logs entity events with the debug
+exporter.
+
+## Prerequisites
+
+* Cloud Monitoring API enabled in your GCP project
+* The `roles/monitoring.metricWriter`
+  [IAM role](https://cloud.google.com/monitoring/access-control) for your
+  cluster's service account, or Workload Identity configured for the collector
+  service account
+* A running GKE cluster
+* The OpenTelemetry Operator installed in your cluster
+
+The `k8sclusterreceiver` watches cluster-wide resources. This recipe uses one
+collector replica and a dedicated service account with read-only RBAC for the
+Kubernetes resources the receiver observes.
+
+## Running
+
+Create the service account and RBAC resources:
+
+```
+kubectl apply -f rbac.yaml
+```
+
+Apply the `OpenTelemetryCollector` object from this recipe:
+
+```
+kubectl apply -f collector-config.yaml
+```
+
+After the collector starts, check its logs:
+
+```
+kubectl logs deployment/k8s-cluster-receiver-collector -f
+```
+
+## Workload Identity Setup
+
+If you use Workload Identity, bind a Google service account with permission to
+write metrics to the collector's Kubernetes service account.
+
+```
+export GCLOUD_PROJECT=
+gcloud iam service-accounts create otel-k8s-cluster-receiver --project=${GCLOUD_PROJECT}
+```
+
+```
+gcloud projects add-iam-policy-binding $GCLOUD_PROJECT \
+    --member "serviceAccount:otel-k8s-cluster-receiver@${GCLOUD_PROJECT}.iam.gserviceaccount.com" \
+    --role "roles/monitoring.metricWriter"
+```
+
+```
+gcloud iam service-accounts add-iam-policy-binding \
+    "otel-k8s-cluster-receiver@${GCLOUD_PROJECT}.iam.gserviceaccount.com" \
+    --role roles/iam.workloadIdentityUser \
+    --member "serviceAccount:${GCLOUD_PROJECT}.svc.id.goog[default/otel-k8s-cluster-receiver]"
+```
+
+```
+kubectl annotate serviceaccount otel-k8s-cluster-receiver \
+    iam.gke.io/gcp-service-account=otel-k8s-cluster-receiver@${GCLOUD_PROJECT}.iam.gserviceaccount.com
+```
+
+Restart the collector pod after adding the annotation.
+
+## View your Metrics
+
+Navigate to console.cloud.google.com/monitoring/metrics-explorer, and in the
+"Select a metric" dropdown, search for "prometheus/k8s" to see Kubernetes
+cluster metrics.
+
+## Troubleshooting
+
+### PermissionDenied from Google Managed Service for Prometheus
+
+Check that the collector service account can use a Google service account with
+the `roles/monitoring.metricWriter` role.
+
+### Forbidden from the Kubernetes API
+
+Check that `rbac.yaml` and the `OpenTelemetryCollector` object were applied in
+the `default` namespace, which the `ClusterRoleBinding` subject names, and that
+the collector pod is using the `otel-k8s-cluster-receiver` service account.
diff --git a/recipes/k8s-cluster-receiver/collector-config.yaml b/recipes/k8s-cluster-receiver/collector-config.yaml
new file mode 100644
index 0000000..a05c287
--- /dev/null
+++ b/recipes/k8s-cluster-receiver/collector-config.yaml
@@ -0,0 +1,71 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+---
+apiVersion: opentelemetry.io/v1alpha1
+kind: OpenTelemetryCollector
+metadata:
+  name: k8s-cluster-receiver
+spec:
+  image: otel/opentelemetry-collector-contrib:0.112.0
+  mode: deployment
+  replicas: 1
+  serviceAccount: otel-k8s-cluster-receiver
+  config: |
+    receivers:
+      k8s_cluster:
+        auth_type: serviceAccount
+        collection_interval: 60s
+        metadata_collection_interval: 5m
+        node_conditions_to_report:
+          - Ready
+          - MemoryPressure
+          - DiskPressure
+          - PIDPressure
+        allocatable_types_to_report:
+          - cpu
+          - memory
+          - ephemeral-storage
+          - pods
+
+    processors:
+      batch:
+        send_batch_max_size: 200
+        send_batch_size: 200
+        timeout: 5s
+      memory_limiter:
+        check_interval: 1s
+        limit_percentage: 65
+        spike_limit_percentage: 20
+
+    exporters:
+      debug:
+        verbosity: detailed
+      googlemanagedprometheus:
+        metric:
+          extra_metrics_config:
+            enable_target_info: false
+          resource_filters:
+            - regex: "k8s.*"
+
+    service:
+      pipelines:
+        metrics:
+          receivers: [k8s_cluster]
+          processors: [memory_limiter, batch]
+          exporters: [googlemanagedprometheus, debug]
+        logs/entity_events:
+          receivers: [k8s_cluster]
+          processors: [memory_limiter, batch]
+          exporters: [debug]
diff --git a/recipes/k8s-cluster-receiver/rbac.yaml b/recipes/k8s-cluster-receiver/rbac.yaml
new file mode 100644
index 0000000..e88eeea
--- /dev/null
+++ b/recipes/k8s-cluster-receiver/rbac.yaml
@@ -0,0 +1,96 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: otel-k8s-cluster-receiver
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: otel-k8s-cluster-receiver
+rules:
+  - apiGroups:
+      - ""
+    resources:
+      - events
+      - namespaces
+      - namespaces/status
+      - nodes
+      - nodes/spec
+      - persistentvolumes
+      - persistentvolumeclaims
+      - pods
+      - pods/status
+      - replicationcontrollers
+      - replicationcontrollers/status
+      - resourcequotas
+      - services
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - apps
+    resources:
+      - daemonsets
+      - deployments
+      - replicasets
+      - statefulsets
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - extensions
+    resources:
+      - daemonsets
+      - deployments
+      - replicasets
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - batch
+    resources:
+      - jobs
+      - cronjobs
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - autoscaling
+    resources:
+      - horizontalpodautoscalers
+    verbs:
+      - get
+      - list
+      - watch
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: otel-k8s-cluster-receiver
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: otel-k8s-cluster-receiver
+subjects:
+  - kind: ServiceAccount
+    name: otel-k8s-cluster-receiver
+    namespace: default
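
The patch above names the `default` namespace in two places: the `ClusterRoleBinding` subject in `rbac.yaml` and the Workload Identity member in the recipe README. Assuming someone deploys the recipe into another namespace, both strings would need to change together. The sketch below derives them from a single variable to make that coupling visible. `NAMESPACE`, `KSA`, and the project ID `example-project` are illustrative names and placeholder values, not part of the recipe.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace them with your own.
GCLOUD_PROJECT="example-project"
NAMESPACE="default"
KSA="otel-k8s-cluster-receiver"

# Member string used by the roles/iam.workloadIdentityUser binding in the README.
WI_MEMBER="serviceAccount:${GCLOUD_PROJECT}.svc.id.goog[${NAMESPACE}/${KSA}]"

# Username the Kubernetes API server assigns to the collector's service account.
# The ClusterRoleBinding subject must use the same namespace and name.
K8S_USER="system:serviceaccount:${NAMESPACE}:${KSA}"

echo "${WI_MEMBER}"
echo "${K8S_USER}"
```

With a cluster available, running `kubectl auth can-i list nodes --as="${K8S_USER}"` is one way to check whether the binding from `rbac.yaml` applies to that identity.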