Update memory_query and cpu_query for Prometheus #420

tjmoyes wants to merge 5 commits into robusta-dev:main
Conversation
…ch no longer works in Azure Monitored Prometheus
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py`:
- Around line 253-262: The PromQL matcher construction for cluster filtering is
producing malformed expressions because single_cluster_label is inserted without
a separating comma; update the cluster-label handling so it yields either an
empty string or a comma+space-prefixed matcher (e.g., ', cluster="xxx"') instead
of just removing the comma. Locate the cluster_label / single_cluster_label
logic and change it so single_cluster_label = f', {cluster_label.strip(", ")}'
if cluster_label.strip() else '' (or otherwise ensure a leading ", " is added
when non-empty) and keep the memory_query and cpu_query f-strings unchanged to
produce valid matchers.
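The suggested handling could be sketched as follows. This is a minimal sketch, not the actual code: `make_single_cluster_label` is a hypothetical helper name, and the real logic in `PrometheusMetricsService` may be structured differently.

```python
def make_single_cluster_label(cluster_label: str) -> str:
    """Return '' when there is no cluster filter, or a ', '-prefixed
    matcher (e.g. ', cluster="xxx"') so it can be interpolated directly
    after another matcher without producing malformed PromQL."""
    stripped = cluster_label.strip(", ")
    return f", {stripped}" if stripped else ""

# With a cluster filter, the comma separates the two matchers:
label = make_single_cluster_label('cluster="prod"')
query = f'sum(kube_node_status_capacity{{ resource="memory"{label} }})'
# -> sum(kube_node_status_capacity{ resource="memory", cluster="prod" })

# Without one, the matcher list stays valid:
assert make_single_cluster_label("") == ""
```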
@coderabbitai resume
@coderabbitai review

✅ Actions performed: Reviews resumed. Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py`:
- Around line 253-259: The memory_query and cpu_query use
sum(kube_node_status_capacity{...}) which double-counts nodes when metrics are
scraped from multiple targets; change both queries to first deduplicate by node
then sum the per-node totals, e.g. use sum(sum by (node)
(kube_node_status_capacity{ resource="memory" {cluster_label} })) and the
analogous expression for cpu, ensuring you keep the existing cluster_label
interpolation and update the memory_query and cpu_query strings in
prometheus_metrics_service.py.
memory_query = f"""
-    sum(max by (instance) (machine_memory_bytes{{ {single_cluster_label} }}))
+    sum(kube_node_status_capacity{{ resource="memory" {cluster_label} }})
"""
cpu_query = f"""
-    sum(max by (instance) (machine_cpu_cores{{ {single_cluster_label} }}))
+    sum(kube_node_status_capacity{{ resource="cpu" {cluster_label} }})
"""
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
PROM="${PROMETHEUS_URL:?Set PROMETHEUS_URL (e.g. https://prom.example.com)}"
queries=(
'sum(kube_node_status_capacity{resource="memory"})'
'sum(max by (node) (kube_node_status_capacity{resource="memory"}))'
'sum(kube_node_status_capacity{resource="cpu"})'
'sum(max by (node) (kube_node_status_capacity{resource="cpu"}))'
)
for q in "${queries[@]}"; do
echo "QUERY: $q"
curl -sG "$PROM/api/v1/query" --data-urlencode "query=$q" | jq -r '.data.result[0].value // "no result"'
echo
done
🏁 Script executed:
# Read the file at the specified lines
cat -n robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py | sed -n '240,270p'
🏁 Script executed:
# Check the get_prometheus_cluster_label function referenced in the scratchpad
grep -n "get_prometheus_cluster_label" robusta_krr/core/integrations/prometheus/metrics/base.py | head -5
🏁 Script executed:
# Find where these queries are used
rg "memory_query|cpu_query" --context 3 -A 2 -B 2 robusta_krr/core/integrations/prometheus/metrics_service/
🏁 Script executed:
# Check the implementation of get_prometheus_cluster_label
cat -n robusta_krr/core/integrations/prometheus/metrics/base.py | sed -n '75,95p'
🏁 Script executed:
# Look for the class definition and understand the context
grep -n "class.*PrometheusMetricsService" robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py
🏁 Script executed:
# Search for kube_node_status_capacity usage in the codebase to understand metric structure
rg "kube_node_status_capacity" robusta_krr/
🏁 Script executed:
# Check if there's any documentation or references to the metric labels
rg "node.*resource.*capacity" robusta_krr/ -A 2 -B 2
🏁 Script executed:
# Look for any test data or example Prometheus queries
find robusta_krr -name "*.py" -type f | xargs grep -l "kube_state_metrics\|kube-state-metrics" | head -5
🏁 Script executed:
# Search for how this metric is documented or used in comments
rg "kube_node_status_capacity" -B 5 -A 5 robusta_krr/
🏁 Script executed:
# Check the structure of the queries - look for any other aggregations in similar contexts
rg "kube_pod_container_resource_requests|kube_node" robusta_krr/core/integrations/prometheus/metrics_service/ -A 3 -B 1 | head -40
🏁 Script executed:
# Look at the similar query patterns for kube_system to understand the aggregation style
cat -n robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py | sed -n '260,265p'
🏁 Script executed:
# Check if there are any tests or documentation about this query
rg "get_cluster_summary" robusta_krr/ -A 10 -B 2 | head -50
🏁 Script executed:
# Research kube-state-metrics documentation about kube_node_status_capacity metric
# Check if there are any comments or references in the codebase about metric labels
rg "node.*label\|resource.*label" robusta_krr/ | head -20
🏁 Script executed:
# Check the kube-state-metrics version or any vendor-specific documentation
find robusta_krr -name "*.md" -o -name "*.txt" | xargs grep -l "kube.state\|prometheus" 2>/dev/null | head -5
🌐 Web query:
kube_node_status_capacity kube-state-metrics metric labels
💡 Result:
The kube_node_status_capacity metric from kube-state-metrics is a Gauge that represents the total amount of resources available (capacity) for a Kubernetes node.
Labels:
- node: the node address/name
- resource: the name of the resource (e.g., cpu, memory, ephemeral_storage, pods, attachable_volumes_, hugepages_)
Status: STABLE
This metric is sourced from the Kubernetes Node .status.capacity field.
Citations:
- 1: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/cluster/node-metrics.md
- 2: https://github.com/kubernetes/kube-state-metrics/blob/3193891ee442ee9552745d054c8d0140295431f3/docs/node-metrics.md
- 3: https://docs.cloudera.com/management-console/1.5.4/monitoring-metrics/topics/cdppvc_ds_kube_node_status_capacity_trics.html
🏁 Script executed:
# Check if there are any tests or example configurations
find robusta_krr -name "*test*" -type f -path "*prometheus*" | xargs grep -l "kube_node_status" 2>/dev/null | head -3
🏁 Script executed:
# Look for any existing aggregation patterns for node metrics in the codebase
rg "by \(node\)" robusta_krr/ -B 2 -A 2
🏁 Script executed:
# Verify there are no other similar problematic sum() aggregations for node metrics
rg "sum\(kube_node" robusta_krr/
🏁 Script executed:
# Double-check the cluster_label format one more time
cat -n robusta_krr/core/integrations/prometheus/metrics/base.py | sed -n '79,88p'
Deduplicate node capacity series before summing.
sum(kube_node_status_capacity...) counts the same node's capacity multiple times when the metric appears in duplicate scrape targets or jobs (e.g., HA kube-state-metrics). Aggregate by node first to deduplicate, then sum.
🔧 Proposed fix
memory_query = f"""
- sum(kube_node_status_capacity{{ resource="memory" {cluster_label} }})
+ sum(max by (node) (kube_node_status_capacity{{ resource="memory" {cluster_label} }}))
"""
cpu_query = f"""
- sum(kube_node_status_capacity{{ resource="cpu" {cluster_label} }})
+ sum(max by (node) (kube_node_status_capacity{{ resource="cpu" {cluster_label} }}))
"""
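The double-counting the reviewer describes is easy to reproduce with toy data. The snippet below uses hypothetical values and simply mimics what `sum(...)` versus `sum(max by (node) (...))` would compute over duplicated series:

```python
# Duplicate per-node capacity samples, as would be scraped from two HA
# kube-state-metrics replicas (hypothetical numbers, in GB for readability).
samples = [
    {"node": "node-a", "instance": "ksm-0", "value": 16.0},
    {"node": "node-a", "instance": "ksm-1", "value": 16.0},  # duplicate of node-a
    {"node": "node-b", "instance": "ksm-0", "value": 32.0},
]

# sum(kube_node_status_capacity{...}): node-a is counted twice.
naive_total = sum(s["value"] for s in samples)  # 64.0, overstated

# sum(max by (node) (...)): collapse to one series per node, then sum.
per_node: dict[str, float] = {}
for s in samples:
    per_node[s["node"]] = max(per_node.get(s["node"], float("-inf")), s["value"])
deduped_total = sum(per_node.values())  # 48.0, the actual capacity
```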
machine_memory_bytes and machine_cpu_cores have been [deprecated](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/cluster/node-metrics.md), [as noted on StackOverflow](https://stackoverflow.com/questions/63901926/how-to-query-the-total-memory-available-to-kubernetes-nodes).
This issue came up for me when querying Azure Managed Prometheus, where these metrics no longer exist. I've tested the updated queries against both a manually deployed Prometheus and Azure Managed Prometheus; they work and return the same data.
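Combining the replacement metric with the review suggestions, the query construction might end up looking roughly like this. This is a sketch only: `build_capacity_queries` is a hypothetical helper, it assumes `cluster_label` is already either empty or ", "-prefixed, and the actual layout of prometheus_metrics_service.py may differ.

```python
def build_capacity_queries(cluster_label: str) -> tuple[str, str]:
    """PromQL for total cluster memory/CPU capacity, based on the
    non-deprecated kube_node_status_capacity metric, deduplicated per node.
    cluster_label is '' or a ', '-prefixed matcher like ', cluster="prod"'."""
    memory_query = f'sum(max by (node) (kube_node_status_capacity{{ resource="memory"{cluster_label} }}))'
    cpu_query = f'sum(max by (node) (kube_node_status_capacity{{ resource="cpu"{cluster_label} }}))'
    return memory_query, cpu_query

memory_q, cpu_q = build_capacity_queries(', cluster="prod"')
# memory_q -> sum(max by (node) (kube_node_status_capacity{ resource="memory", cluster="prod" }))
```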