add E2E testing framework #26

mtodor · 2026-01-19T17:02:41Z

Is it possible to define assertions in the task file, i.e. in this case tasks/cve-affecting-workloads.yaml?

Nope. Actually, we can define a glob instead of a path, assert the tools used for the whole suite (all tasks), and rely on verify and judge to score them.

mtodor · 2026-01-19T17:15:12Z

What does maxToolCalls mean? Can the tools defined in toolsUsed be called up to 3 times?

Nope. toolsUsed lists the expected (required) calls, and maxToolCalls caps the total number of tool calls.
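
To make the distinction concrete, here is a minimal Go sketch of the rule as described in the reply above. `checkToolCalls` is a hypothetical helper written for illustration, not gevals code; it assumes `toolsUsed` means "each listed tool must be called at least once" and `maxToolCalls` bounds the total number of calls across all tools.

```go
package main

import "fmt"

// checkToolCalls is a hypothetical helper, not part of gevals. It applies the
// semantics described in the review reply: every tool in required must appear
// at least once in calls, and len(calls) must not exceed maxToolCalls.
func checkToolCalls(calls, required []string, maxToolCalls int) error {
	seen := make(map[string]bool, len(calls))
	for _, name := range calls {
		seen[name] = true
	}
	for _, name := range required {
		if !seen[name] {
			return fmt.Errorf("required tool %q was never called", name)
		}
	}
	if len(calls) > maxToolCalls {
		return fmt.Errorf("%d tool calls exceed maxToolCalls=%d", len(calls), maxToolCalls)
	}
	return nil
}

func main() {
	required := []string{"list_clusters"}
	twice := []string{"list_clusters", "list_clusters"}

	// Two calls to the same required tool are accepted when the cap is 2.
	fmt.Println(checkToolCalls(twice, required, 2))
	// The same call history is rejected when the cap is 1.
	fmt.Println(checkToolCalls(twice, required, 1))
}
```

Under this reading, the cap is independent of how many tools are listed in `toolsUsed`, which is why a repeated `list_clusters` call only passes with a cap of 2.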

mtodor · 2026-01-19T17:16:06Z

The LLM should not fetch the list of clusters twice:

Suggested change

-    maxToolCalls: 2
+    maxToolCalls: 1

I made it 2 so the test passes; I think Claude sometimes decides to fetch more data.

@@ -16,3 +16,9 @@
     # Lint output
     /report.xml
+    # E2E tests
+    /e2e-tests/.env
+    /e2e-tests/mcp-reports/
+    /e2e-tests/bin/
+    /e2e-tests/**/*-out.json

@@ -57,6 +57,10 @@ helm-lint: ## Run helm lint for Helm chart
     test: ## Run unit tests
     	$(GOTEST) -v ./...
+    .PHONY: e2e-test
+    e2e-test: ## Run E2E tests
+    	@cd e2e-tests && ./scripts/run-tests.sh
     .PHONY: test-coverage-and-junit
     test-coverage-and-junit: ## Run unit tests with coverage and junit output
     	go install github.com/jstemmer/go-junit-report/[email protected]

@@ -0,0 +1,105 @@
+    # StackRox MCP E2E Testing
+    End-to-end tests for the StackRox MCP server using [gevals](https://github.com/genmcp/gevals).
+    ## Prerequisites
+    - Go 1.25+
+    - Google Cloud Project with Vertex AI enabled (for Claude agent)
+    - OpenAI API Key (for LLM judge)
+    - StackRox API Token
+    ## Setup
+    ### 1. Build gevals
+    ```bash
+    cd e2e-tests
+    ./scripts/build-gevals.sh
+    ```
+    ### 2. Configure Environment
+    Create `.env` file:
+    ```bash
+    # Required: GCP Project for Vertex AI (Claude agent)
+    ANTHROPIC_VERTEX_PROJECT_ID=<GCP Project ID>
+    # Required: StackRox Central API Token
+    STACKROX_MCP__CENTRAL__API_TOKEN=<StackRox API Token>
+    # Required: OpenAI API Key (for LLM judge)
+    OPENAI_API_KEY=<OpenAI API Key>
+    # Optional: Vertex AI region (defaults to us-east5)
+    CLOUD_ML_REGION=us-east5
+    # Optional: Judge configuration (defaults to OpenAI)
+    JUDGE_MODEL_NAME=gpt-5-nano
+    ```
+    ## Running Tests
+    ```bash
+    ./scripts/run-tests.sh
+    ```
+    Results are saved to `gevals/gevals-stackrox-mcp-e2e-out.json`.
+    ### View Results
+    ```bash
+    # Summary
+    jq '.[] | {taskName, taskPassed}' gevals/gevals-stackrox-mcp-e2e-out.json
+    # Tool calls
+    jq '.[].callHistory[] | {toolName, arguments}' gevals/gevals-stackrox-mcp-e2e-out.json
+    ```
+    ## Test Cases
+    | Test | Description | Tool |
+    |------|-------------|------|
+    | `list-clusters` | List all clusters | `list_clusters` |
+    | `cve-detected-workloads` | CVE detected in deployments | `get_deployments_for_cve` |
+    | `cve-detected-clusters` | CVE detected in clusters | `get_clusters_with_orchestrator_cve` |
+    | `cve-nonexistent` | Handle non-existent CVE | `get_clusters_with_orchestrator_cve` |
+    | `cve-cluster-does-exist` | CVE with cluster filter | `get_clusters_with_orchestrator_cve` |
+    | `cve-cluster-does-not-exist` | CVE with cluster filter | `get_clusters_with_orchestrator_cve` |
+    | `cve-clusters-general` | General CVE query | `get_clusters_with_orchestrator_cve` |
+    | `cve-cluster-list` | CVE across clusters | `get_clusters_with_orchestrator_cve` |
+    ## Configuration
+    - **`gevals/eval.yaml`**: Main test configuration, agent settings, assertions
+    - **`gevals/mcp-config.yaml`**: MCP server configuration
+    - **`gevals/tasks/*.yaml`**: Individual test task definitions
+    ## How It Works
+    Gevals uses a proxy architecture to intercept MCP tool calls:
+    1. AI agent receives task prompt
+    2. Agent calls MCP tool
+    3. Gevals proxy intercepts and records the call
+    4. Call forwarded to StackRox MCP server
+    5. Server executes and returns result
+    6. Gevals validates assertions and response quality
+    ## Troubleshooting
+    **Tests fail - no tools called**
+    - Verify StackRox Central is accessible
+    - Check API token permissions
+    **Build errors**
+    ```bash
+    go mod tidy
+    ./scripts/build-gevals.sh
+    ```
+    ## Further Reading
+    - [Gevals Documentation](https://github.com/genmcp/gevals)
+    - [StackRox MCP Server](../README.md)
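
The `jq` queries in the README's "View Results" section could also be expressed in Go, for example to fail a CI step on any failed task. The sketch below is an illustration only: it assumes the output file is a JSON array whose elements carry `taskName`, `taskPassed` (taken to be a boolean), and a `callHistory` array of objects with `toolName` and `arguments`, which is inferred from the `jq` filters rather than from the gevals schema. The sample payload in `main` is made up for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall and taskResult mirror only the fields referenced by the README's
// jq filters. The real gevals output may contain more fields or differ in shape.
type toolCall struct {
	ToolName  string          `json:"toolName"`
	Arguments json.RawMessage `json:"arguments"`
}

type taskResult struct {
	TaskName    string     `json:"taskName"`
	TaskPassed  bool       `json:"taskPassed"`
	CallHistory []toolCall `json:"callHistory"`
}

// summarize counts passed and failed tasks and how often each tool was called.
func summarize(data []byte) (passed, failed int, toolCounts map[string]int, err error) {
	var results []taskResult
	if err = json.Unmarshal(data, &results); err != nil {
		return 0, 0, nil, err
	}
	toolCounts = make(map[string]int)
	for _, r := range results {
		if r.TaskPassed {
			passed++
		} else {
			failed++
		}
		for _, c := range r.CallHistory {
			toolCounts[c.ToolName]++
		}
	}
	return passed, failed, toolCounts, nil
}

func main() {
	// Illustrative payload only; a real run would read
	// gevals/gevals-stackrox-mcp-e2e-out.json with os.ReadFile.
	sample := []byte(`[
	  {"taskName": "list-clusters", "taskPassed": true,
	   "callHistory": [{"toolName": "list_clusters", "arguments": {}}]}
	]`)

	passed, failed, toolCounts, err := summarize(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("passed:", passed, "failed:", failed, "tool calls:", toolCounts)
}
```

Keeping `arguments` as `json.RawMessage` avoids guessing at the argument schema of each tool.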

@@ -0,0 +1,12 @@
+    mcpServers:
+      stackrox-mcp:
+        command: go
+        args:
+          - run
+          - ../../cmd/stackrox-mcp/...
+          - --config
+          - ../stackrox-mcp-e2e-config.yaml
+        # API token loaded from parent shell environment (.env file)
+        # No env section = full environment inheritance
+        # Auto-approve all tools
+        enableAllTools: true
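
The README describes gevals as a proxy that sits between the agent and the MCP server configured above, recording each tool call before forwarding it. The following Go sketch illustrates that record-then-forward pattern in isolation. All names here (`toolHandler`, `recordingProxy`, `recordedCall`) are hypothetical and do not reflect the gevals implementation or the MCP wire protocol.

```go
package main

import "fmt"

// toolHandler stands in for the server side of a tool call (hypothetical type).
type toolHandler func(tool string, args map[string]string) (string, error)

// recordedCall is one entry in the proxy's call history.
type recordedCall struct {
	Tool string
	Args map[string]string
}

// recordingProxy records every call, then forwards it to the wrapped handler.
type recordingProxy struct {
	next  toolHandler
	calls []recordedCall
}

// Call appends the call to the history before delegating, so the call is
// recorded even when the wrapped handler returns an error.
func (p *recordingProxy) Call(tool string, args map[string]string) (string, error) {
	p.calls = append(p.calls, recordedCall{Tool: tool, Args: args})
	return p.next(tool, args)
}

func main() {
	// A stub server that answers every tool call with a fixed string.
	server := func(tool string, args map[string]string) (string, error) {
		return "result of " + tool, nil
	}

	proxy := &recordingProxy{next: server}
	out, err := proxy.Call("list_clusters", nil)
	fmt.Println(out, err)

	// The recorded history is what assertions such as toolsUsed and
	// maxToolCalls would be evaluated against.
	for _, c := range proxy.calls {
		fmt.Println("recorded:", c.Tool)
	}
}
```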

@@ -0,0 +1,9 @@
+    kind: Task
+    metadata:
+      name: "cve-cluster-does-exist"
+      difficulty: easy
+    steps:
+      prompt:
+        inline: "Is this CVE-2016-1000031 detected in cluster name staging-central-cluster?"
+      verify:
+        contains: "A response indicating whether CVE-2016-1000031 is detected in staging-central-cluster"


janisz Jan 29, 2026


@@ -0,0 +1,10 @@
+    #!/bin/bash
+    set -e
+    cd "$(dirname "$0")/.."
+    echo "Building gevals from tool dependencies..."
+    go build -o bin/gevals github.com/genmcp/gevals/cmd/gevals
+    echo "gevals built successfully: bin/gevals"
+    ./bin/gevals help
