add E2E testing framework #26
base: main
Changes from all commits
b489f19
2cafa48
c8c4798
e566126
03fe2e4
2463626
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -57,6 +57,10 @@ helm-lint: ## Run helm lint for Helm chart | |
| test: ## Run unit tests | ||
| $(GOTEST) -v ./... | ||
|
|
||
| .PHONY: e2e-test | ||
| e2e-test: ## Run E2E tests | ||
| @cd e2e-tests && ./scripts/run-tests.sh | ||
|
|
||
| .PHONY: test-coverage-and-junit | ||
| test-coverage-and-junit: ## Run unit tests with coverage and junit output | ||
| go install github.com/jstemmer/go-junit-report/[email protected] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # StackRox MCP E2E Testing | ||
|
|
||
| End-to-end tests for the StackRox MCP server using [gevals](https://github.com/genmcp/gevals). | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Go 1.25+ | ||
| - Google Cloud Project with Vertex AI enabled (for Claude agent) | ||
| - OpenAI API Key (for LLM judge) | ||
| - StackRox API Token | ||
|
|
||
| ## Setup | ||
|
|
||
| ### 1. Build gevals | ||
|
|
||
| ```bash | ||
| cd e2e-tests | ||
| ./scripts/build-gevals.sh | ||
| ``` | ||
|
|
||
| ### 2. Configure Environment | ||
|
|
||
| Create `.env` file: | ||
|
|
||
| ```bash | ||
| # Required: GCP Project for Vertex AI (Claude agent) | ||
| ANTHROPIC_VERTEX_PROJECT_ID=<GCP Project ID> | ||
|
|
||
| # Required: StackRox Central API Token | ||
| STACKROX_MCP__CENTRAL__API_TOKEN=<StackRox API Token> | ||
|
|
||
| # Required: OpenAI API Key (for LLM judge) | ||
| OPENAI_API_KEY=<OpenAI API Key> | ||
|
|
||
| # Optional: Vertex AI region (defaults to us-east5) | ||
| CLOUD_ML_REGION=us-east5 | ||
|
|
||
| # Optional: Judge configuration (defaults to OpenAI) | ||
| JUDGE_MODEL_NAME=gpt-5-nano | ||
| ``` | ||
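A quick pre-flight check of the `.env` file can catch a missing token before a test run starts. The sketch below is only an illustration: the `parse_env` and `missing_required` helpers are hypothetical, the parser handles plain `KEY=VALUE` lines only (no quoting or `export`), and how `run-tests.sh` actually loads the file is not shown in this diff.

```python
# Variables the README above marks as required.
REQUIRED = (
    "ANTHROPIC_VERTEX_PROJECT_ID",
    "STACKROX_MCP__CENTRAL__API_TOKEN",
    "OPENAI_API_KEY",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values):
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


# Illustrative content only; real values come from your own .env file.
sample = """
# Required: GCP Project for Vertex AI (Claude agent)
ANTHROPIC_VERTEX_PROJECT_ID=my-project
OPENAI_API_KEY=
CLOUD_ML_REGION=us-east5
"""

print(missing_required(parse_env(sample)))
```

To check a real file, pass its contents instead of `sample`, for example `parse_env(open(".env").read())`.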
|
|
||
| ## Running Tests | ||
|
|
||
| ```bash | ||
| ./scripts/run-tests.sh | ||
| ``` | ||
|
|
||
| Results are saved to `gevals/gevals-stackrox-mcp-e2e-out.json`. | ||
|
|
||
| ### View Results | ||
|
|
||
| ```bash | ||
| # Summary | ||
| jq '.[] | {taskName, taskPassed}' gevals/gevals-stackrox-mcp-e2e-out.json | ||
|
|
||
| # Tool calls | ||
| jq '.[].callHistory[] | {toolName, arguments}' gevals/gevals-stackrox-mcp-e2e-out.json | ||
| ``` | ||
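The same summary can be produced without `jq`. This is a minimal sketch that assumes only the fields the `jq` commands above rely on (`taskName`, `taskPassed`, `callHistory`, `toolName`, `arguments`); the `summarize` helper and the inline sample data are hypothetical, and the real gevals output may carry more fields.

```python
import json
from collections import Counter


def summarize(results):
    """Split tasks into passed/failed and count calls per tool."""
    passed = [r["taskName"] for r in results if r.get("taskPassed")]
    failed = [r["taskName"] for r in results if not r.get("taskPassed")]
    tool_counts = Counter(
        call["toolName"]
        for r in results
        for call in r.get("callHistory", [])
    )
    return passed, failed, dict(tool_counts)


# Illustrative data shaped like the fields queried with jq above.
sample = json.loads("""
[
  {"taskName": "list-clusters", "taskPassed": true,
   "callHistory": [{"toolName": "list_clusters", "arguments": {}}]},
  {"taskName": "cve-nonexistent", "taskPassed": false,
   "callHistory": [
     {"toolName": "get_clusters_with_orchestrator_cve",
      "arguments": {"cveName": "CVE-2099-00001"}},
     {"toolName": "get_deployments_for_cve",
      "arguments": {"cveName": "CVE-2099-00001"}}
   ]}
]
""")

passed, failed, tool_counts = summarize(sample)
print("passed:", passed)
print("failed:", failed)
print("tool calls:", tool_counts)
```

For a real run, load the results file instead of the sample, for example `summarize(json.load(open("gevals/gevals-stackrox-mcp-e2e-out.json")))`.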
|
|
||
| ## Test Cases | ||
|
|
||
| | Test | Description | Tool | | ||
| |------|-------------|------| | ||
| | `list-clusters` | List all clusters | `list_clusters` | | ||
| | `cve-detected-workloads` | CVE detected in deployments | `get_deployments_for_cve` | | ||
| | `cve-detected-clusters` | CVE detected in clusters | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-nonexistent` | Handle non-existent CVE | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-does-exist` | CVE with cluster filter (cluster exists) | `list_clusters`, `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-does-not-exist` | CVE with cluster filter (cluster does not exist) | `list_clusters` | | ||
| | `cve-clusters-general` | General CVE query | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-list` | CVE across clusters | `get_clusters_with_orchestrator_cve` | | ||
|
|
||
| ## Configuration | ||
|
|
||
| - **`gevals/eval.yaml`**: Main test configuration, agent settings, assertions | ||
| - **`gevals/mcp-config.yaml`**: MCP server configuration | ||
| - **`gevals/tasks/*.yaml`**: Individual test task definitions | ||
|
|
||
| ## How It Works | ||
|
|
||
| Gevals uses a proxy architecture to intercept MCP tool calls: | ||
|
|
||
| 1. AI agent receives task prompt | ||
| 2. Agent calls MCP tool | ||
| 3. Gevals proxy intercepts and records the call | ||
| 4. Call forwarded to StackRox MCP server | ||
| 5. Server executes and returns result | ||
| 6. Gevals validates assertions and response quality | ||
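The record-and-forward idea in steps 2 to 5 can be sketched in a few lines. This is not gevals' implementation: `RecordingProxy` and `fake_server` are hypothetical names, and the real proxy speaks the MCP protocol rather than calling a Python function.

```python
class RecordingProxy:
    """Record every tool call, then forward it to the real backend."""

    def __init__(self, backend):
        # backend is any callable(tool_name, arguments) -> result
        self._backend = backend
        self.call_history = []

    def call_tool(self, tool_name, arguments):
        # Step 3: intercept and record the call.
        self.call_history.append(
            {"toolName": tool_name, "arguments": dict(arguments)}
        )
        # Steps 4 and 5: forward the call and return the server's result.
        return self._backend(tool_name, arguments)


def fake_server(tool_name, arguments):
    """Stand-in for the StackRox MCP server, for illustration only."""
    if tool_name == "list_clusters":
        return {"clusters": ["staging-central-cluster"]}
    return {"error": f"unknown tool {tool_name}"}


proxy = RecordingProxy(fake_server)
result = proxy.call_tool("list_clusters", {})
print(result)
print(proxy.call_history)
```

Because the proxy sits between the agent and the server, the recorded history can be checked against assertions after the agent finishes, without changing the server.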
|
|
||
| ## Troubleshooting | ||
|
|
||
| **Tests fail - no tools called** | ||
| - Verify StackRox Central is accessible | ||
| - Check API token permissions | ||
|
|
||
| **Build errors** | ||
| ```bash | ||
| go mod tidy | ||
|
||
| ./scripts/build-gevals.sh | ||
| ``` | ||
|
|
||
| ## Further Reading | ||
|
|
||
| - [Gevals Documentation](https://github.com/genmcp/gevals) | ||
| - [StackRox MCP Server](../README.md) | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,110 @@ | ||||||
| kind: Eval | ||||||
| metadata: | ||||||
| name: "stackrox-mcp-e2e" | ||||||
| config: | ||||||
| agent: | ||||||
| type: "builtin.claude-code" | ||||||
| model: "claude-sonnet-4-5" | ||||||
| llmJudge: | ||||||
| env: | ||||||
| baseUrlKey: JUDGE_BASE_URL | ||||||
| apiKeyKey: JUDGE_API_KEY | ||||||
| modelNameKey: JUDGE_MODEL_NAME | ||||||
| mcpConfigFile: mcp-config.yaml | ||||||
| taskSets: | ||||||
| # Assertion Fields Explained: | ||||||
| # - toolsUsed: List of tools that MUST be called at least once | ||||||
| # - minToolCalls: Minimum TOTAL number of tool calls across ALL tools (not per-tool) | ||||||
| # - maxToolCalls: Maximum TOTAL number of tool calls across ALL tools (prevents runaway tool usage) | ||||||
| # Example: If maxToolCalls=3, the agent can make up to 3 tool calls total in the test, | ||||||
| # regardless of which tools are called. | ||||||
|
|
||||||
| # Test 1: List clusters | ||||||
| - path: tasks/list-clusters.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "list_clusters" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 1 | ||||||
|
|
||||||
| # Test 2: CVE detected in workloads | ||||||
| # Claude does comprehensive CVE checking (orchestrator, deployments, nodes) | ||||||
| - path: tasks/cve-detected-workloads.yaml | ||||||
| assertions: | ||||||
|
Collaborator
Is it possible to define
Contributor
Author
nope, actually we can define glob instead of path and assert tools used for the whole suite (all tasks) and rely on verify and judge to score them |
||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_deployments_for_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2021-31805" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 3 | ||||||
|
|
||||||
| # Test 3: CVE detected in clusters - basic | ||||||
| - path: tasks/cve-detected-clusters.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2016-1000031" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 3 | ||||||
|
Collaborator
What
Contributor
Author
nope, tools used are the expected (required) calls and max tool calls is the total number of tool calls |
||||||
|
|
||||||
| # Test 4: Non-existent CVE | ||||||
| # Allows up to 3 calls because "Is CVE detected in my clusters?" triggers comprehensive check | ||||||
| # (orchestrator, deployments, nodes). The LLM cannot know beforehand if CVE exists. | ||||||
| - path: tasks/cve-nonexistent.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2099-00001" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 3 | ||||||
|
|
||||||
| # Test 5: CVE with specific cluster filter (does exist) | ||||||
| # Claude does comprehensive checking even for single cluster (orchestrator, deployments, nodes) | ||||||
| - path: tasks/cve-cluster-does-exist.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "list_clusters" | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2016-1000031" | ||||||
| minToolCalls: 2 | ||||||
| maxToolCalls: 4 | ||||||
|
|
||||||
| # Test 6: CVE with specific cluster filter (does not exist) | ||||||
| - path: tasks/cve-cluster-does-not-exist.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "list_clusters" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 2 | ||||||
|
Collaborator
LLM should not fetch list of clusters twice:
Suggested change
Contributor
Author
I made it 2 to pass, I think Claude decides to get more data sometimes |
||||||
|
|
||||||
| # Test 7: CVE detected in clusters - general | ||||||
| - path: tasks/cve-clusters-general.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2021-31805" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 5 | ||||||
|
|
||||||
| # Test 8: CVE check with cluster list reference | ||||||
| - path: tasks/cve-cluster-list.yaml | ||||||
| assertions: | ||||||
| toolsUsed: | ||||||
| - server: stackrox-mcp | ||||||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||||||
| argumentsMatch: | ||||||
| cveName: "CVE-2024-52577" | ||||||
| minToolCalls: 1 | ||||||
| maxToolCalls: 5 | ||||||
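The assertion fields explained in the comments at the top of `taskSets` can be sketched as a small checker. This is a sketch under stated assumptions, not gevals' code: `check_assertions` is a hypothetical helper, `toolPattern` is treated as a regular expression that must match the whole tool name, `argumentsMatch` is treated as exact equality, and the `server` field on recorded calls is assumed.

```python
import re


def check_assertions(call_history, tools_used, min_tool_calls, max_tool_calls):
    """Return a list of failure messages; an empty list means success."""
    failures = []
    # toolsUsed: each entry must be matched by at least one recorded call.
    for expected in tools_used:
        pattern = re.compile(expected["toolPattern"])
        wanted = expected.get("argumentsMatch", {})
        matched = any(
            call["server"] == expected["server"]
            and pattern.fullmatch(call["toolName"]) is not None
            and all(call["arguments"].get(k) == v for k, v in wanted.items())
            for call in call_history
        )
        if not matched:
            failures.append("required tool not called: " + expected["toolPattern"])
    # minToolCalls / maxToolCalls: totals across ALL tools, not per tool.
    total = len(call_history)
    if total < min_tool_calls:
        failures.append(f"too few tool calls: {total} < {min_tool_calls}")
    if total > max_tool_calls:
        failures.append(f"too many tool calls: {total} > {max_tool_calls}")
    return failures


# Assertions copied from Test 5 (cve-cluster-does-exist).
tools_used = [
    {"server": "stackrox-mcp", "toolPattern": "list_clusters"},
    {"server": "stackrox-mcp",
     "toolPattern": "get_clusters_with_orchestrator_cve",
     "argumentsMatch": {"cveName": "CVE-2016-1000031"}},
]
# Hypothetical recorded calls for one agent run.
history = [
    {"server": "stackrox-mcp", "toolName": "list_clusters", "arguments": {}},
    {"server": "stackrox-mcp",
     "toolName": "get_clusters_with_orchestrator_cve",
     "arguments": {"cveName": "CVE-2016-1000031"}},
]

print(check_assertions(history, tools_used, 2, 4))
```

Under this reading, a run that calls `list_clusters` twice in Test 6 still passes, because `maxToolCalls: 2` limits only the total number of calls.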
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| mcpServers: | ||
| stackrox-mcp: | ||
| command: go | ||
| args: | ||
| - run | ||
| - ../../cmd/stackrox-mcp/... | ||
| - --config | ||
| - ../stackrox-mcp-e2e-config.yaml | ||
| # API token loaded from parent shell environment (.env file) | ||
| # No env section = full environment inheritance | ||
| # Auto-approve all tools | ||
| enableAllTools: true |
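The comment about environment inheritance describes standard child-process behavior: a child started without an explicit environment sees the parent's variables. The sketch below demonstrates this with Python's `subprocess` module; it assumes nothing about how gevals launches the server, and the token value is a dummy.

```python
import os
import subprocess
import sys

TOKEN_VAR = "STACKROX_MCP__CENTRAL__API_TOKEN"
os.environ[TOKEN_VAR] = "dummy-token-for-illustration"

# The child prints the token it can see, or an empty line if it sees none.
child_code = f"import os; print(os.environ.get('{TOKEN_VAR}', ''))"


def run_child(env=None):
    """Run the child; env=None means it inherits the parent's environment."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# No explicit environment: the child inherits the token.
inherited = run_child()

# Explicit environment without the token: the child cannot see it.
filtered_env = {k: v for k, v in os.environ.items() if k != TOKEN_VAR}
isolated = run_child(env=filtered_env)

print(repr(inherited), repr(isolated))
```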
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-does-exist" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2016-1000031 detected in cluster name staging-central-cluster?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2016-1000031 is detected in staging-central-cluster" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-does-not-exist" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2024-52577 detected in cluster name prod-hal-9000?" | ||
| verify: | ||
| contains: "A response indicating that CVE-2024-52577 is not detected in prod-hal-9000 cluster, or that the cluster does not exist" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-list" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2024-52577 detected in any of my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2024-52577 is detected in clusters" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-clusters-general" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2021-31805 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2021-31805 is detected in any clusters, with specific cluster names if detected, or a statement that no clusters have this CVE." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-detected-clusters" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2016-1000031 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2016-1000031 is detected in clusters, nodes, or deployments" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-detected-workloads" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2021-31805 detected in my workloads?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2021-31805 is detected in workloads/deployments, with specific deployment names if detected, or a statement that no deployments have this CVE." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-nonexistent" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is CVE-2099-00001 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating that CVE-2099-00001 is not found or not detected in any clusters" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "list-clusters" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "List my clusters" | ||
| verify: | ||
| contains: "A response containing a list of cluster names" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| cd "$(dirname "$0")/.." | ||
|
|
||
| echo "Building gevals from tool dependencies..." | ||
| go build -o bin/gevals github.com/genmcp/gevals/cmd/gevals | ||
|
||
|
|
||
| echo "gevals built successfully: bin/gevals" | ||
| ./bin/gevals help | ||