
✨ feat(stress): add distributed stress testing infrastructure#306

Closed
sodre wants to merge 88 commits into main from perf/stress-test
@sodre sodre commented Feb 2, 2026

Summary

  • Add Locust-based load testing with Fargate Spot master and Lambda workers
  • Implement distributed architecture: orchestrator auto-scales Lambda workers based on user count
  • Add CloudFormation template for load test infrastructure (ECS, Lambda, VPC endpoints)
  • Add CLI commands: load deploy, load connect, load run, load calibrate, load teardown, load list
  • Add Docker and Lambda builders for deployment artifacts
  • Add RateLimiterUser and RateLimiterSession for Locust integration with instrumented events
  • Add load calibrate command: binary-search for optimal per-worker concurrency using Little's Law efficiency metric
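The user-count-based worker scaling mentioned above can be pictured with a minimal sketch. It assumes a simple ceiling division of users by a per-worker capacity. The function name, the `max_workers` cap, and the default of 10 users per worker are illustrative assumptions (a later commit in this PR mentions `users_per_worker=10` in test expectations), not the orchestrator's actual code.

```python
import math


def calculate_desired_workers(
    users: int,
    users_per_worker: int = 10,  # assumed default, taken from the test notes in this PR
    max_workers: int = 100,      # hypothetical safety cap, not from the PR
) -> int:
    """Return how many Lambda workers are needed for `users` simulated users."""
    if users_per_worker <= 0:
        raise ValueError("users_per_worker must be positive")
    if users <= 0:
        return 0
    # Round up so a partially filled worker still gets launched.
    return min(max_workers, math.ceil(users / users_per_worker))


if __name__ == "__main__":
    for users in (0, 1, 25, 5000):
        print(users, "users ->", calculate_desired_workers(users), "workers")
```

Feeding the calibrated per-worker concurrency into `users_per_worker` is one plausible way to connect `load calibrate` with scaling, although the PR does not state that the two are wired together.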

Calibration Results

Lambda calibration (90% efficiency threshold, us-east-1, 1 vCPU Lambda, 60s per step):

max_rps.py — MaxRpsUser (zero wait, max throughput)

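| Users/Lambda | RPS | p50 | p95 | p99 | Reqs | Efficiency |
|---|---|---|---|---|---|---|
| 1 | 122.0 | 8ms | 9ms | 12ms | 7,321 | 100% (baseline) |
| 2 | 250.1 | 8ms | 9ms | 11ms | 15,009 | 100% |
| 3 | 363.6 | 8ms | 9ms | 13ms | 21,818 | 100% |
| 4 | 376.4 | 9ms | 14ms | 17ms | 22,595 | 89% |
| 5 | 388.8 | 11ms | 17ms | 20ms | 23,333 | 73% |
| 40 | 392.1 | 69ms | 96ms | 140ms | 23,726 | 12% |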

Optimal: 3 users/lambda — p50 stays at floor (8ms), ~364 RPS per worker.
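As a rough illustration of how the Efficiency column relates to p50, the sketch below applies the ratio `baseline_p50 / observed_p50` (the definition given in the `load calibrate` commit message) to the p50 values reported for MaxRpsUser. The helper names and the "largest passing user count" selection rule are assumptions for illustration, not the CLI's implementation.

```python
# (users per Lambda, observed p50 in ms) as reported for MaxRpsUser
ROWS = [(1, 8), (2, 8), (3, 8), (4, 9), (5, 11), (40, 69)]


def efficiency(baseline_p50_ms: float, observed_p50_ms: float) -> float:
    """Latency-based efficiency: 1.0 means p50 is still at the baseline floor."""
    return min(1.0, baseline_p50_ms / observed_p50_ms)


def predicted_rps(users: int, p50_ms: float) -> float:
    """Little's Law for zero-wait users: throughput = concurrency / latency."""
    return users * 1000.0 / p50_ms


def optimal_users(rows, threshold: float = 0.90) -> int:
    """Largest measured user count whose efficiency meets the threshold."""
    baseline = rows[0][1]
    passing = [u for u, p50 in rows if efficiency(baseline, p50) >= threshold]
    return max(passing)


if __name__ == "__main__":
    for users, p50 in ROWS:
        print(users, f"{efficiency(ROWS[0][1], p50):.0%}", predicted_rps(users, p50))
    print("optimal:", optimal_users(ROWS))
```

With these inputs, 4 users gives 8/9, which is about 89% and falls just under the 90% threshold, consistent with 3 users/lambda being reported as optimal. The Little's Law estimate for 3 users at 8ms is 375 RPS, close to the measured 363.6.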

max_rps.py — MaxRpsCascadeUser (zero wait, cascade overhead)

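| Users/Lambda | RPS | p50 | p95 | p99 | Reqs | Efficiency |
|---|---|---|---|---|---|---|
| 1 | 100.4 | 9ms | 11ms | 23ms | 6,027 | 100% (baseline) |
| 2 | 176.1 | 10ms | 15ms | 21ms | 10,570 | 90% |
| 3 | 179.4 | 16ms | 22ms | 27ms | 10,769 | 56% |
| 5 | 185.2 | 26ms | 37ms | 45ms | 11,129 | 35% |
| 40 | 166.6 | 160ms | 260ms | 890ms | 10,077 | 6% |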

Optimal: 2 users/lambda — p50 stays near floor (10ms), ~176 RPS per worker. Cascade adds ~2ms floor latency vs non-cascade.

simple.py — SimpleUser (wait 0.1–1.0s, realistic traffic)

| Users/Lambda | RPS | p50 | p95 | p99 | Reqs | Efficiency |
|---|---|---|---|---|---|---|
| 1 | 2.0 | 9ms | 10ms | 28ms | 117 | 100% (baseline) |
| 40 | 68.0 | 10ms | 33ms | 140ms | 4,091 | 90% |

Optimal: 40+ users/lambda — p50 stays at floor (10ms), wait time dominates. No GIL pressure at any concurrency.

simple.py — SimpleCascadeUser (wait 0.1–1.0s, cascade)

| Users/Lambda | RPS | p50 | p95 | p99 | Reqs | Efficiency |
|---|---|---|---|---|---|---|
| 1 | 1.7 | 9ms | 13ms | 60ms | 100 | 100% (baseline) |
| 40 | 69.0 | 9ms | 20ms | 210ms | 4,140 | 100% |

Optimal: 40+ users/lambda — p50 stays at floor (9ms), wait time dominates. No GIL pressure at any concurrency.

Test plan

  • Unit tests for Locust integration (61 tests)
  • Unit tests for Docker builder (21 tests)
  • Unit tests for Lambda worker (22 tests)
  • Unit tests for orchestrator, CLI, CloudFormation
  • All 2074 unit tests passing
  • Lambda calibration: MaxRpsUser (364 RPS, p50=8ms, p95=9ms, p99=13ms at 3 users/lambda)
  • Lambda calibration: MaxRpsCascadeUser (176 RPS, p50=10ms, p95=15ms, p99=21ms at 2 users/lambda)
  • Lambda calibration: SimpleUser (68 RPS, p50=10ms, p95=33ms, p99=140ms at 40 users/lambda)
  • Lambda calibration: SimpleCascadeUser (69 RPS, p50=9ms, p95=20ms, p99=210ms at 40 users/lambda)

🤖 Generated with Claude Code

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.09%. Comparing base (38a0e68) to head (42438bd).
⚠️ Report is 41 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #306      +/-   ##
==========================================
+ Coverage   96.96%   97.09%   +0.13%     
==========================================
  Files          24       25       +1     
  Lines        5099     5343     +244     
==========================================
+ Hits         4944     5188     +244     
  Misses        155      155              
| Flag | Coverage Δ |
|---|---|
| doctest | 36.32% <1.62%> (-1.68%) ⬇️ |
| e2e | 53.15% <4.06%> (-1.45%) ⬇️ |
| integration | 60.86% <4.06%> (-1.98%) ⬇️ |
| unit | 96.78% <100.00%> (+0.15%) ⬆️ |


@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 01539fa | Previous: 38a0e68 | Ratio |
|---|---|---|---|
| tests/benchmark/test_latency.py::TestLatencyComparison::test_two_limits | 235.00870426785562 iter/sec (stddev: 0.00850320509097727) | 288.3994402811384 iter/sec (stddev: 0.00014876579195594755) | 1.23 |
| tests/benchmark/test_operations.py::TestTransactionOverheadBenchmarks::test_transactional_acquire | 299.2736512902029 iter/sec (stddev: 0.0082067744513638) | 374.0754667581391 iter/sec (stddev: 0.0005332760263343914) | 1.25 |
| tests/benchmark/test_operations.py::TestConfigLookupBenchmarks::test_acquire_cold_config | 227.09609023350396 iter/sec (stddev: 0.008100116269139448) | 283.85108120986445 iter/sec (stddev: 0.00035752013219356377) | 1.25 |

This comment was automatically generated by workflow using github-action-benchmark.

@sodre sodre marked this pull request as draft February 4, 2026 04:32
@sodre sodre added this to the v1.0.0 milestone Feb 4, 2026
@sodre sodre force-pushed the perf/stress-test branch 2 times, most recently from 4053f02 to c515738 Compare February 8, 2026 09:35
sodre and others added 13 commits February 9, 2026 22:28
These outputs allow the stress test stack to import IAM configuration
from the target stack, ensuring consistent permission boundaries and
providing an S3 bucket for large Lambda packages.

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Configuration dataclasses for stress testing with:
- Whale entity (50% traffic)
- Spike entity (3% baseline + 1500 RPM bursts)
- Power law distribution (47% across remaining entities)

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implements traffic distribution:
- Whale entity receives ~50% of traffic
- Spike entity has baseline + periodic bursts
- Power law (Zipf) distribution for remaining entities

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
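The traffic split described in the commit above (whale ~50%, spike 3% baseline, power law over the remaining 47%) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `entity-N` naming, and the Zipf exponent of 1.0 are hypothetical, and the spike entity's periodic bursts are not modelled here.

```python
import random


def build_weights(num_tail: int, whale: float = 0.50, spike: float = 0.03,
                  zipf_exponent: float = 1.0) -> dict:
    """Map entity name -> share of traffic; shares sum to 1.0."""
    tail_share = 1.0 - whale - spike  # 47% with the defaults
    # Zipf: the entity at rank r gets a share proportional to 1 / r**s.
    raw = [1.0 / (rank ** zipf_exponent) for rank in range(1, num_tail + 1)]
    total = sum(raw)
    weights = {"whale": whale, "spike": spike}
    for rank, value in enumerate(raw, start=1):
        weights[f"entity-{rank}"] = tail_share * value / total
    return weights


def pick_entity(weights: dict, rng: random.Random) -> str:
    """Draw one entity name according to its traffic share."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    weights = build_weights(num_tail=10)
    draws = [pick_entity(weights, rng) for _ in range(10_000)]
    print("whale share:", draws.count("whale") / len(draws))
```

Passing a seeded `random.Random` keeps the draw sequence reproducible, which helps when comparing runs.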
…cture

Creates:
- ECS cluster with Fargate Spot capacity provider
- Locust master task definition and service (desired=0)
- Lambda functions with placeholder code
- VPC endpoints for SSM (ECS Exec)
- Security groups for master and workers
- IAM roles with optional permission boundary

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Builds Locust master image with zae-limiter:
- Auto-detects dev mode (builds wheel) vs installed (uses PyPI version)
- Creates build context with Dockerfile and locustfile
- Pushes to ECR repository

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Builds Lambda deployment zip with:
- locust and gevent dependencies
- zae-limiter (from wheel or PyPI)
- stress_lambda handler code
- locustfile and distribution modules

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Worker handler:
- Headless mode: self-contained test, returns stats
- Worker mode: connects to Fargate master

Setup handler:
- Creates whale and spike entities
- Configures system and resource defaults
- Creates entities with custom limits

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Implements RateLimiterUser with:
- acquire_tokens task (100 weight) - primary workload
- check_available task (5 weight) - read-only checks
- Shared SyncRateLimiter across users
- Traffic distribution via TrafficDistributor
- Proper Locust event firing for stats

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Commands:
- stress deploy: Deploy Fargate + Lambda infrastructure
- stress setup: Create test entities and configure limits
- stress connect: Print SSM port-forward command
- stress teardown: Delete stress test stack

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds [stress] extra with:
- locust>=2.20
- gevent>=23.0

Install with: pip install 'zae-limiter[stress]'

Refs: discussion#7

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The stress module contains code that requires AWS/Docker/Locust runtime
to test (Lambda handlers, CLI commands, locustfile). Excluding from
coverage measurement via [tool.coverage.run] omit.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use nest_asyncio to allow nested event loops with gevent
- Fix stats collection: initialize env.stats before starting runner
- Use self.environment.events instead of global events for request firing
- Change from SyncRateLimiter to async RateLimiter with _run_async wrapper
- Use gevent.sleep() instead of greenlet.join() for proper greenlet execution
- Add RoleNameFormat parameter to CFN templates for IAM role naming
- Change Lambda timeout from 900s to 60s for faster iteration
- Use AdminPolicyArn for Lambda worker role (TransactWriteItems needs underlying permissions)
- Add nest-asyncio dependency to Lambda package

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove per-user startup logging
- Replace per-entry stats with single summary line

Co-Authored-By: Claude Opus 4.5 <[email protected]>
sodre and others added 12 commits February 9, 2026 22:28
…er detection

Add --benchmark-disable to addopts so benchmark warnings don't appear
during regular test runs. CI benchmark steps now pass --benchmark-enable
explicitly.

Fix pytest_ignore_collect to also detect xdist worker processes via
hasattr(config, 'workerinput'), since workers don't have numprocesses
set and were still importing gevent files during collection.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
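The xdist detection described in the commit above might look roughly like the following sketch. `config.option.numprocesses` and `config.workerinput` are attributes set by pytest-xdist. The rule that a "gevent file" is one whose name contains `gevent` is an assumption made for this example, and the helper name `is_xdist_run` is hypothetical.

```python
from pathlib import Path
from types import SimpleNamespace


def is_xdist_run(config) -> bool:
    """True on the xdist controller (-n N) and inside xdist worker processes."""
    option = getattr(config, "option", None)
    numprocesses = getattr(option, "numprocesses", None)
    # Workers do not carry numprocesses, but xdist gives them `workerinput`.
    return bool(numprocesses) or hasattr(config, "workerinput")


def pytest_ignore_collect(collection_path: Path, config):
    """Skip collecting gevent test files whenever xdist is active."""
    if "gevent" in collection_path.name and is_xdist_run(config):
        return True
    return None  # defer to pytest's default behaviour


if __name__ == "__main__":
    worker = SimpleNamespace(option=SimpleNamespace(numprocesses=None), workerinput={})
    plain = SimpleNamespace(option=SimpleNamespace(numprocesses=None))
    print(pytest_ignore_collect(Path("test_gevent_mode.py"), worker))
    print(pytest_ignore_collect(Path("test_gevent_mode.py"), plain))
```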
Cover the two missing lines flagged by diff-cover:
- locust.py:59 — _configure_boto3_pool early return on second call
- discovery.py:118 — list_limiters stack_type filter branch

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Update test expectations across load test suite to match recent changes:
- TestScalingConfig: users_per_worker=10, remove rps_per_worker
- TestCalculateDesiredWorkers: new scaling math without rps_per_worker
- TestMain: update worker counts, remove DESIRED_WORKERS/RPS_PER_WORKER env
- TestGenerateDockerfile: LOCUSTFILE set at runtime, not baked in
- TestDeployCommand: mock AcquireOnlyPolicyArn/FullAccessPolicyArn outputs
- TestConnectCommand: add required -f locustfile to all 16 invocations
- TestBuildTaskOverrides: remove desired_workers/rps_per_worker params
- TestSetupCommand: remove test for nonexistent setup subcommand
- Document gevent marker and two-step unit test workflow

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…oint reuse

Add --policy-name-format to the required IAM flags alongside --permission-boundary
and --role-name-format. PowerUserAccess requires all three for both iam:CreateRole
and iam:CreatePolicy. Document common mistakes (missing policy format, using --no-iam)
and add copy-paste deploy/benchmark commands to aws-testing.md.

Update CLAUDE.md load testing section with step 0 (deploy limiter stack first).

Fix load deploy to skip DynamoDB VPC endpoint creation when one already exists,
preventing CloudFormation AlreadyExists errors on redeployment.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add --capacity-provider option to `load deploy` (FARGATE_SPOT default,
FARGATE for on-demand). FARGATE_SPOT is only registered on the ECS
cluster when selected, so partitions without Spot support (GovCloud,
China) work correctly with --capacity-provider FARGATE.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Discard unused return values from client.images.build() (builder.py)
- Log warning on teardown scale-down failure instead of bare pass (cli.py)
- Refactor _boto3_pool_configured global to function attribute pattern
  in both worker.py and locust.py to satisfy CodeQL analysis
- Use importlib.import_module() to avoid mixed import styles in tests

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add MaxRpsCascadeUser and SimpleCascadeUser to locustfiles for measuring
cascade overhead. Add --user-classes CLI option to benchmark command to
select specific User classes without needing --class-picker UI.

Bump system default limits to 1B RPM to avoid false rate limiting during
throughput benchmarks.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace hardcoded ThreadPoolExecutor with configurable parallel_mode
parameter ("auto", "gevent", "threadpool", "serial") on SyncRepository.
ThreadPoolExecutor caused 58% cascade throughput regression under
concurrent load on Lambda (1 vCPU, GIL contention).

Auto mode silently picks the best strategy: gevent (if monkey-patched),
serial (single-CPU), or threadpool (multi-CPU). Explicit modes warn on
suboptimal conditions. Resolution happens once at __init__ time.
ThreadPoolExecutor is created lazily on first cascade request.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
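The auto-mode resolution order described in the commit above (gevent if monkey-patched, serial on a single CPU, threadpool otherwise) can be sketched as below. This is an illustration rather than the `SyncRepository` code: the function name and the `cpu_count` override are hypothetical, checking the `socket` module is an assumed way of detecting monkey-patching, and the warnings for suboptimal explicit modes are omitted.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

VALID_MODES = {"auto", "gevent", "threadpool", "serial"}


def resolve_parallel_mode(mode: str = "auto", cpu_count: Optional[int] = None) -> str:
    """Resolve 'auto' to a concrete strategy once, e.g. at __init__ time."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown parallel_mode: {mode!r}")
    if mode != "auto":
        return mode  # explicit modes are honoured as given

    try:
        from gevent import monkey  # optional dependency

        if monkey.is_module_patched("socket"):
            return "gevent"
    except ImportError:
        logger.debug("gevent not installed; skipping gevent parallel mode")

    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # On a single CPU, threads mostly contend for the GIL, so run serially.
    return "serial" if cpus <= 1 else "threadpool"


if __name__ == "__main__":
    print(resolve_parallel_mode("auto", cpu_count=1))
    print(resolve_parallel_mode("auto", cpu_count=4))
```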
…AWS cascade tests

Add speculative_writes=False baseline to TestAWSCascadeSpeculativeComparison
for side-by-side comparison with speculative path. Parametrize
test_cascade_concurrent_throughput_aws by parallel_mode (serial/gevent/threadpool)
to measure how internal cascade parallelism interacts with external thread
contention. Enable gevent in CI benchmark-aws job (--extra stress + GEVENT=1).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Rename `load benchmark` → `load run` and add `load calibrate` for
binary-search calibration of optimal per-worker user count using
Little's Law efficiency metric (baseline_p50 / observed_p50).

- Extract `_invoke_lambda_headless()` and `_get_lambda_client_and_config()` helpers
- Add `_calibrate_lambda()` with binary search and `_display_calibration_results()`
- Auto-extend baseline duration when requests < 100, extrapolating from
  observed rate (e.g. 28 reqs in 15s → retry at 60s) instead of doubling
- Cap baseline retry at Lambda timeout - 30s headroom
- Report request count per step in calibration output
- Update locustfile docstrings with calibration results:
  MaxRpsUser: 3 users/worker optimal (p50=7ms, 371 RPS, 90% efficiency)
  SimpleUser: 40+ users/worker (no GIL pressure due to wait time)
- Fix OOM in cascade locustfiles: use shared `self.client._limiter`
  instead of creating throwaway SyncRateLimiter per user

Co-Authored-By: Claude Opus 4.6 <[email protected]>
sodre and others added 9 commits February 10, 2026 10:32
Replace plain midpoint bisection with linear interpolation of
efficiency values. The midpoint is weighted toward where the
threshold is likely to fall, converging faster when efficiency
drops steeply (e.g. MaxRpsCascadeUser: 100% at 1 user, 8% at 40).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
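A rough sketch of the interpolated bisection described in the commit above: rather than probing the midpoint, the next probe is placed where a straight line between the two known efficiency values would cross the threshold. The function names and the `measure` callback are assumptions for illustration; the real `_calibrate_lambda()` invokes Lambda workers and is not reproduced here.

```python
from typing import Callable


def next_probe(lo: int, eff_lo: float, hi: int, eff_hi: float, threshold: float) -> int:
    """Pick the next user count strictly between lo (passing) and hi (failing)."""
    if hi - lo <= 1:
        raise ValueError("no integer left between lo and hi")
    span = eff_lo - eff_hi
    # Fall back to the plain midpoint if the efficiencies give no usable slope.
    frac = 0.5 if span <= 0 else (eff_lo - threshold) / span
    probe = round(lo + frac * (hi - lo))
    return max(lo + 1, min(hi - 1, probe))


def calibrate(measure: Callable[[int], float], max_users: int, threshold: float = 0.90) -> int:
    """Return the largest user count whose efficiency stays >= threshold."""
    lo, eff_lo = 1, 1.0  # one user is the baseline, i.e. 100% by definition
    hi, eff_hi = max_users, measure(max_users)
    if eff_hi >= threshold:
        return max_users
    while hi - lo > 1:
        mid = next_probe(lo, eff_lo, hi, eff_hi, threshold)
        eff = measure(mid)
        if eff >= threshold:
            lo, eff_lo = mid, eff
        else:
            hi, eff_hi = mid, eff
    return lo


if __name__ == "__main__":
    # 100% at 1 user and 8% at 40 users: the first probe lands near the low end.
    print(next_probe(1, 1.0, 40, 0.08, 0.90))
```

For the steep drop quoted in the commit message, the first probe lands at 5 users instead of the midpoint of 20, which is why fewer calibration steps are needed when efficiency collapses early.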
…rallel_mode

Add logger.debug() message when gevent ImportError is caught during
auto mode resolution, replacing the bare pass that was flagged in
PR review. The debug log survives AST round-tripping (comments don't).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Use baseline duration (possibly auto-extended) for all search steps,
  ensuring fair p50 comparison across rows
- Replace Reqs column with Reqs/s in calibration table (RPS already
  shown, raw request counts were misleading across different durations)
- Add --extra stress to CI unit test job so questionary is available

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Now that all steps use the same duration, raw request counts are
comparable across rows. Show both columns for completeness.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Use RPS column header instead of Reqs/s in calibration table
- Change --baseline-duration default from 15s to 60s for stable
  percentiles without needing auto-retry

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ean up deps

- Add p95 and p99 latency columns to calibration output (step logs, table, summary)
- Fix weighted bisection searching wrong direction (toward high instead of low)
- Move questionary from [stress] extra to core dependencies
- Remove unused asyncio-gevent from [stress] extra
- Add docker to [stress] extra
- Add types-gevent to dev dependencies and pre-commit mypy env
- Add questionary to pre-commit mypy env
- Remove gevent/questionary from mypy ignore_missing_imports overrides
- Revert --extra stress workaround in CI (questionary now core dep)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The gevent tests need gevent which moved from implicit availability
to the [bench] extra (renamed from [stress]).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add questionary to required deps in CLAUDE.md
- Add [local] and [bench] extras to CLAUDE.md
- Add types-gevent to [dev] description
- Fix --baseline-duration default from 15 to 60 in load-testing.md

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sodre sodre marked this pull request as ready for review February 10, 2026 17:19
claude bot commented Feb 10, 2026

Code Review

I found 4 issues during review:

1. 🐛 CloudFormation deployment will fail with empty RoleNameFormat

File: src/zae_limiter/load/cli.py line 391

The load CLI uses:

role_name_format = outputs.get("RoleNameFormat", "{}")

When the main limiter stack is deployed without --role-name-format, the RoleNameFormat output is present with value '' (empty string). The fallback "{}" only triggers when the key is absent, not when it's an empty string.

This empty string is passed to the load stack's CloudFormation template, which uses:

Prefix: !Select [0, !Split ['{}', !Ref RoleNameFormat]]
Suffix: !Select [1, !Split ['{}', !Ref RoleNameFormat]]

!Split ['{}', ''] produces [''] (single element), and !Select [1, ['']] fails with index out of bounds.

Fix: Change line 391 to:

role_name_format = outputs.get("RoleNameFormat") or "{}"

This treats both absent keys and empty strings as falsy values, using "{}" as the fallback.

See:

permission_boundary = outputs.get("PermissionBoundaryArn", "")
role_name_format = outputs.get("RoleNameFormat", "{}")
if permission_boundary:
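The difference between the two fallbacks can be reproduced in isolation. In this sketch, Python's `str.split` stands in for CloudFormation's `!Split`, an analogy that matches the behaviour described in this review, and the helper names are hypothetical.

```python
def resolve_role_name_format(outputs: dict) -> str:
    """Treat both a missing key and an empty string as 'use the default'."""
    return outputs.get("RoleNameFormat") or "{}"


def split_role_name_format(role_name_format: str) -> tuple:
    """Mimic the template's Select[0]/Select[1] over Split('{}', value)."""
    parts = role_name_format.split("{}")
    return parts[0], parts[1]  # IndexError if '{}' is absent, like !Select [1, ...]


if __name__ == "__main__":
    outputs = {"RoleNameFormat": ""}  # stack deployed without --role-name-format

    # Original code: the default only applies when the key is missing.
    print(repr(outputs.get("RoleNameFormat", "{}")))

    # Proposed fix: an empty string falls back to '{}' as well.
    fixed = resolve_role_name_format(outputs)
    print(repr(fixed), split_role_name_format(fixed))
```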


2. 📝 CLAUDE.md Project Structure missing load/ directory

File: CLAUDE.md lines 310-350

The Project Structure section lists other directories like infra/ and visualization/, but the new load/ directory is not documented.

Fix: Add to the Project Structure section:

├── locust.py          # Locust integration (RateLimiterUser, task helpers)
├── load/              # Distributed load testing infrastructure
│   ├── __init__.py
│   ├── cli.py        # Load testing CLI commands (deploy, connect, run, calibrate)
│   ├── builder.py    # Docker image builder for Fargate
│   ├── lambda_builder.py # Lambda worker package builder
│   ├── orchestrator.py   # Worker orchestration and auto-scaling
│   ├── cfn_template.yaml # CloudFormation template for load stack
│   └── lambda/
│       ├── __init__.py
│       └── worker.py # Lambda worker entry point

See:

zae-limiter/CLAUDE.md

Lines 308 to 350 in 42438bd

## Project Structure
```
src/zae_limiter/
├── __init__.py # Public API exports
├── models.py # Limit, Entity, LimitStatus, BucketState, StackOptions, AuditEvent, AuditAction, UsageSnapshot, UsageSummary, LimiterInfo, BackendCapabilities, Status, LimitName, ResourceCapacity, EntityCapacity
├── exceptions.py # RateLimitExceeded, RateLimiterUnavailable, StackCreationError, VersionError, ValidationError, EntityNotFoundError, InfrastructureNotFoundError
├── naming.py # Resource name validation (ZAEL- prefix retained for legacy discovery)
├── bucket.py # Token bucket math (integer arithmetic)
├── schema.py # DynamoDB key builders
├── repository_protocol.py # RepositoryProtocol for backend abstraction
├── repository.py # DynamoDB operations
├── lease.py # Lease context manager
├── limiter.py # RateLimiter (async)
├── config_cache.py # Client-side config caching with TTL (CacheStats)
├── sync_repository_protocol.py # Generated: SyncRepositoryProtocol
├── sync_repository.py # Generated: SyncRepository
├── sync_limiter.py # Generated: SyncRateLimiter
├── sync_lease.py # Generated: SyncLease
├── sync_config_cache.py # Generated: SyncConfigCache
├── cli.py # CLI commands (deploy, delete, status, list, cfn-template, lambda-export, version, upgrade, check, audit, usage, entity, resource, system, local)
├── version.py # Version tracking and compatibility
├── migrations/ # Schema migration framework
│ └── __init__.py # Migration registry and runner
├── visualization/ # Usage snapshot formatting and display
│ ├── __init__.py # UsageFormatter enum, format_usage_snapshots()
│ ├── factory.py # Formatter factory
│ ├── formatters.py # PlotFormatter (ASCII charts)
│ └── table.py # TableFormatter for tabular output
└── infra/
├── stack_manager.py # CloudFormation stack operations
├── sync_stack_manager.py # Generated: SyncStackManager
├── discovery.py # Multi-stack discovery and listing
├── sync_discovery.py # Generated: SyncInfrastructureDiscovery
├── lambda_builder.py # Lambda deployment package builder
└── cfn_template.yaml # CloudFormation template
src/zae_limiter_aggregator/ # Lambda aggregator (top-level package)
├── __init__.py # Re-exports handler, processor types (ProcessResult, ConsumptionDelta, BucketRefillState, LimitRefillInfo, ParsedBucketRecord, ParsedBucketLimit)
├── handler.py # Lambda entry point (returns refills_written count)
├── processor.py # Stream processing: usage snapshots + bucket refill (Issue #317)
└── archiver.py # S3 audit archival (gzip JSONL)
```


3. 📝 CLAUDE.md CLI commands list incomplete

File: CLAUDE.md line 328

The CLI commands list doesn't include the new load command group.

Fix: Update line 328 to include load:

├── cli.py             # CLI commands (deploy, delete, status, list, cfn-template, lambda-export, version, upgrade, check, audit, usage, entity, resource, system, local, load)

See:

├── cli.py # CLI commands (deploy, delete, status, list, cfn-template, lambda-export, version, upgrade, check, audit, usage, entity, resource, system, local)


4. 📝 CLAUDE.md Project scopes list missing "load"

File: CLAUDE.md line 13

The project scopes list doesn't include the new load scope, which is comparable to infra, aggregator, and local.

Fix: Update line 13 to include load:

**Project scopes:** `limiter`, `bucket`, `cli`, `infra`, `ci`, `aggregator`, `models`, `schema`, `repository`, `lease`, `exceptions`, `cache`, `test`, `benchmark`, `local`, `load`.

Also update .claude/rules/release-planning.md to add:

| `load` | `area/load` | Load testing infrastructure |

See:

**Project scopes:** `limiter`, `bucket`, `cli`, `infra`, `ci`, `aggregator`, `models`, `schema`, `repository`, `lease`, `exceptions`, `cache`, `test`, `benchmark`, `local`. See `release-planning.md` for area labels.

sodre commented Feb 11, 2026

Superseded by the split PRs: #350 (343c), #355 (343a), #356 (343b), #359 (#346), #361 (#347). All content landed on main.

@sodre sodre closed this Feb 11, 2026
@sodre sodre modified the milestones: v1.0.0, v0.9.0 Feb 11, 2026