Add webhook action example #1

Draft
memgrafter wants to merge 11 commits into main from claude/agent-webhook-example-rCv4Q


@memgrafter
Owner

Adds a comprehensive example demonstrating webhook integration via hook actions in flatmachines.

Features:

  • Custom webhook hook action for sentiment analysis
  • HTTP request handling with httpx
  • Retry logic for failed webhook calls
  • Mock mode for testing without external dependencies
  • AI agent integration to process webhook results
  • Graceful error handling

The example shows how to:

  1. Trigger webhook calls from specific states via hook actions
  2. Update context with webhook responses
  3. Handle failures and implement retry patterns
  4. Integrate webhook results with AI agents
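
A minimal sketch of steps 1 through 3, not the code in `hooks.py`. The function name `call_webhook_with_retry`, the `send` callable, and the context keys `webhook_result`, `webhook_error`, and `webhook_attempts` are assumptions for illustration. The example itself uses httpx; here the HTTP call is replaced by an injected callable (with a mock default) so the sketch runs without network access.

```python
import time
from typing import Any, Callable, Dict

Context = Dict[str, Any]


def mock_send(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an HTTP POST; returns a canned sentiment result."""
    return {"sentiment": "positive", "score": 0.9, "echo": payload["text"]}


def call_webhook_with_retry(
    context: Context,
    send: Callable[[Dict[str, Any]], Dict[str, Any]] = mock_send,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Context:
    """Call the webhook, retrying on failure, and record the outcome in context."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            context["webhook_result"] = send({"text": context["text"]})
            context["webhook_error"] = None
            context["webhook_attempts"] = attempt
            return context
        except Exception as exc:  # a real hook would catch narrower errors
            last_error = str(exc)
            time.sleep(delay_seconds)
    # All attempts failed: record the error instead of raising,
    # so the state machine can route to an error state.
    context["webhook_result"] = None
    context["webhook_error"] = last_error
    context["webhook_attempts"] = max_attempts
    return context
```

Recording the failure in context rather than raising is one way to let a state transition, not the hook, decide what happens next.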

Files added:

  • config/machine.yml: State machine with webhook action states
  • config/analyzer.yml: AI agent for interpreting sentiment results
  • src/webhook_action/hooks.py: WebhookActionHooks implementation
  • src/webhook_action/main.py: Demo entry point with CLI
  • README.md: Comprehensive documentation with examples
  • run.sh: UV venv-based runner script
  • pyproject.toml: Project dependencies

memgrafter and others added 11 commits January 6, 2026 09:13

Adds a comprehensive example demonstrating long-running job handling
with checkpoint-safe polling loops and pluggable backends.

Key architectural pattern:
- Hook actions are FAST (< 30s): submit_job, poll_once
- Polling loop is in STATE MACHINE with checkpoints between iterations
- Survives restarts by checkpointing after each poll
- Avoids duplicate job submissions on restart
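
The pattern above can be sketched in plain Python, independent of flatmachines. This is an illustration under stated assumptions, not the machine in `config/machine.yml`: the JSON checkpoint file, the `run` driver, and the context layout are hypothetical, and the loop omits any delay between polls. `submit_job` and `poll_once` are passed in as callables standing for the fast hook actions.

```python
import json
import os
from typing import Any, Callable, Dict

Context = Dict[str, Any]


def save_checkpoint(path: str, context: Context) -> None:
    """Write the context atomically so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(context, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Context:
    """Load a saved context, or start fresh if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"job_id": None, "status": "new", "poll_count": 0}


def run(
    path: str,
    submit_job: Callable[[], str],
    poll_once: Callable[[str], str],
    max_polls: int = 10,
) -> Context:
    context = load_checkpoint(path)
    # Submit only when no job_id was checkpointed: a restart reuses the old job.
    if context["job_id"] is None:
        context["job_id"] = submit_job()
        context["status"] = "pending"
        save_checkpoint(path, context)
    # Each iteration is one fast poll followed by a checkpoint.
    while context["status"] == "pending" and context["poll_count"] < max_polls:
        context["status"] = poll_once(context["job_id"])
        context["poll_count"] += 1
        save_checkpoint(path, context)
    return context
```

Calling `run` a second time with the same checkpoint path continues polling the stored `job_id` instead of submitting a new job.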

Features:
- Pluggable backend interface (CallbackBackend)
- Three backend implementations:
  * MockBackend: Simulates long jobs in-memory
  * PollingBackend: HTTP polling for real APIs
  * WebhookServerBackend: Embedded aiohttp server for callbacks
- State-based polling loop (not blocking hook action)
- Checkpoint after each poll iteration
- Clean separation: quick actions vs long loops
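
As a rough sketch of what a pluggable backend could look like: the class names `CallbackBackend` and `MockBackend` come from this commit, but the method names `submit` and `poll`, their signatures, and the returned dictionary shape are assumptions and may differ from `backends/base.py`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class CallbackBackend(ABC):
    """Interface sketch; method names are assumed, not taken from base.py."""

    @abstractmethod
    async def submit(self, payload: Dict[str, Any]) -> str:
        """Start a job and return its id."""

    @abstractmethod
    async def poll(self, job_id: str) -> Dict[str, Any]:
        """Check a job once and return its current status."""


class MockBackend(CallbackBackend):
    """Simulates a long job in memory: completes after N polls."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self.polls_until_done = polls_until_done
        self._poll_counts: Dict[str, int] = {}

    async def submit(self, payload: Dict[str, Any]) -> str:
        job_id = f"mock-{len(self._poll_counts) + 1}"
        self._poll_counts[job_id] = 0
        return job_id

    async def poll(self, job_id: str) -> Dict[str, Any]:
        self._poll_counts[job_id] += 1
        done = self._poll_counts[job_id] >= self.polls_until_done
        return {"status": "completed" if done else "running"}


async def demo() -> Dict[str, Any]:
    backend: CallbackBackend = MockBackend(polls_until_done=2)
    job_id = await backend.submit({"input": "example"})
    await backend.poll(job_id)          # first poll: still running
    return await backend.poll(job_id)   # second poll: completed
```

Running `asyncio.run(demo())` exercises the mock without any network access; an HTTP polling or webhook-server backend would implement the same two methods.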

Files added:
- backends/base.py: CallbackBackend interface
- backends/mock.py: Mock backend for testing
- backends/polling.py: HTTP polling backend
- backends/webhook.py: Embedded webhook server backend
- config/machine.yml: State machine with polling loop
- config/processor.yml: Result analysis agent
- hooks.py: LongRunningJobHooks with async backend calls
- main.py: CLI with backend selection
- README.md: Comprehensive documentation with patterns
- run.sh: UV venv-based runner

Demonstrates:
1. State loops vs hook actions (when to use each)
2. Checkpoint-safe long-running operations
3. Pluggable backend architecture
4. Async hook actions (on_action can be async)
5. Restart resilience via checkpointing

Adds error classification and retry logic so that long-running polling treats
transient and permanent failures differently.

Key improvements:

1. Error Counter System (all checkpointed):
   - poll_count: Total attempts (success + failures)
   - consecutive_error_count: Resets on success
   - total_error_count: Cumulative failures

2. Error Classification:
   - 4xx Client Errors: Likely permanent, fail fast after 2 attempts
   - 5xx Server Errors: Likely transient, retry with timeout counter
   - Network/Timeout: Transient, retry with timeout counter

3. State Machine Enhancements:
   - poll_success state: Resets error counters on success
   - handle_poll_error state: Routes based on error type
   - polling_error_limit: Exit after max consecutive errors
   - polling_client_error: Exit on permanent errors
   - poll_count incremented in all cases

4. Hook Error Classification:
   - _classify_error() method analyzes httpx exceptions
   - Sets poll_error_type in context (client_error/server_error/transient)
   - Sets last_poll_error with details
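
The counter rules in items 1 through 3 can be sketched as a single update function. This is an illustration, not the transitions in `machine.yml`: `record_poll`, the `"retry"` return value, and both threshold parameters are hypothetical, and "fail fast after 2 attempts" is read here as two consecutive client errors.

```python
from typing import Any, Dict, Optional

Context = Dict[str, Any]


def record_poll(
    context: Context,
    error_type: Optional[str],
    max_consecutive_errors: int = 5,  # assumed limit
    max_client_errors: int = 2,       # assumed reading of "2 attempts"
) -> str:
    """Update the three counters and return the name of the next state."""
    context["poll_count"] += 1  # incremented in all cases
    if error_type is None:
        context["consecutive_error_count"] = 0  # reset on success
        return "poll_success"

    context["consecutive_error_count"] += 1
    context["total_error_count"] += 1  # never reset
    context["poll_error_type"] = error_type

    if (
        error_type == "client_error"
        and context["consecutive_error_count"] >= max_client_errors
    ):
        return "polling_client_error"
    if context["consecutive_error_count"] >= max_consecutive_errors:
        return "polling_error_limit"
    return "retry"  # hypothetical label for "go back and poll again"
```

Because all three counters live in the context dictionary, they are saved with every checkpoint and survive a restart.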

Benefits:
- 5xx errors don't kill long-running jobs (transient handling)
- 4xx errors fail fast (don't waste retries on config errors)
- Poll timeout still enforced via poll_count
- All counters checkpoint for restart resilience

Documentation:
- Comprehensive error handling section in README
- Error classification strategies
- Counter scenarios with examples
- Terminal state descriptions
- Configuration options

Splits 4xx errors into three distinct categories with different handling:

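1. **Rate Limit (429, 408)** - Automatic retry with exponential backoff
   - 429 Too Many Requests: Rate limit hit
   - 408 Request Timeout: Server timed out waiting for the request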
   - Backoff: 5s → 10s → 20s → ... → 300s (capped)
   - Resets on success
   - Handles API rate limits gracefully

2. **Auth Required (401, 402, 403)** - Checkpoint and pause for human intervention
   - 401 Unauthorized: Invalid credentials
   - 402 Payment Required: Need to pay/upgrade
   - 403 Forbidden: Insufficient permissions
   - **Key pattern**: Exit to terminal state, checkpoint saved
   - User fixes issue, resumes from checkpoint → continues same job_id
   - No duplicate job submission!

3. **Permanent (400, 404, 410)** - Fail immediately
   - 400 Bad Request: Malformed request
   - 404 Not Found: Wrong URL/job_id
   - 410 Gone: Resource deleted
   - No retry, fast failure
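
The backoff sequence for the rate-limit category can be reproduced with a doubling rule and a cap. A minimal sketch, assuming the delay simply doubles each time; the function names are hypothetical, while the 5 s starting value and 300 s cap come from the description above.

```python
from typing import List


def next_backoff(current_seconds: float, max_backoff_seconds: float = 300.0) -> float:
    """Double the delay, never exceeding the cap."""
    return min(current_seconds * 2, max_backoff_seconds)


def backoff_schedule(
    initial_seconds: float = 5.0,
    max_backoff_seconds: float = 300.0,
    steps: int = 8,
) -> List[float]:
    """Return the first `steps` delays of the schedule."""
    delays = []
    delay = initial_seconds
    for _ in range(steps):
        delays.append(delay)
        delay = next_backoff(delay, max_backoff_seconds)
    return delays


print(backoff_schedule())
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

On a successful poll the delay would be set back to the initial value, matching "Resets on success".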

Changes:

State Machine (machine.yml):
- Add backoff_seconds and max_backoff_seconds to context
- Add backoff_delay state with exponential backoff
- Add auth_intervention_required terminal state (checkpoint resume)
- Add polling_permanent_error terminal state (fast failure)
- Replace generic client_error handling with specific routing

Hooks (hooks.py):
- Refine _classify_error() to return 5 types:
  * rate_limit, auth_required, permanent, server_error, transient
- Each 4xx code classified individually
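
A hedged sketch of the five-way classification, keyed on the HTTP status code alone. The real `_classify_error()` analyzes httpx exceptions; `classify_status` and its signature are hypothetical, `None` stands for a network error or timeout with no response, and the handling of 4xx codes not listed in this commit is an assumption.

```python
from typing import Optional

RATE_LIMIT_CODES = {408, 429}
AUTH_REQUIRED_CODES = {401, 402, 403}
PERMANENT_CODES = {400, 404, 410}


def classify_status(status_code: Optional[int]) -> str:
    """Map a status code (or None for no response) to one of five error types."""
    if status_code is None:
        return "transient"  # network failure or timeout
    if status_code in RATE_LIMIT_CODES:
        return "rate_limit"
    if status_code in AUTH_REQUIRED_CODES:
        return "auth_required"
    if status_code in PERMANENT_CODES:
        return "permanent"
    if 500 <= status_code < 600:
        return "server_error"
    if 400 <= status_code < 500:
        return "permanent"  # assumption: unlisted 4xx codes fail fast
    return "transient"  # assumption: anything else is retried
```

Keeping the code sets as module-level constants makes it easy to move a status code between categories without touching the routing logic.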

Main (main.py):
- Add auth_intervention_required status handler
- Show clear resume instructions with checkpoint dir
- Add polling_permanent_error status handler

README:
- Document 5-way error classification
- Add checkpoint resume pattern examples
- Show 402 Payment Required workflow
- Document backoff behavior
- Add new terminal states

Key Innovation:
Checkpoint resume enables human-in-loop for fixable errors (auth, payment)
without losing job state. User can fix issue offline and resume same job.

memgrafter force-pushed the main branch 2 times, most recently from f5983f3 to d4468db on January 22, 2026 12:18
memgrafter marked this pull request as draft on March 4, 2026 20:19
memgrafter added the invalid label ("This doesn't seem right") on Mar 4, 2026
2 participants