
Pipeline resilience #128

Closed
howethomas wants to merge 12 commits into main from pipeline-resilience

Conversation


@howethomas howethomas commented Mar 2, 2026

Note

Medium Risk
Medium risk: this PR changes the vCon ingest indexing path and adds a new unauthenticated /stats/queue endpoint that exposes Redis list names and depths, which can affect search/monitoring behavior and surface operational data.

Overview
Adds a new transcription pipeline path centered on vfun integration: a wtf_transcribe link that transcribes recording dialogs, supports multiple vfun URLs with health-aware failover, and caches transcription results in Redis; plus a keyword_tagger link that tags vCons based on keyword matches in transcriptions.
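The keyword_tagger idea can be sketched roughly as follows. This is an illustrative sketch only: the field layout loosely follows the vCon JSON shape (`analysis` entries of type `transcript`), and the function and parameter names are assumptions, not the link's actual code.

```python
# Hypothetical sketch of keyword tagging: scan transcript text in a vCon's
# analysis entries and attach any tag whose keywords appear in them.
def keyword_tag(vcon_dict: dict, keyword_map: dict) -> list:
    """Return tags whose keywords appear in any transcript body."""
    transcripts = [
        a.get("body", "")
        for a in vcon_dict.get("analysis", [])
        if a.get("type") == "transcript"
    ]
    text = " ".join(transcripts).lower()
    tags = [
        tag for tag, words in keyword_map.items()
        if any(w.lower() in text for w in words)
    ]
    vcon_dict.setdefault("tags", []).extend(tags)
    return tags
```

A link like this would typically run after transcription so the transcript bodies are already present on the vCon.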

Improves operational resilience/interop by forcing outbound webhook payloads (links.webhook and new storage.webhook) to emit vcon: "0.3.0", adding a public /stats/queue endpoint for Redis queue depth monitoring, and optimizing ingest-time indexing by indexing parties directly (index_vcon_parties) to avoid redundant Redis reads.

Adds optional SigNoz/OpenTelemetry docker-compose stack and configs, updates the Dockerfile to use HTTPS apt sources, and introduces NAS/vfun performance testing docs and helper scripts for stress testing and auto-restart runs.

Written by Cursor Bugbot for commit 631ca58.

howethomas and others added 12 commits January 27, 2026 22:24
Updates webhook link to set vcon version to 0.3.0 for
compatibility with vcon-mcp REST API.
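The version patch this commit describes can be sketched as a small pure function. The helper name and the deep-copy approach are assumptions; the actual link code in the repo may patch in place or differ in detail.

```python
# Hypothetical sketch of the outbound version patch: copy the vCon dict and
# force its version string to "0.3.0" before POSTing, so the payload is
# accepted by the vcon-mcp REST API.
import copy

def patch_vcon_version(vcon_dict: dict) -> dict:
    """Return a copy of the vCon dict with its version forced to 0.3.0."""
    patched = copy.deepcopy(vcon_dict)
    patched["vcon"] = "0.3.0"
    return patched
```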

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Configure apt to use HTTPS sources for environments
where HTTP port 80 is blocked.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Includes docker-compose and config files for SigNoz
observability stack with OpenTelemetry collector.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Public endpoint (no auth) that returns the depth of any Redis list,
used by the audio adapter for backpressure control.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The post_vcon and external_ingress_vcon paths called index_vcon() which
re-read the vCon from Redis (JSON.GET) and duplicated the sorted set add
(ZADD) that was already done by the caller. This added 2 unnecessary
Redis round-trips per ingest.

Extract index_vcon_parties() that takes the vCon dict directly, and use
it in both POST paths. The original index_vcon() is preserved for the
bulk re-indexing endpoint. Reduces ingest from 11 to 9 Redis ops per
vCon; measured a 4.9x improvement in adapter posting throughput.
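The shape of index_vcon_parties() described above can be sketched like this. The key pattern, score field, and party fields are assumptions for illustration; the point is that the function takes the vCon dict already in hand instead of re-reading it from Redis.

```python
# Hedged sketch of index_vcon_parties(): index party identifiers straight
# from the dict the caller already holds, avoiding the extra JSON.GET and
# the duplicate ZADD that the original index_vcon() performed.
def index_vcon_parties(redis_client, vcon_dict: dict) -> int:
    """Add each party's tel/mailto to per-party sorted sets; return op count."""
    uuid = vcon_dict["uuid"]
    created = float(vcon_dict.get("created_at_ts", 0))
    ops = 0
    for party in vcon_dict.get("parties", []):
        for field in ("tel", "mailto"):
            value = party.get(field)
            if value:
                redis_client.zadd(f"parties:{field}:{value}", {uuid: created})
                ops += 1
    return ops
```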

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The supabase_webhook was running as a sequential chain link, blocking
each worker for ~560ms per vCon. By moving it to a storage slot, the
webhook now executes post-chain in parallel via ThreadPoolExecutor,
reducing per-vCon P50 latency from 617ms to 123ms (5x improvement).

New module server/storage/webhook/ wraps the existing HTTP POST logic
with the storage save() interface.
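The latency win comes from the dispatch model, which can be sketched as below. The function and variable names are illustrative, not the server's actual storage-runner code; the idea is that storage save() calls run in a ThreadPoolExecutor after the chain finishes, so a slow HTTP POST no longer blocks a chain worker.

```python
# Sketch of parallel storage dispatch: submit every configured storage
# save() to a shared thread pool and wait for all of them, so total time
# is roughly the slowest save rather than the sum of all saves.
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

def run_storages(vcon_uuid: str, storages: list) -> list:
    """Dispatch every storage save() in parallel; return their results."""
    futures = [_executor.submit(save, vcon_uuid) for save in storages]
    return [f.result() for f in futures]
```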

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The wtf_transcribe link had no retry logic: a single vfun failure
silently dropped the transcription. This adds:

- _VfunHealthTracker: thread-safe singleton tracking instance health
  across all workers, with 30-second self-healing recovery window
- get_vfun_urls(): returns URLs in priority order (healthy shuffled,
  then recovering oldest-first, then down instances)
- Fallback loop: tries all configured vfun instances before giving up
- Redis transcription cache: skips vfun calls for previously transcribed
  audio files (7-day TTL)

On failure, instances are marked DOWN and bypassed until the recovery
window expires, then automatically retried and restored on success.
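The health-aware ordering above can be condensed into a sketch like this. The real _VfunHealthTracker is described as a thread-safe singleton; locking, the cache, and the exact state machine are omitted here, and only the 30-second window and the healthy/recovering/down ordering from the commit message are kept. All names are illustrative.

```python
# Condensed sketch of health-aware vfun failover: healthy URLs shuffled
# first, recovering instances oldest-first, freshly-down instances last
# (the fallback loop still tries everything before giving up).
import random
import time

RECOVERY_WINDOW = 30.0  # seconds an instance stays DOWN before retry

class VfunHealthTracker:
    def __init__(self, urls):
        self.urls = list(urls)
        self.down_since = {}  # url -> timestamp it was marked DOWN

    def mark_down(self, url):
        self.down_since[url] = time.monotonic()

    def mark_up(self, url):
        self.down_since.pop(url, None)

    def get_vfun_urls(self):
        now = time.monotonic()
        healthy = [u for u in self.urls if u not in self.down_since]
        recovering = sorted(
            (u for u, t in self.down_since.items()
             if now - t >= RECOVERY_WINDOW),
            key=lambda u: self.down_since[u])  # oldest failure first
        down = [u for u, t in self.down_since.items()
                if now - t < RECOVERY_WINDOW]
        random.shuffle(healthy)  # spread load across healthy instances
        return healthy + recovering + down
```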

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- SigNoz OTEL collector config and docker-compose integration
- Performance testing and vfun crash/stress test reports
- Utility scripts for NAS pipeline operations and debugging

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


- DASHBOARDS_PATH=/root/config/dashboards
- STORAGE=clickhouse
- GODEBUG=netdns=go
- TELEMETRY_ENABLED=true


TELEMETRY_ENABLED=true contradicts README fix for SigNoz crash

Medium Severity

TELEMETRY_ENABLED is set to true in the compose file, but the README explicitly documents that this causes the SigNoz query service to panic (nil pointer crash) because the telemetry cron checks TTL for signoz_logs.logs, a table that doesn't exist after schema migration. The README even states the fix (TELEMETRY_ENABLED=false) is "already set in this repo," but it isn't.


    return JSONResponse(content={"list_name": list_name, "depth": depth})
except Exception as e:
    logger.error(f"Error getting queue depth for '{list_name}': {str(e)}")
    raise HTTPException(status_code=500, detail="Failed to get queue depth")


Unauthenticated endpoint allows arbitrary Redis key probing

Medium Severity

The /stats/queue endpoint is public (no auth) and accepts any user-supplied list_name directly passed to redis_async.llen(). While the intention is monitoring known queue names, nothing restricts the parameter to valid queue names. An unauthenticated caller can probe any Redis key to learn whether it exists and its list length, leaking internal infrastructure details.
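One possible mitigation, sketched here as a standalone validator: check the requested name against a fixed allowlist of known queue names before calling LLEN, so arbitrary Redis keys cannot be probed. The allowlist contents and function name are placeholders, not the project's actual queue names.

```python
# Hypothetical allowlist guard for the /stats/queue endpoint: reject any
# list_name that is not a known queue before it reaches redis.llen().
KNOWN_QUEUES = {"ingress_vcons", "transcribe_queue"}  # placeholder names

def validated_queue_name(list_name: str) -> str:
    """Return list_name if it is a known queue, else raise ValueError."""
    if list_name not in KNOWN_QUEUES:
        raise ValueError(f"unknown queue: {list_name!r}")
    return list_name
```

The endpoint handler would call this first and map the ValueError to a 400/404 response instead of passing the raw parameter to Redis.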


--server 1 \
--workers "$WORKERS" \
--limit "$BATCH_SIZE" \
--store-vcons 2>&1)


Restart script reprocesses same files without offset tracking

Medium Severity

Each batch iteration calls the pipeline with the same --date, --server, and --limit but no offset or checkpoint. Since find_audio_files always returns the first N files, every batch reprocesses the same files. With --store-vcons, this creates duplicate vCons (each gets a new uuid.uuid4()). The loop runs until the safety limit of 100 batches instead of progressing through all files.
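The missing checkpointing could look roughly like this: persist how many files have already been processed and skip that many on the next batch. The checkpoint filename and helper names are assumptions; the real fix would thread the offset through find_audio_files (or pass an --offset flag to the pipeline).

```python
# Sketch of batch checkpointing so each restart resumes where the previous
# batch stopped instead of reprocessing the first N files every time.
import json
import os

CHECKPOINT = "pipeline_offset.json"  # hypothetical checkpoint file

def load_offset() -> int:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["offset"]
    return 0

def next_batch(all_files: list, batch_size: int) -> list:
    """Return the next unprocessed slice and advance the checkpoint."""
    offset = load_offset()
    batch = all_files[offset:offset + batch_size]
    with open(CHECKPOINT, "w") as f:
        json.dump({"offset": offset + len(batch)}, f)
    return batch
```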


resp = requests.post(url, json=json_dict, headers=headers)
logger.info(
    f"webhook storage response for {vcon_uuid}: {resp.status_code} {resp.text}"
)


Storage webhook duplicates links webhook implementation

Low Severity

The save function in server/storage/webhook/__init__.py is nearly identical to the run function in server/links/webhook/__init__.py. Both fetch the vCon from Redis, convert to dict, patch the version from 0.0.1 to 0.3.0, and POST to each configured webhook URL. This duplicated logic means a bug fix in one location could easily be missed in the other.
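One way to deduplicate, sketched: hoist the shared patch-and-POST loop into a helper that both the link and the storage module import. The function name and signature here are hypothetical.

```python
# Hypothetical shared helper: patch the vCon version once, then deliver
# the same payload to every configured webhook URL via a caller-supplied
# post function, so link and storage code share one implementation.
def deliver_to_webhooks(post_fn, webhook_urls, vcon_dict):
    """POST the version-patched vCon to each URL; return per-URL results."""
    payload = dict(vcon_dict, vcon="0.3.0")  # patch 0.0.1 -> 0.3.0
    return [post_fn(url, payload) for url in webhook_urls]
```

links.webhook's run() and storage.webhook's save() would then both reduce to fetching the vCon and calling this helper, so a fix lands in one place.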


Contributor

The details here are very specific to one environment; we should remove them from the open source repo.



pavanputhra commented Mar 4, 2026

We don't need this PR. It's a client-specific implementation.

@pavanputhra pavanputhra closed this Mar 4, 2026