Optimize ingest indexing by howethomas · Pull Request #129 · vcon-dev/vcon-server

howethomas · 2026-03-02T20:57:50Z

Note

Medium Risk
Moderate risk: adds a new unauthenticated monitoring endpoint and changes ingest-time indexing behavior; also introduces new optional link modules and a sizeable observability docker-compose stack that could affect local deployments if enabled.

Overview
Adds an optional SigNoz observability stack via docker-compose.signoz.yml plus ClickHouse/collector configs and docs under signoz/.

Updates the server to expose a new public GET /stats/queue endpoint for Redis list depth, and optimizes ingest-time indexing by indexing parties directly (index_vcon_parties) rather than re-reading the vCon from Redis.

Introduces two new link modules: links/wtf_transcribe (sends audio dialogs to an external vfun service and stores results as wtf_transcription analysis) and links/keyword_tagger (tags vCons based on keyword matches in transcription/WTF analysis). The webhook link now normalizes the vcon version to 0.3.0 for downstream compatibility.

Also tweaks the Docker image build to force Debian APT sources to HTTPS.

Written by Cursor Bugbot for commit bd0715f.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Updates webhook link to set vcon version to 0.3.0 for compatibility with vcon-mcp REST API. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Configure apt to use HTTPS sources for environments where HTTP port 80 is blocked. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Includes docker-compose and config files for SigNoz observability stack with OpenTelemetry collector. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Public endpoint (no auth) that returns the depth of any Redis list, used by the audio adapter for backpressure control. Co-Authored-By: Claude Opus 4.6 <[email protected]>

The post_vcon and external_ingress_vcon paths called index_vcon() which re-read the vCon from Redis (JSON.GET) and duplicated the sorted set add (ZADD) that was already done by the caller. This added 2 unnecessary Redis round-trips per ingest. Extract index_vcon_parties() that takes the vCon dict directly, and use it in both POST paths. The original index_vcon() is preserved for the bulk re-indexing endpoint. Reduces ingest from 11 to 9 Redis ops per vCon, measured 4.9x improvement in adapter posting throughput. Co-Authored-By: Claude Opus 4.6 <[email protected]>
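As a rough illustration of the change described in this commit message, the sketch below counts Redis commands on both paths. It is not the code from this PR: `CountingRedis`, the key layout (`vcon:<uuid>`, `vcons`, `<field>:<value>`), and the party fields indexed are all assumptions made for the example. Only the function names `index_vcon` and `index_vcon_parties` and the JSON.GET/ZADD round-trips come from the message above.

```python
class CountingRedis:
    """Stand-in for a Redis client that records each command issued."""

    def __init__(self, store):
        self.store = store  # key -> vCon dict, standing in for JSON documents
        self.ops = []       # names of the commands issued, in order

    def json_get(self, key):
        self.ops.append("JSON.GET")
        return self.store[key]

    def zadd(self, key, member, score):
        self.ops.append("ZADD")

    def sadd(self, key, member):
        self.ops.append("SADD")


def index_vcon_parties(redis, vcon_uuid, parties):
    """Index parties from a dict the caller already holds (no re-read, no ZADD)."""
    for party in parties:
        for field in ("tel", "mailto", "name"):  # hypothetical indexed fields
            if party.get(field):
                # Hypothetical key layout: one set of vCon UUIDs per party value.
                redis.sadd(f"{field}:{party[field]}", vcon_uuid)


def index_vcon(redis, vcon_uuid):
    """Bulk re-index path: re-reads the vCon and re-adds it to the sorted set."""
    vcon = redis.json_get(f"vcon:{vcon_uuid}")
    redis.zadd("vcons", vcon_uuid, vcon["created_at"])
    index_vcon_parties(redis, vcon_uuid, vcon["parties"])
```

Under these assumptions, the POST paths, which already hold the vCon dict and have already done the ZADD, can call `index_vcon_parties` directly and skip two commands per ingest.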

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.

cursor · 2026-03-02T21:07:24Z

server/api.py

+        return JSONResponse(content={"list_name": list_name, "depth": depth})
+    except Exception as e:
+        logger.error(f"Error getting queue depth for '{list_name}': {str(e)}")
+        raise HTTPException(status_code=500, detail="Failed to get queue depth")


Unauthenticated endpoint allows arbitrary Redis key querying

Medium Severity

The /stats/queue endpoint is mounted directly on app (no auth, like /health and /version) but accepts an arbitrary list_name parameter that's passed directly to redis_async.llen(). Unlike the health endpoint, this allows unauthenticated callers to probe any Redis key — discovering queue names, measuring queue depths, and distinguishing key types (list keys return a count while non-list keys trigger a WRONGTYPE error, returned as a 500). Consider restricting the list_name to a known allowlist of queue names or placing this endpoint behind api_router authentication.
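One way the allowlist suggestion could look is sketched below. The queue names in `ALLOWED_QUEUES`, the `queue_depth` helper, and the `FakeRedis` stand-in are hypothetical and not taken from the PR. The only call assumed from the real code is an async `llen()` on the Redis client.

```python
import asyncio

# Hypothetical queue names; a real deployment would likely load these from config.
ALLOWED_QUEUES = frozenset({"ingress", "transcribe"})


class FakeRedis:
    """Minimal async stand-in exposing only llen()."""

    def __init__(self, lists):
        self.lists = lists

    async def llen(self, name):
        return len(self.lists.get(name, []))


async def queue_depth(redis, list_name):
    """Return the depth of an allowlisted queue, or None if the name is not allowed."""
    if list_name not in ALLOWED_QUEUES:
        # Rejected before touching Redis, so callers cannot probe arbitrary keys.
        return None
    return await redis.llen(list_name)
```

In the endpoint, a `None` result could be mapped to a 404, so the response for a disallowed name would not depend on whether the key exists or what type it has.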

cursor · 2026-03-02T21:07:24Z

server/links/keyword_tagger/__init__.py

+
+    # Other Content
+    "other": {
+        "profanity": ["fuck", "shit", "damn", "ass"],


Substring matching causes false positives for short keywords

Medium Severity

The find_keywords function uses Python's in operator for substring matching. The profanity keyword "ass" will match extremely common words like "class", "assist", "pass", "mass", "assessment", "classic", etc. This will produce a high volume of false positive profanity tags on virtually any transcription of meaningful length. While some partial keywords (like "enforc") are intentionally designed for substring matching, "ass" is too short and appears in too many benign words.

Additional Locations (1)

server/links/keyword_tagger/__init__.py#L132-L144
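A possible fix is to match on word boundaries while still allowing intentional stems. The sketch below is not the PR's `find_keywords`. The trailing `*` convention for stems such as `enforc*` is an assumption introduced here so that whole-word keywords and prefix keywords can be told apart.

```python
import re


def compile_keyword(keyword):
    """Compile a keyword into a case-insensitive, word-boundary regex.

    A trailing "*" (a convention assumed for this sketch) marks a stem such as
    "enforc*" that should match any word starting with it.
    """
    if keyword.endswith("*"):
        return re.compile(r"\b" + re.escape(keyword[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words or marked stems."""
    return [kw for kw in keywords if compile_keyword(kw).search(text)]
```

With this approach, "ass" no longer matches inside "class" or "assist", while a stem written as "enforc*" still matches "enforcement". In a real link the patterns would probably be compiled once at load time instead of on every call.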

cursor · 2026-03-02T21:07:24Z

signoz/README.md

+### Start with SigNoz
+
+```bash
+cd /home/thomas/bds/vcon-dev/vcon-server


Hardcoded personal filesystem path in documentation

Low Severity

The README contains a hardcoded personal development path /home/thomas/bds/vcon-dev/vcon-server in its usage examples. This is a developer-specific local filesystem path that won't work for other contributors and leaks information about a developer's local environment.

Additional Locations (1)

signoz/README.md#L66-L67

cursor · 2026-03-02T21:07:24Z

server/links/keyword_tagger/__init__.py

+    "categories": None,  # None means all categories
+    "custom_keywords": {},  # Additional tag -> keywords mappings
+    "case_sensitive": False,
+    "min_confidence": 0.0,  # Minimum transcription confidence to process


Declared min_confidence option is never used

Low Severity

The min_confidence option is declared in default_options with a comment suggesting it controls minimum transcription confidence to process, but it's never read or checked anywhere in the run function. Users who configure this value will see no effect—low-confidence transcriptions will still be processed and tagged.
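If the option is kept, a check along these lines could be applied before tagging. The location of the confidence value (`analysis["body"]["confidence"]`) is an assumption, since the analysis schema is not shown in this review, and `meets_min_confidence` is a hypothetical helper name.

```python
def meets_min_confidence(analysis, min_confidence):
    """Return True if the analysis should be processed.

    Assumes the confidence value lives at analysis["body"]["confidence"].
    Analyses without a confidence value are processed, which keeps the
    current behavior for transcriptions that do not report one.
    """
    body = analysis.get("body")
    confidence = body.get("confidence") if isinstance(body, dict) else None
    if confidence is None:
        return True
    return float(confidence) >= min_confidence
```

With the default `min_confidence` of 0.0, every analysis still passes, so wiring this in would not change behavior for existing configurations.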

pavanputhra · 2026-03-04T13:40:00Z

We are going to remove the party-indexing feature in the future. Not needed.

howethomas and others added 8 commits January 27, 2026 22:24

Add keyword_tagger link for automatic tagging

59bff4c

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add wtf_transcribe link for WTF transcription

950bc3c

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Ensure vcon version 0.3.0 for webhook compatibility

7f74f86

Updates webhook link to set vcon version to 0.3.0 for compatibility with vcon-mcp REST API. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Use HTTPS for apt sources in Dockerfile

161b623

Configure apt to use HTTPS sources for environments where HTTP port 80 is blocked. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Add SigNoz observability configuration

74e747b

Includes docker-compose and config files for SigNoz observability stack with OpenTelemetry collector. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Merge all feature branches into onsite-dev

c2bcba9

Add /stats/queue endpoint for Redis queue depth monitoring

cefef64

Public endpoint (no auth) that returns the depth of any Redis list, used by the audio adapter for backpressure control. Co-Authored-By: Claude Opus 4.6 <[email protected]>

cursor bot reviewed Mar 2, 2026

pavanputhra closed this Mar 4, 2026

howethomas wants to merge 8 commits into main from optimize-ingest-indexing
