feat: OpenTelemetry integration for distributed tracing and metrics #217

@pigri

Summary

Add native OpenTelemetry (OTel) support to Synapse for distributed tracing, metrics, and log correlation. While Prometheus metrics are already available, OTel would provide standardized distributed tracing across the proxy layer, giving operators full request lifecycle visibility.

Motivation

  • Operators using modern observability stacks (Jaeger, Tempo, Datadog, etc.) expect OTel-native instrumentation
  • Distributed tracing across Synapse → upstream backends enables end-to-end latency debugging
  • OTel is becoming the vendor-neutral industry standard for instrumentation, replacing vendor-specific integrations
  • Correlating WAF decisions, fingerprint lookups, and upstream routing with trace spans would significantly improve incident investigation

Proposed Features

Tracing

  • Generate trace spans for key operations:
    • Incoming request handling (root span)
    • TLS handshake + JA4+ fingerprint computation
    • WAF rule evaluation (wirefilter expression matching)
    • GeoIP / threat intelligence lookups
    • Rate limiter checks
    • Upstream selection and load balancing
    • Upstream request + response
    • CAPTCHA verification (when triggered)
    • Content scanning / ClamAV (when triggered)
  • Propagate traceparent / tracestate headers (W3C Trace Context) to upstream backends
  • Support configurable sampling rates
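
To make the propagation item concrete, here is a rough std-only sketch of parsing an incoming `traceparent` header and building the one sent upstream. It assumes version `00` of W3C Trace Context only; `TraceParent`, `parse`, and `child_header` are hypothetical names for illustration, not existing Synapse code, and a real integration would more likely rely on the propagator shipped with the OTel SDK.

```rust
/// Parsed W3C `traceparent` header (hypothetical helper, version 00 only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub flags: u8,
}

impl TraceParent {
    /// Parses `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`.
    /// Returns `None` for malformed input or all-zero IDs.
    pub fn parse(header: &str) -> Option<Self> {
        let mut parts = header.trim().split('-');
        let version = parts.next()?;
        let trace_id = parts.next()?;
        let parent_id = parts.next()?;
        let flags = parts.next()?;
        // This sketch only accepts version 00, which has exactly four fields.
        if parts.next().is_some() || version != "00" {
            return None;
        }
        if trace_id.len() != 32 || parent_id.len() != 16 || flags.len() != 2 {
            return None;
        }
        // The spec requires lowercase hex digits.
        let is_lower_hex =
            |s: &str| s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if !(is_lower_hex(trace_id) && is_lower_hex(parent_id) && is_lower_hex(flags)) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
        let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero trace or parent IDs are invalid.
        if trace_id == 0 || parent_id == 0 {
            return None;
        }
        Some(Self { trace_id, parent_id, flags })
    }

    /// True when the caller set the "sampled" flag (lowest bit).
    pub fn is_sampled(&self) -> bool {
        self.flags & 0x01 != 0
    }

    /// Header value to forward upstream: same trace ID and flags,
    /// with the proxy's own span ID as the new parent.
    pub fn child_header(&self, proxy_span_id: u64) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, proxy_span_id, self.flags)
    }
}
```

The point of the sketch is that the proxy keeps the caller's trace ID and sampled flag but substitutes its own span ID, so the upstream span becomes a child of the Synapse span rather than of the client span.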

Metrics (OTel native)

  • Export the existing Prometheus metrics via the OTel metrics SDK as an alternative to Prometheus scraping
  • Add histogram metrics for per-stage latencies (fingerprinting, WAF eval, upstream RTT)
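
For the per-stage latency histograms, a minimal std-only sketch of an explicit-bucket histogram follows. `LatencyHistogram` and the bucket boundaries are assumptions for illustration; in practice the OTel metrics SDK would provide the histogram instrument and this type would not be needed.

```rust
use std::time::Duration;

/// Explicit-bucket latency histogram (hypothetical type, illustration only).
/// Bucket `i` counts samples with `value <= bounds_us[i]`; the final bucket
/// counts everything above the largest bound.
pub struct LatencyHistogram {
    bounds_us: Vec<u64>, // upper bounds in microseconds, ascending
    counts: Vec<u64>,    // bounds_us.len() + 1 entries
    sum_us: u64,
}

impl LatencyHistogram {
    /// `bounds_us` must be sorted ascending.
    pub fn new(bounds_us: &[u64]) -> Self {
        Self {
            bounds_us: bounds_us.to_vec(),
            counts: vec![0; bounds_us.len() + 1],
            sum_us: 0,
        }
    }

    /// Records one stage latency, e.g. WAF evaluation or upstream RTT.
    pub fn record(&mut self, elapsed: Duration) {
        let us = elapsed.as_micros().min(u64::MAX as u128) as u64;
        // Index of the first bound that is >= the sample (upper bound inclusive).
        let idx = self.bounds_us.partition_point(|&b| b < us);
        self.counts[idx] += 1;
        self.sum_us = self.sum_us.saturating_add(us);
    }

    pub fn counts(&self) -> &[u64] {
        &self.counts
    }

    pub fn total(&self) -> u64 {
        self.counts.iter().sum()
    }

    pub fn sum_us(&self) -> u64 {
        self.sum_us
    }
}
```

One histogram per stage (fingerprinting, WAF eval, upstream RTT) with stage-appropriate bounds would keep the fast in-process stages from being flattened into the first bucket of a histogram sized for upstream latency.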

Configuration

telemetry:
  opentelemetry:
    enabled: true
    exporter: "otlp"  # otlp, jaeger, zipkin
    endpoint: "http://otel-collector:4317"
    protocol: "grpc"   # grpc (OTLP default port 4317), http (OTLP default port 4318)
    sampling_rate: 0.1  # 10% sampling
    propagation: "w3c"  # w3c, b3, jaeger
    resource_attributes:
      service.name: "synapse"
      deployment.environment: "production"
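
To illustrate what `sampling_rate: 0.1` could mean in practice, here is a sketch of a deterministic, trace-ID-based sampling decision in the spirit of OTel's ratio-based samplers. The function name and the exact threshold arithmetic are assumptions; the sampler in the OTel SDK may compute this differently.

```rust
/// Decides whether to sample a trace, based only on its trace ID and the
/// configured rate (hypothetical helper, illustration only).
///
/// The decision is deterministic: every hop that sees the same trace ID and
/// uses the same rate reaches the same answer, and a trace sampled at a low
/// rate is also sampled at any higher rate.
pub fn should_sample(trace_id: u128, sampling_rate: f64) -> bool {
    // Rates <= 0 (and NaN) never sample; rates >= 1 always sample.
    if !(sampling_rate > 0.0) {
        return false;
    }
    if sampling_rate >= 1.0 {
        return true;
    }
    // Map the rate onto the 63-bit range and compare against the low bits
    // of the trace ID, which are assumed to be uniformly random.
    let threshold = (sampling_rate * (1u64 << 63) as f64) as u64;
    let low_bits = (trace_id as u64) >> 1;
    low_bits < threshold
}
```

Deriving the decision from the trace ID rather than from a random draw per request means all spans of one trace are kept or dropped together.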

Environment Variables

  • AX_OTEL_ENABLED — Enable/disable OTel
  • AX_OTEL_ENDPOINT — Collector endpoint
  • AX_OTEL_SAMPLING_RATE — Sampling rate (0.0–1.0)
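
A possible shape for applying these variables on top of the file-based config is sketched below. `OtelConfig`, `apply_env_overrides`, the default values, and the accepted boolean spellings are all assumptions, since the issue does not specify them; the lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
/// Subset of the proposed OTel settings (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct OtelConfig {
    pub enabled: bool,
    pub endpoint: String,
    pub sampling_rate: f64,
}

impl Default for OtelConfig {
    fn default() -> Self {
        Self {
            enabled: false, // assumed default: opt-in
            endpoint: "http://otel-collector:4317".to_string(),
            sampling_rate: 0.1,
        }
    }
}

/// Applies AX_OTEL_* overrides to `cfg`. `lookup` returns the value of an
/// environment variable, or `None` when it is unset.
pub fn apply_env_overrides<F>(mut cfg: OtelConfig, lookup: F) -> Result<OtelConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(v) = lookup("AX_OTEL_ENABLED") {
        cfg.enabled = match v.trim().to_ascii_lowercase().as_str() {
            "1" | "true" => true,
            "0" | "false" => false,
            other => {
                return Err(format!("AX_OTEL_ENABLED: expected true/false, got {other:?}"))
            }
        };
    }
    if let Some(v) = lookup("AX_OTEL_ENDPOINT") {
        cfg.endpoint = v;
    }
    if let Some(v) = lookup("AX_OTEL_SAMPLING_RATE") {
        let rate: f64 = v
            .trim()
            .parse()
            .map_err(|e| format!("AX_OTEL_SAMPLING_RATE: {e}"))?;
        // Rejects out-of-range values and NaN.
        if !(0.0..=1.0).contains(&rate) {
            return Err(format!("AX_OTEL_SAMPLING_RATE: {rate} is outside 0.0..=1.0"));
        }
        cfg.sampling_rate = rate;
    }
    Ok(cfg)
}
```

At startup this could be called as `apply_env_overrides(cfg_from_file, |k| std::env::var(k).ok())`, so that an invalid sampling rate fails fast instead of silently falling back to a default.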

Implementation Notes

  • Use the opentelemetry, opentelemetry-otlp, and tracing-opentelemetry crates to bridge the existing tracing instrumentation to OTel
  • Since Synapse already uses tracing/tracing-subscriber, the integration should layer on top of existing spans
  • Consider making this feature-gated (--features otel) to avoid adding dependencies for users who don't need it
  • OTLP exporter should support both gRPC and HTTP protocols
  • Graceful shutdown must flush pending spans before exit
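
The flush-on-shutdown requirement can be illustrated with a std-only stand-in for a batching exporter. `SpanExporter` and its methods are hypothetical and spans are modelled as plain strings; in a real integration the OTel SDK's own provider shutdown would play this role. The sketch only shows the ordering that matters: stop accepting spans, drain the queue, then exit.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-in for a background span exporter (hypothetical type).
pub struct SpanExporter {
    tx: Option<Sender<String>>,
    worker: Option<JoinHandle<()>>,
}

impl SpanExporter {
    /// Spawns a worker that "exports" queued spans by pushing them to `sink`.
    pub fn spawn(sink: Arc<Mutex<Vec<String>>>) -> Self {
        let (tx, rx) = mpsc::channel::<String>();
        let worker = thread::spawn(move || {
            // The loop ends only after every sender is dropped AND the
            // queue has been fully drained.
            for span in rx {
                sink.lock().unwrap().push(span);
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Queues a span; silently dropped if shutdown has already started.
    pub fn export(&self, span: impl Into<String>) {
        if let Some(tx) = &self.tx {
            let _ = tx.send(span.into());
        }
    }

    /// Closes the queue and blocks until all pending spans are exported.
    pub fn shutdown(&mut self) {
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

impl Drop for SpanExporter {
    // Flush even if the caller forgets to call `shutdown` explicitly.
    fn drop(&mut self) {
        self.shutdown();
    }
}
```

Calling `shutdown` from the existing signal-handling path, after the proxy has stopped accepting connections, would keep the last in-flight requests from losing their spans.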

Labels

enhancement, observability

Metadata

Labels

enhancement, rust

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions