feat(telemetry): enrich self-hosted instance stats with enterprise signals and instanceId linkage #6896
…gnals and instanceId linkage

- Add 17 new feature-adoption counts to daily stats: SSO configs (SAML, OIDC, LDAP), SCIM tokens, audit log streams, secret rotations, webhooks, custom roles (project + org), KMIP clients, SSH infrastructure (hosts, CAs, certs), PAM resources/accounts, access approval policies, honey tokens
- Add per-type identity auth method breakdown (e.g. kubernetes-auth: 5, aws-auth: 3) alongside the existing total count
- Add integration type breakdown, project type breakdown, and secret sync destination breakdown as `Record<string, number>` histograms
- Add per-org breakdown with orgId, name, user count, and project count
- Add infisicalVersion from the INFISICAL_PLATFORM_VERSION env var
- Attach instanceId to all real-time and aggregated PostHog events on non-cloud instances, enabling joins between per-user events and daily instance stats in PostHog

Co-Authored-By: arsh <arshsb1998@gmail.com>
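The new `*Breakdown` fields are `Record<string, number>` histograms built from GROUP BY results. The per-org query quoted in the review below casts counts with `COUNT(*)::text`, so each row's count arrives as a string; this sketch assumes the breakdown queries do the same. The `{ key, count }` row shape, the `toHistogram` helper, and bucketing NULL keys under `"unknown"` are illustrative assumptions, not the PR's code.

```typescript
// Assumed row shape for a breakdown query such as
//   SELECT "authMethod" AS key, COUNT(*)::text AS count FROM ... GROUP BY "authMethod"
// Postgres returns the ::text cast as a string, so counts must be parsed.
type GroupByRow = { key: string | null; count: string };

// Fold GROUP BY rows into a Record<string, number> histogram.
// NULL keys go to an "unknown" bucket (an assumption of this sketch).
function toHistogram(rows: GroupByRow[]): Record<string, number> {
  const histogram: Record<string, number> = {};
  for (const { key, count } of rows) {
    const n = Number.parseInt(count, 10);
    if (Number.isNaN(n)) continue; // skip malformed counts rather than emit NaN
    const bucket = key ?? "unknown";
    histogram[bucket] = (histogram[bucket] ?? 0) + n;
  }
  return histogram;
}

// Example using the PR's illustrative identity auth numbers.
const authMethods = toHistogram([
  { key: "kubernetes-auth", count: "5" },
  { key: "aws-auth", count: "3" }
]);
const total = Object.values(authMethods).reduce((sum, n) => sum + n, 0);
console.log(JSON.stringify(authMethods), total); // {"kubernetes-auth":5,"aws-auth":3} 8
```

Summing the values recovers the existing total count (8 here), so the breakdown and the total stay consistent as long as both are computed from the same rows.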
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3070d15e68
```typescript
countTable(db, TableName.ProjectRoles),
countTable(db, TableName.OrgRoles),
```
Use unified role tables for telemetry counts
On instances migrated past `backend/src/db/migrations/20260107083948_remove-old-memberships.ts`, `project_roles` and `org_roles` are dropped (lines 52-53 and 70-71), so these new `countTable` calls throw `relation does not exist` inside `getTelemetryInstanceStats`, and the daily self-hosted telemetry stats job fails before emitting anything. The same function also queries the dropped `org_memberships` table later; these counts need to come from the current roles/memberships tables instead.
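The follow-up commit moves these counts to the unified roles table and filters out built-in slugs. The sketch below shows only that filtering logic, done in memory so it runs standalone; in the DAL it would be a `WHERE` clause on the unified table. The `RoleRow` shape (scope inferred from `projectId`), the built-in slug lists, and `countCustomRoles` are assumptions for illustration, not Infisical's actual schema or code.

```typescript
// Hypothetical row shape for the unified roles table: a role is treated as
// project-scoped when projectId is set, org-scoped otherwise (assumption).
type RoleRow = { slug: string; orgId: string | null; projectId: string | null };

// Assumed built-in slugs; the authoritative list lives in the Infisical codebase.
const BUILT_IN_PROJECT_SLUGS = new Set(["admin", "member", "viewer", "no-access"]);
const BUILT_IN_ORG_SLUGS = new Set(["admin", "member", "no-access"]);

// Count custom (non-built-in) roles per scope in a single pass.
function countCustomRoles(rows: RoleRow[]): { customProjectRoles: number; customOrgRoles: number } {
  let customProjectRoles = 0;
  let customOrgRoles = 0;
  for (const row of rows) {
    if (row.projectId !== null) {
      if (!BUILT_IN_PROJECT_SLUGS.has(row.slug)) customProjectRoles += 1;
    } else if (!BUILT_IN_ORG_SLUGS.has(row.slug)) {
      customOrgRoles += 1;
    }
  }
  return { customProjectRoles, customOrgRoles };
}

const roleCounts = countCustomRoles([
  { slug: "admin", orgId: null, projectId: "project-1" },
  { slug: "ci-deployer", orgId: null, projectId: "project-1" },
  { slug: "member", orgId: "org-1", projectId: null },
  { slug: "billing-auditor", orgId: "org-1", projectId: null }
]);
console.log(roleCounts); // { customProjectRoles: 1, customOrgRoles: 1 }
```

In SQL terms this corresponds to two filtered counts against one table (a `slug NOT IN (...)` condition plus a scope condition), which avoids touching the dropped `project_roles`/`org_roles` tables entirely.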
| Filename | Overview |
|---|---|
| backend/src/services/telemetry/telemetry-dal.ts | Adds 17 new countTable() calls and 5 GROUP BY breakdown queries; org membership count includes pending invites, inconsistent with global users metric |
| backend/src/services/telemetry/telemetry-service.ts | Adds instanceId enrichment to real-time and aggregated events via a closure-cached getServerCfg() call; cache has a benign race condition on concurrent first calls |
| backend/src/services/telemetry/telemetry-queue.ts | Minor addition of optional infisicalVersion from INFISICAL_PLATFORM_VERSION env var to the daily stats payload; straightforward and correct |
| backend/src/services/telemetry/telemetry-types.ts | Type definitions updated to match all new DAL return fields; consistent with the DAL and queue changes |
Reviews (1): Last reviewed commit: "feat(telemetry): enrich self-hosted inst..."
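Per the summary of `telemetry-queue.ts`, `infisicalVersion` is optional and only populated when `INFISICAL_PLATFORM_VERSION` is set. One way to keep the key out of the payload entirely when it is unset is a conditional spread. `InstanceStats` is a trimmed-down hypothetical stand-in for the real stats type, `buildStatsPayload` is a hypothetical helper, and the env map is passed in (rather than read from `process.env`) only so the sketch runs without Node typings.

```typescript
// Trimmed-down, hypothetical stand-in for the daily stats payload type.
type InstanceStats = { users: number; secrets: number; infisicalVersion?: string };

// Hypothetical helper: attach infisicalVersion only when the env var is set
// to a non-empty string, so the key is absent (not undefined) otherwise.
function buildStatsPayload(
  base: Omit<InstanceStats, "infisicalVersion">,
  env: Record<string, string | undefined>
): InstanceStats {
  const version = env["INFISICAL_PLATFORM_VERSION"];
  return version ? { ...base, infisicalVersion: version } : { ...base };
}

// "v0.0.0-example" is a placeholder, not a real Infisical version.
const withVersion = buildStatsPayload({ users: 12, secrets: 340 }, { INFISICAL_PLATFORM_VERSION: "v0.0.0-example" });
const withoutVersion = buildStatsPayload({ users: 12, secrets: 340 }, {});
console.log(withVersion.infisicalVersion, "infisicalVersion" in withoutVersion); // v0.0.0-example false
```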
```typescript
let cachedInstanceId: string | undefined;
const getInstanceId = async (): Promise<string | undefined> => {
  if (appCfg.INFISICAL_CLOUD) return undefined;
  if (cachedInstanceId) return cachedInstanceId;
  try {
    const { instanceId } = await getServerCfg();
    cachedInstanceId = instanceId;
    return instanceId;
  } catch {
    return undefined;
  }
};
```
Unguarded concurrent cache miss calls `getServerCfg()` repeatedly
cachedInstanceId is a plain closure variable with no mutex. If multiple requests call getInstanceId() concurrently before the first one resolves (e.g., a burst of events at startup), every concurrent caller finds cachedInstanceId falsy and fires its own getServerCfg() fetch. The calls are idempotent so there's no data corruption, but it defeats the caching intent. A common fix is to store the in-flight Promise itself so all concurrent callers share a single resolution.
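A minimal sketch of the suggested fix: cache the in-flight `Promise` instead of the resolved value, so every concurrent caller awaits one shared lookup. `getServerCfg` and the cloud flag are injected here (hypothetical wiring standing in for the service's closure and `appCfg.INFISICAL_CLOUD`), and clearing the cache on failure is a design choice of this sketch so a transient error is not cached forever. `withInstanceId` is a hypothetical helper showing the event-property enrichment.

```typescript
type ServerCfg = { instanceId: string };

// Build a getInstanceId function that shares one in-flight lookup.
// getServerCfg and isCloud are injected (hypothetical wiring) so this runs standalone.
function makeGetInstanceId(
  getServerCfg: () => Promise<ServerCfg>,
  isCloud: boolean
): () => Promise<string | undefined> {
  // The cached value is the Promise itself, assigned synchronously on the
  // first call, so callers arriving before it settles reuse it.
  let instanceIdPromise: Promise<string | undefined> | undefined;

  return () => {
    if (isCloud) return Promise.resolve(undefined);
    if (!instanceIdPromise) {
      instanceIdPromise = getServerCfg()
        .then((cfg) => cfg.instanceId)
        .catch(() => {
          // Failures are not cached: clear the slot so a later call retries.
          instanceIdPromise = undefined;
          return undefined;
        });
    }
    return instanceIdPromise;
  };
}

// Hypothetical helper: add instanceId to event properties only when known.
async function withInstanceId(
  getInstanceId: () => Promise<string | undefined>,
  properties: Record<string, unknown>
): Promise<Record<string, unknown>> {
  const instanceId = await getInstanceId();
  return instanceId ? { ...properties, instanceId } : properties;
}

// Demo: three concurrent callers trigger exactly one getServerCfg() call.
let calls = 0;
const getInstanceId = makeGetInstanceId(async () => {
  calls += 1;
  return { instanceId: "example-instance-id" };
}, false);

void (async () => {
  const ids = await Promise.all([getInstanceId(), getInstanceId(), getInstanceId()]);
  console.log(ids.every((id) => id === "example-instance-id"), calls); // true 1
})();
```

Because the assignment to `instanceIdPromise` happens before the first `await`, there is no window in which a second caller can observe an empty cache while a lookup is already running.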
```typescript
const orgUserResult = await db.raw<{ rows: { orgId: string; count: string }[] }>(
  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} GROUP BY "orgId"`
);
```
organizationBreakdown.users counts all memberships, including pending invites
The raw COUNT(*) FROM org_memberships GROUP BY "orgId" includes rows with status = 'invited' (users who have not yet accepted). The global users metric counts Users WHERE isGhost = false, so a per-org user breakdown that includes unaccepted invites will be inconsistent with it. Adding a WHERE status = 'accepted' filter here would align the semantics.
Suggested change:

```diff
 const orgUserResult = await db.raw<{ rows: { orgId: string; count: string }[] }>(
-  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} GROUP BY "orgId"`
+  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} WHERE status = 'accepted' GROUP BY "orgId"`
 );
```
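With accepted-only counts in hand, the per-org breakdown still has to be joined with org names and project counts. Orgs with no accepted members produce no GROUP BY row at all, so this sketch defaults them to zero rather than dropping them. Row shapes mirror the `db.raw` generic above; the `users` field name follows the review comment, while `projects`, `buildOrgBreakdown`, and the example inputs are assumptions.

```typescript
// Row shapes mirror the db.raw generic above: COUNT(*)::text yields strings.
type CountRow = { orgId: string; count: string };
type OrgRow = { id: string; name: string };

// `users` follows the review comment's organizationBreakdown.users;
// `projects` is an assumed field name.
type OrgBreakdownEntry = { orgId: string; name: string; users: number; projects: number };

// Index GROUP BY rows by orgId, parsing the text counts.
function countsByOrg(rows: CountRow[]): Map<string, number> {
  return new Map(rows.map((r): [string, number] => [r.orgId, Number.parseInt(r.count, 10)]));
}

// acceptedUserRows is assumed to come from a query already filtered to
// status = 'accepted'. Orgs missing from a GROUP BY result default to 0.
function buildOrgBreakdown(
  orgs: OrgRow[],
  acceptedUserRows: CountRow[],
  projectRows: CountRow[]
): OrgBreakdownEntry[] {
  const users = countsByOrg(acceptedUserRows);
  const projects = countsByOrg(projectRows);
  return orgs.map((org) => ({
    orgId: org.id,
    name: org.name,
    users: users.get(org.id) ?? 0,
    projects: projects.get(org.id) ?? 0
  }));
}

const breakdown = buildOrgBreakdown(
  [
    { id: "org-a", name: "Acme" },
    { id: "org-b", name: "Globex" }
  ],
  [{ orgId: "org-a", count: "4" }],
  [
    { orgId: "org-a", count: "2" },
    { orgId: "org-b", count: "1" }
  ]
);
console.log(breakdown[1]); // { orgId: 'org-b', name: 'Globex', users: 0, projects: 1 }
```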
…he race, filter invited members

- Switch customProjectRoles/customOrgRoles from dropped ProjectRoles/OrgRoles tables to the unified Role table (Codex review)
- Switch per-org user count from dropped OrgMembership to unified Membership table with `WHERE status = 'accepted'` filter (Greptile review)
- Replace plain closure cache with in-flight promise pattern for getInstanceId to prevent redundant getServerCfg() calls on concurrent cache miss (Greptile review)

Co-Authored-By: arsh <arshsb1998@gmail.com>
Context
The daily "Self Hosted Instance Stats" PostHog event previously only captured volume metrics (user count, secret count, project count, etc.) — enough to gauge scale but not what features a self-hosted instance is actually using. This made lead qualification from telemetry data difficult.
Additionally, there was no way to join per-user real-time events (signup, login, secret CRUD) with the daily instance-level stats in PostHog, because they used different `distinctId` values with no shared key.

What changed
1. 17 new feature-adoption counts in the daily stats event:
   - `samlConfigs`, `oidcConfigs`, `ldapConfigs`
   - `scimTokens`
   - `auditLogStreams`
   - `secretRotations`
   - `webhooks`
   - `customProjectRoles`, `customOrgRoles`
   - `kmipClients`
   - `sshHosts`, `sshCertificateAuthorities`, `sshCertificates`
   - `pamResources`, `pamAccounts`
   - `accessApprovalPolicies`
   - `honeyTokens`
2. Type breakdowns (as `Record<string, number>` histograms):
   - `identityAuthMethodBreakdown`: e.g. `{"kubernetes-auth": 5, "aws-auth": 3}` instead of just total=8
   - `integrationBreakdown`: which integration platforms are in use
   - `projectTypeBreakdown`: SecretManager vs CertManager vs SSH vs PAM, etc.
   - `secretSyncBreakdown`: which sync destinations are active
3. Per-org breakdown:
   Replaces the flat `organizationNames` array (which is still sent for backward compatibility) with actionable per-org data. User counts use the unified `memberships` table filtered to `status = 'accepted'` (excludes pending invites).
4. `instanceId` on all real-time events (non-cloud only):
   Every `sendPostHogEvents` call and every aggregated event now includes `instanceId` as a property on non-cloud instances. An in-flight promise pattern avoids redundant `getServerCfg()` calls on a concurrent cache miss. This enables joining:
   - `instanceId` → daily instance stats
   - `organizationBreakdown[].orgId` → org group → per-user events
5. `infisicalVersion`: populated from the `INFISICAL_PLATFORM_VERSION` env var when set.
6. Uses unified tables: custom role counts query the new `roles` table (filtering out built-in slugs) instead of the dropped `project_roles`/`org_roles` tables. Per-org user counts use the unified `memberships` table instead of the dropped `org_memberships`.

Steps to verify the change
- `telemetry-dal.ts`: all new counts use the existing `countTable` helper, the Knex query builder for filtered counts, or `db.raw` for GROUP BY queries
- The `TTelemetryInstanceStatsEvent` type in `telemetry-types.ts` matches the DAL return shape
- `instanceId` is attached in both the non-aggregated path (`sendPostHogEvents`) and the aggregated event path (`processBucketEvents`)
- `npm run type:check` and `npm run lint:fix` both pass clean

Type
Checklist
type(scope): short description

Link to Devin session: https://app.devin.ai/sessions/9d43f79b874d48809465a253ad75fe10
Requested by: @0xArshdeep