
feat(telemetry): enrich self-hosted instance stats with enterprise signals and instanceId linkage #6896

Open
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1781647190-enrich-self-hosted-telemetry


devin-ai-integration Bot (Contributor) commented Jun 16, 2026

Context

The daily "Self Hosted Instance Stats" PostHog event previously captured only volume metrics (user count, secret count, project count, etc.): enough to gauge scale, but not to tell which features a self-hosted instance actually uses. This made lead qualification from telemetry data difficult.

Additionally, there was no way to join per-user real-time events (signup, login, secret CRUD) with the daily instance-level stats in PostHog because they used different distinctId values with no shared key.

What changed

1. 17 new feature-adoption counts in the daily stats event:

| Property | Signal |
| --- | --- |
| samlConfigs, oidcConfigs, ldapConfigs | SSO maturity |
| scimTokens | Automated provisioning (enterprise) |
| auditLogStreams | Compliance/audit infrastructure |
| secretRotations | Active rotation policies |
| webhooks | Event-driven integrations |
| customProjectRoles, customOrgRoles | RBAC maturity |
| kmipClients | Key management interop |
| sshHosts, sshCertificateAuthorities, sshCertificates | SSH infrastructure |
| pamResources, pamAccounts | Privileged access management |
| accessApprovalPolicies | Access governance |
| honeyTokens | Intrusion detection |

2. Type breakdowns (as Record<string, number> histograms):

  • identityAuthMethodBreakdown — e.g. {"kubernetes-auth": 5, "aws-auth": 3} instead of just total=8
  • integrationBreakdown — which integration platforms are in use
  • projectTypeBreakdown — SecretManager vs CertManager vs SSH vs PAM etc.
  • secretSyncBreakdown — which sync destinations are active
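Each breakdown is a plain histogram keyed by type. As a minimal sketch, assuming the GROUP BY queries return a key column plus a count cast to text (as the `COUNT(*)::text` query quoted in the review thread does), folding the rows into a `Record<string, number>` could look like this. `BreakdownRow`, `toBreakdown`, and `breakdownTotal` are hypothetical names, not the DAL's own.

```typescript
// Hypothetical row shape for a query like:
//   SELECT "type", COUNT(*)::text AS count FROM <table> GROUP BY "type"
// COUNT(*) is cast to text, so the count arrives as a string.
interface BreakdownRow {
  key: string | null;
  count: string;
}

// Fold GROUP BY rows into a histogram such as {"kubernetes-auth": 5, "aws-auth": 3}.
function toBreakdown(rows: BreakdownRow[]): Record<string, number> {
  const histogram: Record<string, number> = {};
  for (const row of rows) {
    // Assumption: NULL keys are bucketed under "unknown" instead of being dropped.
    const key = row.key ?? "unknown";
    histogram[key] = (histogram[key] ?? 0) + Number(row.count);
  }
  return histogram;
}

// Sum of all buckets, e.g. to cross-check against an existing total count.
function breakdownTotal(histogram: Record<string, number>): number {
  return Object.values(histogram).reduce((sum, n) => sum + n, 0);
}
```

For example, `toBreakdown([{ key: "kubernetes-auth", count: "5" }, { key: "aws-auth", count: "3" }])` yields `{ "kubernetes-auth": 5, "aws-auth": 3 }`, and `breakdownTotal` of that histogram is 8, matching the total that was reported before.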

3. Per-org breakdown:

```json
{
  "organizationBreakdown": [
    { "orgId": "uuid", "name": "Acme Corp", "users": 40, "projects": 12 },
    { "orgId": "uuid", "name": "Acme Dev", "users": 7, "projects": 3 }
  ]
}
```

This supersedes the flat organizationNames array (which is still sent for backward compatibility) with actionable per-org data. User counts come from the unified memberships table filtered to status = 'accepted', so pending invites are excluded.
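The per-org counting rules (accepted memberships only, projects grouped by org) can be sketched in memory. This illustrates the semantics only; the DAL does the same work with GROUP BY queries, and the `Org`, `Membership`, and `Project` shapes as well as the function names are hypothetical.

```typescript
// Hypothetical in-memory shapes; the real DAL computes these with GROUP BY queries.
interface Org { id: string; name: string }
interface Membership { orgId: string; status: string } // e.g. "accepted" or "invited"
interface Project { orgId: string }

interface OrgBreakdownEntry { orgId: string; name: string; users: number; projects: number }

// Count rows per orgId: the in-memory analogue of GROUP BY "orgId".
function countByOrg<T extends { orgId: string }>(rows: T[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) counts.set(row.orgId, (counts.get(row.orgId) ?? 0) + 1);
  return counts;
}

function buildOrganizationBreakdown(
  orgs: Org[],
  memberships: Membership[],
  projects: Project[]
): OrgBreakdownEntry[] {
  // Only accepted memberships count as users; pending invites are excluded.
  const userCounts = countByOrg(memberships.filter((m) => m.status === "accepted"));
  const projectCounts = countByOrg(projects);
  // Orgs without any grouped rows are reported with zeros rather than omitted.
  return orgs.map((org) => ({
    orgId: org.id,
    name: org.name,
    users: userCounts.get(org.id) ?? 0,
    projects: projectCounts.get(org.id) ?? 0
  }));
}
```

Mapping over the org list, rather than over the grouped rows, keeps orgs with no accepted members or no projects in the output; that behavior is an assumption of this sketch.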

4. instanceId on all real-time events (non-cloud only):

Every sendPostHogEvents call and every aggregated event now includes instanceId as a property on non-cloud instances. The instanceId is fetched through an in-flight promise pattern, so a concurrent cache miss triggers a single getServerCfg() call instead of several. This enables joining:

  • Person (email) → real-time events → instanceId → daily instance stats
  • Daily instance stats → organizationBreakdown[].orgId → org group → per-user events
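The join chain above can be illustrated offline over exported data. A minimal sketch, assuming real-time events expose a distinctId and a properties.instanceId, and daily stats expose instanceId and organizationBreakdown; the interfaces and function names are hypothetical and do not describe PostHog's actual export format or query API.

```typescript
// Hypothetical exported shapes; PostHog's real export format may differ.
interface RealtimeEvent {
  distinctId: string; // e.g. the user's email
  event: string;
  properties: { instanceId?: string };
}

interface InstanceStats {
  instanceId: string;
  organizationBreakdown: { orgId: string; name: string; users: number; projects: number }[];
}

// Person -> real-time events -> instanceId -> daily instance stats.
function statsForPerson(distinctId: string, events: RealtimeEvent[], daily: InstanceStats[]): InstanceStats[] {
  const instanceIds = new Set<string>();
  for (const e of events) {
    // Cloud events carry no instanceId, so they never join to self-hosted stats.
    if (e.distinctId === distinctId && e.properties.instanceId) instanceIds.add(e.properties.instanceId);
  }
  return daily.filter((s) => instanceIds.has(s.instanceId));
}

// Daily instance stats -> organizationBreakdown[].orgId, to look up org groups and their per-user events.
function orgIdsFor(stats: InstanceStats): string[] {
  return stats.organizationBreakdown.map((org) => org.orgId);
}
```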

5. infisicalVersion: populated from the INFISICAL_PLATFORM_VERSION env var when it is set.
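A minimal sketch of the "when set" behavior. The payload type and `withPlatformVersion` are hypothetical, and treating an empty string the same as unset is an assumption; taking the environment as a parameter (for example process.env) keeps the helper easy to test.

```typescript
// Hypothetical payload type; only the fields relevant here are shown.
interface DailyStatsPayload {
  users: number;
  infisicalVersion?: string;
}

// Attach infisicalVersion only when INFISICAL_PLATFORM_VERSION is set and non-empty,
// so instances without it send no property at all rather than an empty string.
function withPlatformVersion(
  payload: DailyStatsPayload,
  env: Record<string, string | undefined>
): DailyStatsPayload {
  const version = env["INFISICAL_PLATFORM_VERSION"];
  return version ? { ...payload, infisicalVersion: version } : payload;
}
```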

6. Uses unified tables: custom role counts query the new roles table (excluding built-in slugs) instead of the dropped project_roles/org_roles tables, and per-org user counts use the unified memberships table instead of the dropped org_memberships table.
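The custom-role counts amount to "rows in the roles table whose slug is not built in", split by scope. An in-memory sketch of that filter follows; the `RoleRow` shape (a nullable projectId marking org-level roles) and the slug list are illustrative assumptions, not the actual schema or Infisical's actual built-in slugs. In the DAL, the same filter would be expressed as a WHERE slug NOT IN (...) clause.

```typescript
// Hypothetical shape of a row in the unified roles table; real column names may differ.
interface RoleRow {
  slug: string;
  projectId: string | null; // assumed: null for org-level roles
}

// Illustrative placeholder list, not the authoritative set of built-in slugs.
const BUILT_IN_SLUGS: ReadonlySet<string> = new Set(["admin", "member", "viewer", "no-access"]);

function countCustomRoles(roles: RoleRow[], builtIn: ReadonlySet<string> = BUILT_IN_SLUGS) {
  let customProjectRoles = 0;
  let customOrgRoles = 0;
  for (const role of roles) {
    if (builtIn.has(role.slug)) continue; // built-in roles are not "custom"
    if (role.projectId) customProjectRoles += 1;
    else customOrgRoles += 1;
  }
  return { customProjectRoles, customOrgRoles };
}
```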

Steps to verify the change

  1. Review the new properties in telemetry-dal.ts: all new counts use the existing countTable helper, the Knex query builder for filtered counts, or db.raw for GROUP BY queries
  2. Verify the TTelemetryInstanceStatsEvent type in telemetry-types.ts matches the DAL return shape
  3. Check that instanceId is attached in both the non-aggregated path (sendPostHogEvents) and the aggregated event path (processBucketEvents)
  4. Confirm that npm run type:check and npm run lint:fix both pass cleanly

Type

  • Fix
  • Feature
  • Improvement
  • Breaking
  • Docs
  • Chore

Checklist

Link to Devin session: https://app.devin.ai/sessions/9d43f79b874d48809465a253ad75fe10
Requested by: @0xArshdeep

…gnals and instanceId linkage

- Add 17 new feature-adoption counts to daily stats: SSO configs (SAML, OIDC,
  LDAP), SCIM tokens, audit log streams, secret rotations, webhooks, custom
  roles (project + org), KMIP clients, SSH infrastructure (hosts, CAs, certs),
  PAM resources/accounts, access approval policies, honey tokens
- Add per-type identity auth method breakdown (e.g. kubernetes-auth: 5,
  aws-auth: 3) alongside the existing total count
- Add integration type breakdown, project type breakdown, and secret sync
  destination breakdown as Record<string, number> histograms
- Add per-org breakdown with orgId, name, user count, and project count
- Add infisicalVersion from INFISICAL_PLATFORM_VERSION env var
- Attach instanceId to all real-time and aggregated PostHog events on
  non-cloud instances, enabling joins between per-user events and daily
  instance stats in PostHog

Co-Authored-By: arsh <arshsb1998@gmail.com>
devin-ai-integration Bot (Contributor Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3070d15e68

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +94 to +95
```ts
countTable(db, TableName.ProjectRoles),
countTable(db, TableName.OrgRoles),
```

P1: Use unified role tables for telemetry counts

On instances migrated past backend/src/db/migrations/20260107083948_remove-old-memberships.ts, project_roles and org_roles are dropped (lines 52-53 and 70-71), so these new countTable calls throw a "relation does not exist" error inside getTelemetryInstanceStats, and the daily self-hosted telemetry stats job fails before emitting anything. The same function also queries the dropped org_memberships table later; these counts need to come from the current roles/memberships tables instead.


devin-ai-integration Bot (Contributor Author) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


greptile-apps Bot (Contributor) commented Jun 16, 2026

Greptile Summary

This PR enriches the daily self-hosted telemetry event with 17 new feature-adoption counters, four type-breakdown histograms, a per-org breakdown array, and attaches instanceId as a property on every real-time and aggregated PostHog event so per-user events can be joined to the daily instance stats.

  • telemetry-dal.ts: Replaces the flat IDENTITY_AUTH_TABLES array with an ordered map to support per-method breakdown, adds 17 countTable() calls for enterprise features, and introduces five db.raw GROUP BY queries for integration/project/sync breakdowns and per-org user/project counts. All table names come from the TableName enum (no user input).
  • telemetry-service.ts: Adds a closure-level cachedInstanceId populated once from getServerCfg(), then stamps instanceId onto every non-cloud PostHog capture and onto every aggregated event flush.
  • telemetry-queue.ts / telemetry-types.ts: Adds the optional infisicalVersion property from INFISICAL_PLATFORM_VERSION and updates the TypeScript type to match the expanded DAL return shape.

Confidence Score: 4/5

Safe to merge; changes are confined to the daily telemetry job and event enrichment paths with no impact on product functionality.

The DAL changes run once daily in a background job and are wrapped in a try/catch that rethrows as DatabaseError, limiting blast radius. The instanceId caching in the service layer has a benign race on concurrent first calls (all callers get the same value), and the organizationBreakdown.users count includes pending-invite memberships, slightly inflating per-org user figures relative to the global user count. Neither issue affects correctness of the product; they only affect the accuracy of a subset of telemetry numbers.

backend/src/services/telemetry/telemetry-dal.ts — org membership count query and the expanded Promise.all warrant a second look.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/src/services/telemetry/telemetry-dal.ts | Adds 17 new countTable() calls and 5 GROUP BY breakdown queries; org membership count includes pending invites, inconsistent with global users metric |
| backend/src/services/telemetry/telemetry-service.ts | Adds instanceId enrichment to real-time and aggregated events via a closure-cached getServerCfg() call; cache has a benign race condition on concurrent first calls |
| backend/src/services/telemetry/telemetry-queue.ts | Minor addition of optional infisicalVersion from INFISICAL_PLATFORM_VERSION env var to the daily stats payload; straightforward and correct |
| backend/src/services/telemetry/telemetry-types.ts | Type definitions updated to match all new DAL return fields; consistent with the DAL and queue changes |

Reviews (1): Last reviewed commit: "feat(telemetry): enrich self-hosted inst..."

Comment on lines +114 to +125
```ts
let cachedInstanceId: string | undefined;
const getInstanceId = async (): Promise<string | undefined> => {
  if (appCfg.INFISICAL_CLOUD) return undefined;
  if (cachedInstanceId) return cachedInstanceId;
  try {
    const { instanceId } = await getServerCfg();
    cachedInstanceId = instanceId;
    return instanceId;
  } catch {
    return undefined;
  }
};
```

P2: Unguarded concurrent cache miss calls getServerCfg() repeatedly

cachedInstanceId is a plain closure variable with no mutex. If multiple requests call getInstanceId() concurrently before the first one resolves (e.g., a burst of events at startup), every concurrent caller finds cachedInstanceId falsy and fires its own getServerCfg() fetch. The calls are idempotent so there's no data corruption, but it defeats the caching intent. A common fix is to store the in-flight Promise itself so all concurrent callers share a single resolution.
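A minimal sketch of that fix: cache the pending promise rather than the resolved value, so every caller during a cold start awaits the same fetch. `fetchInstanceId` stands in for getServerCfg() and `isCloud` for appCfg.INFISICAL_CLOUD; `createInstanceIdGetter` is a hypothetical name, and clearing the cache on failure so that a later call can retry is a design choice assumed here.

```typescript
// In-flight promise pattern: concurrent callers on a cold cache share one pending fetch.
// fetchInstanceId stands in for getServerCfg(); isCloud stands in for appCfg.INFISICAL_CLOUD.
function createInstanceIdGetter(fetchInstanceId: () => Promise<string>, isCloud: boolean) {
  let inFlight: Promise<string | undefined> | undefined;

  return (): Promise<string | undefined> => {
    if (isCloud) return Promise.resolve(undefined);
    if (!inFlight) {
      // The first caller starts the fetch; later callers reuse the same promise.
      inFlight = fetchInstanceId().catch(() => {
        // On failure, drop the cached promise so a later call retries,
        // and resolve to undefined like the original helper does.
        inFlight = undefined;
        return undefined;
      });
    }
    return inFlight;
  };
}
```

A successful result stays cached for the life of the process, which assumes instanceId does not change at runtime.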

Comment on lines +176 to +178
```ts
const orgUserResult = await db.raw<{ rows: { orgId: string; count: string }[] }>(
  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} GROUP BY "orgId"`
);
```

P2: organizationBreakdown.users counts all memberships, including pending invites

The raw COUNT(*) FROM org_memberships GROUP BY "orgId" includes rows with status = 'invited' (users who have not yet accepted). The global users metric counts Users WHERE isGhost = false, so a per-org user breakdown that includes unaccepted invites will be inconsistent with it. Adding a WHERE status = 'accepted' filter here would align the semantics.

Suggested change
```diff
 const orgUserResult = await db.raw<{ rows: { orgId: string; count: string }[] }>(
-  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} GROUP BY "orgId"`
+  `SELECT "orgId", COUNT(*)::text AS count FROM ${TableName.OrgMembership} WHERE status = 'accepted' GROUP BY "orgId"`
 );
```

…he race, filter invited members

- Switch customProjectRoles/customOrgRoles from dropped ProjectRoles/OrgRoles
  tables to the unified Role table (Codex review)
- Switch per-org user count from dropped OrgMembership to unified Membership
  table with WHERE status = 'accepted' filter (Greptile review)
- Replace plain closure cache with in-flight promise pattern for getInstanceId
  to prevent redundant getServerCfg() calls on concurrent cache miss (Greptile review)

Co-Authored-By: arsh <arshsb1998@gmail.com>