
Commit dee8752

feat(agentos): discovery endpoint, OTLP gateway, cak_ ingest auth
Three server-side additions that complete the "one key" SDK contract:

Discovery (§2.6d)
- `routes/discovery.ts` — GET /agentos/api/discovery (public, no auth). Returns { version, endpoints: { ingest, api, otel, messages } } with absolute URLs derived from the request origin (X-Forwarded-* honoured; AGENTOS_PUBLIC_URL pins it behind a proxy). Endpoint paths live here only — the single source of truth — so moving a route never forces an SDK release.
- Mounted in app.ts before the auth guards (OIDC .well-known pattern).

OTLP trace gateway
- `routes/otlp.ts` — POST /agentos/api/otel/v1/{traces,metrics,logs}. Reads the raw request stream (no body parser) and forwards verbatim to OTEL_GATEWAY_UPSTREAM_ENDPOINT with OTEL_GATEWAY_UPSTREAM_HEADERS. Binary protobuf/gzip pass through untouched; mirrors upstream status. 503 when upstream unset; 16 MB cap.
- `otel-auth.ts` — fail-closed cak_ guard (invalid/missing → 401, Mongo-down → 503).
- Mounted as a SERVICE boundary before the global json parser.

cak_ ingest auth refactor
- `cak-bearer.ts` — shared `verifyCakBearer()` helper used by both ingest-auth and otel-auth (DRY, single validation path).
- `ingest-auth.ts` — refactored to use the shared helper; behaviour unchanged.

Tests: 164 pass (21 files), typecheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent a4797a4 commit dee8752

10 files changed

Lines changed: 420 additions & 22 deletions
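
The discovery contract described above is small enough to sketch. This is a minimal illustration, assuming the origin precedence that the commit message and tests describe (`AGENTOS_PUBLIC_URL` first, then `X-Forwarded-Proto`/`X-Forwarded-Host`, then the request's own protocol and host); the endpoint paths are copied from the test expectations in this commit. `OriginInputs`, `resolveOrigin`, and `buildDiscovery` are hypothetical names, not the contents of `routes/discovery.ts`, which this page does not show.

```typescript
// Hypothetical sketch of the discovery document; the real routes/discovery.ts is not shown here.

interface OriginInputs {
  publicUrl?: string; // AGENTOS_PUBLIC_URL, if set
  forwardedProto?: string; // X-Forwarded-Proto header, if present
  forwardedHost?: string; // X-Forwarded-Host header, if present
  protocol: string; // protocol the server itself saw, e.g. "http"
  host: string; // Host header the server itself saw
}

interface DiscoveryDocument {
  version: string;
  endpoints: { ingest: string; api: string; otel: string; messages: string };
}

// Choose the public origin: an explicit pin wins, then proxy headers, then the raw request.
function resolveOrigin(o: OriginInputs): string {
  if (o.publicUrl) return o.publicUrl.replace(/\/+$/, ""); // drop trailing slash(es)
  // A proxy chain can send "https, http"; the first entry is the client-facing hop.
  const proto = (o.forwardedProto ?? o.protocol).split(",")[0]!.trim();
  const host = (o.forwardedHost ?? o.host).split(",")[0]!.trim();
  return `${proto}://${host}`;
}

// Paths live only here; clients receive absolute URLs and never build paths themselves.
function buildDiscovery(origin: string): DiscoveryDocument {
  const root = `${origin}/agentos/api`;
  return {
    version: "1",
    endpoints: {
      ingest: `${root}/ingest/events`,
      api: `${root}/v1`,
      otel: `${root}/otel`,
      messages: `${root}/v1/messages`,
    },
  };
}

// Example: behind an ingress that terminates TLS.
const doc = buildDiscovery(
  resolveOrigin({ forwardedProto: "https", forwardedHost: "agentos.example.com", protocol: "http", host: "10.0.0.5:8080" }),
);
console.log(doc.endpoints.ingest); // https://agentos.example.com/agentos/api/ingest/events
```

Because the response depends on forwarded headers, a shared cache in front of the route would need to account for them; pinning `AGENTOS_PUBLIC_URL` sidesteps that. The unit test further down also expects a `Cache-Control` header containing `max-age=300`, which a handler would set when sending this object.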


CLAUDE.md

Lines changed: 27 additions & 8 deletions
@@ -105,10 +105,10 @@ The Mongo cluster is the source of truth for AgentOS. **Only the `agentos-server
 Resources stamped with `ownerGroup`/`ownerUser` are **hard-isolated**: a non-admin sees only their own or their group's; admins (`*`) see all (§2.6b).
 
 **Credentials required:**
-- On the **SDK** side: `AGENTOS_INGEST_URL` (e.g. `https://<host>/agentos/api/ingest/events`) + optional `AGENTOS_INGEST_TOKEN` (sent as `Authorization: Bearer …`). No Mongo creds.
+- On the **SDK** side: **`AGENTOS_DISCOVERY_URL`** (e.g. `https://<host>/agentos/api/discovery`) + the `cak_` key (`COMPUTERAGENT_HARNESS_TOKEN`). The SDK reads the ingest URL (and api/otel) from the discovery document — no endpoint path is baked into the SDK (§2.6d). A full-URL `AGENTOS_INGEST_URL` override still wins for the ingest endpoint. No Mongo creds.
 - On the **server** side: `MONGO_URL` + `MONGO_DATABASE` (this is the DB the collections above live in).
 
-**Behaviour:** when `AGENTOS_INGEST_URL` is set, the SDK's default telemetry pipeline auto-attaches `AgentOSHttpSink` (gated on the `[agentos]` extra, which is now `httpx`-based). Each event carries a stable `event_id` so the server's writes are idempotent on retry. **Ingest auth** (`ingest-auth.ts`): the SDK presents the **same `cak_` API key it uses everywhere** as the ingest `Bearer`, and the server validates it via `apiKeyStore.verify` (the same path the CAS introspection uses). A legacy static `AGENTOS_INGEST_TOKEN` string is still accepted for back-compat. ⚠️ Only when **no `cak_` is presented AND `AGENTOS_INGEST_TOKEN` is unset** does the route fall **open** (anonymous writes) — present a key (or set the token) on any network-exposed deployment. A presented `cak_` is always validated (invalid → 401, Mongo-down → 503), never waved through.
+**Behaviour:** when an ingest URL resolves (from `AGENTOS_DISCOVERY_URL` or an `AGENTOS_INGEST_URL` override), the SDK's default telemetry pipeline auto-attaches `AgentOSHttpSink` (gated on the `[agentos]` extra, which is now `httpx`-based). Each event carries a stable `event_id` so the server's writes are idempotent on retry. **Ingest auth** (`ingest-auth.ts`): the SDK presents the **same `cak_` API key it uses everywhere** as the ingest `Bearer`, and the server validates it via `apiKeyStore.verify` (the same path the CAS introspection uses). A legacy static `AGENTOS_INGEST_TOKEN` string is still accepted for back-compat. ⚠️ Only when **no `cak_` is presented AND `AGENTOS_INGEST_TOKEN` is unset** does the route fall **open** (anonymous writes) — present a key (or set the token) on any network-exposed deployment. A presented `cak_` is always validated (invalid → 401, Mongo-down → 503), never waved through.
 
 ---
 
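
The idempotency claim in the **Behaviour** paragraph depends on the server treating `event_id` as a unique key, so a retried batch is a no-op for events already stored. A minimal in-memory sketch of those semantics, assuming insert-if-absent keyed on `event_id`; `IngestEvent` and `EventStore` are hypothetical names. A Mongo-backed store would typically get the same effect from a unique index on `event_id` plus `updateOne(filter, { $setOnInsert: doc }, { upsert: true })`.

```typescript
// Hypothetical sketch: idempotent ingest keyed on the client-generated event_id.
interface IngestEvent {
  event_id: string; // stable per event; the SDK reuses it when retrying a batch
  [field: string]: unknown; // the rest of the event is opaque to this sketch
}

class EventStore {
  private readonly byId = new Map<string, IngestEvent>();

  // Insert each event only if its event_id is unseen; returns how many were new.
  insertBatch(events: readonly IngestEvent[]): number {
    let inserted = 0;
    for (const e of events) {
      if (this.byId.has(e.event_id)) continue; // retry or duplicate: no-op
      this.byId.set(e.event_id, e);
      inserted++;
    }
    return inserted;
  }

  get size(): number {
    return this.byId.size;
  }
}

// A retried batch writes nothing new.
const store = new EventStore();
const batch = [{ event_id: "e1", kind: "step" }, { event_id: "e2", kind: "step" }];
console.log(store.insertBatch(batch), store.insertBatch(batch), store.size); // 2 0 2
```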
@@ -147,7 +147,20 @@ Resources stamped with `ownerGroup`/`ownerUser` are **hard-isolated**: a non-adm
 
 **Required env:**
 - Server: `AGENTOS_CREDENTIALS_KEY` (base64 of 32 random bytes; **fail-closed** — credentials CRUD/resolve 503 without it). Optional `AGENTOS_CREDENTIALS_KEY_OLD` for rotation.
-- SDK: `AGENTOS_API_URL` (e.g. `https://<host>/agentos/api/v1`) + the same `cak_` key it already uses (`COMPUTERAGENT_HARNESS_TOKEN` / `AGENTOS_INGEST_TOKEN`). The key's role must include `git-credentials:read`.
+- SDK: `AGENTOS_DISCOVERY_URL` (the SDK reads the api endpoint from it; `AGENTOS_API_URL` overrides) + the same `cak_` key it already uses (`COMPUTERAGENT_HARNESS_TOKEN`). The key's role must include `git-credentials:read`.
+
+---
+
+### 2.6d AgentOS discovery — one SDK config, server-owned endpoint paths
+
+> So the SDK never hardcodes an endpoint path: change a route server-side and existing SDKs follow without a release. The OIDC `.well-known` pattern. Code: `routes/discovery.ts` (server), `computeragent/agentos.py` (SDK).
+
+- **Server.** `GET /agentos/api/discovery` is **public** (URLs only, no secrets), mounted before the auth guards (`app.ts`). Returns `{ version, endpoints: { ingest, api, otel, messages } }` with **absolute** URLs built from the request origin (honours `X-Forwarded-Proto`/`-Host`; pin with `AGENTOS_PUBLIC_URL`). The endpoint **paths live here only** — the single source of truth — kept in sync with the `app.ts`/`dashboard.ts` mounts.
+- **SDK.** A consumer sets just **`AGENTOS_DISCOVERY_URL`** (the full discovery URL) + the `cak_` key. `agentos.py` GETs the document once (memoized) and resolves ingest / api / otel from it — reading the **absolute URLs verbatim**, never constructing a path. Resolution order per endpoint: explicit full-URL override env (`AGENTOS_INGEST_URL`/`AGENTOS_API_URL`/`AGENTOS_OTEL_URL`) → discovery → unconfigured (no baked fallback). The remote-harness URL is **not** discovered (separate service; `harness_url=` / `COMPUTERAGENT_HARNESS_URL`).
+
+**Required env:**
+- Server: none (derives the origin from the request; optional `AGENTOS_PUBLIC_URL` to pin it behind a proxy).
+- SDK: `AGENTOS_DISCOVERY_URL` + `COMPUTERAGENT_HARNESS_TOKEN` (the `cak_`).
 
 ---
 
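
The per-endpoint resolution order in §2.6d (full-URL override, then discovery, then unconfigured) can be sketched in a few lines. The real resolver is Python (`computeragent/agentos.py`); this TypeScript version only illustrates the precedence and the fetch-once memoization. It assumes an `env` record in place of the process environment and a synchronous `fetchDoc` stand-in for the HTTP GET; `makeResolver`, `fetchDoc`, `Discovery`, and `EndpointName` are hypothetical.

```typescript
// Hypothetical sketch of the §2.6d precedence:
// full-URL override env -> discovery document -> unconfigured (no baked-in path).
type EndpointName = "ingest" | "api" | "otel";

interface Discovery {
  version: string;
  endpoints: Partial<Record<string, string>>;
}

const OVERRIDE_ENV: Record<EndpointName, string> = {
  ingest: "AGENTOS_INGEST_URL",
  api: "AGENTOS_API_URL",
  otel: "AGENTOS_OTEL_URL",
};

function makeResolver(
  env: Record<string, string | undefined>,
  fetchDoc: (url: string) => Discovery, // synchronous stand-in for GET AGENTOS_DISCOVERY_URL
): (name: EndpointName) => string | undefined {
  let cached: Discovery | null | undefined; // undefined = not fetched yet, null = no discovery URL

  const discovery = (): Discovery | null => {
    if (cached === undefined) {
      const url = env["AGENTOS_DISCOVERY_URL"];
      cached = url ? fetchDoc(url) : null; // fetched at most once (memoized)
    }
    return cached;
  };

  return (name) => {
    const override = env[OVERRIDE_ENV[name]];
    if (override) return override; // 1. explicit full URL wins
    return discovery()?.endpoints[name]; // 2. absolute URL used verbatim; 3. else undefined
  };
}

// Example: ingest pinned, api from discovery, otel unconfigured.
const resolve = makeResolver(
  { AGENTOS_DISCOVERY_URL: "https://h/agentos/api/discovery", AGENTOS_INGEST_URL: "https://pinned/ingest" },
  () => ({ version: "1", endpoints: { api: "https://h/agentos/api/v1" } }),
);
console.log(resolve("ingest"), resolve("api"), resolve("otel"));
// https://pinned/ingest https://h/agentos/api/v1 undefined
```

In this sketch an override short-circuits before the document is fetched, so a deployment that pins every endpoint never contacts the discovery URL at all.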
@@ -317,12 +330,16 @@ GITCLAW_MODEL_BASE_URL=https://api.lyzr.ai/v1
 OPENAI_API_KEY=sk-...
 
 # AgentOS persistence — SDK POSTs telemetry to the server; the server writes Mongo.
-# On the SDK (library/worker) side:
-AGENTOS_INGEST_URL=https://<agentos-host>/agentos/api/ingest/events
-AGENTOS_INGEST_TOKEN=<shared-secret> # optional; must match the server's
-AGENTOS_API_URL=https://<agentos-host>/agentos/api/v1 # for private-GAP credential resolve (§2.6c)
-COMPUTERAGENT_HARNESS_TOKEN=cak_... # the AgentOS API key the SDK presents (role needs git-credentials:read)
+# On the SDK (library/worker) side — ONE discovery URL + the cak_ key (§2.6d):
+AGENTOS_DISCOVERY_URL=https://<agentos-host>/agentos/api/discovery # SDK reads ingest/api/otel from it
+COMPUTERAGENT_HARNESS_TOKEN=cak_... # the AgentOS API key the SDK presents everywhere (role needs git-credentials:read)
+# Optional full-URL overrides (win over discovery; for pinning one endpoint):
+# AGENTOS_INGEST_URL=https://<agentos-host>/agentos/api/ingest/events
+# AGENTOS_API_URL=https://<agentos-host>/agentos/api/v1
+# AGENTOS_OTEL_URL=https://<agentos-host>/agentos/api/otel
+# COMPUTERAGENT_HARNESS_URL=https://<harness-host> # remote harness/CAS (separate service, not discovered)
 # On the agentos-server side (NOT the SDK):
+# AGENTOS_PUBLIC_URL=https://<agentos-host> # optional — pin the origin in discovery URLs behind a proxy
 MONGO_URL=mongodb+srv://user:pass@cluster.mongodb.net
 MONGO_DATABASE=computeragent
 # AgentOS auth / RBAC (§2.6b) — SSO via Keycloak (Okta brokered), DB-backed roles:
@@ -586,8 +603,10 @@ kustomize edit set image \
 | AgentOS auth / OIDC / BFF + refresh | `packages/agentos-server/src/auth/{oidc,authenticate,authorize,ownership,keycloak-admin}.ts`, `routes/auth.ts` |
 | Permission catalog + role seeds | `packages/agentos-server/src/auth/permissions.ts`, `stores/role-store.ts` |
 | Route composition / trust boundaries | `packages/agentos-server/src/app.ts`, `routes/dashboard.ts` |
+| AgentOS discovery (one SDK config, server-owned paths) | `packages/agentos-server/src/routes/discovery.ts` (`GET /agentos/api/discovery`, public, absolute URLs); SDK resolver `computeragent-py/src/computeragent/agentos.py` (`AGENTOS_DISCOVERY_URL`) — §2.6d |
 | Anthropic model gateway (cak_-authed proxy) | `packages/agentos-server/src/routes/messages.ts` (`POST /agentos/api/v1/messages`); smoke `computeragent-smoke/scripts/07_local_via_agentos_gateway.py` |
 | Run-an-agent-by-id (resolve handle) | `packages/agentos-server/src/routes/agents.ts` (`POST /agentos/api/v1/agents/resolve`, by `agentId`, group-scoped); SDK `computeragent-py/src/computeragent/harness/agent_resolve_client.py`; smoke `…/scripts/08_*.py` |
+| OTLP trace gateway (cak_-authed forwarder) | `packages/agentos-server/src/routes/otlp.ts` (`POST /agentos/api/otel/v1/{traces,metrics,logs}`) + `otel-auth.ts` + shared `cak-bearer.ts`. Env: `OTEL_GATEWAY_UPSTREAM_ENDPOINT`/`_HEADERS` (server-held vendor/collector creds). SDK side: `AGENTOS_OTEL_URL` → `OtelSink` auto-injects the `cak_` Bearer (`telemetry/sinks/otel.py`); smoke `…/scripts/09_*.py` |
 | Git-credential store + resolve endpoint | `packages/agentos-server/src/crypto/secret-box.ts`, `stores/git-credential-store.ts`, `routes/git-credentials.ts` |
 | SDK private-repo clone (PAT + SHA) | `computeragent-py/src/computeragent/harness/git_credential_client.py`, `substrates/local.py` |
 | Keycloak provisioning script | `packages/agentos-server/scripts/provision-keycloak.mjs` (`pnpm provision:keycloak`) |
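
The OTLP gateway row says the upstream credentials live in `OTEL_GATEWAY_UPSTREAM_HEADERS`, but this page does not show that variable's format. A minimal parsing sketch, assuming it follows the `OTEL_EXPORTER_OTLP_HEADERS` convention from the OpenTelemetry specification (comma-separated `key=value` pairs with URL-encoded values); `parseHeaderList` is a hypothetical name.

```typescript
// Hypothetical sketch: parse "k1=v1,k2=v2" (OTEL_EXPORTER_OTLP_HEADERS style) into a header map.
// Assumes OTEL_GATEWAY_UPSTREAM_HEADERS uses this format; the commit does not say so.
function parseHeaderList(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip entries with no "=" or an empty key
    // Header names are case-insensitive, so normalise to lower case.
    const key = pair.slice(0, eq).trim().toLowerCase();
    const value = decodeURIComponent(pair.slice(eq + 1).trim()); // values are URL-encoded
    if (key) headers[key] = value;
  }
  return headers;
}

console.log(parseHeaderList("x-api-key=abc%3D%3D,x-tenant=acme"));
// { 'x-api-key': 'abc==', 'x-tenant': 'acme' }
```

A forwarder built on this would presumably also drop the caller's `Authorization: Bearer cak_…` before attaching these headers, so the AgentOS key never reaches the vendor; whether `routes/otlp.ts` does exactly that is not visible here. Note that `decodeURIComponent` throws a `URIError` on a malformed `%` escape.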

packages/agentos-server/src/app.smoke.test.ts

Lines changed: 22 additions & 0 deletions
@@ -51,6 +51,16 @@ describe("app routing + auth gate", () => {
     expect(j.error.code).toBe("UNAUTHENTICATED");
   });
 
+  it("401s on the OTLP trace gateway when unauthenticated (cak_-gated, fail-closed)", async () => {
+    delete process.env["AGENTOS_DEV_AUTH"];
+    const r = await fetch(`${base}/agentos/api/otel/v1/traces`, {
+      method: "POST",
+      headers: { "content-type": "application/x-protobuf" },
+      body: Buffer.from("x"),
+    });
+    expect(r.status).toBe(401);
+  });
+
   it("401s on the agent-resolve gateway when unauthenticated (cak_-gated)", async () => {
     delete process.env["AGENTOS_DEV_AUTH"];
     const r = await fetch(`${base}/agentos/api/v1/agents/resolve`, {
@@ -67,6 +77,18 @@ describe("app routing + auth gate", () => {
     expect(r.status).toBe(401);
   });
 
+  it("GET /agentos/api/discovery is public and returns absolute endpoint URLs", async () => {
+    delete process.env["AGENTOS_DEV_AUTH"]; // no auth — discovery is public
+    const r = await fetch(`${base}/agentos/api/discovery`);
+    expect(r.status).toBe(200);
+    const j = (await r.json()) as { version: string; endpoints: Record<string, string> };
+    expect(j.version).toBe("1");
+    // Absolute URLs (origin derived from the request) so the SDK does no path math.
+    expect(j.endpoints.ingest).toBe(`${base}/agentos/api/ingest/events`);
+    expect(j.endpoints.api).toBe(`${base}/agentos/api/v1`);
+    expect(j.endpoints.otel).toBe(`${base}/agentos/api/otel`);
+  });
+
   it("login redirects to Keycloak only when OIDC is configured (else 503)", async () => {
     delete process.env["AGENTOS_DEV_AUTH"];
     const r = await fetch(`${base}/agentos/api/v1/auth/login`, { redirect: "manual" });
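
These smoke tests pin the contract the SDK relies on: every endpoint is an absolute URL used verbatim. A client could enforce the same contract before trusting a fetched document. A minimal sketch using the standard WHATWG `URL` parser, whose constructor throws on a relative path when no base is given; `parseDiscovery` and the http/https-only rule are assumptions, not part of this commit.

```typescript
// Hypothetical client-side check of a fetched discovery document.
interface DiscoveryDoc {
  version: string;
  endpoints: Record<string, string>;
}

function parseDiscovery(body: unknown): DiscoveryDoc {
  if (typeof body !== "object" || body === null) throw new Error("discovery: not an object");
  const { version, endpoints } = body as { version?: unknown; endpoints?: unknown };
  if (typeof version !== "string") throw new Error("discovery: missing version");
  if (typeof endpoints !== "object" || endpoints === null) throw new Error("discovery: missing endpoints");

  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(endpoints)) {
    if (typeof value !== "string") throw new Error(`discovery: ${name} is not a string`);
    // new URL() without a base throws on relative paths, which enforces "absolute only".
    const url = new URL(value);
    if (url.protocol !== "https:" && url.protocol !== "http:") {
      throw new Error(`discovery: ${name} has unsupported scheme ${url.protocol}`);
    }
    out[name] = value; // keep the server's string verbatim; no path math on the client
  }
  return { version, endpoints: out };
}

const d = parseDiscovery({ version: "1", endpoints: { ingest: "https://h/agentos/api/ingest/events" } });
console.log(d.endpoints["ingest"]); // https://h/agentos/api/ingest/events
```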

packages/agentos-server/src/app.ts

Lines changed: 14 additions & 0 deletions
@@ -14,8 +14,11 @@ import cors from "cors";
 
 import { requireIngestAuth } from "./ingest-auth.js";
 import { requireIntrospectionAuth } from "./introspection-auth.js";
+import { requireOtelAuth } from "./otel-auth.js";
+import { discoveryRouter } from "./routes/discovery.js";
 import { ingestRouter } from "./routes/ingest.js";
 import { keysIntrospectRouter } from "./routes/keys-introspect.js";
+import { otlpRouter } from "./routes/otlp.js";
 import { mountDashboard } from "./routes/dashboard.js";
 import { mountObs } from "./routes/obs.js";
 import { errorHandler } from "./http/error-handler.js";
@@ -31,11 +34,22 @@ export function buildApp(): Express {
     .filter(Boolean);
   app.use(cors({ origin: corsOrigins.length ? corsOrigins : false, credentials: true }));
 
+  // ── DISCOVERY — public, no auth, no secrets (OIDC `.well-known` style). The
+  // SDK is given only this URL + its cak_ key and reads every other endpoint's
+  // absolute URL from here, so route changes never force an SDK release. No body
+  // parser (GET only). Mounted first so it's reachable regardless of the guards. ──
+  app.use("/agentos/api", discoveryRouter);
+
   // ── SERVICE — machine-to-machine, own guards, BEFORE the global json/cookie ──
   // Ingest takes a larger batch body; introspection is tiny. Both bypass the
   // dashboard's cookie auth and use their own bearer guards.
   app.use("/agentos/api/ingest", express.json({ limit: "5mb" }), requireIngestAuth, ingestRouter);
   app.use("/agentos/api/keys", express.json({ limit: "16kb" }), requireIntrospectionAuth, keysIntrospectRouter);
+  // OTLP trace gateway — cak_-authed, forwards the binary OTLP body (protobuf,
+  // maybe gzip) verbatim to the upstream backend. NO body parser: the router
+  // reads the raw stream itself so bytes pass through untouched regardless of
+  // content-encoding (routes/otlp.ts).
+  app.use("/agentos/api/otel", requireOtelAuth, otlpRouter);
 
   app.use(express.json({ limit: "1mb" }));
   app.use(cookieParser());
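
The OTLP mount skips every body parser so `routes/otlp.ts` can read the raw stream itself, and the commit message names a 16 MB cap. A minimal sketch of a capped raw read, assuming the body arrives as an async iterable of byte chunks (a Node readable stream such as an Express `req` is one); `readRawCapped` and `BodyTooLargeError` are hypothetical, and the real router may enforce the cap differently, for example while streaming straight to the upstream.

```typescript
// Hypothetical sketch: buffer a raw request body without parsing it, enforcing a size cap.
class BodyTooLargeError extends Error {
  readonly limit: number;
  constructor(limit: number) {
    super(`request body exceeds ${limit} bytes`);
    this.name = "BodyTooLargeError";
    this.limit = limit;
  }
}

async function readRawCapped(
  body: AsyncIterable<Uint8Array>,
  limit = 16 * 1024 * 1024, // 16 MB, the cap named in the commit message
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of body) {
    total += chunk.byteLength;
    // Stop reading as soon as the cap is crossed; a caller might answer 413.
    if (total > limit) throw new BodyTooLargeError(limit);
    chunks.push(chunk);
  }
  // Concatenate without decoding, so protobuf or gzip bytes stay byte-for-byte identical.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}

// In an Express handler (sketch): const bytes = await readRawCapped(req);
```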

packages/agentos-server/src/cak-bearer.ts

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+// Shared `Bearer cak_…` validation — the primitive behind the machine-to-machine
+// guards (ingest, OTLP gateway). Each guard wraps this with its own policy
+// (ingest falls open / accepts a legacy token; the OTLP gateway is fail-closed).
+
+import { apiKeyStore, KEY_PREFIX } from "./stores/api-key-store.js";
+
+export type CakResult =
+  | "valid" // a live, non-revoked, non-expired cak_ key
+  | "invalid" // a cak_-shaped token that isn't a known active key
+  | "not-cak" // no `Bearer cak_…` present (caller decides: legacy token / open / 401)
+  | "error"; // the store (Mongo) couldn't be reached — caller should fail 503
+
+/** Validate the `Authorization` header's `Bearer cak_…` against `api_keys`. */
+export async function verifyCakBearer(authHeader: string | undefined): Promise<CakResult> {
+  const m = /^Bearer\s+(.+)$/i.exec(authHeader ?? "");
+  const presented = m ? m[1]!.trim() : "";
+  if (!presented.startsWith(KEY_PREFIX)) return "not-cak";
+  try {
+    return (await apiKeyStore.verify(presented)) ? "valid" : "invalid";
+  } catch {
+    return "error";
+  }
+}
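
Because `verifyCakBearer` imports `apiKeyStore` directly, exercising its four outcomes means stubbing Mongo. One alternative (not what this commit does) is to inject the verifier, which keeps the header parsing and outcome mapping unit-testable in isolation. A sketch under that assumption: `classifyCakBearer` and `VerifyFn` are hypothetical, `VerifyFn` stands in for `apiKeyStore.verify`, and the `"cak_"` value of `KEY_PREFIX` is assumed rather than shown here.

```typescript
// Hypothetical dependency-injected variant of verifyCakBearer(), for unit tests without Mongo.
type CakResult = "valid" | "invalid" | "not-cak" | "error";
type VerifyFn = (key: string) => Promise<unknown>; // truthy result = active key

const KEY_PREFIX = "cak_"; // assumed value of the real KEY_PREFIX export

async function classifyCakBearer(authHeader: string | undefined, verify: VerifyFn): Promise<CakResult> {
  // Same parsing as the commit: case-insensitive "Bearer", any whitespace, trimmed token.
  const m = /^Bearer\s+(.+)$/i.exec(authHeader ?? "");
  const presented = m ? m[1]!.trim() : "";
  if (!presented.startsWith(KEY_PREFIX)) return "not-cak";
  try {
    return (await verify(presented)) ? "valid" : "invalid";
  } catch {
    return "error"; // store unreachable: callers map this to 503, not 401
  }
}

// Example with an in-memory stub in place of the key store.
const stub: VerifyFn = async (key) => key === "cak_live";
classifyCakBearer("Bearer cak_live", stub).then((r) => console.log(r)); // valid
```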

packages/agentos-server/src/ingest-auth.ts

Lines changed: 11 additions & 14 deletions
@@ -19,32 +19,29 @@
 
 import type { RequestHandler } from "express";
 import { timingSafeEqual } from "node:crypto";
-import { apiKeyStore, KEY_PREFIX } from "./stores/api-key-store.js";
+import { verifyCakBearer } from "./cak-bearer.js";
 
 function unauthenticated(res: Parameters<RequestHandler>[1]): void {
   res.status(401).json({ error: { code: "UNAUTHENTICATED" } });
 }
 
 export const requireIngestAuth: RequestHandler = async (req, res, next) => {
   const header = req.header("authorization") ?? "";
-  const prefix = "Bearer ";
-  const presented = header.startsWith(prefix) ? header.slice(prefix.length) : "";
 
   // (1) API key — the same `cak_` key the SDK uses everywhere.
-  if (presented.startsWith(KEY_PREFIX)) {
-    try {
-      const result = await apiKeyStore.verify(presented);
-      if (result) return next();
-      return unauthenticated(res); // recognized shape, but inactive/revoked/unknown
-    } catch (err) {
-      // Validation infra (Mongo) is down — fail closed, but distinguish from a
-      // bad key so the caller can retry rather than treating it as a 401.
-      console.warn("[agentos-server] ingest key verification failed:", (err as Error).message);
-      return res.status(503).json({ error: { code: "KEY_VERIFICATION_UNAVAILABLE" } });
-    }
+  const cak = await verifyCakBearer(header);
+  if (cak === "valid") return next();
+  if (cak === "invalid") return unauthenticated(res); // cak_-shaped but inactive/unknown
+  if (cak === "error") {
+    // Validation infra (Mongo) is down — fail closed, distinct from a bad key.
+    console.warn("[agentos-server] ingest key verification unavailable");
+    return res.status(503).json({ error: { code: "KEY_VERIFICATION_UNAVAILABLE" } });
   }
+  // cak === "not-cak" → fall through to the legacy/open paths.
 
   // (2) Back-compat: legacy static shared ingest token.
+  const prefix = "Bearer ";
+  const presented = header.startsWith(prefix) ? header.slice(prefix.length) : "";
   const expected = process.env["AGENTOS_INGEST_TOKEN"];
   if (expected) {
     if (presented) {
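
The hunk stops inside the legacy-token branch. Going by the policy spelled out in the CLAUDE.md hunk (a valid `cak_` passes, an invalid one is 401, a store failure is 503, a matching legacy `AGENTOS_INGEST_TOKEN` passes, and the route falls open only when no `cak_` is presented and no token is configured), the whole decision can be sketched as a pure function. `ingestDecision`, `tokensMatch`, and `Decision` are hypothetical; the legacy comparison uses `timingSafeEqual` because `ingest-auth.ts` imports it, and returning 401 for a missing or mismatched legacy token is an assumption about the lines not shown.

```typescript
import { timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

type CakResult = "valid" | "invalid" | "not-cak" | "error";
type Decision = { allow: true } | { allow: false; status: 401 | 503 };

// Constant-time comparison; timingSafeEqual throws on unequal lengths, so check that first.
function tokensMatch(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical pure restatement of the documented ingest policy.
function ingestDecision(cak: CakResult, presented: string, legacyToken: string | undefined): Decision {
  if (cak === "valid") return { allow: true };
  if (cak === "invalid") return { allow: false, status: 401 };
  if (cak === "error") return { allow: false, status: 503 };
  // "not-cak": fall back to the legacy static token, if one is configured.
  if (legacyToken) {
    return presented && tokensMatch(presented, legacyToken) ? { allow: true } : { allow: false, status: 401 };
  }
  return { allow: true }; // no cak_ presented and no token configured: open (anonymous writes)
}

console.log(ingestDecision("not-cak", "", undefined)); // { allow: true }
```

Expressing the policy this way makes the fall-open case a single, easily audited line, which is the one the ⚠️ note in CLAUDE.md warns about.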

packages/agentos-server/src/otel-auth.ts

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+// Machine-to-machine auth for the OTLP trace gateway (routes/otlp.ts).
+//
+// Unlike ingest (which falls open / accepts a legacy token), the OTLP gateway is
+// FAIL-CLOSED: it forwards to a vendor/collector using a server-held credential,
+// so an unauthenticated caller must never get through. The SDK presents the same
+// `cak_` key it uses everywhere as `Authorization: Bearer` (set via the OTLP
+// exporter headers). Missing/invalid → 401; store unreachable → 503.
+
+import type { RequestHandler } from "express";
+import { verifyCakBearer } from "./cak-bearer.js";
+
+export const requireOtelAuth: RequestHandler = async (req, res, next) => {
+  const cak = await verifyCakBearer(req.header("authorization"));
+  if (cak === "valid") return next();
+  if (cak === "error") {
+    return res.status(503).json({ error: { code: "KEY_VERIFICATION_UNAVAILABLE" } });
+  }
+  // "invalid" or "not-cak" → reject (fail-closed; no legacy/open fallback).
+  return res.status(401).json({ error: { code: "UNAUTHENTICATED" } });
+};
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+// Unit test for the discovery document. Mounts the router in isolation and
+// drives it via fetch. The absolute URLs must reflect the request origin
+// (honouring X-Forwarded-* behind a proxy) or the AGENTOS_PUBLIC_URL override.
+
+import { describe, it, expect, beforeEach, afterEach } from "vitest";
+import express from "express";
+import type { Server } from "node:http";
+
+import { discoveryRouter } from "./discovery.js";
+
+let server: Server;
+let base = "";
+
+beforeEach(async () => {
+  const app = express();
+  app.use("/agentos/api", discoveryRouter);
+  await new Promise<void>((r) => {
+    server = app.listen(0, "127.0.0.1", () => r());
+  });
+  const addr = server.address();
+  base = `http://127.0.0.1:${typeof addr === "object" && addr ? addr.port : 0}`;
+});
+
+afterEach(() => {
+  server?.close();
+  delete process.env["AGENTOS_PUBLIC_URL"];
+});
+
+describe("discovery document", () => {
+  it("returns absolute URLs from the request origin + a cache header", async () => {
+    const r = await fetch(`${base}/agentos/api/discovery`);
+    expect(r.status).toBe(200);
+    expect(r.headers.get("cache-control")).toContain("max-age=300");
+    const j = (await r.json()) as { version: string; endpoints: Record<string, string> };
+    expect(j.version).toBe("1");
+    expect(j.endpoints.ingest).toBe(`${base}/agentos/api/ingest/events`);
+    expect(j.endpoints.api).toBe(`${base}/agentos/api/v1`);
+    expect(j.endpoints.otel).toBe(`${base}/agentos/api/otel`);
+    expect(j.endpoints.messages).toBe(`${base}/agentos/api/v1/messages`);
+  });
+
+  it("honours X-Forwarded-Proto/Host so URLs are correct behind an ingress", async () => {
+    const r = await fetch(`${base}/agentos/api/discovery`, {
+      headers: { "x-forwarded-proto": "https", "x-forwarded-host": "agentos.example.com" },
+    });
+    const j = (await r.json()) as { endpoints: Record<string, string> };
+    expect(j.endpoints.ingest).toBe("https://agentos.example.com/agentos/api/ingest/events");
+    expect(j.endpoints.otel).toBe("https://agentos.example.com/agentos/api/otel");
+  });
+
+  it("AGENTOS_PUBLIC_URL pins the origin and trims a trailing slash", async () => {
+    process.env["AGENTOS_PUBLIC_URL"] = "https://canonical.example.com/";
+    const r = await fetch(`${base}/agentos/api/discovery`, {
+      headers: { "x-forwarded-host": "ignored.example.com" },
+    });
+    const j = (await r.json()) as { endpoints: Record<string, string> };
+    expect(j.endpoints.api).toBe("https://canonical.example.com/agentos/api/v1");
+  });
+});
