arabold · Jun 6, 2026
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -92,12 +92,23 @@ jobs:
           npm version "$version" --no-git-tag-version
           npm publish --access public
 
+      - name: Publish companion package (manual)
+        if: steps.release-mode.outputs.manual_release == 'true'
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+        run: |
+          version='${{ steps.release-mode.outputs.release_version }}'
+          # Keep the optional Transformers.js companion in lockstep with the main package so a
+          # compatible version always exists on npm. prepack builds dist before publishing.
+          npm version "$version" --no-git-tag-version --workspace @arabold/docs-mcp-server-transformers
+          npm publish --workspace @arabold/docs-mcp-server-transformers
+
       - name: Commit manual version bump
         if: steps.release-mode.outputs.manual_release == 'true'
         run: |
           git config user.name "github-actions[bot]"
           git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
-          git add package.json package-lock.json
+          git add package.json package-lock.json packages/transformers/package.json
           git commit -m "chore(release): ${{ steps.release-mode.outputs.release_version }} [skip ci]"
           git push origin "HEAD:${GITHUB_REF_NAME}"
 

diff --git a/.releaserc.json b/.releaserc.json
@@ -9,6 +9,13 @@
         "changelogFile": "CHANGELOG.md"
       }
     ],
+    [
+      "@semantic-release/npm",
+      {
+        "npmPublish": true,
+        "pkgRoot": "packages/transformers"
+      }
+    ],
     [
       "@semantic-release/npm",
       {
@@ -19,7 +26,11 @@
     [
       "@semantic-release/git",
       {
-        "assets": ["package.json", "CHANGELOG.md"],
+        "assets": [
+          "package.json",
+          "packages/transformers/package.json",
+          "CHANGELOG.md"
+        ],
         "message": "chore(release): ${nextRelease.version} [skip ci]\n\n${nextRelease.notes}"
       }
     ],

diff --git a/Dockerfile b/Dockerfile
@@ -15,11 +15,12 @@
 FROM base AS builder

 # Accept build argument for PostHog API key
 ARG POSTHOG_API_KEY
 ENV POSTHOG_API_KEY=$POSTHOG_API_KEY
 
-# Copy package files
+# Copy package files (root + workspaces so `npm ci` can link the companion package)
 COPY package*.json ./
+COPY packages/transformers/package.json ./packages/transformers/package.json
 
 # Install all dependencies (including dev dependencies for building)
 RUN npm ci
@@ -51,21 +52,27 @@
 COPY --from=builder /app/node_modules ./node_modules
 COPY --from=builder /app/public ./public
 COPY --from=builder /app/dist ./dist
+# Include the built Transformers.js companion package so the workspace symlink in
+# node_modules resolves and local (offline) embeddings work out of the box.
+COPY --from=builder /app/packages ./packages
 
 # Set data directory for the container
 ENV DOCS_MCP_STORE_PATH=/data
 ENV XDG_CONFIG_HOME=/config
+# Cache directory for Transformers.js models (downloaded on first use).
+ENV TRANSFORMERS_CACHE=/models
 
 # Create the writable runtime directories and hand ownership to the
 # unprivileged `node` user that ships with the base image (uid 1000).
 # `/app` is intentionally left root-owned so the runtime user cannot
 # tamper with code or `node_modules` if it is ever compromised.
-RUN mkdir -p /data /config \
-  && chown node:node /data /config
+RUN mkdir -p /data /config /models \
+  && chown node:node /data /config /models
 
 # Define volumes
 VOLUME /data
 VOLUME /config
+VOLUME /models
 
 # Expose the default port of the application
 EXPOSE 6280

diff --git a/README.md b/README.md
@@ -130,7 +130,7 @@ Using an embedding model is **optional** but dramatically improves search qualit
 OPENAI_API_KEY="sk-proj-..." npx @arabold/docs-mcp-server@latest
 ```
 
-See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ollama**, **Gemini**, **Azure**, and others.
+See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ollama**, **Gemini**, **Azure**, fully **offline/local** embeddings, and others.
 
 ---
 
@@ -142,7 +142,7 @@ See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ol
 -   **[Basic Usage](docs/guides/basic-usage.md)**: Using the Web UI, CLI, and scraping local files.
 -   **[Configuration](docs/setup/configuration.md)**: Full reference for config files and environment variables.
 -   **[Supported Formats](docs/concepts/supported-formats.md)**: Complete file format and MIME type reference.
--   **[Embedding Models](docs/guides/embedding-models.md)**: Configure OpenAI, Ollama, Gemini, and other providers.
+-   **[Embedding Models](docs/guides/embedding-models.md)**: Configure OpenAI, Ollama, Gemini, local/offline, and other providers.
 -   **[Search Quality Benchmark](docs/guides/benchmarking.md)**: Measure retrieval quality with IR metrics + LLM-judged scores; prerequisites, how to run, how to interpret results.
 
 ### Hash-Routed SPAs

diff --git a/docs/guides/embedding-models.md b/docs/guides/embedding-models.md
@@ -14,6 +14,7 @@ If you leave the model empty but provide `OPENAI_API_KEY`, the server defaults t
 - `gemini:embedding-001` (Google Gemini)
 - `aws:amazon.titan-embed-text-v1` (AWS Bedrock)
 - `microsoft:text-embedding-ada-002` (Azure OpenAI)
+- `transformers:BAAI/bge-small-en-v1.5` (local, offline — requires the optional companion package)
 - Or any OpenAI-compatible model name
 
 ## Provider Configuration
@@ -34,6 +35,8 @@ Provider credentials use the provider-specific environment variables listed belo
 | `AZURE_OPENAI_API_INSTANCE_NAME`   | Azure OpenAI instance name.                           |
 | `AZURE_OPENAI_API_DEPLOYMENT_NAME` | Azure OpenAI deployment name.                         |
 | `AZURE_OPENAI_API_VERSION`         | Azure OpenAI API version.                             |
+| `TRANSFORMERS_DEVICE`              | Device for local embeddings: `cpu` (default) or `webgpu`. |
+| `TRANSFORMERS_CACHE`               | Directory for caching downloaded local models.        |
 
 ### Examples
 
@@ -114,7 +117,29 @@ DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
 npx @arabold/docs-mcp-server@latest
 ```
 
-## Changing the Embedding Model
+#### Local / Offline (Transformers.js)
+
+Generate embeddings entirely on your machine: no API key, no network calls at inference time, and no data leaving your host. This uses [Transformers.js](https://huggingface.co/docs/transformers.js) with the ONNX runtime.
+
+Because the runtime is large, it ships as an optional **companion package** that you install alongside the server:
+
+```bash
+npm install -g @arabold/docs-mcp-server @arabold/docs-mcp-server-transformers
+
+DOCS_MCP_EMBEDDING_MODEL="transformers:BAAI/bge-small-en-v1.5" \
+docs-mcp-server
+```
+
+Or run both with `npx` in a single command:
+
+```bash
+DOCS_MCP_EMBEDDING_MODEL="transformers:BAAI/bge-small-en-v1.5" \
+npx -p @arabold/docs-mcp-server -p @arabold/docs-mcp-server-transformers docs-mcp-server
+```
+
+The model is downloaded on first use and cached (set `TRANSFORMERS_CACHE` to choose the directory). Any sentence-transformers model on Hugging Face that provides ONNX weights should work; the vector dimension is detected automatically. Set `TRANSFORMERS_DEVICE=webgpu` to enable GPU acceleration on supported hardware.
+
+> The official Docker image already bundles the companion package, so `transformers:` models work out of the box there with no extra install.
 
 When you change the embedding model or vector dimension after initial setup, existing embedding vectors become semantically incompatible with the new configuration. The server detects this automatically by tracking the active model identity in a metadata table.
 

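Review note on the `docs/guides/embedding-models.md` changes above: the `provider:model` syntax and the two new environment variables can be illustrated with a short TypeScript sketch. This is not the server's actual parsing code. The function names are hypothetical, and both the fallback to `openai` for unprefixed names and the rejection of unknown device values are assumptions made for illustration.

```typescript
// Hypothetical helper types; the real server may model this differently.
interface EmbeddingModelSpec {
  provider: string;
  model: string;
}

type TransformersDevice = "cpu" | "webgpu";

// Split on the FIRST colon only, so model names that contain further
// colons or slashes (e.g. "BAAI/bge-small-en-v1.5") stay intact.
function parseEmbeddingModelSpec(spec: string): EmbeddingModelSpec {
  const idx = spec.indexOf(":");
  if (idx === -1) {
    // Assumption: an unprefixed name is treated as an OpenAI-compatible model.
    return { provider: "openai", model: spec };
  }
  return { provider: spec.slice(0, idx), model: spec.slice(idx + 1) };
}

// Read the two documented variables; `cpu` is the documented default device.
function readTransformersEnv(
  env: Record<string, string | undefined>,
): { device: TransformersDevice; cacheDir: string | undefined } {
  const raw = env["TRANSFORMERS_DEVICE"] ?? "cpu";
  if (raw === "cpu" || raw === "webgpu") {
    return { device: raw, cacheDir: env["TRANSFORMERS_CACHE"] };
  }
  // Assumption: unknown values are rejected rather than silently ignored.
  throw new Error(`Unsupported TRANSFORMERS_DEVICE: ${raw}`);
}
```

In a Node process, `readTransformersEnv(process.env)` would pick up the values set on the command line in the examples above.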
diff --git a/openspec/changes/add-transformers-companion/.openspec.yaml b/openspec/changes/add-transformers-companion/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-06-06
diff --git a/openspec/changes/add-transformers-companion/design.md b/openspec/changes/add-transformers-companion/design.md
@@ -0,0 +1,69 @@
+## Context
+
+Embeddings power the server's semantic vector search. Providers so far (`openai`, `gemini`, `vertex`, `aws`, `microsoft`) are all hosted APIs: they need credentials and send chunk text off-host. Some users cannot or will not do that (air-gapped, regulated, privacy-sensitive, or simply cost-averse) and want fully local embeddings.
+
+Transformers.js (`@huggingface/transformers`) can run sentence-transformer models locally on the ONNX runtime. The blocker is size: `@huggingface/transformers`, `onnxruntime-node` (which bundles native binaries for win32 + linux + darwin), and `onnxruntime-web` together total **~370 MB** installed. Adding that to the base package would increase the install size roughly tenfold for the majority of users, who use hosted APIs and never touch local embeddings.
+
+The project ships to npm (`@arabold/docs-mcp-server`, currently a single package) and to a Docker image, and releases via semantic-release with a manual-dispatch fallback. It targets Node 22 (ESM), builds with Vite/Rollup (SSR config externalizes `dependencies`), and uses npm as the package manager.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Offer a fully local, offline `transformers:` embedding provider — no credentials, no network at inference time.
+- Keep the default install of `@arabold/docs-mcp-server` free of the ~370 MB ONNX runtime.
+- Make enabling local embeddings a single explicit action (`npm i -g @arabold/docs-mcp-server-transformers`) with a clear error when it's missing.
+- Keep all integration logic in the main repo (testable, reviewable here); the external package carries only the dependency.
+- Guarantee a version-compatible companion is always available on npm and pre-installed in Docker.
+- Avoid any behavior or footprint change for existing non-`transformers` users.
+
+**Non-Goals:**
+- A general plugin/extension system. This is a single, purpose-built companion, not a registry of providers.
+- Bundling/vendoring the ONNX runtime into the main package or the npm tarball in any form.
+- GPU support beyond exposing the existing `webgpu` device toggle.
+- Pre-downloading models (they download on first use, like Playwright browsers already do).
+
+## Decisions
+
+### 1. Companion package that re-exports, loaded by name via dynamic import
+The heavy dependency lives in a separate package, `@arabold/docs-mcp-server-transformers` (`packages/transformers/`), whose entire job is `export { pipeline, env } from "@huggingface/transformers"`. The main server imports **the companion by name**, never `@huggingface/transformers` directly.
+
+- *Why re-export instead of importing transformers directly with it marked optional?* Module resolution is deterministic: the server resolves the companion as a sibling package, and the companion resolves `@huggingface/transformers` from its own dependency subtree, regardless of how npm hoists things. The server never needs to know transformers' resolution path.
+- *Why a companion at all vs. `optionalDependencies`?* `optionalDependencies` are still installed by default; they only add install-failure resilience, not size savings. They would not keep the runtime out of the base install.
+- *Alternative considered — runtime auto-install (`npm i` on demand):* rejected as fragile (no network in air-gapped installs, permission issues for global installs, surprising side effects).
+
+### 2. Lazy dynamic import with a literal specifier
+`transformersLoader.ts` does `import("@arabold/docs-mcp-server-transformers")` only on first use, caching the promise. The specifier is a **string literal** so Rollup can externalize it (a variable specifier cannot be externalized) and the import survives bundling as a real runtime `import()`. The companion is added to `vite.config.ts`'s `external` list explicitly because it is a `devDependency`, not a runtime `dependency`, so it isn't covered by the automatic `Object.keys(dependencies)` externalization.
+
+### 3. Structural types instead of importing transformers' types
+The loader declares minimal structural types (`FeatureExtractionPipeline`, `TransformersModule`, etc.) and `TransformersJSEmbeddings.ts` imports those from the loader (a local file). The main package therefore never imports `@huggingface/transformers` even at the type level, so `tsc` and the bundler never pull it in. The companion is only required to be installed at runtime, and only when local embeddings are actually used.
+
+### 4. Reuse `main`'s generic dimension resolution; no transformers-specific store logic
+Recent work (#431) resolves vector dimensions generically: known models via a lookup table, unknown models via a one-shot `embedQuery("test").length` probe before the vector table is created. `TransformersJSEmbeddings.embedQuery` returns a plain `number[]`, so this path handles it with no `DocumentStore` changes. We deliberately do **not** add a `transformers`-specific branch to the store. We only correct the known dimension for `BAAI/bge-small-en-v1.5` (512 → 384, its real hidden size).
+
+### 5. Provider requires no credentials; companion presence checked lazily
+`areCredentialsAvailable("transformers")` returns `true` (local models need none). The companion's availability is verified when the model is first loaded, not at credential-check or startup time. For known-dimension models this means a missing companion surfaces at first embed rather than at init, an accepted trade-off (see "Late failure for known models" under Risks / Trade-offs).
+
+### 6. Lockstep versioning, published by the release pipeline
+The companion is versioned and published in lockstep with the main package. Implementation: a second `@semantic-release/npm` instance with `pkgRoot: packages/transformers` publishes the companion at the release version, and the manual-dispatch path does the equivalent with `npm version` and `npm publish --workspace`; `prepack` builds its `dist` first; `publishConfig.access: public` covers the first publish. The main package's `devDependency` on the companion is the wildcard `"*"` so lockstep version bumps never break npm-workspace linking in dev/CI/Docker.
+
+- *Why lockstep vs. independent versioning?* It gives users a trivial compatibility rule ("install the same version of both") and guarantees that every server release has a matching companion on npm.
+
+### 7. Docker bakes the companion in
+The image installs the whole workspace (`npm ci`) and copies `packages/` into the runtime stage so the workspace symlink resolves. `TRANSFORMERS_CACHE=/models` with a `/models` volume caches downloaded models across runs.
+
+## Risks / Trade-offs
+
+- **Late failure for known models** → For models in the dimension lookup table, a missing companion isn't detected at init (no probe runs), so the error appears at first actual embed. Mitigated by a clear, actionable `TransformersCompanionMissingError`. A future refinement could do a cheap existence check (e.g. `import.meta.resolve`) at init.
+- **Misclassifying load errors as "companion missing"** → A broken file inside an installed companion could be mistaken for the package being absent. Mitigated: `isCompanionMissingError` only matches `ERR_MODULE_NOT_FOUND`/`MODULE_NOT_FOUND` whose message quotes the bare package specifier (`'@arabold/...'`), not an internal file path; everything else is rethrown unchanged.
+- **`npx` isolation** → `npx @arabold/docs-mcp-server` will not see a globally-installed companion (separate trees). Mitigated by documenting `npm i -g <both>` or `npx -p <both>`.
+- **CI/dev now pulls ~370 MB** → `npm ci` installs the companion's transformers dependency for the workspace. This is necessary to build/test the companion and is acceptable (it affects CI, not end users). End-user install size is unchanged.
+- **Two-package release complexity** → A second publish step adds release surface. Mitigated by the wildcard devDependency (no version-sync script needed) and `prepack` guaranteeing `dist` is built before publish.
+
+## Migration Plan
+
+- Purely additive; no data migration. Existing databases and non-`transformers` configs are unaffected.
+- Rollback: revert the change. Because the companion is a `devDependency` and loaded lazily, removing it has no effect on any other provider. Users who installed the companion can simply uninstall it.
+
+## Open Questions
+
+- None blocking. Possible future refinements: an init-time companion existence check to convert late failures to early ones, and an optional Docker variant without the companion for size-sensitive deployments.
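Review note on `design.md` above: Decision 2 (cached lazy import) and the "misclassifying load errors" risk can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the code in `transformersLoader.ts`. The names `TransformersCompanionMissingError` and `isCompanionMissingError` come from the design text; `createLazyLoader`, the `COMPANION` constant, and the error wording are hypothetical. The importer is injected so the sketch runs without the companion installed.

```typescript
// Package name taken from the design doc; the constant itself is illustrative.
const COMPANION = "@arabold/docs-mcp-server-transformers";

class TransformersCompanionMissingError extends Error {
  constructor() {
    // Hypothetical wording; the design only requires the message to be actionable.
    super(
      `Local embeddings need the optional companion package. ` +
        `Install it with: npm install -g ${COMPANION}`,
    );
    this.name = "TransformersCompanionMissingError";
  }
}

// True only when the module-not-found error names the bare package itself.
function isCompanionMissingError(err: unknown, specifier: string = COMPANION): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as Error & { code?: unknown }).code;
  if (code !== "ERR_MODULE_NOT_FOUND" && code !== "MODULE_NOT_FOUND") return false;
  // A broken file inside an installed companion is reported with a file path
  // that merely contains the package name, so require the quoted bare specifier.
  return err.message.includes(`'${specifier}'`);
}

// Calls `importer` at most once and caches the resulting promise.
// Assumption: a failed load stays cached; the real loader may choose to retry.
function createLazyLoader<T>(importer: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = importer().catch((err: unknown) => {
        if (isCompanionMissingError(err)) throw new TransformersCompanionMissingError();
        throw err; // every other failure is rethrown unchanged
      });
    }
    return cached;
  };
}
```

Per Decision 2, the real importer would be a literal-specifier call such as `() => import("@arabold/docs-mcp-server-transformers")`, which lets Rollup externalize it; injecting the importer also makes the classification logic easy to unit-test with synthetic errors.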
diff --git a/openspec/changes/add-transformers-companion/proposal.md b/openspec/changes/add-transformers-companion/proposal.md
@@ -0,0 +1,32 @@
+## Why
+
+Users who want semantic search without sending document content to a third-party API (air-gapped, privacy-sensitive, or cost-averse setups) currently have no offline embedding option. Transformers.js can run sentence-transformer models locally, but its ONNX runtime weighs ~370 MB across all platform binaries — far too heavy to add to every install of a tool whose default users rely on hosted APIs. We need offline embeddings available to those who want them without inflating the install footprint for everyone else.
+
+## What Changes
+
+- Add a new `transformers` embedding provider that generates embeddings locally and offline, with no API key or network access required.
+- Externalize the heavy `@huggingface/transformers` dependency into an **optional companion package**, `@arabold/docs-mcp-server-transformers`, kept in this repo as an npm workspace. The main server depends on it only as a wildcard `devDependency`, so a default end-user install does **not** pull in the ONNX runtime.
+- Load the companion lazily via a dynamic import the first time a `transformers:` model is used. When the companion is not installed, surface an actionable error telling the user to install it; never fail at startup or for non-transformers users.
+- Keep the integration code (the embeddings class, provider wiring, loader) in the main repo; the companion is a thin re-exporter only.
+- Publish the companion in lockstep with the main package from the release pipeline (semantic-release and manual paths) so a version-compatible companion always exists on npm.
+- Bundle the companion into the official Docker image so `transformers:` models work out of the box there, caching models under `TRANSFORMERS_CACHE=/models`.
+- Fix the known vector dimension for `BAAI/bge-small-en-v1.5` from 512 to its actual 384.
+
+## Capabilities
+
+### New Capabilities
+- `local-embeddings`: Offline, local embedding generation via the optional Transformers.js companion package — provider behavior, lazy companion loading, missing-companion handling, device/cache configuration, and how the companion is distributed (workspace, lockstep release, Docker).
+
+### Modified Capabilities
+- `embedding-resolution`: The supported-provider list and credential-validation rules change — `transformers` becomes a fully supported provider (not parse-only) that requires no credentials, and the `BAAI/bge-small-en-v1.5` known-dimension entry is corrected to 384.
+
+## Impact
+
+- **New package:** `packages/transformers/` (`@arabold/docs-mcp-server-transformers`), an npm workspace re-exporting `@huggingface/transformers`.
+- **New source:** `src/store/embeddings/transformersLoader.ts`, `src/store/embeddings/TransformersJSEmbeddings.ts`.
+- **Modified source:** `EmbeddingFactory.ts` and `EmbeddingConfig.ts` (provider plumbing + dimension fix).
+- **Build/dist:** `vite.config.ts` externalizes the companion; root `package.json` gains `workspaces` and a wildcard companion `devDependency`; the `build` script compiles the companion first.
+- **Release/CI:** `.releaserc.json` and `.github/workflows/release.yml` publish the companion in lockstep. Existing `ci.yml`/`eval.yml` need no change (npm workspaces + build cover it).
+- **Docker:** `Dockerfile` copies the companion and adds the `/models` cache volume.
+- **Docs:** `docs/guides/embedding-models.md`, `README.md`.
+- **No breaking changes** for existing users: no new runtime dependency in the default install, no behavior change for non-`transformers` providers.
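Review note on the dimension fix and the `embedding-resolution` capability described above: the generic resolution path (lookup table first, one-shot probe otherwise) might look roughly like the TypeScript sketch below. It is an illustration under assumptions, not the code from `EmbeddingConfig.ts`. `QueryEmbedder`, `KNOWN_DIMENSIONS`, and `resolveDimension` are hypothetical names, and the table holds only the one entry this change mentions.

```typescript
// Minimal structural type: anything exposing an async embedQuery fits.
interface QueryEmbedder {
  embedQuery(text: string): Promise<number[]>;
}

// Illustrative lookup table; 384 is the corrected value from this change.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "BAAI/bge-small-en-v1.5": 384,
};

async function resolveDimension(model: string, embedder: QueryEmbedder): Promise<number> {
  const known = KNOWN_DIMENSIONS[model];
  if (known !== undefined) {
    // Known model: no probe runs, so the embedder (and the companion) is not
    // touched here. This is why a missing companion surfaces only at first embed.
    return known;
  }
  // Unknown model: embed a throwaway string once and measure the vector length.
  const probe = await embedder.embedQuery("test");
  return probe.length;
}
```

Because the probe only relies on `embedQuery` returning a plain `number[]`, a local Transformers.js embedder fits this path without any provider-specific branch in the store.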