13 changes: 12 additions & 1 deletion .github/workflows/release.yml
@@ -92,12 +92,23 @@ jobs:
npm version "$version" --no-git-tag-version
npm publish --access public

- name: Publish companion package (manual)
if: steps.release-mode.outputs.manual_release == 'true'
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: |
version='${{ steps.release-mode.outputs.release_version }}'
# Keep the optional Transformers.js companion in lockstep with the main package so a
# compatible version always exists on npm. prepack builds dist before publishing.
npm version "$version" --no-git-tag-version --workspace @arabold/docs-mcp-server-transformers
npm publish --workspace @arabold/docs-mcp-server-transformers

- name: Commit manual version bump
if: steps.release-mode.outputs.manual_release == 'true'
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add package.json package-lock.json
git add package.json package-lock.json packages/transformers/package.json
git commit -m "chore(release): ${{ steps.release-mode.outputs.release_version }} [skip ci]"
git push origin "HEAD:${GITHUB_REF_NAME}"

13 changes: 12 additions & 1 deletion .releaserc.json
@@ -9,6 +9,13 @@
"changelogFile": "CHANGELOG.md"
}
],
[
"@semantic-release/npm",
{
"npmPublish": true,
"pkgRoot": "packages/transformers"
}
],
[
"@semantic-release/npm",
{
@@ -19,7 +26,11 @@
[
"@semantic-release/git",
{
"assets": ["package.json", "CHANGELOG.md"],
"assets": [
"package.json",
"packages/transformers/package.json",
"CHANGELOG.md"
],
"message": "chore(release): ${nextRelease.version} [skip ci]\n\n${nextRelease.notes}"
}
],
13 changes: 10 additions & 3 deletions Dockerfile
@@ -15,11 +15,12 @@
FROM base AS builder

# Accept build argument for PostHog API key
ARG POSTHOG_API_KEY

Check warning on line 18 in Dockerfile (GitHub Actions / Docker image E2E): SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ARG "POSTHOG_API_KEY"). More info: https://docs.docker.com/go/dockerfile/rule/secrets-used-in-arg-or-env/
ENV POSTHOG_API_KEY=$POSTHOG_API_KEY

Check warning on line 19 in Dockerfile (GitHub Actions / Docker image E2E): SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ENV "POSTHOG_API_KEY"). More info: https://docs.docker.com/go/dockerfile/rule/secrets-used-in-arg-or-env/

# Copy package files
# Copy package files (root + workspaces so `npm ci` can link the companion package)
COPY package*.json ./
COPY packages/transformers/package.json ./packages/transformers/package.json

# Install all dependencies (including dev dependencies for building)
RUN npm ci
@@ -51,21 +52,27 @@
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/public ./public
COPY --from=builder /app/dist ./dist
# Include the built Transformers.js companion package so the workspace symlink in
# node_modules resolves and local (offline) embeddings work out of the box.
COPY --from=builder /app/packages ./packages

# Set data directory for the container
ENV DOCS_MCP_STORE_PATH=/data
ENV XDG_CONFIG_HOME=/config
# Cache directory for Transformers.js models (downloaded on first use).
ENV TRANSFORMERS_CACHE=/models

# Create the writable runtime directories and hand ownership to the
# unprivileged `node` user that ships with the base image (uid 1000).
# `/app` is intentionally left root-owned so the runtime user cannot
# tamper with code or `node_modules` if it is ever compromised.
RUN mkdir -p /data /config \
&& chown node:node /data /config
RUN mkdir -p /data /config /models \
&& chown node:node /data /config /models

# Define volumes
VOLUME /data
VOLUME /config
VOLUME /models

# Expose the default port of the application
EXPOSE 6280
4 changes: 2 additions & 2 deletions README.md
@@ -130,7 +130,7 @@ Using an embedding model is **optional** but dramatically improves search qualit
OPENAI_API_KEY="sk-proj-..." npx @arabold/docs-mcp-server@latest
```

See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ollama**, **Gemini**, **Azure**, and others.
See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ollama**, **Gemini**, **Azure**, fully **offline/local** embeddings, and others.

---

@@ -142,7 +142,7 @@ See **[Embedding Models](docs/guides/embedding-models.md)** for configuring **Ol
- **[Basic Usage](docs/guides/basic-usage.md)**: Using the Web UI, CLI, and scraping local files.
- **[Configuration](docs/setup/configuration.md)**: Full reference for config files and environment variables.
- **[Supported Formats](docs/concepts/supported-formats.md)**: Complete file format and MIME type reference.
- **[Embedding Models](docs/guides/embedding-models.md)**: Configure OpenAI, Ollama, Gemini, and other providers.
- **[Embedding Models](docs/guides/embedding-models.md)**: Configure OpenAI, Ollama, Gemini, local/offline, and other providers.
- **[Search Quality Benchmark](docs/guides/benchmarking.md)**: Measure retrieval quality with IR metrics + LLM-judged scores; prerequisites, how to run, how to interpret results.

### Hash-Routed SPAs
27 changes: 26 additions & 1 deletion docs/guides/embedding-models.md
@@ -14,6 +14,7 @@ If you leave the model empty but provide `OPENAI_API_KEY`, the server defaults t
- `gemini:embedding-001` (Google Gemini)
- `aws:amazon.titan-embed-text-v1` (AWS Bedrock)
- `microsoft:text-embedding-ada-002` (Azure OpenAI)
- `transformers:BAAI/bge-small-en-v1.5` (local, offline — requires the optional companion package)
- Or any OpenAI-compatible model name

## Provider Configuration
@@ -34,6 +35,8 @@ Provider credentials use the provider-specific environment variables listed belo
| `AZURE_OPENAI_API_INSTANCE_NAME` | Azure OpenAI instance name. |
| `AZURE_OPENAI_API_DEPLOYMENT_NAME` | Azure OpenAI deployment name. |
| `AZURE_OPENAI_API_VERSION` | Azure OpenAI API version. |
| `TRANSFORMERS_DEVICE` | Device for local embeddings: `cpu` (default) or `webgpu`. |
| `TRANSFORMERS_CACHE` | Directory for caching downloaded local models. |

### Examples

@@ -114,7 +117,29 @@ DOCS_MCP_EMBEDDING_MODEL="microsoft:text-embedding-ada-002" \
npx @arabold/docs-mcp-server@latest
```

## Changing the Embedding Model
#### Local / Offline (Transformers.js)

Generate embeddings entirely on your machine: no API key, no network calls at inference time, and no data leaving your host. This uses [Transformers.js](https://huggingface.co/docs/transformers.js) with the ONNX runtime.

Because the runtime is large, it ships as an optional **companion package** that you install alongside the server:

```bash
npm install -g @arabold/docs-mcp-server @arabold/docs-mcp-server-transformers

DOCS_MCP_EMBEDDING_MODEL="transformers:BAAI/bge-small-en-v1.5" \
docs-mcp-server
```

Or run both with `npx` in a single command:

```bash
DOCS_MCP_EMBEDDING_MODEL="transformers:BAAI/bge-small-en-v1.5" \
npx -p @arabold/docs-mcp-server -p @arabold/docs-mcp-server-transformers docs-mcp-server
```

The model is downloaded on first use and cached (set `TRANSFORMERS_CACHE` to choose the directory). Any sentence-transformers model on Hugging Face that provides ONNX weights should work; the vector dimension is detected automatically. Set `TRANSFORMERS_DEVICE=webgpu` to enable GPU acceleration on supported hardware.

> The official Docker image already bundles the companion package, so `transformers:` models work out of the box there with no extra install.

When you change the embedding model or vector dimension after initial setup, existing embedding vectors become semantically incompatible with the new configuration. The server detects this automatically by tracking the active model identity in a metadata table.

2 changes: 2 additions & 0 deletions openspec/changes/add-transformers-companion/.openspec.yaml
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-06-06
69 changes: 69 additions & 0 deletions openspec/changes/add-transformers-companion/design.md
@@ -0,0 +1,69 @@
## Context

Embeddings power the server's semantic vector search. Providers so far (`openai`, `gemini`, `vertex`, `aws`, `microsoft`) are all hosted APIs: they need credentials and send chunk text off-host. Some users cannot or will not do that (air-gapped, regulated, privacy-sensitive, or simply cost-averse) and want fully local embeddings.

Transformers.js (`@huggingface/transformers`) can run sentence-transformer models locally on the ONNX runtime. The blocker is size: `@huggingface/transformers` plus `onnxruntime-node` (which bundles native binaries for win32 + linux + darwin) and `onnxruntime-web` totals **~370 MB** installed. Adding that to the base package would make the install roughly 10× larger for the majority of users, who use hosted APIs and never touch local embeddings.

The project ships to npm (`@arabold/docs-mcp-server`, currently a single package) and to a Docker image, and releases via semantic-release with a manual-dispatch fallback. It targets Node 22 (ESM), builds with Vite/Rollup (SSR config externalizes `dependencies`), and uses npm as the package manager.

## Goals / Non-Goals

**Goals:**
- Offer a fully local, offline `transformers:` embedding provider — no credentials, no network at inference time.
- Keep the default install of `@arabold/docs-mcp-server` free of the ~370 MB ONNX runtime.
- Make enabling local embeddings a single explicit action (`npm i -g @arabold/docs-mcp-server-transformers`) with a clear error when it's missing.
- Keep all integration logic in the main repo (testable, reviewable here); the external package carries only the dependency.
- Guarantee a version-compatible companion is always available on npm and pre-installed in Docker.
- Avoid any behavior or footprint change for existing non-`transformers` users.

**Non-Goals:**
- A general plugin/extension system. This is a single, purpose-built companion, not a registry of providers.
- Bundling/vendoring the ONNX runtime into the main package or the npm tarball in any form.
- GPU support beyond exposing the existing `webgpu` device toggle.
- Pre-downloading models (they download on first use, like Playwright browsers already do).

## Decisions

### 1. Companion package that re-exports, loaded by name via dynamic import
The heavy dependency lives in a separate package, `@arabold/docs-mcp-server-transformers` (`packages/transformers/`), whose entire job is `export { pipeline, env } from "@huggingface/transformers"`. The main server imports **the companion by name**, never `@huggingface/transformers` directly.

- *Why re-export instead of importing transformers directly with it marked optional?* Module resolution is deterministic: the server resolves the companion as a sibling package, and the companion resolves `@huggingface/transformers` from its own dependency subtree, regardless of how npm hoists things. The server never needs to know transformers' resolution path.
- *Why a companion at all vs. `optionalDependencies`?* `optionalDependencies` are still installed by default; they only add install-failure resilience, not size savings. They would not keep the runtime out of the base install.
- *Alternative considered — runtime auto-install (`npm i` on demand):* rejected as fragile (no network in air-gapped installs, permission issues for global installs, surprising side effects).

### 2. Lazy dynamic import with a literal specifier
`transformersLoader.ts` does `import("@arabold/docs-mcp-server-transformers")` only on first use, caching the promise. The specifier is a **string literal** so Rollup can externalize it (a variable specifier cannot be externalized) and the import survives bundling as a real runtime `import()`. The companion is added to `vite.config.ts`'s `external` list explicitly because it is a `devDependency`, not a runtime `dependency`, so it isn't covered by the automatic `Object.keys(dependencies)` externalization.
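
A minimal sketch of the cached lazy import described above. The `createLazyLoader` helper is hypothetical (it is not necessarily how `transformersLoader.ts` is written), and resetting the cache on failure is an assumption. A stub stands in for the real `import()` so the snippet runs without the companion installed.

```typescript
// Hypothetical helper: wraps an async loader so it runs at most once.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // Cache the promise itself so concurrent first callers share one load.
      cached = load().catch((error: unknown) => {
        // Assumption: drop the cache on failure so a later call can retry,
        // for example after the user installs the companion.
        cached = undefined;
        throw error;
      });
    }
    return cached;
  };
}

// In the real loader the callback would be a literal-specifier dynamic import:
//   createLazyLoader(() => import("@arabold/docs-mcp-server-transformers"))
// A stand-in is used here instead of the companion package.
let importCount = 0;
const loadCompanion = createLazyLoader(async () => {
  importCount += 1;
  return { pipeline: "stub-pipeline", env: {} };
});

// Both calls return the same promise; the stub loader ran only once.
const firstCall = loadCompanion();
const secondCall = loadCompanion();
```

Caching the promise rather than the resolved module means two callers that race on first use still trigger a single import.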

### 3. Structural types instead of importing transformers' types
The loader declares minimal structural types (`FeatureExtractionPipeline`, `TransformersModule`, etc.) and `TransformersJSEmbeddings.ts` imports those from the loader (a local file). The main package therefore never imports `@huggingface/transformers` even at the type level, so `tsc` and the bundler never pull it in. The companion is only required to be installed at runtime, and only when local embeddings are actually used.
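
To illustrate the idea, here is a hedged sketch of what such structural types might look like. The type names come from the text above, but their exact shapes, the `TensorLike` interface, and the `embedAll` helper are assumptions made for this example, not the loader's actual declarations.

```typescript
// Assumed minimal shapes; the real loader's types may differ in detail.
interface TensorLike {
  tolist(): number[][];
}

type FeatureExtractionPipeline = (
  texts: string[],
  options?: { pooling?: "mean" | "cls"; normalize?: boolean },
) => Promise<TensorLike>;

interface TransformersModule {
  pipeline(
    task: "feature-extraction",
    model: string,
    options?: { device?: string },
  ): Promise<FeatureExtractionPipeline>;
  env: { cacheDir?: string };
}

// Because the types are structural, any object with the right shape
// type-checks. This stub never touches @huggingface/transformers.
const stubModule: TransformersModule = {
  env: {},
  pipeline: async () => async (texts: string[]) => ({
    tolist: () => texts.map((text) => [text.length, 0, 0]),
  }),
};

// Hypothetical consumer that depends only on the structural types.
async function embedAll(
  mod: TransformersModule,
  model: string,
  texts: string[],
): Promise<number[][]> {
  const extractor = await mod.pipeline("feature-extraction", model, { device: "cpu" });
  const output = await extractor(texts, { pooling: "mean", normalize: true });
  return output.tolist();
}
```

A side benefit of this approach is that unit tests can pass a stub like `stubModule` instead of loading the real runtime.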

### 4. Reuse `main`'s generic dimension resolution; no transformers-specific store logic
Recent work (#431) resolves vector dimensions generically: known models via a lookup table, unknown models via a one-shot `embedQuery("test").length` probe before the vector table is created. `TransformersJSEmbeddings.embedQuery` returns a plain `number[]`, so this path handles it with no `DocumentStore` changes. We deliberately do **not** add a `transformers`-specific branch to the store. We only correct the known dimension for `BAAI/bge-small-en-v1.5` (512 → 384, its real hidden size).
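
The lookup-then-probe flow can be sketched roughly as follows. `resolveDimension`, `KNOWN_DIMENSIONS`, and `EmbeddingsLike` are illustrative names assumed for this example; only the 384 entry for `BAAI/bge-small-en-v1.5` is taken from the text above.

```typescript
// Assumed minimal interface: anything with an async embedQuery works.
interface EmbeddingsLike {
  embedQuery(text: string): Promise<number[]>;
}

// Illustrative table; the real lookup table lives elsewhere in the repo.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "BAAI/bge-small-en-v1.5": 384,
};

async function resolveDimension(model: string, embeddings: EmbeddingsLike): Promise<number> {
  const known = KNOWN_DIMENSIONS[model];
  if (known !== undefined) {
    return known; // Known model: no probe, so the model is not loaded here.
  }
  // Unknown model: embed one short string and measure the vector length.
  const probe = await embeddings.embedQuery("test");
  return probe.length;
}

// Stub embeddings that count how often the probe runs.
const probeStats = { calls: 0 };
const stubEmbeddings: EmbeddingsLike = {
  embedQuery: async () => {
    probeStats.calls += 1;
    return [0.1, 0.2, 0.3];
  },
};
```

Note that the known-model branch never calls `embedQuery`, which is why a missing companion can go unnoticed until the first real embed.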

### 5. Provider requires no credentials; companion presence checked lazily
`areCredentialsAvailable("transformers")` returns `true` (local models need none). The companion's availability is verified when the model is first loaded, not at credential-check or startup time. For known-dimension models this means a missing companion surfaces at first embed rather than at init — an accepted trade-off (documented below).

### 6. Lockstep versioning, published by the release pipeline
The companion is versioned and published in lockstep with the main package. Implementation: a second `@semantic-release/npm` instance with `pkgRoot: packages/transformers` (and the same in the manual path) publishes the companion at the release version; `prepack` builds its `dist` first; `publishConfig.access: public` covers the first publish. The main package's `devDependency` on the companion is the wildcard `"*"` so lockstep version bumps never break npm-workspace linking in dev/CI/Docker.

- *Why lockstep vs. independent versioning?* It gives users a trivial compatibility rule ("install the same version of both") and guarantees that every server release has a matching companion on npm.

### 7. Docker bakes the companion in
The image installs the whole workspace (`npm ci`) and copies `packages/` into the runtime stage so the workspace symlink resolves. `TRANSFORMERS_CACHE=/models` with a `/models` volume caches downloaded models across runs.

## Risks / Trade-offs

- **Late failure for known models** → For models in the dimension lookup table, a missing companion isn't detected at init (no probe runs), so the error appears at first actual embed. Mitigated by a clear, actionable `TransformersCompanionMissingError`. A future refinement could do a cheap existence check (e.g. `import.meta.resolve`) at init.
- **Misclassifying load errors as "companion missing"** → A broken file inside an installed companion could be mistaken for the package being absent. Mitigated: `isCompanionMissingError` only matches `ERR_MODULE_NOT_FOUND`/`MODULE_NOT_FOUND` whose message quotes the bare package specifier (`'@arabold/...'`), not an internal file path; everything else is rethrown unchanged.
- **`npx` isolation** → `npx @arabold/docs-mcp-server` will not see a globally-installed companion (separate trees). Mitigated by documenting `npm i -g <both>` or `npx -p <both>`.
- **CI/dev now pulls ~370 MB** → `npm ci` installs the companion's transformers dependency for the workspace. This is necessary to build/test the companion and is acceptable (it affects CI, not end users). End-user install size is unchanged.
- **Two-package release complexity** → A second publish step adds release surface. Mitigated by the wildcard devDependency (no version-sync script needed) and `prepack` guaranteeing `dist` is built before publish.
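
As a rough illustration of the load-error classification described in the risks above, the check could look like the sketch below. The function name comes from the text, but this body is an assumption about one way to implement it, and the sample messages only approximate Node's wording.

```typescript
const COMPANION = "@arabold/docs-mcp-server-transformers";

// Sketch: true only when the bare package specifier itself cannot be resolved.
function isCompanionMissingError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  const code = (error as Error & { code?: unknown }).code;
  if (code !== "ERR_MODULE_NOT_FOUND" && code !== "MODULE_NOT_FOUND") return false;
  // Match the quoted bare specifier. A path to a file inside the package
  // continues past the package name, so it does not match this pattern.
  return error.message.includes(`'${COMPANION}'`);
}

// Helper for building errors with a Node-style `code` property.
function makeError(message: string, code: string): Error {
  return Object.assign(new Error(message), { code });
}

const packageAbsent = makeError(
  `Cannot find package '${COMPANION}' imported from /app/dist/index.js`,
  "ERR_MODULE_NOT_FOUND",
);
const brokenInternalFile = makeError(
  `Cannot find module '/app/node_modules/${COMPANION}/dist/index.js' imported from /app/dist/index.js`,
  "ERR_MODULE_NOT_FOUND",
);
const unrelatedFailure = makeError(`Unexpected token in '${COMPANION}'`, "ERR_INVALID_ARG_TYPE");
```

Here only `packageAbsent` is classified as a missing companion; the other two would be rethrown unchanged.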

## Migration Plan

- Purely additive; no data migration. Existing databases and non-`transformers` configs are unaffected.
- Rollback: revert the change. Because the companion is a `devDependency` and loaded lazily, removing it has no effect on any other provider. Users who installed the companion can simply uninstall it.

## Open Questions

- None blocking. Possible future refinements: an init-time companion existence check to convert late failures to early ones, and an optional Docker variant without the companion for size-sensitive deployments.
32 changes: 32 additions & 0 deletions openspec/changes/add-transformers-companion/proposal.md
@@ -0,0 +1,32 @@
## Why

Users who want semantic search without sending document content to a third-party API (air-gapped, privacy-sensitive, or cost-averse setups) currently have no offline embedding option. Transformers.js can run sentence-transformer models locally, but its ONNX runtime weighs ~370 MB across all platform binaries — far too heavy to add to every install of a tool whose default users rely on hosted APIs. We need offline embeddings available to those who want them without inflating the install footprint for everyone else.

## What Changes

- Add a new `transformers` embedding provider that generates embeddings locally and offline, with no API key or network access required.
- Externalize the heavy `@huggingface/transformers` dependency into an **optional companion package**, `@arabold/docs-mcp-server-transformers`, kept in this repo as an npm workspace. The main server depends on it only as a wildcard `devDependency`, so a default end-user install does **not** pull in the ONNX runtime.
- Load the companion lazily via a dynamic import the first time a `transformers:` model is used. When the companion is not installed, surface an actionable error telling the user to install it; never fail at startup or for non-transformers users.
- Keep the integration code (the embeddings class, provider wiring, loader) in the main repo; the companion is a thin re-exporter only.
- Publish the companion in lockstep with the main package from the release pipeline (semantic-release and manual paths) so a version-compatible companion always exists on npm.
- Bundle the companion into the official Docker image so `transformers:` models work out of the box there, caching models under `TRANSFORMERS_CACHE=/models`.
- Fix the known vector dimension for `BAAI/bge-small-en-v1.5` from 512 to its actual 384.

## Capabilities

### New Capabilities
- `local-embeddings`: Offline, local embedding generation via the optional Transformers.js companion package — provider behavior, lazy companion loading, missing-companion handling, device/cache configuration, and how the companion is distributed (workspace, lockstep release, Docker).

### Modified Capabilities
- `embedding-resolution`: The supported-provider list and credential-validation rules change — `transformers` becomes a fully supported provider (not parse-only) that requires no credentials, and the `BAAI/bge-small-en-v1.5` known-dimension entry is corrected to 384.

## Impact

- **New package:** `packages/transformers/` (`@arabold/docs-mcp-server-transformers`), an npm workspace re-exporting `@huggingface/transformers`.
- **New source:** `src/store/embeddings/transformersLoader.ts`, `src/store/embeddings/TransformersJSEmbeddings.ts`.
- **Modified source:** `EmbeddingFactory.ts` and `EmbeddingConfig.ts` (provider plumbing + dimension fix).
- **Build/dist:** `vite.config.ts` externalizes the companion; root `package.json` gains `workspaces` and a wildcard companion `devDependency`; the `build` script compiles the companion first.
- **Release/CI:** `.releaserc.json` and `.github/workflows/release.yml` publish the companion in lockstep. Existing `ci.yml`/`eval.yml` need no change (npm workspaces + build cover it).
- **Docker:** `Dockerfile` copies the companion and adds the `/models` cache volume.
- **Docs:** `docs/guides/embedding-models.md`, `README.md`.
- **No breaking changes** for existing users: no new runtime dependency in the default install, no behavior change for non-`transformers` providers.