From 628bde3a083035f665cb06ce86c7852fc4c9ba63 Mon Sep 17 00:00:00 2001
From: Kurt Heiss
Date: Tue, 2 Jun 2026 15:54:40 -0700
Subject: [PATCH 1/6] docs(extraction): backport PR #2194 extraction doc fixes to main

PR #2194 merged into 26.05 on 2026-06-02 but never reached main. This backport keeps main aligned with the release branch and the published docs.nvidia.com site after Randy's follow-up review.

Timeline:
- Friday: 26.05 docs built for docs.nvidia upload; branch differed from NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR #2179 to sync extraction docs.
- Monday: PR #2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR #2194 on 26.05 with additional fixes found after the #2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e4 onto main (five extraction doc files only).

Changes from #2194:
- Fix audio-video.md indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to multimodal-extraction.md
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors
---
 docs/docs/extraction/audio-video.md            |  4 ++--
 docs/docs/extraction/custom-metadata.md        | 14 ++++++++++++--
 docs/docs/extraction/faq.md                    |  4 ++--
 docs/docs/extraction/multimodal-extraction.md  | 17 ++++++++++++++---
 .../extraction/prerequisites-support-matrix.md | 17 +++--------------
 5 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/docs/docs/extraction/audio-video.md b/docs/docs/extraction/audio-video.md
index c9031ce413..6015b4f0ba 100644
--- a/docs/docs/extraction/audio-video.md
+++ b/docs/docs/extraction/audio-video.md
@@ -73,7 +73,7 @@ Use the following procedure to run the NIM on your own infrastructure.
Self-host ```python from nemo_retriever import create_ingestor - from nemo_retriever.params.models import ASRParams + from nemo_retriever.common.params.models import ASRParams ingestor = ( create_ingestor(run_mode="batch") @@ -102,7 +102,7 @@ Instead of running the pipeline locally, you can call Parakeet through [build.nv ```python from nemo_retriever import create_ingestor - from nemo_retriever.params.models import ASRParams + from nemo_retriever.common.params.models import ASRParams ingestor = ( create_ingestor(run_mode="batch") diff --git a/docs/docs/extraction/custom-metadata.md b/docs/docs/extraction/custom-metadata.md index e449d46e8b..3d59a5f0d0 100644 --- a/docs/docs/extraction/custom-metadata.md +++ b/docs/docs/extraction/custom-metadata.md @@ -35,6 +35,10 @@ meta_df = pd.DataFrame( } ) +hostname = "localhost" +table_name = "nemo_retriever_collection" +lancedb_uri = "./lancedb_data" + ingestor = ( create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670") .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"]) @@ -59,8 +63,14 @@ results = ingestor.ingest_async().result() Set `hostname`, `table_name`, and a **remote** `lancedb_uri` (for example `s3://bucket/path`) to match your deployment—the retriever service rejects local filesystem paths. The client uploads in-memory sidecar metadata to the service before ingest; do not pass a raw local file path as `meta_dataframe` on the REST spec. For local LanceDB directories, use `run_mode="batch"` instead (refer to [Vector databases](vdbs.md)). For a step-by-step walkthrough with additional fields such as category, department, and timestamp, refer to [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering). 
+## How metadata is stored { #how-metadata-is-stored } + +During ingestion, each chunk's `content_metadata` is serialized as a **compact JSON string** (no spaces after `:` or `,`) in the LanceDB table's `metadata` column. Sidecar columns from `meta_dataframe`, `meta_source_field`, and `meta_fields` are merged into that JSON object before upload, so custom keys live in the same string—not separate columns. That is why `Retriever.query` filters often use `metadata LIKE '%\"key\":\"value\"%'`. For operator behavior and predicate examples, see [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering). + ## Best practices { #best-practices } +The following are the best practices when you work with custom metadata: + - Plan metadata structure before ingestion. - Test filter expressions with small datasets first. - Consider performance implications of complex filters. @@ -94,7 +104,7 @@ table = db.open_table("nemo_retriever_collection") **`Retriever.query` + `where`:** LanceDB applies the predicate before ranking. For post-filter logic in Python, use a wider `top_k` first. 
```python -from nemo_retriever.retriever import Retriever +from nemo_retriever.graph.retriever import Retriever retriever = Retriever( vdb_kwargs={"uri": "./lancedb_data", "table_name": "nemo_retriever_collection"}, @@ -115,7 +125,7 @@ For a runnable end-to-end flow (ingest, `Retriever.query`, and both filter modes When you ingest through the **retriever service**, upload the sidecar with [`POST /v1/ingest/sidecar`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/routers/ingest.py#L1040-L1129) (multipart file; response [`SidecarUploadResponse`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/responses.py#L60-L68)), then pass the returned `sidecar_id` as `meta_dataframe_id` with `meta_source_field` and `meta_fields` in `pipeline.vdb_upload_params` on [`POST /v1/ingest`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/requests.py#L15-L32) ([`PipelineSpec`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/pipeline_spec.py#L55-L78)). Request and response shapes, form fields, and auth headers are in the service OpenAPI UI at `/docs` (or `/openapi.json`) on your retriever base URL (for example `http://localhost:7670/docs` after `retriever service start`). Do not send a raw local path as `meta_dataframe` on the service spec. 
-## How metadata is stored { #how-metadata-is-stored } +## Related content { #related-content } - [Vector databases](vdbs.md) — canonical LanceDB upload and retrieval guide - [nemo_retriever_retriever_query_metadata_filter.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) — runnable notebook for sidecar metadata at ingest and filtered `Retriever.query` diff --git a/docs/docs/extraction/faq.md b/docs/docs/extraction/faq.md index 6b3010db52..8e8a965caf 100644 --- a/docs/docs/extraction/faq.md +++ b/docs/docs/extraction/faq.md @@ -25,7 +25,7 @@ For chart-labeled PDF regions and other caption scope limits, see [Are PDF chart ## Are PDF chart or figure regions captioned when Omni is enabled? -No. Chart-labeled PDF regions are not routed through Omni captioning. See [Image captioning](prerequisites-support-matrix.md#image-captioning-2605) for scope, validation, and what the caption stage covers. +No. Chart-labeled PDF regions are not routed through Omni captioning. See [Charts and infographics](multimodal-extraction.md#charts-and-infographics) and [Image captioning](multimodal-extraction.md#image-captioning) for scope, validation, and what the caption stage covers. ## When should I consider advanced visual parsing? @@ -40,7 +40,7 @@ For more information, refer to [Nemotron Parse](https://build.nvidia.com/nvidia/ For [self-hosted deployments](deployment-options.md#when-to-self-host-nims), you should set the environment variables `NGC_API_KEY` and `NIM_NGC_API_KEY`. For more information, refer to [Authentication and API keys](api-keys.md). -For advanced scenarios, you might want to set environment variables for NIM container paths, tags, and batch sizes on the ingestion runtime. Configure them in your Helm values, Kubernetes `Secret`/`ConfigMap`, or follow [Environment variables](environment-config.md). 
+For advanced scenarios, you might want to set environment variables for NIM container paths, tags, and batch sizes on the ingestion runtime. Configure them in your Helm values, Kubernetes `Secret`/`ConfigMap`, or follow [Environment variables](environment-config.md). If you use **Docker Compose** locally for experiments only, see the unsupported developer page [docker.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/docker.md) — **not** a supported deployment substitute for Helm. ### Library Mode diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md index 5e6a5a4fb5..3b75d53bb2 100644 --- a/docs/docs/extraction/multimodal-extraction.md +++ b/docs/docs/extraction/multimodal-extraction.md @@ -49,7 +49,7 @@ NeMo Retriever Library detects tables as structured page elements, processes the Charts and infographic regions are classified with other page layout elements (tables, text blocks, titles) and processed through layout detection and OCR. `extract_charts` and `extract_infographics` are enabled by default. Outputs use the same metadata schema as other extracted objects. -Chart-labeled PDF regions are **not** routed through the Omni caption stage; they remain on the layout-and-OCR path. For scope and validation guidance, see [Image captioning](prerequisites-support-matrix.md#image-captioning-2605). +Chart-labeled PDF regions are **not** routed through the Omni caption stage; they remain on the layout-and-OCR path. For caption scope and validation, see [Image captioning](#image-captioning). For natural-language infographic descriptions, optionally enable [image captioning](#image-captioning) and set `caption_infographics=True` when you need VLM captions on infographic regions. @@ -63,7 +63,7 @@ For natural-language infographic descriptions, optionally enable [image captioni Scanned PDFs and image-only pages rely on OCR and hybrid paths that combine native text extraction with OCR when needed. 
For extract methods such as `ocr` and `pdfium_hybrid`, refer to the [Python API reference](nemo-retriever-api-reference.md). -OCR artifacts depend on how you deploy. **Helm / NIM:** the production chart uses **Nemotron OCR v1** (`nvcr.io/nim/nvidia/nemotron-ocr-v1:1.3.0`). **Local Hugging Face inference:** the default engine is **Nemotron OCR v2**, which operates in **multilingual** mode by default. For CLI flags and API parameters, see [Nemotron OCR v2 — language mode](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/docs/cli/README.md#nemotron-ocr-v2-language-mode). For Kubernetes defaults and the Helm-vs-local split, see [OCR artifacts (Helm vs local Hugging Face)](prerequisites-support-matrix.md#nemotron-ocr-v2-language-mode) in the support matrix. +When you run extraction locally with Hugging Face weights, the default OCR engine is **Nemotron OCR v2**, which operates in **multilingual** mode by default. For CLI flags and API parameters, see [Nemotron OCR v2 — language mode](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/docs/cli/README.md#nemotron-ocr-v2-language-mode). For Kubernetes deployment, see [OCR NIM configuration](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#ocr-nim-configuration) in the Helm chart README. **Related** @@ -77,11 +77,22 @@ Image captioning generates natural-language descriptions for unstructured image **Captioning is optional** — enable it in your ingest configuration (for example, the `caption` API or pipeline flag) when you need natural-language descriptions of image content. Reasoning traces are disabled by default for captioning. +!!! important "PDF chart regions are not captioned by Omni" + + When **nemotron-page-elements-v3** classifies a PDF region as **chart**, that region is processed through layout detection and OCR—not the Omni caption stage. 
Enabling the caption NIM and the `caption` pipeline stage does **not** send chart-labeled figures to `/v1/chat/completions`. + + The caption stage covers: + + - Unstructured content in the `images` column (standalone image files and page-element regions **not** classified as table, chart, or infographic) + - Optional infographic regions when you set `caption_infographics=True` on `CaptionParams` (the VLM caption is stored in `caption`, separate from OCR `text`) + + To validate caption traffic during ingest, inspect metadata such as `page_elements_v3_counts_by_label`. If the figure is labeled `chart`, expect no Omni chat-completions requests for that region even when captioning is enabled. + **Related** - [Multimodal embeddings (VLM)](embedding.md) - [Metadata reference](content-metadata.md) -- [Image captioning](prerequisites-support-matrix.md#image-captioning-2605) +- [Image captioning — NIM and hardware](prerequisites-support-matrix.md#image-captioning-2605) ## Metadata and content schema { #metadata-and-content-schema } diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md index b8910f2ea8..9f97db16f6 100644 --- a/docs/docs/extraction/prerequisites-support-matrix.md +++ b/docs/docs/extraction/prerequisites-support-matrix.md @@ -2,7 +2,7 @@ Before you begin using [NeMo Retriever Library](overview.md), confirm your software stack, deployment hardware, and—if you use them—advanced features (audio and video, Nemotron Parse, VLM image captioning, reranking) against the guidance in this page. 
-## Software Requirements +## Software Requirements { #software-requirements } - Linux operating systems (Ubuntu 22.04 or later recommended) - [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) (NVIDIA Driver >= `580`, CUDA >= `13.0`) @@ -64,7 +64,7 @@ Ensure your deployment environment meets these specifications before running the The NeMo Retriever Library extraction core pipeline features run on a single A10G or better GPU. -### Default Helm NIMs +### Default Helm NIMs { #default-helm-nims } The production Helm chart enables these NIM microservices **by default** (for example via `nimOperator.*.enabled=true`): @@ -107,22 +107,11 @@ These NIM microservices are **optional** for the default extraction pipeline. Th For 26.05, use **`nemotron_3_nano_omni_30b_a3b_reasoning`** when you enable the caption stage (hosted model ID `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning`). The Helm key is in the [optional NIMs](#optional-helm-nims-not-auto-wired-by-default) table above. -!!! important "PDF chart regions are not captioned by Omni" - - When **nemotron-page-elements-v3** classifies a PDF region as **chart**, that region is processed through layout detection and OCR—not the Omni caption stage. Enabling the caption NIM and the `caption` pipeline stage does **not** send chart-labeled figures to `/v1/chat/completions`. - - The caption stage covers: - - - Unstructured content in the `images` column (standalone image files and page-element regions **not** classified as table, chart, or infographic) - - Optional infographic regions when you set `caption_infographics=True` on `CaptionParams` (the VLM caption is stored in `caption`, separate from OCR `text`) - - To validate caption traffic during ingest, inspect metadata such as `page_elements_v3_counts_by_label`. If the figure is labeled `chart`, expect no Omni chat-completions requests for that region even when captioning is enabled. 
- Optional features listed in the table above require additional GPU support, disk space, and feature-specific system dependencies beyond the four default NIMs. For published NIM model IDs and deployment-specific constraints, use the product support matrices linked under [Related Topics](#related-topics) below. -## Model Hardware Requirements +## Model Hardware Requirements { #model-hardware-requirements } NeMo Retriever Library supports the following GPU hardware given system constraints in the table.

From b93b76aff809c887e130b0aefb29ff8effa45b59 Mon Sep 17 00:00:00 2001
From: Kurt Heiss
Date: Fri, 5 Jun 2026 10:52:05 -0700
Subject: [PATCH 2/6] =?UTF-8?q?docs(extraction):=20address=20PR=20#2203=20?=
 =?UTF-8?q?review=20=E2=80=94=20drop=20custom-metadata=20page,=20remove=20?=
 =?UTF-8?q?chart=20admonition?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove custom-metadata.md in favor of vdbs.md#metadata-and-filtering and the metadata filtering notebook. Drop the PDF chart caption admonition from multimodal-extraction.md per review feedback.
---
 docs/docs/extraction/custom-metadata.md       | 131 ------------------
 ...egrations-langchain-llamaindex-haystack.md |   2 +-
 docs/docs/extraction/multimodal-extraction.md |  11 --
 docs/docs/extraction/vdbs.md                  |  11 +-
 .../extraction/workflow-agentic-retrieval.md  |   2 +-
 docs/mkdocs.yml                               |  15 +-
 .../tests/test_src_documentation_snippets.py  |   1 -
 7 files changed, 15 insertions(+), 158 deletions(-)
 delete mode 100644 docs/docs/extraction/custom-metadata.md

diff --git a/docs/docs/extraction/custom-metadata.md b/docs/docs/extraction/custom-metadata.md
deleted file mode 100644
index 3d59a5f0d0..0000000000
--- a/docs/docs/extraction/custom-metadata.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Custom metadata and filtering
-
-Use this documentation to attach per-document metadata during ingestion and to narrow [LanceDB](vdbs.md) search results in [NeMo Retriever Library](overview.md).
Implementation details live in the package [Vector DB operators and LanceDB](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) README. - -## On this page { #on-this-page } - -- [Attach metadata at ingestion](#attach-metadata-at-ingestion) -- [Best practices](#best-practices) -- [Filter results during retrieval](#filter-results-during-retrieval) -- [How metadata is stored](#how-metadata-is-stored) - -## Attach metadata at ingestion { #attach-metadata-at-ingestion } - -Pass a **sidecar metadata table** on `vdb_upload` so selected columns are merged into each chunk's `content_metadata` before LanceDB upload. All three parameters must be set together: - -| Parameter | Purpose | -|-----------|---------| -| `meta_dataframe` | Path to CSV, JSON, or Parquet, or an in-memory `pandas.DataFrame` | -| `meta_source_field` | Column that identifies each document (must match ingest paths or basenames per `meta_join_key`) | -| `meta_fields` | Non-empty list of column names to copy into `content_metadata` | - -Optional `meta_join_key` controls how rows are matched to documents: `auto` (try full path then basename), `source_id` (full path), or `source_name` (basename only). - -For parameter details, refer to the [Python API guide](nemo-retriever-api-reference.md). 
- -```python -import pandas as pd -from nemo_retriever import create_ingestor - -meta_df = pd.DataFrame( - { - "source": ["data/woods_frost.pdf", "data/multimodal_test.pdf"], - "meta_a": ["alpha", "bravo"], - "meta_b": [10, 20], - } -) - -hostname = "localhost" -table_name = "nemo_retriever_collection" -lancedb_uri = "./lancedb_data" - -ingestor = ( - create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670") - .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"]) - .extract( - extract_text=True, - extract_tables=True, - extract_charts=True, - extract_images=True, - text_depth="page" - ) - .embed() - .vdb_upload( - vdb_op="lancedb", - vdb_kwargs={"lancedb_uri": lancedb_uri, "table_name": table_name}, - meta_dataframe=meta_df, - meta_source_field="source", - meta_fields=["meta_a", "meta_b"], - ) -) -results = ingestor.ingest_async().result() -``` - -Set `hostname`, `table_name`, and a **remote** `lancedb_uri` (for example `s3://bucket/path`) to match your deployment—the retriever service rejects local filesystem paths. The client uploads in-memory sidecar metadata to the service before ingest; do not pass a raw local file path as `meta_dataframe` on the REST spec. For local LanceDB directories, use `run_mode="batch"` instead (refer to [Vector databases](vdbs.md)). For a step-by-step walkthrough with additional fields such as category, department, and timestamp, refer to [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering). - -## How metadata is stored { #how-metadata-is-stored } - -During ingestion, each chunk's `content_metadata` is serialized as a **compact JSON string** (no spaces after `:` or `,`) in the LanceDB table's `metadata` column. Sidecar columns from `meta_dataframe`, `meta_source_field`, and `meta_fields` are merged into that JSON object before upload, so custom keys live in the same string—not separate columns. 
That is why `Retriever.query` filters often use `metadata LIKE '%\"key\":\"value\"%'`. For operator behavior and predicate examples, see [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering). - -## Best practices { #best-practices } - -The following are the best practices when you work with custom metadata: - -- Plan metadata structure before ingestion. -- Test filter expressions with small datasets first. -- Consider performance implications of complex filters. -- Validate metadata during ingestion. -- Handle missing metadata fields gracefully. -- Log invalid filter expressions. - -## Filter results during retrieval { #filter-results-during-retrieval } - -You can use custom metadata to filter documents during retrieval operations. For **predicate pushdown**, pass a `where` SQL predicate through [`Retriever.query`](nemo-retriever-api-reference.md) (refer to [Vector databases](vdbs.md)) or chain `.where(...)` on a native LanceDB `table.search(...)` query. Application-side filtering on returned hits does not change what the database evaluates—raise `top_k` if matches might sit outside the first neighbors. - -### Example filter ideas - -Typical keys to filter on include `category`, `department`, `priority`, and `timestamp` (use comparable ISO-8601 strings for time ranges). Encode predicates in LanceDB SQL against your table columns (often the serialized `metadata` string), or inspect parsed hit metadata after search as in the example below. - -### Example: Use a Filter Expression in Search - -After ingestion is complete and documents are uploaded to LanceDB with metadata, you can narrow results in the database with a **`where`** clause, or in Python on the returned hits. - -**Native LanceDB (SQL pushdown):** connect, embed the query yourself (same model as ingestion), then chain `.where("")` on `table.search(...)` so filtering happens before the `limit`. 
Exact SQL depends on how `metadata` is stored; refer to [LanceDB metadata filtering](https://docs.lancedb.com/search/filtering#filtering-with-sql). - -```python -import lancedb - -# pseudocode — replace YOUR_VECTOR and YOUR_PREDICATE with real values. -db = lancedb.connect("./lancedb_data") -table = db.open_table("nemo_retriever_collection") -# table.search(YOUR_VECTOR, vector_column_name="vector").where(YOUR_PREDICATE).limit(10).to_list() -``` - -**`Retriever.query` + `where`:** LanceDB applies the predicate before ranking. For post-filter logic in Python, use a wider `top_k` first. - -```python -from nemo_retriever.graph.retriever import Retriever - -retriever = Retriever( - vdb_kwargs={"uri": "./lancedb_data", "table_name": "nemo_retriever_collection"}, - embed_kwargs={ - "model_name": "nvidia/llama-nemotron-embed-1b-v2", - "embed_model_name": "nvidia/llama-nemotron-embed-1b-v2", - }, -) - -hits = retriever.query( - "this is expensive", - top_k=16, - vdb_kwargs={"where": "metadata LIKE '%\"department\":\"Engineering\"%'"}, -) -``` - -For a runnable end-to-end flow (ingest, `Retriever.query`, and both filter modes), refer to [nemo_retriever_retriever_query_metadata_filter.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb). 
- -When you ingest through the **retriever service**, upload the sidecar with [`POST /v1/ingest/sidecar`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/routers/ingest.py#L1040-L1129) (multipart file; response [`SidecarUploadResponse`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/responses.py#L60-L68)), then pass the returned `sidecar_id` as `meta_dataframe_id` with `meta_source_field` and `meta_fields` in `pipeline.vdb_upload_params` on [`POST /v1/ingest`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/requests.py#L15-L32) ([`PipelineSpec`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/src/nemo_retriever/service/models/pipeline_spec.py#L55-L78)). Request and response shapes, form fields, and auth headers are in the service OpenAPI UI at `/docs` (or `/openapi.json`) on your retriever base URL (for example `http://localhost:7670/docs` after `retriever service start`). Do not send a raw local path as `meta_dataframe` on the service spec. 
- -## Related content { #related-content } - -- [Vector databases](vdbs.md) — canonical LanceDB upload and retrieval guide -- [nemo_retriever_retriever_query_metadata_filter.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) — runnable notebook for sidecar metadata at ingest and filtered `Retriever.query` diff --git a/docs/docs/extraction/integrations-langchain-llamaindex-haystack.md b/docs/docs/extraction/integrations-langchain-llamaindex-haystack.md index 7ee0dda650..a71d24c0dd 100644 --- a/docs/docs/extraction/integrations-langchain-llamaindex-haystack.md +++ b/docs/docs/extraction/integrations-langchain-llamaindex-haystack.md @@ -19,4 +19,4 @@ Haystack-related extraction modes may appear in API tables as **deprecated** in - [Use the Python API](nemo-retriever-api-reference.md) - [Use the CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) -- [Chunking](concepts.md#chunking), [Upload data](vdbs.md), [Filter search](custom-metadata.md) +- [Chunking](concepts.md#chunking), [Upload data](vdbs.md), [Filter search](vdbs.md#metadata-and-filtering) diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md index 3b75d53bb2..0f73ce70d2 100644 --- a/docs/docs/extraction/multimodal-extraction.md +++ b/docs/docs/extraction/multimodal-extraction.md @@ -77,17 +77,6 @@ Image captioning generates natural-language descriptions for unstructured image **Captioning is optional** — enable it in your ingest configuration (for example, the `caption` API or pipeline flag) when you need natural-language descriptions of image content. Reasoning traces are disabled by default for captioning. -!!! important "PDF chart regions are not captioned by Omni" - - When **nemotron-page-elements-v3** classifies a PDF region as **chart**, that region is processed through layout detection and OCR—not the Omni caption stage. 
Enabling the caption NIM and the `caption` pipeline stage does **not** send chart-labeled figures to `/v1/chat/completions`. - - The caption stage covers: - - - Unstructured content in the `images` column (standalone image files and page-element regions **not** classified as table, chart, or infographic) - - Optional infographic regions when you set `caption_infographics=True` on `CaptionParams` (the VLM caption is stored in `caption`, separate from OCR `text`) - - To validate caption traffic during ingest, inspect metadata such as `page_elements_v3_counts_by_label`. If the figure is labeled `chart`, expect no Omni chat-completions requests for that region even when captioning is enabled. - **Related** - [Multimodal embeddings (VLM)](embedding.md) diff --git a/docs/docs/extraction/vdbs.md b/docs/docs/extraction/vdbs.md index 7fb85b948f..c74823db2a 100644 --- a/docs/docs/extraction/vdbs.md +++ b/docs/docs/extraction/vdbs.md @@ -91,10 +91,11 @@ Semantic retrieval uses dense embeddings to find content that is similar in mean ## Metadata and filtering { #metadata-and-filtering } -This page covers LanceDB upload and retrieval. **Metadata is not duplicated here.** +Attach sidecar metadata at ingest with `meta_dataframe`, `meta_source_field`, and `meta_fields` on `vdb_upload`. Values merge into each chunk's `content_metadata` and are stored as compact JSON in LanceDB. Narrow results at query time with server-side `where` on [`Retriever.query`](nemo-retriever-api-reference.md) or client-side `filter_hits_by_content_metadata`. -- **Published guide** — [Custom metadata and filtering](custom-metadata.md) (sidecar `meta_*` on `vdb_upload`, compact JSON in LanceDB, server-side `where` on `Retriever.query`, and client-side `filter_hits_by_content_metadata`). 
-- **Canonical reference** — [Vector DB operators and LanceDB — Metadata filtering](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) in `nemo_retriever/src/nemo_retriever/vdb/README.md` (operator behavior and examples). +- [Metadata filtering notebook](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) — end-to-end ingest, `Retriever.query`, and both filter modes +- [Sidecar metadata ingest](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/metadata_and_filtered_search.ipynb) — CLI and graph workflow +- [VDB README (metadata filtering)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) — operator behavior, SQL predicates, and examples ## LanceDB deployment characteristics { #lancedb-deployment-characteristics } @@ -144,7 +145,7 @@ Testing and release cadence for these integrations follow the owning project (RA ### More information (embeddings & custom `VDB`) { #vector-database-partners-more-info } -- [Custom metadata and filtering](custom-metadata.md) and the package [VDB README (metadata filtering)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) +- [Metadata filtering notebook](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) and the package [VDB README (metadata filtering)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb#metadata-filtering) - [Multimodal embeddings (VLM)](embedding.md) - [NeMo Retriever Text Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html) - [NVIDIA NIM catalog](https://build.nvidia.com/) for embedding and retrieval-related NIMs @@ -157,7 +158,7 @@ To implement a custom operator, follow the `VDB` abstract interface described in ## Related Topics { 
#related-topics } -- [Custom metadata and filtering](custom-metadata.md) +- [Metadata filtering: add sidecar metadata and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb) - [Vector DB operators and LanceDB (source)](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/vdb) - [Use the NeMo Retriever Library Python API](nemo-retriever-api-reference.md) - [Store Extracted Images](nemo-retriever-api-reference.md) diff --git a/docs/docs/extraction/workflow-agentic-retrieval.md b/docs/docs/extraction/workflow-agentic-retrieval.md index 78b48cdc47..ea8a83c17d 100644 --- a/docs/docs/extraction/workflow-agentic-retrieval.md +++ b/docs/docs/extraction/workflow-agentic-retrieval.md @@ -8,7 +8,7 @@ NeMo Retriever Library provides ingestion, embedding, storage, and retrieval bui Use these pages together with your orchestration layer: -- [Semantic retrieval](vdbs.md#semantic-retrieval), [Custom metadata and filtering](custom-metadata.md), and [Evaluate on your data](evaluate-on-your-data.md) for retrieval quality and reranking notes +- [Semantic retrieval](vdbs.md#semantic-retrieval), [Metadata and filtering](vdbs.md#metadata-and-filtering), and [Evaluate on your data](evaluate-on-your-data.md) for retrieval quality and reranking notes - [Agentic retrieval (concept)](agentic-retrieval-concept.md) - [Evaluate on your data](evaluate-on-your-data.md), which includes retrieval evaluation guidance - [Release notes](releasenotes.md), which may mention agentic retrieval updates diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 6300e6c41a..3f72ea764f 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -95,26 +95,24 @@ nav: # Single vector-DB page (vdbs.md). Deep links: in-page "On this page" TOC and redirects # (for example extraction/vector-db-partners.md → vdbs.md#vector-database-partners). - "Vector databases": extraction/vdbs.md - - "7. 
Retrieval & ranking": - - "Custom metadata and filtering": extraction/custom-metadata.md - - "8. Deployment & operations": + - "7. Deployment & operations": - "Ray and distributed ingest": extraction/ray-logging.md - - "9. Customize & extend": + - "8. Customize & extend": - Extending/Customizing NeMo Retriever Library with custom code: https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/src/nemo_retriever/graph#nemo-retriever-graph - "NimClient and custom NIM endpoints": extraction/nimclient.md - - "10. Integrations & ecosystem": + - "9. Integrations & ecosystem": - "Framework integrations": extraction/integrations-langchain-llamaindex-haystack.md - "Starter kits": extraction/notebooks/index.md - - "11. Evaluation & benchmarks": + - "10. Evaluation & benchmarks": - "Evaluate on your own documents": extraction/evaluate-on-your-data.md - - "12. Reference": + - "11. Reference": - "API guide": extraction/nemo-retriever-api-reference.md # TODO: after nv-ingest code removal, update this link when CLI docs are relocated. - "CLI reference": https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli - "Quickstart: retriever CLI": reference/retriever-cli-quickstart.md - Environment variables: extraction/environment-config.md - "Metadata reference": extraction/content-metadata.md - - "13. Support & community": + - "12. 
Support & community": - Troubleshooting: extraction/troubleshoot.md - FAQ: extraction/faq.md - Contributing: extraction/contributing.md @@ -161,6 +159,7 @@ plugins: extraction/ngc-api-key.md: extraction/api-keys.md extraction/notebooks.md: extraction/notebooks/index.md extraction/data-store.md: extraction/vdbs.md + extraction/custom-metadata.md: extraction/vdbs.md#metadata-and-filtering extraction/nemoretriever-parse.md: extraction/multimodal-extraction.md#text-and-layout-extraction extraction/supported-file-types.md: extraction/multimodal-extraction.md#supported-file-types-and-formats extraction/text-layout-extraction.md: extraction/multimodal-extraction.md#text-and-layout-extraction diff --git a/nemo_retriever/tests/test_src_documentation_snippets.py b/nemo_retriever/tests/test_src_documentation_snippets.py index 9193167e06..cbd843f502 100644 --- a/nemo_retriever/tests/test_src_documentation_snippets.py +++ b/nemo_retriever/tests/test_src_documentation_snippets.py @@ -52,7 +52,6 @@ def _iter_markdown_python_blocks() -> list[tuple[str, str]]: _MD_BLOCKS = _iter_markdown_python_blocks() _PUBLIC_RETRIEVER_DOCS = ( "README.md", - "docs/docs/extraction/custom-metadata.md", "examples/nemo_retriever_retriever_query_metadata_filter.ipynb", "nemo_retriever/README.md", "nemo_retriever/docs/cli/README.md", From 56fa45c2f2e4f2e974835a4c671564e492e5174b Mon Sep 17 00:00:00 2001 From: Kurt Heiss Date: Wed, 10 Jun 2026 11:48:32 -0700 Subject: [PATCH 3/6] docs(extraction): drop 26.05 from image-captioning anchor on main Rename the support-matrix caption section for main and keep a legacy #image-captioning-2605 alias so existing deep links keep working. 
--- docs/docs/extraction/multimodal-extraction.md | 2 +- docs/docs/extraction/prerequisites-support-matrix.md | 8 +++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md index 0f73ce70d2..0a38261bc6 100644 --- a/docs/docs/extraction/multimodal-extraction.md +++ b/docs/docs/extraction/multimodal-extraction.md @@ -81,7 +81,7 @@ Image captioning generates natural-language descriptions for unstructured image - [Multimodal embeddings (VLM)](embedding.md) - [Metadata reference](content-metadata.md) -- [Image captioning — NIM and hardware](prerequisites-support-matrix.md#image-captioning-2605) +- [Image captioning — NIM and hardware](prerequisites-support-matrix.md#image-captioning-nim-hardware) ## Metadata and content schema { #metadata-and-content-schema } diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md index 9f97db16f6..009b7a2fd7 100644 --- a/docs/docs/extraction/prerequisites-support-matrix.md +++ b/docs/docs/extraction/prerequisites-support-matrix.md @@ -100,12 +100,14 @@ These NIM microservices are **optional** for the default extraction pipeline. 
Th |-----------|-----|------| | `rerankqa` | [llama-nemotron-rerank-vl-1b-v2](https://huggingface.co/nvidia/llama-nemotron-rerank-vl-1b-v2) | Reranking for improved retrieval accuracy | | `nemotron_parse` | [nemotron-parse](https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2) | Optional PDF `extract_method="nemotron_parse"` (default PDF extraction uses **pdfium**) | -| `nemotron_3_nano_omni_30b_a3b_reasoning` | [nemotron-3-nano-omni-30b-a3b-reasoning](https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16) | Supported image captioning for 26.05 when you enable the caption stage | +| `nemotron_3_nano_omni_30b_a3b_reasoning` | [nemotron-3-nano-omni-30b-a3b-reasoning](https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16) | Supported image captioning when you enable the caption stage | | `audio` | [parakeet-1-1b-ctc-en-us](https://huggingface.co/nvidia/parakeet-ctc-1.1b) | [Audio and video](audio-video.md) transcription | -### Image captioning (26.05) { #image-captioning-2605 } +### Image captioning { #image-captioning-nim-hardware } -For 26.05, use **`nemotron_3_nano_omni_30b_a3b_reasoning`** when you enable the caption stage (hosted model ID `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning`). The Helm key is in the [optional NIMs](#optional-helm-nims-not-auto-wired-by-default) table above. + + +Use **`nemotron_3_nano_omni_30b_a3b_reasoning`** when you enable the caption stage (hosted model ID `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning`). The Helm key is in the [optional NIMs](#optional-helm-nims-not-auto-wired-by-default) table above. Optional features listed in the table above require additional GPU support, disk space, and feature-specific system dependencies beyond the four default NIMs. 
From 50d8501322d8bc8d6030eaf2df552ddadfcf7e4a Mon Sep 17 00:00:00 2001 From: Kurt Heiss Date: Mon, 15 Jun 2026 11:40:10 -0700 Subject: [PATCH 4/6] docs(extraction): fix caption anchor and add scope prose for PR #2203 Point deployment-options.md at #image-captioning-nim-hardware. Add one sentence under multimodal-extraction #image-captioning so FAQ cross-references have scope detail without restoring the admonition. --- docs/docs/extraction/deployment-options.md | 2 +- docs/docs/extraction/multimodal-extraction.md | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/docs/extraction/deployment-options.md b/docs/docs/extraction/deployment-options.md index 71f0b75109..cb80472c80 100644 --- a/docs/docs/extraction/deployment-options.md +++ b/docs/docs/extraction/deployment-options.md @@ -83,7 +83,7 @@ On a staging host with internet access, pull from NGC, retag to your private reg [Audio and video](audio-video.md) need **`ffmpeg` and `ffprobe` on `PATH`**. The bundled image omits them. Do **not** use `service.installFfmpeg=true` in an air gap (startup install needs package-repo egress). Build a custom service image on a connected staging host, mirror it, and set `service.image.repository` / `service.image.tag`. Skip this step if you do not use audio/video. -For offline image captioning, deploy the in-cluster [Nemotron 3 Nano Omni](prerequisites-support-matrix.md#image-captioning-2605) NIM and point your pipeline caption endpoint at the in-cluster HTTP URL instead of `integrate.api.nvidia.com` or other hosted APIs. +For offline image captioning, deploy the in-cluster [Nemotron 3 Nano Omni](prerequisites-support-matrix.md#image-captioning-nim-hardware) NIM and point your pipeline caption endpoint at the in-cluster HTTP URL instead of `integrate.api.nvidia.com` or other hosted APIs. 
**Related** diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md index 0a38261bc6..c75d875a47 100644 --- a/docs/docs/extraction/multimodal-extraction.md +++ b/docs/docs/extraction/multimodal-extraction.md @@ -77,6 +77,8 @@ Image captioning generates natural-language descriptions for unstructured image **Captioning is optional** — enable it in your ingest configuration (for example, the `caption` API or pipeline flag) when you need natural-language descriptions of image content. Reasoning traces are disabled by default for captioning. +Chart-classified PDF regions stay on the layout/OCR path; only non-chart image regions and optional infographics (`caption_infographics=True`) receive Omni captions. + **Related** - [Multimodal embeddings (VLM)](embedding.md) From 3f5fe71282214807f39c0d66e56285172552e055 Mon Sep 17 00:00:00 2001 From: Kurt Heiss Date: Tue, 16 Jun 2026 14:13:50 -0700 Subject: [PATCH 5/6] docs: drop 26.05 labels from minimal-install links on main Rename the Helm README minimal-install section, keep a legacy #recommended-minimal-install-2605 alias, point extraction docs at blob/main, and use version-neutral prose in chart default notes. --- docs/docs/extraction/deployment-options.md | 2 +- .../prerequisites-support-matrix.md | 2 +- nemo_retriever/helm/README.md | 24 ++++++++++--------- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/docs/docs/extraction/deployment-options.md b/docs/docs/extraction/deployment-options.md index cb80472c80..6b5292f1f3 100644 --- a/docs/docs/extraction/deployment-options.md +++ b/docs/docs/extraction/deployment-options.md @@ -22,7 +22,7 @@ Build and run the NeMo Retriever service image with the [Docker service image gu 3. 
**Published Library Helm charts (supported):** cluster install and upgrade procedures are covered in the [NeMo Retriever Library](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/) — use alongside the NeMo Retriever chart README for your release 4. [Environment variables](environment-config.md) and [Troubleshoot](troubleshoot.md) as needed -**Core NIMs for the default extraction pipeline** (26.05): `page_elements`, `table_structure`, `ocr`, and `vlm_embed` (`llama-nemotron-embed-vl-1b-v2:1.12.0`). These four are auto-wired into the retriever service. **Nemotron Parse**, **Nemotron 3 Nano Omni**, the **VL reranker**, and **Parakeet ASR** are optional and not auto-wired. For a minimal GPU footprint, disable optional keys you do not need (see [Recommended minimal install (26.05)](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). See [Pre-Requisites & Support Matrix — Default Helm NIMs](prerequisites-support-matrix.md#default-helm-nims). +**Core NIMs for the default extraction pipeline:** `page_elements`, `table_structure`, `ocr`, and `vlm_embed` (`llama-nemotron-embed-vl-1b-v2:1.12.0`). These four are auto-wired into the retriever service. **Nemotron Parse**, **Nemotron 3 Nano Omni**, the **VL reranker**, and **Parakeet ASR** are optional and not auto-wired. For a minimal GPU footprint, disable optional keys you do not need (refer to [Recommended minimal install](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Refer to [Pre-Requisites & Support Matrix — Default Helm NIMs](prerequisites-support-matrix.md#default-helm-nims). 
For audio and video extraction in Kubernetes, set `service.installFfmpeg=true` diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md index a8b30c5b6f..c203914da6 100644 --- a/docs/docs/extraction/prerequisites-support-matrix.md +++ b/docs/docs/extraction/prerequisites-support-matrix.md @@ -94,7 +94,7 @@ Default VL embedder container and model for release deployments: ### Optional Helm NIMs (not auto-wired) { #optional-helm-nims-not-auto-wired-by-default } -These NIM microservices are **optional** for the default extraction pipeline. The retriever service does **not** call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For **26.05 production**, disable keys you do not need (see [Recommended minimal install (26.05)](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Set `nimOperator..enabled=true` when you want that NIM reconciled. Chart keys are in the [NeMo Retriever Helm chart README](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#nim-operator-sub-stack). +These NIM microservices are **optional** for the default extraction pipeline. The retriever service does **not** call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For production deployments, disable keys you do not need (refer to [Recommended minimal install](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Set `nimOperator..enabled=true` when you want that NIM reconciled. Chart keys are in the [NeMo Retriever Helm chart README](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#nim-operator-sub-stack). 
| Helm flag | NIM | Role | |-----------|-----|------| diff --git a/nemo_retriever/helm/README.md b/nemo_retriever/helm/README.md index 41204a3d52..6ad00e6ba9 100644 --- a/nemo_retriever/helm/README.md +++ b/nemo_retriever/helm/README.md @@ -188,7 +188,7 @@ NIM (the VL reranker `rerankqa`, Nemotron Parse, Omni 30B, and the Parakeet `audio` ASR NIM) is **disabled by default** to honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md); -see [Recommended minimal install (26.05)](#recommended-minimal-install-2605) +refer to [Recommended minimal install](#recommended-minimal-install-2605) for the opt-in `--set` flags that turn any of them on. ```bash @@ -199,7 +199,9 @@ helm install retriever ./nemo_retriever/helm \ --set ngcApiSecret.password=$NGC_API_KEY ``` -### Recommended minimal install (26.05) { #recommended-minimal-install-2605 } +### Recommended minimal install { #recommended-minimal-install } + + Deploy only the four core NIMs that the retriever service auto-wires (`page_elements`, `table_structure`, `ocr`, `vlm_embed`): @@ -211,14 +213,14 @@ helm install retriever ./nemo_retriever/helm \ --set ngcApiSecret.password=$NGC_API_KEY ``` -> The VL reranker (`rerankqa`), Nemotron Parse, the Nemotron 3 Nano Omni 30B caption NIM, and the Parakeet `audio` ASR NIM are **all off by default** in 26.05 — they only reconcile when you explicitly opt in. Opt-in flags: +> The VL reranker (`rerankqa`), Nemotron Parse, the Nemotron 3 Nano Omni 30B caption NIM, and the Parakeet `audio` ASR NIM are **all off by default** — they only reconcile when you explicitly opt in. 
Opt-in flags: > > * VL reranker — `--set nimOperator.rerankqa.enabled=true` > * Nemotron Parse — `--set nimOperator.nemotron_parse.enabled=true` > * Omni 30B captioner — `--set nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled=true` > * Parakeet ASR — `--set nimOperator.audio.enabled=true` (also set `serviceConfig.nimEndpoints.audioGrpcEndpoint=audio:50051` to wire ASR into the service, plus `service.installFfmpeg=true` if your image does not bundle ffmpeg) > -> This matches the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and avoids silently pulling ≈ 62 GiB of Omni weights or claiming a second dedicated GPU on a "default" install. See the [model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) table for per-NIM GPU and disk costs. +> This matches the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and avoids silently pulling ≈ 62 GiB of Omni weights or claiming a second dedicated GPU on a "default" install. Refer to the [model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) table for per-NIM GPU and disk costs. The chart auto-wires the operator-managed in-cluster URLs of the four "core" NIMs into the service's `nim_endpoints` block: @@ -302,7 +304,7 @@ cluster allows runtime package installation. For air-gapped clusters, see To run self-hosted Parakeet for [audio and video extraction](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/audio-video.md): -1. 
Set `nimOperator.audio.enabled=true` (it is on by default; disable other optional NIMs you do not need per [Recommended minimal install (26.05)](#recommended-minimal-install-2605)). +1. Set `nimOperator.audio.enabled=true` (it is on by default; disable other optional NIMs you do not need per [Recommended minimal install](#recommended-minimal-install-2605)). 2. Pin the ASR `NIMService` to a **dedicated GPU** with `nimOperator.audio.resources`, `nodeSelector`, or `tolerations` (see [NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html)). 3. Confirm the GPU SKU in [Model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) (footnote ⁴ lists Blackwell limitations). 4. Set `service.installFfmpeg=true` when the retriever service will process audio or video (see `service.installFfmpeg` above). @@ -374,9 +376,9 @@ pair gated on three conditions ALL holding: | `nimOperator.vlm_embed.enabled` | `true` | Multimodal embedding NIM (also used by the vectordb Pod). | | `nimOperator.vlm_embed.nimServiceName` | `llama-nemotron-embed-vl-1b-v2` | NIMService / in-cluster DNS name. | | `nimOperator.vlm_embed.image` | `nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0` | Default VLM embed NIM image. | -| `nimOperator.rerankqa.enabled` | `false` | VL reranker NIM (optional; not auto-wired). Set `true` to opt in. Default `false` so 26.05 installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and do not silently provision an extra ≈ 3.1 GiB GPU NIM. 
The image points at the **VL** SKU (`llama-nemotron-rerank-vl-1b-v2`) per [prerequisites-support-matrix.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#default-helm-nims) — the text-only `llama-nemotron-rerank-1b-v2` silently degrades multimodal reranking and is not the documented POR. | -| `nimOperator.nemotron_parse.enabled` | `false` | Structured-parse NIM (optional). Set `true` when using `extract_method="nemotron_parse"`. Default `false` so 26.05 installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md). Image tag follows the [image tag conventions](#image-tag-conventions). | -| `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled` | `false` | Omni 30B caption NIM (optional). Set `true` to enable image captioning — see [Image captioning (Omni 30B)](#image-captioning-omni-30b). Default `false` so 26.05 installs do not silently pull ≈ 62 GiB of BF16 weights or claim a second dedicated GPU. Image tag follows the [image tag conventions](#image-tag-conventions). | +| `nimOperator.rerankqa.enabled` | `false` | VL reranker NIM (optional; not auto-wired). Set `true` to opt in. Default `false` so default installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and do not silently provision an extra ≈ 3.1 GiB GPU NIM. The image points at the **VL** SKU (`llama-nemotron-rerank-vl-1b-v2`) per [prerequisites-support-matrix.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#default-helm-nims) — the text-only `llama-nemotron-rerank-1b-v2` silently degrades multimodal reranking and is not the documented POR. | +| `nimOperator.nemotron_parse.enabled` | `false` | Structured-parse NIM (optional). 
Set `true` when using `extract_method="nemotron_parse"`. Default `false` so default installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md). Image tag follows the [image tag conventions](#image-tag-conventions). | +| `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled` | `false` | Omni 30B caption NIM (optional). Set `true` to enable image captioning — refer to [Image captioning (Omni 30B)](#image-captioning-omni-30b). Default `false` so default installs do not silently pull ≈ 62 GiB of BF16 weights or claim a second dedicated GPU. Image tag follows the [image tag conventions](#image-tag-conventions). | | `nimOperator.audio.enabled` | `false` | Parakeet ASR NIM (optional). Set `true` for audio/video transcription; pair with `serviceConfig.nimEndpoints.audioGrpcEndpoint=audio:50051` so the retriever-service can reach it. | | `nimOperator..image.repository` | `nvcr.io/nim/nvidia/...` | Per-NIM image. | | `nimOperator..image.pullSecrets` | `[ngc-secret]` | Referenced by the NIMService CR. | @@ -394,7 +396,7 @@ pair gated on three conditions ALL holding: > are auto-wired into the retriever-service config. Optional NIMs may reconcile > when `nimOperator..enabled` is `true` in `values.yaml`, but the > retriever-service won't call them unless you wire your pipeline to use them. -> For 26.05, prefer the [minimal install](#recommended-minimal-install-2605) overrides. +> Prefer the [recommended minimal install](#recommended-minimal-install-2605) overrides. #### Filtering cached GPU profiles { #filtering-cached-gpu-profiles } @@ -488,7 +490,7 @@ and **ocr** (no `graphic_elements` operator NIM in this chart). 
For image captioning, set `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled=true` — see [Image captioning (Omni 30B)](#image-captioning-omni-30b) for the chart-side wiring and -[Image captioning (26.05)](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-2605) +[Image captioning — NIM and hardware](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-nim-hardware) for the product matrix. #### Image captioning (Omni 30B) { #image-captioning-omni-30b } @@ -1154,7 +1156,7 @@ nimOperator: - For **offline captioning**, enable `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning` and point the pipeline caption endpoint at the in-cluster NIM URL (see - [Image captioning (26.05)](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-2605)). + [Image captioning — NIM and hardware](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-nim-hardware)). ### Mirroring pattern From 2e02ecb94a475c1f9ac951af754bfdff1a46aeb4 Mon Sep 17 00:00:00 2001 From: Kurt Heiss Date: Tue, 16 Jun 2026 14:19:18 -0700 Subject: [PATCH 6/6] Revert "docs: drop 26.05 labels from minimal-install links on main" This reverts commit 3f5fe71282214807f39c0d66e56285172552e055. --- docs/docs/extraction/deployment-options.md | 2 +- .../prerequisites-support-matrix.md | 2 +- nemo_retriever/helm/README.md | 24 +++++++++---------- 3 files changed, 13 insertions(+), 15 deletions(-) diff --git a/docs/docs/extraction/deployment-options.md b/docs/docs/extraction/deployment-options.md index 6b5292f1f3..cb80472c80 100644 --- a/docs/docs/extraction/deployment-options.md +++ b/docs/docs/extraction/deployment-options.md @@ -22,7 +22,7 @@ Build and run the NeMo Retriever service image with the [Docker service image gu 3. 
**Published Library Helm charts (supported):** cluster install and upgrade procedures are covered in the [NeMo Retriever Library](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/) — use alongside the NeMo Retriever chart README for your release 4. [Environment variables](environment-config.md) and [Troubleshoot](troubleshoot.md) as needed -**Core NIMs for the default extraction pipeline:** `page_elements`, `table_structure`, `ocr`, and `vlm_embed` (`llama-nemotron-embed-vl-1b-v2:1.12.0`). These four are auto-wired into the retriever service. **Nemotron Parse**, **Nemotron 3 Nano Omni**, the **VL reranker**, and **Parakeet ASR** are optional and not auto-wired. For a minimal GPU footprint, disable optional keys you do not need (refer to [Recommended minimal install](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Refer to [Pre-Requisites & Support Matrix — Default Helm NIMs](prerequisites-support-matrix.md#default-helm-nims). +**Core NIMs for the default extraction pipeline** (26.05): `page_elements`, `table_structure`, `ocr`, and `vlm_embed` (`llama-nemotron-embed-vl-1b-v2:1.12.0`). These four are auto-wired into the retriever service. **Nemotron Parse**, **Nemotron 3 Nano Omni**, the **VL reranker**, and **Parakeet ASR** are optional and not auto-wired. For a minimal GPU footprint, disable optional keys you do not need (see [Recommended minimal install (26.05)](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). See [Pre-Requisites & Support Matrix — Default Helm NIMs](prerequisites-support-matrix.md#default-helm-nims). 
For audio and video extraction in Kubernetes, set `service.installFfmpeg=true` diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md index c203914da6..a8b30c5b6f 100644 --- a/docs/docs/extraction/prerequisites-support-matrix.md +++ b/docs/docs/extraction/prerequisites-support-matrix.md @@ -94,7 +94,7 @@ Default VL embedder container and model for release deployments: ### Optional Helm NIMs (not auto-wired) { #optional-helm-nims-not-auto-wired-by-default } -These NIM microservices are **optional** for the default extraction pipeline. The retriever service does **not** call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For production deployments, disable keys you do not need (refer to [Recommended minimal install](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Set `nimOperator..enabled=true` when you want that NIM reconciled. Chart keys are in the [NeMo Retriever Helm chart README](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md#nim-operator-sub-stack). +These NIM microservices are **optional** for the default extraction pipeline. The retriever service does **not** call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For **26.05 production**, disable keys you do not need (see [Recommended minimal install (26.05)](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#recommended-minimal-install-2605)). Set `nimOperator..enabled=true` when you want that NIM reconciled. Chart keys are in the [NeMo Retriever Helm chart README](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md#nim-operator-sub-stack). 
| Helm flag | NIM | Role | |-----------|-----|------| diff --git a/nemo_retriever/helm/README.md b/nemo_retriever/helm/README.md index 6ad00e6ba9..41204a3d52 100644 --- a/nemo_retriever/helm/README.md +++ b/nemo_retriever/helm/README.md @@ -188,7 +188,7 @@ NIM (the VL reranker `rerankqa`, Nemotron Parse, Omni 30B, and the Parakeet `audio` ASR NIM) is **disabled by default** to honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md); -refer to [Recommended minimal install](#recommended-minimal-install-2605) +see [Recommended minimal install (26.05)](#recommended-minimal-install-2605) for the opt-in `--set` flags that turn any of them on. ```bash @@ -199,9 +199,7 @@ helm install retriever ./nemo_retriever/helm \ --set ngcApiSecret.password=$NGC_API_KEY ``` -### Recommended minimal install { #recommended-minimal-install } - - +### Recommended minimal install (26.05) { #recommended-minimal-install-2605 } Deploy only the four core NIMs that the retriever service auto-wires (`page_elements`, `table_structure`, `ocr`, `vlm_embed`): @@ -213,14 +211,14 @@ helm install retriever ./nemo_retriever/helm \ --set ngcApiSecret.password=$NGC_API_KEY ``` -> The VL reranker (`rerankqa`), Nemotron Parse, the Nemotron 3 Nano Omni 30B caption NIM, and the Parakeet `audio` ASR NIM are **all off by default** — they only reconcile when you explicitly opt in. Opt-in flags: +> The VL reranker (`rerankqa`), Nemotron Parse, the Nemotron 3 Nano Omni 30B caption NIM, and the Parakeet `audio` ASR NIM are **all off by default** in 26.05 — they only reconcile when you explicitly opt in. 
Opt-in flags: > > * VL reranker — `--set nimOperator.rerankqa.enabled=true` > * Nemotron Parse — `--set nimOperator.nemotron_parse.enabled=true` > * Omni 30B captioner — `--set nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled=true` > * Parakeet ASR — `--set nimOperator.audio.enabled=true` (also set `serviceConfig.nimEndpoints.audioGrpcEndpoint=audio:50051` to wire ASR into the service, plus `service.installFfmpeg=true` if your image does not bundle ffmpeg) > -> This matches the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and avoids silently pulling ≈ 62 GiB of Omni weights or claiming a second dedicated GPU on a "default" install. Refer to the [model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) table for per-NIM GPU and disk costs. +> This matches the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and avoids silently pulling ≈ 62 GiB of Omni weights or claiming a second dedicated GPU on a "default" install. See the [model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) table for per-NIM GPU and disk costs. The chart auto-wires the operator-managed in-cluster URLs of the four "core" NIMs into the service's `nim_endpoints` block: @@ -304,7 +302,7 @@ cluster allows runtime package installation. For air-gapped clusters, see To run self-hosted Parakeet for [audio and video extraction](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/audio-video.md): -1. 
Set `nimOperator.audio.enabled=true` (it is on by default; disable other optional NIMs you do not need per [Recommended minimal install](#recommended-minimal-install-2605)). +1. Set `nimOperator.audio.enabled=true` (it is on by default; disable other optional NIMs you do not need per [Recommended minimal install (26.05)](#recommended-minimal-install-2605)). 2. Pin the ASR `NIMService` to a **dedicated GPU** with `nimOperator.audio.resources`, `nodeSelector`, or `tolerations` (see [NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html)). 3. Confirm the GPU SKU in [Model hardware requirements](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#model-hardware-requirements) (footnote ⁴ lists Blackwell limitations). 4. Set `service.installFfmpeg=true` when the retriever service will process audio or video (see `service.installFfmpeg` above). @@ -376,9 +374,9 @@ pair gated on three conditions ALL holding: | `nimOperator.vlm_embed.enabled` | `true` | Multimodal embedding NIM (also used by the vectordb Pod). | | `nimOperator.vlm_embed.nimServiceName` | `llama-nemotron-embed-vl-1b-v2` | NIMService / in-cluster DNS name. | | `nimOperator.vlm_embed.image` | `nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0` | Default VLM embed NIM image. | -| `nimOperator.rerankqa.enabled` | `false` | VL reranker NIM (optional; not auto-wired). Set `true` to opt in. Default `false` so default installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and do not silently provision an extra ≈ 3.1 GiB GPU NIM. 
The image points at the **VL** SKU (`llama-nemotron-rerank-vl-1b-v2`) per [prerequisites-support-matrix.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#default-helm-nims) — the text-only `llama-nemotron-rerank-1b-v2` silently degrades multimodal reranking and is not the documented POR. | -| `nimOperator.nemotron_parse.enabled` | `false` | Structured-parse NIM (optional). Set `true` when using `extract_method="nemotron_parse"`. Default `false` so default installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md). Image tag follows the [image tag conventions](#image-tag-conventions). | -| `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled` | `false` | Omni 30B caption NIM (optional). Set `true` to enable image captioning — refer to [Image captioning (Omni 30B)](#image-captioning-omni-30b). Default `false` so default installs do not silently pull ≈ 62 GiB of BF16 weights or claim a second dedicated GPU. Image tag follows the [image tag conventions](#image-tag-conventions). | +| `nimOperator.rerankqa.enabled` | `false` | VL reranker NIM (optional; not auto-wired). Set `true` to opt in. Default `false` so 26.05 installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md) and do not silently provision an extra ≈ 3.1 GiB GPU NIM. The image points at the **VL** SKU (`llama-nemotron-rerank-vl-1b-v2`) per [prerequisites-support-matrix.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/prerequisites-support-matrix.md#default-helm-nims) — the text-only `llama-nemotron-rerank-1b-v2` silently degrades multimodal reranking and is not the documented POR. | +| `nimOperator.nemotron_parse.enabled` | `false` | Structured-parse NIM (optional). 
Set `true` when using `extract_method="nemotron_parse"`. Default `false` so 26.05 installs honor the "optional and disabled by default" contract in [deployment-options.md](https://github.com/NVIDIA/NeMo-Retriever/blob/main/docs/docs/extraction/deployment-options.md). Image tag follows the [image tag conventions](#image-tag-conventions). | +| `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled` | `false` | Omni 30B caption NIM (optional). Set `true` to enable image captioning — see [Image captioning (Omni 30B)](#image-captioning-omni-30b). Default `false` so 26.05 installs do not silently pull ≈ 62 GiB of BF16 weights or claim a second dedicated GPU. Image tag follows the [image tag conventions](#image-tag-conventions). | | `nimOperator.audio.enabled` | `false` | Parakeet ASR NIM (optional). Set `true` for audio/video transcription; pair with `serviceConfig.nimEndpoints.audioGrpcEndpoint=audio:50051` so the retriever-service can reach it. | | `nimOperator..image.repository` | `nvcr.io/nim/nvidia/...` | Per-NIM image. | | `nimOperator..image.pullSecrets` | `[ngc-secret]` | Referenced by the NIMService CR. | @@ -396,7 +394,7 @@ pair gated on three conditions ALL holding: > are auto-wired into the retriever-service config. Optional NIMs may reconcile > when `nimOperator..enabled` is `true` in `values.yaml`, but the > retriever-service won't call them unless you wire your pipeline to use them. -> Prefer the [recommended minimal install](#recommended-minimal-install-2605) overrides. +> For 26.05, prefer the [minimal install](#recommended-minimal-install-2605) overrides. #### Filtering cached GPU profiles { #filtering-cached-gpu-profiles } @@ -490,7 +488,7 @@ and **ocr** (no `graphic_elements` operator NIM in this chart). 
For image captioning, set `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning.enabled=true` — see [Image captioning (Omni 30B)](#image-captioning-omni-30b) for the chart-side wiring and -[Image captioning — NIM and hardware](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-nim-hardware) +[Image captioning (26.05)](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-2605) for the product matrix. #### Image captioning (Omni 30B) { #image-captioning-omni-30b } @@ -1156,7 +1154,7 @@ nimOperator: - For **offline captioning**, enable `nimOperator.nemotron_3_nano_omni_30b_a3b_reasoning` and point the pipeline caption endpoint at the in-cluster NIM URL (see - [Image captioning — NIM and hardware](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-nim-hardware)). + [Image captioning (26.05)](https://docs.nvidia.com/nemo/retriever/latest/extraction/prerequisites-support-matrix/#image-captioning-2605)). ### Mirroring pattern