Embedder model router by KyleZheng1284 · Pull Request #2232 · NVIDIA/NeMo-Retriever

KyleZheng1284 · 2026-06-11T22:18:50Z

Description

Adds local checkpoint support for embedders by allowing local model directories to bypass HF revision pinning only through an explicit allow_local_path=True opt-in.
Routes local checkpoints through the existing NRL text or VL embedder paths using NRL_LOCAL_EMBED_ARCH=vl|text, without adding a new operator or changing extraction/VDB/eval architecture.
Centralizes text vs VL routing in a shared helper so the factory and text embedding processor stay in sync.
Adds tests covering local checkpoint revision behavior, text/VL routing, env/arg overrides, and unchanged registered Hub ID behavior.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

greptile-apps · 2026-06-11T22:24:19Z

Greptile Summary

Adds local-checkpoint support to the embedder stack: on-disk model directories bypass HF revision pinning via an explicit allow_local_path=True opt-in, and a new model_arch parameter routes them to either the text or VL embedder path using the shared resolve_embed_model_use_vl helper.

hf_model_registry.get_hf_revision gains an allow_local_path flag that short-circuits the pin requirement only for directories that contain config.json, keeping the supply-chain guarantee intact for all Hub IDs.
create_local_embedder / create_local_query_embedder accept model_arch (\"vl\"/\"text\") which, for local paths, must be declared explicitly — the env var NRL_LOCAL_EMBED_ARCH is the fallback; inference is intentionally forbidden.
processor.py is updated to pass model_arch through and use resolve_embed_model_use_vl for the _embed closure selection, but contains an import typo (nemo_retriever.model → nemo_retriever.models) that will cause ModuleNotFoundError at runtime on the local-embedder code path.
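The revision rule described in the summary can be sketched as below. This is a minimal illustration, not the PR's actual code: the pin table, its single entry, and the error message are hypothetical stand-ins, and only the `allow_local_path` flag and the `config.json` check come from the summary.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical pin table; the real registry contents are not shown on this page.
_PINNED_REVISIONS = {"example-org/example-embedder": "0123456789abcdef"}


def _is_local_model_dir(model_name: str) -> bool:
    """True only for an existing directory that contains config.json."""
    path = Path(model_name)
    return path.is_dir() and (path / "config.json").is_file()


def get_hf_revision(model_name: str, allow_local_path: bool = False) -> Optional[str]:
    """Return the pinned revision, or None for an opted-in local checkpoint."""
    if allow_local_path and _is_local_model_dir(model_name):
        return None  # local checkpoint: there is no Hub revision to pin
    try:
        return _PINNED_REVISIONS[model_name]
    except KeyError:
        raise ValueError(f"no pinned revision for {model_name!r}") from None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    local_revision = get_hf_revision(tmp, allow_local_path=True)
    try:
        get_hf_revision(tmp)  # same directory, but without the opt-in
        rejected_without_opt_in = False
    except ValueError:
        rejected_without_opt_in = True

# Hub IDs keep their pin even when the opt-in flag is set.
hub_revision = get_hf_revision("example-org/example-embedder", allow_local_path=True)
```

Because the flag only matters when the path is a real checkpoint directory, a caller cannot use it to skip pinning for a Hub ID.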

Confidence Score: 2/5

Not safe to merge as-is: the import typo in processor.py will break the local-embedder injection path at runtime.

The import in maybe_inject_local_hf_embedder references nemo_retriever.model (singular), a module that does not exist. Every call to embed_text_from_primitives_df without a configured remote endpoint will raise ModuleNotFoundError at the lazy-import site inside the function, completely blocking local-mode text embedding. The fix is a one-character typo correction, but the bug affects the primary code path this PR is meant to enable.

nemo_retriever/src/nemo_retriever/models/inference/processor.py — import typo on line 93 breaks runtime.
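The reason this kind of typo survives module import and only fails at call time can be shown with a small stand-in. The function and package names below are hypothetical (the package is deliberately nonexistent); the only point is that a function-level import is not executed until the branch containing it runs.

```python
from typing import Optional


def maybe_inject_demo(endpoint: Optional[str] = None) -> str:
    """Stand-in for a function that lazily imports its local-mode helpers."""
    if endpoint:
        return "remote"  # the lazy import below is never reached
    # Deliberately nonexistent package, mimicking a misspelled import path.
    from hypothetical_missing_pkg_demo.model import create_local_embedder

    return create_local_embedder()


# Defining the function and taking the remote branch both succeed.
remote_result = maybe_inject_demo("http://example.invalid")

# Only the local branch executes the import and hits the error.
try:
    maybe_inject_demo()
    local_branch_failed = False
except ModuleNotFoundError:
    local_branch_failed = True
```

A test that exercises the no-endpoint branch at least once would surface this class of error before merge.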

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/models/inference/processor.py | Import typo: `nemo_retriever.model` (singular) should be `nemo_retriever.models`; causes ModuleNotFoundError at runtime on the local-embedder injection path |
| nemo_retriever/src/nemo_retriever/models/__init__.py | Adds local-checkpoint routing helpers and wires model_arch into create_local_embedder/create_local_query_embedder; _is_local_checkpoint_dir lacks the config.json guard present in hf_model_registry._is_local_model_dir |
| nemo_retriever/src/nemo_retriever/models/hf_model_registry.py | Adds allow_local_path opt-in to get_hf_revision with a config.json guard; clean and well-tested |
| nemo_retriever/tests/test_local_embed_checkpoint.py | Comprehensive new test file covering revision-bypass, text/VL routing, env/arg overrides, and Hub-ID regression guard; fixture isolation is clean |
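The mismatch noted for `_is_local_checkpoint_dir` can be illustrated with two stand-in predicates. This is a sketch under an assumption: the review only says the routing helper lacks the `config.json` guard, so modelling it as a plain directory check is a guess, and both function names below are hypothetical.

```python
import tempfile
from pathlib import Path


def is_dir_only(model_name: str) -> bool:
    """Assumed shape of the unguarded check: any existing directory counts."""
    return Path(model_name).is_dir()


def is_dir_with_config(model_name: str) -> bool:
    """Guarded check: the directory must also contain config.json."""
    path = Path(model_name)
    return path.is_dir() and (path / "config.json").is_file()


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory is "local" for routing but not for revision lookup.
    empty_unguarded = is_dir_only(tmp)
    empty_guarded = is_dir_with_config(tmp)

    (Path(tmp) / "config.json").write_text("{}")
    populated_guarded = is_dir_with_config(tmp)
```

Under that assumption, an empty or unrelated directory would be routed as a local checkpoint and then rejected later at the revision lookup, so the two checks disagree about what counts as local.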

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[create_local_embedder / maybe_inject_local_hf_embedder] --> B[resolve_embed_model_use_vl\nmodel_name, model_arch]
    B --> C{_is_local_checkpoint_dir?}
    C -- Yes --> D[_resolve_local_embed_arch\nmodel_arch or NRL_LOCAL_EMBED_ARCH]
    D -- invalid/missing --> E[ValueError: must declare arch]
    D -- vl --> F[use_vl = True]
    D -- text --> G[use_vl = False]
    C -- No --> H[is_vl_embed_model\nHub allow-list]
    H --> F
    H --> G
    F --> I[VL embedder\nLlamaNemotronEmbedVL1BV2*]
    G --> J[Text embedder\nLlamaNemotronEmbed1BV2*]
    I --> K[get_hf_revision\nallow_local_path=True]
    J --> K
    K --> L{_is_local_model_dir\nconfig.json present?}
    L -- Yes --> M[return None\nno revision pin]
    L -- No + registered Hub ID --> N[return pinned SHA]
    L -- No + unregistered --> O[ValueError: no pinned revision]
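The upper half of the flowchart (nodes A through G) can be sketched as follows. The two constants match the diff hunk quoted in the review thread; everything else is an assumption made for illustration, including the allow-list entry, the plain directory check, the error message, and the exact precedence handling.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

LOCAL_EMBED_ARCH_ENV = "NRL_LOCAL_EMBED_ARCH"
_VALID_LOCAL_EMBED_ARCHS = frozenset({"vl", "text"})
# Hypothetical allow-list entry; the real Hub IDs are not listed on this page.
_VL_EMBED_MODEL_IDS = frozenset({"example-org/example-vl-embedder"})


def is_vl_embed_model(model_name: Optional[str]) -> bool:
    """Hub IDs are classified by allow-list membership."""
    return (model_name or "") in _VL_EMBED_MODEL_IDS


def _is_local_checkpoint_dir(model_name: Optional[str]) -> bool:
    """Assumed here to be a plain directory check."""
    return bool(model_name) and Path(model_name).is_dir()


def _resolve_local_embed_arch(model_arch: Optional[str]) -> str:
    """Explicit argument first, then the env var; never inferred."""
    arch = (model_arch or os.environ.get(LOCAL_EMBED_ARCH_ENV) or "").strip().lower()
    if arch not in _VALID_LOCAL_EMBED_ARCHS:
        raise ValueError(
            f"local checkpoints must declare model_arch or {LOCAL_EMBED_ARCH_ENV} "
            f"as one of {sorted(_VALID_LOCAL_EMBED_ARCHS)}"
        )
    return arch


def resolve_embed_model_use_vl(
    model_name: Optional[str], model_arch: Optional[str] = None
) -> bool:
    if _is_local_checkpoint_dir(model_name):
        return _resolve_local_embed_arch(model_arch) == "vl"
    return is_vl_embed_model(model_name)


saved = os.environ.pop(LOCAL_EMBED_ARCH_ENV, None)  # keep the demo hermetic
try:
    with tempfile.TemporaryDirectory() as tmp:
        from_arg = resolve_embed_model_use_vl(tmp, model_arch="vl")
        try:
            resolve_embed_model_use_vl(tmp)  # neither argument nor env var set
            missing_arch_rejected = False
        except ValueError:
            missing_arch_rejected = True
        os.environ[LOCAL_EMBED_ARCH_ENV] = "text"
        from_env = resolve_embed_model_use_vl(tmp)
        arg_beats_env = resolve_embed_model_use_vl(tmp, model_arch="vl")
    # Hub IDs ignore the env var and go through the allow-list.
    hub_vl = resolve_embed_model_use_vl("example-org/example-vl-embedder")
finally:
    os.environ.pop(LOCAL_EMBED_ARCH_ENV, None)
    if saved is not None:
        os.environ[LOCAL_EMBED_ARCH_ENV] = saved
```

Keeping this decision in one helper is what lets the factory and the processor agree on the embedder path without duplicating the branch logic.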

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/models/inference/processor.py:93-97
The import uses `nemo_retriever.model` (singular), but the package is `nemo_retriever.models` (plural). There is no `nemo_retriever/model.py` module in the repo, so this raises `ModuleNotFoundError` at runtime on every call to `maybe_inject_local_hf_embedder` when no remote endpoint is configured — i.e., the exact case this function is meant to handle.

```suggestion
    from nemo_retriever.models import (
        create_local_embedder,
        resolve_embed_model,
        resolve_embed_model_use_vl,
    )
```

Reviews (3): Last reviewed commit: "Add local embed checkpoint routing"

jperez999 · 2026-06-12T01:06:23Z

@@ -59,6 +60,36 @@ def is_vl_rerank_model(model_name: str | None) -> bool:
    return (model_name or "") in _VL_RERANK_MODEL_IDS


+LOCAL_EMBED_ARCH_ENV = "NRL_LOCAL_EMBED_ARCH"
+_VALID_LOCAL_EMBED_ARCHS = frozenset({"vl", "text"})


Why is this still here? Shouldn't this go away if we accept universal?

These look like architectures. Please elaborate on why this distinction would still be required.

KyleZheng1284 · 2026-06-12

For this implementation I wasn't planning on accepting universal embedding models yet; the focus is on the VL and text models that NVIDIA already publishes on Hugging Face.

We need this explicit distinction to avoid routing a VL checkpoint through the text embedder, or vice versa.

greptile-apps · 2026-06-16T03:28:57Z

+    from nemo_retriever.model import (
+        create_local_embedder,
+        resolve_embed_model,
+        resolve_embed_model_use_vl,
+    )


The import uses nemo_retriever.model (singular), but the package is nemo_retriever.models (plural). There is no nemo_retriever/model.py module in the repo, so this raises ModuleNotFoundError at runtime on every call to maybe_inject_local_hf_embedder when no remote endpoint is configured — i.e., the exact case this function is meant to handle.

Suggested change

-    from nemo_retriever.model import (
-        create_local_embedder,
-        resolve_embed_model,
-        resolve_embed_model_use_vl,
-    )
+    from nemo_retriever.models import (
+        create_local_embedder,
+        resolve_embed_model,
+        resolve_embed_model_use_vl,
+    )

KyleZheng1284 requested review from a team as code owners June 11, 2026 22:18

KyleZheng1284 requested a review from ChrisJar June 11, 2026 22:18

KyleZheng1284 marked this pull request as draft June 11, 2026 22:19

greptile-apps Bot reviewed Jun 11, 2026

Comment thread nemo_retriever/src/nemo_retriever/text_embed/processor.py Outdated

Comment thread nemo_retriever/src/nemo_retriever/models/__init__.py

Comment thread nemo_retriever/src/nemo_retriever/utils/hf_model_registry.py Outdated

jperez999 requested changes Jun 12, 2026

KyleZheng1284 changed the title ~~Universal embedder model router~~ Embedder model router Jun 12, 2026

KyleZheng1284 marked this pull request as ready for review June 12, 2026 22:12

jperez999 approved these changes Jun 16, 2026

KyleZheng1284 added 2 commits June 16, 2026 02:56

routing mechanisms for hf embedding models

5197cc6

Add local embed checkpoint routing

21b9df8

KyleZheng1284 force-pushed the feature/custom-model-operator branch from 3e6a91f to 21b9df8 Compare June 16, 2026 03:25

greptile-apps Bot reviewed Jun 16, 2026

KyleZheng1284 wants to merge 2 commits into NVIDIA:main from KyleZheng1284:feature/custom-model-operator
