remove custom-metadata.md; canonical path is vdbs + notebooks #2195

Open
kheiss-uwzoo wants to merge 9 commits into NVIDIA:main from kheiss-uwzoo:docs/consolidate-custom-metadata-into-vdbs

Conversation

@kheiss-uwzoo

Collaborator

Summary

  • Remove docs/docs/extraction/custom-metadata.md (duplicate of notebook + VDB README content added in "align metadata docs with VDB filtering guide", #2108).
  • Expand the Metadata and filtering section on vdbs.md with a short overview and links to the worked notebooks.
  • Drop the separate MkDocs nav entry; add redirect custom-metadata.md -> vdbs.md#metadata-and-filtering.
  • Update cross-links and doc-snippet test registry.

Follows Julio's NVBugs 6205401 guidance: VDB/metadata facts live on vdbs.md; runnable walkthroughs stay in notebooks.

Notebooks (canonical examples)

Operator/API reference remains in nemo_retriever/src/nemo_retriever/vdb/README.md.

Test plan

  • MkDocs build; confirm redirect from old custom-metadata URL
  • Nav no longer lists a separate Custom metadata page
  • Notebook links resolve on GitHub
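
The redirect item in the test plan could also be checked with a small script. The following is a minimal sketch, not part of this PR: the page contents, the redirect map layout, and the `slugify` helper are all hypothetical stand-ins, and the slug rule only approximates how MkDocs derives heading anchors.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, hyphenate spaces."""
    text = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", text)


def heading_anchors(markdown: str) -> set[str]:
    """Collect anchors for every ATX heading (#, ##, ...) in a Markdown string."""
    pattern = re.compile(r"^#{1,6}\s+(.+?)\s*$", flags=re.MULTILINE)
    return {slugify(match.group(1)) for match in pattern.finditer(markdown)}


def broken_redirects(redirects: dict[str, str], pages: dict[str, str]) -> list[str]:
    """Return redirect sources whose target page or anchor is missing."""
    broken = []
    for source, target in redirects.items():
        page, _, anchor = target.partition("#")
        if page not in pages:
            broken.append(source)
        elif anchor and anchor not in heading_anchors(pages[page]):
            broken.append(source)
    return broken


# Hypothetical page text; the real vdbs.md is not reproduced here.
pages = {
    "extraction/vdbs.md": "# Vector databases\n\n## Metadata and filtering\n\nText.\n",
}
# Assumed redirect map layout (old path -> new path with anchor).
redirects = {
    "extraction/custom-metadata.md": "extraction/vdbs.md#metadata-and-filtering",
}

print(broken_redirects(redirects, pages))  # prints []
```

A check like this only confirms that the target heading exists in the source Markdown; the MkDocs build remains the authoritative test of the generated redirect.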

@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners June 1, 2026 18:03
@kheiss-uwzoo kheiss-uwzoo requested a review from edknv June 1, 2026 18:03
@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Contributor

Greptile Summary

Removes two thin doc pages (custom-metadata.md and integrations-langchain-llamaindex-haystack.md), consolidates their content into vdbs.md#metadata-and-filtering, and adds MkDocs redirects so existing URLs continue to resolve.

  • vdbs.md gains a full prose overview of the sidecar metadata workflow (meta_dataframe/meta_source_field/meta_fields), retriever-service upload flow, and links to both worked notebooks; cross-links in six other doc files updated.
  • notebooks/index.md now lists both canonical metadata notebooks (metadata_and_filtered_search.ipynb and nemo_retriever_retriever_query_metadata_filter.ipynb), addressing the gap noted in a prior review thread.
  • The doc-snippet test registry swaps custom-metadata.md for vdbs.md; since vdbs.md contains no Python fenced blocks the entry is inert, though coverage is preserved via the .ipynb entry already in the tuple.
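
As a rough illustration of the sidecar idea named in the first bullet, the sketch below joins a metadata table onto extracted records and then filters on it. It is plain Python, not the NeMo Retriever API: only the names `meta_source_field` and `meta_fields` come from the summary above, and their exact semantics, the `attach_sidecar_metadata` and `filter_by_metadata` helpers, and the sample rows are assumptions made for illustration.

```python
def attach_sidecar_metadata(
    records: list[dict],
    meta_rows: list[dict],
    meta_source_field: str,
    meta_fields: list[str],
) -> list[dict]:
    """Copy the selected sidecar columns onto each record with a matching source."""
    # Index sidecar rows by the column that identifies the source document.
    by_source = {row[meta_source_field]: row for row in meta_rows}
    enriched = []
    for record in records:
        row = by_source.get(record["source"], {})
        metadata = {field: row[field] for field in meta_fields if field in row}
        enriched.append({**record, "metadata": metadata})
    return enriched


def filter_by_metadata(records: list[dict], **expected) -> list[dict]:
    """Keep records whose metadata equals every expected key/value pair."""
    return [
        record
        for record in records
        if all(record["metadata"].get(key) == value for key, value in expected.items())
    ]


# Hypothetical extracted records and sidecar rows (stand-in for meta_dataframe).
records = [
    {"source": "a.pdf", "text": "contract terms"},
    {"source": "b.pdf", "text": "quarterly results"},
]
meta_rows = [
    {"source": "a.pdf", "department": "legal", "year": 2024},
    {"source": "b.pdf", "department": "finance", "year": 2023},
]

enriched = attach_sidecar_metadata(records, meta_rows, "source", ["department"])
legal_only = filter_by_metadata(enriched, department="legal")
print([record["source"] for record in legal_only])  # prints ['a.pdf']
```

The worked notebooks linked from vdbs.md remain the reference for the real parameters and filter syntax.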

Confidence Score: 5/5

Safe to merge; all changes are documentation and redirects with no runtime code paths affected.

The PR deletes two doc pages, expands a section on vdbs.md, adds MkDocs redirects, and updates cross-links. No Python logic, APIs, or data paths are modified. The redirects are correctly wired, both canonical notebooks are now linked, and the previously flagged sparse metadata section has been filled in.

nemo_retriever/tests/test_src_documentation_snippets.py — the new _PUBLIC_RETRIEVER_DOCS entry for vdbs.md produces zero test blocks today; worth noting if Python examples are ever added to that page in the future.
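
To see why a page without Python fences contributes nothing to such a test, consider this minimal sketch of snippet extraction. It does not reproduce test_src_documentation_snippets.py; the `python_blocks` and `syntax_errors` helpers and the sample strings are hypothetical.

```python
import ast
import re

# Matches fenced blocks opened with ```python or ```py at the start of a line.
FENCE = re.compile(
    r"^```(?:python|py)[^\n]*\n(.*?)^```\s*$",
    flags=re.DOTALL | re.MULTILINE,
)


def python_blocks(markdown: str) -> list[str]:
    """Return the body of every Python fenced block in a Markdown string."""
    return [match.group(1) for match in FENCE.finditer(markdown)]


def syntax_errors(markdown: str) -> list[str]:
    """Parse each Python block and collect a message for every one that fails."""
    errors = []
    for block in python_blocks(markdown):
        try:
            ast.parse(block)
        except SyntaxError as exc:
            errors.append(str(exc))
    return errors


# Hypothetical page contents: one prose-only page, one with a Python example.
prose_only = "## Metadata and filtering\n\nSee the worked notebooks.\n"
with_code = "Intro.\n\n```python\nvalue = 1 + 1\n```\n"

print(len(python_blocks(prose_only)), len(python_blocks(with_code)))  # prints 0 1
```

With zero extracted blocks, any test parametrized over them runs no cases for that page, which matches the "inert entry" observation above.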

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/vdbs.md | Metadata and filtering section expanded with overview prose, sidecar parameter details, service workflow, and links to the worked notebook; cross-links updated throughout. |
| nemo_retriever/tests/test_src_documentation_snippets.py | Swapped custom-metadata.md for vdbs.md in _PUBLIC_RETRIEVER_DOCS; vdbs.md currently has no Python fenced blocks, so the entry contributes zero coverage to the constructor-kwargs and syntax tests. |
| docs/docs/extraction/notebooks/index.md | Both canonical metadata notebooks are now listed; prior thread concern about missing metadata_and_filtered_search.ipynb link is addressed. |
| docs/mkdocs.yml | Removed custom-metadata and integrations nav entries; sections renumbered; two redirects added for deleted pages. |
| docs/docs/extraction/custom-metadata.md | Deleted; content consolidated into vdbs.md and notebooks; redirect configured in mkdocs.yml. |
| docs/docs/extraction/integrations-langchain-llamaindex-haystack.md | Deleted; thin wrapper page replaced by redirect to notebooks/index.md, which already lists the LangChain and LlamaIndex examples. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    OLD_CM["custom-metadata.md\n(deleted)"]
    OLD_INT["integrations-langchain-llamaindex-haystack.md\n(deleted)"]
    REDIRECT1["redirect:\nextraction/custom-metadata.md\n→ vdbs.md#metadata-and-filtering"]
    REDIRECT2["redirect:\nextraction/integrations-…\n→ notebooks/index.md"]
    VDBS["vdbs.md\n#metadata-and-filtering\n(expanded)"]
    NB_INDEX["notebooks/index.md\n(both metadata notebooks listed)"]
    NB1["metadata_and_filtered_search.ipynb"]
    NB2["nemo_retriever_retriever_query_metadata_filter.ipynb"]
    VDB_README["nemo_retriever/src/nemo_retriever/vdb/README.md\n(canonical operator reference)"]

    OLD_CM -->|content moved to| VDBS
    OLD_CM -->|URL preserved by| REDIRECT1
    OLD_INT -->|URL preserved by| REDIRECT2
    VDBS -->|worked example link| NB2
    VDBS -->|operator reference link| VDB_README
    NB_INDEX --> NB1
    NB_INDEX --> NB2

Reviews (15): Last reviewed commit: "docs(extraction): address PR2195 review ..."

@kheiss-uwzoo kheiss-uwzoo changed the title from "docs(extraction): remove custom-metadata.md; canonical path is vdbs + notebooks" to "remove custom-metadata.md; canonical path is vdbs + notebooks" Jun 1, 2026
@kheiss-uwzoo kheiss-uwzoo requested a review from jperez999 June 1, 2026 18:12
@kheiss-uwzoo kheiss-uwzoo added the doc label (Improvements or additions to documentation) Jun 1, 2026
Comment thread docs/docs/extraction/integrations-langchain-llamaindex-haystack.md Outdated
Comment thread docs/docs/extraction/vdbs.md
Comment thread docs/docs/extraction/vdbs.md Outdated
Comment thread docs/docs/extraction/notebooks/index.md
@kheiss-uwzoo kheiss-uwzoo requested a review from randerzander June 5, 2026 17:42
@kheiss-uwzoo kheiss-uwzoo marked this pull request as draft June 5, 2026 23:27
@kheiss-uwzoo kheiss-uwzoo marked this pull request as ready for review June 8, 2026 20:14
Comment thread docs/docs/extraction/notebooks/index.md Outdated
- [Workflow: Ingest documents](../workflow-document-ingestion.md)
- [How to add metadata to your documents and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb)
- [Metadata filtering: add sidecar metadata and filter searches](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/nemo_retriever_retriever_query_metadata_filter.ipynb)
- [How to reindex a collection](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/reindex_example.ipynb)

Collaborator

Are you sure this is valid? I don't see this file, nor should it exist.

Collaborator Author

Agreed — removed in 0c8878e2. examples/reindex_example.ipynb was deleted on main in #2163 and #2226 had already dropped this link from notebooks/index.md; re-adding it was a merge artifact from an older branch state.
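
A dead link like this one can be caught before review with a small check. The sketch below is an illustration only: the `dead_blob_links` helper is hypothetical, and `existing_paths` stands in for a listing of files on main that would have to be gathered separately.

```python
import re

# Captures the label, the full URL, and the repo-relative path of GitHub blob links.
LINK = re.compile(
    r"\[([^\]]+)\]\((https://github\.com/[^/]+/[^/]+/blob/[^/]+/([^)\s]+))\)"
)


def dead_blob_links(markdown: str, existing_paths: set[str]) -> list[tuple[str, str]]:
    """Return (label, path) for every blob link whose path is not in the repo."""
    return [
        (label, path)
        for label, _url, path in LINK.findall(markdown)
        if path not in existing_paths
    ]


# The two links from the diff hunk under discussion.
index_md = (
    "- [Metadata filtering: add sidecar metadata and filter searches]"
    "(https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/"
    "nemo_retriever_retriever_query_metadata_filter.ipynb)\n"
    "- [How to reindex a collection]"
    "(https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/"
    "reindex_example.ipynb)\n"
)
# Assumed file listing for main after the reindex notebook was deleted.
existing_paths = {"examples/nemo_retriever_retriever_query_metadata_filter.ipynb"}

print(dead_blob_links(index_md, existing_paths))
```

Under that assumed listing, the only link reported is the reindex_example.ipynb entry that the reviewer flagged.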


- [Semantic retrieval](vdbs.md#semantic-retrieval)
- Framework examples: [LangChain, LlamaIndex, Haystack](integrations-langchain-llamaindex-haystack.md)
- Framework examples: [Jupyter Notebooks](notebooks/index.md)

Collaborator

I can't find this ref either. Is this PR outdated? Maybe it should be closed without merging.

Collaborator Author

The target page exists at docs/docs/extraction/notebooks/index.md (mkdocs nav: Starter kits). The integrations page this PR removes only duplicated LangChain/LlamaIndex links already listed there.

Main still has both custom-metadata.md and integrations-langchain-llamaindex-haystack.md, so the consolidation is still needed — not planning to close without merge.

Renamed the link text to Starter kits in 0c8878e2 to match the nav label.

- [NVIDIA AI Blueprints catalog](https://build.nvidia.com/explore/discover)

For framework-specific integration patterns, see [Framework integrations](integrations-langchain-llamaindex-haystack.md).
For framework-specific integration patterns, see [Jupyter Notebooks](notebooks/index.md).

Collaborator

same here

Collaborator Author

Same as above — notebooks/index.md is the Starter kits page; link label updated to match nav in 0c8878e2.

@kheiss-uwzoo kheiss-uwzoo requested a review from jperez999 June 11, 2026 15:32
@kheiss-uwzoo kheiss-uwzoo self-assigned this Jun 11, 2026
kheiss-uwzoo and others added 9 commits June 11, 2026 16:34
Drop dead metadata_and_filtered_search notebook links; document retriever
service sidecar upload on vdbs.md instead of delegating to VDB README.
Delete integrations-langchain-llamaindex-haystack.md, point inbound links at notebooks/index.md, and add a mkdocs redirect.
Replace duplicated metadata prose with a single notebook link per review.
Revert doc-snippet test list change; belongs outside this docs-only PR.
Users arriving via the deleted custom-metadata.md URL need a short
overview of meta_* sidecar params and filter modes, plus links to the
worked notebooks and VDB README—not a bare hyperlink alone.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Restore vdbs.md metadata landing content with service sidecar guidance, drop dead metadata_and_filtered_search.ipynb links, and point the doc-snippet test registry at vdbs.md instead of deleted custom-metadata.md.
Remove reindex_example.ipynb entry (notebook removed on main in NVIDIA#2163).
Rename framework cross-links to Starter kits to match mkdocs nav label.
@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/consolidate-custom-metadata-into-vdbs branch from a541334 to fafaf61 Compare June 11, 2026 23:35
Labels

doc Improvements or additions to documentation

3 participants