docs(extraction): sync post-#2179 extraction doc fixes to main #2203

Open
kheiss-uwzoo wants to merge 5 commits into NVIDIA:main from kheiss-uwzoo:docs/backport-2194-extraction-docs-fix

@kheiss-uwzoo kheiss-uwzoo commented Jun 2, 2026

Collaborator

Summary

Sync extraction doc structure on main with post-#2179 review fixes that landed on 26.05 in PR #2194 but never reached main. This is not a literal cherry-pick of #2194 — review feedback on this PR evolved the approach.

What changed

  • Remove custom-metadata.md — consolidate metadata/filtering guidance into vdbs.md#metadata-and-filtering; add mkdocs.yml redirect; update cross-links in workflow-agentic-retrieval.md, integrations-langchain-llamaindex-haystack.md, and the doc-snippet test list.
  • Fix stale caption anchors — retarget FAQ and multimodal-extraction.md links from prerequisites-support-matrix.md#image-captioning-2605 to in-page anchors; rename the support-matrix section to #image-captioning-nim-hardware with a legacy #image-captioning-2605 span for external bookmarks; fix deployment-options.md to use the new anchor.
  • Trim support-matrix admonition — remove the chart-caption admonition block per review; add one sentence of caption-scope prose under #image-captioning in multimodal-extraction.md.
  • Other #2194 fixes ("fix audio-video.md markdown rendering (follow-up to #2179)"): FAQ Docker Compose note; simplified OCR deploy prose in multimodal-extraction.md; explicit section anchor IDs in prerequisites-support-matrix.md.
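
The anchor rename with a legacy alias described above can be illustrated with a minimal Python sketch. It is not taken from the repository. It assumes explicit IDs are declared either as an attr_list suffix on a heading (`{#some-id}`) or as an inline `<span id="...">`; the excerpt text, the helper name `collect_anchor_ids`, and the heading title are hypothetical.

```python
import re

# Assumption: explicit IDs appear either as an attr_list suffix on a heading
# ("## Title {#some-id}") or as an inline <span id="some-id">.
HEADING_ID = re.compile(r"^#{1,6}\s.*\{#([A-Za-z0-9_-]+)\}\s*$", re.MULTILINE)
SPAN_ID = re.compile(r'<span\s+id="([^"]+)"')


def collect_anchor_ids(markdown_text: str) -> set[str]:
    """Return every explicit anchor ID declared in a Markdown page."""
    ids = set(HEADING_ID.findall(markdown_text))
    ids.update(SPAN_ID.findall(markdown_text))
    return ids


# Hypothetical excerpt; the real section title and layout may differ.
SUPPORT_MATRIX_EXCERPT = """\
<span id="image-captioning-2605"></span>
## Image captioning {#image-captioning-nim-hardware}
"""

anchors = collect_anchor_ids(SUPPORT_MATRIX_EXCERPT)
print(sorted(anchors))
```

Under these assumptions both the new canonical anchor and the legacy alias are collected from the same page, which is the property that would keep external bookmarks working after the rename.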

Out of scope (already on main)

Follow-up (eng, not this PR)

Reviewer checklist

  • Delete custom-metadata.md; link to metadata filtering notebook from vdbs.md
  • Remove chart-caption admonition from support matrix and multimodal-extraction
  • Drop 26.05 from image-captioning heading; keep legacy anchor alias
  • Rebase/merge main; conflicts resolved

Test plan

  • MkDocs build passes
  • extraction/custom-metadata.md redirects to vdbs.md#metadata-and-filtering
  • FAQ chart-caption answer links resolve to multimodal-extraction.md#charts-and-infographics and #image-captioning
  • deployment-options.md offline caption link targets #image-captioning-nim-hardware
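
The redirect and link checks in this test plan could be automated along the following lines. This is a sketch under stated assumptions, not the project's actual test: the function `check_redirects`, the shape of the redirect mapping, and the page paths are hypothetical, and the anchor sets are supplied by hand rather than parsed from the docs.

```python
def check_redirects(
    redirects: dict[str, str],
    anchors_by_page: dict[str, set[str]],
) -> list[str]:
    """Return a readable problem for each redirect that does not resolve."""
    problems = []
    for old_page, target in redirects.items():
        # Split "page.md#fragment" into the page path and the optional fragment.
        page, _, fragment = target.partition("#")
        if page not in anchors_by_page:
            problems.append(f"{old_page}: target page {page} is missing")
        elif fragment and fragment not in anchors_by_page[page]:
            problems.append(f"{old_page}: anchor #{fragment} not found in {page}")
    return problems


# Hypothetical data mirroring the redirect described in this PR.
redirects = {
    "extraction/custom-metadata.md": "extraction/vdbs.md#metadata-and-filtering",
}
anchors_by_page = {"extraction/vdbs.md": {"metadata-and-filtering"}}

problems = check_redirects(redirects, anchors_by_page)
print(problems or "all redirects resolve")
```

Returning a list of problems instead of raising on the first failure lets a single run report every broken redirect at once.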

@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners June 2, 2026 22:44
@kheiss-uwzoo kheiss-uwzoo requested a review from jperez999 June 2, 2026 22:44
@kheiss-uwzoo kheiss-uwzoo changed the title docs(extraction): backport PR #2194 extraction doc fixes to main backport PR #2194 extraction doc fixes to main Jun 2, 2026

greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This PR syncs post-#2179 extraction documentation fixes from the 26.05 branch to main, consolidating the standalone custom-metadata.md page into vdbs.md#metadata-and-filtering and fixing stale anchor references throughout the extraction docs.

  • custom-metadata.md removed — content absorbed into vdbs.md#metadata-and-filtering as a compact paragraph plus notebook links; a MkDocs redirect is added, consistent with the fragment-redirect pattern already used by a dozen other redirects in mkdocs.yml; all cross-links in faq.md, integrations-langchain-llamaindex-haystack.md, workflow-agentic-retrieval.md, and the doc-snippet test list are updated.
  • Stale anchor chain fixed: #image-captioning-2605 renamed to #image-captioning-nim-hardware on prerequisites-support-matrix.md; a <span id="image-captioning-2605"> alias preserves external bookmarks; deployment-options.md, multimodal-extraction.md, and faq.md retargeted to the new anchor or to verified in-page anchors on multimodal-extraction.md.
  • Support-matrix admonition removed — chart-caption scope prose moved to a single sentence under #image-captioning in multimodal-extraction.md; nav renumbered from 7 onward after the single-page "Retrieval & ranking" section is dropped.

Confidence Score: 5/5

Documentation-only reorganisation; no code paths, APIs, or data schemas are changed.

All anchor targets were verified in the post-merge file contents: #charts-and-infographics, #image-captioning, #image-captioning-nim-hardware, and #metadata-and-filtering are all present with the correct explicit IDs. The fragment-redirect pattern for custom-metadata.md is already established by a dozen similar entries in mkdocs.yml. Cross-links in every affected file have been updated consistently, and the deleted file is properly removed from the doc-snippet test list.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/custom-metadata.md | Deleted; content consolidated into vdbs.md#metadata-and-filtering; no content or links are stranded |
| docs/docs/extraction/deployment-options.md | One-line anchor retarget from #image-captioning-2605 to #image-captioning-nim-hardware, which is the new explicit anchor on prerequisites-support-matrix.md |
| docs/docs/extraction/faq.md | Two link updates: chart-captioning FAQ now points to verified in-page anchors on multimodal-extraction.md; Docker Compose disclaimer added to env-vars answer |
| docs/docs/extraction/multimodal-extraction.md | Simplified OCR prose; chart-caption scope sentence added under #image-captioning; cross-links updated to in-page anchors; all targets verified present |
| docs/docs/extraction/prerequisites-support-matrix.md | Section headings given explicit anchor IDs; #image-captioning-nim-hardware is new canonical anchor; legacy #image-captioning-2605 preserved as a span alias; admonition block removed |
| docs/docs/extraction/vdbs.md | Metadata-and-filtering section expanded with inline guidance replacing the deleted custom-metadata.md page; old links to that page replaced with notebook links |
| docs/mkdocs.yml | Nav renumbered after removing the single-page '7. Retrieval & ranking' section; redirect for custom-metadata.md added, consistent with fragment-redirect pattern already used elsewhere |
| nemo_retriever/tests/test_src_documentation_snippets.py | custom-metadata.md removed from the doc-snippet test list to match the deleted file |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["custom-metadata.md\n(deleted)"] -- "redirect" --> B["vdbs.md\n#metadata-and-filtering"]
    C["faq.md"] -- "chart-caption FAQ" --> D["multimodal-extraction.md\n#charts-and-infographics"]
    C -- "image captioning FAQ" --> E["multimodal-extraction.md\n#image-captioning"]
    E -- "NIM & hardware" --> F["prerequisites-support-matrix.md\n#image-captioning-nim-hardware"]
    F -- "legacy alias" --> G["span id=image-captioning-2605"]
    H["deployment-options.md"] -- "offline captioning link" --> F
    I["integrations doc"] -- "filter search link" --> B
    J["workflow-agentic-retrieval.md"] -- "metadata link" --> B

Reviews (11): Last reviewed commit: "docs(extraction): fix caption anchor and..."

@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/backport-2194-extraction-docs-fix branch from 477f48a to 9a24d31 Compare June 2, 2026 22:55
@kheiss-uwzoo kheiss-uwzoo changed the title backport PR #2194 extraction doc fixes to main docs(extraction): backport PR #2194 extraction doc fixes to main Jun 2, 2026
@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/backport-2194-extraction-docs-fix branch from 9a24d31 to 019547d Compare June 2, 2026 23:01
Comment thread docs/docs/extraction/custom-metadata.md Outdated
Comment thread docs/docs/extraction/multimodal-extraction.md Outdated
kheiss-uwzoo added a commit to kheiss-uwzoo/nv-ingest that referenced this pull request Jun 5, 2026
…a page, remove chart admonition

Remove custom-metadata.md in favor of vdbs.md#metadata-and-filtering and the metadata filtering notebook. Drop the PDF chart caption admonition from multimodal-extraction.md per review feedback.
@kheiss-uwzoo kheiss-uwzoo requested a review from randerzander June 5, 2026 17:54
@kheiss-uwzoo kheiss-uwzoo changed the title docs(extraction): backport PR #2194 extraction doc fixes to main backport PR #2194 extraction doc fixes to main Jun 5, 2026
@kheiss-uwzoo kheiss-uwzoo self-assigned this Jun 8, 2026
@kheiss-uwzoo kheiss-uwzoo added the doc Improvements or additions to documentation label Jun 8, 2026
Comment thread docs/docs/extraction/multimodal-extraction.md Outdated
@kheiss-uwzoo kheiss-uwzoo requested a review from jperez999 June 10, 2026 20:13
PR NVIDIA#2194 merged into 26.05 on 2026-06-02 but never reached main. This
backport keeps main aligned with the release branch and the published
docs.nvidia.com site after Randy's follow-up review.

Timeline:
- Friday: 26.05 docs built for docs.nvidia upload; branch differed from
  NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR NVIDIA#2179 to sync extraction docs.
- Monday: PR NVIDIA#2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR NVIDIA#2194 on 26.05 with additional fixes found
  after the NVIDIA#2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e onto main (five extraction doc
  files only).

Changes from NVIDIA#2194:
- Fix audio-video.md indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to multimodal-extraction.md
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors
…a page, remove chart admonition

Remove custom-metadata.md in favor of vdbs.md#metadata-and-filtering and the metadata filtering notebook. Drop the PDF chart caption admonition from multimodal-extraction.md per review feedback.
Rename the support-matrix caption section for main and keep a legacy
#image-captioning-2605 alias so existing deep links keep working.
@kheiss-uwzoo kheiss-uwzoo force-pushed the docs/backport-2194-extraction-docs-fix branch from 6a46fb2 to 56fa45c Compare June 11, 2026 23:34
Resolve modify/delete conflict on custom-metadata.md by keeping the PR deletion (content consolidated in vdbs.md with redirect). Bring in main mkdocstrings path fixes and support-matrix updates.
…A#2203

Point deployment-options.md at #image-captioning-nim-hardware. Add one sentence under multimodal-extraction #image-captioning so FAQ cross-references have scope detail without restoring the admonition.
@kheiss-uwzoo kheiss-uwzoo changed the title backport PR #2194 extraction doc fixes to main docs(extraction): sync post-#2179 extraction doc fixes to main Jun 15, 2026
Labels

doc Improvements or additions to documentation

3 participants