feat: add JSON-LD structured data and llms.txt for LLM discovery #40
nityatimalsina merged 1 commit into main
Add automatic JSON-LD schema generation and AI handshake files to improve
site visibility for LLM-based discovery systems (Perplexity, ChatGPT, etc.).
JSON-LD Structured Data:
- Add DocsJsonLd component: site-level WebSite schema in layout.tsx
- Add DocsAutoJsonLd component: auto-generates TechArticle or HowTo schema
per page based on file path (quickstart/*, guides/*, getting-started* → HowTo;
everything else → TechArticle)
- Add mdx-components.ts wrapper that injects DocsAutoJsonLd on every page
using Nextra metadata (title, description, filePath) — zero manual imports needed
- Include DocsPageJsonLd and DocsHowToJsonLd as manual override components
- All schemas link to master SoftwareApplication entity via
about: { @id: https://lightpanda.io/#software }
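The path-based schema selection described above could be sketched roughly as follows. This is a minimal illustration, not the actual DocsAutoJsonLd code: `schemaTypeFor`, `buildPageJsonLd`, and the `PageMeta` shape are hypothetical names, the path is assumed to be relative to the content root, and the `https://lightpanda.io/docs/` URL prefix is an assumption based on the basePath mentioned in the review.

```typescript
type SchemaType = 'HowTo' | 'TechArticle'

// Hypothetical input shape; docsPath is assumed relative to the content root,
// e.g. "quickstart/installation".
interface PageMeta {
  title: string
  description: string
  docsPath: string
}

interface PageJsonLd {
  '@context': string
  '@type': SchemaType
  name: string
  description: string
  url: string
  about: { '@id': string }
}

// quickstart/*, guides/*, getting-started* -> HowTo; everything else -> TechArticle.
// Assumption: the patterns are matched as prefixes of the content-relative path.
function schemaTypeFor(docsPath: string): SchemaType {
  return /^(quickstart\/|guides\/|getting-started)/.test(docsPath)
    ? 'HowTo'
    : 'TechArticle'
}

function buildPageJsonLd(meta: PageMeta): PageJsonLd {
  return {
    '@context': 'https://schema.org',
    '@type': schemaTypeFor(meta.docsPath),
    name: meta.title,
    description: meta.description,
    // Assumed URL layout; the real component derives this from filePath.
    url: `https://lightpanda.io/docs/${meta.docsPath}`,
    // Every page schema points back at the master SoftwareApplication entity.
    about: { '@id': 'https://lightpanda.io/#software' },
  }
}
```

The resulting object would typically be serialized with `JSON.stringify` into a `<script type="application/ld+json">` tag.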
AI Handshake (llms.txt):
- Add scripts/generate-llms.mjs: build-time script that scans all MDX docs
and generates public/llms.txt (page inventory by section) and
public/llms-full.txt (full content, JSX stripped)
- Add prebuild hook in package.json to auto-regenerate on every build
- Add .github/workflows/generate-llms.yml to auto-commit on MDX changes
- Add <link rel="alternate" href="/llms.txt"> discovery link in layout.tsx
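As a rough illustration of the "page inventory by section" output, the sketch below renders a list of pages into an llms.txt-style document. It is not the code in scripts/generate-llms.mjs: `DocPage` and `renderLlmsTxt` are hypothetical names, and the heading and link layout shown is an assumption about the generated file.

```typescript
// Hypothetical page record; the real script derives these from MDX files.
interface DocPage {
  section: string
  title: string
  url: string
  description?: string
}

function renderLlmsTxt(siteTitle: string, pages: DocPage[]): string {
  // Group pages by section, keeping first-seen section order.
  const bySection: Record<string, DocPage[]> = {}
  for (const page of pages) {
    if (!bySection[page.section]) bySection[page.section] = []
    bySection[page.section].push(page)
  }

  const lines: string[] = [`# ${siteTitle}`, '']
  for (const section of Object.keys(bySection)) {
    lines.push(`## ${section}`, '')
    for (const page of bySection[section]) {
      const suffix = page.description ? `: ${page.description}` : ''
      lines.push(`- [${page.title}](${page.url})${suffix}`)
    }
    lines.push('')
  }
  return lines.join('\n')
}
```

A build-time script could write the returned string to public/llms.txt with `writeFile` from `node:fs/promises`.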
Pull request overview
This PR adds automatic structured data (JSON-LD) to the docs site and introduces llms.txt/llms-full.txt generation to improve LLM-oriented discovery and indexing for the Lightpanda documentation (Next.js + Nextra static export).
Changes:
- Injects JSON-LD across the docs: a site-level WebSite schema in layout.tsx and per-page auto schema via the MDX wrapper.
- Adds a build-time generator (scripts/generate-llms.mjs) plus prebuild hook to produce public/llms.txt and public/llms-full.txt.
- Adds a GitHub Action to regenerate and auto-commit llms files when MDX content changes on main.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/mdx-components.ts | Wraps MDX rendering to auto-inject per-page JSON-LD based on Nextra metadata. |
| src/components/lightpanda/DocsAutoJsonLd.tsx | Implements automatic per-page JSON-LD schema selection and URL derivation from filePath. |
| src/components/lightpanda/DocsJsonLd.tsx | Adds site-level WebSite JSON-LD component for docs. |
| src/components/lightpanda/DocsPageJsonLd.tsx | Provides manual TechArticle JSON-LD override component. |
| src/components/lightpanda/DocsHowToJsonLd.tsx | Provides manual HowTo JSON-LD override component (with steps). |
| src/app/layout.tsx | Injects docs WebSite JSON-LD and adds a discovery <link> for llms.txt. |
| scripts/generate-llms.mjs | New script to scan MDX and generate llms.txt + llms-full.txt. |
| package.json | Adds generate-llms and prebuild script hooks. |
| .github/workflows/generate-llms.yml | Workflow to regenerate and auto-commit llms files on main when MDX changes. |
| public/llms.txt | Generated inventory file committed to the repo. |
| public/llms-full.txt | Generated full-content file committed to the repo. |
    import { readdir, readFile, writeFile, stat } from 'node:fs/promises'
    import { join, relative, basename, dirname } from 'node:path'
    import { fileURLToPath } from 'node:url'
Unused import: stat is imported from node:fs/promises but never used. Removing it keeps the script clean and avoids lint/typecheck failures if unused-import rules are enforced.
      <DNSPrefetch />
      <link rel="alternate" type="text/markdown" href="/llms.txt" />
    </NextraHead>
Because the Next.js basePath is set to /docs (see next.config.mjs), the static file generated at public/llms.txt will be served at /docs/llms.txt. The current <link ... href="/llms.txt" /> points to the site root and will 404 in production. Update the href to include the basePath (e.g. /docs/llms.txt or an absolute https://lightpanda.io/docs/llms.txt).
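One way to avoid hard-coding the prefix in several places is a small helper, sketched below. `withBasePath` and `BASE_PATH` are hypothetical names, not part of Next.js or this PR, and the `/docs` value is assumed to mirror the basePath in next.config.mjs.

```typescript
// Assumption: mirrors the basePath configured in next.config.mjs.
const BASE_PATH = '/docs'

// Hypothetical helper: prefix a root-relative public asset path with the basePath.
function withBasePath(assetPath: string, basePath: string = BASE_PATH): string {
  // Ensure exactly one slash between the basePath and the asset path.
  const normalized = assetPath.charAt(0) === '/' ? assetPath : `/${assetPath}`
  return `${basePath.replace(/\/+$/, '')}${normalized}`
}
```

With this helper the discovery link could use `href={withBasePath('/llms.txt')}`, which evaluates to `/docs/llms.txt`.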
    function filePathToDocsPath(filePath: string): string {
      return filePath
        .replace('src/content/', '')
        .replace(/\.mdx$/, '')
        .replace(/\/index$/, '')
    }
filePathToDocsPath turns src/content/index.mdx into index, which produces a JSON-LD url of https://lightpanda.io/docs/index for the docs homepage. This should resolve to the canonical homepage URL (https://lightpanda.io/docs/). Consider special-casing the root index file (e.g., map index to an empty path) to avoid incorrect structured data on the landing page.
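A possible fix, sketched under the assumption that the canonical homepage is `https://lightpanda.io/docs/` as stated in the comment: let the index pattern also match at the start of the path so the root index maps to an empty string. The `docsUrl` helper is hypothetical and only illustrates how the empty path could be turned into the homepage URL; anchoring the `src/content/` prefix with a regex is also a small change from the original string replace.

```typescript
function filePathToDocsPath(filePath: string): string {
  return filePath
    .replace(/^src\/content\//, '')
    .replace(/\.mdx$/, '')
    // Matches both "index" (root) and ".../index" (section landing pages).
    .replace(/(^|\/)index$/, '')
}

// Hypothetical helper: build the page URL, special-casing the docs homepage.
function docsUrl(filePath: string): string {
  const path = filePathToDocsPath(filePath)
  return path
    ? `https://lightpanda.io/docs/${path}`
    : 'https://lightpanda.io/docs/'
}
```

Because the pattern requires `index` to follow either the start of the string or a slash, a page such as `reindex.mdx` is left untouched.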
    function stripMdx(body) {
      return body
        .split('\n')
        .filter((line) => !line.startsWith('import '))
        .filter((line) => !line.match(/^<[A-Z]/))
        .filter((line) => !line.match(/^\s*\/>/))
        .filter((line) => !line.match(/^\s+\{ name:/)) // HowToJsonLd step props
        .filter((line) => !line.match(/^\s+steps=\{/))
        .filter((line) => !line.match(/^\s+\]\}/))
        .join('\n')
stripMdx removes opening JSX lines like <Callout ...> / <Tabs.Tab ...> but not the corresponding closing tags (e.g. </Callout>, </Tabs>), so llms-full.txt still contains a lot of raw JSX despite the intent to strip it. Update the filter to also drop closing JSX tags (and likely other common JSX-only lines) so the generated output is actually JSX-free.
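A minimal sketch of the suggested filter change: allow an optional `/` after `<` so closing component tags are dropped along with opening ones. `stripJsxLines` is a hypothetical stand-in for the relevant part of stripMdx, and it remains a line-based heuristic, so it would also drop matching lines inside code fences and any text on the same line as a component tag.

```typescript
// Opening or closing component tag at the start of a line, e.g. <Callout ...> or </Tabs>.
const JSX_TAG_LINE = /^\s*<\/?[A-Z]/
// A line that only closes a multi-line self-closing element, e.g. "/>".
const SELF_CLOSE_LINE = /^\s*\/>/
const IMPORT_LINE = /^import /

function stripJsxLines(body: string): string {
  return body
    .split('\n')
    .filter(
      (line) =>
        !IMPORT_LINE.test(line) &&
        !JSX_TAG_LINE.test(line) &&
        !SELF_CLOSE_LINE.test(line),
    )
    .join('\n')
}
```

Text between the tags is kept, so a `<Callout>` block contributes its body to llms-full.txt without the wrapper markup.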