
feat: add JSON-LD structured data and llms.txt for LLM discovery #40

Merged
nityatimalsina merged 1 commit into main from feat/seo-jsonld-llms-txt
Mar 10, 2026

@nityatimalsina
Contributor

Add automatic JSON-LD schema generation and AI handshake files to improve site visibility for LLM-based discovery systems (Perplexity, ChatGPT, etc.).

JSON-LD Structured Data:

  • Add DocsJsonLd component: site-level WebSite schema in layout.tsx
  • Add DocsAutoJsonLd component: auto-generates TechArticle or HowTo schema per page based on file path (quickstart/*, guides/*, getting-started* → HowTo; everything else → TechArticle)
  • Add mdx-components.ts wrapper that injects DocsAutoJsonLd on every page using Nextra metadata (title, description, filePath) — zero manual imports needed
  • Include DocsPageJsonLd and DocsHowToJsonLd as manual override components
  • All schemas link to master SoftwareApplication entity via about: { @id: https://lightpanda.io/#software }
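The path-based selection above can be sketched roughly as follows. This is a minimal illustration, not the code in DocsAutoJsonLd.tsx: the function names (pickSchemaType, buildPageJsonLd), the PageMeta shape, and the choice of name vs. headline are assumptions. The HowTo rules mirror the PR description, interpreted here as prefixes of the docs-relative path.

```typescript
// Hypothetical sketch of per-page schema selection; names and JSON shape are assumptions.
type SchemaType = 'HowTo' | 'TechArticle'

const SOFTWARE_ID = 'https://lightpanda.io/#software'
const DOCS_BASE = 'https://lightpanda.io/docs/'

// Assumed interpretation: the rules match the start of the docs-relative path.
function pickSchemaType(docsPath: string): SchemaType {
  if (docsPath.startsWith('quickstart/') || docsPath.startsWith('guides/')) return 'HowTo'
  if (docsPath.startsWith('getting-started')) return 'HowTo'
  return 'TechArticle'
}

interface PageMeta {
  title: string
  description?: string
  docsPath: string // e.g. "guides/cdp"; "" for the docs homepage
}

function buildPageJsonLd(meta: PageMeta): Record<string, unknown> {
  const type = pickSchemaType(meta.docsPath)
  return {
    '@context': 'https://schema.org',
    '@type': type,
    // HowTo pages get a "name"; articles get a "headline".
    ...(type === 'HowTo' ? { name: meta.title } : { headline: meta.title }),
    ...(meta.description ? { description: meta.description } : {}),
    url: DOCS_BASE + meta.docsPath,
    // Every page points at the shared SoftwareApplication entity.
    about: { '@id': SOFTWARE_ID },
  }
}
```

A component would then serialize the returned object with JSON.stringify inside a <script type="application/ld+json"> tag.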

AI Handshake (llms.txt):

  • Add scripts/generate-llms.mjs: build-time script that scans all MDX docs and generates public/llms.txt (page inventory by section) and public/llms-full.txt (full content, JSX stripped)
  • Add prebuild hook in package.json to auto-regenerate on every build
  • Add .github/workflows/generate-llms.yml to auto-commit on MDX changes
  • Add <link rel="alternate" href="/llms.txt"> discovery link in layout.tsx
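The "page inventory by section" step could look roughly like this once the MDX files have been read. This is a hypothetical sketch, not scripts/generate-llms.mjs: it assumes pages arrive as { relPath, title, description } records, groups them by top-level directory, and loosely follows the llms.txt proposal's layout (an H1 title, H2 sections, Markdown link lists). The names buildLlmsTxt and toUrl, the "General" section, and the H1 text are illustrative.

```typescript
// Hypothetical sketch of llms.txt inventory generation from already-scanned pages.
interface DocPage {
  relPath: string // relative to the content root, e.g. "guides/cdp.mdx"
  title: string
  description?: string
}

// Map "guides/cdp.mdx" -> baseUrl + "guides/cdp" and "index.mdx" -> baseUrl.
function toUrl(relPath: string, baseUrl: string): string {
  const slug = relPath.replace(/\.mdx$/, '').replace(/(^|\/)index$/, '')
  return baseUrl + slug
}

function buildLlmsTxt(pages: DocPage[], baseUrl: string): string {
  // Group by top-level directory; root-level files go under an assumed "General" section.
  const sections = new Map<string, DocPage[]>()
  for (const page of pages) {
    const parts = page.relPath.split('/')
    const section = parts.length > 1 ? parts[0] : 'General'
    const list = sections.get(section) ?? []
    list.push(page)
    sections.set(section, list)
  }

  const lines: string[] = ['# Lightpanda Documentation', '']
  sections.forEach((list, section) => {
    lines.push(`## ${section}`, '')
    for (const page of list) {
      const desc = page.description ? `: ${page.description}` : ''
      lines.push(`- [${page.title}](${toUrl(page.relPath, baseUrl)})${desc}`)
    }
    lines.push('')
  })
  return lines.join('\n')
}
```

Passing the base URL in explicitly keeps the links correct regardless of where the docs are mounted.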

Copilot AI review requested due to automatic review settings March 10, 2026 03:30
@nityatimalsina nityatimalsina merged commit a86b133 into main Mar 10, 2026
3 checks passed
@nityatimalsina nityatimalsina deleted the feat/seo-jsonld-llms-txt branch March 10, 2026 03:33

Copilot AI left a comment

Pull request overview

This PR adds automatic structured data (JSON-LD) to the docs site and introduces llms.txt/llms-full.txt generation to improve LLM-oriented discovery and indexing for the Lightpanda documentation (Next.js + Nextra static export).

Changes:

  • Injects JSON-LD across the docs: a site-level WebSite schema in layout.tsx and per-page auto schema via the MDX wrapper.
  • Adds a build-time generator (scripts/generate-llms.mjs) plus prebuild hook to produce public/llms.txt and public/llms-full.txt.
  • Adds a GitHub Action to regenerate and auto-commit llms files when MDX content changes on main.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/mdx-components.ts | Wraps MDX rendering to auto-inject per-page JSON-LD based on Nextra metadata. |
| src/components/lightpanda/DocsAutoJsonLd.tsx | Implements automatic per-page JSON-LD schema selection and URL derivation from filePath. |
| src/components/lightpanda/DocsJsonLd.tsx | Adds site-level WebSite JSON-LD component for docs. |
| src/components/lightpanda/DocsPageJsonLd.tsx | Provides manual TechArticle JSON-LD override component. |
| src/components/lightpanda/DocsHowToJsonLd.tsx | Provides manual HowTo JSON-LD override component (with steps). |
| src/app/layout.tsx | Injects docs WebSite JSON-LD and adds a discovery <link> for llms.txt. |
| scripts/generate-llms.mjs | New script to scan MDX and generate llms.txt + llms-full.txt. |
| package.json | Adds generate-llms and prebuild script hooks. |
| .github/workflows/generate-llms.yml | Workflow to regenerate and auto-commit llms files on main when MDX changes. |
| public/llms.txt | Generated inventory file committed to the repo. |
| public/llms-full.txt | Generated full-content file committed to the repo. |

Comment on lines +14 to +16
```js
import { readdir, readFile, writeFile, stat } from 'node:fs/promises'
import { join, relative, basename, dirname } from 'node:path'
import { fileURLToPath } from 'node:url'
```

Copilot AI Mar 10, 2026

Unused import: stat is imported from node:fs/promises but never used. Removing it keeps the script clean and avoids lint/typecheck failures if unused-import rules are enforced.

Copilot uses AI. Check for mistakes.
Comment on lines 60 to 62
```tsx
<DNSPrefetch />
<link rel="alternate" type="text/markdown" href="/llms.txt" />
</NextraHead>
```

Copilot AI Mar 10, 2026

Because the Next.js basePath is set to /docs (see next.config.mjs), the static file generated at public/llms.txt will be served at /docs/llms.txt. The current <link ... href="/llms.txt" /> points to the site root and will 404 in production. Update the href to include the basePath (e.g. /docs/llms.txt or an absolute https://lightpanda.io/docs/llms.txt).
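One way to apply this fix is a small helper that prefixes public-folder paths with the basePath, since a raw <link href> in the head is not rewritten by Next.js the way next/link hrefs are. This is a hypothetical sketch: the helper name withBasePath and the hard-coded "/docs" constant are assumptions (the value is taken from the comment above about next.config.mjs).

```typescript
// Hypothetical helper: prefix a public-folder path with the Next.js basePath.
// Assumes basePath is "/docs", per next.config.mjs as described in the review comment.
const BASE_PATH = '/docs'

function withBasePath(path: string, basePath: string = BASE_PATH): string {
  const prefix = basePath.replace(/\/+$/, '') // drop any trailing slash
  const suffix = path.startsWith('/') ? path : `/${path}` // ensure exactly one leading slash
  return prefix + suffix
}
```

In layout.tsx the link would then read href={withBasePath('/llms.txt')}, which resolves to /docs/llms.txt. Using the absolute URL https://lightpanda.io/docs/llms.txt, as the comment also suggests, avoids the helper entirely.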

Comment on lines +26 to +31
```ts
function filePathToDocsPath(filePath: string): string {
  return filePath
    .replace('src/content/', '')
    .replace(/\.mdx$/, '')
    .replace(/\/index$/, '')
}
```

Copilot AI Mar 10, 2026

filePathToDocsPath turns src/content/index.mdx into index, which produces a JSON-LD url of https://lightpanda.io/docs/index for the docs homepage. This should resolve to the canonical homepage URL (https://lightpanda.io/docs/). Consider special-casing the root index file (e.g., map index to an empty path) to avoid incorrect structured data on the landing page.
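A minimal sketch of the suggested special case: making the leading slash optional in the index pattern maps the root index to an empty path, so the homepage resolves to https://lightpanda.io/docs/. The prefix strip is also anchored with a regex here, a small deviation from the original string replace. The docsUrl helper is hypothetical and assumes the URL is built by appending the docs path to https://lightpanda.io/docs/.

```typescript
// Sketch of a filePathToDocsPath that treats the root index as the empty path.
function filePathToDocsPath(filePath: string): string {
  return filePath
    .replace(/^src\/content\//, '') // strip the content root (anchored to the start)
    .replace(/\.mdx$/, '') // drop the extension
    .replace(/(^|\/)index$/, '') // "index" -> "", "guides/index" -> "guides"
}

// Hypothetical URL builder mirroring how the JSON-LD url is assumed to be derived.
function docsUrl(filePath: string): string {
  return `https://lightpanda.io/docs/${filePathToDocsPath(filePath)}`
}
```

Nested index files keep their previous behavior, and a file such as reindex.mdx is left alone because "index" must follow a slash or the start of the path.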

Comment on lines +61 to +70
```js
function stripMdx(body) {
  return body
    .split('\n')
    .filter((line) => !line.startsWith('import '))
    .filter((line) => !line.match(/^<[A-Z]/))
    .filter((line) => !line.match(/^\s*\/>/))
    .filter((line) => !line.match(/^\s+\{ name:/)) // HowToJsonLd step props
    .filter((line) => !line.match(/^\s+steps=\{/))
    .filter((line) => !line.match(/^\s+\]\}/))
    .join('\n')
```

Copilot AI Mar 10, 2026

stripMdx removes opening JSX lines like <Callout ...> / <Tabs.Tab ...> but not the corresponding closing tags (e.g. </Callout>, </Tabs>), so llms-full.txt still contains a lot of raw JSX despite the intent to strip it. Update the filter to also drop closing JSX tags (and likely other common JSX-only lines) so the generated output is actually JSX-free.
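A sketch of the stricter filter the comment asks for, written in TypeScript rather than the script's plain JavaScript. It keeps the original rules, additionally drops closing component tags such as </Callout> and </Tabs.Tab>, and (a change from the original) also matches indented component tags. Like the original it is line-based, so attribute lines of a multi-line opening tag and JSX sharing a line with prose are not removed.

```typescript
// Line patterns to drop; the first and last four mirror the original stripMdx rules.
const DROP_PATTERNS: RegExp[] = [
  /^import /, // MDX import statements
  /^\s*<[A-Z]/, // opening or self-closing component tags, possibly indented
  /^\s*<\/[A-Z]/, // closing component tags: </Callout>, </Tabs.Tab>
  /^\s*\/>/, // end of a multi-line self-closing tag
  /^\s+\{ name:/, // HowToJsonLd step props
  /^\s+steps=\{/,
  /^\s+\]\}/,
]

function stripMdx(body: string): string {
  return body
    .split('\n')
    .filter((line) => !DROP_PATTERNS.some((re) => re.test(line)))
    .join('\n')
}
```

Collecting the rules in one array keeps each pattern next to its comment and makes further JSX-only patterns easy to add.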
