Add markdown alternate links for LLM training data discovery #665
- Add `<link rel="alternate" type="text/markdown">` to page headers pointing to .md version
- Improve MDX-to-markdown compilation to produce clean markdown output
- Preserve code blocks and frontmatter while stripping JSX components

Co-Authored-By: Claude Opus 4.5 <[email protected]>
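The stripping step described in this commit could look roughly like the sketch below. It is only an illustration under stated assumptions: `stripMdxSyntax` is a hypothetical name, not the function in this PR, and the line-based approach handles only single-line `import`/`export` statements and lines that consist solely of a capitalised JSX tag. It would also drop a prose line that happens to begin with "import " or "export ".

```typescript
// Hypothetical helper, not the PR's actual implementation.
// Built with repeat() so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

function stripMdxSyntax(mdx: string): string {
  const out: string[] = [];
  let inFence = false;

  for (const line of mdx.split("\n")) {
    // Toggle fence state and always keep the fence line itself.
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence;
      out.push(line);
      continue;
    }
    // Inside a code block everything is preserved verbatim.
    if (inFence) {
      out.push(line);
      continue;
    }
    // Drop single-line ESM import/export statements.
    if (/^(import|export)\s/.test(line)) continue;
    // Drop lines that are nothing but a JSX component tag (capitalised name).
    if (/^\s*<\/?[A-Z][\w.]*[^>]*\/?>\s*$/.test(line)) continue;

    out.push(line);
  }
  return out.join("\n");
}
```

Frontmatter lines pass through untouched because they match neither pattern, and text nested between component tags is kept while the tags themselves are removed.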
Pages that only contain React components (like the landing page) now return a helpful markdown response with the title, description, and a link to the full interactive page.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
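As a rough illustration of the fallback described above, a component-only page could be rendered from its metadata alone. The `PageMeta` shape, the `buildFallbackMarkdown` name, and the wording of the link sentence are all assumptions made for this sketch, not taken from the PR.

```typescript
// Hypothetical shape for the metadata available for a page.
interface PageMeta {
  title: string;
  description: string;
  url: string; // canonical URL of the interactive page
}

// Builds a short markdown document: title, description, link to the full page.
function buildFallbackMarkdown(meta: PageMeta): string {
  return [
    `# ${meta.title}`,
    "",
    meta.description,
    "",
    `This page is interactive. View the full version at [${meta.url}](${meta.url}).`,
  ].join("\n");
}
```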
Cursor Bugbot has reviewed your changes and found 2 potential issues.
- Add dedent function to normalize indentation when extracting content from JSX components
- Add normalizeIndentation function to clean up stray whitespace while preserving meaningful markdown indentation (nested lists, blockquotes)
- Move list detection regex patterns to module top level for performance
- Ensures code block markers (```) start at column 0

Co-Authored-By: Claude Opus 4.5 <[email protected]>
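A common way to write a dedent helper like the one named in this commit is to remove the smallest shared indentation from every non-blank line. The sketch below shows that general technique. It is not the PR's code, and it counts each whitespace character as one column, so mixed tabs and spaces are not handled.

```typescript
// Sketch of a common-indent remover; the PR's own dedent may differ.
function dedent(text: string): string {
  const lines = text.split("\n");

  // Find the smallest indentation among non-blank lines.
  let min = Infinity;
  for (const line of lines) {
    if (line.trim() === "") continue;
    const indent = line.length - line.trimStart().length;
    min = Math.min(min, indent);
  }

  // Nothing to remove: all lines blank, or some line already at column 0.
  if (!Number.isFinite(min) || min === 0) return text;

  // Strip the shared prefix; blank lines become empty strings.
  return lines
    .map((line) => (line.trim() === "" ? "" : line.slice(min)))
    .join("\n");
}
```

Because only the shared prefix is removed, relative indentation such as a nested list item survives, while fence markers that sat at the shared indent end up at column 0.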
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
The previous regex patterns `["']?([^"'\n]+)["']?` would truncate text at the first apostrophe (e.g., "Arcade's" became "Arcade"). This fix:

- Uses separate patterns for double-quoted, single-quoted, and unquoted values
- Requires closing quotes to be at end of line to prevent apostrophes from being misinterpreted as closing delimiters
- Adds stripSurroundingQuotes helper for fallback cases

Co-Authored-By: Claude Opus 4.5 <[email protected]>
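The three-pattern approach can be sketched as follows. The function name `extractFrontmatterValue` and the exact expressions are assumptions for illustration; the PR's real patterns may differ. The key idea is that each quoted pattern anchors its closing quote to the end of the line, so an apostrophe in the middle of a value is never treated as a delimiter.

```typescript
// Hypothetical helper: reads `key: value` from a frontmatter block.
function extractFrontmatterValue(
  frontmatter: string,
  key: string,
): string | undefined {
  // Escape the key so it is matched literally inside the RegExp.
  const k = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  const patterns = [
    // Double-quoted value; closing quote must end the line.
    new RegExp(`^${k}:[ \\t]*"(.*)"[ \\t]*$`, "m"),
    // Single-quoted value; closing quote must end the line.
    new RegExp(`^${k}:[ \\t]*'(.*)'[ \\t]*$`, "m"),
    // Unquoted value; take the rest of the line, trimmed.
    new RegExp(`^${k}:[ \\t]*(.+?)[ \\t]*$`, "m"),
  ];

  for (const pattern of patterns) {
    const match = frontmatter.match(pattern);
    if (match) return match[1];
  }
  return undefined;
}
```

Trying the quoted patterns first matters: the unquoted pattern would also match a quoted line, but it would keep the surrounding quotes in the captured value.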
Cursor Bugbot has reviewed your changes and found 1 potential issue.
When the x-pathname header is not set, pathname defaults to "/", which would produce an invalid alternate link "https://docs.arcade.dev/.md". Only render the alternate link when we have a real page path.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
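The guard described here might be expressed as a small pure function that returns `null` whenever no link should be rendered. `markdownAlternateHref` is a hypothetical name, and stripping a trailing slash is an extra assumption of this sketch rather than something the PR states.

```typescript
const DOCS_ORIGIN = "https://docs.arcade.dev";

// Returns the .md alternate URL for a page path, or null for the root
// path or a missing path (for example when the x-pathname header is absent).
function markdownAlternateHref(
  pathname: string | null | undefined,
): string | null {
  if (!pathname || pathname === "/") return null;

  // Assumption: drop trailing slashes so "/en/home/" maps to "/en/home.md".
  const trimmed = pathname.replace(/\/+$/, "");
  if (trimmed === "") return null;

  return `${DOCS_ORIGIN}${trimmed}.md`;
}
```

A layout would then emit the `<link rel="alternate" type="text/markdown">` element only when this function returns a non-null value.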

Summary
- Add `<link rel="alternate" type="text/markdown">` to all page headers, pointing to the `.md` version of each page
- `.md` URLs return clean, readable markdown instead of raw MDX

This enables LLM crawlers and training pipelines to discover and consume the markdown versions of our documentation, similar to what Vercel does with their docs.
Test plan
- Pages include `<link rel="alternate" type="text/markdown" href="...">`
- https://docs.arcade.dev/en/get-started/quickstarts/call-tool-agent.md - should return clean markdown with code blocks preserved
- https://docs.arcade.dev/en/home.md - should return fallback content with title/description and link to full page

🤖 Generated with Claude Code
Note
Enables clean, consumable markdown versions of docs and surfaces them to crawlers.
- Adds `app/api/markdown/[[...slug]]/route.ts` endpoint that reads `page.mdx`, compiles to markdown (preserves frontmatter/code blocks, strips imports/exports/JSX, normalizes indentation), provides fallback content for component-only pages, and serves with `text/markdown` headers
- Updates `app/layout.tsx` to inject `<link rel="alternate" type="text/markdown">` pointing to `https://docs.arcade.dev{pathname}.md` for all non-root pages

Written by Cursor Bugbot for commit fd16de7. This will update automatically on new commits.