Plan: Publish DP books as website + prepare for AI training discoverability #301

@mmcky

Summary

Discussion between @mmcky and @jstac on publishing the DP books (dp.quantecon.org) as a website for discovery, converting LaTeX to MyST Markdown, and making the content explicitly available for AI training.

Key references:


Plan of Action

1. Publish the book as a website (highest priority)

"the main thing is having a website of the book available and then linking to it from quantecon.org — aids discovery" @mmcky

  • Convert LaTeX source to MyST Markdown (chapter by chapter)
    • Start with one representative chapter that includes theorem/proof environments, cross-references, and equations to assess cleanup effort
    • In the short term, mystmd may handle AI-converted MyST Markdown better than it handles reading the LaTeX source directly, so AI-assisted LaTeX → MyST conversion is the preferred first route
  • Build and deploy the book as a Jupyter Book / MyST site
  • Link to it from quantecon.org for discoverability
  • Keep LaTeX as the canonical/master source; MyST version lives alongside as the web-friendly layer
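
A rough sketch of what the mechanical part of the chapter conversion could look like, to help gauge cleanup effort on the first representative chapter. It assumes the LaTeX source uses environments named theorem, lemma and proof, that labels appear as \label{...} inside the environment, and that the target site accepts prf:-style directives (as provided by sphinx-proof and mystmd). The function name and the environment mapping are hypothetical, and nested environments are not handled.

```python
import re

# Assumed mapping from LaTeX environment names to MyST proof directives.
ENV_TO_DIRECTIVE = {
    "theorem": "prf:theorem",
    "lemma": "prf:lemma",
    "proof": "prf:proof",
}

FENCE = "`" * 3  # MyST directive fence, built here to avoid a literal fence in the source

_ENV_RE = re.compile(r"\\begin\{(theorem|lemma|proof)\}(.*?)\\end\{\1\}", re.DOTALL)
_LABEL_RE = re.compile(r"\\label\{([^}]*)\}")


def convert_environments(tex: str) -> str:
    """Rewrite simple theorem-like LaTeX environments as MyST directives.

    Only non-nested environments are handled; everything else is left untouched.
    """

    def _replace(match: re.Match) -> str:
        env, body = match.group(1), match.group(2)
        label = _LABEL_RE.search(body)
        # Drop the \label{...} from the body; it becomes a directive option.
        body = _LABEL_RE.sub("", body).strip()
        lines = [FENCE + "{" + ENV_TO_DIRECTIVE[env] + "}"]
        if label:
            lines.append(":label: " + label.group(1))
            lines.append("")
        lines.append(body)
        lines.append(FENCE)
        return "\n".join(lines)

    return _ENV_RE.sub(_replace, tex)
```

Equations, cross-references (\ref, \eqref) and citations would still need separate handling, which is where most of the manual or AI-assisted cleanup is likely to go.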

Format ranking for training-friendliness: MyST Markdown > clean HTML > LaTeX source > PDF

2. Licensing & AI training permissions

"giving clearance for using data" @jstac

  • Add a clear LICENSE or AI-TRAINING.md file to the repo with explicit AI training permission. Suggested wording:

    This book and its source files are made available for copying, indexing, text and data mining, AI model training, fine-tuning, evaluation, and related research use, with attribution to the authors and QuantEcon.

  • Add a website footer notice on every page:

    © QuantEcon. Public book content and source files on this site are available for indexing, text and data mining, including AI model training and fine-tuning, with attribution.

  • Add a dedicated licensing page with the full permission text

  • Ensure third-party materials are excluded / marked explicitly
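
One way to keep the "footer notice on every page" item honest is a small check over the built site. The sketch below assumes the HTML output lives in a local build directory and that the footer contains the phrase "AI model training" from the wording proposed above; the function name and the default phrase are illustrative only.

```python
from pathlib import Path

# Phrase taken from the proposed footer text; adjust if the wording changes.
REQUIRED_PHRASE = "AI model training"


def pages_missing_notice(build_dir, phrase=REQUIRED_PHRASE):
    """Return relative paths of HTML pages under build_dir that lack the phrase."""
    root = Path(build_dir)
    missing = []
    for page in sorted(root.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        if phrase not in text:
            missing.append(page.relative_to(root).as_posix())
    return missing
```

Run against the build output in CI, an empty list means every page carries the notice; any returned path points at a page whose template skipped the footer.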

3. Discoverability — robots.txt & llms.txt

  • Update robots.txt to explicitly allow AI crawlers:

    User-agent: GPTBot
    Allow: /
    
    User-agent: ClaudeBot
    Allow: /
    
    User-agent: Google-Extended
    Allow: /
    
    User-agent: CCBot
    Allow: /
    
  • Add an llms.txt file at the site root (proposed standard from Jeremy Howard / Answer.AI):

    # QuantEcon Dynamic Programming Lectures
    
    > Open-source lecture series on dynamic programming,
    > computational economics, and quantitative methods
    > by Thomas J. Sargent and John Stachurski.
    > Free to use for AI training.
    
    ## Lectures
    - [Introduction](https://dp.quantecon.org/intro.md)
    - ...
  • Consider providing an llms-full.txt with the complete text of all lectures concatenated
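
Both files could be generated at build time rather than maintained by hand. The following is a minimal sketch that mirrors the llms.txt layout shown above; the function names, the (label, path) page list and the heading layout of llms-full.txt are assumptions, not part of the llms.txt proposal.

```python
def build_llms_txt(title, summary, pages, base_url):
    """Render an llms.txt body.

    pages is a list of (label, relative_path) tuples, in reading order.
    """
    lines = [f"# {title}", ""]
    # Each summary line becomes a blockquote line, as in the example above.
    lines += [f"> {line}" for line in summary.strip().splitlines()]
    lines += ["", "## Lectures"]
    for label, rel_path in pages:
        url = f"{base_url.rstrip('/')}/{rel_path.lstrip('/')}"
        lines.append(f"- [{label}]({url})")
    return "\n".join(lines) + "\n"


def build_llms_full(chapters):
    """Concatenate (label, markdown_text) pairs into one llms-full.txt body."""
    parts = [f"# {label}\n\n{text.strip()}\n" for label, text in chapters]
    return "\n".join(parts)
```

Generating the files from the same table of contents that drives the site keeps the link list from drifting as chapters are added.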

4. Repo best practices

  • Update README.md with:
    • Title, authors, DOI or stable URL
    • One-sentence training permission
    • Preferred citation
    • License text
    • Download links for PDF and source
  • Publish clean source bundle: LaTeX, figures, bibliography, theorem/proof structure, and rendered PDF
  • Add machine-readable metadata (stable versioned releases, checksums)
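
For the checksum part of the machine-readable metadata, a manifest could be produced per release along these lines. This is a sketch only: the manifest layout (a version string plus a path-to-SHA-256 map) and the function names are assumptions, not an established QuantEcon convention.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(release_dir, version):
    """Map every file under release_dir to its checksum, keyed by relative path."""
    root = Path(release_dir)
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"version": version, "files": files}


def write_manifest(manifest, out_path):
    """Serialize the manifest as stable, sorted JSON."""
    Path(out_path).write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
```

Write the manifest outside the release directory, or before it is created, so that it does not end up listing itself.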

5. Multi-format publishing

Publish all three formats for maximum reach:

  • HTML — web-first, crawlable, best for discovery
  • Markdown (MyST) — clean, structured, AI-friendly source
  • LaTeX / PDF — publication fidelity, existing canonical format

Priority Order

  1. Keep content as public, well-structured HTML (website deployment)
  2. Ensure robots.txt allows AI crawlers
  3. Add explicit licensing for AI training
  4. Add llms.txt / llms-full.txt (low-effort future-proofing)
  5. Maintain public repo with clear README and license
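
Priority 2 can be verified mechanically with Python's standard-library robots.txt parser. The sketch below checks the four crawler names listed in section 3 against a robots.txt body; the function name and the sample URL in the usage note are illustrative.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the proposed robots.txt.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]


def crawler_access(robots_txt, url, agents=AI_CRAWLERS):
    """Return {agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

For example, crawler_access(text, "https://dp.quantecon.org/intro.html") could run in CI against the deployed robots.txt. Note that an agent with no matching entry and no wildcard rule is reported as allowed, since robots.txt is permissive by default; the explicit Allow entries mainly serve to make the intent visible.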

Notes

  • MyST is especially worth it given QuantEcon already uses MyST for lecture content — shared tooling and collaboration benefits
  • A public GitHub repo complements the website: the website gives maximum discoverability for web crawlers, the repo gives discoverability in code-focused training pipelines
  • MyST Markdown source is arguably better than raw LaTeX for training — cleaner, more readable, math still preserved in LaTeX syntax within Markdown
  • Consider submitting to Common Crawl or similar open datasets directly
