Enterprise-grade document extraction for LLM and NLP pipelines.
Install:

    pip install aixtract
    pip install "aixtract[all]"  # All format support

Quick start:

    from aixtract import extract

    result = extract("document.pdf")
    print(result.content_markdown)

| Converter | Extensions | Dependencies |
|---|---|---|
| txt | .txt, .md, .rst, .log | none |
| csv | .csv, .tsv | none |
| json | .json | none |
| xml | .xml | none |
| archive | .zip | none |
| pdf | .pdf | pypdf, pdfplumber |
| docx | .docx, .doc | docx |
| xlsx | .xlsx, .xls | openpyxl |
| pptx | .pptx, .ppt | pptx |
| html | .html, .htm | bs4 |
| epub | .epub | ebooklib, bs4 |
| image | .png, .jpg, .jpeg, .tiff, .bmp | PIL, pytesseract |
| audio | .mp3, .wav, .m4a, .flac, .ogg | whisper |
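
The table above can also be read as data. The sketch below is a minimal illustration, not part of aixtract's API: `CONVERTERS`, `converter_for`, and `missing_dependencies` are hypothetical names, and the README does not state whether aixtract dispatches on file extension this way internally. It shows how to look up which converter a file would need and which of that converter's listed import names are not installed.

```python
from importlib.util import find_spec
from pathlib import Path

# Hypothetical mapping that mirrors the converter table:
# converter name -> (file extensions, import names of its dependencies).
# The "pdf" entry assumes the .pdf extension.
CONVERTERS = {
    "txt": ((".txt", ".md", ".rst", ".log"), ()),
    "csv": ((".csv", ".tsv"), ()),
    "json": ((".json",), ()),
    "xml": ((".xml",), ()),
    "archive": ((".zip",), ()),
    "pdf": ((".pdf",), ("pypdf", "pdfplumber")),
    "docx": ((".docx", ".doc"), ("docx",)),
    "xlsx": ((".xlsx", ".xls"), ("openpyxl",)),
    "pptx": ((".pptx", ".ppt"), ("pptx",)),
    "html": ((".html", ".htm"), ("bs4",)),
    "epub": ((".epub",), ("ebooklib", "bs4")),
    "image": ((".png", ".jpg", ".jpeg", ".tiff", ".bmp"), ("PIL", "pytesseract")),
    "audio": ((".mp3", ".wav", ".m4a", ".flac", ".ogg"), ("whisper",)),
}


def converter_for(path):
    """Return the converter name for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    for name, (extensions, _modules) in CONVERTERS.items():
        if suffix in extensions:
            return name
    return None


def missing_dependencies(converter):
    """Return the listed import names for a converter that cannot be found locally."""
    _extensions, modules = CONVERTERS[converter]
    return [module for module in modules if find_spec(module) is None]


if __name__ == "__main__":
    name = converter_for("report.docx")
    print(name, "missing:", missing_dependencies(name))
```

If `missing_dependencies` returns a non-empty list, installing `"aixtract[all]"` is the documented way to pull in every optional dependency.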
If you are developing or contributing to aixtract, you can use the provided Makefile or pip directly.
The Makefile is configured to automatically use the python and pip binaries inside .venv, so you don't even need to activate the environment to install.
Install for usage:

    make install

Install for development (includes testing/linting tools):

    make dev

If you prefer to run commands manually or don't have make installed:
- Activate the virtual environment:

      source .venv/bin/activate

- Install the project in editable mode:

      # For basic usage
      pip install -e .

      # For development (with dev dependencies)
      # NOTE: Quotes are required in zsh
      pip install -e ".[all,dev]"
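
When installing manually, it can help to confirm that the virtual environment is actually active before running `pip install -e .`. The following is a small sketch using only the standard library; `in_virtualenv` is a hypothetical helper name and not something the project provides.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run after `source .venv/bin/activate`, the printed interpreter path should sit inside `.venv`.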
AIXtract incorporates adapted components from CAMEL-AI, licensed under the Apache License 2.0. See NOTICE for details.
Apache License 2.0