manceps/aixtract

AIXtract

Enterprise-grade document extraction for LLM and NLP pipelines.

Installation

pip install aixtract
pip install "aixtract[all]"  # All format support

Quick Start

from aixtract import extract

result = extract("document.pdf")
print(result.content_markdown)
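The same call can be applied to several files in a loop. The sketch below is illustrative only: `extract_to_markdown` is a hypothetical helper, not part of aixtract. It takes the extractor as a parameter and relies only on the `content_markdown` attribute shown in the Quick Start.

```python
from typing import Callable, Dict, Iterable


def extract_to_markdown(paths: Iterable[str], extract_fn: Callable) -> Dict[str, str]:
    """Run extract_fn on each path and collect the Markdown output.

    extract_fn is expected to behave like aixtract.extract: it takes a
    path and returns an object with a content_markdown attribute.
    """
    results: Dict[str, str] = {}
    for path in paths:
        result = extract_fn(path)
        results[str(path)] = result.content_markdown
    return results
```

With aixtract installed, this could be called as `extract_to_markdown(["a.pdf", "b.docx"], extract)` after `from aixtract import extract`. Error handling is left out because the exceptions raised by `extract` are not documented here.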

Supported Formats

| Converter | Extensions | Dependencies |
| --- | --- | --- |
| txt | .txt, .md, .rst, .log | none |
| csv | .csv, .tsv | none |
| json | .json | none |
| xml | .xml | none |
| archive | .zip | none |
| pdf | .pdf | pypdf, pdfplumber |
| docx | .docx, .doc | docx |
| xlsx | .xlsx, .xls | openpyxl |
| pptx | .pptx, .ppt | pptx |
| html | .html, .htm | bs4 |
| epub | .epub | ebooklib, bs4 |
| image | .png, .jpg, .jpeg, .tiff, .bmp | PIL, pytesseract |
| audio | .mp3, .wav, .m4a, .flac, .ogg | whisper |
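To check ahead of time which converter a file would fall under, the table can be expressed as a lookup. This is a minimal sketch built only from the table above. `converter_for` and `EXTENSIONS_BY_CONVERTER` are hypothetical names, and this is not necessarily how aixtract dispatches internally.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Copied from the Supported Formats table; not read from aixtract itself.
EXTENSIONS_BY_CONVERTER: Dict[str, Tuple[str, ...]] = {
    "txt": (".txt", ".md", ".rst", ".log"),
    "csv": (".csv", ".tsv"),
    "json": (".json",),
    "xml": (".xml",),
    "archive": (".zip",),
    "pdf": (".pdf",),
    "docx": (".docx", ".doc"),
    "xlsx": (".xlsx", ".xls"),
    "pptx": (".pptx", ".ppt"),
    "html": (".html", ".htm"),
    "epub": (".epub",),
    "image": (".png", ".jpg", ".jpeg", ".tiff", ".bmp"),
    "audio": (".mp3", ".wav", ".m4a", ".flac", ".ogg"),
}

# Invert the mapping once so each lookup is a single dict access.
CONVERTER_BY_EXTENSION: Dict[str, str] = {
    ext: name for name, exts in EXTENSIONS_BY_CONVERTER.items() for ext in exts
}


def converter_for(path: str) -> Optional[str]:
    """Return the converter name for a path, or None if the extension is not listed."""
    return CONVERTER_BY_EXTENSION.get(Path(path).suffix.lower())
```

The lookup uses only the final suffix, lowercased, so `report.PDF` maps to `pdf` while a name like `data.tar.gz` is treated as `.gz` and returns `None`.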

Development Setup

If you are developing or contributing to aixtract, you can use the provided Makefile or pip directly.

Option 1: Using make (Recommended)

The Makefile calls the python and pip binaries inside .venv directly, so you do not need to activate the environment before installing.

Install for usage:

make install

Install for development (includes testing/linting tools):

make dev

Option 2: Using pip manually

If you prefer to run commands manually or don't have make installed:

  1. Activate the virtual environment:

    source .venv/bin/activate

  2. Install the project in editable mode:

    # For basic usage
    pip install -e .
    
    # For development (with dev dependencies)
    # NOTE: Quotes are required in zsh, which treats square brackets as glob patterns
    pip install -e ".[all,dev]"
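After installing, it can help to confirm which optional dependencies are actually importable. The following is a rough sketch, assuming the Dependencies column in Supported Formats lists import names (for example `docx`, `bs4`, `PIL`). `OPTIONAL_MODULES` and `missing_modules` are hypothetical names and not part of aixtract.

```python
from importlib.util import find_spec
from typing import Dict, Iterable, List

# Assumed import names, copied from the Supported Formats table.
OPTIONAL_MODULES: Dict[str, List[str]] = {
    "pdf": ["pypdf", "pdfplumber"],
    "docx": ["docx"],
    "xlsx": ["openpyxl"],
    "pptx": ["pptx"],
    "html": ["bs4"],
    "epub": ["ebooklib", "bs4"],
    "image": ["PIL", "pytesseract"],
    "audio": ["whisper"],
}


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Print one status line per converter that has optional dependencies.
    for converter, modules in OPTIONAL_MODULES.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{converter}: {status}")
```

`find_spec` only locates a module; it does not import it, so the check stays fast even for heavy packages. Note that pytesseract also needs the Tesseract binary on the system, which this check does not cover.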

Acknowledgments

AIXtract incorporates adapted components from CAMEL-AI, licensed under the Apache License 2.0. See NOTICE for details.

License

Apache License 2.0

About

Standardized Python library for text extraction from various formats.
