PDReader - AI-Powered PDF Q&A
Upload any PDF and chat with it using AI. PDReader uses Retrieval-Augmented Generation (RAG): it retrieves the passages most relevant to your question and passes them to the language model, so answers are grounded in your documents.
Drag & Drop Upload - Simply drop your PDFs into the interface
Smart Processing - Automatically extracts text, chunks content, and creates embeddings
Natural Chat - Ask questions in plain English and get relevant answers
Multi-Document Support - Chat with multiple PDFs at once
Source Citations - See exactly which parts of the document the answer came from
Document Management - View, delete, and manage your uploaded documents
Persistent Storage - Documents and their vector stores are saved locally
Privacy-First - Your PDFs and vector stores stay on your machine; only the text needed for embeddings and answers is sent to the OpenAI API
| Technology | Purpose |
|------------|---------|
| FastAPI | High-performance API framework |
| LangChain | LLM framework & document processing |
| FAISS | Vector similarity search |
| OpenAI | GPT models for embeddings & chat |
| PyPDF | PDF text extraction |
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Frontend (React + Vite)                           │
│                                                                             │
│   Document Upload ─────► Chat Interface ─────► Source Citations             │
│   (Drag & Drop)          (Real-time Chat)      (Page + Chunk refs)          │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       │ HTTP
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Backend (FastAPI)                              │
│                                                                             │
│  ┌─────────────┐      ┌─────────────┐      ┌────────────────────────┐       │
│  │  Documents  │      │    Chat     │      │         Health         │       │
│  │   Router    │      │   Router    │      │         Router         │       │
│  │ (CRUD ops)  │      │    (Q&A)    │      │     (Status check)     │       │
│  └──────┬──────┘      └──────┬──────┘      └────────────┬───────────┘       │
│         │                    │                          │                   │
│         └────────────────────┼──────────────────────────┘                   │
│                              │                                              │
│  ┌───────────────────────────▼───────────────────────────────────────────┐  │
│  │                            Services Layer                             │  │
│  │                                                                       │  │
│  │  ┌──────────────┐   ┌──────────────┐   ┌─────────────────────────┐    │  │
│  │  │     PDF      │   │    Vector    │   │           LLM           │    │  │
│  │  │  Processing  │   │    Search    │   │         Service         │    │  │
│  │  │   (PyPDF +   │   │   (FAISS +   │   │     (GPT-3.5-turbo)     │    │  │
│  │  │  LangChain)  │   │    OpenAI    │   │                         │    │  │
│  │  │              │   │ Embeddings)  │   │                         │    │  │
│  │  └──────────────┘   └──────────────┘   └─────────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
└──────────────────────────────────────┼──────────────────────────────────────┘
                                       │
                 ┌─────────────────────┼─────────────────────┐
                 │                     │                     │
                 ▼                     ▼                     ▼
          ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
          │    Local    │       │    FAISS    │       │   OpenAI    │
          │    File     │       │   Vector    │       │     API     │
          │   System    │       │    Store    │       │             │
          │   (PDFs +   │       │   (Local)   │       │             │
          │    JSON)    │       │             │       │             │
          └─────────────┘       └─────────────┘       └─────────────┘
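The Vector Search step in the diagram can be pictured as a top-k nearest-neighbor lookup over chunk embeddings. The sketch below is only an illustration of that idea: it uses brute-force cosine similarity in plain Python instead of FAISS, and toy vectors instead of OpenAI embeddings. The function names `cosine_similarity` and `top_k_chunks` are hypothetical and not taken from the PDReader code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunk vectors most similar to the query.

    Hypothetical stand-in for a FAISS similarity search; k=4 mirrors the
    TOP_K default listed in the configuration table.
    """
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(chunk_vecs)
    ]
    # Highest similarity first; sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

In a pipeline like the one shown, the text of the selected chunks would then be placed in the prompt sent to the chat model, and their page numbers would be returned as source citations.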
1. Clone the Repository
git clone https://github.com/yourusername/PDReader.git
cd PDReader
2. Backend Setup
cd backend
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Mac/Linux)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
3. Configure Environment Variables
# Copy the example env file
cp .env.example .env
# Edit .env and add your OpenAI API key
OPENAI_API_KEY=sk-your-api-key-here
4. Start the Backend Server
uvicorn main:app --reload --port 8000
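Once the backend is running, a quick way to confirm it is reachable is to call the /health endpoint. The following is a minimal sketch using only the Python standard library. It assumes the server listens on http://localhost:8000 (matching the --port 8000 flag) and checks only the HTTP status code, since the shape of the health response body is not documented here. The helper name `is_healthy` is hypothetical.

```python
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumption: the backend runs on port 8000 as started above.
    Connection errors, timeouts, and HTTP error statuses all yield False.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False
```

Calling `is_healthy()` from a Python shell before starting the frontend gives a simple yes/no answer about whether the API is up.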
5. Frontend Setup (in a new terminal)
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
Open http://localhost:5173 in your browser
Upload a PDF using the drag & drop zone or file picker
Wait for the document status to show "ready" (processing happens automatically)
Ask questions about your document in the chat box
View source citations to see where answers came from
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check |
| POST | /api/documents/upload | Upload PDF(s) |
| GET | /api/documents | List all documents |
| GET | /api/documents/{id} | Get document details |
| DELETE | /api/documents/{id} | Delete a document |
| DELETE | /api/documents | Delete all documents |
| POST | /api/chat | Ask a question |
POST /api/chat
Request body:
{
  "query": "What is this document about?",
  "document_ids": ["doc-uuid-1", "doc-uuid-2"]
}
Response:
{
  "answer": "This document is an annual report...",
  "sources": [
    {
      "document_id": "doc-uuid-1",
      "filename": "report.pdf",
      "chunk_text": "Annual Report 2024...",
      "page": 1
    }
  ],
  "model": "gpt-3.5-turbo"
}
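For scripting against the chat endpoint, a small client can be sketched with the Python standard library. This is an illustration under stated assumptions: the base URL http://localhost:8000 is taken from the backend start command, the request and response fields are the ones shown in the example above, and the helper names `build_chat_request`, `format_sources`, and `ask` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: backend started with --port 8000


def build_chat_request(query, document_ids, base_url=BASE_URL):
    """Build (but do not send) a POST /api/chat request with a JSON body."""
    payload = json.dumps({"query": query, "document_ids": document_ids})
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_sources(response):
    """Turn the 'sources' list of a chat response into short citation strings."""
    return [f'{src["filename"]}, page {src["page"]}' for src in response.get("sources", [])]


def ask(query, document_ids, base_url=BASE_URL):
    """Send the question to the backend and return the decoded JSON response."""
    request = build_chat_request(query, document_ids, base_url)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

With the backend running and a document uploaded, `ask("What is this document about?", ["doc-uuid-1"])` would return a dictionary like the response above, and `format_sources` would reduce it to a list of file and page citations.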
PDReader/
├── backend/
│   ├── main.py              # FastAPI application & routes
│   ├── services.py          # PDF processing & LLM logic
│   ├── schemas.py           # Pydantic models
│   ├── requirements.txt     # Python dependencies
│   └── .env                 # Environment variables
├── frontend/
│   ├── src/
│   │   ├── App.tsx          # Main React component
│   │   ├── api.ts           # API client functions
│   │   ├── types.ts         # TypeScript types
│   │   └── index.css        # Global styles
│   ├── package.json         # Node dependencies
│   └── vite.config.ts       # Vite configuration
└── README.md
Customize behavior by editing backend/services.py:
| Variable | Default | Description |
|----------|---------|-------------|
| CHUNK_SIZE | 500 | Text chunk size for embeddings |
| CHUNK_OVERLAP | 50 | Overlap between chunks |
| TOP_K | 4 | Number of documents to retrieve |
| OPENAI_MODEL | gpt-3.5-turbo | LLM model to use |
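To see how CHUNK_SIZE and CHUNK_OVERLAP interact, the sketch below splits text into fixed-size windows where each window repeats the tail of the previous one. It is a simplified illustration, not the splitter used in backend/services.py: it assumes the sizes are measured in characters and ignores sentence or paragraph boundaries, which a LangChain text splitter would normally try to respect. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into overlapping character windows.

    Assumption: sizes are in characters. Defaults mirror the CHUNK_SIZE and
    CHUNK_OVERLAP values from the configuration table.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the defaults, a 1,000-character text yields windows starting at offsets 0, 450, and 900. A larger overlap preserves more context across chunk boundaries at the cost of more chunks to embed and store.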
LangChain for the amazing RAG abstractions
FAISS for efficient similarity search
OpenAI for the LLM capabilities