# 📄 PDReader - AI-Powered PDF Q&A


Upload any PDF and chat with it using AI. PDReader uses Retrieval-Augmented Generation (RAG) to understand your documents and answer questions accurately.

## ✨ Features

- 📤 Drag & Drop Upload - Simply drop your PDFs into the interface
- 🔄 Smart Processing - Automatically extracts text, chunks content, and creates embeddings
- 💬 Natural Chat - Ask questions in plain English and get relevant answers
- 📚 Multi-Document Support - Chat with multiple PDFs at once
- 🔍 Source Citations - See exactly which parts of the document the answer came from
- 🗃️ Document Management - View, delete, and manage your uploaded documents
- 💾 Persistent Storage - Documents and their vector stores are saved locally
- 🔐 Privacy-First - Your documents stay on your machine

πŸ› οΈ Tech Stack

### Backend

| Technology | Purpose |
|------------|---------|
| FastAPI | High-performance API framework |
| LangChain | LLM framework & document processing |
| FAISS | Vector similarity search |
| OpenAI | GPT models for embeddings & chat |
| PyPDF | PDF text extraction |

### Frontend

| Technology | Purpose |
|------------|---------|
| React | UI framework |
| TypeScript | Type safety |
| Tailwind CSS | Styling |
| Vite | Build tool |
| Lucide React | Icons |

πŸ—οΈ System Design

```
┌─────────────────────────────────────────────────────────────────────┐
│                       Frontend (React + Vite)                       │
│                                                                     │
│   Document Upload ────► Chat Interface ────► Source Citations       │
│   (Drag & Drop)         (Real-time Chat)     (Page + Chunk refs)    │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │ HTTP
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          Backend (FastAPI)                          │
│                                                                     │
│   ┌─────────────┐      ┌─────────────┐      ┌──────────────────┐    │
│   │  Documents  │      │    Chat     │      │      Health      │    │
│   │   Router    │      │   Router    │      │      Router      │    │
│   │ (CRUD ops)  │      │   (Q&A)     │      │  (Status check)  │    │
│   └──────┬──────┘      └──────┬──────┘      └────────┬─────────┘    │
│          │                    │                      │              │
│          └────────────────────┼──────────────────────┘              │
│                               │                                     │
│   ┌───────────────────────────┼─────────────────────────────────┐   │
│   │                       Services Layer                        │   │
│   │                                                             │   │
│   │  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────┐│   │
│   │  │     PDF      │  │    Vector    │  │         LLM         ││   │
│   │  │  Processing  │  │    Search    │  │       Service       ││   │
│   │  │   (PyPDF +   │  │   (FAISS +   │  │   (GPT-3.5-turbo)   ││   │
│   │  │  LangChain)  │  │    OpenAI    │  │                     ││   │
│   │  │              │  │  Embeddings) │  │                     ││   │
│   │  └──────────────┘  └──────────────┘  └─────────────────────┘│   │
│   └─────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
               ┌───────────────────┼───────────────────┐
               │                   │                   │
               ▼                   ▼                   ▼
       ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
       │    Local    │     │    FAISS    │     │   OpenAI    │
       │    File     │     │   Vector    │     │     API     │
       │   System    │     │    Store    │     │             │
       │   (PDFs +   │     │   (Local)   │     │             │
       │    JSON)    │     │             │     │             │
       └─────────────┘     └─────────────┘     └─────────────┘
```

## 🚀 Getting Started

### Prerequisites

- Python 3 with pip (backend)
- Node.js with npm (frontend)
- An OpenAI API key

### Installation

**1. Clone the repository**

```bash
git clone https://github.com/yourusername/PDReader.git
cd PDReader
```
**2. Backend Setup**

```bash
cd backend

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Mac/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

**3. Configure API Key**

```bash
# Copy the example env file
cp .env.example .env

# Edit .env and add your OpenAI API key
OPENAI_API_KEY=sk-your-api-key-here
```

**4. Start Backend**

```bash
uvicorn main:app --reload --port 8000
```

**5. Frontend Setup** (in a new terminal)

```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev
```

## 🎉 Usage

  1. Open http://localhost:5173 in your browser
  2. Upload a PDF using the drag & drop zone or file picker
  3. Wait for the document status to show "ready" (processing happens automatically)
  4. Ask questions about your document in the chat box
  5. View source citations to see where answers came from

## 📡 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check |
| POST | /api/documents/upload | Upload PDF(s) |
| GET | /api/documents | List all documents |
| GET | /api/documents/{id} | Get document details |
| DELETE | /api/documents/{id} | Delete a document |
| DELETE | /api/documents | Delete all documents |
| POST | /api/chat | Ask a question |

### Example: Chat Request

`POST /api/chat`

```json
{
  "query": "What is this document about?",
  "document_ids": ["doc-uuid-1", "doc-uuid-2"]
}
```

### Example Response

```json
{
  "answer": "This document is an annual report...",
  "sources": [
    {
      "document_id": "doc-uuid-1",
      "filename": "report.pdf",
      "chunk_text": "Annual Report 2024...",
      "page": 1
    }
  ],
  "model": "gpt-3.5-turbo"
}
```
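For scripting against the API, the chat request above can be sent with the Python standard library alone. This is a hedged sketch: the endpoint path and payload shape come from the examples above and the port from the dev setup, while `build_chat_request` and `ask` are illustrative helper names, not part of the project.

```python
import json
import urllib.request

def build_chat_request(query: str, document_ids: list[str]) -> bytes:
    """Serialize the /api/chat payload shown above."""
    return json.dumps({"query": query, "document_ids": document_ids}).encode("utf-8")

def ask(query: str, document_ids: list[str],
        base_url: str = "http://localhost:8000") -> dict:
    """POST a question to a running backend and return the parsed response."""
    req = urllib.request.Request(
        base_url + "/api/chat",
        data=build_chat_request(query, document_ids),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the backend running, `ask("What is this document about?", ["doc-uuid-1"])` returns a dict with the `answer`, `sources`, and `model` fields shown in the example response.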

πŸ“ Project Structure

PDReader/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py          # FastAPI application & routes
β”‚   β”œβ”€β”€ services.py      # PDF processing & LLM logic
β”‚   β”œβ”€β”€ schemas.py       # Pydantic models
β”‚   β”œβ”€β”€ requirements.txt # Python dependencies
β”‚   └── .env             # Environment variables
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ App.tsx      # Main React component
β”‚   β”‚   β”œβ”€β”€ api.ts       # API client functions
β”‚   β”‚   β”œβ”€β”€ types.ts     # TypeScript types
β”‚   β”‚   └── index.css    # Global styles
β”‚   β”œβ”€β”€ package.json     # Node dependencies
β”‚   └── vite.config.ts   # Vite configuration
└── README.md

βš™οΈ Configuration

Customize behavior by editing backend/services.py:

Variable Default Description
CHUNK_SIZE 500 Text chunk size for embeddings
CHUNK_OVERLAP 50 Overlap between chunks
TOP_K 4 Number of documents to retrieve
OPENAI_MODEL gpt-3.5-turbo LLM model to use
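To make the chunking parameters concrete, here is a minimal stand-in for the splitter (the project itself uses LangChain's text splitters; `chunk_text` is an illustrative name): each chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` characters after the previous one, so neighbouring chunks share `CHUNK_OVERLAP` characters of text.

```python
CHUNK_SIZE = 500    # characters per chunk
CHUNK_OVERLAP = 50  # characters shared between consecutive chunks

def chunk_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of the previous one so sentences that straddle
    a boundary remain retrievable."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

chunks = chunk_text("x" * 1200)  # 3 chunks: starts at 0, 450, 900
```

The overlap trades a little index size for recall: a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.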

πŸ™ Acknowledgments

  • LangChain for the amazing RAG abstractions
  • FAISS for efficient similarity search
  • OpenAI for the LLM capabilities
