Detect Healthcare Overcharging Instantly
MedClear is an AI-powered healthcare billing audit tool that detects overcharging in hospital bills using OCR (Optical Character Recognition) and price comparison against government pricing standards from the NPPA (National Pharmaceutical Pricing Authority) and CGHS (Central Government Health Scheme).
Think of it as a "TurboTax for medical bills": upload your hospital bill, and MedClear instantly tells you whether you've been overcharged and by how much.
Patients in India receive expensive hospital bills but cannot verify whether they are being overcharged, because standard pricing information is not readily accessible.
Healthcare billing is one of the most opaque industries in the world. Patients are expected to pay thousands, sometimes lakhs, without any way to verify if the charges are legitimate.
- Hidden Charges: Patients have no way to verify if hospital bill items are correctly priced. A simple "Injection" can cost ₹50 or ₹5,000 with no explanation.
- No Transparency: Medical billing lacks accessible standard pricing benchmarks. Unlike grocery shopping where you can compare prices, hospital bills are treated as non-negotiable.
- Exploitation: An estimated ₹50,000+ crore is lost annually to overcharging in India alone. In the US, medical billing errors cost patients billions every year.
- No Recourse: Patients rarely challenge bills because they don't have the data or expertise to prove overcharging. Hospitals know this and exploit it.
MedClear bridges the information gap between patients and healthcare pricing.
How it works:
- Upload: Patient uploads hospital bill (image or PDF)
- Extract: OCR technology pulls all line items and prices from the bill
- Match: Smart matching algorithm maps each item to NPPA/CGHS standard codes
- Compare: Each item is checked against official government-defined rates
- Report: Detailed savings report shows exactly where you were overcharged
The result? Instant clarity. Real savings. Empowerment.
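The Compare step comes down to simple arithmetic per line item. The sketch below is illustrative only: the `LineItemAudit` class and its field names are assumptions, not MedClear's actual code, and the figures are taken from the sample audit output later in this README.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItemAudit:
    """Hypothetical result of comparing one billed item to its reference rate."""
    item: str
    charged: Decimal   # amount on the hospital bill
    allowed: Decimal   # government-defined rate (NPPA or CGHS)

    @property
    def overcharge(self) -> Decimal:
        # Items billed at or below the reference rate are not overcharged.
        return max(self.charged - self.allowed, Decimal("0"))

    @property
    def overcharge_percentage(self) -> Decimal:
        # Overcharge relative to the allowed rate, rounded to a whole percent.
        if self.allowed <= 0:
            raise ValueError("allowed rate must be positive")
        return (self.overcharge / self.allowed * 100).quantize(Decimal("1"))


ward = LineItemAudit("Private Ward (3 days)", Decimal("45000"), Decimal("12000"))
print(ward.overcharge, ward.overcharge_percentage)  # 33000 275
```

`Decimal` is used here instead of `float` so that currency amounts are not subject to binary rounding error.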
| Feature | Description |
|---|---|
| Upload Hospital Bill | Drag-and-drop support for bills in JPG, PNG, or PDF format |
| OCR Text Extraction | Tesseract-powered extraction pulls line items and prices from scanned documents |
| Smart Matching | Fuzzy matching algorithm maps bill items to standard NPPA/CGHS codes |
| Price Comparison | Real-time comparison against official NPPA + CGHS databases |
| Overcharge Detection | Flags items exceeding government-defined rates with percentage overcharge |
| Savings Report | Generates downloadable PDF report with itemized savings breakdown |
| User Dashboard | History of all uploaded bills and their audit results |
| Secure Storage | Bills encrypted and stored securely with user-level access control |
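As a rough illustration of the upload feature, a pre-check on incoming files might look like the sketch below. The function name `validate_upload` is hypothetical, and the limits mirror the sample OCR service settings in this README (`SUPPORTED_FORMATS` and `MAX_IMAGE_SIZE`); the real service may validate differently.

```python
from pathlib import Path
from typing import List

# Values assumed from the sample ocr-service .env in this README.
SUPPORTED_FORMATS = {"jpg", "jpeg", "png", "pdf"}
MAX_IMAGE_SIZE = 10_485_760  # bytes (10 MiB)


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems with an upload; an empty list means it is acceptable."""
    errors = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        errors.append(f"unsupported format: {extension or 'none'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_IMAGE_SIZE:
        errors.append("file exceeds size limit")
    return errors


print(validate_upload("bill.PDF", 2_000_000))  # []
print(validate_upload("bill.docx", 2_000_000))  # ['unsupported format: docx']
```

Checking only the file extension is a convenience, not a security control; a production service would likely also inspect the file content.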
Step-by-Step Pipeline:
- Upload: User drags and drops a hospital bill (image/PDF)
- Preprocessing: Image is enhanced, rotated, and noise-reduced for better OCR
- OCR: Tesseract + OpenCV extracts all text, line items, and prices
- Entity Extraction: NLP parses extracted text into structured data (item name, quantity, unit price, total)
- Code Mapping: Fuzzy matching maps each item to NPPA drug codes or CGHS service codes
- Price Lookup: Query the database for government-defined rates
- Comparison: Calculate overcharge amount and percentage for each item
- Output: Generate JSON response + PDF report for the user
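To make the Entity Extraction step concrete, here is a minimal sketch that turns one line of OCR text into structured fields. It assumes a simple `<item> <qty> <unit price> <total>` layout, which real bills often will not follow; `BillLine`, `LINE_PATTERN`, and `parse_line` are hypothetical names, and a plain regular expression stands in for whatever NLP the project actually uses.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class BillLine(NamedTuple):
    item: str
    quantity: int
    unit_price: Decimal
    total: Decimal


# Assumed layout: "<item name> <qty> <unit price> <total>"; amounts may contain commas.
LINE_PATTERN = re.compile(
    r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<unit>\d[\d,]*(?:\.\d+)?)\s+(?P<total>\d[\d,]*(?:\.\d+)?)$"
)


def parse_amount(text: str) -> Decimal:
    """Strip thousands separators and convert to Decimal."""
    return Decimal(text.replace(",", ""))


def parse_line(line: str) -> Optional[BillLine]:
    """Parse one OCR text line, or return None if it is not a line item."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return BillLine(
        item=match["item"].strip(),
        quantity=int(match["qty"]),
        unit_price=parse_amount(match["unit"]),
        total=parse_amount(match["total"]),
    )


print(parse_line("Private Ward 3 15,000 45,000"))
print(parse_line("Description Qty Rate Amount"))  # None (header row)
```

Lines that do not match, such as headers or footers, return `None` so the caller can skip them or route them to a fallback parser.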
We chose each technology in MedClear based on performance, developer experience, ecosystem maturity, and scalability. Here's why:
| Technology | Why We Chose It |
|---|---|
| React 18 | Industry-standard component library with excellent performance via concurrent rendering. Used by Netflix, Airbnb, and Instagram. |
| TypeScript | Static typing catches many type-related bugs at compile time, which matters for a financial/healthcare application where errors are costly. |
| Tailwind CSS | Utility-first CSS allows rapid UI development without context-switching between files. Smaller bundle size than traditional CSS frameworks. |
| Vite | Next-gen build tool with near-instant dev server start and fast HMR. Its esbuild-based dependency pre-bundling is much faster than JavaScript-based bundlers. |
| Zustand | Lightweight state management without the boilerplate of Redux. Perfect for our simple auth + UI state needs. |
| React Query | Handles server state, caching, and background refetching. Eliminates manual "loading" management. |
| Framer Motion | Production-ready animations that make the app feel premium and polished. |
React + TypeScript + Tailwind + Vite + Zustand + React Query + Framer Motion
| Technology | Why We Chose It |
|---|---|
| Node.js | Event-driven I/O is perfect for our I/O-heavy OCR pipeline. Same language as frontend = full-stack productivity. |
| Express.js | Minimal, unopinionated framework. We only pay for what we use. Massive ecosystem of middleware. |
| TypeScript | End-to-end type safety from backend to frontend. Auto-complete everywhere. |
| Prisma | Type-safe ORM that feels like a query builder. Migration system is best-in-class. |
| MongoDB | NoSQL database for flexible data storage. Perfect for unstructured billing data with complex queries. |
| JWT | Stateless authentication. Perfect for scalable APIs. |
Node.js + Express + TypeScript + Prisma + MongoDB + JWT + Zod
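"Stateless" here means the server can verify a token from its signature alone, without a session lookup. The sketch below shows the idea for HS256 tokens using only the Python standard library. It is written in Python purely for illustration (the backend itself is Node.js), the function names are hypothetical, and a real service should use a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: str) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = sign_token({"sub": "user-1", "exp": int(time.time()) + 7 * 24 * 3600}, "dev-secret")
print(verify_token(token, "dev-secret")["sub"])  # user-1
```

The seven-day expiry in the usage line mirrors the `JWT_EXPIRES_IN="7d"` setting from the sample backend configuration.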
| Technology | Why We Chose It |
|---|---|
| Python | Dominant language for AI/ML. Tesseract, OpenCV, and scikit-learn all have Python-first APIs. |
| Tesseract OCR | Open-source, battle-tested OCR. Supports 100+ languages. No API costs = free at scale. |
| OpenCV | Computer vision library for image preprocessing (contrast, deskew, denoising). Critical for accurate OCR on blurry hospital bills. |
| FuzzyWuzzy | String matching library for matching bill items to NPPA codes. Handles typos and variations. |
| Pandas | Data processing for analyzing large NPPA datasets efficiently. |
| FastAPI | Async Python web framework. High-performance API for OCR results, with Uvicorn as the ASGI server. |
Python + Tesseract + OpenCV + FuzzyWuzzy + Pandas + FastAPI + Uvicorn
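The fuzzy matching idea can be sketched without third-party packages. The example below uses the standard library's `difflib.SequenceMatcher` as a stand-in for FuzzyWuzzy, with a token-sort step so that word order does not matter. The reference names, the `best_match` helper, and the 0.8 threshold are all assumptions for illustration, not values from the MedClear codebase.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def token_sort(name: str) -> str:
    """Lowercase, drop punctuation, and sort words so word order is ignored."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two item names after token sorting."""
    return SequenceMatcher(None, token_sort(a), token_sort(b)).ratio()


def best_match(
    bill_item: str, reference_names: Iterable[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return the closest reference name and its score, or None below the threshold."""
    scored = [(name, similarity(bill_item, name)) for name in reference_names]
    if not scored:
        return None
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= threshold else None


# Hypothetical reference names; a real system would load these from NPPA/CGHS data.
references = ["Ceftriaxone Injection 1g", "Complete Blood Count (CBC)", "Private Ward"]
print(best_match("Injection (Ceftriaxone 1g)", references))
print(best_match("Physiotherapy session", references))  # None
```

Returning `None` below the threshold lets the pipeline leave an item unmatched instead of comparing it against the wrong reference price.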
| Technology | Why We Chose It |
|---|---|
| PostgreSQL | The gold standard for relational data. JSON support for flexible metadata. Perfect for billing records. |
| Pinecone | Vector database for semantic search. Enables "find similar drugs/services" functionality. |
| Docker | Containerization ensures the same environment from dev to production. Essential for Python + Node compatibility. |
| Docker Compose | Local development with one command. All services (DB, Redis, API) start together. |
| GitHub Actions | Free CI/CD for open-source. Automated testing and deployment. |
PostgreSQL + Pinecone + Docker + Docker Compose + GitHub Actions
medclear/
├── frontend/                # React + TypeScript + Vite
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   ├── hooks/
│   │   ├── stores/
│   │   └── utils/
│   └── package.json
│
├── backend/                 # Node.js + Express + Prisma
│   ├── src/
│   │   ├── controllers/
│   │   ├── routes/
│   │   ├── middleware/
│   │   ├── services/
│   │   └── utils/
│   └── package.json
│
├── ocr-service/             # Python + FastAPI
│   ├── app/
│   │   ├── routers/
│   │   ├── services/
│   │   └── utils/
│   └── requirements.txt
│
└── docker-compose.yml       # All services orchestration
{
  "audit_result": {
    "bill_id": "BILL-2024-00123",
    "hospital_name": "Apollo Hospitals",
    "total_bill": "₹1,42,500",
    "overcharged": "₹42,300",
    "savings_percentage": "29.7%",
    "status": "audit_complete",
    "flagged_items": [
      {
        "item": "Private Ward (3 days)",
        "category": "room_charges",
        "charged": "₹45,000",
        "allowed_cghs": "₹12,000",
        "overcharge": "₹33,000",
        "overcharge_percentage": "275%"
      },
      {
        "item": "Injection (Ceftriaxone 1g)",
        "category": "medications",
        "charged": "₹850",
        "allowed_nppa": "₹45",
        "overcharge": "₹805",
        "overcharge_percentage": "1789%"
      },
      {
        "item": "Blood Test (CBC)",
        "category": "diagnostics",
        "charged": "₹600",
        "allowed_cghs": "₹150",
        "overcharge": "₹450",
        "overcharge_percentage": "300%"
      }
    ],
    "recommendations": [
      "Dispute the ward charges with hospital billing department",
      "Request itemized bill with drug NDC codes",
      "File a complaint with NPPA if charges are not rectified"
    ]
  }
}
⚠️ You were overcharged ₹42,300, including ₹33,000 on ward charges alone (275% over CGHS rates).
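Amounts in the sample response are strings that use Indian digit grouping (for example `₹1,42,500`), so a client has to parse them before doing arithmetic. The helpers below are a minimal sketch assuming whole-rupee amounts; `parse_inr` and `format_inr` are hypothetical names, not part of the MedClear API.

```python
def parse_inr(amount: str) -> int:
    """Convert a string such as '₹1,42,500' to whole rupees (142500)."""
    return int(amount.replace("₹", "").replace(",", "").strip())


def format_inr(value: int) -> str:
    """Format whole rupees with Indian grouping: last three digits, then pairs."""
    digits = str(abs(value))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    sign = "-" if value < 0 else ""
    return sign + "₹" + ",".join(groups + [tail])


total_bill = parse_inr("₹1,42,500")
overcharged = parse_inr("₹42,300")
print(format_inr(total_bill - overcharged))           # ₹1,00,200
print(round(overcharged / total_bill * 100, 1), "%")  # 29.7 %
```

The second line reproduces the `savings_percentage` value from the sample response, which is the overcharged amount as a share of the total bill.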
| Tool | Version | Purpose |
|---|---|---|
| Node.js | 18+ | Frontend & Backend runtime |
| Python | 3.9+ | OCR Service |
| PostgreSQL | 14+ | Primary database |
| npm / pip | Latest | Package management |
# Clone the repository
git clone https://github.com/Rachit-Kakkad1/medclear.git
cd medclear
# Install frontend dependencies
cd frontend
npm install
# Go back and install backend dependencies
cd ../backend
npm install
# Install OCR service dependencies
cd ../ocr-service
pip install -r requirements.txt

Create a .env file in backend/:
# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/medclear"
# Authentication
JWT_SECRET="your-super-secret-jwt-key-change-in-production"
JWT_EXPIRES_IN="7d"
# Redis
REDIS_URL="redis://localhost:6379"
# OCR Service
OCR_SERVICE_URL="http://localhost:8000"
# NPPA API (Government pricing data)
NPPA_API_URL="https://api.nppa.gov.in/pricing"
NPPA_API_KEY="your-api-key"
# App Config
NODE_ENV="development"
PORT=3000

Create a .env file in ocr-service/:
# Tesseract OCR
TESSDATA_PREFIX="/usr/share/tesseract-5/tessdata"
# Image Processing
MAX_IMAGE_SIZE=10485760
SUPPORTED_FORMATS="jpg,jpeg,png,pdf"
# Server
HOST="0.0.0.0"
PORT=8000

cd backend
# Run Prisma migrations
npx prisma migrate dev
# Seed NPPA/CGHS pricing data
npx prisma db seed

# Terminal 1: Backend API
cd backend
npm run dev
# Starts at http://localhost:3000
# Terminal 2: Frontend
cd frontend
npm run dev
# Starts at http://localhost:5173
# Terminal 3: OCR Service
cd ocr-service
uvicorn app.main:app --reload
# Starts at http://localhost:8000

# Start all services with Docker
docker-compose up --build

Visit http://localhost:5173 to start auditing bills.
We're just getting started. Here's what's on our roadmap:
| Feature | Status | Description |
|---|---|---|
| Prescription Scanner | Planned | Analyze doctor prescriptions for medication overpricing |
| Insurance Integration | Planned | Auto-submit audit reports to insurance providers |
| Real-time Pricing | Planned | Live API integration from NPPA for latest drug prices |
| Mobile App | Planned | Native iOS and Android applications |
| Multi-country Support | Researching | Support for US (Medicare), UK (NHS), EU pricing standards |
| AI Recommendations | Researching | Personalized cost-saving suggestions based on medical history |
| Analytics Dashboard | Planned | Hospital-level pricing analytics for researchers |
| Alert System | Planned | Push notifications when pricing data updates |
We welcome contributions from developers, designers, and healthcare professionals!
# 1. Fork the repository
# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/medclear.git
# 3. Create a feature branch
git checkout -b feature/amazing-new-feature
# 4. Make your changes
# 5. Run tests
npm test # Frontend
npm run test # Backend
pytest # OCR Service
# 6. Commit with descriptive message
git commit -m "Add: New feature that does X"
# 7. Push to your fork
git push origin feature/amazing-new-feature
# 8. Open a Pull Request

- Bug Fixes: Help us squash bugs
- UI/UX: Make MedClear beautiful
- Features: Build new capabilities
- Documentation: Improve docs
- Testing: Increase test coverage
- Security: Audit for vulnerabilities

💡 Looking for a way to contribute? Check out our Good First Issues label.
MIT License. See LICENSE for details.
Built with ❤️ for healthcare transparency

