2dogsandanerd/README.md

Hi there, I'm Stefan 👋

"Most RAG projects don't fail because of the LLM. They fail because they treat PDF ingestion as a simple file upload."

I am an AI-Native Architect focused on solving the hardest problem in the current AI hype cycle: The Ingestion Gap. My mission is to replace "Digital Paper" (dead PDFs) with structured, semantic knowledge that allows Local AI to reason without hallucinations.


🚀 The Ecosystem

I build modular, production-ready kits to fix the "Garbage In" problem for high-compliance environments (Public Sector / Enterprise).

🏗️ Architecture & Platforms

  • RAG Enterprise Core (⭐ New)
    • The Blueprint for BSI-compliant, self-hosted RAG.
    • Features: Ingestion Triage, GraphRAG, Semantic Caching, and Full Observability.
    • Status: Architecture Preview / Closed Source Engine.

🛠️ Essential Tooling

  • Validated Table Extractor
    • The proof that RAG can handle complex tables if you use Docling + Vision Validation.
    • Status: Open Source Audit Tool.
  • Smart Ingest Kit
    • Production-grade document ingestion pipeline using Docling v2.
    • Solves: Layout Analysis, Table Reconstruction, Markdown Conversion.

🤖 Proven in Production

  • Mail Modul Alpha (Private)
    • A fully autonomous, privacy-first AI email assistant running locally.
    • The proof that my ingestion engine works in the wild.

🧠 The "Ingestion-First" Stack

I don't believe in "One Model Fits All". I believe in Triage.

| Layer          | Tools & Tech                                                                   |
| -------------- | ------------------------------------------------------------------------------ |
| Ingestion      | Docling v2 (Layout Analysis), Qwen2-VL (Vision Fallback), PyMuPDF (Fast Lane)  |
| Storage        | ChromaDB (Vector), Neo4j (Graph/Relationships), Redis (Semantic Cache)         |
| Orchestration  | LangGraph (Agentic Workflows), FastAPI (Microservices)                         |
| Observability  | Sentry, Grafana, Jaeger (Tracing)                                              |
| Infrastructure | Docker Compose (Local First), Ollama (Inference)                               |
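
To make the triage idea concrete, here is a minimal sketch of a per-page router that picks one of the three ingestion lanes named above. It is an illustration under stated assumptions, not the actual engine: `Lane`, `PageStats`, `triage`, and both thresholds are hypothetical names and values, and the page statistics are assumed to come from a cheap pre-scan that is not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    FAST = "fast"      # plain text extraction (the PyMuPDF "fast lane")
    LAYOUT = "layout"  # full layout analysis (the Docling lane)
    VISION = "vision"  # vision-model fallback (the Qwen2-VL lane)


@dataclass(frozen=True)
class PageStats:
    """Hypothetical per-page statistics from a cheap pre-scan."""
    text_chars: int        # characters found in the embedded text layer
    table_count: int       # tables detected on the page
    image_coverage: float  # fraction of the page area covered by images (0..1)


def triage(
    stats: PageStats,
    min_chars: int = 200,             # assumed threshold, tune on real data
    max_image_coverage: float = 0.5,  # assumed threshold, tune on real data
) -> Lane:
    """Route a page to the cheapest lane that can plausibly handle it."""
    # Little or no text layer, or mostly images: treat as a scan.
    if stats.text_chars < min_chars or stats.image_coverage > max_image_coverage:
        return Lane.VISION
    # A usable text layer with tables: structure has to be reconstructed.
    if stats.table_count > 0:
        return Lane.LAYOUT
    # Plain prose with a good text layer: take the fast lane.
    return Lane.FAST


if __name__ == "__main__":
    for page in (
        PageStats(text_chars=3000, table_count=0, image_coverage=0.0),
        PageStats(text_chars=3000, table_count=2, image_coverage=0.1),
        PageStats(text_chars=10, table_count=0, image_coverage=0.9),
    ):
        print(page, "->", triage(page).value)
```

The router is deliberately plain deterministic code: the decision about which model sees a page is cheap, testable, and auditable, and only the pages that need it reach the expensive lanes.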

🌱 Philosophy

  • Structure > Vectors: Embeddings are useless if the input table was ripped apart. I reconstruct structure (Markdown) first.
  • Local > Cloud: Data sovereignty (GDPR/BSI) is not optional. I build for air-gapped reality.
  • Logic > Magic: I prefer deterministic code for business rules over probabilistic LLM guessing.
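
As a small illustration of "Structure > Vectors", the sketch below renders an already extracted table as Markdown so that a chunk handed to the embedding model keeps its rows and columns together. The helper name `rows_to_markdown` and the sample data are hypothetical; it assumes the table cells were recovered by an upstream extractor.

```python
from typing import Iterable, Sequence


def rows_to_markdown(header: Sequence[object], rows: Iterable[Sequence[object]]) -> str:
    """Render a header and data rows as a Markdown pipe table."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so one cell stays on one line.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    width = len(header)
    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # Pad short rows and truncate long ones so every line has `width` cells.
        padded = (list(row) + [""] * width)[:width]
        lines.append("| " + " | ".join(cell(v) for v in padded) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown(["Item", "Price"], [["Widget", 10], ["Gadget"]]))
```

Chunking the rendered table as one unit, rather than splitting it by character count, is what keeps a value attached to its column header at retrieval time.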

📫 Connect & Context

  • Reddit: u/ChapterEquivalent188 - Discussing the "PoC Trap" & Ingestion Realities.
  • Focus: Currently open for strategic dialogue regarding High-Compliance RAG Architectures (Public Sector / Industry).

Pinned

  1. RAG_enterprise_core (Public)

    Enterprise-grade Retrieval-Augmented Generation system with microservices architecture.

    19 1

  2. ClawRag (Public)

    RAG system combining Docling document processing with ChromaDB vector storage to power openclaw

    Python 29 5

  3. Knowledge-Base-Self-Hosting-Kit (Public)

    A Docker-powered RAG system that understands the difference between code and prose. Ingest your codebase and documentation, then query them with full privacy and zero configuration.

    Python 138 17

  4. DAUT (Public)

    DAUT (Documentation Auto Updater): an AI-powered documentation generator for your codebase, with an MCP connector.

    Python 3