feat: Add Neo4j Database Exporter #4

Open
Sagar-Mondal wants to merge 1 commit into IBM:main from Sagar-Mondal:neo4j

Summary

This PR introduces a new exporter that enables direct ingestion of extracted knowledge graphs into a live Neo4j database. It integrates the official neo4j Python driver to efficiently populate nodes and relationships, streamlining the workflow from document extraction to persistent graph storage.

Key Changes

1. New Database Client

  • Added docling_graph/db_clients/neo4j_client.py: Implements the Neo4jExporter class.
    • Handles connection to Neo4j via the Bolt protocol.
    • Implements batch processing for efficient node and relationship ingestion.
    • Supports merge and create write strategies.
  • Added docling_graph/db_clients/__init__.py: Exposes the new exporter for import.
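
The batching and write strategies listed above can be sketched roughly as follows. This is an illustration, not the code in neo4j_client.py: the helper names (batched, node_query, write_nodes), the row shape ({"id": ..., "props": {...}}), and the use of id as the merge key are all assumptions. The session argument is any object with a run(query, **params) method, such as a session from the official neo4j driver.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def node_query(label: str, strategy: str = "merge") -> str:
    """Build one UNWIND statement that writes a whole batch of nodes."""
    # Labels cannot be passed as Cypher parameters, so restrict them to
    # plain identifiers before interpolating them into the query text.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    if strategy not in ("merge", "create"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    verb = "MERGE" if strategy == "merge" else "CREATE"
    return (
        "UNWIND $rows AS row "
        f"{verb} (n:`{label}` {{id: row.id}}) "
        "SET n += row.props"
    )


def write_nodes(
    session: Any,
    label: str,
    rows: Iterable[Row],
    batch_size: int = 1000,  # assumed default, not taken from the PR
    strategy: str = "merge",
) -> int:
    """Send rows to the database one batch per query; return rows written."""
    query = node_query(label, strategy)
    written = 0
    for batch in batched(rows, batch_size):
        session.run(query, rows=batch)  # keyword args become query parameters
        written += len(batch)
    return written
```

With merge, re-running the export updates existing nodes that share the same id instead of duplicating them; with create, every run adds new nodes. Sending each batch as a single UNWIND statement avoids one network round trip per node.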

2. Configuration Updates

  • Modified docling_graph/config.py:
    • Added Neo4jConfig model to validate connection parameters (uri, username, password, database, batch_size).
    • Updated PipelineConfig to include the neo4j configuration section.
    • Updated export_format validation to accept "neo4j".
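
As a rough illustration of what validating these connection parameters could look like, the sketch below uses a standard-library dataclass as a stand-in. It is not the Neo4jConfig model from this PR: the class name Neo4jSettings, the default values, and the specific checks are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers.
ALLOWED_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical stand-in for Neo4jConfig; defaults are assumptions."""

    uri: str
    username: str
    password: str
    database: Optional[str] = None  # None: let the server pick its default
    batch_size: int = 1000  # assumed default, not taken from the PR

    def __post_init__(self) -> None:
        # Reject malformed settings at construction time, before any
        # connection attempt is made.
        scheme = urlparse(self.uri).scheme
        if scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URI scheme: {scheme!r}")
        if not self.username:
            raise ValueError("username must not be empty")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
```

Failing at construction time means a typo in the URI or a zero batch size surfaces when the configuration is loaded rather than partway through an export.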

3. Pipeline Integration

  • Modified docling_graph/pipeline.py:
    • Integrated Neo4jExporter into the main execution flow.
    • Added logic to initialize the exporter using values from the global configuration when export_format="neo4j" is selected.
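
The selection step described above might look something like the following. This is a sketch under assumptions, not the logic in pipeline.py: PipelineSettings is a hypothetical stand-in for PipelineConfig, build_exporter is an invented helper name, and the exporter is passed in as a factory so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class PipelineSettings:
    """Hypothetical stand-in for PipelineConfig."""

    export_format: str
    neo4j: Optional[Dict[str, Any]] = None


def build_exporter(
    settings: PipelineSettings,
    neo4j_factory: Callable[..., Any],
) -> Optional[Any]:
    """Return a database exporter only when the neo4j format is selected."""
    if settings.export_format != "neo4j":
        return None  # other formats are handled elsewhere in the pipeline
    if settings.neo4j is None:
        raise ValueError('export_format="neo4j" requires a neo4j section')
    # Pass the validated connection settings straight to the exporter.
    return neo4j_factory(**settings.neo4j)
```

Raising when the neo4j section is missing keeps a misconfigured run from silently skipping the export.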

4. Dependencies

  • Modified pyproject.toml: Added neo4j to the project dependencies.

Motivation

Previously, users had to export to CSV or Cypher scripts and manually load them into Neo4j. This feature enables:

  • Direct Ingestion: Populates Neo4j instances directly without intermediate files.
  • Production Readiness: Simplifies workflows for deployments requiring real-time or scheduled graph updates.

How to Test

  1. Ensure a Neo4j instance is running and reachable (e.g., at bolt://localhost:7687).
  2. Update your pipeline configuration:
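    from docling_graph.config import Neo4jConfig, PipelineConfig

    PipelineConfig(
        # ... other settings ...
        export_format="neo4j",
        neo4j=Neo4jConfig(
            uri="bolt://localhost:7687",
            username="neo4j",
            password="your_password",  # replace with your instance's password
        ),
    )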
  3. Run the pipeline and verify that nodes and relationships appear in the target database.
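
One way to carry out the verification in step 3 is to count nodes and relationships after the run. The helper below is a minimal sketch; count_graph is a hypothetical name, and session is assumed to be a session from the official neo4j driver (or any object whose run(...).single() returns a mapping).

```python
from typing import Any, Dict


def count_graph(session: Any) -> Dict[str, int]:
    """Return node and relationship counts for the session's database."""
    # Each query returns exactly one row with a single column named `c`.
    nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"]
    rels = session.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"]
    return {"nodes": nodes, "relationships": rels}
```

With the official driver, a session can be obtained from GraphDatabase.driver(uri, auth=(username, password)) via driver.session(). If the target database was empty before the run, both counts should be greater than zero afterwards, assuming the source document yields at least one relationship.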

Signed-off-by: Sagar-Mondal <sagarandshiva@gmail.com>