This example demonstrates how to build an index for a GitHub repository using CocoIndex.
- We will ingest a GitHub repository.
- For each file, we split it into chunks with Tree-sitter and then compute an embedding for each chunk.
- We will save the embeddings and their metadata in Postgres using pgvector.
- Create a .env file from .env.example, and fill in the configuration for your GitHub app.
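The per-file steps in the list above (chunk, then embed, then store) can be sketched in plain Python. This is a minimal, hypothetical illustration, not CocoIndex's API. Python's built-in ast module stands in for Tree-sitter, so it only handles Python files. toy_embed is a made-up hashed bag-of-words embedding used in place of a real embedding model. The resulting rows are collected in a list instead of being written to Postgres.

```python
import ast
import math
import re
import zlib
from dataclasses import dataclass


@dataclass
class ChunkRow:
    """One row destined for the vector table: location, text, and embedding."""
    filename: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str
    embedding: list[float]


def chunk_python_source(source: str) -> list[tuple[int, int, str]]:
    """Split Python source at top-level statement boundaries.

    Stand-in for Tree-sitter: the ast module parses Python only, whereas
    Tree-sitter has grammars for many languages.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        # Decorators sit above the def/class line, so start the chunk there.
        decorator_lines = [d.lineno for d in getattr(node, "decorator_list", [])]
        start = min([node.lineno, *decorator_lines])
        end = node.end_lineno
        chunks.append((start, end, "\n".join(lines[start - 1:end])))
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical embedding: hashed bag of identifiers, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z_]\w*", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def index_files(files: dict[str, str]) -> list[ChunkRow]:
    """Chunk every file, then embed every chunk."""
    rows = []
    for filename, source in files.items():
        for start, end, text in chunk_python_source(source):
            rows.append(ChunkRow(filename, start, end, text, toy_embed(text)))
    return rows
```

For example, index_files({"greet.py": source}) returns one row per top-level statement, each carrying its line range so a search hit can point back into the file. Splitting at syntax boundaries keeps a whole function or class in one chunk, which is the main reason to prefer a syntax-aware splitter over fixed-size windows for code.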
Note: You need to configure the GitHub source with your repository details:
- repo_name: the GitHub repository name (e.g., "owner/repo-name")
- branch: the branch to index (e.g., "main")
- private_key_path: the path to your private key for authentication
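As a sketch of the configuration step, the snippet below reads KEY=VALUE pairs from a .env file and checks the three GitHub source settings described above. The environment variable names (GITHUB_REPO_NAME, GITHUB_BRANCH, GITHUB_PRIVATE_KEY_PATH) and the helper functions are hypothetical; check .env.example for the names this example actually uses. The parser handles only simple lines, while a library such as python-dotenv covers the full format.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GitHubSourceConfig:
    repo_name: str          # "owner/repo-name"
    branch: str             # e.g. "main"
    private_key_path: Path  # private key used to authenticate the GitHub app


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return env


def load_github_config(env: dict[str, str]) -> GitHubSourceConfig:
    # Hypothetical variable names; use the keys your .env.example defines.
    required = ("GITHUB_REPO_NAME", "GITHUB_BRANCH", "GITHUB_PRIVATE_KEY_PATH")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    repo_name = env["GITHUB_REPO_NAME"]
    if not re.fullmatch(r"[\w.-]+/[\w.-]+", repo_name):
        raise ValueError(f"repo_name must look like 'owner/repo-name', got {repo_name!r}")
    return GitHubSourceConfig(
        repo_name=repo_name,
        branch=env["GITHUB_BRANCH"],
        private_key_path=Path(env["GITHUB_PRIVATE_KEY_PATH"]).expanduser(),
    )
```

Failing early on a malformed repo_name or a missing key path gives a clearer error than a failed GitHub API call later in the flow.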
To search, we embed the user-provided text with the same embedding operation used in the indexing flow, then match it against the stored embeddings with a SQL query.
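Conceptually, the query side embeds the user's text with the same function used during indexing and ranks stored chunks by vector similarity. The sketch below does that ranking in memory, assuming cosine similarity, and shows a roughly equivalent pgvector query. The table and column names (code_embeddings, filename, text, embedding) and the %s placeholder style are assumptions, not the schema this example creates.

```python
import math
from typing import Callable

Embed = Callable[[str], list[float]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, rows: list[tuple[str, str, list[float]]],
           embed: Embed, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Rank (filename, text, embedding) rows by cosine similarity to the query."""
    query_vec = embed(query)  # must be the same embedding used at index time
    scored = [(cosine_similarity(query_vec, emb), name, text) for name, text, emb in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Roughly equivalent pgvector query (hypothetical table and column names).
# <=> is pgvector's cosine-distance operator, so similarity = 1 - distance.
SEARCH_SQL = """
SELECT filename, text, 1 - (embedding <=> %s::vector) AS similarity
FROM code_embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s
"""
```

Reusing one embedding function on both sides matters: vectors produced by different models are not comparable, so similarity scores between them would be meaningless.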
Install Postgres if you don't already have it.
- Install dependencies:
    pip install -e .
- Setup:
    cocoindex setup main.py
- Update index:
    cocoindex update main.py
- Run:
    python main.py
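Re-running the update step after the repository changes refreshes the index. Assuming the update is incremental, it only needs to reprocess files whose content changed. The sketch below illustrates that idea with content hashes; plan_update and content_fingerprint are hypothetical helpers, not how CocoIndex tracks changes internally.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Stable hash of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(previous: dict[str, str], current_files: dict[str, str]):
    """Compare stored fingerprints with current file contents.

    Returns (to_reindex, to_delete, fingerprints): files whose chunks and
    embeddings must be recomputed, files whose rows should be removed, and
    the fingerprints to store for the next run.
    """
    current = {path: content_fingerprint(text) for path, text in current_files.items()}
    to_reindex = sorted(path for path, fp in current.items() if previous.get(path) != fp)
    to_delete = sorted(path for path in previous if path not in current)
    return to_reindex, to_delete, current
```

Unchanged files keep their existing rows, so a small commit only costs a few chunking and embedding calls instead of a full rebuild.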