Upload Existing Evaluation Results to Galileo

This example demonstrates how to upload evaluation results you've already generated to Galileo for analysis, visualization, and comparison.

Use Case

Sometimes you've already run evaluations—whether with a different tool, offline, or in the past—and you want to centralize and visualize those results in Galileo. This example shows you how to:

✅ Upload historical evaluation data to Galileo experiments
✅ Reconstruct full execution traces for debugging
✅ Compute Galileo metrics on existing results

This is particularly useful when:

You have legacy evaluation data to migrate to Galileo
You ran evaluations with custom tooling and want unified visualization
You need to analyze past model behavior with Galileo's evaluation metrics

What This Example Does

The Problem: Galileo v2 experiments typically run your prompts live, but sometimes you already have the results and just want to upload them.

The Solution: This example takes your pre-existing evaluation data (questions, contexts, LLM responses, ground truth) and uploads it to Galileo as a completed experiment with full tracing.

How it Works:

Your JSON file contains complete evaluation records (question, context chunks array, LLM answer, ground truth)
A Galileo dataset is created with inputs and expected outputs
An experiment "replays" your results, reconstructing execution traces with proper chunk attribution
Galileo computes metrics and provides full visualization
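The dataset and replay steps above can be sketched in plain Python. This is only an illustration, not the code in upload_existing_results.py: the helper names and the "input"/"output" row keys are hypothetical, and keying the lookup on the question text is an assumption inferred from the "Question not found in evaluation data" error described under Troubleshooting.

```python
def build_dataset_rows(records):
    """Split each record into an input and an expected output.

    The "input"/"output" keys are placeholders for this sketch.
    """
    return [
        {"input": r["question"], "output": r["ground_truth_answer"]}
        for r in records
    ]


def build_replay_index(records):
    """Map each question to its full record so results can be looked up later."""
    return {r["question"]: r for r in records}


def replay(question, index):
    """Return the stored chunks and answer instead of calling a model."""
    record = index.get(question)
    if record is None:
        raise KeyError(f"Question not found in evaluation data: {question!r}")
    return {
        "retrieved_chunks": list(record.get("context", [])),
        "response": record["llm_answer"],
    }
```

Because the lookup is keyed on the exact question string, two records with the same question would overwrite each other in the index, which is why the questions need to be unique.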

Data Format

Your evaluation data should be in JSON format with the following structure:

[
  {
    "question": "Your input/question text",
    "context": ["chunk1", "chunk2", "chunk3"],
    "llm_answer": "The response your model generated",
    "ground_truth_answer": "The expected correct answer",
    "model": "gpt-4o"
  }
]

Required fields:

question - The input to your system
llm_answer - The response your system generated
ground_truth_answer - The expected/correct answer

Optional fields:

context - Array of retrieved context chunks (for RAG systems)
model - Model identifier (defaults to "gpt-4o" if not specified)
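Before uploading, it can help to check a file against the field rules above. The following is a minimal sketch using only the standard library; normalize_record and load_records are hypothetical helpers, not part of the example script or the Galileo SDK. It assumes an empty list is an acceptable stand-in for a missing context.

```python
import json

REQUIRED_FIELDS = ("question", "llm_answer", "ground_truth_answer")
DEFAULT_MODEL = "gpt-4o"  # default stated in this README


def normalize_record(record: dict) -> dict:
    """Check the required fields and fill in the optional ones."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise ValueError(f"record is missing required fields: {missing}")
    context = record.get("context", [])  # assumption: no context means no chunks
    if not isinstance(context, list):
        raise ValueError("context must be an array of chunks")
    return {
        "question": record["question"],
        "llm_answer": record["llm_answer"],
        "ground_truth_answer": record["ground_truth_answer"],
        "context": context,
        "model": record.get("model", DEFAULT_MODEL),
    }


def load_records(path: str) -> list[dict]:
    """Load a JSON array of evaluation records and validate each one."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be an array")
    return [normalize_record(record) for record in data]
```

Calling load_records("dataset.json") would then either return the normalized records or raise a ValueError naming the problem.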

Setup

1. Install Dependencies

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure Environment

Copy .env.example to .env and add your Galileo credentials:

cp .env.example .env

Edit .env:

GALILEO_API_KEY=your_api_key_here
GALILEO_CONSOLE_URL=https://app.galileo.ai
GALILEO_PROJECT=your_project_name

Getting your Galileo credentials:

Log in to Galileo Console
Navigate to Settings → API Keys
Create a new API key or copy an existing one
Create or select a project for your experiments

3. Prepare Your Data

Place your evaluation results in dataset.json following the format above. See the included example file with space mission support Q&A data.

Running the Example

python upload_existing_results.py

The script will:

✅ Load your evaluation data from dataset.json
✅ Create or retrieve a Galileo dataset
✅ Upload an experiment with full trace reconstruction
✅ Provide a link to view results in the Galileo console

Customization

Changing the Dataset

Edit dataset.json with your own evaluation data. The example uses space mission support Q&A, but you can use any domain.

Adjusting Metrics

In upload_existing_results.py, modify the metrics parameter in upload_experiment():

from galileo.schema.metrics import GalileoScorers

custom_metrics = [
    GalileoScorers.ground_truth_adherence,
    GalileoScorers.context_adherence,
    GalileoScorers.correctness,
    # Add any other Galileo metrics you want
]

upload_experiment(
    dataset=dataset,
    evaluation_data_path="dataset.json",
    project_name=project_name,
    run_name="my-experiment",
    metrics=custom_metrics  # Use your custom metrics
)

What You'll See in Galileo

After running the script, your Galileo project will contain:

Dataset: Your questions and ground truth answers
Experiment Run: Complete execution with:
- Individual traces for each evaluation
- Retriever spans
- LLM spans with prompts and responses
- Computed metrics (adherence, completeness, correctness, etc.)

Troubleshooting

"Missing environment variables"

Make sure you've created a .env file with your Galileo credentials
Verify all three required variables are set: GALILEO_API_KEY, GALILEO_CONSOLE_URL, GALILEO_PROJECT
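To see which of the three variables is missing, a small check like the one below may help. It is a sketch with a hypothetical helper name, and it only inspects the process environment: it does not read the .env file itself, so the values must already be loaded or exported in your shell.

```python
import os

REQUIRED_VARS = ("GALILEO_API_KEY", "GALILEO_CONSOLE_URL", "GALILEO_PROJECT")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list from missing_env_vars() means all three variables are visible to the Python process.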

"Question not found in evaluation data"

Ensure your JSON file has unique questions
Check that the question text contains no extra whitespace or other formatting differences
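Both checks can be run over the loaded records before uploading. This is a rough sketch with hypothetical helper names. It treats questions that differ only in whitespace as duplicates, which is an assumption about what counts as a formatting difference.

```python
from collections import Counter


def normalize_question(question: str) -> str:
    """Trim the ends and collapse internal runs of whitespace to one space."""
    return " ".join(question.split())


def find_duplicate_questions(records):
    """Return questions that appear more than once after normalization."""
    counts = Counter(normalize_question(r["question"]) for r in records)
    return sorted(q for q, n in counts.items() if n > 1)


def find_untidy_questions(records):
    """Return questions whose raw text differs from the normalized form."""
    return [
        r["question"]
        for r in records
        if r["question"] != normalize_question(r["question"])
    ]
```

If either function returns a non-empty list, cleaning up those entries in dataset.json is a reasonable first step.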

Import errors

Make sure you've activated your virtual environment
Run pip install -r requirements.txt to install all dependencies

"Dataset already exists"

The script will automatically use existing datasets with the same name
To create a fresh dataset, change DATASET_NAME in the script
