This example demonstrates how to upload evaluation results you've already generated to Galileo for analysis, visualization, and comparison.
Sometimes you've already run evaluations—whether with a different tool, offline, or in the past—and you want to centralize and visualize those results in Galileo. This example shows you how to:
- ✅ Upload historical evaluation data to Galileo experiments
- ✅ Reconstruct full execution traces for debugging
- ✅ Compute Galileo metrics on existing results
This is particularly useful when:
- You have legacy evaluation data to migrate to Galileo
- You ran evaluations with custom tooling and want unified visualization
- You need to analyze past model behavior with Galileo's evaluation metrics
The Problem: Galileo v2 experiments typically run your prompts live, but sometimes you already have the results and just want to upload them.
The Solution: This example takes your pre-existing evaluation data (questions, contexts, LLM responses, ground truth) and uploads it to Galileo as a completed experiment with full tracing.
How it Works:
- Your JSON file contains complete evaluation records (question, context chunks array, LLM answer, ground truth)
- A Galileo dataset is created with inputs and expected outputs
- An experiment "replays" your results, reconstructing execution traces with proper chunk attribution
- Galileo computes metrics and provides full visualization
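The steps above can be sketched in plain Python. This is a minimal illustration rather than the script's actual code: the `input`/`output` row keys and the span dictionaries are assumptions chosen for readability, the sample record is made up, and no Galileo SDK calls are made.

```python
from typing import Any


def to_dataset_row(record: dict[str, Any]) -> dict[str, str]:
    """Map one evaluation record to a dataset row (input + expected output)."""
    return {
        "input": record["question"],
        "output": record["ground_truth_answer"],
    }


def to_trace_plan(record: dict[str, Any]) -> dict[str, Any]:
    """Describe the trace to replay: a retriever span followed by an LLM span."""
    chunks = record.get("context", [])
    return {
        "input": record["question"],
        "spans": [
            # Each context chunk stays a separate item so it can be attributed.
            {"type": "retriever", "input": record["question"], "output": chunks},
            {
                "type": "llm",
                "input": record["question"],
                "output": record["llm_answer"],
                "model": record.get("model", "gpt-4o"),
            },
        ],
        "output": record["llm_answer"],
    }


# Made-up record for illustration only.
record = {
    "question": "What is the launch window?",
    "context": ["chunk1", "chunk2"],
    "llm_answer": "It opens at 09:00 UTC.",
    "ground_truth_answer": "09:00 UTC",
}
row = to_dataset_row(record)
plan = to_trace_plan(record)
```

Keeping the dataset row and the trace plan separate mirrors the split described above: the dataset holds only inputs and expected outputs, while the recorded answer and context travel with the replayed trace.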
Your evaluation data should be in JSON format with the following structure:
[
{
"question": "Your input/question text",
"context": ["chunk1", "chunk2", "chunk3"], // Array of context chunks
"llm_answer": "The response your model generated",
"ground_truth_answer": "The expected correct answer",
"model": "gpt-4o" // Optional: specify the model used
}
]
Required fields:
- question - The input to your system
- llm_answer - The response your system generated
- ground_truth_answer - The expected/correct answer
Optional fields:
- context - Array of retrieved context chunks (for RAG systems)
- model - Model identifier (defaults to "gpt-4o" if not specified)
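Checking a file against these rules before uploading could look like the sketch below. The function names are hypothetical and not part of `upload_existing_results.py`; the required fields and the "gpt-4o" default come from the lists above.

```python
import json
from typing import Any

REQUIRED_FIELDS = ("question", "llm_answer", "ground_truth_answer")
DEFAULT_MODEL = "gpt-4o"


def normalize_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Check required fields and fill in defaults for the optional ones."""
    normalized = []
    for index, record in enumerate(records):
        # A field counts as missing if it is absent or empty.
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            raise ValueError(f"Record {index} is missing required fields: {missing}")
        normalized.append(
            {
                **record,
                "context": record.get("context", []),
                "model": record.get("model", DEFAULT_MODEL),
            }
        )
    return normalized


def load_records(path: str) -> list[dict[str, Any]]:
    """Load the evaluation file and validate every record in it."""
    with open(path, encoding="utf-8") as handle:
        return normalize_records(json.load(handle))
```

Note that Python's `json` module rejects `//` comments, so any annotations like the ones in the structure example need to be removed from the real file before `load_records("dataset.json")` will succeed.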
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Copy .env.example to .env and add your Galileo credentials:
cp .env.example .env
Edit .env:
GALILEO_API_KEY=your_api_key_here
GALILEO_CONSOLE_URL=https://app.galileo.ai
GALILEO_PROJECT=your_project_name
Getting your Galileo credentials:
- Log in to Galileo Console
- Navigate to Settings → API Keys
- Create a new API key or copy an existing one
- Create or select a project for your experiments
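Once the credentials are in place, a quick pre-flight check can confirm that all three variables are visible to Python. This is a sketch with a hypothetical helper name; it inspects the process environment, so it assumes the `.env` file has already been loaded (for example by python-dotenv's `load_dotenv()`, if the script relies on it).

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("GALILEO_API_KEY", "GALILEO_CONSOLE_URL", "GALILEO_PROJECT")


def missing_galileo_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_galileo_vars()` with no arguments checks the real environment; an empty list means all three variables are set.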
Place your evaluation results in dataset.json following the format above. See the included example file with space mission support Q&A data.
python upload_existing_results.py
The script will:
- ✅ Load your evaluation data from dataset.json
- ✅ Create or retrieve a Galileo dataset
- ✅ Upload an experiment with full trace reconstruction
- ✅ Provide a link to view results in the Galileo console
Edit dataset.json with your own evaluation data. The example uses space mission support Q&A, but you can use any domain.
In upload_existing_results.py, modify the metrics parameter in upload_experiment():
from galileo.schema.metrics import GalileoScorers

custom_metrics = [
    GalileoScorers.ground_truth_adherence,
    GalileoScorers.context_adherence,
    GalileoScorers.correctness,
    # Add any other Galileo metrics you want
]

upload_experiment(
    dataset=dataset,
    evaluation_data_path="dataset.json",
    project_name=project_name,
    run_name="my-experiment",
    metrics=custom_metrics,  # Use your custom metrics
)

After running the script, your Galileo project will contain:
- Dataset: Your questions and ground truth answers
- Experiment Run: Complete execution with:
  - Individual traces for each evaluation
  - Retriever spans
  - LLM spans with prompts and responses
  - Computed metrics (adherence, completeness, correctness, etc.)
- Make sure you've created a .env file with your Galileo credentials
- Verify all three required variables are set: GALILEO_API_KEY, GALILEO_CONSOLE_URL, GALILEO_PROJECT
- Ensure your JSON file has unique questions
- Check that there is no extra whitespace and that there are no formatting differences
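Both checks can be run against the evaluation data before uploading. The sketch below assumes that records are matched by question text, which these hints suggest but do not state outright; the helper names are hypothetical.

```python
from collections import Counter


def normalize_question(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def duplicate_questions(records: list[dict]) -> list[str]:
    """Return normalized questions that appear more than once."""
    counts = Counter(normalize_question(r["question"]) for r in records)
    return [question for question, count in counts.items() if count > 1]
```

Normalizing before counting means two questions that differ only in spacing are reported as duplicates, which is usually the case worth catching.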
- Make sure you've activated your virtual environment
- Run pip install -r requirements.txt to install all dependencies
- The script will automatically use existing datasets with the same name
- To create a fresh dataset, change DATASET_NAME in the script
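One way to get a name that will not match an existing dataset is to append a timestamp. This is only a sketch: `fresh_dataset_name` is a hypothetical helper, and how `DATASET_NAME` is defined in the script is not shown here.

```python
from datetime import datetime, timezone
from typing import Optional


def fresh_dataset_name(base: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp (to the second) to the base dataset name."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    return f"{base}-{stamp}"
```

For example, setting `DATASET_NAME = fresh_dataset_name("my-evals")` would produce a new name on each run, at the cost of leaving older datasets behind in the project.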