This demo app shows how to configure Galileo to monitor and evaluate a multi-agent app built using LangGraph.
In this folder you will find 2 versions of the app:
- A before version that contains the app without any evaluations
- An after version that contains the app with evaluations
To learn how to add evaluations, check out the Add evaluations to a multi-agent LangGraph application cookbook in the Galileo documentation.
This app is a chatbot for the fictional financial services company, Brahe Bank. You can use the bot to ask about:
- Information on the current credit card offers, and their terms and conditions
- Information on your credit score (this is hard coded to 550)
- More coming soon!
This app uses:
- LangGraph to orchestrate agents
- Chainlit to provide a UI
- Pinecone as a vector database
This app has a number of agents, orchestrated by a supervisor agent.
This agent provides information on the available credit cards.
---
config:
flowchart:
curve: linear
---
graph TD;
__start__([<p>Start</p>]):::first
agent(credit card information agent)
tools(tools)
pinecone_tool(pinecone retrieval tool)
__end__([<p>End</p>]):::last
__start__ --> agent;
agent -.-> __end__;
agent -.-> tools;
tools --> agent;
tools -.-> pinecone_tool;
pinecone_tool --> tools;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
This agent provides information on a hard-coded credit score, which is high enough for the Orbit credit card, but not high enough for the Celestial credit card.
---
config:
flowchart:
curve: linear
---
graph TD;
__start__([<p>__start__</p>]):::first
agent(agent)
tools(tools)
__end__([<p>__end__</p>]):::last
__start__ --> agent;
agent -.-> __end__;
agent -.-> tools;
tools --> agent;
tools -.-> credit_score_tool;
credit_score_tool --> tools;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
The supervisor agent routes each request to the credit card agent or the credit score agent, and each agent returns control to the supervisor when it finishes.
---
config:
flowchart:
curve: linear
---
graph TD;
__start__([<p>__start__</p>]):::first
brahe-bank-supervisor-agent(brahe-bank-supervisor-agent)
credit-card-agent(credit-card-agent)
credit-score-agent(credit-score-agent)
__end__([<p>__end__</p>]):::last
__start__ --> brahe-bank-supervisor-agent;
brahe-bank-supervisor-agent -.-> __end__;
brahe-bank-supervisor-agent -.-> credit-card-agent;
brahe-bank-supervisor-agent -.-> credit-score-agent;
credit-card-agent --> brahe-bank-supervisor-agent;
credit-score-agent --> brahe-bank-supervisor-agent;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
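The supervisor diagram can be read as a small routing table. The following is a minimal sketch in plain Python, not the app's LangGraph code: the node names come from the diagram, while the dictionary layout, the `walk` function, and the `decisions` list are illustrative assumptions that stand in for the choices the supervisor's model would make at run time.

```python
# Edges copied from the supervisor diagram. The dict layout is an
# illustrative assumption, not how LangGraph stores a graph internally.
SUPERVISOR = "brahe-bank-supervisor-agent"

# Dotted arrows in the diagram: the supervisor picks one of these per turn.
CONDITIONAL_EDGES = {
    SUPERVISOR: ["credit-card-agent", "credit-score-agent", "__end__"],
}

# Solid arrows in the diagram: these transitions always happen.
FIXED_EDGES = {
    "__start__": SUPERVISOR,
    "credit-card-agent": SUPERVISOR,
    "credit-score-agent": SUPERVISOR,
}


def walk(decisions):
    """Follow the graph from __start__, using one routing decision per supervisor visit.

    `decisions` is a hypothetical list of routing choices that, in the real
    app, would come from the supervisor agent itself.
    """
    path = ["__start__"]
    node = FIXED_EDGES["__start__"]
    remaining = list(decisions)
    while node != "__end__":
        path.append(node)
        if node in CONDITIONAL_EDGES:
            choice = remaining.pop(0)
            if choice not in CONDITIONAL_EDGES[node]:
                raise ValueError(f"{node} cannot route to {choice}")
            node = choice
        else:
            # Worker agents always hand control back to the supervisor.
            node = FIXED_EDGES[node]
    path.append("__end__")
    return path
```

For example, `walk(["credit-score-agent", "__end__"])` visits the supervisor, the credit score agent, and the supervisor again before ending, which mirrors a single credit score question.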
To see traces in Galileo, you need to run the after version of the app.
To run the app, you need the following:
- A Galileo account, with a project created
- A Pinecone account
- An OpenAI API Key
- Copy the .env.example file to .env
- Fill in the values
For the Galileo values, you MUST create the project up front. The log stream does not need to exist beforehand; it is created automatically.
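Before launching the app, it can help to confirm that the required values are set. The sketch below is one possible check, not part of this project. The variable names in `REQUIRED_VARS` are assumptions, so compare them against .env.example. It inspects the process environment, so the values from .env need to be loaded into the environment first.

```python
import os

# Assumed variable names; check .env.example for the names the app really uses.
REQUIRED_VARS = [
    "GALILEO_API_KEY",
    "GALILEO_PROJECT",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
]


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing values: " + ", ".join(missing))
    else:
        print("All required values are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.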
You can install the dependencies into a virtual environment using uv.
uv sync --dev
Pinecone is used to store documents that different agents can use. There is a helper script to create indexes and upload the documents.
python ./scripts/setup_pinecone.py
This will take a few seconds and a successful run should look like:
Loading documents for credit-card-information folder...
...
✅ Document processing and upload complete!
To launch the app, you can use the chainlit package that you just installed:
chainlit run app.py -w
This will start the app and launch it on localhost:8000. The -w flag watches for code changes and reloads the app when they occur, so you do not need to restart it yourself.
This project also includes a launch.json configured to debug the app in VS Code.
Once you have interacted with the app, traces will appear in Galileo. Log into the Galileo console, and you will see your traces.
From there you can configure the metrics you are interested in. Once metrics are enabled, you can have more conversations to see the evaluations.
This project also includes a unit test that runs the chatbot with a set of defined prompts and evaluates the results for action advancement, action completion, tool selection quality, and tool errors. The test passes only if every metric averages 100% (or 0% for tool errors) over all the entries in the dataset.
This is run using the Galileo experiments framework - allowing you to run any code as an experiment against a fixed dataset of prompts. This mechanism allows you to run AI applications, from simple to complex, under test conditions with a defined set of inputs. You can then use the results of evaluations run against your app to help with model selection or prompt engineering, as well as validating your application as part of a CI/CD pipeline.
You can run the unit test by running the following command inside your virtual environment:
python -m pytest test.py
This will run the single test which will:
- Look in your project for a dataset, creating it if it doesn't exist
- Call the agent inside a call to run_experiment, passing each row from the dataset in as inputs
- Poll the experiment until it has finished and the metrics are calculated
- Check that all the metrics return 100% (or 0% for tool errors), failing if they do not
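The final pass/fail step in the list above can be sketched as follows. This is an illustration of the thresholding logic only, not the code in test.py: the metric keys in `TARGETS` and the shape of `rows` (one dictionary of scores per dataset entry) are hypothetical, and the real values would come from the finished Galileo experiment.

```python
from statistics import mean

# Hypothetical metric keys; the real names come from the Galileo SDK.
# Scores are assumed to be fractions, so 1.0 means 100% and 0.0 means 0%.
TARGETS = {
    "action_advancement": 1.0,
    "action_completion": 1.0,
    "tool_selection_quality": 1.0,
    "tool_errors": 0.0,
}


def failing_metrics(rows, targets=TARGETS):
    """Return {metric: average} for every metric whose average misses its target."""
    failures = {}
    for metric, target in targets.items():
        average = mean(row[metric] for row in rows)
        # An exact comparison is intended here: an average of 100% (or 0%)
        # is only possible when every row scores exactly that value.
        if average != target:
            failures[metric] = average
    return failures
```

Inside a pytest test this could be used as `assert not failing_metrics(rows)`, so a failure message lists each metric that fell short along with its average.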
To see the benefits of this unit test, after running it, check the insights in Galileo to fix up the agent system prompts. For example, the system prompt for the supervisor agent doesn't suggest using the credit score tool to answer questions on credit score.
