To make things easier, we provide a starter repository with a preconfigured dltHub project. It contains a working source, pipeline, transformations, and a small dashboard so you can focus on learning the Runtime rather than setting everything up from scratch.
This starter pack includes:
- A dlt pipeline that loads data from the jaffle shop API into a local DuckDB destination.
- A remote destination configured as MotherDuck. You can swap it for any other cloud destination you prefer (for example BigQuery, Snowflake, AWS S3, …).
- A simple Marimo dashboard that you can use to explore and analyze the data.
- A set of custom transformations that are executed after the raw data is loaded.
In this README, we’ll walk through cloning the repo, installing dependencies, connecting to Runtime, and then deploying both pipelines and dashboards.
```sh
git clone https://github.com/dlt-hub/runtime-starter-pack.git
cd runtime-starter-pack
```

The starter pack comes with a `pyproject.toml` that defines all required dependencies:
```toml
[project]
name = "runtime-starter-pack"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = [
    "dlt[motherduck,workspace,hub]==1.20.0a0",
    "marimo>=0.18.2",
    "numpy>=2.3.5",
]
```

Install everything with uv:
```sh
uv sync
```

Activate the environment:
```sh
source .venv/bin/activate
```

If you are running this tutorial as part of the early access program, you need to create a `.dlt/secrets.toml` file and add your Runtime invite code there:
```toml
[runtime]
invite_code="xxx-yyy"
```

Next, configure your destination credentials. The starter pack uses MotherDuck as the destination, but you can switch to any other destination you prefer. Details on configuring credentials for Runtime are available here. Make sure your destination credentials are valid before running pipelines remotely. Below you can find instructions for configuring credentials for the MotherDuck destination.
`prod.config.toml` (for batch jobs running on Runtime):

```toml
[destination.fruitshop_destination]
destination_type = "motherduck"
```

`prod.secrets.toml` (for batch jobs - read/write credentials):

```toml
[destination.fruitshop_destination.credentials]
database = "your_database"
password = "your-motherduck-service-token" # Read/write token
```

`access.config.toml` (for interactive notebooks):

```toml
[destination.fruitshop_destination]
destination_type = "motherduck"
```

`access.secrets.toml` (for interactive notebooks - read-only credentials):

```toml
[destination.fruitshop_destination.credentials]
database = "your_database"
password = "your-motherduck-read-only-token" # Read-only token
```

Getting MotherDuck credentials:
- Sign up at motherduck.com
- Go to Settings > Service Tokens
- Create two tokens:
  - A read/write token for the `prod` profile
  - A read-only token for the `access` profile
🛑 Security: Files matching `*.secrets.toml` and `secrets.toml` are gitignored by default. Never commit secrets to version control. The Runtime securely stores your secrets when you sync your configuration.
Authenticate your local workspace with the managed Runtime:
```sh
uv run dlt runtime login
```

This will:
- Open a browser window.
- Use GitHub OAuth for authentication.
- Link your local workspace to your dltHub Runtime account through an automatically generated workspace id. You can find this id in your `config.toml`.
Currently, GitHub-based authentication is the only supported method. Additional authentication options will be added later.
For a full list of available commands and options, see the Runtime CLI reference.
dltHub Runtime supports two types of jobs:
- Batch jobs – Python scripts that are meant to be run once or on a schedule.
  - Created with commands like `dlt runtime launch <script>` (and scheduled with `dlt runtime schedule <script>`).
  - Typical use cases: ELT pipelines, transformation runs, backfills.
  - Runs with the `prod` profile.
- Interactive jobs – long-running jobs that serve an interactive notebook or app.
  - Started with `dlt runtime serve <script>`.
  - Typical use cases: Marimo notebooks, dashboards, and (in the future) apps like Streamlit.
  - Runs with the `access` profile.
Now let’s deploy and run a pipeline remotely:
```sh
uv run dlt runtime launch fruitshop_pipeline.py
```

This single command:
- Uploads your code and configuration to Runtime.
- Creates and starts a batch job.
- Streams logs and status, so you can follow the run from your terminal.

To run it in detached mode, use:

```sh
uv run dlt runtime launch fruitshop_pipeline.py -d
```
Next, let's serve the Marimo notebook as an interactive job:
```sh
uv run dlt runtime serve fruitshop_notebook.py
```

This command:
- Uploads your code and configuration.
- Starts an interactive notebook session using the access profile.
- Opens the notebook in your browser.
Interactive notebooks use the `access` profile with read-only credentials, so they are safe for data exploration and dashboarding without the risk of accidental writes. Read more about profiles in the Runtime profiles docs.
Interactive jobs are the building block for serving notebooks, dashboards, and (in the future) Streamlit or similar apps. At the moment, only Marimo is supported. You can share links to these interactive jobs with your colleagues for collaborative exploration.
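To give a rough idea of what an interactive job serves, here is a minimal Marimo sketch that reads from the destination under the read-only credentials. It is a hypothetical illustration, not the starter pack's `fruitshop_notebook.py`; the pipeline, dataset, and table names (`fruitshop_pipeline`, `fruitshop_data`, `customers`) are assumptions.

```python
import marimo

app = marimo.App()


@app.cell
def _():
    import dlt
    import marimo as mo
    return dlt, mo


@app.cell
def _(dlt, mo):
    # Attach to the dataset the batch pipeline already loaded. Under the "access"
    # profile, the credentials dlt resolves here are the read-only ones, so this
    # cell cannot accidentally write to the destination.
    pipeline = dlt.pipeline(
        pipeline_name="fruitshop_pipeline",   # assumed name
        destination="fruitshop_destination",  # named destination from the config above
        dataset_name="fruitshop_data",        # assumed name
    )
    customers = pipeline.dataset().table("customers").df()
    mo.ui.table(customers)
    return


if __name__ == "__main__":
    app.run()
```

Served with `dlt runtime serve`, a notebook along these lines becomes a small read-only dashboard.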
To run a pipeline on a schedule, use:
```sh
uv run dlt runtime schedule fruitshop_pipeline.py "*/10 * * * *"
```

This example schedules the pipeline to run every 10 minutes. Use crontab.guru to build and test your cron expressions.
To cancel an existing schedule:
```sh
uv run dlt runtime schedule fruitshop_pipeline.py cancel
```

The command line is great for development, but the dltHub web UI gives you a bird’s-eye view of everything running on Runtime. Visit dlthub.app to access the dashboard. You will find:
- A list of existing jobs.
- An overview of scheduled runs.
- Visibility into interactive sessions.
- Management actions and workspace settings.
The dltHub Dashboard lets you see all your pipelines and job runs, inspect job metadata (status, start time, duration, logs, etc.), and access the data in your destination via a SQL interface. This makes it easy to debug issues, check the health of your pipelines, and quickly validate the data that has been loaded.
Interactive jobs such as notebooks and dashboards can be shared via public links. To manage public links:
- Open the context menu of a job in the job list or navigate to the job detail page.
- Click "Manage Public Link".
- Enable the link to generate a shareable URL, or disable it to revoke access.
Anyone with an active public link can view the running notebook or dashboard, even if they don’t have direct Runtime access. This is ideal for sharing dashboards with stakeholders, business users, or other teams.
Raw ingested data is rarely enough. Transformations let you reshape, enrich, and prepare data for analytics and downstream tools. Transformations are useful when you want to aggregate raw data into reporting tables, join multiple tables into enriched datasets, create dimensional models for analytics, and apply business logic to normalize or clean data.
dltHub Transformations let you build new tables or entire datasets from data that has already been ingested using dlt.
Key characteristics:
- Defined in Python functions decorated with `@dlt.hub.transformation`.
- Can use Python (via Ibis) or pure SQL.
- Operate on the destination dataset (`dlt.Dataset`).
- Executed on the destination compute or locally via DuckDB.
You can find full details in the Transformations documentation. Below are a few core patterns to get you started.
Use the `@dlt.hub.transformation` decorator to define transformations. The function must accept a `dlt.Dataset` parameter and yield an Ibis table expression or SQL query.
```python
import dlt
import typing
from ibis import ir


@dlt.hub.transformation
def customer_orders(dataset: dlt.Dataset) -> typing.Iterator[ir.Table]:
    """Aggregate statistics about previous customer orders"""
    orders = dataset.table("orders").to_ibis()
    yield orders.group_by("customer_id").aggregate(
        first_order=orders.ordered_at.min(),
        most_recent_order=orders.ordered_at.max(),
        number_of_orders=orders.id.count(),
    )
```

This transformation reads the orders table from the destination, aggregates per customer, and yields a result that can be materialized as a new table.
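The same decorator also covers row-level cleanup, one of the use cases mentioned above. A minimal sketch, assuming the customers table has an `email` column (the column is an assumption about the jaffle shop schema):

```python
import dlt
import typing
from ibis import ir


@dlt.hub.transformation
def cleaned_customers(dataset: dlt.Dataset) -> typing.Iterator[ir.Table]:
    """Keep only customers with an email and normalize it to lowercase"""
    customers = dataset.table("customers").to_ibis()
    # Filter out rows without an email, then rewrite the column in lowercase
    yield customers.filter(customers.email.notnull()).mutate(
        email=customers.email.lower()
    )
```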
You can join multiple tables and then aggregate or reshape the data:
```python
import dlt
import typing
import ibis
from ibis import ir


@dlt.hub.transformation
def customer_payments(dataset: dlt.Dataset) -> typing.Iterator[ir.Table]:
    """Customer order and payment info"""
    orders = dataset.table("orders").to_ibis()
    payments = dataset.table("payments").to_ibis()
    yield (
        payments.left_join(orders, payments.order_id == orders.id)
        .group_by(orders.customer_id)
        .aggregate(total_amount=ibis._.amount.sum())
    )
```

Here, we join payments with orders and aggregate total payment amounts per customer.
If you prefer, you can also write transformations as raw SQL:
```python
import dlt
import typing


@dlt.hub.transformation
def enriched_purchases(dataset: dlt.Dataset) -> typing.Any:
    yield dataset(
        """
        SELECT customers.name, purchases.quantity
        FROM purchases
        JOIN customers
        ON purchases.customer_id = customers.id
        """
    )
```

This is a good option if your team is more comfortable with SQL or you want to port existing SQL models.
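For example, a typical daily-revenue SQL model could be ported almost as-is. This sketch assumes the orders table exposes `ordered_at` and `order_total` columns:

```python
import dlt
import typing


@dlt.hub.transformation
def daily_revenue(dataset: dlt.Dataset) -> typing.Any:
    # Port of a reporting-style SQL model: revenue aggregated per calendar day.
    yield dataset(
        """
        SELECT CAST(ordered_at AS DATE) AS order_date,
               SUM(order_total) AS revenue
        FROM orders
        GROUP BY 1
        """
    )
```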
The starter pack includes a predefined `jaffle_transformations.py` script (sketched below) that:
- Combines two resources: data from the jaffle shop API and payments stored in parquet files.
- Loads them into a local DuckDB (default dev profile).
- Creates aggregations and loads them into the remote destination.
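To give a feel for the first two steps, here is a minimal, hypothetical sketch of the ingestion part; the API endpoint, parquet path, and pipeline and dataset names are assumptions, not the actual contents of `jaffle_transformations.py`:

```python
import dlt
import pyarrow.parquet as pq
from dlt.sources.helpers import requests  # dlt's requests wrapper with built-in retries


@dlt.resource(name="orders", write_disposition="replace")
def jaffle_orders():
    # Pull raw orders from the jaffle shop API (endpoint is an assumption)
    yield requests.get("https://jaffle-shop.scalevector.ai/api/v1/orders").json()


@dlt.resource(name="payments", write_disposition="replace")
def payments_from_parquet():
    # Read the locally stored payments file (path is an assumption)
    yield pq.read_table("data/payments.parquet")


pipeline = dlt.pipeline(
    pipeline_name="jaffle_transformations",
    destination="duckdb",  # local DuckDB under the default dev profile
    dataset_name="jaffle_shop",
)
print(pipeline.run([jaffle_orders(), payments_from_parquet()]))
```

The aggregation step then runs `@dlt.hub.transformation` functions like the ones shown above against this dataset.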
Before running transformations locally, you need to issue a license for the transformations feature:
```sh
dlt license issue dlthub.transformation
```

You can find more details in the license section of the docs.
To run transformations locally (using the default dev profile):
```sh
uv run python jaffle_transformations.py
```

To run the same transformations against your production destination:
```sh
uv run dlt profile prod pin
uv run python jaffle_transformations.py
```

- `dlt profile prod pin` sets `prod` as the active profile.
- The script will now read from and write to the production dataset using the production credentials.
You can deploy and orchestrate transformations on dltHub Runtime just like any other pipeline:
```sh
uv run dlt runtime launch jaffle_transformations.py
```

This uploads the transformation script, runs it on managed infrastructure, and streams logs back to your terminal. You can also schedule this job and monitor it via the dltHub UI.