Merged
354 changes: 86 additions & 268 deletions AGENTS.md

Large diffs are not rendered by default.

27 changes: 26 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,37 @@
# Change Log

## 1.8.0

- **Author CTEs visually in the Model Wizard.** Models that compile to `WITH ... SELECT ...` open with a draggable CTE list above the SELECT step. Click a CTE to edit its source, columns, filters, and framework artifact overrides in a side panel; the wizard validates the model as you type and surfaces any CTE errors before the Next step.

### Dashboards as Code

- **New `dj.lightdash.restrictedProjects` setting** — flag Lightdash project UUIDs as `block` (the Upload tab refuses with an inline error) or `warn` (upload proceeds only after a confirmation dialog).
- **`Add path to .gitignore` now defaults to on**, overridable via the new `dj.lightdash.defaultAddPathToGitignore` setting (`false` keeps it opt-in). The written entry is now root-anchored (e.g. `/lightdash/`) so a same-named directory nested elsewhere isn't ignored.

### Trino Query Control Center

- **New Query Control Center (`DJ: Query Control Center`)** — a master-detail panel that replaces the Query View for inspecting and triaging Trino queries. A **Live** tab (queries from your active coordinator, or the local Trino CLI when no profile is set, with a "dbt runs only" filter) and a **History** tab both support search, state, and user/source filtering. Selecting a query shows its summary, stage tree, slowest operators, failure details, and SQL, plus **Jump to Model** (opens the matching `.model.json`) and **Analyze with AI**, which saves sanitized JSON under `.dj/diagnostics/` so analyzed queries reopen even after the coordinator evicts them (~15 min).
- **Named Trino connection profiles.** `dj.trino.profiles` and `dj.trino.activeProfile` define coordinator profiles (dev / staging / prod) you switch from the panel or `DJ: Select Trino Connection Profile...`. Each profile resolves its secret at request time from VS Code SecretStorage (set via `DJ: Set Trino Credentials...`), an environment variable, a password file, or your `~/.dbt/profiles.yml` — never plain text in settings — and the panel shows a coordinator status indicator with one-click refresh for expired tokens.
- **New `dj-trino-analyzer` agent skill** (`.agents/skills/dj-trino-analyzer/SKILL.md`, written when `dj.codingAgent` is on) gives a coding agent operator-level heuristics for diagnosing slow or failed Trino queries from the sanitized JSON, plus a bundled Trino QueryInfo field reference (`references/`, verified against the Trino 479 source) for deep dives into the raw `.dj/diagnostics/<id>.full.json` — schema tables, enum gotchas, and ready-to-paste jq recipes. The sanitizer doubles as a tool firewall: payloads containing row data are rejected before they reach disk, so customer data never reaches an LLM prompt.

### Agent skills

- **`convert-sql-to-model` skill renamed to `dj-convert-sql-to-model`** so every DJ skill shares the `dj-` prefix. The stale `.agents/skills/convert-sql-to-model/` folder from earlier releases is removed automatically the next time skills are deployed.
- **Resolve git merge and rebase conflicts the DJ way through your AI assistant.** When `dj.codingAgent` is enabled, a new skill at `.agents/skills/dj-resolve-merge-conflicts/SKILL.md` teaches an IDE agent to merge only the `.model.json` / `.source.json` sources of truth and let the generated `.sql` / `.yml` regenerate instead of hand-merging them. When an incoming branch looks old or built on an older DJ schema, it flags the divergence and offers a guided port of just the models you need instead of a full merge.

### Bug fixes

- **YAML reserved tokens round-trip safely.** Values like `OFF`, `ON`, `YES`, `NO` (and lowercase variants) are now quoted on emit and tolerated on load, so `time_intervals: OFF` no longer turns into `false` in the manifest and crashes sync. Per-column meta failures also name the offending column.
- **Sync errors surface the real cause.** SQL/YML generation failures now show the underlying message instead of always pointing at `expr` syntax.
- **Removed the empty Column Lineage panel.** Column lineage lives in the Data Explorer's Column view; `DJ: Column Lineage` opens it as before.
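
The reserved-token fix above can be pictured with a minimal sketch. It assumes an emitter that decides, per string scalar, whether quoting is needed; `YAML_RESERVED` and `emit_scalar` are hypothetical names chosen for illustration, not DJ's actual implementation, and the token set is the YAML 1.1 boolean-like words, matched case-insensitively.

```python
# Hypothetical sketch: quote YAML 1.1 boolean-like tokens on emit so a string
# such as "OFF" is not re-read as the boolean false by a YAML 1.1 loader.
YAML_RESERVED = {"on", "off", "yes", "no", "true", "false", "y", "n"}


def emit_scalar(value: str) -> str:
    """Return the YAML text for a string scalar, quoting reserved tokens."""
    if value.lower() in YAML_RESERVED:
        return f'"{value}"'
    return value


def emit_mapping_line(key: str, value: str) -> str:
    """Render a single `key: value` line for a string value."""
    return f"{key}: {emit_scalar(value)}"


print(emit_mapping_line("time_intervals", "OFF"))
print(emit_mapping_line("time_intervals", "DAY"))
```

Quoting on emit keeps the value a string for any YAML 1.1 loader, which is why `time_intervals: OFF` no longer has to become `false` downstream.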

## 1.7.1

### Iceberg write strategy update

- **Write strategy**: Iceberg incremental writes now use an event-date literal directly instead of creating and querying a temporary table, which improves write performance.


## 1.7.0

### Adhoc SQL Editor / Query Draft
11 changes: 6 additions & 5 deletions airflow/v2_10/source_etl.py
@@ -46,6 +46,7 @@
optimize_run_timeout_minutes,
override_backfill_start,
override_sources,
run_chronological,
schedule_cron,
source_date_limit,
source_date_tasks,
@@ -376,8 +377,8 @@ def fetch_source_dates(source: SourceExtended, ti=None, **context):
) and event_date_lookback not in event_dates:
event_dates.append(event_date_lookback)

- # Sorting in reverse so we always get the most recent dates first
- event_dates.sort(reverse=True)
+ # Sort oldest-first when run_chronological is set; otherwise most recent first (the default)
+ event_dates.sort(reverse=not run_chronological)

# We'll try with the full list of event dates first, but if this fails we'll cut in half and try again
dbt_source_dates_new_rows = []
@@ -500,7 +501,7 @@ def fetch_source_runs(source_dates: list[dict], ti=None):
etl_timestamp = ti.xcom_pull(key="etl_timestamp", task_ids="start_etl")

source_runs = build_runs(
- source_id_dates_list, etl_timestamp, date_limit=source_date_limit
+ source_id_dates_list, etl_timestamp, date_limit=source_date_limit, chronological=run_chronological
)

# Perform the merges first, before running the sources, because we might limit and/or time out the source run tasks
@@ -727,7 +728,7 @@ def fetch_model_runs(model_dates: list[list[dict]], ti=None):
etl_timestamp = ti.xcom_pull(key="etl_timestamp", task_ids="start_etl")

model_runs = build_runs(
- model_id_dates_list, etl_timestamp, date_limit=model_date_limit
+ model_id_dates_list, etl_timestamp, date_limit=model_date_limit, chronological=run_chronological
)

run_models_timestamp = datetime.now(timezone.utc).strftime(
@@ -799,7 +800,7 @@ def fetch_error_runs(ti=None):
)

error_runs = build_runs(
- id_dates_list=model_id_dates_list, etl_timestamp=etl_timestamp, date_limit=1
+ model_id_dates_list, etl_timestamp, date_limit=1, chronological=run_chronological
)

run_errors_timestamp = datetime.now(timezone.utc).strftime(
5 changes: 3 additions & 2 deletions airflow/v2_10/utils.py
@@ -205,6 +205,7 @@ def build_runs(
id_dates_list: list[dict],
etl_timestamp: str,
date_limit: int = None,
chronological: bool = False
):

ids_by_event_date: dict[str, list[str]] = {}
@@ -226,8 +227,8 @@ def build_runs(
ids.append(id)
ids_by_event_date[event_date] = ids

- # Sort in descending order of event date so most recent runs are first
- ids_by_event_date = dict(sorted(ids_by_event_date.items(), reverse=True))
+ # Sort by event date: ascending (oldest first) when chronological is True, otherwise descending (most recent first)
+ ids_by_event_date = dict(sorted(ids_by_event_date.items(), reverse=not chronological))

event_dates_by_ids: dict[str, list[str]] = {}
for (
1 change: 1 addition & 0 deletions airflow/v2_10/variables.py
@@ -71,6 +71,7 @@ def var(name: str, default: str = "") -> str:
if var("dj_etl_override_sources")
else []
)
run_chronological: bool = var("dj_etl_run_chronological", "false").lower() == "true"
schedule_cron: str = var("dj_etl_schedule_cron", "0 */6 * * *")
storage_type: str = var("dj_etl_storage_type", "delta_lake")
skip_sources: bool = var("dj_etl_skip_sources", "false").lower() == "true"
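
To make the effect of the new flag concrete, here is a minimal sketch of the parse-then-sort pattern used in this change. The `var` helper below is a stand-in that reads `os.environ` (the project's real helper is not shown in full here), `order_event_dates` is a hypothetical wrapper, and the event dates are assumed to be ISO `YYYY-MM-DD` strings so that lexicographic order matches date order.

```python
import os


def var(name: str, default: str = "") -> str:
    # Stand-in for the project's `var` helper (assumption): read from the environment.
    return os.environ.get(name, default)


def read_run_chronological() -> bool:
    # Same parsing rule as the diff: only the string "true" (any case) enables the flag.
    return var("dj_etl_run_chronological", "false").lower() == "true"


def order_event_dates(event_dates: list[str], run_chronological: bool) -> list[str]:
    # reverse=not run_chronological: descending by default, ascending when the flag is set.
    return sorted(event_dates, reverse=not run_chronological)


dates = ["2024-01-02", "2024-01-01", "2024-01-03"]
print(order_event_dates(dates, run_chronological=False))
print(order_event_dates(dates, run_chronological=True))
```

Because the default is `"false"`, existing deployments keep the most-recent-first behaviour unless the variable is explicitly set to `true`.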
11 changes: 6 additions & 5 deletions airflow/v2_7/source_etl.py
@@ -46,6 +46,7 @@
optimize_run_timeout_minutes,
override_backfill_start,
override_sources,
run_chronological,
schedule_cron,
source_date_limit,
source_date_tasks,
@@ -372,8 +373,8 @@ def fetch_source_dates(source: SourceExtended, ti=None, **context):
) and event_date_lookback not in event_dates:
event_dates.append(event_date_lookback)

- # Sorting in reverse so we always get the most recent dates first
- event_dates.sort(reverse=True)
+ # Sort oldest-first when run_chronological is set; otherwise most recent first (the default)
+ event_dates.sort(reverse=not run_chronological)

# We'll try with the full list of event dates first, but if this fails we'll cut in half and try again
dbt_source_dates_new_rows = []
@@ -496,7 +497,7 @@ def fetch_source_runs(source_dates: list[dict], ti=None):
etl_timestamp = ti.xcom_pull(key="etl_timestamp", task_ids="start_etl")

source_runs = build_runs(
- source_id_dates_list, etl_timestamp, date_limit=source_date_limit
+ source_id_dates_list, etl_timestamp, date_limit=source_date_limit, chronological=run_chronological
)

# Perform the merges first, before running the sources, because we might limit and/or time out the source run tasks
@@ -718,7 +719,7 @@ def fetch_model_runs(model_dates: list[list[dict]], ti=None):
etl_timestamp = ti.xcom_pull(key="etl_timestamp", task_ids="start_etl")

model_runs = build_runs(
- model_id_dates_list, etl_timestamp, date_limit=model_date_limit
+ model_id_dates_list, etl_timestamp, date_limit=model_date_limit, chronological=run_chronological
)

run_models_timestamp = datetime.now(timezone.utc).strftime(
@@ -789,7 +790,7 @@ def fetch_error_runs(ti=None):
)

error_runs = build_runs(
- id_dates_list=model_id_dates_list, etl_timestamp=etl_timestamp, date_limit=1
+ id_dates_list=model_id_dates_list, etl_timestamp=etl_timestamp, date_limit=1, chronological=run_chronological
)

run_errors_timestamp = datetime.now(timezone.utc).strftime(
5 changes: 3 additions & 2 deletions airflow/v2_7/utils.py
@@ -205,6 +205,7 @@ def build_runs(
id_dates_list: list[dict],
etl_timestamp: str,
date_limit: int = None,
chronological: bool = False
):

ids_by_event_date: dict[str, list[str]] = {}
@@ -226,8 +227,8 @@ def build_runs(
ids.append(id)
ids_by_event_date[event_date] = ids

- # Sort in descending order of event date so most recent runs are first
- ids_by_event_date = dict(sorted(ids_by_event_date.items(), reverse=True))
+ # Sort by event date: ascending (oldest first) when chronological is True, otherwise descending (most recent first)
+ ids_by_event_date = dict(sorted(ids_by_event_date.items(), reverse=not chronological))

event_dates_by_ids: dict[str, list[str]] = {}
for (
1 change: 1 addition & 0 deletions airflow/v2_7/variables.py
@@ -71,6 +71,7 @@ def var(name: str, default: str = "") -> str:
if var("dj_etl_override_sources")
else []
)
run_chronological: bool = var("dj_etl_run_chronological", "false").lower() == "true"
schedule_cron: str = var("dj_etl_schedule_cron", "0 */6 * * *")
storage_type: str = var("dj_etl_storage_type", "delta_lake")
skip_sources: bool = var("dj_etl_skip_sources", "false").lower() == "true"
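
The grouping step that `build_runs` performs before sorting can be sketched in isolation. This is an illustration under assumptions, not the project's code: the input shape (`{"id": ..., "event_dates": [...]}`) and the name `group_ids_by_event_date` are hypothetical, and only the final `dict(sorted(..., reverse=not chronological))` line mirrors the diff.

```python
def group_ids_by_event_date(
    id_dates_list: list[dict],
    chronological: bool = False,
) -> dict[str, list[str]]:
    """Group ids under each event date, then order the dates.

    Assumed input shape (hypothetical): [{"id": "a", "event_dates": ["2024-01-01", ...]}, ...]
    """
    ids_by_event_date: dict[str, list[str]] = {}
    for item in id_dates_list:
        for event_date in item["event_dates"]:
            ids_by_event_date.setdefault(event_date, []).append(item["id"])

    # Dicts keep insertion order, so rebuilding from the sorted items fixes the
    # iteration order: newest first by default, oldest first when chronological.
    return dict(sorted(ids_by_event_date.items(), reverse=not chronological))


runs = [
    {"id": "orders", "event_dates": ["2024-01-01", "2024-01-02"]},
    {"id": "users", "event_dates": ["2024-01-02"]},
]
print(list(group_ids_by_event_date(runs)))
print(list(group_ids_by_event_date(runs, chronological=True)))
```

Any limit applied later to the ordered dates would then keep the newest or the oldest dates depending on the flag, which is presumably why the same value is threaded through every `build_runs` call.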
32 changes: 32 additions & 0 deletions docs/SETTINGS.md
@@ -19,6 +19,8 @@ Complete guide to configuring the DJ (Data JSON) Framework VS Code extension.
| `lightdash.defaultSqlFilter` | Global default `sql_filter` for lightdash tables | Next sync 🔄 |
| `lightdash.defaultSqlFilterRequiredColumns` | Required columns guard for the global filter | Next sync 🔄 |
| `lightdash.defaultPartitionColumnCaseSensitive` | Set default `case_sensitive` value for partition columns in YAML | Next sync 🔄 |
| `lightdash.defaultAddPathToGitignore` | Default state of the Download tab `.gitignore` checkbox | Next panel ⚡ |
| `lightdash.restrictedProjects` | Block/warn DJ Upload against Lightdash project UUIDs | Next upload ⚡ |
| `aiHintTag` | Tag for AI-generated hints | Next sync 🔄 |
| `codingAgent` | Coding agent integration | Refresh 🔄 |
| `autoGenerateTests` | Auto-generate row count tests | Varies 🔄 |
@@ -151,6 +153,36 @@ Takes effect on next `DJ: Sync to SQL and YML`.
- When `false` (the default), partition columns are emitted without the auto-injected `case_sensitive` flag. Per-model and per-column `lightdash.case_sensitive` overrides in `.model.json` continue to work in either mode.
- Takes effect on next `DJ: Sync to SQL and YML`.

**`dj.lightdash.defaultAddPathToGitignore`** - Initial state of the Download tab `Add path to .gitignore` checkbox (default: `true`)

```json
{ "dj.lightdash.defaultAddPathToGitignore": true }
```

- Controls whether the `Add path to .gitignore` checkbox on the Dashboards-as-Code **Download** tab starts checked. When checked, downloading appends the configured `dj.lightdash.dashboardsAsCodePath` to the workspace `.gitignore` as a **root-anchored** entry (e.g. `/lightdash/`, inside a managed `# dj` … `# /dj` marker block) so generated YAML stays out of version control without ignoring same-named directories nested elsewhere.
- When `false`, the checkbox starts unchecked (the previous opt-in behaviour).
- This setting only seeds the checkbox's default; users can still toggle it per-download.
- Takes effect the next time the Dashboards-as-Code panel is opened (no resync / refresh needed).
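
The root-anchored entry and the managed marker block described above can be sketched as follows. This is a hypothetical illustration: only the `# dj` and `# /dj` markers and the `/lightdash/` form come from the text, while the function names and the append-if-missing behaviour are assumptions.

```python
DJ_OPEN, DJ_CLOSE = "# dj", "# /dj"  # marker lines named in the setting's description


def root_anchored(path: str) -> str:
    # "lightdash" -> "/lightdash/": the leading slash anchors the pattern to the
    # directory containing .gitignore, so nested same-named dirs are not ignored.
    return "/" + path.strip("/") + "/"


def add_to_gitignore(text: str, path: str) -> str:
    """Return .gitignore content with the entry inside the managed block (idempotent)."""
    entry = root_anchored(path)
    lines = text.splitlines()
    if DJ_OPEN in lines and DJ_CLOSE in lines:
        start, end = lines.index(DJ_OPEN), lines.index(DJ_CLOSE)
        if entry not in lines[start + 1:end]:
            lines.insert(end, entry)  # add just before the closing marker
    else:
        lines += [DJ_OPEN, entry, DJ_CLOSE]
    return "\n".join(lines) + "\n"


print(add_to_gitignore("node_modules/\n", "lightdash"))
```

An unanchored `lightdash/` pattern would match a directory of that name at any depth; the leading slash restricts the match to the workspace root.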

**`dj.lightdash.restrictedProjects`** - Restrict the DJ Dashboards-as-Code Upload tab against specific Lightdash project UUIDs (default: `[]`)

```json
{
  "dj.lightdash.restrictedProjects": [
    { "uuid": "prod-uuid-here", "mode": "block", "label": "production" },
    { "uuid": "preview-uuid-here", "mode": "warn", "label": "preview" }
  ]
}
```

- `mode: "block"` — the Upload tab refuses to spawn `lightdash upload` and shows an inline error on the Project UUID field.
- `mode: "warn"` — the Upload tab shows a confirmation dialog; the upload only runs after explicit acknowledgement.
- `label` is optional and surfaces in the error / confirmation message alongside the UUID.
- Unlisted UUIDs are allowed. Matching is case-insensitive and whitespace-tolerant.
- Enforcement runs in both the webview (pre-flight) and the extension host (defense-in-depth). Direct API callers can't bypass the policy.
- The setting only restricts uploads initiated from the DJ Upload tab. Users with the right Lightdash permissions can still run `lightdash upload` manually from a terminal; DJ has no way to intercept the standalone CLI.
- Takes effect on the next DJ Lightdash upload (no resync / refresh needed).
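
The matching rules above (case-insensitive, whitespace-tolerant, unlisted UUIDs allowed) can be sketched as a small lookup. This is a hypothetical illustration rather than the extension's code: `check_upload` and `normalize_uuid` are invented names, "whitespace-tolerant" is assumed to mean trimming surrounding whitespace, and `"allow"` is a label chosen here for the unlisted case.

```python
from typing import Optional


def normalize_uuid(uuid: str) -> str:
    # Assumption: tolerate surrounding whitespace and compare case-insensitively.
    return uuid.strip().lower()


def check_upload(project_uuid: str, restricted: list[dict]) -> tuple[str, Optional[str]]:
    """Return (mode, label): mode is "block", "warn", or "allow" for unlisted UUIDs."""
    target = normalize_uuid(project_uuid)
    for entry in restricted:
        if normalize_uuid(entry["uuid"]) == target:
            return entry["mode"], entry.get("label")  # label is optional
    return "allow", None


policy = [
    {"uuid": "prod-uuid-here", "mode": "block", "label": "production"},
    {"uuid": "preview-uuid-here", "mode": "warn", "label": "preview"},
]
print(check_upload("  PROD-UUID-HERE ", policy))
print(check_upload("some-other-uuid", policy))
```

Running the same check in both the webview and the extension host, as the setting describes, means a caller that skips the UI still hits the policy.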

**`dj.lightdash.defaultSqlFilter`** - Global default `sql_filter` for lightdash tables

**`dj.lightdash.defaultSqlFilterRequiredColumns`** - Columns required for the default to apply
22 changes: 9 additions & 13 deletions docs/integrations/trino-integration.md
@@ -54,34 +54,30 @@ When you create a source, DJ connects to Trino to browse catalogs and retrieve t

## Query Engine Monitoring

- The Query Engine view in the sidebar provides real-time monitoring of your Trino cluster activity.
+ The Query Engine view in the sidebar gives at-a-glance Trino cluster status and a one-click entry point to the full Query Control Center.

  **What You Can Monitor:**

  - **Nodes** - Active Trino worker nodes and their status (active/inactive)
- - **My Queries** - Your currently running queries with states:
-   - 🟢 FINISHED - Query completed successfully
-   - 🔵 RUNNING - Query is currently executing
-   - ⚪ QUEUED - Query waiting to execute
-   - 🔴 FAILED - Query encountered an error
+ - **Query Control Center** - Click to open the master-detail webview that lists live and persisted Trino queries with stage/operator stats, an `Analyze with AI` action that writes a sanitized diagnostic JSON, and a `Jump to Model` action that resolves the SQL back to the originating `.model.json`.

  **Accessing Query Engine View:**

  1. Open DJ sidebar in VS Code (Activity Bar icon)
  2. Locate "Query Engine" view (database icon)
- 3. Expand sections to see:
+ 3. The view shows:
     - **Nodes** (with count of active nodes)
-    - **My Queries** (with count of queries)
+    - **Query Control Center** shortcut

  **What's Monitored:**

- This view shows queries executed by DJ when:
+ The Query Control Center surfaces:

- - Running dbt models in Data Explorer
- - Compiling models that query Trino
- - Creating sources (Trino introspection queries)
+ - **Live** queries on the active Trino coordinator (polled every 5s; optional "My dbt runs only" filter chip)
+ - **Recent (persisted)** queries from `.dj/diagnostics/`, which survive coordinator in-memory eviction
+ - For each query: state, wall/CPU time, peak memory, splits, blocked time, data-skew score, largest operator, stage tree, operator table with heuristic chips, the failure block for `FAILED` queries, and the SQL viewer

- **Note:** This is a **monitoring view only** - you cannot execute custom SQL queries through this interface. It displays activity from DJ operations that use Trino internally.
+ **Note:** Custom SQL execution still happens through dbt / the Data Explorer; the Query Engine view itself is a monitoring entry point only.

## Model Execution

10 changes: 8 additions & 2 deletions jest.config.js
@@ -7,8 +7,14 @@ const config = {
'@services': ['<rootDir>/src/services'],
'@shared/(.*)': ['<rootDir>/src/shared/$1'],
'@shared': ['<rootDir>/src/shared'],
- '@web/(.*)': ['<rootDir>/web/src/shared/$1'],
- '@web': ['<rootDir>/web/src/shared'],
+ '@web/(.*)': ['<rootDir>/web/src/$1'],
+ '@web': ['<rootDir>/web/src'],
  },
+ // `isolatedModules: true` keeps ts-jest in transpile-only mode for imported
+ // files, so cross-package type resolution (e.g. the web tree's DOM globals
+ // and `@web/*` paths) does not need to be wired into the root tsconfig.
+ transform: {
+ '^.+\\.tsx?$': ['ts-jest', { isolatedModules: true }],
+ },
testPathIgnorePatterns: [
'<rootDir>/out/',
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
