andre-salvati · Jun 8, 2026 · Jun 5, 2026
diff --git a/.github/workflows/onpush.yml b/.github/workflows/onpush.yml
@@ -9,6 +9,7 @@ on:
     paths-ignore:
       - 'README.md'
       - 'CLAUDE.md'
+      - 'CHANGELOG.md'
       - 'docs/**'
   # Manual trigger for re-running CI without a new commit (e.g. after a transient
   # GitHub Actions hiccup that silently drops a push event):

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,13 +2,19 @@
 
 ---
 
+## [#34](https://github.com/andre-salvati/databricks-template/pull/34) · 2026-06-05 · feat: standardize silver/gold field names, fix dashboard KPIs, add total_orders
+
+Dropped `ds_kpi` from the dashboard: all three KPI counters (Total Value, Total Orders, Number of Customers) now bind to `ds_orders` with aggregate expressions so all five filters update them; the Total Orders tile (renamed from Number of Orders) sums the new gold column `total_orders`, which is `COUNT(DISTINCT order_id)` per aggregation group.
+Standardized field names across silver (`curated.order_enriched`) and gold (`report.order_agg`) following four rules: `{entity}_id` suffix, entity-qualified names, `item_*` prefix for item-level fields, no abbreviations; `date` is renamed to `order_date` and cast to `DateType` in silver.
+Added `order_enriched_schema` and `order_agg_schema` to `commonSchemas.py` as canonical schemas for silver and gold; all tests and the integration validator import from there instead of inlining definitions.
+
+---
+
 ## [#33](https://github.com/andre-salvati/databricks-template/pull/33) · 2026-06-04 · feat: AI/BI dashboard, country in gold layer, randomized seed data
 
-Added `country` to `curated.order_enriched` and `report.order_agg` (and their SDP equivalents) so the gold layer carries the full customer dimension needed for country-based reporting; unit tests updated accordingly.
-Added an AI/BI (Lakeview) dashboard with three line charts (total value by date × country, by date × product, by date × category) and a global filter page (date range, country, customer, product, category); uses `make truncate env=X yes=--yes` before first post-deploy run to handle the schema change to `report.order_agg`.
-Dashboard JSON (`resources/orders_dashboard.lvdash.json`) and its DAB resource entry are generated by `sdk_generate_template_job.py` at deploy time with the target catalog embedded — both files are gitignored.
-Completed README documentation: added "Databricks Dashboards" to the Technologies section, added a dashboard screenshot block, and replaced the placeholder dashboard Features bullet with a full description of the charts and filter panel.
-Improved seed data chart visibility: customers are assigned a non-uniform country distribution (US=200, UK=100, DE=50, FR=50, BR=30, CA=25, AU=20, JP=15, MX=7, IN=3) so country lines are clearly separated; `total_item` now scales with `prod_category_id` (category × $15 base + $10 noise), producing a ~6× spread across categories visible in the category chart.
+Added `country` to `curated.order_enriched` and `report.order_agg` (and SDP equivalents) so the gold layer carries the full customer dimension needed for country-based reporting.
+Added an AI/BI (Lakeview) dashboard with three line charts (total value by date × country, product, and category) and a global filter page; dashboard JSON is generated by `sdk_generate_template_job.py` at deploy time with the target catalog embedded and is gitignored.
+Improved seed data chart visibility with a non-uniform country distribution and `total_item` scaling with `prod_category_id` (category × $15 base + $10 noise), producing a ~6× spread across categories.
 
 ---
 

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -95,8 +95,8 @@ Medallion schemas (`MEDALLION_SCHEMAS` in `config.py`):
 
 Each task's input/output tables are **hardcoded** in the task module (e.g. `raw.customer` → `curated.order_enriched`). The medallion layer is a semantic contract, not a runtime parameter — this is the dbt `ref()` pattern. Don't parameterize the layer; if a task genuinely needs a configurable target, that's a different task.
 
-`curated.order_enriched` columns: `name, country, id_customer, id_order, total, date, product_id, prod_category_id, seq, desc_item, qty, total_item`
-`report.order_agg` columns: `name, country, date, product_id, prod_category_id, total_qty, total_value`
+`curated.order_enriched` columns: `customer_name, country, customer_id, order_id, order_total, order_date (DateType), product_id, product_category_id, item_seq, item_description, item_quantity, item_total`
+`report.order_agg` columns: `customer_name, country, order_date (DateType), product_id, product_category_id, total_quantity, total_value, total_orders`
 
 ### Job-level parameters (runtime, overridable per-run)
 

diff --git a/scripts/sdk_generate_template_job.py b/scripts/sdk_generate_template_job.py
@@ -480,41 +480,15 @@ def _build_dashboard_json(catalog: str) -> dict:
     """
     return {
         "datasets": [
-            {
-                "name": "ds_kpi",
-                "displayName": "KPIs",
-                "queryLines": [
-                    f"SELECT ROUND(SUM(total_item), 2) AS total_value, "
-                    f"COUNT(DISTINCT id_order) AS num_orders, "
-                    f"COUNT(DISTINCT id_customer) AS num_customers "
-                    f"FROM {catalog}.curated.order_enriched "
-                    f"WHERE date BETWEEN :date_range.min AND :date_range.max"
-                ],
-                "parameters": [
-                    {
-                        "keyword": "date_range",
-                        "displayName": "Date Range",
-                        "dataType": "DATE",
-                        "complexType": "RANGE",
-                        "defaultSelection": {
-                            "range": {
-                                "dataType": "DATE",
-                                "min": {"value": "now-1y"},
-                                "max": {"value": "now"},
-                            }
-                        },
-                    }
-                ],
-            },
             {
                 "name": "ds_orders",
                 "displayName": "Orders",
                 "queryLines": [
-                    f"SELECT CAST(date AS DATE) AS order_date, country, name AS customer, "
-                    f"CAST(product_id AS STRING) AS product_id, CAST(prod_category_id AS STRING) AS category_id, "
-                    f"SUM(total_value) AS total_value "
+                    f"SELECT order_date, country, customer_name AS customer, "
+                    f"CAST(product_id AS STRING) AS product_id, CAST(product_category_id AS STRING) AS category_id, "
+                    f"SUM(total_value) AS total_value, SUM(total_orders) AS total_orders "
                     f"FROM {catalog}.report.order_agg "
-                    f"WHERE date BETWEEN :date_range.min AND :date_range.max "
+                    f"WHERE order_date BETWEEN :date_range.min AND :date_range.max "
                     f"GROUP BY 1, 2, 3, 4, 5"
                 ],
                 "parameters": [
@@ -566,9 +540,9 @@ def _build_dashboard_json(catalog: str) -> dict:
                                 {
                                     "name": "main_query",
                                     "query": {
-                                        "datasetName": "ds_kpi",
-                                        "fields": [{"name": "total_value", "expression": "`total_value`"}],
-                                        "disaggregated": True,
+                                        "datasetName": "ds_orders",
+                                        "fields": [{"name": "total_value", "expression": "SUM(`total_value`)"}],
+                                        "disaggregated": False,
                                     },
                                 }
                             ],
@@ -583,22 +557,22 @@ def _build_dashboard_json(catalog: str) -> dict:
                     },
                     {
                         "widget": {
-                            "name": "kpi-num-orders",
+                            "name": "kpi-total-orders",
                             "queries": [
                                 {
                                     "name": "main_query",
                                     "query": {
-                                        "datasetName": "ds_kpi",
-                                        "fields": [{"name": "num_orders", "expression": "`num_orders`"}],
-                                        "disaggregated": True,
+                                        "datasetName": "ds_orders",
+                                        "fields": [{"name": "total_orders", "expression": "SUM(`total_orders`)"}],
+                                        "disaggregated": False,
                                     },
                                 }
                             ],
                             "spec": {
                                 "version": 2,
                                 "widgetType": "counter",
-                                "encodings": {"value": {"fieldName": "num_orders", "displayName": "Number of Orders"}},
-                                "frame": {"title": "Number of Orders", "showTitle": True},
+                                "encodings": {"value": {"fieldName": "total_orders", "displayName": "Total Orders"}},
+                                "frame": {"title": "Total Orders", "showTitle": True},
                             },
                         },
                         "position": {"x": 2, "y": 2, "width": 2, "height": 3},
@@ -610,9 +584,11 @@ def _build_dashboard_json(catalog: str) -> dict:
                                 {
                                     "name": "main_query",
                                     "query": {
-                                        "datasetName": "ds_kpi",
-                                        "fields": [{"name": "num_customers", "expression": "`num_customers`"}],
-                                        "disaggregated": True,
+                                        "datasetName": "ds_orders",
+                                        "fields": [
+                                            {"name": "num_customers", "expression": "COUNT(DISTINCT `customer`)"}
+                                        ],
+                                        "disaggregated": False,
                                     },
                                 }
                             ],
@@ -784,22 +760,13 @@ def _build_dashboard_json(catalog: str) -> dict:
                                         "disaggregated": False,
                                     },
                                 },
-                                {
-                                    "name": "q_date_kpi",
-                                    "query": {
-                                        "datasetName": "ds_kpi",
-                                        "parameters": [{"name": "date_range", "keyword": "date_range"}],
-                                        "disaggregated": False,
-                                    },
-                                },
                             ],
                             "spec": {
                                 "version": 2,
                                 "widgetType": "filter-date-range-picker",
                                 "encodings": {
                                     "fields": [
                                         {"parameterName": "date_range", "queryName": "q_date"},
-                                        {"parameterName": "date_range", "queryName": "q_date_kpi"},
                                     ]
                                 },
                                 "frame": {"showTitle": True, "title": "Date Range"},
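
Review note: the three KPI counters in this file now share the same shape (dataset `ds_orders`, one aggregate field, `"disaggregated": False`). A minimal sketch of a helper that could build them is shown below. `kpi_counter` is a hypothetical name, not part of this PR, and the dict layout is copied only from the keys visible in the hunks above; the surrounding `"position"` entry is left to the caller.

```python
def kpi_counter(name, field, expression, title, dataset="ds_orders"):
    """Build the inner "widget" dict for one counter tile (hypothetical helper)."""
    return {
        "name": name,
        "queries": [
            {
                "name": "main_query",
                "query": {
                    "datasetName": dataset,
                    # Aggregate expression evaluated over the filtered dataset rows
                    "fields": [{"name": field, "expression": expression}],
                    "disaggregated": False,
                },
            }
        ],
        "spec": {
            "version": 2,
            "widgetType": "counter",
            "encodings": {"value": {"fieldName": field, "displayName": title}},
            "frame": {"title": title, "showTitle": True},
        },
    }


# Reproduces the Total Orders tile from the diff above.
total_orders_widget = kpi_counter(
    "kpi-total-orders", "total_orders", "SUM(`total_orders`)", "Total Orders"
)
```

Whether the repetition is worth factoring out is a judgment call; the generated JSON would be the same either way.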

diff --git a/src/template/commonSchemas.py b/src/template/commonSchemas.py
@@ -1,6 +1,9 @@
 from pyspark.sql.types import (
+    DateType,
+    DoubleType,
     FloatType,
     IntegerType,
+    LongType,
     StringType,
     StructField,
     StructType,
@@ -34,3 +37,33 @@
         StructField("total_item", FloatType(), True),
     ]
 )
+
+order_enriched_schema = StructType(
+    [
+        StructField("customer_name", StringType(), True),
+        StructField("country", StringType(), True),
+        StructField("customer_id", IntegerType(), True),
+        StructField("order_id", IntegerType(), True),
+        StructField("order_total", FloatType(), True),
+        StructField("order_date", DateType(), True),
+        StructField("product_id", IntegerType(), True),
+        StructField("product_category_id", IntegerType(), True),
+        StructField("item_seq", IntegerType(), True),
+        StructField("item_description", StringType(), True),
+        StructField("item_quantity", IntegerType(), True),
+        StructField("item_total", FloatType(), True),
+    ]
+)
+
+order_agg_schema = StructType(
+    [
+        StructField("customer_name", StringType(), True),
+        StructField("country", StringType(), True),
+        StructField("order_date", DateType(), True),
+        StructField("product_id", IntegerType(), True),
+        StructField("product_category_id", IntegerType(), True),
+        StructField("total_quantity", LongType(), True),
+        StructField("total_value", DoubleType(), True),
+        StructField("total_orders", LongType(), True),
+    ]
+)
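
Review note: the field names in `order_enriched_schema` follow the renames listed in the CLAUDE.md hunk. As a rough sketch, the old-to-new mapping can be written down as a plain dictionary, which may help when migrating ad hoc queries or notebooks that still use the old names. `SILVER_RENAMES` and `rename_silver_columns` are illustrative names and do not exist in the repository.

```python
# Old -> new silver column names, taken from the CLAUDE.md change in this PR.
SILVER_RENAMES = {
    "name": "customer_name",
    "id_customer": "customer_id",
    "id_order": "order_id",
    "total": "order_total",
    "date": "order_date",
    "prod_category_id": "product_category_id",
    "seq": "item_seq",
    "desc_item": "item_description",
    "qty": "item_quantity",
    "total_item": "item_total",
}


def rename_silver_columns(columns):
    """Map old column names to new ones; names without a rename pass through."""
    return [SILVER_RENAMES.get(column, column) for column in columns]


old_columns = [
    "name", "country", "id_customer", "id_order", "total", "date",
    "product_id", "prod_category_id", "seq", "desc_item", "qty", "total_item",
]
new_columns = rename_silver_columns(old_columns)
```

`country` and `product_id` are unchanged by the PR, so they pass through untouched.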
diff --git a/src/template/job1/generate_orders.py b/src/template/job1/generate_orders.py
@@ -12,18 +12,18 @@ def enrich_order(self, df_customer, df_order, df_order_item):
             df_order_item.join(df_order, df_order_item["id_order"] == df_order["id"])
             .join(df_customer, df_order["id_customer"] == df_customer["id"])
             .select(
-                "name",
+                df_customer["name"].alias("customer_name"),
                 "country",
-                "id_customer",
-                "id_order",
-                "total",
-                "date",
+                df_order["id_customer"].alias("customer_id"),
+                df_order_item["id_order"].alias("order_id"),
+                df_order["total"].alias("order_total"),
+                df_order["date"].cast("date").alias("order_date"),
                 "product_id",
-                "prod_category_id",
-                "seq",
-                "desc_item",
-                "qty",
-                "total_item",
+                df_order["prod_category_id"].alias("product_category_id"),
+                df_order_item["seq"].alias("item_seq"),
+                df_order_item["desc_item"].alias("item_description"),
+                df_order_item["qty"].alias("item_quantity"),
+                df_order_item["total_item"].alias("item_total"),
             )
         )
 

diff --git a/src/template/job1/generate_orders_agg.py b/src/template/job1/generate_orders_agg.py
@@ -1,4 +1,4 @@
-from pyspark.sql.functions import sum
+from pyspark.sql.functions import countDistinct, sum
 
 from ..baseTask import BaseTask
 
@@ -10,9 +10,10 @@ def __init__(self, config):
     def aggregate_orders(self, df_order):
         # TODO code your transformations here...
 
-        return df_order.groupBy("name", "country", "date", "product_id", "prod_category_id").agg(
-            sum("qty").alias("total_qty"),
-            sum("total_item").alias("total_value"),
+        return df_order.groupBy("customer_name", "country", "order_date", "product_id", "product_category_id").agg(
+            sum("item_quantity").alias("total_quantity"),
+            sum("item_total").alias("total_value"),
+            countDistinct("order_id").alias("total_orders"),
         )
 
     def run(self):

diff --git a/src/template/job1_sdp/transforms.py b/src/template/job1_sdp/transforms.py
@@ -26,24 +26,25 @@ def enrich_order(df_customer: DataFrame, df_order: DataFrame, df_order_item: Dat
 
     Returns:
         Enriched DataFrame with columns:
-        name, country, id_customer, id_order, total, date, product_id, prod_category_id, seq, desc_item, qty, total_item
+        customer_name, country, customer_id, order_id, order_total, order_date, product_id,
+        product_category_id, item_seq, item_description, item_quantity, item_total
     """
     return (
         df_order_item.join(df_order, df_order_item["id_order"] == df_order["id"])
         .join(df_customer, df_order["id_customer"] == df_customer["id"])
         .select(
-            "name",
+            df_customer["name"].alias("customer_name"),
             "country",
-            "id_customer",
-            "id_order",
-            "total",
-            "date",
+            df_order["id_customer"].alias("customer_id"),
+            df_order_item["id_order"].alias("order_id"),
+            df_order["total"].alias("order_total"),
+            df_order["date"].cast("date").alias("order_date"),
             "product_id",
-            "prod_category_id",
-            "seq",
-            "desc_item",
-            "qty",
-            "total_item",
+            df_order["prod_category_id"].alias("product_category_id"),
+            df_order_item["seq"].alias("item_seq"),
+            df_order_item["desc_item"].alias("item_description"),
+            df_order_item["qty"].alias("item_quantity"),
+            df_order_item["total_item"].alias("item_total"),
         )
     )
 
@@ -58,10 +59,11 @@ def aggregate_orders(df_order_enriched: DataFrame) -> DataFrame:
         df_order_enriched: curated.order_enriched
 
     Returns:
-        DataFrame with columns: name, country, date, product_id, prod_category_id,
-        total_qty (LongType), total_value (DoubleType)
+        DataFrame with columns: customer_name, country, order_date, product_id,
+        product_category_id, total_quantity (LongType), total_value (DoubleType), total_orders (LongType)
     """
-    return df_order_enriched.groupBy("name", "country", "date", "product_id", "prod_category_id").agg(
-        F.sum("qty").alias("total_qty"),
-        F.sum("total_item").alias("total_value"),
+    return df_order_enriched.groupBy("customer_name", "country", "order_date", "product_id", "product_category_id").agg(
+        F.sum("item_quantity").alias("total_quantity"),
+        F.sum("item_total").alias("total_value"),
+        F.countDistinct("order_id").alias("total_orders"),
     )
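
Review note: in both `aggregate_orders` implementations, `total_orders` is a distinct count computed per `(customer_name, country, order_date, product_id, product_category_id)` group, and the dashboard tile then applies `SUM(total_orders)` across groups. If a single order can contain items for more than one product (an assumption about the data, not something this diff states), that sum can exceed the true number of distinct orders. The sketch below uses plain Python and made-up rows to illustrate the effect; it does not use Spark and is not the project's code.

```python
from collections import defaultdict

# Hypothetical silver rows:
# (order_id, customer_name, country, order_date, product_id, product_category_id)
rows = [
    (1, "Ana", "BR", "2026-06-01", 10, 1),
    (1, "Ana", "BR", "2026-06-01", 20, 2),  # same order, second product
    (2, "Bob", "US", "2026-06-01", 10, 1),
]

# Gold grain: collect the distinct order ids seen in each group.
orders_per_group = defaultdict(set)
for order_id, *group_key in rows:
    orders_per_group[tuple(group_key)].add(order_id)

# Equivalent of countDistinct("order_id") per group.
total_orders_by_group = {key: len(ids) for key, ids in orders_per_group.items()}

# What SUM(total_orders) over the gold rows would report.
summed_total_orders = sum(total_orders_by_group.values())

# What COUNT(DISTINCT order_id) over the silver rows would report.
distinct_orders = len({row[0] for row in rows})
```

Here order 1 falls into two groups, so the summed value is 3 while only 2 distinct orders exist. If orders never span multiple products, the two numbers agree and the tile is exact.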