14 changes: 14 additions & 0 deletions dbt/models/ccao/docs.md
@@ -1,3 +1,17 @@
# additional_mydec_sales

{% docs table_additional_mydec_sales %}
Document numbers for sales that were recorded in MyDec but never made it into
`iasworld.sales`. Used to inject these sales into `default.vw_pin_sale`
directly from `sale.mydec`.

One-time list generated in June 2026 from the Summary tab of an internal
"Missing Sales" spreadsheet. Loaded by the `ccao-additional_mydec_sales.R`
warehouse script.

**Primary Key**: `doc_no`
{% enddocs %}

# cc_dli_senfrr

{% docs table_cc_dli_senfrr %}
3 changes: 3 additions & 0 deletions dbt/models/ccao/schema.yml
@@ -7,6 +7,9 @@ sources:
    tags:
      - load_manual
    tables:
      - name: additional_mydec_sales
        description: '{{ doc("table_additional_mydec_sales") }}'

      - name: commercial_valuation
        description: '{{ doc("table_commercial_valuation") }}'
        tags:
74 changes: 70 additions & 4 deletions dbt/models/default/default.vw_pin_sale.sql
@@ -19,6 +19,71 @@ WITH town_class AS (
AND par.deactivat IS NULL
),

-- Doc nos for the additional mydec sales come from the
-- ccao.additional_mydec_sales table. Constructing the rows here, upstream
-- of all other logic, means the additional sales flow through the same
-- dedupe windows and filters as iasworld sales
sales_unioned AS (

Member Author

The idea here is to put mydec sales into the iasworld structure upstream of all the filters
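
A minimal sketch of why unioning upstream matters, using Python's built-in `sqlite3` rather than the production warehouse. The table names `iasworld_sales` and `extra_sales` and the sample rows are hypothetical, and the max-price window is a simplified stand-in for the real dedupe logic. It also shows why the injected `saledt` has to match the iasworld string format: the window partitions on the raw string.

```python
import sqlite3

# Simplified stand-in for the dedupe window that runs after the union
DEDUPE_SQL = """
    WITH sales_unioned AS (
        SELECT parid, saledt, price FROM iasworld_sales
        UNION ALL
        SELECT parid, saledt, price FROM extra_sales
    ),
    ranked AS (
        SELECT
            parid,
            price,
            ROW_NUMBER() OVER (
                PARTITION BY parid, saledt ORDER BY price DESC
            ) AS max_price
        FROM sales_unioned
    )
    SELECT parid, price FROM ranked WHERE max_price = 1 ORDER BY parid, price
"""


def surviving_sales(extra_saledt):
    """Return (parid, price) rows left after the max-price dedupe."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE iasworld_sales (parid TEXT, saledt TEXT, price INTEGER)"
    )
    conn.execute(
        "CREATE TABLE extra_sales (parid TEXT, saledt TEXT, price INTEGER)"
    )
    conn.execute(
        "INSERT INTO iasworld_sales VALUES "
        "('A', '2024-01-05 00:00:00.000', 100000)"
    )
    # Hypothetical injected sale for the same pin on the same day
    conn.execute(
        "INSERT INTO extra_sales VALUES ('A', ?, 120000)", (extra_saledt,)
    )
    rows = conn.execute(DEDUPE_SQL).fetchall()
    conn.close()
    return rows


# Same string format as iasworld: the two rows share a partition
matched = surviving_sales("2024-01-05" + " 00:00:00.000")
# Date-only string: the rows land in different partitions and both survive
unmatched = surviving_sales("2024-01-05")
print(matched)
print(unmatched)
```

With the matching format only the higher-priced row survives; with the date-only string both rows pass the window, which is the duplicate the `CONCAT` in the diff is meant to avoid.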

    SELECT
        sales.parid,
        sales.saledt,
        sales.price,
        sales.salekey,
        sales.instruno,
        sales.instrtyp,
        sales.nopar,
        sales.oldown,
        sales.own1,
        sales.saletype,
        sales.deactivat,
        sales.cur
    FROM {{ source('iasworld', 'sales') }} AS sales
    UNION ALL
    SELECT
        REPLACE(md.line_1_primary_pin, '-', '') AS parid,
        -- Match the exact iasworld saledt string format so that the
        -- dedupe windows below, which partition on raw saledt, treat
        -- same-day sales from both sources as equal
        CONCAT(md.line_4_instrument_date, ' 00:00:00.000') AS saledt,
        CAST(md.line_11_full_consideration AS DECIMAL(10, 0)) AS price,
        CAST(NULL AS DECIMAL(8, 0)) AS salekey,
        md.document_number AS instruno,
        -- Map mydec instrument types to iasworld deed type codes (see
        -- the sale.deed_type seed) so that these sales pass downstream
        -- deed type filters the same way iasworld sales do
        CASE
            WHEN md.line_5_instrument_type = 'Warranty Deed' THEN '01'
            WHEN md.line_5_instrument_type = 'Trustee Deed' THEN '02'
            WHEN md.line_5_instrument_type = 'Quit Claim Deed' THEN '03'
            WHEN md.line_5_instrument_type = 'Executor Deed' THEN '04'
            WHEN md.line_5_instrument_type = 'Beneficial interest'
                THEN '06'
            ELSE '05'
        END AS instrtyp,

@wagnerlmichael wagnerlmichael Jun 12, 2026

Member Author

To have the new mydec sales pass the sale_filter_deed_type gates, we grab values that the filter understands (01, 02...).

SELECT
      md.line_5_instrument_type,
      COUNT(*) AS n_all_mydec,
      COUNT(ams.doc_no) AS n_in_injection_list
  FROM "sale"."mydec" AS md
  LEFT JOIN "ccao"."additional_mydec_sales" AS ams
      ON REPLACE(md.document_number, 'D', '') = ams.doc_no
  GROUP BY md.line_5_instrument_type
  ORDER BY n_all_mydec DESC
| # | line_5_instrument_type | n_all_mydec | n_in_injection_list |
|---|---|---|---|
| 1 | Warranty Deed | 706392 | 8304 |
| 2 | (NULL) | 396265 | 2 |
| 3 | Trustee Deed | 127690 | 1548 |
| 4 | Special Warranty Deed | 118141 | 1668 |
| 5 | Quit Claim Deed | 16292 | 137 |
| 6 | Judicial Sale | 12217 | 89 |
| 7 | Executor Deed | 8613 | 84 |
| 8 | Administrator's Deed | 6637 | 112 |
| 9 | Deed in Trust | 5330 | 53 |
| 10 | Other | 4240 | 44 |
| 11 | Beneficial interest | 1523 | 15 |
| 12 | Guardian's Deed | 932 | 9 |
| 13 | Limited Warranty Deed | 811 | 9 |
| 14 | Judge's Deed | 551 | 4 |
| 15 | Sheriff's Deed | 361 | 5 |

. . .
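
As a rough illustration of how the mapping interacts with the deed type filter, here is a small Python sketch. The dictionary mirrors the `CASE` expression in this diff and the filter mirrors the `instrtyp IN ('03', '04', '06') OR instrtyp IS NULL` condition visible in `unique_sales`; the function names are hypothetical and this is not the production logic.

```python
# Codes copied from the CASE expression in sales_unioned
DEED_TYPE_CODES = {
    "Warranty Deed": "01",
    "Trustee Deed": "02",
    "Quit Claim Deed": "03",
    "Executor Deed": "04",
    "Beneficial interest": "06",
}

# Codes that the visible sale_filter_deed_type condition flags
FILTERED_CODES = {"03", "04", "06"}


def to_instrtyp(instrument_type):
    """Map a mydec instrument type to a deed type code, defaulting to '05'."""
    return DEED_TYPE_CODES.get(instrument_type, "05")


def sale_filter_deed_type(instrtyp):
    """Return True when a sale would be flagged by the deed type filter."""
    return instrtyp is None or instrtyp in FILTERED_CODES


for label in ["Warranty Deed", "Special Warranty Deed", "Quit Claim Deed", None]:
    code = to_instrtyp(label)
    print(label, code, sale_filter_deed_type(code))
```

One consequence of the `ELSE '05'` branch, as written: a NULL `line_5_instrument_type` maps to `'05'` and so is not flagged, whereas a NULL `instrtyp` coming from iasworld is.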

        -- Non-multisale mydec sales are single-parcel by definition
        CAST(1 AS DECIMAL(4, 0)) AS nopar,
        NULLIF(TRIM(md.seller_name), '') AS oldown,
        NULLIF(TRIM(md.buyer_name), '') AS own1,

Member Author

Buyer and seller names are technically needed for the sales val outputs

        CAST(NULL AS VARCHAR) AS saletype,
        CAST(NULL AS VARCHAR) AS deactivat,
        'Y' AS cur

@wagnerlmichael wagnerlmichael Jun 15, 2026

Member Author

more stub columns so that the downstream logic doesn't break

    FROM {{ source('sale', 'mydec') }} AS md
    INNER JOIN {{ source('ccao', 'additional_mydec_sales') }} AS ams
        ON REPLACE(md.document_number, 'D', '') = ams.doc_no
    -- Exclude doc nos already live in iasworld so we keep parity with
    -- prod for sales that were ingested normally
    LEFT JOIN (
        SELECT DISTINCT NULLIF(REPLACE(instruno, 'D', ''), '') AS doc_no
        FROM {{ source('iasworld', 'sales') }}
        WHERE deactivat IS NULL
            AND cur = 'Y'
    ) AS ias

Member Author

Left join `ias.doc_no` onto the inner-joined mydec sales, then filter on `AND ias.doc_no IS NULL` to remove sales that are already present in iasworld
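
The anti-join pattern can be sketched with Python's built-in `sqlite3`. The table contents and the `deactivat` value are made up for illustration, the tables are plain stand-ins for the dbt sources, and the join keys here contain no `D` suffix on the mydec side, so this only illustrates the left join plus `IS NULL` filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mydec (document_number TEXT);
    CREATE TABLE additional_mydec_sales (doc_no TEXT);
    CREATE TABLE iasworld_sales (instruno TEXT, deactivat TEXT, cur TEXT);

    INSERT INTO mydec VALUES ('1001'), ('1002'), ('1003'), ('1004');
    INSERT INTO additional_mydec_sales VALUES ('1001'), ('1002'), ('1003');
    INSERT INTO iasworld_sales VALUES
        ('1001D', NULL, 'Y'),          -- live in iasworld
        ('1002', '2024-01-01', 'Y');   -- deactivated, so not live
""")

injected = [
    row[0]
    for row in conn.execute("""
        SELECT md.document_number
        FROM mydec AS md
        INNER JOIN additional_mydec_sales AS ams
            ON md.document_number = ams.doc_no
        LEFT JOIN (
            SELECT DISTINCT NULLIF(REPLACE(instruno, 'D', ''), '') AS doc_no
            FROM iasworld_sales
            WHERE deactivat IS NULL
                AND cur = 'Y'
        ) AS ias
            ON md.document_number = ias.doc_no
        -- Keep only rows with no live iasworld match
        WHERE ias.doc_no IS NULL
        ORDER BY md.document_number
    """)
]
conn.close()
print(injected)
```

In this toy data, `1001` is dropped because it is live in iasworld, `1004` is dropped because it is not on the injection list, and `1002` is kept because its only iasworld row is deactivated.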

        ON REPLACE(md.document_number, 'D', '') = ias.doc_no
    WHERE NOT md.is_multisale

@wagnerlmichael wagnerlmichael Jun 12, 2026

Member Author

Multi-sale exclusion is done here, then we put '1' as nopar for compatibility

        AND md.line_11_full_consideration IS NOT NULL
        AND ias.doc_no IS NULL
),

-- "nopar" isn't entirely accurate for sales associated with only one parcel,
-- so we create our own counter
calculated AS (
@@ -29,7 +94,7 @@ calculated AS (
SELECT DISTINCT
parid,
NULLIF(REPLACE(instruno, 'D', ''), '') AS instruno
- FROM {{ source('iasworld', 'sales') }}
+ FROM sales_unioned
WHERE deactivat IS NULL
AND cur = 'Y'
)
@@ -128,7 +193,7 @@ unique_sales AS (
sales.instrtyp IN ('03', '04', '06') OR sales.instrtyp IS NULL,
FALSE
) AS sale_filter_deed_type
- FROM {{ source('iasworld', 'sales') }} AS sales
+ FROM sales_unioned AS sales
LEFT JOIN calculated
ON NULLIF(REPLACE(sales.instruno, 'D', ''), '')
= calculated.instruno
@@ -145,14 +210,15 @@ unique_sales AS (
)
AND tc.township_code IS NOT NULL
AND sales.price IS NOT NULL
- )
+ ) AS sales_calculated
-- Only use max price by pin/sale date
WHERE max_price = 1
AND (bad_doc_no = 1 OR is_multisale = TRUE)
),

mydec_sales AS (
- SELECT * FROM (
+ SELECT *
+ FROM (
SELECT
REPLACE(document_number, 'D', '') AS doc_no,
REPLACE(line_1_primary_pin, '-', '') AS pin,
4 changes: 2 additions & 2 deletions dbt/models/sale/sale.vw_flag.sql
@@ -11,12 +11,12 @@ SELECT
sf.sv_outlier_reason3,
sf.run_id,
sf.version
- FROM {{ source('sale', 'flag') }} AS sf
+ FROM z_dev_miwagne_sale.flag AS sf

Member

[Thought, non-blocking] In contrast to my above comment, this hardcoded schema reference is necessary in order for the build to work, for two reasons:

  1. sale.flag is a dbt source (i.e. it's managed outside of the dbt DAG) so {{ source('sale', 'flag') }} will always resolve to the sale schema, even when dbt runs in a dev or CI environment
  2. z_dev_miwagne_sale is a separate dev environment, so even if sale.flag were defined as a dbt model and managed in the dbt DAG, dbt would resolve {{ ref('sale.flag') }} to the z_ci_add_additional_mydec_sales_to_vw_pin_sale_sale schema, not z_dev_miwagne_sale

You may know this already, but I figured I'd drop a comment to make it extra clear!

Member Author

2 is very clarifying - that the managed DAG that build-and-test-dbt uses for staging vs prod assets is a separate thing from our hardwired sales val dev environment setup. Is that roughly correct?

Member

Yup, that's right!
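
A toy Python model of the two points in this thread, for readers less familiar with the distinction. It is not dbt's API: `resolve_source` and `resolve_ref` are hypothetical helpers, and the prefixing rule is inferred only from the CI schema name quoted above.

```python
CI_PREFIX = "z_ci_add_additional_mydec_sales_to_vw_pin_sale"
DEV_TABLE = "z_dev_miwagne_sale.flag"


def resolve_source(schema, table, env_prefix=None):
    """Sources live outside the DAG, so the environment prefix is ignored."""
    return f"{schema}.{table}"


def resolve_ref(schema, table, env_prefix=None):
    """Models are built per environment, so the prefix is applied when set."""
    if env_prefix is None:
        return f"{schema}.{table}"
    return f"{env_prefix}_{schema}.{table}"


source_name = resolve_source("sale", "flag", env_prefix=CI_PREFIX)
ref_name = resolve_ref("sale", "flag", env_prefix=CI_PREFIX)
print(source_name)
print(ref_name)
print(DEV_TABLE in (source_name, ref_name))
```

Under this simplified model neither resolution path lands on `z_dev_miwagne_sale.flag`, which is why the hardcoded reference is needed for this build.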

INNER JOIN (
SELECT
meta_sale_document_num,
MAX(version) AS max_version
- FROM {{ source('sale', 'flag') }}
+ FROM z_dev_miwagne_sale.flag
GROUP BY meta_sale_document_num
) AS mv
ON sf.meta_sale_document_num = mv.meta_sale_document_num
4 changes: 2 additions & 2 deletions dbt/models/sale/schema.yml
@@ -274,7 +274,7 @@ models:
equals: >
(
SELECT COUNT(DISTINCT(meta_sale_document_num))
- FROM {{ source('sale', 'flag') }}
+ FROM z_dev_miwagne_sale.flag
)
- unique_combination_of_columns:
name: sale_vw_flag_unique_by_doc_no
@@ -430,7 +430,7 @@ models:
SELECT COUNT(*)
FROM (
SELECT DISTINCT(meta_sale_document_num) AS doc_no
- FROM {{ source('sale', 'flag') }}
+ FROM z_dev_miwagne_sale.flag
UNION
SELECT DISTINCT(doc_no) AS doc_no
FROM {{ source('sale', 'flag_review') }}
@@ -0,0 +1,20 @@
library(arrow)
library(dplyr)
library(openxlsx)

# Declare output paths
AWS_S3_WAREHOUSE_BUCKET <- Sys.getenv("AWS_S3_WAREHOUSE_BUCKET")
output_bucket <- file.path(
  AWS_S3_WAREHOUSE_BUCKET,
  "ccao", "other", "additional_mydec_sales"
)

input_file <- "O:/CCAODATA/data/additional_mydec_sales/Missing Sales.xlsx"

openxlsx::read.xlsx(input_file, sheet = "Summary") %>%
  select(doc_no = `203.Document.Number`) %>%
  mutate(
    doc_no = gsub("\\D", "", as.character(doc_no)),
    loaded_at = as.character(Sys.time())
  ) %>%
  write_parquet(file.path(output_bucket, "additional_mydec_sales.parquet"))
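
For reference, the `gsub("\\D", "", ...)` step keeps only digits, which is what lets `doc_no` line up with document numbers that have had their `D` removed elsewhere in this PR. A Python equivalent of that one step might look like the sketch below; `clean_doc_no` is a hypothetical name, the sample values are invented, and it assumes the input is already a string.

```python
import re


def clean_doc_no(raw):
    """Strip every non-digit character, mirroring gsub("\\\\D", "", x) in R."""
    return re.sub(r"\D", "", raw)


samples = ["2415302045D", " 2415302045 ", "24-153-02045"]
cleaned = [clean_doc_no(value) for value in samples]
print(cleaned)
```

All three sample strings reduce to the same digit-only key, so spreadsheet formatting differences do not affect the join.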