[WIP] Testing#1048
Conversation
|
View diff at a glance SELECT
(SELECT count(*) FROM "z_ci_add_additional_mydec_sales_to_vw_pin_sale_default"."vw_pin_sale")
-
(SELECT count(*) FROM "default"."vw_pin_sale")
AS row_difference;
SELECT
COUNT(*) AS sales_matching_missing_list,
COUNT_IF(sale.sale_key IS NULL) AS from_injected_cte,
COUNT_IF(sale.sale_key IS NOT NULL) AS already_in_iasworld
FROM "z_ci_add_additional_mydec_sales_to_vw_pin_sale_default"."vw_pin_sale" AS sale
INNER JOIN "ccao"."additional_mydec_sales" AS ams
ON sale.doc_no = ams.doc_no
|
| WHEN md.line_5_instrument_type = 'Beneficial interest' | ||
| THEN '06' | ||
| ELSE '05' | ||
| END AS instrtyp, |
There was a problem hiding this comment.
To have the new mydec sales pass the sale_filter_deed_type gates, we grab values that the filter understands (01, 02...).
SELECT
md.line_5_instrument_type,
COUNT(*) AS n_all_mydec,
COUNT(ams.doc_no) AS n_in_injection_list
FROM "sale"."mydec" AS md
LEFT JOIN "ccao"."additional_mydec_sales" AS ams
ON REPLACE(md.document_number, 'D', '') = ams.doc_no
GROUP BY md.line_5_instrument_type
ORDER BY n_all_mydec DESC| # | line_5_instrument_type | n_all_mydec | n_in_injection_list |
|---|---|---|---|
| 1 | Warranty Deed | 706392 | 8304 |
| 2 | (NULL) | 396265 | 2 |
| 3 | Trustee Deed | 127690 | 1548 |
| 4 | Special Warranty Deed | 118141 | 1668 |
| 5 | Quit Claim Deed | 16292 | 137 |
| 6 | Judicial Sale | 12217 | 89 |
| 7 | Executor Deed | 8613 | 84 |
| 8 | Administrator's Deed | 6637 | 112 |
| 9 | Deed in Trust | 5330 | 53 |
| 10 | Other | 4240 | 44 |
| 11 | Beneficial interest | 1523 | 15 |
| 12 | Guardian's Deed | 932 | 9 |
| 13 | Limited Warranty Deed | 811 | 9 |
| 14 | Judge's Deed | 551 | 4 |
| 15 | Sheriff's Deed | 361 | 5 |
| . . . |
| -- Non-multisale mydec sales are single-parcel by definition | ||
| CAST(1 AS DECIMAL(4, 0)) AS nopar, | ||
| NULLIF(TRIM(md.seller_name), '') AS oldown, | ||
| NULLIF(TRIM(md.buyer_name), '') AS own1, |
There was a problem hiding this comment.
buyer and seller name technically needed for sales val outputs
| AND cur = 'Y' | ||
| ) AS ias | ||
| ON REPLACE(md.document_number, 'D', '') = ias.doc_no | ||
| WHERE NOT md.is_multisale |
There was a problem hiding this comment.
Multi-sale exclusion is done here, then we put '1' as nopar for compatibility
| FROM {{ source('iasworld', 'sales') }} | ||
| WHERE deactivat IS NULL | ||
| AND cur = 'Y' | ||
| ) AS ias |
There was a problem hiding this comment.
Match ias.doc_No onto the inner joined mydec sales and then we filter AND ias.doc_no IS NULL to remove sales that already are present in iasworld
| NULLIF(TRIM(md.buyer_name), '') AS own1, | ||
| CAST(NULL AS VARCHAR) AS saletype, | ||
| CAST(NULL AS VARCHAR) AS deactivat, | ||
| 'Y' AS cur |
There was a problem hiding this comment.
more stub columns so that the downstream logic doesn't break
| -- Constructing the rows here, upstream of all other logic, means the | ||
| -- additional sales flow through the same dedupe windows and filters as | ||
| -- iasworld sales | ||
| sales_unioned AS ( |
There was a problem hiding this comment.
The idea here is to put mydec sales into the iasworld structure upstream of all the filters
| 'Administrator''s Deed', | ||
| 'Guardian''s Deed' |
There was a problem hiding this comment.
I think this is the right filter for these but I'm not positive
There was a problem hiding this comment.
Okay, this is genuinely great work. Super impressed at the attention to detail you put in here, you caught stuff I probably wouldn't even have. I just have two minor requests:
-
Can we move removing the 'D' in
ccao.additional_mydec_sales.document_numberinto the warehouse script like we do for the normal mydec sales and out ofdefault.vw_pin_sale? Just for the sake of consistency. -
Can we double check that we've mapped deed types as thoroughly as possible? I think using something like the query below could help investigate how it's being done currently:
select distinct sales.instrtyp,
mydec.line_5_instrument_type
from iasworld.sales
left join sale.mydec on sales.instruno = mydec.document_number
where sales.cur = 'Y'
and sales.deactivat is null
and sales.instrtyp in ('01', '02', '03', '04', '05', '06', '99')
and trim(mydec.line_5_instrument_type) != ''
and substr(sales.saledt, 1, 4) = '2025'
order by sales.instrtypThere's definitely some overlap in how these map to each other, but we can just use our better judgement to decide how they should map in those cases.
Co-authored-by: William Ridgeway <10358980+wrridgeway@users.noreply.github.com>
wrridgeway
left a comment
There was a problem hiding this comment.
This looks great! Thanks @wagnerlmichael. We may need to update the deed type mapping if Derrick informs us of anything new, but otherwise I think we're good to do.
I'll drop a bit more review here. Checking to make sure we only have row diffs for 2025:
with old as (
select year,
count(*) as old_count
from default.vw_pin_sale
group by year
),
new as (
select year,
count(*) as new_count
from z_ci_add_additional_mydec_sales_to_vw_pin_sale_default.vw_pin_sale
group by year
)
select old.*,
new.new_count,
abs(old.old_count - new.new_count) as diff
from old
left join new on old.year = new.year
where abs(old.old_count - new.new_count) != 0| year | old_count | new_count | diff |
|---|---|---|---|
| 2025 | 66131 | 76013 | 9882 |
Checking we're still unique by doc_no for non-multisales:
select doc_no,
count(*)
from z_ci_add_additional_mydec_sales_to_vw_pin_sale_default.vw_pin_sale
where not is_multisale
group by doc_no
having count(*) > 1returns nothing.
jeancochrane
left a comment
There was a problem hiding this comment.
A few thoughts on the dbt architecture here! I know dbt refs and sources can be confusing, so I'm happy to chat it out if it would be faster.
I suspect you may just be waiting until later on to add more context, but to be clear: Even though this work is likely temporary and won't get merged to the main branch, we should still give the PR a clear title and description so that we can find this PR in the future and remind ourselves what we did and why we did it.
| LEFT JOIN mydec_sales | ||
| ON unique_sales.doc_no = mydec_sales.doc_no | ||
| LEFT JOIN {{ ref('sale.vw_outlier') }} AS outlier | ||
| LEFT JOIN z_ci_add_additional_mydec_sales_to_vw_pin_sale_sale.vw_outlier AS outlier |
There was a problem hiding this comment.
[Question, non-blocking] When the build-and-test-dbt GitHub workflow builds a staging version of default.vw_pin_sale using the new view definition in this branch, it should automatically resolve {{ ref('sale.vw_outlier') }} to z_ci_add_additional_mydec_sales_to_vw_pin_sale_sale.vw_outlier, since dbt will be running in a CI environment on the add-additional-mydec-sales-to-vw-pin-sale branch. So I wouldn't expect you to have to make this change in order for default.vw_pin_sale to build correctly. Is that not the case here?
There was a problem hiding this comment.
Ah, right. I made the z_dev_miwagne_sale.flag AS sf change first and then just edited the other two in the same way without thinking very hard. I'm sure that is how it will work when I switch it back
| sf.run_id, | ||
| sf.version | ||
| FROM {{ source('sale', 'flag') }} AS sf | ||
| FROM z_dev_miwagne_sale.flag AS sf |
There was a problem hiding this comment.
[Thought, non-blocking] In contrast to my above comment, this hardcoded schema reference is necessary in order for the build to work, for two reasons:
sale.flagis a dbt source (i.e. it's managed outside of the dbt DAG) so{{ source('sale', 'flag') }}will always resolve to thesaleschema, even when dbt runs in a dev or CI environmentz_dev_miwagne_saleis a separate dev environment, so even ifsale.flagwere defined as a dbt model and managed in the dbt DAG, dbt would resolve{{ ref('sale.flag') }}to thez_ci_add_additional_mydec_sales_to_vw_pin_sale_saleschema, notz_dev_miwagne_sale
You may know this already, but I figured I'd drop a comment to make it extra clear!
There was a problem hiding this comment.
2 is very clarifying - that the managed DAG that build-and-test-dbt uses for staging vs prod assets is a separate thing from our hardwired sales val dev environment setup. Is that roughly correct?
| 'has_characteristic_change' VALUE has_characteristic_change | ||
| ) AS review_json -- noqa: enable=CP02,RF02 | ||
| FROM {{ ref('sale.vw_flag') }} AS flag | ||
| FROM z_ci_add_additional_mydec_sales_to_vw_pin_sale_sale.vw_flag AS flag |
There was a problem hiding this comment.
[Question, non-blocking] This ref is the same as the one I commented on in default.vw_pin_sale: In theory, ref() should resolve correctly here, so my question stands about why we need to hardcode the schema name.
There was a problem hiding this comment.
Follow up question - will the ref resolve correctly pointing to the dev table even when there is no diff to the file itself?
|
Just to be clear - I approved the original changes to |
No description provided.