I think this could be a good senior + junior project
Problem:
Every year we ask Data Integrity to review missing or suspicious condo characteristics before condo modeling begins. We document this in the recurring condo-chars-prep process. Currently, this process flags a few kinds of condo PINs that we recommend for review, including a QC flag for possible issues identified by our intern's work.
Our intern's work identifying outlier condo characteristics is a static Excel output.
But as condo characteristics update and change, we should update our list of outlier condo characteristics instead of relying on a static list.
Goal:
Let's formalize, automate, and document how we identify condos with outlier or missing data. We should use Caroline's code for outlier detection to construct a view of current condo units with outlier or missing data, so that it can be pulled on demand or as a recurring request.
This could look like:
- a new process in recurring-data-requests with code that detects missing data as well as characteristic outliers, leveraging Caroline's code. It should produce an Excel workbook for review and for easy re-ingestion into Athena.
- a new Athena view of condo units recommended for review, ready for export or transformation into a Tableau-delivered product?
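
To make the first option concrete, here is a rough sketch of the flagging step. It is only an illustration: Caroline's actual outlier method isn't described in this issue, so a plain IQR rule stands in for it, and the `pin` and `sqft` field names, the `flag_units_for_review` function, and the sample values are all hypothetical.

```python
from statistics import quantiles


def flag_units_for_review(units, field, k=1.5):
    """Return (pin, reason) pairs for units whose `field` is missing or an outlier.

    Assumption: an IQR rule is used as a placeholder for the real
    outlier-detection code, which this sketch does not reproduce.
    """
    observed = [u[field] for u in units if u.get(field) is not None]

    # With fewer than two observed values there is no spread to measure,
    # so only missing values can be flagged.
    if len(observed) >= 2:
        q1, _, q3 = quantiles(observed, n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    else:
        low, high = float("-inf"), float("inf")

    flags = []
    for unit in units:
        value = unit.get(field)
        if value is None:
            flags.append((unit["pin"], "missing"))
        elif value < low or value > high:
            flags.append((unit["pin"], "outlier"))
    return flags


# Hypothetical sample data, not real PINs or characteristics.
sample_units = [
    {"pin": "pin1", "sqft": 800},
    {"pin": "pin2", "sqft": 850},
    {"pin": "pin3", "sqft": 900},
    {"pin": "pin4", "sqft": 950},
    {"pin": "pin5", "sqft": 1000},
    {"pin": "pin6", "sqft": 1050},
    {"pin": "pin7", "sqft": 9000},
    {"pin": "pin8", "sqft": None},
]
review_list = flag_units_for_review(sample_units, "sqft")
```

Because the flags are recomputed from whatever units are passed in, rerunning this against current data would refresh the review list instead of relying on a static export. Writing the result to a workbook or exposing it as a view is left out here.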