Pennsylvania education #1890
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,111 @@ | ||||||||||||||||||||||||||||||
| #### Copyright 2025 Google LLC | ||||||||||||||||||||||||||||||
| #### | ||||||||||||||||||||||||||||||
| #### Licensed under the Apache License, Version 2.0 (the "License"); | ||||||||||||||||||||||||||||||
| #### you may not use this file except in compliance with the License. | ||||||||||||||||||||||||||||||
| #### | ||||||||||||||||||||||||||||||
| #### http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||||||||||||||||||||||
| #### | ||||||||||||||||||||||||||||||
| #### Unless required by applicable law or agreed to in writing, software | ||||||||||||||||||||||||||||||
| #### distributed under the License is distributed on an "AS IS" BASIS, | ||||||||||||||||||||||||||||||
| #### WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||||||||||||||||||||||||
| #### See the License for the specific language governing permissions and | ||||||||||||||||||||||||||||||
| #### limitations under the License. | ||||||||||||||||||||||||||||||
| ----- | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ## Pennsylvania_education Import | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| Dataset on education in Pennsylvania at the county level. | ||||||||||||||||||||||||||||||
| ----- | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| **Provenance Description:** | ||||||||||||||||||||||||||||||
| Data assets within this catalog are authored and maintained by individual Commonwealth agencies, which serve as the authoritative sources for their respective domains. The portal, managed by the Office of Administration, provides a transparent audit trail by documenting original publication dates, metadata update frequencies, and the specific departmental "stewards" responsible for the data's accuracy and integrity. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ### How to Use | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| The workflow for this data import involves two main steps: downloading the necessary files and then processing them. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| #### Step 1: Download the Data | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| - **Source:** [Pennsylvania_Education](https://data.pa.gov/browse?sortBy=relevance&page=1&pageSize=20) | ||||||||||||||||||||||||||||||
| - **Description:** The provided URL links to the Education data category within the Commonwealth of Pennsylvania’s open data repository. This portal serves as a centralized clearinghouse for public records, statistics, and geospatial data managed by the Pennsylvania Department of Education (PDE) and related agencies. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| To fetch the necessary data files, run the download script `download_script.py`. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| The download script saves the files listed below in the `input_files` folder. Within this folder, there are four sub-folders, each containing categorized data for both adults and children: | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| - educational_attainment_by_age_range_and_gender | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| - post_secondary_completions_total_awards_degrees | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| - public_school_enrollment_by_county_grade_and_race | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| - undergraduate_stem_enrollment | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ### Auto refresh Type | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| This import will be refreshed in a fully automated manner. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ----- | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| #### Step 2: Process the Files | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| After downloading the files, you can process them to generate the final output. To do this: | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| **Option A: Use the `run_processing.sh` script** | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| The `run_processing.sh` script automates the processing of all the downloaded files. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| **Run the following command:** | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||||||||
| sh run.sh | ||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| **Option B: Manually Execute the Processing Script** | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| You can also run the `stat_var_processor.py` script individually for each file. This script is located in the `data/tools/statvar_importer/` directory. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| Here are the specific commands for each file: | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||||||||
| python3 stat_var_processor.py | ||||||||||||||||||||||||||||||
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/educational_attainment_by_age_range_and_gender/*.csv" | ||||||||||||||||||||||||||||||
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_pvmap.csv" | ||||||||||||||||||||||||||||||
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | ||||||||||||||||||||||||||||||
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/educational_attainment_by_age_range_and_gender_output" | ||||||||||||||||||||||||||||||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||||||||||||||||||||||||||||||
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_places_resolver.csv" | ||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||
|
Comment on lines +73 to +80
Contributor
This command has two issues:
Suggested change
|
||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||||||||
| python3 stat_var_processor.py | ||||||||||||||||||||||||||||||
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/post_secondary_completions_total_awards_degrees/*.csv" | ||||||||||||||||||||||||||||||
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/post_secondary_completions_total_awards_degrees_pvmap.csv" | ||||||||||||||||||||||||||||||
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | ||||||||||||||||||||||||||||||
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/post_secondary_completions_total_awards_degrees_output" | ||||||||||||||||||||||||||||||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||||||||||||||||||||||||||||||
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/post_secondary_completions_total_awards_degrees_places_resolver.csv" | ||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||||||||
| python3 stat_var_processor.py | ||||||||||||||||||||||||||||||
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/public_school_enrollment_by_county_grade_and_race/*.csv" | ||||||||||||||||||||||||||||||
|
Contributor
The
Suggested change
|
||||||||||||||||||||||||||||||
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/public_school_enrollment_by_county_grade_and_race_pvmap.csv" | ||||||||||||||||||||||||||||||
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | ||||||||||||||||||||||||||||||
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/public_school_enrollment_by_county_grade_and_race_output" | ||||||||||||||||||||||||||||||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||||||||||||||||||||||||||||||
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/public_school_enrollment_by_county_grade_and_race_places_resolver.csv" | ||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||||||||
| python3 stat_var_processor.py | ||||||||||||||||||||||||||||||
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/undergraduate_stem_enrollment/*.csv" | ||||||||||||||||||||||||||||||
|
Contributor
The
Suggested change
|
||||||||||||||||||||||||||||||
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/undergraduate_stem_enrollment_pvmap.csv" | ||||||||||||||||||||||||||||||
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | ||||||||||||||||||||||||||||||
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/undergraduate_stem_enrollment_output" | ||||||||||||||||||||||||||||||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||||||||||||||||||||||||||||||
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/undergraduate_stem_enrollment_places_resolver.csv" | ||||||||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
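In the manual commands as shown in this diff, the argument lines carry no line-continuation backslashes and several end in an unmatched `"`. The following is a minimal sketch, not the author's tooling, of how the four invocations could be generated consistently. The flag names and relative paths are copied from the README; `BASE`, `DATASETS`, and `build_command` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper: only the flag names and relative paths come from the README.
BASE = "../../statvar_imports/pennsylvania/pennsylvania_education"

DATASETS = [
    "educational_attainment_by_age_range_and_gender",
    "post_secondary_completions_total_awards_degrees",
    "public_school_enrollment_by_county_grade_and_race",
    "undergraduate_stem_enrollment",
]


def build_command(name: str) -> str:
    """Return one stat_var_processor.py invocation as a multi-line shell command."""
    args = [
        "python3 stat_var_processor.py",
        f"--input_data={BASE}/input_files/{name}/*.csv",
        f"--pv_map={BASE}/{name}_pvmap.csv",
        f"--config_file={BASE}/common_metadata.csv",
        f"--output_path={BASE}/output_files/{name}_output",
        "--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf",
        f"--places_resolved_csv={BASE}/{name}_places_resolver.csv",
    ]
    # A trailing backslash on each line joins them into a single shell command.
    return " \\\n  ".join(args)


if __name__ == "__main__":
    for dataset in DATASETS:
        print(build_command(dataset), end="\n\n")
```

Generating the commands from one template keeps the four blocks from drifting apart; whether the `input_files/<name>/*.csv` layout matches what the download script produces is a separate question this sketch does not settle.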
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| parameter,value | ||
| #places_within, | ||
| output_columns,"observationAbout,observationDate,value,variableMeasured" | ||
| header_rows,1 | ||
| url,https://data.pa.gov/browse?sortBy=relevance&page=1&pageSize=20 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| import os | ||
| import requests | ||
|
|
||
| def download_file(url, output_path): | ||
| print(f'Downloading {url} to {output_path}...') | ||
| response = requests.get(url, stream=True) | ||
| response.raise_for_status() | ||
|
|
||
| os.makedirs(os.path.dirname(output_path), exist_ok=True) | ||
| with open(output_path, 'wb') as f: | ||
| for chunk in response.iter_content(chunk_size=8192): | ||
| f.write(chunk) | ||
| print('Download complete.') | ||
|
|
||
| def main(): | ||
| base_path = os.path.dirname(os.path.abspath(__file__)) | ||
| input_files_dir = os.path.join(base_path, 'input_files') | ||
|
|
||
| datasets = { | ||
| 'educational_attainment_by_age_range_and_gender': 'xwn6-8rmw', | ||
| 'post_secondary_completions_total_awards_degrees': 'jqcu-bcsg', | ||
| 'public_school_enrollment_by_county_grade_and_race': 'wb8u-h3s8', | ||
| 'undergraduate_stem_enrollment': 'r75w-4bue' | ||
| } | ||
|
|
||
| for folder, data_id in datasets.items(): | ||
| url = f'https://data.pa.gov/api/views/{data_id}/rows.csv?accessType=DOWNLOAD' | ||
| output_path = os.path.join(input_files_dir, f'{folder}.csv') | ||
| download_file(url, output_path) | ||
|
|
||
| if __name__ == '__main__': | ||
| main() |
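The script above writes each file to `input_files/<name>.csv`, whereas the README describes four sub-folders and its manual commands read `input_files/<name>/*.csv`. The manifest's `source_files` entry (`input_files/*.csv`) matches the flat layout, so the intended layout is not settled by this diff. Assuming the sub-folder layout is the one wanted, a minimal sketch of the path construction could look like this; `build_output_path` is a hypothetical name, not part of the PR.

```python
import os


def build_output_path(input_files_dir: str, folder: str) -> str:
    """Hypothetical helper: place each CSV inside a sub-folder named after its dataset."""
    return os.path.join(input_files_dir, folder, f"{folder}.csv")


if __name__ == "__main__":
    # Example with one of the dataset names used in the script's `datasets` dict.
    print(build_output_path("input_files", "undergraduate_stem_enrollment"))
```

Because `download_file` already calls `os.makedirs` on the parent directory of `output_path`, the sub-folder would be created on first use without further changes.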
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| key,,,p1,v1,p2,v2 | ||
| County FIPS Code,observationAbout,geoId/{Data},populationType,Person,statType,measuredValue | ||
| Total Population,measuredProperty,count,value,{Number},, | ||
| No High School Diploma,educationalAttainment,NoDiploma,value,{Number},, | ||
| High School Diploma Or Equivalent,educationalAttainment,HighSchoolDiplomaIncludesEquivalency,value,{Number},, | ||
| Some College No Degree,educationalAttainment,SomeCollegeNoDegree,value,{Number},, | ||
| Associate's Degree,educationalAttainment,AssociatesDegree,value,{Number},, | ||
| Bachelor's Degree,educationalAttainment,BachelorsDegree,value,{Number},, | ||
| Graduate or Professional Degree,educationalAttainment,GraduateOrProfessionalDegree,value,{Number},, | ||
| Total Post-Secondary Degrees,educationalAttainment,PostSecondaryDegree,value,{Number},, | ||
| Male,gender,Male,,,, | ||
| Female,gender,Female,,,, | ||
| 35 to 44 Years,age,[35 44 Years],,,, | ||
| 25 to 34 Years,age,[25 34 Years],,,, | ||
| 45 to 64 Years,age,[45 64 Years],,,, | ||
| ,,,,,, | ||
| 2010,observationDate,2010,,,, | ||
| 2011,observationDate,2011,,,, | ||
| 2012,observationDate,2012,,,, | ||
| 2013,observationDate,2013,,,, | ||
| 2014,observationDate,2014,,,, | ||
| 2015,observationDate,2015,,,, | ||
| 2016,observationDate,2016,,,, |
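To make the layout of the pvmap above easier to read, here is a small sketch that parses a few of its rows. It assumes the first column is the lookup key and the remaining columns alternate property and value, as the header `key,,,p1,v1,p2,v2` suggests; how `stat_var_processor.py` actually interprets the file is not shown in this diff. `SAMPLE` and `parse_pvmap` are hypothetical names.

```python
import csv
import io

# A few rows copied from the pvmap in this diff.
SAMPLE = """key,,,p1,v1,p2,v2
County FIPS Code,observationAbout,geoId/{Data},populationType,Person,statType,measuredValue
Male,gender,Male,,,,
25 to 34 Years,age,[25 34 Years],,,,
"""


def parse_pvmap(text: str) -> dict:
    """Map each key to a dict of its non-empty property/value pairs."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    pvmap = {}
    for row in rows:
        if not row or not row[0]:
            continue  # blank separator rows carry no mapping
        key, cells = row[0], row[1:]
        # Columns after the key are read pairwise: property, value, property, value, ...
        pairs = zip(cells[0::2], cells[1::2])
        pvmap[key] = {prop: val for prop, val in pairs if prop}
    return pvmap


if __name__ == "__main__":
    for key, pvs in parse_pvmap(SAMPLE).items():
        print(key, "->", pvs)
```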
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,34 @@ | ||||||
| { | ||||||
| "import_specifications": [ | ||||||
| { | ||||||
| "import_name": "Pennsylvania_Education", | ||||||
| "curator_emails": ["[email protected]"], | ||||||
| "provenance_url": "https://data.pa.gov/", | ||||||
| "provenance_description": "Dataset related to the pennsylvania's Education at country level.", | ||||||
|
Contributor
There are a couple of typos in the
Suggested change
|
||||||
| "scripts": ["download_script.py", "run_processing.sh"], | ||||||
| "source_files": [ | ||||||
| "input_files/*.csv" | ||||||
| ], | ||||||
| "import_inputs": [ | ||||||
| { | ||||||
| "template_mcf": "output_files/educational_attainment_by_age_range_and_gender/educational_attainment_by_age_range_and_gender_output.tmcf", | ||||||
| "cleaned_csv": "output_files/educational_attainment_by_age_range_and_gender/educational_attainment_by_age_range_and_gender_output.csv" | ||||||
| }, | ||||||
| { | ||||||
| "template_mcf": "output_files/post_secondary_completions_total_awards_degrees/post_secondary_completions_total_awards_degrees_output.tmcf", | ||||||
| "cleaned_csv": "output_files/post_secondary_completions_total_awards_degrees/post_secondary_completions_total_awards_degrees_output.csv" | ||||||
| }, | ||||||
| { | ||||||
| "template_mcf": "output_files/public_school_enrollment_by_county_grade_and_race/public_school_enrollment_by_county_grade_and_race_output.tmcf", | ||||||
| "cleaned_csv": "output_files/public_school_enrollment_by_county_grade_and_race/public_school_enrollment_by_county_grade_and_race_output.csv" | ||||||
| }, | ||||||
| { | ||||||
| "template_mcf": "output_files/undergraduate_stem_enrollment/undergraduate_stem_enrollment_output.tmcf", | ||||||
| "cleaned_csv": "output_files/undergraduate_stem_enrollment/undergraduate_stem_enrollment_output.csv" | ||||||
| } | ||||||
| ], | ||||||
| "cron_schedule": "0 02 * * 2" | ||||||
| } | ||||||
| ] | ||||||
| } | ||||||
|
|
||||||
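The manifest's `cron_schedule` value, `"0 02 * * 2"`, uses the standard five-field cron layout. The sketch below only labels those fields, assuming the conventional ordering (minute, hour, day of month, month, day of week, with 0 meaning Sunday); the time zone depends on the scheduler and is not stated in this diff. `describe_cron` is a hypothetical helper.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")
WEEKDAYS = ("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")


def describe_cron(expr: str) -> dict:
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    schedule = describe_cron("0 02 * * 2")
    # Both 0 and 7 conventionally denote Sunday, hence the modulo.
    weekday = WEEKDAYS[int(schedule["day_of_week"]) % 7]
    print(f"Runs at {int(schedule['hour']):02d}:{int(schedule['minute']):02d} every {weekday}")
```

Under those assumptions the import would refresh weekly, early on Tuesday morning.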
The command to run the processing script is incorrect. The script is named `run_processing.sh`, not `run.sh`.