-
Notifications
You must be signed in to change notification settings - Fork 132
Adding files #1889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding files #1889
Conversation
Summary of ChangesHello @smarthg-gi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes a new, automated data ingestion pipeline for key educational statistics pertaining to Pennsylvania. It integrates multiple datasets from the state's open data repository, streamlining the process of acquiring and structuring information on student demographics, academic achievements, and higher education enrollment, particularly in STEM fields. The setup ensures consistent and repeatable data processing for future updates. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new data import for Pennsylvania education data. I've found several issues related to file paths, data correctness, and typos in configuration and documentation files that need to be addressed. The most critical issue is extraneous data in a pvmap file, which will lead to processing failures. There are also several incorrect paths in the README and manifest files. Additionally, the directory name pennsylvania_education_for_git_upload seems temporary and should probably be renamed to pennsylvania_education for consistency.
| ,,,,,,,,,,,,geoId/4202758800 | ||
| ,,,,,,,,,,,, | ||
| ,,,,,,,,,,,,wikidataId/Q2834810 | ||
| ,,,,,,,,,,,,geoId/4231200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/educational_attainment_by_age_range_and_gender/*.csv" | ||
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_pvmap.csv" | ||
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | ||
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/educational_attainment_by_age_range_and_gender_output" | ||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_places_resolver.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file paths in this command block (and subsequent ones) are incorrect. They refer to a directory pennsylvania_education, but the actual directory is pennsylvania_education_for_git_upload. This will cause the commands to fail. Please update all paths in this README to use the correct directory name.
Additionally, there is a trailing double quote " at the end of each path which should be removed.
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education/input_files/educational_attainment_by_age_range_and_gender/*.csv" | |
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_pvmap.csv" | |
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education/common_metadata.csv" | |
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education/output_files/educational_attainment_by_age_range_and_gender_output" | |
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | |
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education/educational_attainment_by_age_range_and_gender_places_resolver.csv" | |
| --input_data=../../statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/input_files/educational_attainment_by_age_range_and_gender/*.csv | |
| --pv_map=../../statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/educational_attainment_by_age_range_and_gender_pvmap.csv | |
| --config_file=../../statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/common_metadata.csv | |
| --output_path=../../statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/output_files/educational_attainment_by_age_range_and_gender_output | |
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | |
| --places_resolved_csv=../../statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/educational_attainment_by_age_range_and_gender_places_resolver.csv |
| #places_within, | ||
| output_columns,"observationAbout,observationDate,value,variableMeasured" | ||
| header_rows,1 | ||
| url,https://data.pa.gov/-/Educational-Attainment-by-Age-Range-and-Gender-200/xwn6-8rmw/about_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This common_metadata.csv file contains a URL specific to the 'Educational Attainment by Age Range and Gender' dataset. Since this is a common configuration file that will be used for multiple datasets, having a specific URL is incorrect as it will be applied to all of them. Please consider moving dataset-specific metadata to their respective pvmap files or creating separate config files if needed.
| "template_mcf": "output_files/educational_attainment_by_age_range_and_gender_output.tmcf", | ||
| "cleaned_csv": "output_files/educational_attainment_by_age_range_and_gender_output.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The paths for template_mcf and cleaned_csv appear to be incorrect. Based on the run_processing.sh script, the output files are placed inside subdirectories corresponding to the dataset name (e.g., output_files/educational_attainment_by_age_range_and_gender/). The paths in the manifest should reflect this structure. This issue applies to all import_inputs entries.
| "template_mcf": "output_files/educational_attainment_by_age_range_and_gender_output.tmcf", | |
| "cleaned_csv": "output_files/educational_attainment_by_age_range_and_gender_output.csv" | |
| "template_mcf": "output_files/educational_attainment_by_age_range_and_gender/educational_attainment_by_age_range_and_gender_output.tmcf", | |
| "cleaned_csv": "output_files/educational_attainment_by_age_range_and_gender/educational_attainment_by_age_range_and_gender_output.csv" |
|
|
||
| ## Pennsylvania_education Import | ||
|
|
||
| Dataset related to the pennsylvania's Education at country level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of issues in the description:
- There's a typo:
pennsylvania'sshould bePennsylvania's. - The data seems to be at the county level, not
countrylevel. Please correct this to avoid confusion.
| Dataset related to the pennsylvania's Education at country level. | |
| Dataset related to Pennsylvania's Education at county level. |
| All downloaded files will be located in the `input_files` folder. Within this folder, there are six sub-folders, each containing categorized data for both adults and children: | ||
|
|
||
| - educational_attainment_by_age_range_and_gender | ||
|
|
||
| - post_secondary_completions_total_awards_degrees | ||
|
|
||
| - public_school_enrollment_by_county_grade_and_race | ||
|
|
||
| - undergraduate_stem_enrollment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/README.md
Outdated
Show resolved
Hide resolved
statvar_imports/pennsylvania/pennsylvania_education_for_git_upload/manifest.json
Outdated
Show resolved
Hide resolved
…load/manifest.json Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…load/README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
No description provided.