Skip to content

Import workflow fixes#1886

Open
vish-cs wants to merge 1 commit intodatacommonsorg:masterfrom
vish-cs:itest
Open

Import workflow fixes#1886
vish-cs wants to merge 1 commit intodatacommonsorg:masterfrom
vish-cs:itest

Conversation

@vish-cs
Copy link
Contributor

@vish-cs vish-cs commented Feb 10, 2026

  • Use PENDING state to track imports with failed ingestion for retry
  • get_import_list support importList as override and retry params
  • Import helper invokes update_import_version for version tracking
  • Handle duplicate events in CDA feed handler
  • Improved handling for import_summary.json and import_metadata_mcf.mcf
  • Clean up next_refresh handling

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @vish-cs, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the reliability and flexibility of the import and ingestion automation workflows. Key changes include more explicit error handling, refined import status management to support retries, and enhanced data persistence for import summaries. These updates aim to make the system more resilient to failures and provide better control over the import process.

Highlights

  • Enhanced Error Handling and Status Management: Improved error propagation in import helper functions and introduced a 'RETRY' status for Dataflow job failures, allowing for more robust workflow execution.
  • Refactored Import Status Updates: The update_import_status function in the import helper now explicitly handles both status and version updates, and the handle_feed_event logic was streamlined to use these new capabilities.
  • Flexible Import List Retrieval: The Spanner client's get_import_list method was enhanced to support filtering by a specific list of imports and to optionally include imports with a 'RETRY' status, improving control over which imports are processed.
  • Import Summary Persistence: Added functionality to update and persist import summaries in Google Cloud Storage, ensuring that import metadata is consistently maintained.
  • Standardized Staging Version: The 'staging' version string was standardized to 'STAGING' across the workflow and helper functions for consistency.
Changelog
  • import-automation/workflow/import-automation-workflow.yaml
    • Updated the 'staging' version string to 'STAGING' for consistency.
  • import-automation/workflow/import-helper/main.py
    • Added raise statements to exception blocks in invoke_ingestion_workflow and update_import_status.
    • Modified update_import_status to accept individual parameters and construct the request body internally.
    • Introduced a separate call within update_import_status to update the import version after status update.
    • Refactored handle_feed_event to use the updated update_import_status signature and added a try-except block for event processing.
  • import-automation/workflow/ingestion-helper/README.md
    • Added 'status' as a required parameter for the update_ingestion_status action in the documentation.
  • import-automation/workflow/ingestion-helper/import_utils.py
    • Simplified the parsing of next_refresh and ensured croniter results are in ISO format.
  • import-automation/workflow/ingestion-helper/main.py
    • Modified get_import_list to accept an includeRetry parameter.
    • Updated update_ingestion_status to include 'status' in validation and pass it to the Spanner client.
    • Added a call to storage.update_import_summary when updating import status.
    • Standardized the 'staging' version check to 'STAGING' in update_import_version.
  • import-automation/workflow/ingestion-helper/spanner_client.py
    • Enhanced get_import_list to filter by a provided import list or by 'READY'/'RETRY' states, using parameterized queries.
    • Modified update_ingestion_status to accept and apply a dynamic status, and to conditionally update ingestion history only for 'DONE' statuses.
  • import-automation/workflow/ingestion-helper/storage_client.py
    • Added a new update_import_summary method to write import summary data to GCS.
    • Included temporary logic to copy metadata for a specific PR and added default metadata generation if a metadata file is not found.
    • Corrected logging message for update_version_file.
  • import-automation/workflow/spanner-ingestion-workflow.yaml
    • Added include_retry as a workflow argument and passed it to the get_import_list step.
    • Passed ingestion_function and workflow_id to the run_dataflow_job sub-workflow.
    • Added a status: 'DONE' parameter to the update_ingestion_status call.
    • Modified Dataflow job failure handling to update the import status to 'RETRY' instead of immediately failing the workflow.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix and enhance the import workflow by adding retry logic and improving import list fetching. However, it introduces significant security vulnerabilities, including path traversal in StorageClient methods, which could allow writing blobs to arbitrary GCS locations, and an MCF injection vulnerability due to unsanitized user input in metadata generation. Furthermore, a critical bug in ingestion-helper/main.py prevents import status updates, and there's a medium-severity suggestion for code clarity. It is imperative to address these security flaws and functional issues to ensure the workflow's integrity and secure operation.

@vish-cs vish-cs force-pushed the itest branch 2 times, most recently from 78e3ba7 to 4ebba11 Compare February 11, 2026 08:10
@vish-cs vish-cs requested a review from ajaits February 11, 2026 09:30
@vish-cs vish-cs force-pushed the itest branch 2 times, most recently from 6c2168b to d7bf4bd Compare February 13, 2026 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant