feat(spp_mis_demo_v2): deterministic blueprint-based demo data generation #62

Open
emjay0921 wants to merge 2 commits into 19.0 from feat/mis-demo-blueprints

@emjay0921
Contributor

Why is this change needed?

The MIS demo module previously used random Faker-based generation for volume data, producing different results on each run. This makes testing unreliable and screenshots non-reproducible. This syncs the blueprint/seed approach from openspp-modules-v2 (feature/mis-demo-blueprints branch).

How was the change implemented?

spp_mis_demo_v2:

  • New household_blueprints.py — ~20 blueprint definitions covering diverse household archetypes (~730 total households)
  • New seeded_volume_generator.py: SeededVolumeGenerator class using random.Random(seed=42) for deterministic output
  • Updated mis_demo_generator.py — replaces _generate_random_households() with blueprint-based generation, populates cycle beneficiaries and entitlements
  • New test_blueprint_reproducibility.py — verifies same seed produces identical output
  • Improved demo data realism (enrollment date backdating, payment history)
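
The determinism described in this list depends on drawing from a private random.Random instance rather than the module-level generator. A minimal sketch of that idea, assuming a hypothetical sample_ages helper that is not part of this PR:

```python
import random


def sample_ages(seed, count):
    """Draw `count` ages from a private RNG (hypothetical helper, not from the PR)."""
    rng = random.Random(seed)  # isolated from the global `random` module state
    return [rng.randint(18, 80) for _ in range(count)]


first_run = sample_ages(42, 5)
random.seed(999)  # disturbing the global RNG does not affect the private one
second_run = sample_ages(42, 5)

print(first_run == second_run)  # True: same seed, same sequence
```

Because the generator owns its RNG, other code that calls random.seed() or consumes global random numbers should not change the generated data.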

spp_demo:

  • New locale providers: fil_PH (Filipino names) and fr_TG (Togolese names) with 70+ first names and 85+ last names each
  • Updated demo_stories.py — locale-aware story character name generation
  • Updated country data XML

New unit tests

  • test_blueprint_reproducibility.py: determinism verification (same seed = identical households)
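
A reproducibility test of this kind can be sketched as follows. The generate_households function here is a hypothetical stand-in that returns plain dicts; the real test presumably exercises SeededVolumeGenerator inside an Odoo test case.

```python
import random
import unittest


def generate_households(seed, count=3):
    """Hypothetical stand-in for the real generator: returns plain dicts."""
    rng = random.Random(seed)
    return [
        {"household": idx, "members": rng.randint(1, 6), "head_age": rng.randint(20, 75)}
        for idx in range(count)
    ]


class TestReproducibility(unittest.TestCase):
    def test_same_seed_identical_output(self):
        # Two independent runs with the same seed must match exactly.
        self.assertEqual(generate_households(42), generate_households(42))

    def test_requested_count_is_respected(self):
        self.assertEqual(len(generate_households(42, count=5)), 5)


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReproducibility)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```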

Unit tests executed by the author

Tests pass in openspp-modules-v2

How to test manually

  1. Install/upgrade spp_mis_demo_v2
  2. Run the demo wizard — verify households are generated with Filipino names
  3. Run the wizard again with same settings — verify identical output
  4. Check cycle beneficiaries and entitlements are populated

Related links

openspp-modules-v2 branch: feature/mis-demo-blueprints

…inistic blueprints

- Add household blueprints with ~730 deterministic households
- Add SeededVolumeGenerator for reproducible demo data
- Add locale-aware Filipino/Togolese name providers in spp_demo
- Populate cycle beneficiaries and entitlements
- Improve demo data realism (enrollment dates, payment history)
- Add blueprint reproducibility tests
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and realism of the demo data generation for the OpenSPP MIS module. By transitioning from a random, Faker-driven approach to a deterministic, blueprint-based system, the generated data is now fully reproducible. This change ensures consistent testing environments and predictable demo scenarios, while also introducing locale-aware naming to make the demo data more relevant to specific regional contexts.

Highlights

  • Deterministic Demo Data Generation: The demo data generation process has been refactored from random Faker-based generation to a deterministic blueprint-based approach, ensuring reproducible results for testing and screenshots.
  • Locale-Aware Name Generation: Introduced new locale providers for Filipino (fil_PH) and Togolese (fr_TG) names, allowing demo stories and generated volume data to use country-specific names based on user selection.
  • Household Blueprint Definitions: Added a comprehensive set of 28 household blueprint templates, defining diverse household archetypes (e.g., young families, elderly, special cases) for realistic and structured volume data generation.
  • Seeded Volume Generator: Implemented a new SeededVolumeGenerator class that uses a fixed seed (random.Random(seed=42)) to ensure that the generated household structures, member attributes, and names are identical across runs for a given locale.
  • Enhanced Demo Data Realism: Improved the realism of generated data by backdating enrollment dates, populating cycle beneficiaries and entitlements for blueprint-generated households, and activating relevant currencies (PHP, LKR, XOF) based on country selection.
  • Reproducibility Unit Tests: Added new unit tests specifically to verify the reproducibility of the blueprint-based data generation, ensuring that the same seed and locale consistently produce identical output.
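
To illustrate how blueprint templates and a seeded generator can combine, here is a rough sketch. The blueprint shape, the identifiers, and the expand_blueprints function are assumptions made for illustration; only the age_range key appears elsewhere in this PR, and the real definitions live in household_blueprints.py.

```python
import random

# Hypothetical blueprint shape, invented for this example.
BLUEPRINTS = [
    {
        "id": "young_family",
        "count": 3,
        "members": [
            {"role": "head", "age_range": (25, 40)},
            {"role": "spouse", "age_range": (23, 38)},
            {"role": "child", "age_range": (0, 12)},
        ],
    },
    {
        "id": "elderly_couple",
        "count": 2,
        "members": [
            {"role": "head", "age_range": (60, 85)},
            {"role": "spouse", "age_range": (60, 85)},
        ],
    },
]


def expand_blueprints(blueprints, seed=42):
    """Turn each blueprint into `count` concrete households using one seeded RNG."""
    rng = random.Random(seed)
    households = []
    for bp in blueprints:
        for n in range(bp["count"]):
            members = [
                {"role": spec["role"], "age": rng.randint(*spec["age_range"])}
                for spec in bp["members"]
            ]
            households.append({"blueprint": bp["id"], "index": n, "members": members})
    return households


households = expand_blueprints(BLUEPRINTS)
print(len(households))  # 5 households: 3 + 2
```

The structure (how many households, which roles) comes entirely from the blueprints; the RNG only fills in attribute values, so the same seed gives the same population.
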
Changelog
  • spp_demo/locale_providers/__init__.py
    • Imported and registered new fil_PH and fr_TG locale providers.
  • spp_demo/locale_providers/fil_PH/__init__.py
    • Added new file defining Filipino (fil_PH) names for Faker.
  • spp_demo/locale_providers/fr_TG/__init__.py
    • Added new file defining Togolese (fr_TG) names for Faker.
  • spp_demo/models/demo_stories.py
    • Added copy import for deep copying story data.
    • Updated docstring to mention locale-aware names.
    • Replaced area_ref and area_kind with district in demo story definitions.
    • Introduced LOCALE_NAMES dictionary for locale-specific name overrides.
    • Added helper functions (_apply_locale_to_story, get_localized_stories, get_localized_reserved_names, get_localized_name) for handling locale-aware story names.
  • spp_demo/tests/test_demo_stories.py
    • Added new test cases to verify the functionality of locale-aware story name generation and reserved names.
  • spp_mis_demo_v2/data/demo_currencies.xml
    • Activated Sri Lankan Rupee (LKR) and West African CFA Franc (XOF) currencies, in addition to Philippine Peso (PHP).
  • spp_mis_demo_v2/docs/USE_CASES.md
    • Updated table of contents to include new sections.
    • Revised 'Overview' to reflect 'Blueprint + Seeded Faker' architecture, deterministic volume, and country-aware names.
    • Added new sections 'Blueprint Architecture' and 'Country / Locale Support' with detailed explanations and tables.
    • Adjusted formatting in formula library test cases.
  • spp_mis_demo_v2/models/__init__.py
    • Imported new household_blueprints and seeded_volume_generator modules.
  • spp_mis_demo_v2/models/household_blueprints.py
    • Added new file defining 28 household blueprint templates for deterministic demo data generation.
  • spp_mis_demo_v2/models/mis_demo_generator.py
    • Removed Faker import and replaced with create_faker from spp_demo.locale_providers.
    • Added country_id field to the wizard for locale and currency selection.
    • Removed fields related to old random volume generation (volume_enrollments, generate_random_groups, etc.).
    • Updated _check_positive_integers constraint to reflect removed fields.
    • Modified _onchange_demo_mode to remove references to old random volume generation fields.
    • Added COUNTRY_CONFIG mapping and _get_country_config method for country-specific settings.
    • Updated action_generate to use _get_country_config for locale/currency, initialize SeededVolumeGenerator, and generate households from blueprints instead of random groups.
    • Modified _ensure_demo_stories_exist to use localized stories.
    • Removed _generate_random_groups method.
    • Updated _get_story_name to be locale-aware.
    • Removed _configure_compliance_manager and related calls.
    • Simplified logging messages.
    • Updated _create_program_cycles to handle cycle memberships and entitlements for blueprint-generated households, including backdating and fund creation.
    • Added _create_program_fund and _get_entitlement_amount helper methods.
    • Modified _localize_cr_def to localize change request details based on selected locale.
    • Updated _ensure_story_registrant to use locale-aware story names.
    • Removed nosemgrep comments for sudo() calls, indicating they are now considered safe in this context.
    • Updated success notification messages to reflect blueprint-based generation.
  • spp_mis_demo_v2/models/seeded_volume_generator.py
    • Added new file defining the SeededVolumeGenerator class for deterministic generation of households and members from blueprints. Includes methods for batch creation, name generation, gender resolution, and program enrollment.
  • spp_mis_demo_v2/tests/__init__.py
    • Imported new test_blueprint_reproducibility module.
  • spp_mis_demo_v2/tests/test_blueprint_reproducibility.py
    • Added new file with tests to verify the deterministic nature and structural integrity of blueprint-based data generation across different locales.
  • spp_mis_demo_v2/tests/test_mis_demo_generator.py
    • Updated tests to remove references to old random volume generation fields and adapt to the new blueprint-based generation.
  • spp_mis_demo_v2/views/mis_demo_wizard_view.xml
    • Added country_id radio widget to the wizard form.
    • Updated the info alert to describe the new deterministic blueprint-based generation, including household/member counts and country-aware names.
    • Removed fields related to old random volume generation.
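
As an illustration of the COUNTRY_CONFIG mapping and _get_country_config lookup mentioned in the changelog, a mapping of this kind might look like the sketch below. The dictionary keys, the Sri Lankan locale, and the fallback behaviour are assumptions; the PR only names the fil_PH and fr_TG locales and the PHP, LKR, and XOF currencies.

```python
# Hypothetical mapping for illustration; keys and fallback are assumed.
COUNTRY_CONFIG = {
    "PH": {"locale": "fil_PH", "currency": "PHP"},
    "TG": {"locale": "fr_TG", "currency": "XOF"},
    "LK": {"locale": "en_US", "currency": "LKR"},  # locale here is a placeholder
}
DEFAULT_COUNTRY = "PH"  # assumed default, not confirmed by the PR


def get_country_config(country_code):
    """Return locale/currency settings, falling back to the assumed default."""
    return COUNTRY_CONFIG.get(country_code, COUNTRY_CONFIG[DEFAULT_COUNTRY])


print(get_country_config("TG"))  # {'locale': 'fr_TG', 'currency': 'XOF'}
```
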
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

- # Use explicit registrant_name if provided, otherwise derive from story_id
- story_name = cr_def.get("registrant_name") or self._get_story_name(story_id)
+ # Always use locale-aware name resolution (handles CR-specific IDs too)
+ story_name = self._get_story_name(story_id)

Check warning

Code scanning / Semgrep OSS

Semgrep Finding: semgrep.odoo-sudo-without-context Warning

sudo() bypasses all access controls. Ensure this is:
  • Intentional and documented
  • Using minimal scope (e.g., .sudo().read(['field']) not .sudo())
  • Not exposing sensitive data to unauthorized users
@@ -2813,7 +2900,7 @@ def _create_single_change_request(self, registrant, cr_def, stats, demo_user=Non
detail.write(detail_vals)

Check warning

Code scanning / Semgrep OSS

Semgrep Finding: semgrep.odoo-sudo-without-context Warning

sudo() bypasses all access controls. Ensure this is:
  • Intentional and documented
  • Using minimal scope (e.g., .sudo().read(['field']) not .sudo())
  • Not exposing sensitive data to unauthorized users

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant improvement, introducing deterministic demo data generation using a blueprint and seed-based approach. This will greatly enhance testing reliability and reproducibility. The addition of new locales for Filipino and Togolese names also adds valuable realism to the demo data. The code is generally well-structured, and the new functionality is supported by a good set of unit tests. I've identified a few areas for improvement, including a bug in the new SeededVolumeGenerator, some opportunities for code cleanup and efficiency gains, and a question about a potentially removed feature. I also noticed that the methods _generate_random_groups and _generate_volume_enrollments in spp_mis_demo_v2/models/mis_demo_generator.py are no longer called after your refactoring. It would be good to remove this dead code to keep the codebase clean.

I am having trouble creating individual review comments, so my feedback is collected below.

spp_mis_demo_v2/models/mis_demo_generator.py (1206-1248)

high

The _configure_compliance_manager method and its call have been removed. Was this intentional? This seems to remove the functionality of configuring compliance rules for demo programs. If this is an intended change, it would be good to mention it in the PR description.

spp_mis_demo_v2/models/seeded_volume_generator.py (185-193)

high

To fix a bug in _find_member_spec and improve the design, it's better to associate the member_spec with the created individual record here. This avoids fragile lookups later. You can achieve this by storing a tuple of (individual, member_spec) in the members list for each household.

            if group_record.id not in group_households:
                group_households[group_record.id] = {
                    "group": group_record,
                    "members": [],
                    "blueprint": member_specs[list(groups).index(group_record)][0],
                }
            _group_record, member_spec = individual_to_group[ind_idx]
            group_households[group_record.id]["members"].append((individual, member_spec))
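
The idea behind this suggestion, keeping each created record together with the spec it was built from, can be shown in isolation. This is a simplified sketch with a hypothetical Individual stand-in instead of Odoo records:

```python
from dataclasses import dataclass


@dataclass
class Individual:
    """Stand-in for a created res.partner record (hypothetical)."""
    id: int
    name: str


member_specs = [
    {"role": "head", "age_range": (62, 70)},
    {"role": "spouse", "age_range": (60, 68)},
    {"role": "grandchild", "age_range": (5, 10)},
]

# Pair each record with its spec at creation time instead of looking it up later.
members = []
for idx, spec in enumerate(member_specs, start=1):
    individual = Individual(id=idx, name=f"Member {idx}")
    members.append((individual, spec))

# Later filtering uses the spec that actually produced each member.
elderly = [ind.name for ind, spec in members if spec["age_range"][0] >= 60]
print(elderly)  # ['Member 1', 'Member 2']
```

With the pairing in place, a household with two elderly members yields both of them, which a lookup that always returns the first elderly spec cannot guarantee.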

spp_mis_demo_v2/models/seeded_volume_generator.py (238-296)

high

The _find_member_spec method is buggy. It doesn't use the member_record argument and will always return the first elderly member spec from the blueprint, which is incorrect for households with multiple elderly members.

By associating the spec with the member during creation (as suggested in another comment), you can simplify the logic in enroll_in_programs and remove the _find_member_spec method entirely.

                    # Individual-target program: enroll qualifying members
                    for member, member_spec in hh["members"]:
                        # ESP: only enroll elderly members (age >= 60 from blueprint)
                        if prog_id == "elderly_social_pension":
                            if not member_spec or member_spec.get("age_range", (0, 0))[0] < 60:
                                continue
                        enrollment_vals.append(
                            {
                                "program_id": program.id,
                                "partner_id": member.id,
                                "state": "enrolled",
                            }
                        )
                        enrollment_dates.append(member.registration_date or reg_date)
                else:
                    # Group-target program: enroll the household
                    enrollment_vals.append(
                        {
                            "program_id": program.id,
                            "partner_id": group.id,
                            "state": "enrolled",
                        }
                    )
                    enrollment_dates.append(reg_date)

            # Individual-level food assistance
            if bp.get("individual_food_assistance"):
                fa_program = program_map.get("food_assistance")
                if fa_program:
                    for member, member_spec in hh["members"]:
                        enrollment_vals.append(
                            {
                                "program_id": fa_program.id,
                                "partner_id": member.id,
                                "state": "enrolled",
                            }
                        )
                        enrollment_dates.append(member.registration_date or reg_date)

        if not enrollment_vals:
            return

        _logger.info("Enrolling %d program memberships...", len(enrollment_vals))
        memberships = self._batch_create("spp.program.membership", enrollment_vals)

        # Add state variety and backdate enrollment dates in one pass.
        # enrollment_date is @api.depends("state") so we must do BOTH via SQL
        # after all ORM operations are complete, to prevent recomputation.
        self.env.flush_all()
        self._apply_membership_realism(memberships, enrollment_dates)

    def _apply_membership_realism(self, memberships, enrollment_dates):
        """Apply state variety and backdate enrollment dates via SQL.

        enrollment_date is @api.depends("state") — any ORM state change triggers
        recomputation to Datetime.now(). To prevent this, we:
        1. flush_all() to commit ORM state
        2. Apply state variety + date backdating together in raw SQL
        3. Invalidate the cache so ORM sees our changes
        """
        if not memberships:
            return

        membership_ids = memberships.ids
        exited_count = paused_count = not_eligible_count = 0

        for idx, mem_id in enumerate(membership_ids):
            # Determine state
            roll = self.rng.random()
            state = "enrolled"
            if roll < 0.02:
                state = "not_eligible"
                not_eligible_count += 1
            elif roll < 0.05:
                state = "paused"
                paused_count += 1
            elif roll < 0.10:
                state = "exited"
                exited_count += 1

            # Determine enrollment date from registration date
            if idx < len(enrollment_dates):
                reg_date = enrollment_dates[idx]
                enrollment_dt = datetime.datetime.combine(reg_date, datetime.time(8, 0, 0))
            else:
                enrollment_dt = datetime.datetime.now()

            # Single SQL update for both state and enrollment_date
            self.env.cr.execute(
                "UPDATE spp_program_membership SET state = %s, enrollment_date = %s WHERE id = %s",
                (state, enrollment_dt, mem_id),
            )

        memberships.invalidate_recordset(["state", "enrollment_date"])
        _logger.info(
            "Realism for %d memberships: %d exited, %d paused, %d not_eligible, dates backdated",
            len(membership_ids),
            exited_count,
            paused_count,
            not_eligible_count,
        )

    # =========================================================================
    # Internal helpers
    # =========================================================================
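
The state thresholds and date backdating in _apply_membership_realism can be exercised without a database. The sketch below reuses the cut-offs from the suggestion above (2% not_eligible, next 3% paused, next 5% exited, remainder enrolled); the pick_state and backdated_enrollment helper names are invented for illustration.

```python
import datetime
import random


def pick_state(roll):
    """Map a uniform roll in [0, 1) to a membership state (hypothetical helper)."""
    if roll < 0.02:
        return "not_eligible"  # first 2%
    if roll < 0.05:
        return "paused"  # next 3%
    if roll < 0.10:
        return "exited"  # next 5%
    return "enrolled"  # remaining 90%


def backdated_enrollment(reg_date):
    """Enrollment timestamp: registration date at 08:00, as in the suggestion."""
    return datetime.datetime.combine(reg_date, datetime.time(8, 0, 0))


rng = random.Random(42)
states = [pick_state(rng.random()) for _ in range(1000)]
counts = {s: states.count(s) for s in ("enrolled", "exited", "paused", "not_eligible")}
print(counts)
print(backdated_enrollment(datetime.date(2024, 1, 15)))  # 2024-01-15 08:00:00
```

Because the rolls come from the seeded generator, the mix of states is the same on every run with the same seed.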

spp_demo/models/demo_stories.py (1219-1229)

medium

The call to get_story_by_id(story["id"]) is inside a loop over story.get("journey", []). This is inefficient as it repeatedly searches for the same story. You can improve performance by fetching the original story once before the loop.

    if "children" in pnames:
        orig_story = get_story_by_id(story["id"])
        if not orig_story:
            return story

        orig_profile = orig_story.get("profile", {})
        orig_children = orig_profile.get("children", [])

        for step in story.get("journey", []):
            if "member" in step:
                # Find matching child by position
                for idx, child in enumerate(orig_children):
                    if child.get("name") == step["member"] and idx < len(pnames["children"]):
                        step["member"] = pnames["children"][idx]
                        break

spp_mis_demo_v2/docs/USE_CASES.md (475-477)

medium

This markdown table appears to be broken because all columns are on a single line. For the table to render correctly, each row, including the header and separator, should be on its own line. This issue also occurs for other tables in this file.

**Test Cases:**

| Persona        | PMT Score | Expected | Result       |
| -------------- | --------- | -------- | ------------ |
| Maria Santos   | 38        | `true`   | ✓ Eligible   |
| Rosa Garcia    | 42        | `true`   | ✓ Eligible   |
| Carlos Morales | 48        | `false`  | ✗ Not eligible |
| Ibrahim Hassan | 35        | `true`   | ✓ Eligible   |

spp_mis_demo_v2/models/mis_demo_generator.py (2903-2904)

medium

The nosec comment for this raw SQL query has been removed. While this pattern is common in Odoo and detail._table is generally safe as it comes from the model, it's good practice to keep the nosec comment to explicitly acknowledge and silence security scanner warnings about potential SQL injection.

                    f"UPDATE {detail._table} SET create_date = %s WHERE id = %s",  # nosec B608
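
The pattern under discussion interpolates a trusted table name while binding the values as parameters, since SQL identifiers cannot be passed as bound parameters. A standalone sketch using sqlite3 as a stand-in for the Odoo cursor (sqlite3 uses ? placeholders where psycopg2 uses %s; the allow-list and the backdate helper are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (id INTEGER PRIMARY KEY, create_date TEXT)")
conn.execute("INSERT INTO detail (id, create_date) VALUES (1, '2026-01-01')")

ALLOWED_TABLES = {"detail"}  # identifiers must be validated, not bound


def backdate(conn, table, record_id, new_date):
    """Update create_date; the table name is allow-listed, the values are bound."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    conn.execute(
        f"UPDATE {table} SET create_date = ? WHERE id = ?",  # nosec B608
        (new_date, record_id),
    )


backdate(conn, "detail", 1, "2025-06-30")
print(conn.execute("SELECT create_date FROM detail WHERE id = 1").fetchone()[0])
```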

@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 78.95461% with 153 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.74%. Comparing base (a72ada0) to head (c3c0549).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| spp_mis_demo_v2/models/mis_demo_generator.py | 40.17% | 134 Missing ⚠️ |
| spp_mis_demo_v2/models/seeded_volume_generator.py | 95.14% | 12 Missing ⚠️ |
| spp_demo/locale_providers/__init__.py | 66.66% | 2 Missing ⚠️ |
| spp_mis_demo_v2/models/household_blueprints.py | 86.66% | 2 Missing ⚠️ |
| ...is_demo_v2/tests/test_blueprint_reproducibility.py | 97.10% | 2 Missing ⚠️ |
| spp_demo/models/demo_stories.py | 98.61% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (a72ada0) and HEAD (c3c0549).

HEAD has 8 uploads less than BASE.

| Flag | BASE (a72ada0) | HEAD (c3c0549) |
| --- | --- | --- |
| spp_api_v2_cycles | 1 | 0 |
| spp_api_v2_products | 1 | 0 |
| spp_api_v2_service_points | 1 | 0 |
| spp_api_v2_data | 1 | 0 |
| spp_api_v2_change_request | 1 | 0 |
| spp_api_v2_entitlements | 1 | 0 |
| spp_aggregation | 1 | 0 |
| spp_api_v2 | 1 | 0 |

Additional details and impacted files
@@             Coverage Diff             @@
##             19.0      #62       +/-   ##
===========================================
- Coverage   71.36%   59.74%   -11.62%     
===========================================
  Files         319      181      -138     
  Lines       25878    15746    -10132     
===========================================
- Hits        18467     9408     -9059     
+ Misses       7411     6338     -1073     

| Flag | Coverage Δ |
| --- | --- |
| spp_aggregation | ? |
| spp_api_v2 | ? |
| spp_api_v2_change_request | ? |
| spp_api_v2_cycles | ? |
| spp_api_v2_data | ? |
| spp_api_v2_entitlements | ? |
| spp_api_v2_products | ? |
| spp_api_v2_service_points | ? |
| spp_base_common | 92.81% <ø> (ø) |
| spp_dci_demo | 84.78% <ø> (?) |
| spp_demo | 85.01% <98.21%> (?) |
| spp_grm_demo | 0.00% <ø> (?) |
| spp_mis_demo_v2 | 74.63% <73.16%> (?) |
| spp_programs | 49.56% <ø> (ø) |
| spp_security | 51.08% <ø> (?) |
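
The headline percentages in the coverage diff can be reproduced from the hit and line counts it lists. A small check, assuming coverage is hits divided by lines and that the display truncates to two decimals (an assumption chosen because it reproduces the 59.74% shown; rounding would give 59.75%):

```python
import math


def coverage_pct(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


base = coverage_pct(18467, 25878)  # BASE (a72ada0)
head = coverage_pct(9408, 15746)   # HEAD (c3c0549)
print(base, head, round(head - base, 2))  # 71.36 59.74 -11.62
```

Since HEAD has far fewer files and lines than BASE (181 vs 319 files), most of the drop reflects the missing uploads rather than the patch itself.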

Flags with carried forward coverage won't be shown.

…NG for name generation

Replace all Faker usage with deterministic random.Random(seed).choice()
from locale-specific name arrays. This ensures fully reproducible output
across runs with the same seed.

- Remove create_faker import and fake variable from mis_demo_generator
- Delete dead code: _generate_random_groups, _create_random_individual,
  _generate_volume_enrollments
- Remove fake parameter from _create_program_cycles
- Remove faker from external_dependencies in manifest
- Remove tests that called deleted _create_random_individual method
- Update docs and docstrings to reflect seeded RNG approach
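
The replacement of Faker with random.Random(seed).choice() over locale-specific name arrays can be sketched as below. The name pools are tiny illustrative samples and the make_names function is hypothetical; the PR ships much larger lists per locale.

```python
import random

# Illustrative pools only; not the actual lists from the locale providers.
NAME_POOLS = {
    "fil_PH": {"first": ["Maria", "Jose", "Rosa"], "last": ["Santos", "Garcia", "Reyes"]},
    "fr_TG": {"first": ["Kossi", "Afi", "Yawo"], "last": ["Mensah", "Agbeko", "Lawson"]},
}


def make_names(locale, seed, count):
    """Pick `count` full names for `locale` from a seeded RNG (hypothetical helper)."""
    rng = random.Random(seed)
    pool = NAME_POOLS[locale]
    return [f"{rng.choice(pool['first'])} {rng.choice(pool['last'])}" for _ in range(count)]


run_a = make_names("fil_PH", 42, 4)
run_b = make_names("fil_PH", 42, 4)
print(run_a == run_b)  # True: same seed and locale give the same names
```
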
@@ -3597,10 +3382,7 @@ def _show_success_notification(self, stats):

Check warning

Code scanning / Semgrep OSS

Semgrep Finding: semgrep.odoo-sudo-without-context Warning

sudo() bypasses all access controls. Ensure this is:
  • Intentional and documented
  • Using minimal scope (e.g., .sudo().read(['field']) not .sudo())
  • Not exposing sensitive data to unauthorized users