
Trt-1989: partition management #3334

Open
neisw wants to merge 5 commits into openshift:main from neisw:trt-1989-partition-management

Conversation

Contributor

@neisw neisw commented Mar 17, 2026

  • Adds migration helpers for one-off partition migration steps
  • Adds FK analysis for pre-migration review
  • Adds an example delete helper to handle deletes without relying on cascading deletes / FK constraints
  • Adds partition management for:
    • Updating partitioned tables
    • Detaching old partitions
    • Dropping older detached partitions

Summary by CodeRabbit

  • New Features

    • End-to-end partition management: discovery, stats, automated partition creation/aging, attach/detach, safe removal and dry-run support
    • Database migration toolkit: create, update and finalize partitioned tables, incremental data migration, identity/sequence sync, and partition-aware renames
    • Partition impact analysis and controlled deletion of test data
  • Refactor

    • Models and migration flow adjusted to support partitioned tables and explicit timestamp/primary-key handling
  • Tests

    • Extensive unit tests covering partitioning, validation, SQL generation and migration helpers

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will issue /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci bot requested review from dgoodwin and petr-muller March 17, 2026 12:33
@openshift-ci
Contributor

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: da10bc4d-7820-48e4-b121-b9aac6ba875c

📥 Commits

Reviewing files that changed from the base of the PR and between 98d456e and a297fe9.

📒 Files selected for processing (2)
  • pkg/db/partitions.go
  • pkg/db/utils.go

Walkthrough

Adds a comprehensive PostgreSQL partitioning and migration subsystem, migrates prow test models to explicit ID/timestamp fields, integrates partition lifecycle management into DB init and the Prow dataloader, and adds extensive partition- and migration-focused tests.

Changes

Cohort / File(s) — Summary

  • Partition engine & utilities (pkg/db/partitions.go, pkg/db/utils.go): New end-to-end partition management and DB utilities: discovery, stats, create/attach/detach/drop, retention analysis, partitioned table creation/update, FK/sequence handling, schema introspection, migration helpers, rename/sync utilities, dry-run and transactional safety. Large API surface with many exported types and functions added.
  • Partition migration APIs (pkg/db/migration.go): New staged migration APIs: MigrateToPartitionedTable, UpdatePartitionedTableMigration, FinalizePartitionedTableMigration, AnalyzePartitioningImpact, and DeleteTestsByName, with logging and dry-run support.
  • DB init/migration flow change (pkg/db/db.go): Reordered the migration flow to handle partitioned models explicitly: removed AutoMigrate for partitioned models and invokes UpdatePartitionedTable in a partitioned-migration loop instead; retained the syncPartitionedTables call with TODOs.
  • Model changes for partitioning (pkg/db/models/prow.go): Replaced gorm.Model on ProwJobRunTest and ProwJobRunTestOutput with explicit ID, CreatedAt, UpdatedAt, DeletedAt fields; added composite/index annotations and adjusted foreign-key/index metadata for partition compatibility.
  • Partition tests (pkg/db/partitions_test.go, pkg/db/utils_test.go): Extensive unit tests covering partition-name validation, PartitionConfig validation and SQL generation, data-type normalization, schema/migration verification scaffolding, and related utilities and edge cases.
  • Dataloader partition lifecycle (pkg/dataloader/prowloader/prow.go): Added PartitionManagementConfig and partitionConfigs; new helper methods updatePartitions, agePartitions, and preparePartitions; replaced ad-hoc per-date partition creation with a centralized partition-management loop; expanded partition-related logging and error handling.
  • Minor cleanup & comments (pkg/dataloader/prowloader/prow.go): Removed an unused import and added a comment in extractTestCases noting reliance on created_at for test entries.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

  • Sql Injection Prevention — ❌ Error: CREATE INDEX statements in partitions.go (lines 1204-1206 and 1566-1603) lack pq.QuoteIdentifier() protection for table/column identifiers, creating SQL injection vulnerabilities. Resolution: wrap all table and column identifiers in CREATE INDEX statements with pq.QuoteIdentifier() and add a code-review checklist item to keep protection consistent across all SQL statement construction.
  • Go Error Handling — ⚠️ Warning: the PR contains multiple Go error-handling violations, including ignored errors, panic() outside init(), and unwrapped error returns without context. Resolution: remove ignored error assignments, replace panic() with proper error returns, and refactor bare return err statements to use fmt.Errorf with the %w verb for context.
  • Single Responsibility And Clear Naming — ⚠️ Warning: the pull request violates the Single Responsibility Principle: utils.go is 2,159 lines spanning 8 conceptual domains, 45 new public methods were added to the DB type, and partition logic is scattered across multiple files. Resolution: restructure into focused modules (schema.go, datamigration.go, sequences.go, foreignkeys.go, tableops.go), consolidate partition operations in partitions.go, move misplaced methods, and restrict migration.go to migration operations only.
✅ Passed checks (1 passed)
  • Excessive Css In React Should Use Styles — ✅ Passed: this custom check is not applicable, as the PR exclusively modifies Go backend files for database management, with no React/JavaScript component changes.
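
The failed SQL-injection check above asks for pq.QuoteIdentifier() around identifiers in CREATE INDEX statements. A minimal sketch of the idea, using a local quoteIdent that mirrors pq.QuoteIdentifier's escaping (double the identifier's embedded double quotes and wrap it in double quotes); the index and table names here are illustrative, not the PR's actual statements:

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent mirrors pq.QuoteIdentifier: wrap the name in double quotes
// and double any embedded double quotes so a hostile identifier cannot
// break out of the quoted context.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// createIndexSQL builds a CREATE INDEX statement with every identifier quoted.
func createIndexSQL(indexName, tableName string, columns []string) string {
	quoted := make([]string, len(columns))
	for i, c := range columns {
		quoted[i] = quoteIdent(c)
	}
	return fmt.Sprintf("CREATE INDEX IF NOT EXISTS %s ON %s (%s)",
		quoteIdent(indexName), quoteIdent(tableName), strings.Join(quoted, ", "))
}

func main() {
	fmt.Println(createIndexSQL("idx_created_at", "prow_job_run_tests", []string{"created_at"}))
	// prints: CREATE INDEX IF NOT EXISTS "idx_created_at" ON "prow_job_run_tests" ("created_at")
}
```

In the real code the github.com/lib/pq package's QuoteIdentifier should be used directly rather than a hand-rolled copy.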

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" (see https://golangci-lint.run/docs/product/migration-guide for migration instructions)
The command was terminated due to this error.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (1)
pkg/db/utils_test.go (1)

121-183: These tests never exercise the new DB helpers.

These cases only validate local literals or log text, so they all pass even if the production migration/partition helpers regress. For this surface area, either call the real APIs with assertions or move the examples into package docs.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 251-480

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/utils_test.go` around lines 121 - 183, The
TestMigrateTableDataValidation unit test only asserts on literals and never
calls the real migration helpers (e.g., the new migrateTableData / partition
helper functions), so it gives false confidence; update the test to either (A)
call the actual helpers under test with a lightweight in-memory or test DB
fixture and assert their behavior (invoke migrateTableData/PartitionHelper
methods with representative inputs, verify returned errors and SQL/results) or
(B) if you want to keep it as documentation, move these cases into package
examples or godoc and add real integration tests elsewhere; ensure the test
references and exercises the actual functions (migrateTableData,
PartitionHelper, or whatever new DB helper names exist) and covers
dryRun=true/false, omitted columns, and error paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/db/db.go`:
- Around line 127-129: The current loop calls UpdatePartitionedTable for each
partitionedModel in partitionedModelsToMigrate but fails when the physical
partitioned table doesn't exist; modify the flow to ensure the base partitioned
table exists before calling UpdatePartitionedTable by either invoking the
table-creation routine (e.g., call CreatePartitionedTable or run AutoMigrate for
partitionedModel.model/tableName) or update UpdatePartitionedTable to detect a
missing table and create it instead; reference partitionedModelsToMigrate,
UpdatePartitionedTable, AutoMigrate, and the partition sync so the missing-table
creation happens prior to the UpdatePartitionedTable call to avoid bootstrapping
failures.
- Around line 121-123: The tableName for the models.ProwJobRunTestOutput schema
entry is incorrect ("prow_job_run_tests_outputs") and must be changed to the
canonical "prow_job_run_test_outputs" so migrations target the real table;
locate the struct literal that sets model: &models.ProwJobRunTestOutput{} and
update its tableName field to "prow_job_run_test_outputs" to match other call
sites.

In `@pkg/db/migration.go`:
- Around line 168-170: The resume window is inclusive because
MigrateTableDataRange uses ">= startDate", so passing maxDate (or migrateFrom)
causes rows at that exact timestamp to be re-copied; to fix it, advance the
start timestamp by the smallest representable unit before passing into
MigrateTableDataRange (e.g., compute a next-timestamp from maxDate and use that
as the start) for the calls that pass maxDate/migrateFrom (references:
MigrateTableDataRange, maxDate, migrateUpTo and the other call sites noted
around lines 256-258) so resumed runs become exclusive of the already-migrated
max timestamp.
- Around line 391-409: The logic in AnalyzePartitioningImpact incorrectly
hardcodes "created_at" as the partition key when checking inbound FKs; change it
to use the actual partition column(s) discovered for the target table (e.g., the
variable holding partition columns like partCols/partitionCols) instead of
hasCreatedAt and "created_at" literals: compute whether the source has all
required partition column(s) (e.g., check each name in partitionCols against
srcColSet), set fields["source_has_<col>"] or a single boolean like
fields["source_has_partition_cols"], and update the fields["reason"] messages
and UNIQUE-constraint suggestion to reference the real partition column list
(join partitionCols) and fk.ReferencedColumns accordingly so the DROP vs EXPAND
decision and explanatory text reflect the true partition key(s) rather than
always using created_at.

In `@pkg/db/partitions.go`:
- Around line 1604-1612: The index cleanup in UpdatePartitionedTable is dropping
PostgreSQL constraint-backed indexes (created by MoveForeignKeys as UNIQUE
constraints) because currentIdxMap contains them; modify UpdatePartitionedTable
to first query pg_constraint/pg_class to collect names of indexes backing
constraints (e.g., select conindid/pg_get_indexdef or join pg_constraint ->
pg_class to get index names) into a set (constraintIdxSet) and then skip any
idxName that is in constraintIdxSet (in addition to the existing
strings.HasSuffix(idxName, "_pkey") check) when iterating currentIdxMap; ensure
the new check references currentIdxMap, UpdatePartitionedTable and Skip
constraint-backed indexes so the UNIQUE constraints retain their backing
indexes.

In `@pkg/db/utils.go`:
- Around line 699-725: VerifyPartitionCoverage currently treats the date range
as inclusive of the day containing endDate, but the helper intends [startDate,
endDate) semantics; change how endDay is computed so the day containing endDate
is excluded. Specifically, in VerifyPartitionCoverage (and where endDay is set),
replace endDay := endDate.Truncate(24 * time.Hour) with endDay :=
endDate.Add(-time.Nanosecond).Truncate(24 * time.Hour) (or equivalent
subtraction of a tiny duration) so the loop over currentDate only checks up to
the last day fully before endDate; keep the rest of the loop, existingDates map
and getPartitionsInDateRange usage unchanged.
- Around line 1603-1628: The current validation skips checks when a target is
also a source (tableRenames/tableRenameMap) but the sequential ALTER TABLE
execution still fails for cycles (e.g., A→B, B→A); update the rename logic to
detect cycles in the rename graph and perform a two-phase rename instead: first
compute strongly connected components or any cycle detection over tableRenames,
then for any cycle rename all source tables to unique temporary names (e.g.,
append a safe suffix or GUID), execute those ALTER TABLE ... RENAME TO temp
operations, and finally rename temps to their intended targets; modify the
execution loop that issues the ALTER TABLE statements to use this temp-phase for
cycles and fall back to direct single-step renames for acyclic mappings.

---

Nitpick comments:
In `@pkg/db/utils_test.go`:
- Around line 121-183: The TestMigrateTableDataValidation unit test only asserts
on literals and never calls the real migration helpers (e.g., the new
migrateTableData / partition helper functions), so it gives false confidence;
update the test to either (A) call the actual helpers under test with a
lightweight in-memory or test DB fixture and assert their behavior (invoke
migrateTableData/PartitionHelper methods with representative inputs, verify
returned errors and SQL/results) or (B) if you want to keep it as documentation,
move these cases into package examples or godoc and add real integration tests
elsewhere; ensure the test references and exercises the actual functions
(migrateTableData, PartitionHelper, or whatever new DB helper names exist) and
covers dryRun=true/false, omitted columns, and error paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 39e750cc-5817-48ab-bfbb-df0cc3ff2816

📥 Commits

Reviewing files that changed from the base of the PR and between 5bbcb4d and af7e262.

📒 Files selected for processing (8)
  • pkg/dataloader/prowloader/prow.go
  • pkg/db/db.go
  • pkg/db/migration.go
  • pkg/db/models/prow.go
  • pkg/db/partitions.go
  • pkg/db/partitions_test.go
  • pkg/db/utils.go
  • pkg/db/utils_test.go

Comment on lines +127 to +129
for _, partitionedModel := range partitionedModelsToMigrate {
	if _, err := d.UpdatePartitionedTable(partitionedModel.model, partitionedModel.tableName, false, false); err != nil {
		return err

⚠️ Potential issue | 🔴 Critical

This path only works if the partitioned tables already exist.

UpdatePartitionedTable errors when the target table is missing. Since these models were removed from AutoMigrate, any environment bootstrapping from an empty schema will stop here before the later partition sync can create them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/db.go` around lines 127 - 129, The current loop calls
UpdatePartitionedTable for each partitionedModel in partitionedModelsToMigrate
but fails when the physical partitioned table doesn't exist; modify the flow to
ensure the base partitioned table exists before calling UpdatePartitionedTable
by either invoking the table-creation routine (e.g., call CreatePartitionedTable
or run AutoMigrate for partitionedModel.model/tableName) or update
UpdatePartitionedTable to detect a missing table and create it instead;
reference partitionedModelsToMigrate, UpdatePartitionedTable, AutoMigrate, and
the partition sync so the missing-table creation happens prior to the
UpdatePartitionedTable call to avoid bootstrapping failures.

Comment on lines +391 to +409
// Assume partitioning on created_at (most common case) — check if source has it
hasCreatedAt := srcColSet["created_at"]
fields["source_has_created_at"] = hasCreatedAt

if len(srcPartCols) > 0 {
// Source table is also partitioned
fields["action"] = "DROP"
fields["reason"] = "both tables are partitioned; FK constraints between partitioned tables are not supported"
} else if !hasCreatedAt {
// Source table lacks the partition key column — FK must be dropped
fields["action"] = "DROP"
fields["reason"] = fmt.Sprintf("source table %s lacks created_at column needed to reference partitioned %s; FK cannot be expanded",
fk.SourceTable, tableName)
} else {
// Source table has the partition key — FK could be expanded
fields["action"] = "EXPAND or DROP"
fields["reason"] = fmt.Sprintf("source table %s has created_at column; FK could be expanded to include partition key, "+
"but this requires a UNIQUE constraint on %s(%s, created_at) and adds complexity",
fk.SourceTable, tableName, fk.ReferencedColumns)

⚠️ Potential issue | 🟠 Major

AnalyzePartitioningImpact is only correct for created_at partitions.

This helper already looks up the table's real partition columns, but the inbound-FK branch ignores them and hard-codes created_at. For any table partitioned on a different key, the report will recommend dropping or expanding the wrong FK shape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/migration.go` around lines 391 - 409, The logic in
AnalyzePartitioningImpact incorrectly hardcodes "created_at" as the partition
key when checking inbound FKs; change it to use the actual partition column(s)
discovered for the target table (e.g., the variable holding partition columns
like partCols/partitionCols) instead of hasCreatedAt and "created_at" literals:
compute whether the source has all required partition column(s) (e.g., check
each name in partitionCols against srcColSet), set fields["source_has_<col>"] or
a single boolean like fields["source_has_partition_cols"], and update the
fields["reason"] messages and UNIQUE-constraint suggestion to reference the real
partition column list (join partitionCols) and fk.ReferencedColumns accordingly
so the DROP vs EXPAND decision and explanatory text reflect the true partition
key(s) rather than always using created_at.
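
The coverage test the prompt asks for is a small pure check: every discovered partition column must exist in the FK source table before an EXPAND is even possible. A sketch, where partitionCols and srcColSet are shapes assumed from the comment rather than the PR's exact identifiers:

```go
package main

import "fmt"

// sourceHasPartitionCols reports whether the FK source table contains
// every column in the target table's partition key, which is the
// precondition for expanding the FK instead of dropping it.
func sourceHasPartitionCols(partitionCols []string, srcColSet map[string]bool) bool {
	for _, col := range partitionCols {
		if !srcColSet[col] {
			return false
		}
	}
	return true
}

func main() {
	src := map[string]bool{"id": true, "created_at": true}
	fmt.Println(sourceHasPartitionCols([]string{"created_at"}, src))     // prints "true"
	fmt.Println(sourceHasPartitionCols([]string{"partition_time"}, src)) // prints "false"
}
```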

Comment on lines +1604 to +1612
// Drop indexes that are no longer in the model
for idxName := range currentIdxMap {
// Skip primary key and system indexes
if strings.HasSuffix(idxName, "_pkey") {
continue
}
alterStatements = append(alterStatements,
fmt.Sprintf("DROP INDEX IF EXISTS %s", idxName))
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the specific lines in question
sed -n '1604,1612p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 344


🏁 Script executed:

# Let's also see more context around those lines
sed -n '1590,1625p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 1116


🏁 Script executed:

# Find the UpdatePartitionedTable function definition
grep -n "func.*UpdatePartitionedTable" pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 187


🏁 Script executed:

# Find MoveForeignKeys function
grep -n "func.*MoveForeignKeys" pkg/db/utils.go

Repository: openshift/sippy

Length of output: 148


🏁 Script executed:

# Let's look at the MoveForeignKeys function
sed -n '963,1050p' pkg/db/utils.go

Repository: openshift/sippy

Length of output: 2825


🏁 Script executed:

# Check what creates UNIQUE constraints - look for more of MoveForeignKeys
sed -n '963,1100p' pkg/db/utils.go

Repository: openshift/sippy

Length of output: 4334


🏁 Script executed:

# Let's understand how currentIdxMap is built - look earlier in UpdatePartitionedTable
sed -n '1293,1400p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 3324


🏁 Script executed:

# See how getCurrentIndexes is implemented
grep -n "func.*getCurrentIndexes" pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 137


🏁 Script executed:

# Read that function
sed -n '1200,1280p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 2320


🏁 Script executed:

# Continue reading UpdatePartitionedTable to see how modelIndexes is built
sed -n '1400,1550p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 5399


🏁 Script executed:

# Read getCurrentIndexes function
sed -n '1675,1750p' pkg/db/partitions.go

Repository: openshift/sippy

Length of output: 2153


🏁 Script executed:

# Also check what MoveForeignKeys specifically creates - look for UNIQUE constraint creation
sed -n '1050,1200p' pkg/db/utils.go

Repository: openshift/sippy

Length of output: 4972


Exclude constraint-backed indexes from the index cleanup loop.

When MoveForeignKeys creates UNIQUE constraints to support expanded FKs, PostgreSQL automatically creates backing indexes with the constraint name. The index cleanup in UpdatePartitionedTable will drop these indexes (they're in the current database but not in the GORM model definition), leaving the UNIQUE constraints without their backing indexes and corrupting the schema.

Constraint-backed indexes can be identified via the pg_constraint system catalog and should be skipped during cleanup alongside the _pkey indexes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/partitions.go` around lines 1604 - 1612, The index cleanup in
UpdatePartitionedTable is dropping PostgreSQL constraint-backed indexes (created
by MoveForeignKeys as UNIQUE constraints) because currentIdxMap contains them;
modify UpdatePartitionedTable to first query pg_constraint/pg_class to collect
names of indexes backing constraints (e.g., select conindid/pg_get_indexdef or
join pg_constraint -> pg_class to get index names) into a set (constraintIdxSet)
and then skip any idxName that is in constraintIdxSet (in addition to the
existing strings.HasSuffix(idxName, "_pkey") check) when iterating
currentIdxMap; ensure the new check references currentIdxMap,
UpdatePartitionedTable and Skip constraint-backed indexes so the UNIQUE
constraints retain their backing indexes.

Comment on lines +699 to +725
func (dbc *DB) VerifyPartitionCoverage(tableName string, startDate, endDate time.Time) error {
partitions, err := dbc.getPartitionsInDateRange(tableName, startDate, endDate)
if err != nil {
return fmt.Errorf("failed to get partitions: %w", err)
}

// Create a map of existing partition dates for quick lookup
existingDates := make(map[string]bool)
for _, p := range partitions {
dateStr := p.PartitionDate.Format("2006-01-02")
existingDates[dateStr] = true
}

// Normalize to calendar days so timestamps within a day are handled correctly
startDay := startDate.Truncate(24 * time.Hour)
endDay := endDate.Truncate(24 * time.Hour)

// Check that we have a partition for each day in the range
var missingDates []string
currentDate := startDay
for !currentDate.After(endDay) {
dateStr := currentDate.Format("2006-01-02")
if !existingDates[dateStr] {
missingDates = append(missingDates, dateStr)
}
currentDate = currentDate.AddDate(0, 0, 1) // Move to next day
}

⚠️ Potential issue | 🟠 Major

The partition coverage check is off by one day.

This helper documents [startDate, endDate), but the loop still validates the day containing endDate itself. A migration ending exactly at midnight should not require a partition for the first day outside the copy window.

Suggested fix
-	// Normalize to calendar days so timestamps within a day are handled correctly
-	startDay := startDate.Truncate(24 * time.Hour)
-	endDay := endDate.Truncate(24 * time.Hour)
+	// Normalize to calendar days so timestamps within a day are handled correctly.
+	// The required coverage is [startDate, endDate), so compute the last covered day
+	// from the instant just before endDate.
+	startDay := startDate.Truncate(24 * time.Hour)
+	lastRequiredDay := endDate.Add(-time.Nanosecond).Truncate(24 * time.Hour)

 	// Check that we have a partition for each day in the range
 	var missingDates []string
 	currentDate := startDay
-	for !currentDate.After(endDay) {
+	for !currentDate.After(lastRequiredDay) {
 		dateStr := currentDate.Format("2006-01-02")
 		if !existingDates[dateStr] {
 			missingDates = append(missingDates, dateStr)
 		}
 		currentDate = currentDate.AddDate(0, 0, 1) // Move to next day
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

func (dbc *DB) VerifyPartitionCoverage(tableName string, startDate, endDate time.Time) error {
	partitions, err := dbc.getPartitionsInDateRange(tableName, startDate, endDate)
	if err != nil {
		return fmt.Errorf("failed to get partitions: %w", err)
	}
	// Create a map of existing partition dates for quick lookup
	existingDates := make(map[string]bool)
	for _, p := range partitions {
		dateStr := p.PartitionDate.Format("2006-01-02")
		existingDates[dateStr] = true
	}
	// Normalize to calendar days so timestamps within a day are handled correctly
	startDay := startDate.Truncate(24 * time.Hour)
	endDay := endDate.Truncate(24 * time.Hour)
	// Check that we have a partition for each day in the range
	var missingDates []string
	currentDate := startDay
	for !currentDate.After(endDay) {
		dateStr := currentDate.Format("2006-01-02")
		if !existingDates[dateStr] {
			missingDates = append(missingDates, dateStr)
		}
		currentDate = currentDate.AddDate(0, 0, 1) // Move to next day
	}

After:

func (dbc *DB) VerifyPartitionCoverage(tableName string, startDate, endDate time.Time) error {
	partitions, err := dbc.getPartitionsInDateRange(tableName, startDate, endDate)
	if err != nil {
		return fmt.Errorf("failed to get partitions: %w", err)
	}
	// Create a map of existing partition dates for quick lookup
	existingDates := make(map[string]bool)
	for _, p := range partitions {
		dateStr := p.PartitionDate.Format("2006-01-02")
		existingDates[dateStr] = true
	}
	// Normalize to calendar days so timestamps within a day are handled correctly.
	// The required coverage is [startDate, endDate), so compute the last covered day
	// from the instant just before endDate.
	startDay := startDate.Truncate(24 * time.Hour)
	lastRequiredDay := endDate.Add(-time.Nanosecond).Truncate(24 * time.Hour)
	// Check that we have a partition for each day in the range
	var missingDates []string
	currentDate := startDay
	for !currentDate.After(lastRequiredDay) {
		dateStr := currentDate.Format("2006-01-02")
		if !existingDates[dateStr] {
			missingDates = append(missingDates, dateStr)
		}
		currentDate = currentDate.AddDate(0, 0, 1) // Move to next day
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/utils.go` around lines 699 - 725, VerifyPartitionCoverage currently
treats the date range as inclusive of the day containing endDate, but the helper
intends [startDate, endDate) semantics; change how endDay is computed so the day
containing endDate is excluded. Specifically, in VerifyPartitionCoverage (and
where endDay is set), replace endDay := endDate.Truncate(24 * time.Hour) with
endDay := endDate.Add(-time.Nanosecond).Truncate(24 * time.Hour) (or equivalent
subtraction of a tiny duration) so the loop over currentDate only checks up to
the last day fully before endDate; keep the rest of the loop, existingDates map
and getPartitionsInDateRange usage unchanged.

Comment on lines +1603 to +1628
for _, rename := range tableRenames {
source := rename.From
target := rename.To
// Skip check if this target is also a source (table swap scenario)
if _, isAlsoSource := tableRenameMap[target]; isAlsoSource {
log.WithFields(log.Fields{
"source": source,
"target": target,
}).Debug("target is also a source - table swap detected")
continue
}

var exists bool
query := `
SELECT EXISTS (
SELECT 1 FROM pg_tables
WHERE schemaname = 'public' AND tablename = @table_name
)
`
result := dbc.DB.Raw(query, sql.Named("table_name", target)).Scan(&exists)
if result.Error != nil {
return 0, fmt.Errorf("failed to check if target table %s exists: %w", target, result.Error)
}
if exists {
return 0, fmt.Errorf("target table %s already exists (conflict with rename from %s)", target, source)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Get the full context of the function containing lines 1603-1628
cat -n pkg/db/utils.go | sed -n '1600,1650p'

Repository: openshift/sippy

Length of output: 2226


🏁 Script executed:

# Also check lines 1955-1968 mentioned in "Also applies to"
cat -n pkg/db/utils.go | sed -n '1950,1975p'

Repository: openshift/sippy

Length of output: 903


🏁 Script executed:

# Search for where tableRenames are actually executed (the ALTER TABLE statements)
rg -n "ALTER TABLE" pkg/db/utils.go -A 3 -B 3

Repository: openshift/sippy

Length of output: 3581


🏁 Script executed:

# Search for the full function that contains this rename logic
rg -n "func.*rename" pkg/db/utils.go -i

Repository: openshift/sippy

Length of output: 425


Swap/cycle renames still fail at execution time.

The validation at lines 1603-1628 detects when a target is also a source and skips the pre-execution conflict check. However, this does not address the core problem: the sequential execution at lines 1955-1968 will fail for any cycle (e.g., A→B, B→A). After the first ALTER TABLE A RENAME TO B succeeds, the second statement ALTER TABLE B RENAME TO A fails because table B already exists. Cycles require either temporary intermediate names or a two-phase rename strategy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/utils.go` around lines 1603 - 1628, The current validation skips
checks when a target is also a source (tableRenames/tableRenameMap) but the
sequential ALTER TABLE execution still fails for cycles (e.g., A→B, B→A); update
the rename logic to detect cycles in the rename graph and perform a two-phase
rename instead: first compute strongly connected components or any cycle
detection over tableRenames, then for any cycle rename all source tables to
unique temporary names (e.g., append a safe suffix or GUID), execute those ALTER
TABLE ... RENAME TO temp operations, and finally rename temps to their intended
targets; modify the execution loop that issues the ALTER TABLE statements to use
this temp-phase for cycles and fall back to direct single-step renames for
acyclic mappings.
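A minimal Go sketch of the two-phase strategy the review asks for — `planRenames` is a hypothetical helper, not code from this PR; any source whose target is itself a rename source is routed through a temporary name so swaps never collide:

```go
package main

import (
	"fmt"
	"sort"
)

// planRenames orders ALTER TABLE statements so that rename cycles
// (e.g. a→b together with b→a) go through temporary names: any source
// whose target is itself a rename source is conservatively moved to a
// temp name first, then renamed to its final target in a second phase.
// Hypothetical helper, not code from this PR.
func planRenames(renames map[string]string) []string {
	sources := make(map[string]bool, len(renames))
	keys := make([]string, 0, len(renames))
	for src := range renames {
		sources[src] = true
		keys = append(keys, src)
	}
	sort.Strings(keys) // deterministic statement order

	var phase1, phase2 []string
	for _, src := range keys {
		dst := renames[src]
		if sources[dst] {
			// Target name is still occupied by another rename source.
			tmp := src + "_rename_tmp"
			phase1 = append(phase1, fmt.Sprintf("ALTER TABLE %s RENAME TO %s", src, tmp))
			phase2 = append(phase2, fmt.Sprintf("ALTER TABLE %s RENAME TO %s", tmp, dst))
		} else {
			phase2 = append(phase2, fmt.Sprintf("ALTER TABLE %s RENAME TO %s", src, dst))
		}
	}
	return append(phase1, phase2...)
}

func main() {
	for _, s := range planRenames(map[string]string{"a": "b", "b": "a"}) {
		fmt.Println(s)
	}
}
```

For the swap case this emits both temp renames before either final rename, so no ALTER TABLE ever targets an occupied name; acyclic mappings stay single-step.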

@neisw neisw changed the title Trt 1989 partition management Trt-1989: partition management Mar 17, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 17, 2026
@openshift-ci-robot

openshift-ci-robot commented Mar 17, 2026

@neisw: This pull request references Trt-1989 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Adds migration helpers for one-off partition migration steps
  • Adds FK analysis for pre-migration review
  • Adds example delete helper to handle deletes without relying on cascading deletes / FK constraints
  • Adds partition management for
    • Updating partitioned tables
    • Detaching old partitions
    • Dropping older detached partitions

Summary by CodeRabbit

Release Notes

  • New Features

  • Automated partition lifecycle management with configurable retention policies, automatic aging, and old partition cleanup

  • Database table migration utilities supporting partitioned table transitions with dry-run capabilities

  • Partition discovery, statistics, and retention analysis tools for improved data management

  • Enhanced data models to support partitioned table architecture

  • Tests

  • Comprehensive test coverage for partition management and migration utilities

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

openshift-ci-robot commented Mar 17, 2026

@neisw: This pull request references Trt-1989 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Adds migration helpers for one-off partition migration steps
  • Adds FK analysis for pre-migration review
  • Adds example delete helper to handle deletes without relying on cascading deletes / FK constraints
  • Adds partition management for
    • Updating partitioned tables
    • Detaching old partitions
    • Dropping older detached partitions

Summary by CodeRabbit

Release Notes

  • New Features

  • End-to-end partition management: discovery, stats, creation, attach/detach, aging and safe removal with dry-run support

  • Database migration utilities to convert and sync tables to partitioned layouts, verify schemas, and migrate data incrementally

  • Tools for partition impact analysis, identity/sequence sync, and controlled deletion of test-related data

  • Tests

  • Extensive unit tests covering partition logic, validation, SQL generation, and migration helpers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
pkg/db/migration.go (1)

393-411: ⚠️ Potential issue | 🟠 Major

Inbound FK analysis is still hard-coded to created_at.

The function already discovers actual partition columns (tablePartCols), but this branch still assumes created_at, so recommendations are wrong for other partition keys.

Suggested fix
-			// Assume partitioning on created_at (most common case) — check if source has it
-			hasCreatedAt := srcColSet["created_at"]
-			fields["source_has_created_at"] = hasCreatedAt
+			missingPartCols := []string{}
+			for _, partCol := range tablePartCols {
+				hasCol := srcColSet[partCol]
+				fields[fmt.Sprintf("source_has_%s", partCol)] = hasCol
+				if !hasCol {
+					missingPartCols = append(missingPartCols, partCol)
+				}
+			}
@@
-			} else if !hasCreatedAt {
+			} else if len(missingPartCols) > 0 {
 				fields["action"] = "DROP"
-				fields["reason"] = fmt.Sprintf("source table %s lacks created_at column needed to reference partitioned %s; FK cannot be expanded",
-					fk.SourceTable, tableName)
+				fields["reason"] = fmt.Sprintf(
+					"source table %s lacks required partition column(s) [%s] to reference partitioned %s; FK cannot be expanded",
+					fk.SourceTable, strings.Join(missingPartCols, ", "), tableName,
+				)
 			} else {
 				fields["action"] = "EXPAND or DROP"
-				fields["reason"] = fmt.Sprintf("source table %s has created_at column; FK could be expanded to include partition key, "+
-					"but this requires a UNIQUE constraint on %s(%s, created_at) and adds complexity",
-					fk.SourceTable, tableName, fk.ReferencedColumns)
+				fields["reason"] = fmt.Sprintf(
+					"source table %s has required partition column(s); FK could be expanded to include partition key(s), "+
+						"but this requires a UNIQUE constraint on %s(%s, %s) and adds complexity",
+					fk.SourceTable, tableName, fk.ReferencedColumns, strings.Join(tablePartCols, ", "),
+				)
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/migration.go` around lines 393 - 411, The branch is incorrectly
hard-coded to "created_at"; instead derive the partition key(s) from
tablePartCols and check that all those columns exist in srcColSet (e.g.,
hasPartitionKeys := all cols in tablePartCols are present in srcColSet). Replace
uses of hasCreatedAt and the literal "created_at" in messages with that computed
hasPartitionKeys and the actual tablePartCols list, and update the reason text
to reference the required UNIQUE constraint on (fk.ReferencedColumns,
<comma-separated tablePartCols>) rather than created_at so the recommendation
correctly reflects the real partition key(s); adjust fields keys/names
accordingly (srcColSet, srcPartCols, fields, fk, tableName,
fk.ReferencedColumns, tablePartCols).
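The generalized check the suggested fix describes can be reduced to a small pure helper — `missingPartitionCols` is a hypothetical name, with column sets mirroring the PR's srcColSet/tablePartCols shapes:

```go
package main

import "fmt"

// missingPartitionCols generalizes the hard-coded created_at check: it
// returns every planned partition column that the source table lacks,
// so the caller can recommend DROP only when the list is non-empty.
// Hypothetical helper name; not code from this PR.
func missingPartitionCols(partCols []string, srcCols map[string]bool) []string {
	var missing []string
	for _, c := range partCols {
		if !srcCols[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	src := map[string]bool{"id": true, "created_at": true}
	fmt.Println(missingPartitionCols([]string{"created_at", "org"}, src)) // [org]
}
```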
pkg/db/partitions.go (1)

1608-1616: ⚠️ Potential issue | 🟠 Major

Do not drop constraint-backed indexes during cleanup.

This loop still drops indexes that back constraints (not only _pkey should be protected). Dropping them can break constraint integrity after FK migration steps.

Suggested fix
+	// Gather indexes owned by constraints so we never drop them.
+	constraintIdxSet, err := dbc.getConstraintBackedIndexes(tableName)
+	if err != nil {
+		return "", fmt.Errorf("failed to get constraint-backed indexes: %w", err)
+	}
+
 	// Drop indexes that are no longer in the model
 	for idxName := range currentIdxMap {
 		// Skip primary key and system indexes
 		if strings.HasSuffix(idxName, "_pkey") {
 			continue
 		}
+		if constraintIdxSet[idxName] {
+			continue
+		}
 		alterStatements = append(alterStatements,
 			fmt.Sprintf("DROP INDEX IF EXISTS %s", idxName))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/partitions.go` around lines 1608 - 1616, The loop that drops indexes
(iterating currentIdxMap and appending to alterStatements) must skip any
constraint-backed indexes, not just ones ending with "_pkey"; update the loop in
partitions.go to check against the set/map of constraint-backed index names
(e.g., constraintIndexMap or whatever constraint map you populated earlier) and
continue if idxName is present there (or alternatively detect constraint-backed
names via the existing constraint metadata); only append DROP INDEX for indexes
that are neither primary/system nor listed as constraint-backed.
🧹 Nitpick comments (1)
pkg/db/partitions.go (1)

831-846: Avoid per-day IsPartitionAttached lookups in the date loop.

This loop issues an attachment query per existing date. For large ranges, this becomes an N+1 catalog-query path and slows partition sync significantly.

Suggested refactor
+	attachedPartitions, err := dbc.ListAttachedPartitions(tableName)
+	if err != nil {
+		return 0, fmt.Errorf("failed to list attached partitions: %w", err)
+	}
+	attachedSet := make(map[string]bool, len(attachedPartitions))
+	for _, p := range attachedPartitions {
+		attachedSet[p.TableName] = true
+	}
@@
-			attached, err := dbc.IsPartitionAttached(partitionName)
-			if err != nil {
-				return 0, fmt.Errorf("failed to check if partition %s is attached: %w", partitionName, err)
-			}
-			if !attached {
+			if !attachedSet[partitionName] {
 				partitionsToReattach = append(partitionsToReattach, partitionName)
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/partitions.go` around lines 831 - 846, The per-day attachment check
inside the date loop causes N+1 catalog queries; instead gather all partition
names for dates that exist into a slice (use the same partitionName format
"%s_%s" as in the loop) and call a single batched check (implement or use a
helper like IsPartitionsAttached(partitionNames []string) returning a
map[string]bool or set) to determine which are detached, then populate
partitionsToReattach from that result; keep the existing logic that appends new
dates to partitionsToCreate but remove the per-iteration call to
dbc.IsPartitionAttached and replace it with the batched lookup and corresponding
error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/db/migration.go`:
- Around line 168-171: The current use of
maxDate.UTC().Truncate(24*time.Hour).AddDate(0,0,1) moves migrateFrom to next
midnight and skips rows between maxDate and that midnight; change migrateFrom to
start just after the last migrated timestamp instead (e.g., migrateFrom :=
maxDate.UTC().Add(time.Nanosecond)) so rows with timestamps > maxDate are
included without reprocessing the exact maxDate row; apply the same fix to the
other occurrence around lines 257-260 where migrateFrom is computed and ensure
the call to MigrateTableDataRange still receives the adjusted migrateFrom.
- Around line 56-57: The code builds raw SQL with interpolated identifiers
(dateColumn, sourceTable) in the query variable used with dbc.DB.Raw, which
risks SQL injection and query breakage; fix by validating identifiers against a
strict whitelist/regex (e.g. ^[A-Za-z_][A-Za-z0-9_]*$) or by mapping to known
table/column names, then escape/quote the validated identifiers using the DB's
identifier quote (e.g. backticks or double quotes) before using fmt.Sprintf to
build the SQL; apply this change to the query construction sites that set query
and call dbc.DB.Raw (the SELECT MIN(...) block around query, and the other
occurrences referenced at the query-building code near the other two locations).

---

Duplicate comments:
In `@pkg/db/migration.go`:
- Around line 393-411: The branch is incorrectly hard-coded to "created_at";
instead derive the partition key(s) from tablePartCols and check that all those
columns exist in srcColSet (e.g., hasPartitionKeys := all cols in tablePartCols
are present in srcColSet). Replace uses of hasCreatedAt and the literal
"created_at" in messages with that computed hasPartitionKeys and the actual
tablePartCols list, and update the reason text to reference the required UNIQUE
constraint on (fk.ReferencedColumns, <comma-separated tablePartCols>) rather
than created_at so the recommendation correctly reflects the real partition
key(s); adjust fields keys/names accordingly (srcColSet, srcPartCols, fields,
fk, tableName, fk.ReferencedColumns, tablePartCols).

In `@pkg/db/partitions.go`:
- Around line 1608-1616: The loop that drops indexes (iterating currentIdxMap
and appending to alterStatements) must skip any constraint-backed indexes, not
just ones ending with "_pkey"; update the loop in partitions.go to check against
the set/map of constraint-backed index names (e.g., constraintIndexMap or
whatever constraint map you populated earlier) and continue if idxName is
present there (or alternatively detect constraint-backed names via the existing
constraint metadata); only append DROP INDEX for indexes that are neither
primary/system nor listed as constraint-backed.

---

Nitpick comments:
In `@pkg/db/partitions.go`:
- Around line 831-846: The per-day attachment check inside the date loop causes
N+1 catalog queries; instead gather all partition names for dates that exist
into a slice (use the same partitionName format "%s_%s" as in the loop) and call
a single batched check (implement or use a helper like
IsPartitionsAttached(partitionNames []string) returning a map[string]bool or
set) to determine which are detached, then populate partitionsToReattach from
that result; keep the existing logic that appends new dates to
partitionsToCreate but remove the per-iteration call to dbc.IsPartitionAttached
and replace it with the batched lookup and corresponding error handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e3f3880f-7afe-476b-ae4b-3ed8ecb640ff

📥 Commits

Reviewing files that changed from the base of the PR and between af7e262 and b2d93df.

📒 Files selected for processing (3)
  • pkg/db/db.go
  • pkg/db/migration.go
  • pkg/db/partitions.go

@openshift-ci-robot

openshift-ci-robot commented Mar 17, 2026

@neisw: This pull request references Trt-1989 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Adds migration helpers for one-off partition migration steps
  • Adds FK analysis for pre-migration review
  • Adds example delete helper to handle deletes without relying on cascading deletes / FK constraints
  • Adds partition management for
    • Updating partitioned tables
    • Detaching old partitions
    • Dropping older detached partitions

Summary by CodeRabbit

  • New Features

  • End-to-end partition management: discovery, stats, automated partition creation/aging, attach/detach, safe removal and dry-run support

  • Database migration toolkit: create, update and finalize partitioned tables, incremental data migration, identity/sequence sync, and partition-aware renames

  • Partition impact analysis and controlled deletion of test data

  • Tests

  • Extensive unit tests covering partitioning, validation, SQL generation and migration helpers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
pkg/db/partitions.go (1)

1612-1620: ⚠️ Potential issue | 🟠 Major

Don't drop indexes that back live constraints.

getCurrentIndexes includes PostgreSQL-managed backing indexes for UNIQUE constraints. MoveForeignKeys in pkg/db/utils.go creates those constraints during FK expansion, and this cleanup only exempts _pkey, so a later UpdatePartitionedTable can fail with cannot drop index ... because constraint ... requires it. Filter out indexes referenced by pg_constraint.conindid before this loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/partitions.go` around lines 1612 - 1620, The loop that drops indexes
(iterating currentIdxMap) is erroneously removing PostgreSQL-managed backing
indexes for UNIQUE/other constraints; update the logic so it first queries
pg_constraint.conindid (or uses getCurrentIndexes/MoveForeignKeys helpers) to
build a set of index OIDs/names that are referenced by constraints and then skip
those when appending DROP INDEX statements (in the same scope referencing
currentIdxMap and the DROP loop used by UpdatePartitionedTable). In short:
collect constraint-backed index identifiers from pg_constraint.conindid, add
them to a skip set, and change the existing DROP loop to continue when an
idxName is in that skip set (still keeping the existing "_pkey" exemption).
pkg/db/utils.go (2)

1603-1615: ⚠️ Potential issue | 🟠 Major

Table swaps still need a temp-name phase.

Lines 1608-1614 permit targets that are also sources, but the execution path is still a single-pass ALTER TABLE ... RENAME TO .... A true swap/cycle like A→B, B→A will still fail on the first conflicting rename. Detect cycles and route them through temporary names before the final pass.

Also applies to: 1957-1969

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/utils.go` around lines 1603 - 1615, The current loop over tableRenames
in pkg/db/utils.go skips targets that are also sources but still performs
single-pass ALTER TABLE ... RENAME TO ..., which will fail for cycles (e.g.,
A→B, B→A); modify the rename logic to detect cycles in
tableRenames/tableRenameMap (e.g., by building a directed graph of rename.From →
rename.To and finding strongly connected components or cycles) and for any cycle
generate unique temporary names and route those tables through a temp-name phase
before performing the final renames; update the code paths that iterate
tableRenames (the loop referencing source/target and tableRenameMap) to branch:
non-cyclic renames can be executed normally, while cyclic components are
transformed into two-phase renames using the temp names to avoid conflicts.

712-725: ⚠️ Potential issue | 🟠 Major

Keep endDate exclusive in coverage checks.

MigrateTableDataRange copies [startDate, endDate), but this loop still requires the partition for the day containing endDate. A run ending exactly at midnight, or an empty range with startDate == endDate, is reported as missing a partition that is never read.

Suggested fix
-	// Normalize to calendar days so timestamps within a day are handled correctly
-	startDay := startDate.Truncate(24 * time.Hour)
-	endDay := endDate.Truncate(24 * time.Hour)
+	// Normalize to calendar days so timestamps within a day are handled correctly.
+	// Required coverage is [startDate, endDate).
+	startDay := startDate.Truncate(24 * time.Hour)
+	lastRequiredDay := endDate.Add(-time.Nanosecond).Truncate(24 * time.Hour)

 	// Check that we have a partition for each day in the range
 	var missingDates []string
 	currentDate := startDay
-	for !currentDate.After(endDay) {
+	for !currentDate.After(lastRequiredDay) {
 		dateStr := currentDate.Format("2006-01-02")
 		if !existingDates[dateStr] {
 			missingDates = append(missingDates, dateStr)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/utils.go` around lines 712 - 725, The coverage check currently treats
endDate as inclusive and thus may require a partition for the day containing
endDate even though MigrateTableDataRange copies [startDate, endDate) (causing
false missing partitions or when startDate==endDate). Fix by making the
end-of-range exclusive before normalizing: compute an exclusive endDay from
endDate (e.g., endDate.Add(-time.Nanosecond) then Truncate(24*time.Hour)) or
skip the loop when endDate <= startDate, and then iterate from startDay to that
exclusive endDay when building missingDates; update references in the same
function (startDay, endDay/currentDate, existingDates, missingDates,
MigrateTableDataRange usage) accordingly.
pkg/db/migration.go (1)

394-412: ⚠️ Potential issue | 🟠 Major

Use the real partition key in the inbound FK analysis.

This branch still hard-codes created_at, so any table partitioned on a different column gets the wrong DROP/EXPAND guidance. If this helper is meant for pre-migration review, it needs the planned partition column(s) as input instead of assuming one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/migration.go` around lines 394 - 412, The code currently hard-codes
"created_at" when deciding whether an inbound FK can be expanded; replace that
assumption by using the actual planned partition key(s) passed into this helper
(e.g., add or use a parameter like plannedPartitionCols or partitionKey) instead
of "created_at". Update the existence check to use srcColSet[partitionKey] (or
iterate plannedPartitionCols and ensure all are present), adjust the fields
entries and error/reason messages to reference the real partition key
variable(s) rather than the literal "created_at", and when describing the UNIQUE
constraint change the referenced columns from "%s(%s, created_at)" to include
the actual partition key(s) (e.g., "%s(%s, <partitionKeyList>)"); keep logic
around srcPartCols, fields["action"], and fields["reason"] otherwise the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/db/migration.go`:
- Around line 283-301: The swap and FK rewrite must run atomically: perform
RenameTables and MoveForeignKeys inside a single DB transaction so failures roll
back both operations. Either (a) open a transaction (e.g., dbc.BeginTx) and call
tx-aware versions of RenameTables and MoveForeignKeys (or modify those functions
to accept a tx param like RenameTables(tx, ...) and MoveForeignKeys(tx, ...)),
or (b) add overloads to RenameTables/MoveForeignKeys to accept an existing tx
and use that tx for the DDL; on any error rollback the transaction and return
the error, otherwise commit and then log the successful renames/moves. Ensure
you update the code paths that currently call RenameTables and MoveForeignKeys
(and use variables renames, oldTable, sourceTable, moveForeignKeys, dryRun) to
use the transactional variants so the swap + FK rewrite are atomic.

In `@pkg/db/partitions.go`:
- Around line 277-305: The query uses cutoffDate (set as
time.Now().AddDate(...)) and compares it to TO_DATE(...) causing off-by-one
deletions; change to compare dates only by either replacing the bound parameter
with SQL CURRENT_DATE (i.e., keep the WHERE clause using CURRENT_DATE -
@retentionDays or TO_DATE(...) < CURRENT_DATE - @retentionDays) or compute
cutoffDate as a date-only value before binding (e.g., truncate to midnight with
time.Now().Truncate(24*time.Hour).AddDate(0,0,-retentionDays)) and bind that
instead; update both places where cutoffDate is used (the attachedOnly query
block referencing TO_DATE(... ) < `@cutoff_date` and the GetRetentionSummary
function) and ensure the parameter/SQL uses date-only semantics so TO_DATE(...)
compares to a date, not a timestamp.

---

Duplicate comments:
In `@pkg/db/migration.go`:
- Around line 394-412: The code currently hard-codes "created_at" when deciding
whether an inbound FK can be expanded; replace that assumption by using the
actual planned partition key(s) passed into this helper (e.g., add or use a
parameter like plannedPartitionCols or partitionKey) instead of "created_at".
Update the existence check to use srcColSet[partitionKey] (or iterate
plannedPartitionCols and ensure all are present), adjust the fields entries and
error/reason messages to reference the real partition key variable(s) rather
than the literal "created_at", and when describing the UNIQUE constraint change
the referenced columns from "%s(%s, created_at)" to include the actual partition
key(s) (e.g., "%s(%s, <partitionKeyList>)"); keep logic around srcPartCols,
fields["action"], and fields["reason"] otherwise the same.

In `@pkg/db/partitions.go`:
- Around line 1612-1620: The loop that drops indexes (iterating currentIdxMap)
is erroneously removing PostgreSQL-managed backing indexes for UNIQUE/other
constraints; update the logic so it first queries pg_constraint.conindid (or
uses getCurrentIndexes/MoveForeignKeys helpers) to build a set of index
OIDs/names that are referenced by constraints and then skip those when appending
DROP INDEX statements (in the same scope referencing currentIdxMap and the DROP
loop used by UpdatePartitionedTable). In short: collect constraint-backed index
identifiers from pg_constraint.conindid, add them to a skip set, and change the
existing DROP loop to continue when an idxName is in that skip set (still
keeping the existing "_pkey" exemption).

In `@pkg/db/utils.go`:
- Around line 1603-1615: The current loop over tableRenames in pkg/db/utils.go
skips targets that are also sources but still performs single-pass ALTER TABLE
... RENAME TO ..., which will fail for cycles (e.g., A→B, B→A); modify the
rename logic to detect cycles in tableRenames/tableRenameMap (e.g., by building
a directed graph of rename.From → rename.To and finding strongly connected
components or cycles) and for any cycle generate unique temporary names and
route those tables through a temp-name phase before performing the final
renames; update the code paths that iterate tableRenames (the loop referencing
source/target and tableRenameMap) to branch: non-cyclic renames can be executed
normally, while cyclic components are transformed into two-phase renames using
the temp names to avoid conflicts.
- Around line 712-725: The coverage check currently treats endDate as inclusive
and thus may require a partition for the day containing endDate even though
MigrateTableDataRange copies [startDate, endDate) (causing false missing
partitions or when startDate==endDate). Fix by making the end-of-range exclusive
before normalizing: compute an exclusive endDay from endDate (e.g.,
endDate.Add(-time.Nanosecond) then Truncate(24*time.Hour)) or skip the loop when
endDate <= startDate, and then iterate from startDay to that exclusive endDay
when building missingDates; update references in the same function (startDay,
endDay/currentDate, existingDates, missingDates, MigrateTableDataRange usage)
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c2833840-5fba-43f1-80e6-2e5878486b73

📥 Commits

Reviewing files that changed from the base of the PR and between b2d93df and 98d456e.

📒 Files selected for processing (3)
  • pkg/db/migration.go
  • pkg/db/partitions.go
  • pkg/db/utils.go

Comment on lines +283 to +301
	count, err := dbc.RenameTables(renames, true, true, true, true, dryRun)
	if err != nil {
		return fmt.Errorf("failed to swap tables: %w", err)
	}
	l.WithField("renames", count).Info("tables swapped")

	// Step 6: Move or drop foreign keys
	fkTarget := sourceTable
	if !moveForeignKeys {
		fkTarget = ""
	}
	if moveForeignKeys {
		l.Info("step 6: moving foreign keys")
	} else {
		l.Info("step 6: dropping foreign keys")
	}
	moved, err := dbc.MoveForeignKeys(oldTable, fkTarget, dryRun)
	if err != nil {
		return fmt.Errorf("failed to process foreign keys: %w", err)

⚠️ Potential issue | 🔴 Critical

Make the swap and FK rewrite atomic.

Line 283 commits the table swap before Line 299 starts a separate FK rewrite. If MoveForeignKeys fails, the migration is left half-finished: the app now points at the new sourceTable, while some FKs still reference *_old. Please run these steps in one transaction, or make RenameTables/MoveForeignKeys accept an existing tx.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/migration.go` around lines 283 - 301, The swap and FK rewrite must run
atomically: perform RenameTables and MoveForeignKeys inside a single DB
transaction so failures roll back both operations. Either (a) open a transaction
(e.g., dbc.BeginTx) and call tx-aware versions of RenameTables and
MoveForeignKeys (or modify those functions to accept a tx param like
RenameTables(tx, ...) and MoveForeignKeys(tx, ...)), or (b) add overloads to
RenameTables/MoveForeignKeys to accept an existing tx and use that tx for the
DDL; on any error rollback the transaction and return the error, otherwise
commit and then log the successful renames/moves. Ensure you update the code
paths that currently call RenameTables and MoveForeignKeys (and use variables
renames, oldTable, sourceTable, moveForeignKeys, dryRun) to use the
transactional variants so the swap + FK rewrite are atomic.
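The all-or-nothing behavior can be sketched with the handle abstracted away — in the real code the handle would be a *gorm.DB bound to one transaction (e.g. dbc.DB.Transaction(func(tx *gorm.DB) error { ... })), so a failure in either phase rolls back both; the interface, helper names, and fake below are illustrative, not the PR's API:

```go
package main

import (
	"errors"
	"fmt"
)

// execer abstracts the handle both phases share; in production it would
// wrap a single database transaction so the caller can roll back.
type execer interface {
	Exec(stmt string) error
}

// swapAndMoveFKs runs the table swap and the FK rewrite on the same
// handle; the first error aborts the batch so nothing after it executes.
func swapAndMoveFKs(tx execer, renameSQL, fkSQL []string) error {
	for _, s := range append(append([]string{}, renameSQL...), fkSQL...) {
		if err := tx.Exec(s); err != nil {
			return fmt.Errorf("aborting swap/FK rewrite: %w", err)
		}
	}
	return nil
}

// fakeTx records statements and fails on a chosen one, demonstrating
// that statements after the failure never run.
type fakeTx struct {
	ran    []string
	failOn string
}

func (f *fakeTx) Exec(stmt string) error {
	if stmt == f.failOn {
		return errors.New("boom")
	}
	f.ran = append(f.ran, stmt)
	return nil
}

func main() {
	tx := &fakeTx{failOn: "MOVE FK 1"}
	err := swapAndMoveFKs(tx,
		[]string{"RENAME A", "RENAME B"},
		[]string{"MOVE FK 1", "MOVE FK 2"})
	fmt.Println(err != nil, tx.ran) // true [RENAME A RENAME B]
}
```

With the real transaction wrapper, the returned error triggers a rollback, so the swap is never left committed while the FK rewrite is half-done.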

Comment on lines +277 to +305
	cutoffDate := time.Now().AddDate(0, 0, -retentionDays)

	var query string
	if attachedOnly {
		// Only return attached partitions
		query = `
			WITH attached_partitions AS (
				SELECT c.relname AS tablename
				FROM pg_inherits i
				JOIN pg_class c ON i.inhrelid = c.oid
				JOIN pg_class p ON i.inhparent = p.oid
				WHERE p.relname = @table_name
			)
			SELECT
				tablename,
				'public' as schemaname,
				TO_DATE(SUBSTRING(tablename FROM '_(\d{4}_\d{2}_\d{2})$'), 'YYYY_MM_DD') AS partition_date,
				(CURRENT_DATE - TO_DATE(SUBSTRING(tablename FROM '_(\d{4}_\d{2}_\d{2})$'), 'YYYY_MM_DD'))::INT AS age_days,
				pg_total_relation_size('public.'||tablename) AS size_bytes,
				pg_size_pretty(pg_total_relation_size('public.'||tablename)) AS size_pretty,
				COALESCE(n_live_tup, 0) AS row_estimate
			FROM pg_tables
			LEFT JOIN pg_stat_user_tables ON pg_stat_user_tables.relname = pg_tables.tablename
				AND pg_stat_user_tables.schemaname = pg_tables.schemaname
			WHERE pg_tables.schemaname = 'public'
				AND pg_tables.tablename LIKE @table_pattern ESCAPE '\'
				AND pg_tables.tablename IN (SELECT tablename FROM attached_partitions)
				AND TO_DATE(SUBSTRING(tablename FROM '_(\d{4}_\d{2}_\d{2})$'), 'YYYY_MM_DD') < @cutoff_date
			ORDER BY partition_date ASC

⚠️ Potential issue | 🟠 Major

🏁 Script executed:

python3 << 'EOF'
# Demonstrate the PostgreSQL date/timestamp comparison issue

# Scenario: retentionDays = 30, current time = 2025-03-15 14:30:45
import datetime

now = datetime.datetime(2025, 3, 15, 14, 30, 45)
retention_days = 30
cutoff_date = now - datetime.timedelta(days=retention_days)

print("Current time:", now)
print("Retention days:", retention_days)
print("Cutoff timestamp:", cutoff_date)
print("Cutoff date only:", cutoff_date.date())
print()

# Example partition dates to check
partition_dates = [
    datetime.date(2025, 2, 12),  # 31 days old
    datetime.date(2025, 2, 13),  # 30 days old (exactly)
    datetime.date(2025, 2, 14),  # 29 days old
]

print("PostgreSQL comparison: TO_DATE(...) < @cutoff_date")
print("where TO_DATE extracts the date from the partition name")
print()

for pdate in partition_dates:
    # PostgreSQL TO_DATE creates midnight
    date_as_timestamp_midnight = datetime.datetime.combine(pdate, datetime.time(0, 0, 0))
    
    # Comparison with the timestamp cutoff
    is_selected = date_as_timestamp_midnight < cutoff_date
    
    print(f"Partition {pdate} (age {(now.date() - pdate).days} days)")
    print(f"  {date_as_timestamp_midnight} < {cutoff_date}")
    print(f"  Selected for deletion: {is_selected}")
    print()
EOF



🏁 Script executed:

# Check if there are calls to these functions and what retentionDays values are used
rg -n "GetPartitionsForRemoval|GetRetentionSummary" pkg/db/partitions.go | head -10

Repository: openshift/sippy

Length of output: 685


🏁 Script executed:

rg -n "GetPartitionsForRemoval|GetRetentionSummary" --type go -A 3 | grep -E "GetPartitionsForRemoval|GetRetentionSummary|retentionDays" | head -20

Repository: openshift/sippy

Length of output: 892


Use CURRENT_DATE for the partition age cutoff to avoid off-by-one deletions.

The cutoff is computed as time.Now().AddDate(0, 0, -retentionDays), which carries the current time of day. When that timestamp is compared against TO_DATE(...) in the WHERE clause, which yields midnight, a partition exactly at the retention boundary is selected for deletion at any time after midnight on the boundary day. For example, with retentionDays=30 at 2:30 PM on March 15, the partition from February 13 (exactly 30 days old) is incorrectly selected because 2025-02-13 00:00:00 < 2025-02-13 14:30:45. Replace the bound parameter with CURRENT_DATE in the SQL query, or truncate the cutoff to its date component before binding the parameter.

Also applies to: 357–406 (GetRetentionSummary)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In pkg/db/partitions.go around lines 277-305, the query uses cutoffDate (set as
time.Now().AddDate(...)) and compares it to TO_DATE(...), causing off-by-one
deletions. Change the comparison to use dates only: either replace the bound
parameter with SQL CURRENT_DATE (i.e., use a WHERE clause of the form
TO_DATE(...) < CURRENT_DATE - @retentionDays), or compute cutoffDate as a
date-only value before binding (e.g., truncate to UTC midnight before calling
AddDate(0, 0, -retentionDays)) and bind that instead. Update both places where
cutoffDate is used (the attachedOnly query block referencing
TO_DATE(...) < @cutoff_date and the GetRetentionSummary function) and ensure
the parameter/SQL uses date-only semantics so TO_DATE(...) compares to a date,
not a timestamp.

@openshift-ci-robot

openshift-ci-robot commented Mar 17, 2026

@neisw: This pull request references Trt-1989 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Adds migration helpers for one off partition migration steps
  • Adds FK analysis for pre-migration review
  • Adds example delete helper to handle deletes without relying on cascading deletes / FK constraints
  • Adds partition management for:
    • Updating partitioned tables
    • Detaching old partitions
    • Dropping older detached partitions

Summary by CodeRabbit

  • New Features

  • End-to-end partition management: discovery, stats, automated partition creation/aging, attach/detach, safe removal and dry-run support

  • Database migration toolkit: create, update and finalize partitioned tables, incremental data migration, identity/sequence sync, and partition-aware renames

  • Partition impact analysis and controlled deletion of test data

  • Refactor

  • Models and migration flow adjusted to support partitioned tables and explicit timestamp/primary-key handling

  • Tests

  • Extensive unit tests covering partitioning, validation, SQL generation and migration helpers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

Scheduling required tests:
/test e2e

@openshift-ci
Contributor

openshift-ci bot commented Mar 17, 2026

@neisw: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e | a297fe9 | link | true | /test e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@neisw
Contributor Author

neisw commented Mar 17, 2026

/hold
until after #3333 merges and migration is complete

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2026

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
