Skip to content

CLDSRV-922: do not reference null version from master in null-key mode#6190

Merged
bert-e merged 2 commits into
development/9.3from
bugfix/CLDSRV-922-backbeat-null-version-master
Jun 16, 2026
Merged

CLDSRV-922: do not reference null version from master in null-key mode#6190
bert-e merged 2 commits into
development/9.3from
bugfix/CLDSRV-922-backbeat-null-version-master

Conversation

@tcarmet

@tcarmet tcarmet commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

When replicating a version on top of a pre-existing null version, the destination master could end up referencing that null version in a way the storage layout no longer supports, which intermittently left a bucket non-empty after all its objects were deleted. This change makes replication follow the same null-version contract as the regular write path so the leftover key can no longer occur.

Added a deterministic test that asserts the master references the null version only in compatibility mode; it fails on the old behavior and passes on the new. The full backbeat-route functional suite was run against a bucketd-backed stack in both compatibility modes with no regressions.

@bert-e

bert-e commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Hello tcarmet,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e

bert-e commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Incorrect fix version

The Fix Version/s in issue CLDSRV-922 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.3.10

  • 9.4.0

Please check the Fix Version/s of CLDSRV-922, or the target
branch of this pull request.

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

LGTM

The fix addresses two real bugs in the replication path:
- options.extraMDversioningPreprocessingResult.extraMD was a typo that silently prevented extraMD from ever being applied to the master metadata during replication.
- nullVersionId was unconditionally set on the master in both null-key mode and compat mode, but in null-key mode the master must not carry it (the null version is stored under the null key instead). The fix correctly delegates this to extraMD, which is only populated in compat mode.

The deleteNullKey assignment is also correctly moved outside the old nullVersionId guard so it's always set when the result exists.

Test is well-structured and deterministic.

Review by Claude Code

@tcarmet tcarmet marked this pull request as draft June 15, 2026 23:51
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
8483 2 8481 0
View the full list of 2 ❄️ flaky test(s)
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With default signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 14 times)

Stack Traces | 0.013s run time
We encountered an internal error. Please try again.
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With v4 signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 5 times)

Stack Traces | 0.013s run time
We encountered an internal error. Please try again.

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@tcarmet tcarmet force-pushed the bugfix/CLDSRV-922-backbeat-null-version-master branch from 1a6d248 to af3bc64 Compare June 16, 2026 00:00
@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown

LGTM

The fix correctly addresses two bugs in the backbeat replication path:
- Unconditionally setting omVal.nullVersionId regardless of null-version mode (should only be set in compat mode via extraMD)
- Using options.extraMD (the local options variable, which never has extraMD) instead of versioningPreprocessingResult.extraMD — meaning Object.assign(omVal, options.extraMD) was always a no-op, even in compat mode

The new code correctly gates nullVersionId on master through extraMD (only present in compat mode) and fixes the data source. Tests are well-targeted.

Review by Claude Code

@tcarmet tcarmet marked this pull request as ready for review June 16, 2026 00:41
@tcarmet tcarmet requested a review from a team June 16, 2026 00:42
@bert-e

bert-e commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

Comment thread tests/multipleBackend/routes/routeBackbeatForReplication.js Outdated
The backbeat putMetadata path stamped nullVersionId on the new master
unconditionally, even when null-version compatibility mode is disabled.
In null-key mode the null version is stored under the null key, so a
master referencing it via nullVersionId leads the metadata backend to
orphan the null key on a later version-specific delete, leaving the
bucket non-empty (intermittent BucketNotEmpty on cleanup).

Mirror the normal createAndStoreObject path: apply nullVersionId to the
master only via extraMD (populated solely in compat mode), and fix the
adjacent Object.assign that read options.extraMD instead of the
versioningPreprocessing result.

Add a deterministic functional test asserting the master carries
nullVersionId only in compat mode, and update the routeBackbeat unit
test which previously asserted the old (buggy) master nullVersionId.
@tcarmet tcarmet force-pushed the bugfix/CLDSRV-922-backbeat-null-version-master branch from af3bc64 to 45dc80a Compare June 16, 2026 17:33
@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown

LGTM

Review by Claude Code

return routeBackbeatAPIProxy(request, response, requestContexts, log);
}

const useMultipleBackend =
Comment on lines +712 to +729
async () => {
// If we create a new version of an object (so objMd is null),
// we should make sure that the masterVersion is versioned.
// If an object already exists, we just want to update the metadata
// of the existing object and not create a new one
if (versioning && !objMd) {
let masterMD;

try {
masterMD = await metadata.getObjectMDPromised(bucketName, objectKey, {}, log);
} catch (err) {
if (err.is?.NoSuchKey) {
log.debug('no master found for versioned object', {
method: 'putMetadata',
bucketName,
objectKey,
});
} else {
throw err;
}
Comment on lines 777 to -871
@@ -789,68 +804,74 @@ function putMetadata(request, response, bucketInfo, objMd, log, callback) {
});
// Treat as success - the repair already completed
// Get the current metadata to return
return metadata.getObjectMD(bucketName, objectKey, {}, log,
(getErr, currentMD) => {
if (getErr) {
log.warn('could not retrieve metadata after repair duplicate key error', {
error: getErr,
method: 'putMetadata',
});
// Still treat as success since repair likely completed
return next(null, md || {});
}
return next(null, currentMD || md || {});
});
return metadata.getObjectMD(bucketName, objectKey, {}, log, (getErr, currentMD) => {
if (getErr) {
log.warn('could not retrieve metadata after repair duplicate key error', {
error: getErr,
method: 'putMetadata',
});
// Still treat as success since repair likely completed
return next(null, md || {});
}
return next(null, currentMD || md || {});
});
}

log.error('error putting object metadata', {
error: err,
method: 'putMetadata',
});
return next(err);
}
pushReplicationMetric(objMd, omVal, bucketName, objectKey, log);
if (objMd &&
if (
objMd &&
headers['x-scal-replication-content'] !== 'METADATA' &&
versionId &&
// The new data location is set to null when archiving to a Cold site.
// In that case "removing old data location key" is handled by the lifecycle
// transition processor. Check the content-length as a null location can
// also be from an empty object.
(omVal['content-length'] === 0 ||
(omVal.location && Array.isArray(omVal.location))) &&
locationKeysHaveChanged(objMd.location, omVal.location)) {
(omVal['content-length'] === 0 || (omVal.location && Array.isArray(omVal.location))) &&
locationKeysHaveChanged(objMd.location, omVal.location)
) {
log.info('removing old data locations', {
method: 'putMetadata',
bucketName,
objectKey,
});
async.eachLimit(objMd.location, 5,
(loc, nextEach) => dataWrapper.data.delete(loc, log, err => {
if (err) {
log.warn('error removing old data location key', {
async.eachLimit(
objMd.location,
5,
(loc, nextEach) =>
dataWrapper.data.delete(loc, log, err => {
if (err) {
log.warn('error removing old data location key', {
bucketName,
objectKey,
locationKey: loc,
error: err.message,
});
}
// do not forward the error to let other
// locations be deleted
nextEach();
}),
() => {
log.debug('done removing old data locations', {
method: 'putMetadata',
bucketName,
objectKey,
locationKey: loc,
error: err.message,
});
}
// do not forward the error to let other
// locations be deleted
nextEach();
}),
() => {
log.debug('done removing old data locations', {
method: 'putMetadata',
bucketName,
objectKey,
});
});
},
);
}
return _respond(response, md, log, next);
});
}
], callback);
},
],
callback,
);
});
}

@@ -904,29 +925,27 @@ function putObject(request, response, log, callback) {
}
const payloadLen = parseInt(request.headers['content-length'], 10);
const backendInfo = new BackendInfo(config, storageLocation);
return dataStore(context, CIPHER, request, payloadLen, {}, backendInfo, log,
(err, retrievalInfo, md5) => {
if (err) {
log.error('error putting data', {
error: err,
method: 'putObject',
});
return callback(err);
}
if (contentMD5 !== md5) {
return callback(errors.BadDigest);
}
const responsePayload = constructPutResponse({
Comment on lines 1408 to +1424
});
return callback(err);
}
log.debug('batch delete successful', { locations });

// Update inflight metrics for the data which has just been freed
const bucket = request.bucketName;
const contentLength = locations.reduce((length, loc) => length + loc.size, 0);

// TODO: `bucket` should probably always be passed, to be confirmed in CLDSRV-643
// For now be leniant and skip inflight updates if it is not specified, to avoid any
// impact esp. on CRR
if (!bucket || !config.isQuotaEnabled() || contentLength == 0) {
return _respond(response, null, log, callback);
}

return async.waterfall([
// eslint-disable-next-line no-unused-vars
next => metadata.getBucket(bucket, log, (err, bucketMD, raftSessionId) => next(err, bucketMD)),
(bucketMD, next) => quotaUtils.validateQuotas(request, bucketMD, request.accountQuotas,
['objectDelete'], 'objectDelete', -contentLength, false, log, next),
], err => {
},
err => {
if (err) {
// Ignore error, as the data has been deleted already: only inflight count
// has not been updated, and will be eventually consistent anyway
log.warn('batch delete failed to update inflights', {
log.error('batch delete failed', {
method: 'batchDelete',
locations,
error: err,
});
return callback(err);
}
log.debug('batch delete successful', { locations });

return _respond(response, null, log, callback);
});
});
// Update inflight metrics for the data which has just been freed
const bucket = request.bucketName;
const contentLength = locations.reduce((length, loc) => length + loc.size, 0);

// TODO: `bucket` should probably always be passed, to be confirmed in CLDSRV-643
// For now be leniant and skip inflight updates if it is not specified, to avoid any
locations,
error: err,
});
}
Comment on lines 1451 to 1462
@@ -1462,59 +1510,45 @@ function listLifecycle(request, response, userInfo, log, cb) {
}

Comment on lines +1803 to -1829
},
's3',
requestContexts,
),
(userInfo, authorizationResults, next) =>
handleAuthorizationResults(request, authorizationResults, apiMethods[0], undefined, log, err =>
next(err, userInfo),
),
(userInfo, next) => {
if (request.serverAccessLog) {
// eslint-disable-next-line no-param-reassign
request.serverAccessLog.analyticsUserName = userInfo.getIAMdisplayName();
request.serverAccessLog.startTurnAroundTime = process.hrtime.bigint();
}
// eslint-disable-next-line no-param-reassign
request.accountQuotas = infos?.accountQuota;
return next(err, userInfo, authorizationResults);
}, 's3', requestContexts),
(userInfo, authorizationResults, next) => handleAuthorizationResults(
request, authorizationResults, apiMethods[0], undefined, log, err => next(err, userInfo)),
(userInfo, next) => {
if (request.serverAccessLog) {
// eslint-disable-next-line no-param-reassign
request.serverAccessLog.startTurnAroundTime = process.hrtime.bigint();
}
// TODO: understand why non-object requests (batchdelete) were not authenticated
if (!isObjectRequest) {
Comment on lines -1830 to -1833
return routeNonObjectRequest(request, response, userInfo, log, next);
}
const decodedVidResult = decodeVersionId(request.query);
if (decodedVidResult instanceof Error) {
Comment thread lib/routes/routeBackbeat.js Outdated
Comment on lines 1834 to 1897
@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown

LGTM

The core fix in putMetadata correctly addresses two issues:

1. The old code unconditionally set omVal.nullVersionId from versioningPreprocessingResult.nullVersionId, which is wrong in null-key mode where the master must not carry nullVersionId. The new code only propagates nullVersionId via extraMD, which is set solely in compat mode — matching the normal createAndStoreObject write path.

2. The old code had Object.assign(omVal, options.extraMD) referencing the local options variable (which has no extraMD property) instead of versioningPreprocessingResult.extraMD. This was a silent no-op that prevented extraMD from ever being applied through this code path. The new code correctly references versioningPreprocessingResult.extraMD.

The broadened condition (from ?.nullVersionId to just checking the result exists) is also correct since deleteNullKey needs to propagate regardless of whether nullVersionId is present.

Tests are well-structured: the unit test asserts nullVersionId is absent in null-key mode, and the functional test deterministically verifies both compat and null-key mode behavior including the bucket-empty invariant.

Review by Claude Code

@tcarmet

tcarmet commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

/approve

@bert-e

bert-e commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Conflict

A conflict has been raised during the creation of
integration branch w/9.4/bugfix/CLDSRV-922-backbeat-null-version-master with contents from bugfix/CLDSRV-922-backbeat-null-version-master
and development/9.4.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 git fetch
 git checkout -B w/9.4/bugfix/CLDSRV-922-backbeat-null-version-master origin/development/9.4
 git merge origin/bugfix/CLDSRV-922-backbeat-null-version-master
 # <intense conflict resolution>
 git commit
 git push -u origin w/9.4/bugfix/CLDSRV-922-backbeat-null-version-master

The following options are set: approve

@bert-e

bert-e commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Build failed

The build for commit did not succeed in branch bugfix/CLDSRV-922-backbeat-null-version-master

The following options are set: approve

@bert-e

bert-e commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/9.3

  • ✔️ development/9.4

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2

This pull request did not target the following hotfix branch(es) so they
were left untouched:

  • hotfix/7.10.15
  • hotfix/7.70.73
  • hotfix/7.4.5
  • hotfix/7.10.0
  • hotfix/6.4.7
  • hotfix/7.10.30
  • hotfix/7.10.49
  • hotfix/7.4.4
  • hotfix/7.6.0
  • hotfix/7.70.51
  • hotfix/7.4.1
  • hotfix/7.10.28
  • hotfix/7.9.0
  • hotfix/7.4.9
  • hotfix/7.4.0
  • hotfix/7.4.8
  • hotfix/7.7.0
  • hotfix/7.70.45
  • hotfix/7.4.7
  • hotfix/9.0.32
  • hotfix/7.70.11
  • hotfix/7.10.1
  • hotfix/7.2.0
  • hotfix/7.8.0
  • hotfix/7.70.21
  • hotfix/7.10.8
  • hotfix/7.10.3
  • hotfix/7.4.6
  • hotfix/7.4.3
  • hotfix/7.10.27
  • hotfix/8.8.45
  • hotfix/7.10.4
  • hotfix/7.4.10
  • hotfix/9.2.24
  • hotfix/7.10.2
  • hotfix/7.4.2
  • hotfix/9.0.7

Please check the status of the associated issue CLDSRV-922.

Goodbye tcarmet.

The following options are set: approve

@bert-e bert-e merged commit 0b928c2 into development/9.3 Jun 16, 2026
127 of 132 checks passed
@bert-e bert-e deleted the bugfix/CLDSRV-922-backbeat-null-version-master branch June 16, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants