Make metrics based bad order detection order specific #4021

MartinquaXD · 2026-01-06T10:41:45Z

Description

Currently the bad token detection assumes that we can perfectly detect "broken" orders, so that the only remaining problems come from orders trading specific tokens that a particular solver cannot handle. This assumption does not hold up against the increasing complexity of new order types, which can suddenly start failing for any number of reasons.
The most prominent recent example was flashloan orders where the EIP-1271 signature verified correctly but transferring the tokens into the settlement contract failed because the user's Aave debt position was not healthy enough.
Our current logic caused a lot of collateral damage: such orders could get many reasonable tokens flagged as unsupported even though the tokens themselves were perfectly fine and only that particular order was problematic.

Changes

To address this, this PR changes the metrics-based detection mechanism to flag on an order-by-order basis instead of flagging all orders trading specific tokens. The change itself is relatively simple (collect metrics keyed by Uid instead of token) but came with a few related changes:

- the name bad_token_detection is now incorrect in most (but not all!) cases so many things were renamed
  - this includes a few config parameters so they must be updated in the infra repo as well!
- caching uids has a lot more potential to bloat the cache so a cache eviction task was introduced; this required 2 new config parameters (max_age, gc_interval)
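
To illustrate keying metrics by Uid rather than by token, here is a minimal Rust sketch. It is not the driver's actual code: OrderUid, OrderMetrics, record and is_bad are hypothetical names, and the threshold semantics (flag an order once it has at least required_measurements attempts and its failure share reaches failure_ratio) are an assumption loosely based on the config parameter names in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 56-byte order UID.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OrderUid(pub [u8; 56]);

/// Settlement outcomes observed for a single order.
#[derive(Default)]
struct Stats {
    attempts: u32,
    failures: u32,
}

/// Failure statistics keyed by order UID instead of by token (assumed shape).
pub struct OrderMetrics {
    failure_ratio: f64,
    required_measurements: u32,
    stats: HashMap<OrderUid, Stats>,
}

impl OrderMetrics {
    pub fn new(failure_ratio: f64, required_measurements: u32) -> Self {
        Self {
            failure_ratio,
            required_measurements,
            stats: HashMap::new(),
        }
    }

    /// Records one outcome for exactly this order; other orders are untouched.
    pub fn record(&mut self, uid: OrderUid, success: bool) {
        let stats = self.stats.entry(uid).or_default();
        stats.attempts += 1;
        if !success {
            stats.failures += 1;
        }
    }

    /// An order counts as bad once enough attempts were observed and the
    /// share of failed attempts reaches the configured ratio (assumption).
    pub fn is_bad(&self, uid: &OrderUid) -> bool {
        self.stats.get(uid).map_or(false, |s| {
            s.attempts >= self.required_measurements
                && f64::from(s.failures) / f64::from(s.attempts) >= self.failure_ratio
        })
    }
}
```

With this shape, a failing flashloan order would only accumulate failures under its own UID, so a different order trading the same tokens would not be flagged because of it.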

How to test

adjusted existing unit tests to make sure the metrics logic still works correctly with Uid
added a new unit test for the cache eviction

Related issues

Fixes #4019

MartinquaXD · 2026-01-06T10:45:45Z

crates/driver/src/domain/competition/detector/bad_orders/metrics.rs

GitHub unfortunately marks this whole file as new when it was actually just moved and modified slightly. The new parts are:

added last_seen_at

added spawn_gc_task (and the associated unit test)
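
As a rough illustration of how last_seen_at and a periodic eviction task could fit together, here is a minimal sketch using only the standard library. The names last_seen_at, spawn_gc_task, max_age and gc_interval come from this PR, but the signatures and the Cache, Entry, touch and evict_stale helpers are assumptions. A plain thread is swapped in here to keep the example self-contained; the real task may well run on an async runtime instead.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, Weak};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a 56-byte order UID.
type Uid = [u8; 56];

/// Per-order cache entry; only the timestamp matters for eviction.
pub struct Entry {
    pub last_seen_at: Instant,
}

#[derive(Default)]
pub struct Cache {
    entries: HashMap<Uid, Entry>,
}

impl Cache {
    /// Inserts the order or refreshes its timestamp if already cached.
    pub fn touch(&mut self, uid: Uid, now: Instant) {
        self.entries
            .entry(uid)
            .and_modify(|e| e.last_seen_at = now)
            .or_insert(Entry { last_seen_at: now });
    }

    /// Drops entries not seen for longer than `max_age`; returns how many.
    pub fn evict_stale(&mut self, now: Instant, max_age: Duration) -> usize {
        let before = self.entries.len();
        self.entries
            .retain(|_, e| now.saturating_duration_since(e.last_seen_at) <= max_age);
        before - self.entries.len()
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

/// Runs eviction every `gc_interval` until the cache itself is dropped.
/// Holding only a `Weak` reference lets the thread exit on its own.
pub fn spawn_gc_task(
    cache: Weak<Mutex<Cache>>,
    gc_interval: Duration,
    max_age: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        thread::sleep(gc_interval);
        let Some(cache) = cache.upgrade() else { return };
        cache.lock().unwrap().evict_stale(Instant::now(), max_age);
    })
}
```

Passing the current time into evict_stale explicitly keeps the eviction logic testable without sleeping, which is one way a unit test for the cache eviction could be structured.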

The refactoring/renaming should probably be separated from the logic changes to avoid this.

This diff is much better https://www.diffchecker.com/iMNVcpf7/

m-sz · 2026-01-07T14:34:19Z

Infra PR https://github.com/cowprotocol/infrastructure/pull/4302

jmg-duarte

LGTM

crates/driver/src/domain/competition/bad_orders/metrics.rs

crates/driver/src/infra/solver/mod.rs

crates/driver/src/domain/competition/detector/bad_tokens/simulation.rs

m-sz · 2026-01-08T14:03:00Z

Moved bad_token and bad_order detection as submodules into a common detector module

The public-facing API (Quality and the Detector) now resides inside it. Its submodules are bad_tokens::simulation and bad_orders::metrics, whose names describe exactly how we rule out the specific offending trades. The rest of the crate uses only the main detector module and the detector::Detector struct, which makes it less confusing.

Also removed the hand-rolled current_unix_timestamp() in favour of now_in_epoch_seconds()

m-sz · 2026-01-08T14:07:39Z

I am now left wondering how should the configuration struct be called. It's currently named BadOrderDetectionConfig and configures both the bad token detection based on simulation and bad order detection based on metrics. Calling it simply DetectorConfig could be confusing.

jmg-duarte · 2026-01-08T17:40:26Z

> I am now left wondering how should the configuration struct be called. It's currently named BadOrderDetectionConfig and configures both the bad token detection based on simulation and bad order detection based on metrics. Calling it simply DetectorConfig could be confusing.

Suggestions:

- FaultDetectorConfig
- Split the configs into token/order or metrics/simulation (or both) and then create a separate one called DetectionConfigurations; it's no longer ambiguous because you can open it and see both explicit ones

squadgazzz · 2026-01-08T18:56:38Z

> I am now left wondering how should the configuration struct be called. It's currently named BadOrderDetectionConfig and configures both the bad token detection based on simulation and bad order detection based on metrics. Calling it simply DetectorConfig could be confusing.

We now have 2 detectors that live in different modules. Can we simply split their configs or do they share some config params?

m-sz · 2026-01-09T13:14:19Z

> > I am now left wondering how should the configuration struct be called. It's currently named BadOrderDetectionConfig and configures both the bad token detection based on simulation and bad order detection based on metrics. Calling it simply DetectorConfig could be confusing.
>
> We now have 2 detectors that live in different modules. Can we simply split their configs or do they share some config params?

They could be split into two if it were not for the shared hardcoded token statuses.

Here is the overall struct; I'll call it "DetectorConfig" to avoid confusion:

pub struct DetectorConfig {
    pub token_supported: HashMap<eth::Address, bool>,
    pub enable_simulation_strategy: bool,
    pub enable_metrics_strategy: bool,
    pub metrics_strategy_failure_ratio: f64,
    pub metrics_strategy_required_measurements: u32,
    pub metrics_strategy_log_only: bool,
    pub metrics_strategy_freeze_time: Duration,
    pub metrics_strategy_gc_interval: Duration,
    pub metrics_strategy_gc_max_age: Duration,
}

It could be split into 2 configs

pub struct BadTokenDetectorConfig {
    pub enable_simulation_strategy: bool,
}

pub struct BadOrderDetectorConfig {
    pub enable_metrics_strategy: bool,
    pub metrics_strategy_failure_ratio: f64,
    pub metrics_strategy_required_measurements: u32,
    pub metrics_strategy_log_only: bool,
    pub metrics_strategy_freeze_time: Duration,
    pub metrics_strategy_gc_interval: Duration,
    pub metrics_strategy_gc_max_age: Duration,
}

with the remaining field, pub token_supported: HashMap<eth::Address, bool>, which truly belongs to neither: the overall Detector, which combines the BadToken and BadOrder detectors, short-circuits the token quality check based on this field.

I am inclined to leave it as is, keeping the name BadOrderDetectionConfig, since it is truly used at the order level of an auction.

The Competition strategy uses the detector on a per-order basis to ask whether an order is unsupported. The underlying mechanism decides based either on the token itself or on the specific order, which might be considered an implementation detail. Thus the BadOrderDetector contains an order-level detector based on metrics and a token-level one based on simulation, which together tell us whether an order is unsupported or not.
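
The layering described above could look roughly like the following sketch. Everything here is hypothetical: the Quality variants, the Order fields and the two HashSets (stand-ins for the results of the simulation-based and metrics-based detectors) are assumptions for illustration, and only the idea that token_supported short-circuits the token check is taken from the discussion.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-ins for token addresses and order UIDs.
type Address = [u8; 20];
type Uid = [u8; 56];

/// Assumed variants; the real Quality type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Quality {
    Supported,
    Unsupported,
    Unknown,
}

pub struct Order {
    pub uid: Uid,
    pub sell: Address,
    pub buy: Address,
}

#[derive(Default)]
pub struct Detector {
    /// Hardcoded per-token overrides from the config.
    pub token_supported: HashMap<Address, bool>,
    /// Stand-in for tokens the simulation strategy flagged.
    pub bad_tokens: HashSet<Address>,
    /// Stand-in for orders the metrics strategy flagged.
    pub bad_orders: HashSet<Uid>,
}

impl Detector {
    /// A hardcoded status short-circuits the simulation-based result.
    fn token_quality(&self, token: &Address) -> Quality {
        if let Some(&supported) = self.token_supported.get(token) {
            return if supported {
                Quality::Supported
            } else {
                Quality::Unsupported
            };
        }
        if self.bad_tokens.contains(token) {
            Quality::Unsupported
        } else {
            Quality::Unknown
        }
    }

    /// Callers only ask about an order; whether the verdict came from a
    /// token or from the order itself stays an implementation detail.
    pub fn is_unsupported(&self, order: &Order) -> bool {
        let token_is_bad = [order.sell, order.buy]
            .iter()
            .any(|t| self.token_quality(t) == Quality::Unsupported);
        token_is_bad || self.bad_orders.contains(&order.uid)
    }
}
```

In this sketch a hardcoded "supported" status only overrides the token-level check; an order trading such a token can still be flagged individually by the order-level set, which is one possible reading of the behaviour discussed here.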

Let's keep it as is and move forward with the PR.

crates/driver/src/domain/competition/detector/mod.rs

crates/driver/src/domain/competition/mod.rs

squadgazzz · 2026-01-13T10:13:41Z

crates/e2e/src/setup/colocation.rs

 account = "{account}"
 merge-solutions = {merge_solutions}
 quote-using-limit-orders = {quote_using_limit_orders}
 enable-simulation-bad-token-detection = true


Is this config still relevant?

Yes, the simulation-bad-token-detection retains its previous name: https://github.com/cowprotocol/services/pull/4021/changes/BASE..1cdb7f81f9a4c2607acab68c108c39acd5307638#diff-892459cd473d2f4681aa36029b2d7967cac7604c93c0f780cb917dd597fd1719R839-R840

m-sz · 2026-01-15T15:28:23Z

Thanks for the reviews. I will wait for @MartinquaXD to provide his thoughts as he is the original author and merge only then.

MartinquaXD

Looks alright to me. I like the discussion about proper naming and where to move stuff.
One more nit from my side: detector is a very generic name for the module. risk_detector would give the reader at least some information on what it's supposed to be.
BTW GH doesn't let me approve my own PR.

m-sz · 2026-01-16T12:13:03Z

I'll change the detector naming to risk_detector and merge

MartinquaXD added 4 commits January 6, 2026 07:48

use metrics to flag bad orders instead of bad tokens

33b7411

Rename bad_token module to bad_orders

d5e84e4

Merge remote-tracking branch 'origin/main' into bad-order-detection

23350d4

More renaming and unit tests

5ba26d4

MartinquaXD commented Jan 6, 2026

MartinquaXD and others added 3 commits January 6, 2026 10:49

fix failing tests

2962298

Merge branch 'main' into convert-bad-token-to-bad-order-detection

a7c4aa4

Merge branch 'main' into convert-bad-token-to-bad-order-detection

b9e05ac

m-sz self-assigned this Jan 7, 2026

m-sz added 2 commits January 7, 2026 15:12

Merge branch 'main' into convert-bad-token-to-bad-order-detection

b530ad1

Fix naming of freeze-time argument

e71af0e

m-sz marked this pull request as ready for review January 7, 2026 14:52

m-sz requested a review from a team as a code owner January 7, 2026 14:52

m-sz added 2 commits January 7, 2026 16:44

Merge branch 'main' into convert-bad-token-to-bad-order-detection

2d6a925

Merge branch 'main' into convert-bad-token-to-bad-order-detection

c1fc50c

jmg-duarte approved these changes Jan 7, 2026

squadgazzz reviewed Jan 7, 2026

crates/driver/src/domain/competition/bad_orders/metrics.rs (outdated, resolved)

crates/driver/src/infra/solver/mod.rs (outdated, resolved)

crates/driver/src/domain/competition/detector/bad_tokens/simulation.rs (resolved)

m-sz added 2 commits January 8, 2026 14:58

Re-structure bad order and bad token detector

cf7ddd9

Use now_in_epoch_seconds()

a91e6c3

m-sz added 2 commits January 8, 2026 15:58

Merge branch 'main' into convert-bad-token-to-bad-order-detection

90a4ef6

clippy

3e65816

Merge branch 'main' into convert-bad-token-to-bad-order-detection

e6e9dcd

m-sz requested a review from squadgazzz January 12, 2026 10:54

squadgazzz reviewed Jan 12, 2026

crates/driver/src/domain/competition/detector/mod.rs (outdated, resolved)

squadgazzz reviewed Jan 12, 2026

crates/driver/src/domain/competition/mod.rs (outdated, resolved)

m-sz and others added 2 commits January 12, 2026 12:46

Update crates/driver/src/domain/competition/detector/mod.rs

731824b

Co-authored-by: ilya

Rename competition's detector to risk_detector

1cdb7f8

m-sz requested a review from squadgazzz January 12, 2026 12:10

squadgazzz reviewed Jan 13, 2026

crates/e2e/src/setup/colocation.rs (resolved)

squadgazzz approved these changes Jan 14, 2026

MartinquaXD commented Jan 15, 2026

m-sz added 2 commits January 16, 2026 13:37

Rename top-level detector to risk_detector

5096b3d

Merge branch 'main' into convert-bad-token-to-bad-order-detection

e52286d

m-sz enabled auto-merge January 16, 2026 12:38

Add renamed risk_detector

ed0a0a9

m-sz disabled auto-merge January 16, 2026 12:48

m-sz mentioned this pull request Jan 16, 2026

Optimize live orders queries based on confirmed_valid_to column #4055

Draft

2 tasks

m-sz added this pull request to the merge queue Jan 16, 2026

Merged via the queue into main with commit 04f4e63 Jan 16, 2026
19 checks passed

m-sz deleted the convert-bad-token-to-bad-order-detection branch January 16, 2026 13:25

github-actions bot locked and limited conversation to collaborators Jan 16, 2026
