Reducing complexity of implementation in order to be able to add Atlas text search token based pagination #1046
base: main
Conversation
slice_size = num_params_min_chunk or 1
# If successful, continue with normal pagination
total_data = {"data": []}  # type: dict
total_data["data"].extend(data["data"])
should favor .append(...) w/itertools.chain.from_iterable(...) at the end rather than repeated calls to .extend (especially since there is a loop later).
lines: 656, 701, 732, 806
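A minimal sketch of the suggested pattern, purely for illustration; the batches and fetch_page names below are hypothetical stand-ins for the PR's actual request loop, not real client APIs:
from itertools import chain

# Hypothetical stand-ins for the PR's batching loop and request call.
batches = [["mp-1", "mp-2"], ["mp-3"]]            # illustrative batches of IDs
def fetch_page(batch):                            # placeholder for the real request
    return {"data": [{"material_id": mid} for mid in batch]}

# Suggested pattern: .append one list per page, flatten once at the end,
# instead of calling total_data["data"].extend(...) on every iteration.
pages = []
for batch in batches:
    pages.append(fetch_page(batch)["data"])

total_data = {"data": list(chain.from_iterable(pages))}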
for i in range(0, len(split_values), batch_size):
    batch = split_values[i : i + batch_size]
Might be a ways off from being the minimum py version, but in 3.12 itertools introduced batched. I've used the approximate implementation from the docs before:
from itertools import islice

def batched(iterable, n, *, strict=False):
    # batched('ABCDEFG', 2) → AB CD EF G
    if n < 1:
        raise ValueError('n must be at least one')
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        if strict and len(batch) != n:
            raise ValueError('batched(): incomplete batch')
        yield batch
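As a usage sketch only, assuming itertools.batched on Python 3.12+ (or the recipe above on older versions), the quoted range/slice loop could then become roughly:
from itertools import batched   # Python 3.12+; otherwise use the recipe above

split_values = ["mp-1", "mp-2", "mp-3", "mp-4", "mp-5"]   # illustrative values
batch_size = 2

for batch in batched(split_values, batch_size):
    # Note: batched yields tuples, whereas the slice-based loop yields lists.
    print(batch)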
tsmathis left a comment
Not really much to say on my end; I am curious, though, about the performance/execution time of this implementation vs. the parallel approach.
params_min_chunk = min(
    parallel_param_str_chunks, key=lambda x: len(x.split("%2C"))
# If we found a parameter to split, try the request first and only split on error
if split_param and split_values and len(split_values) > 1:
if split_param and len(split_values or []) > 1

r -= 1
except MPRestError as e:
    # If we get 422 or 414 error, or 0 results for comma-separated params, split into batches
    if "422" in str(e) or "414" in str(e) or "Got 0 results" in str(e):
any(trace in str(e) for trace in ("422", "414", "Got 0 results"))

]
# Batch the split values to reduce number of requests
# Use batches of up to 100 values to balance URL length and request count
batch_size = min(100, max(1, len(split_values) // 10))
Should the batch size be chosen according to the limits we (may) impose on a Query? Or alternatively, should there be a check on the length of a batch after fixing the batch size? That way, excessively long queries get rejected (e.g., if I query for 1M task IDs, 100 batches would still give me an overly long list of task IDs).
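A rough sketch of the kind of guard being suggested; MAX_VALUES_PER_REQUEST and MAX_URL_LENGTH here are assumed, illustrative limits, not actual client settings:
# Illustrative sketch only: these limits are assumptions, not real client values.
MAX_VALUES_PER_REQUEST = 100   # cap tied to whatever limit the query endpoint imposes
MAX_URL_LENGTH = 8000          # reject batches that would still produce an oversized URL

def make_batches(split_values):
    batch_size = min(MAX_VALUES_PER_REQUEST, max(1, len(split_values) // 10))
    batches = [
        split_values[i : i + batch_size]
        for i in range(0, len(split_values), batch_size)
    ]
    # Check each batch's encoded length after fixing the batch size,
    # so excessively long queries are rejected up front.
    for batch in batches:
        if len("%2C".join(batch)) > MAX_URL_LENGTH:
            raise ValueError("batch would produce an excessively long query URL")
    return batches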
Summary
Major changes: