
Conversation

@kbuma (Collaborator) commented Dec 19, 2025

Summary

Major changes:

  • remove parallelization of server requests
  • re-implement handling of list criteria for parameters in endpoints that do not accept lists (this was tied into the parallelization code previously)
  • re-implement handling of list criteria that are too large for a single request (this was tied into the parallelization code previously); a sketch of the sequential splitting is shown below
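
For illustration, a minimal sketch of what the sequential splitting of an over-long list criterion could look like; submit_query, list_param, and chunk_size are assumed names for this sketch, not the PR's actual API:

def query_in_chunks(submit_query, criteria, list_param, values, chunk_size=100):
    # Split an over-long list criterion into sequential requests and merge the results.
    # submit_query(criteria) is assumed to return a dict of the form {"data": [...]}.
    pages = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start : start + chunk_size]
        chunk_criteria = {**criteria, list_param: ",".join(map(str, chunk))}
        pages.append(submit_query(chunk_criteria)["data"])
    return {"data": [row for page in pages for row in page]}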

slice_size = num_params_min_chunk or 1
# If successful, continue with normal pagination
total_data = {"data": []} # type: dict
total_data["data"].extend(data["data"])
A collaborator commented:

Should favor .append(...) with itertools.chain.from_iterable(...) at the end rather than repeated calls to .extend (especially since there is a loop later).

lines: 656, 701, 732, 806
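
A minimal, self-contained sketch of the suggested pattern; responses stands in for the per-request results and is not from the PR:

from itertools import chain

responses = [{"data": [1, 2]}, {"data": [3, 4]}, {"data": [5]}]  # illustrative per-request results

pages = []
for response in responses:
    pages.append(response["data"])  # collect each page instead of repeatedly extending

total_data = {"data": list(chain.from_iterable(pages))}  # flatten once at the end
print(total_data)  # {'data': [1, 2, 3, 4, 5]}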

Comment on lines +685 to +686
for i in range(0, len(split_values), batch_size):
batch = split_values[i : i + batch_size]
A collaborator commented:

Might be a ways off from being the minimum Python version, but in 3.12 itertools introduced batched. I've used the approximate implementation from the docs before:

from itertools import islice

def batched(iterable, n, *, strict=False):
    # batched('ABCDEFG', 2) → AB CD EF G
    if n < 1:
        raise ValueError('n must be at least one')
    iterator = iter(iterable)
    while batch := tuple(islice(iterator, n)):
        if strict and len(batch) != n:
            raise ValueError('batched(): incomplete batch')
        yield batch
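
With that recipe (or itertools.batched on Python 3.12+), the slicing loop on lines +685 to +686 could read roughly as below; the split_values and batch_size names mirror the diff, while the values themselves are made up:

split_values = ["mp-1", "mp-2", "mp-3", "mp-4", "mp-5"]  # illustrative values
batch_size = 2
for batch in batched(split_values, batch_size):
    print(list(batch))  # stand-in for issuing one request per batch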

@tsmathis (Collaborator) left a comment:

Not really much to say on my end; I am curious, though, about the performance/execution time of this implementation vs. the parallel approach.

params_min_chunk = min(
    parallel_param_str_chunks, key=lambda x: len(x.split("%2C"))
# If we found a parameter to split, try the request first and only split on error
if split_param and split_values and len(split_values) > 1:
@esoteric-ephemera (Collaborator) commented Jan 6, 2026:

if split_param and len(split_values or []) > 1

    r -= 1
except MPRestError as e:
    # If we get 422 or 414 error, or 0 results for comma-separated params, split into batches
    if "422" in str(e) or "414" in str(e) or "Got 0 results" in str(e):
A collaborator commented:

any(trace in str(e) for trace in ("422", "414", "Got 0 results"))
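
A quick self-contained check that the any()-based test matches the chained or version; the message strings are made up for the example:

for msg in ("HTTP 422 Unprocessable Entity", "HTTP 414 Request-URI Too Long", "Got 0 results", "HTTP 500"):
    chained = "422" in msg or "414" in msg or "Got 0 results" in msg
    compact = any(trace in msg for trace in ("422", "414", "Got 0 results"))
    assert chained == compact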

]
# Batch the split values to reduce number of requests
# Use batches of up to 100 values to balance URL length and request count
batch_size = min(100, max(1, len(split_values) // 10))
A collaborator commented:

Should the batch size be chosen according to the limits we (may) impose on a Query? Or alternatively, should there be a check on the length of a batch after fixing the batch size? That way excessively long queries get rejected (e.g., I query for 1M task IDs, 100 batches would still give me an overly-long list of task IDs)
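
One possible shape for that check, sketched under assumed limits; MAX_LIST_LENGTH and MAX_URL_LENGTH are placeholders, not values defined in the PR:

MAX_LIST_LENGTH = 100   # assumed per-request limit on the number of list values
MAX_URL_LENGTH = 8000   # assumed limit on the encoded query-string length

def make_batches(split_values, batch_size):
    # Cap the batch size by the explicit list limit, then reject any batch whose
    # serialized form would still produce an excessively long query.
    batch_size = min(batch_size, MAX_LIST_LENGTH)
    for i in range(0, len(split_values), batch_size):
        batch = split_values[i : i + batch_size]
        if len(",".join(map(str, batch))) > MAX_URL_LENGTH:
            raise ValueError("Query would exceed the maximum request length; reduce the list criteria.")
        yield batch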
