Skip to content

Optimize recv() for embedded systems with batched reads#32

Open
dcliftreaves wants to merge 1 commit intotehmaze:masterfrom
dcliftreaves:master
Open

Optimize recv() for embedded systems with batched reads#32
dcliftreaves wants to merge 1 commit intotehmaze:masterfrom
dcliftreaves:master

Conversation

@dcliftreaves
Copy link
Copy Markdown

@dcliftreaves dcliftreaves commented Dec 1, 2017

Summary

Optimizes the recv() method for reliable communication with embedded systems by reading the sequence bytes, packet data, and checksum in a single getc() call instead of multiple separate calls.

Problem

When receiving data from embedded systems over fast serial lines without hardware flow control (the most common configuration for modern UARTs with only 2 IO lines), the original recv() implementation would:

  1. Call getc(1) for the first sequence byte
  2. Call getc(1) for the second sequence byte
  3. Call getc(packet_size + 1 + crc_mode) for the packet data + checksum

Each call had its own timeout, and the delays between reads created windows where incoming data could be missed or buffers could overrun. Additionally, when the inner header loop received garbage/unexpected bytes, it would time.sleep(timeout) for the full timeout duration before retrying, causing further timing issues.

Changes

xmodem/__init__.py:

  • Batched read: Sequence bytes + packet data + checksum are now read in a single getc(2 + packet_size + 1 + crc_mode) call, minimizing the time window for buffer overruns
  • Header error recovery: On bad/unexpected header bytes, the code now immediately breaks to purge and NAK instead of sleeping for the full timeout duration
  • Error count management: error_count is only reset after successful packet reception (not unconditionally before each read), ensuring the retry limit works correctly across retransmissions

test/unit/test_xmodem.py — 11 new recv() unit tests:

Test What it covers
test_xmodem_recv_successful_single_block Basic single-block CRC transfer
test_xmodem_recv_successful_multi_block 3-block transfer with data verification
test_xmodem_recv_bad_sequence_number Wrong sequence number → NAK + retry
test_xmodem_recv_bad_sequence_complement Mismatched 1's complement → NAK + retry
test_xmodem_recv_bad_crc Corrupted CRC → NAK + retry
test_xmodem_recv_short_block Badly sized (truncated) block → NAK + retry
test_xmodem_recv_timeout_on_block_read Timeout during data read → NAK + retry
test_xmodem_recv_checksum_mode Checksum mode (crc_mode=0) transfer
test_xmodem_recv_cancel_by_can_can 2×CAN cancellation during data phase
test_xmodem_recv_exceeds_retry_limit Abort after exceeding retry limit
test_xmodem1k_receive_successful_when_timeout_after_first_packet (updated) Timeout recovery with batched reads

Backward Compatibility

  • The public API is unchanged — recv() accepts the same parameters and returns the same values
  • All existing upstream unit tests pass
  • The callback parameter and all upstream features (empty file handling, etc.) are preserved

Motivation

This mirrors the optimization discussed in issue #19 for send(). Most modern embedded UARTs use only two IO lines (TX/RX) without hardware flow control. Reading all expected bytes upfront lets the caller/OS buffer management work optimally rather than issuing multiple small reads that can miss data.

🤖 Generated with Claude Code

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-2.6%) to 65.094% when pulling f5d5f65 on dcliftreaves:master into aed6428 on tehmaze:master.

@coveralls
Copy link
Copy Markdown

coveralls commented Dec 1, 2017

Coverage Status

Coverage decreased (-1.8%) to 65.944% when pulling d82bfb7 on dcliftreaves:master into aed6428 on tehmaze:master.

@jquast
Copy link
Copy Markdown
Collaborator

jquast commented Dec 1, 2017

Thank you @dcliftreaves for the contribution, it's very appreciated. I think I get the idea here, previously:

  • single read() for first seq number
  • single read() next seq number
  • final read() for packet & crc checksum

You propose to

  • read() the sequence numbers, packet & checksum in single call

Which helps with timing issues for some systems, presumably where OS read calls are demanded to be fast enough to keep up with the endpoint (perhaps some very fast baudrate transfer on a serial line without flow control?).

I'd only be certain about merging if there are test cases to cover at least some of the edge cases here -- bad sequence numbers, badly sized blocks, timeouts, etc. that this code change makes adjustments to, I'd be worried about introducing new errors for existing users in this case, so test cases would make me much more comfortable.

I'll hope to try to take a look at this / write test cases soon but no promises, anyone else please join review!

@jquast
Copy link
Copy Markdown
Collaborator

jquast commented Dec 1, 2017

This PR pretty much mirrors the issue for #19, which was for send()

@dcliftreaves
Copy link
Copy Markdown
Author

Thank you for your kind response!

Your comment: "perhaps some very fast baudrate transfer on a serial line without flow control?" Is right on the money. Most modern UARTs on embedded systems only use two IO lines which mean that they can only use software flow control which isn't normally used during xmodem transfers.

Also, yes to this: This PR pretty much mirrors the issue for #19

I am not sure I can help too much with test cases as they are fairly complex. I will try to look at adding one or two test cases.

@jquast
Copy link
Copy Markdown
Collaborator

jquast commented Dec 20, 2017

Hope to get to this over the holiday!

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-2.001%) to 65.741% when pulling fde49a6 on dcliftreaves:master into aed6428 on tehmaze:master.

1 similar comment
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-2.001%) to 65.741% when pulling fde49a6 on dcliftreaves:master into aed6428 on tehmaze:master.

Read sequence bytes, packet data, and checksum in a single getc() call
instead of multiple separate calls. This reduces timing issues when
communicating with embedded systems over fast serial lines without
hardware flow control, where multiple reads with separate timeouts can
stack up and cause buffer overruns.

Also fix the inner header loop to break and NAK on bad headers rather
than sleeping for the full timeout duration, which caused problems with
embedded systems.

Add comprehensive unit tests for recv() covering:
- Successful single-block and multi-block transfers
- Bad sequence numbers and bad sequence complements
- Corrupted CRC checksums
- Short (badly sized) blocks
- Timeout during block data read
- Checksum mode (crc_mode=0)
- Cancellation via 2xCAN during data phase
- Retry limit exhaustion on repeated errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dcliftreaves dcliftreaves changed the title Embedded sender optimization Optimize recv() for embedded systems with batched reads Feb 28, 2026
@dcliftreaves
Copy link
Copy Markdown
Author

Hey @jquast 👋 — I've rebased this PR on top of the current master (including the pyproject.toml modernization from #60) and significantly improved it:

What changed since the original PR

  1. Rebased onto current upstream — clean single commit on top of master, no merge conflicts
  2. Preserved all upstream features — the callback parameter, empty file handling, and all other additions since 2017 are intact
  3. Added 11 unit tests covering the edge cases you requested:
    • Bad sequence numbers
    • Bad sequence complements (1's complement mismatch)
    • Corrupted CRC checksums
    • Badly sized (truncated) blocks
    • Timeout during block data read
    • Checksum mode (crc_mode=0)
    • 2×CAN cancellation during data phase
    • Retry limit exhaustion
    • Successful single-block and multi-block transfers
  4. Improved error recovery — on bad headers, the code now breaks to purge+NAK immediately instead of time.sleep(timeout) which was problematic for embedded systems

All existing upstream unit tests continue to pass.

Would love to get this merged — happy to address any additional feedback!

@jquast
Copy link
Copy Markdown
Collaborator

jquast commented Feb 28, 2026

Thanks I'll take a look soon

time.sleep(timeout)

# Reads next char from stream
char = self.getc(1, timeout=1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "Reads next char from stream" was important for the side effect of the CAN above

# cancel at two consecutive cancels
if cancel:
self.log.info('Transmission canceled: received 2xCAN '
'at block %d', sequence)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious and that's ok -- this sort of thing is the cause of all bugs in this library it's difficult to see,

this "CAN" treatment depended on the deleted line marked below to receive the second CAN char, with these changes, "line noise" of a single SOH would fail, not a big problem. I have included a test to exercise the "line noise" possibility of missing second CAN with TDD

data = data[2:]
if len(data) != (packet_size + 1 + crc_mode):
self.log.warning('recv: expected %d data bytes, got %d',
packet_size + 1 + crc_mode, len(data))
Copy link
Copy Markdown
Collaborator

@jquast jquast Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set data = None here also to prevent possible IndexError depending when size of 'data' is too small

Copy link
Copy Markdown
Collaborator

@jquast jquast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed and made the suggested changes, thanks for the submission I'll release soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants