Separate backend each threads on send_queued_mail #521

Merged: selwin merged 6 commits into ui:master from ibadarrohman:fix-thread-related-errors-when-running-send_queued_mail on Apr 16, 2026

@ibadarrohman (Contributor) commented Apr 15, 2026

LLM Output Claude Sonnet 4.6

Root Cause: In _send_bulk, prepare_email_message() is called in the main thread before the worker threads are spawned. This embeds the main thread's AnymailRequestsBackend connection (and its requests.Session) in every email's _cached_email_message. When the ThreadPool workers dispatch those emails, all threads concurrently call session.request() on the same shared requests.Session, which is not guaranteed to be thread-safe. The concurrent use corrupts socket state and surfaces as [Errno 11] Resource temporarily unavailable at the OS level.

The fix: in _send_email, replace the pre-cached connection with a thread-local one from connections[] before dispatching. Since ConnectionHandler uses threading.local, each worker thread will get its own AnymailRequestsBackend with its own requests.Session.
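
A minimal sketch of the shape of that fix, not the code merged in this PR. `FakeBackend`, `FakeMessage`, `thread_connection` and `send_email` are hypothetical stand-ins for the Anymail backend, the prepared email message, the thread-local `connections` lookup and `_send_email`; only the standard library is used.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeBackend:
    """Hypothetical stand-in for an email backend that owns one HTTP session."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread that created this backend

    def send_messages(self, messages):
        return len(messages)


class FakeMessage:
    """Hypothetical stand-in for a prepared email message."""

    def __init__(self, connection):
        self.connection = connection

    def send(self):
        return self.connection.send_messages([self])


_local = threading.local()


def thread_connection():
    """Return the calling thread's backend, creating it on first use."""
    if not hasattr(_local, "backend"):
        _local.backend = FakeBackend()
    return _local.backend


def send_email(message):
    # Swap the connection cached at prepare time for the worker's own one.
    message.connection = thread_connection()
    message.send()
    return message.connection.owner == threading.get_ident()


# Messages are prepared in the main thread and all share its backend.
main_backend = FakeBackend()
messages = [FakeMessage(main_backend) for _ in range(8)]

with ThreadPool(4) as pool:
    sent_on_own_backend = pool.map(send_email, messages)
```

After the run, no message still points at `main_backend`, and every send went through a backend created by the thread that performed it.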

Deeper Analysis


1. Why requests.Session is not thread-safe

A Session object holds mutable shared state:

session.cookies      ← mutable CookieJar
session.headers      ← mutable dict
session.adapters     ← HTTPAdapter with its PoolManager
session.proxies      ← mutable dict

The critical one is session.request() itself. It runs a multi-step sequence over that shared state without taking any lock:

# Inside requests/sessions.py (simplified)
def request(self, method, url, ...):
    req = Request(...)
    prep = self.prepare_request(req)   # reads self.headers, self.cookies
    settings = self.merge_environment_settings(...)  # reads self.proxies, self.verify and the environment
    return self.send(prep, **settings)  # writes response cookies back to self.cookies

When Thread A and Thread B both call session.request() at the same time, they enter prepare_request(), merge_environment_settings() and send() concurrently, reading and updating the same dict and cookie-jar objects without locks. This is a classic data race on shared mutable state.
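
To make the race concrete, here is a minimal sketch of a lost update on shared state. It does not touch requests at all: `shared`, `both_have_read` and `unsafe_increment` are hypothetical names, and the barrier exists only to force the bad interleaving so the outcome is reproducible.

```python
import threading

shared = {"requests_seen": 0}         # stands in for state shared by all threads
both_have_read = threading.Barrier(2)


def unsafe_increment():
    current = shared["requests_seen"]      # read
    both_have_read.wait()                  # force both reads before any write
    shared["requests_seen"] = current + 1  # write back a value based on a stale read


workers = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

lost_update_total = shared["requests_seen"]  # 1, although two increments ran
```

Without the barrier the same interleaving can still happen, only rarely, which is why this class of bug tends to show up under load rather than in a quick test.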


2. Why does sharing corrupt socket state?

Go one level deeper: HTTPAdapter owns a urllib3.PoolManager, which maps hostnames → HTTPConnectionPool. Each pool has a queue.LifoQueue of HTTPConnection objects.

When 5 threads share the same adapter/pool and all call adapter.send() simultaneously:

Thread 1 ──→ pool.urlopen() → get conn from queue → send bytes → read response
Thread 2 ──→ pool.urlopen() → get conn from queue → ← queue is empty, create new conn
Thread 3 ──→ pool.urlopen() → get conn from queue → ← queue is empty, create new conn
Thread 4 ──→ pool.urlopen() → get conn from queue → ← queue is empty, create new conn
Thread 5 ──→ pool.urlopen() → get conn from queue → ← queue is empty, create new conn

All 5 threads simultaneously call HTTPConnection._new_conn() → socket.create_connection() → internally calls socket.getaddrinfo() for DNS.

At the OS level, DNS resolution requires creating its own temporary UDP socket to query the resolver. Five threads doing this at exactly the same instant means the OS must allocate 5 resolver sockets simultaneously, on top of the 5 TCP sockets being opened. When system resources are tight (file descriptor pressure, socket buffer limits, or the resolver queue is full), the kernel returns EAGAIN on one of those socket allocations.
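
The borrow pattern described above can be sketched with a toy pool. `TinyPool` is a hypothetical, much-simplified model and not urllib3's implementation; strings stand in for sockets, and the barrier guarantees that all five threads hold a connection at the same moment.

```python
import queue
import threading


class TinyPool:
    """Hypothetical, much-simplified connection pool backed by a LifoQueue."""

    def __init__(self, maxsize=1):
        self._idle = queue.LifoQueue(maxsize)
        self._lock = threading.Lock()
        self.created = 0

    def get(self):
        try:
            return self._idle.get(block=False)  # reuse an idle connection
        except queue.Empty:
            with self._lock:
                self.created += 1
                return f"conn-{self.created}"   # stands in for opening a socket

    def put(self, conn):
        try:
            self._idle.put(conn, block=False)
        except queue.Full:
            pass  # pool is full: drop the extra connection


def run(pool, workers):
    all_borrowed = threading.Barrier(workers)

    def task():
        conn = pool.get()
        all_borrowed.wait()  # every thread holds a connection at the same time
        pool.put(conn)

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.created


shared_pool_connections = run(TinyPool(), workers=5)
```

Because no thread returns its connection before the others have borrowed theirs, the shared pool ends up opening one connection per thread in a single burst.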


3. Why specifically Errno 11 (EAGAIN / Resource temporarily unavailable)

EAGAIN (errno 11 on Linux) means: "I can't do this right now — try again later." It's the kernel's way of saying a resource is momentarily exhausted without hard-failing.

The exact call that fails, from the traceback:

# urllib3/util/connection.py
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):

getaddrinfo creates a non-blocking DNS socket internally. Under the hood, libc's resolver does:

// Creates a UDP socket to talk to /etc/resolv.conf nameserver
fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, ...)
// If the system socket table is temporarily saturated:
// → returns -1, errno = EAGAIN

So the chain is:

5 threads × (1 TCP socket + 1 DNS socket) = 10 simultaneous socket() syscalls
         ↓
system socket buffer / fd slots temporarily saturated
         ↓
socket() syscall returns EAGAIN
         ↓
Python raises BlockingIOError: [Errno 11]
         ↓
urllib3 wraps it as NewConnectionError
         ↓
requests wraps it as ConnectionError
         ↓
anymail wraps it as AnymailRequestsAPIError
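
The wrapping chain can be sketched with plain exceptions. `NewConnectionError` and `ConnectionFailed` below are hypothetical local classes, not the real urllib3, requests or anymail types, and explicit `raise ... from` chaining is an assumption made to keep the example short.

```python
import errno


class NewConnectionError(Exception):
    """Hypothetical stand-in for the urllib3 wrapper."""


class ConnectionFailed(Exception):
    """Hypothetical stand-in for the requests / anymail wrappers."""


def open_socket():
    # Python maps errno EAGAIN onto the BlockingIOError subclass of OSError.
    raise OSError(errno.EAGAIN, "Resource temporarily unavailable")


def connect():
    try:
        open_socket()
    except OSError as exc:
        raise NewConnectionError("failed to establish a new connection") from exc


def send():
    try:
        connect()
    except NewConnectionError as exc:
        raise ConnectionFailed("request failed") from exc


def root_cause(exc):
    """Follow __cause__ links down to the original exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    send()
except ConnectionFailed as exc:
    original = root_cause(exc)
```

Walking the chain this way is a quick check, when debugging a wrapped API error, of whether the underlying failure really is EAGAIN. The numeric value of `errno.EAGAIN` is platform-specific (11 on Linux).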

Why per-thread sessions fix it

With the fix, each worker thread calls connections[alias] which hits threading.local storage. The first access on a given thread creates a brand new AnymailRequestsBackend → new requests.Session → new HTTPAdapter → new PoolManager. Now:

Thread 1 → its own Session → its own socket lifecycle
Thread 2 → its own Session → its own socket lifecycle
Thread 3 → its own Session → its own socket lifecycle

No shared state, no concurrent mutation, sockets are created sequentially within each thread's own connection lifecycle — no EAGAIN.

@ibadarrohman ibadarrohman changed the title fix thread related errors when running send_queued_mail Separate backend each threads on send_queued_mail Apr 15, 2026
Comment thread post_office/models.py Outdated
Comment thread tests/test_mail.py
Comment on lines +145 to +147
Ensure _send_bulk() opens one connection per thread, not one per email.
With THREADS_PER_PROCESS=1: main thread opens one during prepare, worker
thread opens one during send — total 2 opens for any number of emails.


FYI @selwin

@ibadarrohman ibadarrohman requested a review from selwin April 16, 2026 06:03
Comment thread post_office/models.py Outdated
@selwin selwin merged commit ba1ccfb into ui:master Apr 16, 2026
20 checks passed