Description
Observation:
In rare cases, stream messages fail to be delivered to an Orleans client. The problem persists until the silos are redeployed.
Details:
- End user connects to Orleans Client Application via API
- Upon connection, the Orleans client subscribes to a stream for that user
- The stream subscription is used by Orleans Silos to send messages back to the end user
- A particular stream can stop delivering messages despite reconnection / resubscription. This is only fixed by a silo redeployment.
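To make the flow above concrete, the client-side subscription looks roughly like the following. This is a minimal sketch rather than the actual application code: the stream namespace "UserMessages", the Guid user key, the string payload type, and the UserStreamSubscriber class are all hypothetical. Only the provider name "MemProvider" comes from the configuration in this report.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;
using Orleans.Streams;

public static class UserStreamSubscriber
{
    // Subscribes the Orleans client to the per-user stream when an end user connects.
    public static async Task<StreamSubscriptionHandle<string>> SubscribeForUserAsync(
        IClusterClient client, Guid userId)
    {
        // "MemProvider" is the in-memory stream provider name from the report.
        IStreamProvider provider = client.GetStreamProvider("MemProvider");

        // Hypothetical stream identity: one stream per end user.
        StreamId streamId = StreamId.Create("UserMessages", userId);
        IAsyncStream<string> stream = provider.GetStream<string>(streamId);

        // Messages published by the silos should arrive in this callback.
        // In the failure described here, the callback is never invoked.
        return await stream.SubscribeAsync((message, token) =>
        {
            Console.WriteLine($"Message for {userId}: {message}");
            return Task.CompletedTask;
        });
    }
}
```

The returned handle would be kept for the lifetime of the end-user connection and passed to UnsubscribeAsync on disconnect.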
Configuration:
- Orleans 9.2.1
- In-memory streams (AddMemoryStreams("MemProvider"))
- DynamoDB pubsub storage
- Activation repartitioner enabled (defaults)
- Activation rebalancer enabled (defaults)
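For reference, a silo setup matching the list above would look roughly like this. It is a reconstruction under stated assumptions, not the startup code from the affected cluster: the "PubSubStore" storage name is the conventional name that stream pubsub state is stored under, the DynamoDB options are left empty because the report does not give them, clustering setup is omitted for the same reason, and the experimental diagnostic IDs in the pragma are assumed.

```csharp
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

var builder = Host.CreateApplicationBuilder(args);

builder.UseOrleans(silo =>
{
    // Clustering/membership configuration is not stated in the report and is omitted here.

    // In-memory stream provider, named as in the report.
    silo.AddMemoryStreams("MemProvider");

    // DynamoDB-backed grain storage for the stream pubsub state.
    // "PubSubStore" is assumed to be the storage name in use.
    silo.AddDynamoDBGrainStorage("PubSubStore", options =>
    {
        // Region, credentials and table name are not given in the report.
    });

    // Both features are experimental APIs and are enabled with default options.
    // The diagnostic IDs below are assumed.
#pragma warning disable ORLEANSEXP001, ORLEANSEXP002
    silo.AddActivationRepartitioner();
    silo.AddActivationRebalancer();
#pragma warning restore ORLEANSEXP001, ORLEANSEXP002
});

builder.Build().Run();
```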
Example troubleshooting sequence (from telemetry):
- Some number of end-user connections occurred, causing stream subscriptions / silo changes
  - Basically, the cluster had been running for a while and the end user had come and gone. I can try to pull more details if it seems relevant.
- End user connects and subscribes to the stream
  - At this point there were 2 producers registered.
    - One of which was removed as a result of a failed Orleans.Streams.IStreamProducerExtension.AddSubscriber call (I believe that particular silo instance was no longer active, based on the IP in the system target).
    - The other producer successfully added the subscriber
- A short while later an event occurred which generated a new message on the stream
  - Can see that IMemoryStreamQueueGrain.Enqueue was called
  - Shortly after, can see a call to IMemoryStreamQueueGrain.Dequeue for the same grain
  - No calls to IStreamConsumerExtension.DeliverBatch or IPubSubRendezvousGrain.RegisterProducer observed
- Future stream messages also not delivered despite more connection / subscription attempts
- After a silo deployment, stream messages for that particular user stream began to function normally again
I'm wondering if the pubSubCache within the PersistentStreamPullingAgent was stale in such a way that the agent thought all of its information was up to date when it actually wasn't.