Lost stream messages #9828

@jsteinich

Description

Observation:
On rare occasions, stream messages fail to be delivered to an Orleans client. Once this happens, the failure persists until the silos are redeployed.

Details:

  • An end user connects to the Orleans client application via an API
  • Upon connection, the Orleans client subscribes to a stream for that user
  • The Orleans silos use that stream subscription to send messages back to the end user
  • A particular stream can stop delivering messages, and reconnecting / resubscribing does not restore delivery. Only a silo redeployment fixes it.
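
For reference, the subscription pattern described above might look roughly like the sketch below. This is not the actual application code: the stream namespace, the `string` message type, and the `UserStreamSubscriber` helper are hypothetical, and only the provider name `MemProvider` comes from the configuration in this report.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;
using Orleans.Streams;

public static class UserStreamSubscriber
{
    // Hypothetical stream namespace; the real one is not stated in this report.
    private const string StreamNamespace = "user-events";

    // Subscribes the Orleans client to the per-user stream and forwards
    // each delivered message to the supplied callback.
    public static async Task<StreamSubscriptionHandle<string>> SubscribeForUserAsync(
        IClusterClient client, Guid userId, Func<string, Task> onMessage)
    {
        // "MemProvider" matches the in-memory stream provider name from the configuration.
        IStreamProvider provider = client.GetStreamProvider("MemProvider");

        // One stream per end user, keyed by the user's id (an assumption).
        IAsyncStream<string> stream =
            provider.GetStream<string>(StreamId.Create(StreamNamespace, userId));

        // The sequence token is ignored here; only the payload is forwarded.
        return await stream.SubscribeAsync((message, token) => onMessage(message));
    }
}
```

In the failure case, a call like this completes without error, but the callback is never invoked for later messages.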

Configuration:

  • Orleans 9.2.1
  • In-memory streams (AddMemoryStreams("MemProvider"))
  • DynamoDB pubsub storage
  • Activation repartitioner enabled (defaults)
  • Activation rebalancer enabled (defaults)
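
A silo setup matching the list above could look something like the following sketch. It is an approximation rather than the exact production configuration: clustering is omitted, the DynamoDB region value is a placeholder, and using `PubSubStore` as the storage name is an assumption based on the default name Orleans streams use for pub/sub state.

```csharp
using Microsoft.Extensions.Hosting;
using Orleans.Configuration;
using Orleans.Hosting;

// The repartitioner and rebalancer are marked experimental in Orleans 9.x.
#pragma warning disable ORLEANSEXP001, ORLEANSEXP002

var host = Host.CreateDefaultBuilder(args)
    .UseOrleans(silo =>
    {
        // Clustering configuration omitted; it is not described in this report.

        // In-memory stream provider, named as in the report.
        silo.AddMemoryStreams("MemProvider");

        // DynamoDB-backed storage for stream pub/sub state.
        silo.AddDynamoDBGrainStorage("PubSubStore", options =>
        {
            options.Service = "us-east-1"; // placeholder region, not from the report
        });

        // Both enabled with default options.
        silo.AddActivationRepartitioner();
        silo.AddActivationRebalancer();
    })
    .Build();

await host.RunAsync();
```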

Example troubleshooting sequence (from telemetry)

  1. Some number of end-user connections occurred, causing stream subscriptions / silo changes
    • Basically, the cluster had been running for a while and the end user had come and gone. I can try to pull more details if they seem relevant
  2. End user connects and subscribes to the stream
  3. At this point, there were 2 producers registered.
    • One of them was removed as a result of a failed Orleans.Streams.IStreamProducerExtension.AddSubscriber call (I believe that particular silo instance was no longer active, based on the IP in the system target).
    • The other producer successfully added the subscriber
  4. A short while later, an event occurred which generated a new message on the stream
    • Telemetry shows that IMemoryStreamQueueGrain.Enqueue was called
  5. Shortly afterward, telemetry shows a call to IMemoryStreamQueueGrain.Dequeue for the same grain
    • No calls to IStreamConsumerExtension.DeliverBatch or IPubSubRendezvousGrain.RegisterProducer were observed
  6. Subsequent stream messages were also not delivered, despite further connection / subscription attempts
  7. After a silo deployment, messages on that particular user's stream were delivered normally again

I'm wondering if the pubSubCache within the PersistentStreamPullingAgent was stale, so that the agent treated its information for this stream as up to date when it actually wasn't.
