Use random_bytes() instead of mt_rand() for APM IDs #1443

@githoober

Summary

The PHP agent still generates IDs in userland via mt_rand(0, 255) in Elastic\Apm\Impl\Util\IdGenerator::generateBinaryId().

In our CLI workload we observe span.id collisions within a single trace. That breaks the parent-child tree in Kibana/APM UI because unrelated spans end up sharing the same ID and children attach to the wrong parent.

The observed behavior is not consistent with normal 64-bit collision odds. The safer fix is to stop using mt_rand() for trace/span/error IDs and generate them with random_bytes() + bin2hex() instead.

Environment

  • Ubuntu on WSL2
  • PHP 7.3
  • Elastic APM PHP agent 1.15.1
  • APM Server 8.12.1
  • CLI SAPI
  • async_backend_comm = false
  • transaction_max_spans = 10000

Workload

A CLI script creates one APM transaction per iteration in a loop. Each transaction produces:

  • 3-4 custom phase spans created via $transaction->beginCurrentSpan()
  • Explicit child spans created via $span->beginChildSpan() (~40 child spans)
  • Auto-instrumented MySQL spans (~200-400 query spans)

Total: about 630 span documents per transaction in the failing case below.

Observed behavior

For one trace, querying traces-apm-* showed:

  • 630 span documents
  • 595 unique span.id values
  • 35 documents (630 - 595) whose span.id repeats one already used by another span

These were not harmless duplicates between equivalent spans. Collisions occurred between unrelated spans with different names, parents, and subtypes.

The most visible example was two phase spans receiving the same span.id. Because hundreds of child spans referenced that shared ID as parent.id, Kibana showed those children under both phases.

Example collision (phase spans)

Two separate Elasticsearch documents with the same span.id:

| Field | Document 1 | Document 2 |
|---|---|---|
| span.name | Phase B | Phase C |
| span.id | 32541f7789d94269 | 32541f7789d94269 |
| parent.id | 03fc952bdf5ec885 (transaction) | 03fc952bdf5ec885 (transaction) |
| span.duration.us | 12,565,650 | 7,475 |
| @timestamp | 2026-03-20T00:10:35Z | 2026-03-20T00:10:47Z |

Example collision (unrelated spans)

| Field | Document 1 | Document 2 |
|---|---|---|
| span.name | SELECT t0.col_a... (auto-instrumented) | Custom child span |
| span.id | 3075fe3f6dd78ca4 | 3075fe3f6dd78ca4 |
| parent.id | 7380c176f9e7ae9d (Phase A) | 3201c860fc91b343 (child of Phase B) |
| span.subtype | mysql | app |

Relevant code

Current main still uses mt_rand() in the PHP ID generator:

// agent/php/ElasticApm/Impl/Util/IdGenerator.php
private static function generateBinaryId(int $idLengthInBytes): array
{
    $result = [];
    for ($i = 0; $i < $idLengthInBytes; ++$i) {
        $result[] = mt_rand(0, 255);
    }
    return $result;
}

Auto-instrumented spans also go through the PHP span creation path, so custom and auto-instrumented spans share this generator.

With 64-bit span IDs and about 630 spans, the probability of a true random collision in one transaction is effectively zero (~1e-14). The collisions above strongly suggest repeated PRNG state, not bad luck.
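The ~1e-14 figure follows from the usual birthday approximation, P ≈ n(n-1) / (2 · 2^bits), which holds when P is small. A minimal sketch of that arithmetic, where the function name `approxCollisionProbability` is only illustrative and not part of the agent:

```php
<?php
// Birthday approximation for small probabilities:
// P(at least one collision) ≈ (number of pairs) / (number of possible IDs).
function approxCollisionProbability(int $spanCount, int $idBits): float
{
    $pairs = $spanCount * ($spanCount - 1) / 2; // distinct pairs of spans
    return $pairs / (2.0 ** $idBits);           // each pair collides with probability 2^-bits
}

// 630 spans with 64-bit IDs, as in the failing trace.
$p = approxCollisionProbability(630, 64);
printf("%.2e\n", $p); // on the order of 1e-14
```

Seeing 35 repeats where roughly 1e-14 is expected is what points at repeated generator state rather than chance.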

Why mt_rand() is the bug

Even without proving the exact reseed source, using mt_rand() for identifiers is brittle:

  • mt_rand() is a process-global PRNG, not an isolated ID generator
  • it is explicitly not cryptographically secure
  • reseeding or state reuse anywhere in the process can repeat future outputs

That makes it a poor choice for trace/span/error IDs in a long-running PHP process.
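The reseeding concern can be reproduced in isolation. The sketch below is not the agent's code: `mtRandHexId` is a hypothetical helper that mirrors the byte-by-byte loop quoted above, and the two `mt_srand()` calls stand in for whatever component might reset the global state in a real process.

```php
<?php
// Same loop shape as generateBinaryId() in the issue, rendered as hex.
function mtRandHexId(int $idLengthInBytes): string
{
    $hex = '';
    for ($i = 0; $i < $idLengthInBytes; ++$i) {
        $hex .= sprintf('%02x', mt_rand(0, 255));
    }
    return $hex;
}

// Hypothetical: some code in the process seeds the global generator.
mt_srand(12345);
$first = [mtRandHexId(8), mtRandHexId(8)];

// Hypothetical: the same seed is applied again later in the same process.
mt_srand(12345);
$second = [mtRandHexId(8), mtRandHexId(8)];

var_dump($first === $second); // bool(true): the "new" IDs repeat the earlier ones
```

Any code path that restores an earlier Mersenne Twister state has the same effect, which is why the exact reseed source does not need to be known to justify the change.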

I have not isolated the exact component that resets or reuses MT state in this workload. The Elastic extension may be involved, but the actionable problem appears broader: ID generation currently depends on mt_rand() at all.

Suggested fix

Replace mt_rand()-based ID generation with random_bytes().

The simplest implementation looks like:

public static function generateId(int $idLengthInBytes): string
{
    return bin2hex(random_bytes($idLengthInBytes));
}

If keeping generateBinaryId() is preferred:

private static function generateBinaryId(int $idLengthInBytes): array
{
    // unpack('C*', ...) returns unsigned bytes keyed from 1; array_values() reindexes from 0.
    return array_values(unpack('C*', random_bytes($idLengthInBytes)));
}

random_bytes() has been available since PHP 7.0, so it covers the PHP 7.3 runtime used here. It draws from the operating system's CSPRNG, which avoids shared MT state entirely.
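As a quick check of that independence, the following sketch seeds the Mersenne Twister identically before each call and still gets different IDs. `randomBytesHexId` is a hypothetical stand-in for the proposed generateId(), not the agent's actual method.

```php
<?php
// Sketch of the proposed generator: hex-encode bytes from the OS CSPRNG.
function randomBytesHexId(int $idLengthInBytes): string
{
    return bin2hex(random_bytes($idLengthInBytes));
}

// Reseeding mt_rand() has no influence on random_bytes().
mt_srand(12345);
$a = randomBytesHexId(8);
mt_srand(12345);
$b = randomBytesHexId(8);

var_dump(strlen($a)); // int(16): 8 bytes become 16 hex characters
var_dump($a === $b);  // expected false; equality would need a 2^-64 coincidence
```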

Expected behavior

  • no duplicate span.id values within a trace, beyond the negligible chance inherent in 64 random bits
  • ID generation should not depend on mutable global Mersenne Twister state

Related context

  • historical issue #20 and PR #23 show the native extension previously generated IDs with php_mt_rand()
  • current main still uses mt_rand() in the PHP IdGenerator

Verification query

To check for span ID collisions in a trace:

POST traces-apm-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "trace.id": "<TRACE_ID>" } },
        { "term": { "processor.event": "span" } }
      ]
    }
  },
  "aggs": {
    "unique_ids": { "cardinality": { "field": "span.id" } },
    "total_docs": { "value_count": { "field": "span.id" } }
  }
}

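If unique_ids < total_docs, there are duplicate span IDs in that trace. Note that cardinality is an approximate aggregation; at a few hundred spans per trace it is expected to be close to exact, but a small difference is best confirmed by listing the repeated IDs.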
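The aggregation reports whether duplicates exist but not which IDs are affected. One way to list them, assuming the span.id values of a trace have already been fetched into a PHP array (for example from search hits), is sketched below. `findDuplicateSpanIds` is a hypothetical helper and the sample IDs are taken from the tables above.

```php
<?php
/**
 * Returns each duplicated span ID mapped to how many times it occurs.
 * Note: PHP turns all-digit string keys into integers, so keys may be int|string.
 *
 * @param string[] $spanIds span.id values collected from one trace
 * @return array<int|string, int>
 */
function findDuplicateSpanIds(array $spanIds): array
{
    $counts = array_count_values($spanIds);
    // Keep only IDs that appear more than once.
    return array_filter($counts, function (int $count): bool {
        return $count > 1;
    });
}

$ids = ['32541f7789d94269', '3075fe3f6dd78ca4', '32541f7789d94269', '03fc952bdf5ec885'];
print_r(findDuplicateSpanIds($ids)); // lists 32541f7789d94269 with a count of 2
```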
