Use random_bytes() instead of mt_rand() for APM IDs
Summary
The PHP agent still generates IDs in userland via mt_rand(0, 255) in Elastic\Apm\Impl\Util\IdGenerator::generateBinaryId().
In our CLI workload we observe span.id collisions within a single trace. That breaks the parent-child tree in Kibana/APM UI because unrelated spans end up sharing the same ID and children attach to the wrong parent.
The observed behavior is not consistent with normal 64-bit collision odds. The safer fix is to stop using mt_rand() for trace/span/error IDs and generate them with random_bytes() + bin2hex() instead.
Environment
- Ubuntu on WSL2
- PHP 7.3
- Elastic APM PHP agent 1.15.1
- APM Server 8.12.1
- CLI SAPI
- async_backend_comm = false
- transaction_max_spans = 10000
Workload
A CLI script creates one APM transaction per iteration in a loop. Each transaction produces:
- 3-4 custom phase spans created via $transaction->beginCurrentSpan()
- Explicit child spans created via $span->beginChildSpan() (~40 child spans)
- Auto-instrumented MySQL spans (~200-400 query spans)
Total: about 630 span documents per transaction in the failing case below.
Observed behavior
For one trace, querying traces-apm-* showed:
- 630 span documents
- 595 unique span.id values
- 35 duplicate IDs
These were not harmless duplicates between equivalent spans. Collisions occurred between unrelated spans with different names, parents, and subtypes.
The most visible example was two phase spans receiving the same span.id. Because hundreds of child spans referenced that shared ID as parent.id, Kibana showed those children under both phases.
Example collision (phase spans)
Two separate Elasticsearch documents with the same span.id:
| Field | Document 1 | Document 2 |
|---|---|---|
| span.name | Phase B | Phase C |
| span.id | 32541f7789d94269 | 32541f7789d94269 |
| parent.id | 03fc952bdf5ec885 (transaction) | 03fc952bdf5ec885 (transaction) |
| span.duration.us | 12,565,650 | 7,475 |
| @timestamp | 2026-03-20T00:10:35Z | 2026-03-20T00:10:47Z |
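To make the effect concrete, here is a minimal sketch, assuming a hypothetical candidateParents() helper that resolves a parent by matching a child's parent.id against span.id. It is not Kibana's actual tree-building code; it only shows that once Phase B and Phase C share 32541f7789d94269, a child pointing at that ID has two equally valid parents.

```php
<?php
// Hypothetical illustration only: NOT Kibana's real tree-building code.
// Resolve a child's parent by scanning for spans whose span.id equals parent.id.
function candidateParents(array $spans, string $parentId): array
{
    $names = [];
    foreach ($spans as $span) {
        if ($span['span.id'] === $parentId) {
            $names[] = $span['span.name'];
        }
    }
    return $names;
}

// The two phase spans from the table above, which share one span.id.
$spans = [
    ['span.id' => '32541f7789d94269', 'span.name' => 'Phase B'],
    ['span.id' => '32541f7789d94269', 'span.name' => 'Phase C'],
];

// Any child whose parent.id is 32541f7789d94269 now has two candidate parents.
$parents = candidateParents($spans, '32541f7789d94269');
echo implode(', ', $parents), PHP_EOL; // Phase B, Phase C
```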
Example collision (unrelated spans)
| Field | Document 1 | Document 2 |
|---|---|---|
| span.name | SELECT t0.col_a... (auto-instrumented) | Custom child span |
| span.id | 3075fe3f6dd78ca4 | 3075fe3f6dd78ca4 |
| parent.id | 7380c176f9e7ae9d (Phase A) | 3201c860fc91b343 (child of Phase B) |
| span.subtype | mysql | app |
Relevant code
Current main still uses mt_rand() in the PHP ID generator:
```php
// agent/php/ElasticApm/Impl/Util/IdGenerator.php
private static function generateBinaryId(int $idLengthInBytes): array
{
    $result = [];
    for ($i = 0; $i < $idLengthInBytes; ++$i) {
        $result[] = mt_rand(0, 255);
    }
    return $result;
}
```
Auto-instrumented spans also go through the PHP span creation path, so custom and auto-instrumented spans share this generator.
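With 64-bit span IDs and about 630 spans, the birthday bound puts the probability of even one true random collision in a transaction at roughly n(n-1)/2 / 2^64 = 198,135 / 2^64 ≈ 1e-14, which is effectively zero; 35 duplicates in a single trace are far outside that. The collisions above strongly suggest repeated PRNG state, not bad luck.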
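The ~1e-14 figure follows from the birthday approximation P ≈ n(n-1)/2 / 2^bits, which holds when the result is much smaller than 1. A minimal sketch of that arithmetic, using a hypothetical collisionProbability() helper:

```php
<?php
// Hypothetical helper: birthday-bound approximation of the chance that at
// least one pair among $n uniformly random IDs of $bits bits collides.
// n(n-1)/2 is the number of pairs; each pair collides with probability 2^-bits.
function collisionProbability(int $n, int $bits): float
{
    $pairs = $n * ($n - 1) / 2;
    return $pairs / (2 ** $bits); // 2 ** 64 overflows int and becomes a float
}

$p = collisionProbability(630, 64);
printf("P(any collision among 630 64-bit IDs) ~= %.2e\n", $p);
```

For n = 630 and 64 bits this comes out at roughly 1.07e-14. The same quantity is also the expected number of colliding pairs, so 35 duplicates is not a plausible outcome of independent random IDs.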
Why mt_rand() is the bug
Even without proving the exact reseed source, using mt_rand() for identifiers is brittle:
- mt_rand() is a process-global PRNG, not an isolated ID generator
- it is explicitly not cryptographically secure
- reseeding or state reuse anywhere in the process can repeat future outputs
That makes it a poor choice for trace/span/error IDs in a long-running PHP process.
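A minimal sketch of the reseed problem, assuming a hypothetical mtRandSpanId() helper that copies the per-byte mt_rand(0, 255) loop from the current generator. The mt_srand(42) calls only stand in for whatever resets MT state in a real process; the actual source in our workload is still unknown.

```php
<?php
// Hypothetical helper mirroring the loop in the current generateBinaryId():
// one mt_rand(0, 255) call per byte, then hex-encoded like a span.id.
function mtRandSpanId(int $lengthInBytes = 8): string
{
    $bytes = [];
    for ($i = 0; $i < $lengthInBytes; ++$i) {
        $bytes[] = mt_rand(0, 255);
    }
    // Two lowercase hex digits per byte.
    return implode('', array_map(function (int $b): string {
        return sprintf('%02x', $b);
    }, $bytes));
}

// Reseeding the global Mersenne Twister with the same seed restarts its
// sequence, so the "next" ID repeats one that was already handed out.
mt_srand(42);
$first = mtRandSpanId();
mt_srand(42);
$second = mtRandSpanId();

echo $first, ' ', $second, PHP_EOL;
var_dump($first === $second); // bool(true)
```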
I have not isolated the exact component that resets or reuses MT state in this workload. The Elastic extension may be involved, but the actionable problem appears broader: ID generation currently depends on mt_rand() at all.
Suggested fix
Replace mt_rand()-based ID generation with random_bytes().
The simplest implementation looks like:
```php
public static function generateId(int $idLengthInBytes): string
{
    // Two hex characters per byte: 8 bytes -> 16 chars, 16 bytes -> 32 chars.
    return bin2hex(random_bytes($idLengthInBytes));
}
```
If keeping generateBinaryId() is preferred:
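```php
private static function generateBinaryId(int $idLengthInBytes): array
{
    // unpack('C*') yields unsigned bytes (0..255) in a 1-based array;
    // array_values() re-indexes it from 0 like the original loop.
    return array_values(unpack('C*', random_bytes($idLengthInBytes)));
}
```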
random_bytes() has been available since PHP 7.0 (so it covers the PHP 7.3 used here) and draws from the operating system's CSPRNG, avoiding shared MT state entirely.
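A self-contained sketch of the suggested replacement, wrapped in a hypothetical RandomBytesIdGenerator class (not the agent's real class) so it can run on its own. It checks the two properties this issue cares about: IDs are unaffected by mt_srand(), and a batch of 630 IDs (the failing trace's size) has no duplicates.

```php
<?php
// Hypothetical class for illustration; NOT the agent's real IdGenerator.
final class RandomBytesIdGenerator
{
    // Hex-encoded ID: two characters per byte (8 bytes -> 16 hex chars).
    public static function generateId(int $idLengthInBytes): string
    {
        return bin2hex(random_bytes($idLengthInBytes));
    }

    // Same randomness as an array of ints in 0..255, re-indexed from 0.
    public static function generateBinaryId(int $idLengthInBytes): array
    {
        return array_values(unpack('C*', random_bytes($idLengthInBytes)));
    }
}

// random_bytes() reads from the OS CSPRNG, so reseeding the global
// Mersenne Twister does not make it repeat.
mt_srand(42);
$a = RandomBytesIdGenerator::generateId(8);
mt_srand(42);
$b = RandomBytesIdGenerator::generateId(8);
echo $a, ' ', $b, PHP_EOL;

// Generate as many span IDs as the failing trace had and count distinct values.
$ids = [];
for ($i = 0; $i < 630; ++$i) {
    $ids[] = RandomBytesIdGenerator::generateId(8);
}
echo count(array_unique($ids)), ' unique of ', count($ids), PHP_EOL;

$binary = RandomBytesIdGenerator::generateBinaryId(8);
```

One design point for a real patch: random_bytes() throws an Exception if no suitable randomness source is available, so the agent would need to decide whether to let that propagate or handle it.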
Expected behavior
- no duplicate span.id values within a trace, apart from negligible random chance
- ID generation should not depend on mutable global Mersenne Twister state
Related context
- historical issue #20 and PR #23 show the native extension previously generated IDs with php_mt_rand()
- current main still uses mt_rand() in the PHP IdGenerator
Verification query
To check for span ID collisions in a trace:
```
POST traces-apm-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "trace.id": "<TRACE_ID>" } },
        { "term": { "processor.event": "span" } }
      ]
    }
  },
  "aggs": {
    "unique_ids": { "cardinality": { "field": "span.id" } },
    "total_docs": { "value_count": { "field": "span.id" } }
  }
}
```
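If unique_ids < total_docs, there are duplicate span IDs in that trace. The cardinality aggregation is approximate, but a trace of ~630 spans is well below its default precision_threshold of 3000, where counts are expected to be close to accurate.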
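Once a trace is flagged, the next question is which spans share an ID. Elasticsearch's terms aggregation with min_doc_count set to 2 can answer that server-side; alternatively, here is a minimal offline sketch over exported documents, assuming a hypothetical findDuplicateSpanIds() helper and an input of one array per span document keyed by field name. The sample rows reuse the IDs from the unrelated-spans table above.

```php
<?php
// Hypothetical helper: group exported span documents by span.id and keep
// only IDs that occur more than once, with the names of the spans sharing them.
function findDuplicateSpanIds(array $docs): array
{
    $byId = [];
    foreach ($docs as $doc) {
        $byId[$doc['span.id']][] = $doc['span.name'];
    }
    // array_filter() preserves keys, so the result stays keyed by span.id.
    return array_filter($byId, function (array $names): bool {
        return count($names) > 1;
    });
}

$docs = [
    ['span.id' => '3075fe3f6dd78ca4', 'span.name' => 'SELECT t0.col_a...'],
    ['span.id' => '3075fe3f6dd78ca4', 'span.name' => 'Custom child span'],
    ['span.id' => '7380c176f9e7ae9d', 'span.name' => 'Phase A'],
];

foreach (findDuplicateSpanIds($docs) as $id => $names) {
    echo $id, ': ', implode(' | ', $names), PHP_EOL;
}
```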