PermNet-RM — The Unmasked Butterfly

A branch-free, fixed-topology Reed-Muller encoder for HQC, built from a GF(2) zeta-transform butterfly decomposition.

What this is

PermNet-RM is a drop-in replacement for reed_muller_encode() in the HQC reference implementation. The BIT0MASK idiom (mask = -((uint64_t)((m >> i) & 1))) that the Jeon et al. single-trace attack exploits is removed from the encoder body in both the C source and the compiled binary. A per-stage compiler barrier (__asm__ volatile ("" : "+r"(x))) prevents GCC from collapsing the butterfly on isolated-bit registers back into a neg instruction. This was verified on arm-none-eabi-gcc 15.2.0 at every optimisation level from -O0 to -Ofast: the encoder body contains zero neg instructions. The shared-output masked d=1 variant closes the remaining per-bit integer Hamming-weight residual on 32-bit targets.
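To make the two constructs named above concrete, here is a minimal sketch; it is not the repository's code. `bit0mask` transcribes the idiom quoted above. `opaque`, `butterfly_stage` and `idiom_selfcheck` are hypothetical helper names that only show where a barrier of the quoted form could sit after a stage. The inline-asm syntax is GCC/Clang specific.

```c
#include <assert.h>
#include <stdint.h>

/* The BIT0MASK idiom quoted above: all-ones if bit i of m is set, else 0.
 * The mask has either every bit set or none, so its Hamming weight
 * depends directly on one message bit. */
static inline uint64_t bit0mask(uint64_t m, unsigned i)
{
    return -((uint64_t)((m >> i) & 1));
}

/* Hypothetical helper: an empty asm statement with a "+r" constraint tells
 * the compiler that x may have been modified, so it cannot reason about
 * the value across this point. It emits no instructions of its own. */
static inline uint64_t opaque(uint64_t x)
{
    __asm__ volatile ("" : "+r"(x));
    return x;
}

/* Hypothetical helper: one butterfly stage followed by the barrier.
 * mask and shift are meant to be compile-time constants at the call site. */
static inline uint64_t butterfly_stage(uint64_t reg, uint64_t mask, unsigned shift)
{
    reg ^= (reg & mask) << shift;
    return opaque(reg);
}

/* Quick sanity check of the helpers above. */
static inline void idiom_selfcheck(void)
{
    assert(bit0mask(0x80, 7) == UINT64_MAX);
    assert(bit0mask(0x80, 6) == 0);
    assert(butterfly_stage(1, 0x5555555555555555ULL, 1) == 3);
}
```

Whether a given compiler still emits a neg for the surrounding code has to be checked in the disassembly; the barrier only removes the compiler's knowledge of the register value at that point.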

The relevant attacks:

Jeon et al. (ePrint 2026/071) recover the full 128-bit encapsulation message from a single decapsulation trace with up to 96.9% success, using a total of 5,000 power traces for profiling and evaluation on an STM32F303 (ARM Cortex-M4). The target of the attack is the RM encoder's BIT0MASK idiom, reached during the FO re-encryption step.
Lai et al. (ePrint 2025/2162, YODO) demonstrate ciphertext-independent, passive single-trace attacks on HQC that exploit timing leakages in sparse-vector processing (gf_carryless_mul, find_peaks, Karatsuba base cases, key re-sampling). YODO does not attack the RM encoder directly, but it motivates constant-time work across the HQC code base. The RM encoder is a separate, complementary leakage source.

PermNet-RM replaces the conditional generator-row accumulation with a fixed-topology butterfly network that computes the GF(2) zeta transform. Message bits enter as initial register state at fixed positions; every subsequent operation is unconditional.

The equivalence between RM(1,m) codewords and the GF(2) zeta (Möbius) transform over the Boolean lattice is classical (Yates, 1937). The contribution here is an ABI-compatible, branch-free implementation that drops into HQC.
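For reference, the classical identity can be written out as follows. The symbols g, f and a_0, ..., a_m are notation introduced here for illustration and are not taken from the paper; S ⊆ x identifies a point x of the Boolean lattice with the set of its 1-coordinates.

```latex
% Zeta transform over GF(2) on the Boolean lattice {0,1}^m
f(x) \;=\; \bigoplus_{S \subseteq x} g(S), \qquad x \in \{0,1\}^m .

% If g is supported only on the empty set and the singletons,
%   g(\emptyset) = a_0, \qquad g(\{i\}) = a_i \quad (1 \le i \le m),
% the sum collapses to an affine Boolean function:
f(x) \;=\; a_0 \,\oplus\, \bigoplus_{i=1}^{m} a_i x_i .
```

The vector of values f(x) over all 2^m points x is the RM(1,m) codeword of the message (a_0, ..., a_m), which is why a zeta-transform butterfly applied to a register holding only those m+1 entries produces the codeword.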

What is (and is not) demonstrated

Binary-level branch-free on x86-64 across six GCC optimisation levels (-O0 through -Ofast) — verified by disassembly grep for conditional-jump mnemonics; enforced in CI.
Zero cycle-count timing spread on x86-64 at -O3 across all 256 RM(1,7) inputs under our TSC measurement.
Exhaustive correctness over the complete input space (256 inputs for RM(1,7), 512 for RM(1,8), and 65,536 (share0, share1) pairs for the masked d=1 composition).
Substantial Hamming-weight leakage reduction on 32-bit ARM in ELMO simulation. The unmasked encoder with compiler barriers halves the bit-6 peak vs the BIT0MASK baseline and cuts the mean per-bit signal by 9.1×; the shared-output masked d=1 variant drives the peak down 11.1× and the bit-6 signal 31× relative to BIT0MASK (14.5× relative to unmasked PermNet). See LIMITATIONS.md and the table below.
Not yet measured: real Cortex-M4 hardware (ChipWhisperer + STM32F303/F415), which is the Jeon attack's actual target platform. ELMO models Cortex-M0 only; no maintained public Cortex-M4 leakage simulator exists (see elmo/README.md).

See LIMITATIONS.md for the full list.

Recent findings (2026-04-19)

Paper Table 5 is reproducible

Running the one-command elmo/run_table5.sh with the pinned ELMO commit (sca-research/ELMO @ 7c4e293) and coeffs_M3.txt reproduces paper Table 5 to four significant figures on a current toolchain (arm-none-eabi-gcc 15.2.0):

Cortex-M0 ELMO, 256 traces per encoder (arm-none-eabi-gcc 15.2.0, ELMO commit 7c4e293, coeffs_M3.txt):

Metric	Unmasked PermNet (post-fix)	Masked d=1 shared-output	BIT0MASK
Trace length (cycles)	144	284	293
Max single-bit signal	1,757.7	405.6	4,493.4
Mean single-bit signal	294.97	229.58	2,687.0
Leaking-cycle fraction	55/144 (38%)	3/284 (1.06%)	199/293 (68%)
Bit 6 signal	1,757.7	120.9	3,778.4
Bit 7 signal	221.2	99.5	3,778.4
Mean reduction vs BIT0MASK	9.1×	11.7×	—
Mask-idiom instructions in encoder body	0	0	5 (`ands`+`muls`)

See elmo/RUN_2026-04-19.md for the full reproduction report.

Masked d=1 reduces leaking surface but not peak amplitude

A Boolean-masked d=1 composition is implemented in source/permnet_rm17_masked_d1.c (exhaustively verified over all 65,536 share pairs), with a matching ELMO harness in elmo/elmo_masked_d1.c. The measured effect on Cortex-M0 ELMO:

Metric	Unmasked PermNet (post-fix)	Masked d=1 (reconstructed)	Masked d=1 (shared output)	BIT0MASK
Peak single-bit signal	1,757.7	3,794.5	405.6	4,493.4
Mean single-bit signal	294.97	692.47	229.58	2,687.0
Leaking-cycle fraction	55/144 (38%)	7/311 (2%)	3/284 (1.06%)	199/293 (68%)
Bit 6 signal	1,757.7	3,794.5	120.9	3,778.4

The reconstructed-masked variant reduces the leaking surface by an order of magnitude (38% to 2% of cycles) but does not reduce peak amplitude: at 3,794.5 its peak is higher than unmasked PermNet-RM's 1,757.7, because the final XOR that reconstructs the codeword from its two shares is unmasked by construction and leaks the message bit on that one cycle. The shared-output variant (source/permnet_rm17_masked_d1_shared_output.c) returns the two shares separately (cw_share0, cw_share1 with cw_share0 XOR cw_share1 = E(m)) and performs no unmask XOR inside the encoder. ELMO measures an 11.1× peak-signal reduction vs BIT0MASK and a 14.5× reduction on bit 6 vs unmasked PermNet-RM (31× vs BIT0MASK). Bit 6 is no longer the dominant leaker. The cost is an API change: the downstream HQC consumer must hold both cw[2] halves until it is in a region where unmasking is safe.
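The shared-output idea rests only on the encoder being linear over GF(2): E(share0) XOR E(share1) = E(share0 XOR share1). The sketch below illustrates this on a scaled-down toy code, RM(1,5) in a single 32-bit register, so the whole share space is 64 × 64 pairs. The names `rm15_encode`, `rm15_shares`, `rm15_encode_shared` and `rm15_shared_selfcheck`, the bit placement, and the toy size are assumptions made for illustration; the repository's masked implementation may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Toy RM(1,5) butterfly encoder on a 32-bit register (illustration only).
 * Assumed placement: message bit 0 is the constant term at position 0,
 * message bit i+1 sits at position 2^i. Only the low 6 bits of m are used. */
static uint32_t rm15_encode(uint8_t m)
{
    static const uint32_t MASK[5] = {
        0x55555555u, 0x33333333u, 0x0F0F0F0Fu, 0x00FF00FFu, 0x0000FFFFu
    };
    uint32_t reg = (uint32_t)(m & 1u);
    for (unsigned i = 0; i < 5; i++)
        reg |= (uint32_t)((m >> (i + 1)) & 1u) << (1u << i);
    for (unsigned s = 0; s < 5; s++)
        reg ^= (reg & MASK[s]) << (1u << s);
    return reg;
}

/* Hypothetical result type: the two codeword shares, never XORed here. */
typedef struct {
    uint32_t cw_share0;
    uint32_t cw_share1;
} rm15_shares;

/* Each message share is encoded on its own; no unmask XOR is performed. */
static rm15_shares rm15_encode_shared(uint8_t share0, uint8_t share1)
{
    rm15_shares out = { rm15_encode(share0), rm15_encode(share1) };
    return out;
}

/* Exhaustive check over all 64 x 64 share pairs of the toy code. */
void rm15_shared_selfcheck(void)
{
    for (unsigned s0 = 0; s0 < 64; s0++)
        for (unsigned s1 = 0; s1 < 64; s1++) {
            rm15_shares sh = rm15_encode_shared((uint8_t)s0, (uint8_t)s1);
            /* Recombination happens here, in test code only. */
            assert((sh.cw_share0 ^ sh.cw_share1) ==
                   rm15_encode((uint8_t)(s0 ^ s1)));
        }
}
```

A caller would invoke `rm15_shared_selfcheck()` once from a test driver. In real use the two shares would stay separate until the consumer reaches a point where unmasking is acceptable, and the message shares would come from a fresh random split.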

Shared-output masked variant recommended for full probing-model security

The stage-1 register of the unmasked encoder exhibits a small per-bit integer Hamming-weight residual that is fully characterised and empirically visible as the bit-6 isolation effect in 32-bit registers. The shared-output masked d=1 variant (source/permnet_rm17_masked_d1_shared_output.c) drives that residual to zero by randomising each share's register state. Analytical details, a per-bit table, and a brute-force verifier (source/verify_theorem_4_2.py) are in PROOF_NOTES.md.

Stage reordering does NOT fix bit-6 isolation

An exploratory source/permnet_rm17_stage_reordered.c is kept in the tree as a documented negative result. The 7 butterfly stages commute (they act on orthogonal hypercube axes), so running the cross-word stages first is algebraically equivalent. But every stage is a left shift, so none can pull an isolated m6 or m7 back into a shared 32-bit word. True interleaved injection per paper §5.5 requires non-standard placement and a correspondingly non-standard linear network; that is open work.
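The commutation claim can be checked mechanically. The following is a minimal sketch, not the code in source/permnet_rm17_stage_reordered.c: it holds the 128-bit register in two 64-bit words, so only stage 6 is cross-word (on a 32-bit target the register would span four words and stages 5 and 6 would both be cross-word). All function names and the bit placement (message bit 7 as the constant term at position 0, bit i at position 2^i) are assumptions for illustration. The sketch demonstrates algebraic equivalence only and says nothing about leakage.

```c
#include <assert.h>
#include <stdint.h>

/* Stage s keeps the positions whose index has bit s clear. */
static const uint64_t MASK[6] = {
    0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
    0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL
};

/* Assumed placement: bit 7 -> position 0, bit i -> position 2^i. */
static void inject(uint8_t m, uint64_t w[2])
{
    w[0] = (uint64_t)((m >> 7) & 1u);
    for (unsigned i = 0; i < 6; i++)
        w[0] |= (uint64_t)((m >> i) & 1u) << (1u << i);
    w[1] = (uint64_t)((m >> 6) & 1u);   /* position 64 = bit 0 of word 1 */
}

/* Stages 0..5 act inside each 64-bit word. */
static void in_word_stages(uint64_t w[2])
{
    for (unsigned s = 0; s < 6; s++) {
        w[0] ^= (w[0] & MASK[s]) << (1u << s);
        w[1] ^= (w[1] & MASK[s]) << (1u << s);
    }
}

/* Stage 6 (shift by 64) moves data from word 0 into word 1. */
static void cross_word_stage(uint64_t w[2])
{
    w[1] ^= w[0];
}

static void encode_natural(uint8_t m, uint64_t w[2])
{
    inject(m, w);
    in_word_stages(w);
    cross_word_stage(w);
}

static void encode_reordered(uint8_t m, uint64_t w[2])
{
    inject(m, w);
    cross_word_stage(w);   /* cross-word stage first */
    in_word_stages(w);
}

/* Both orders must agree on all 256 inputs. */
void reorder_selfcheck(void)
{
    for (unsigned m = 0; m < 256; m++) {
        uint64_t a[2], b[2];
        encode_natural((uint8_t)m, a);
        encode_reordered((uint8_t)m, b);
        assert(a[0] == b[0] && a[1] == b[1]);
    }
}
```

In both orders data only ever flows from word 0 into word 1, never back, which is the structural reason reordering alone cannot change what the low word contains.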

Files

File	Description
`source/permnet_rm17.c`	RM(1,7) encoder for HQC-128 + exhaustive correctness test
`source/permnet_rm17_bench.c`	x86-64 benchmark — PermNet, BIT0MASK, branchy, masked-d1
`source/permnet_rm17_masked_d1.c`	Boolean-masked d=1 composition (reconstructed output) + 65,536-pair exhaustive test
`source/permnet_rm17_masked_d1_shared_output.c`	Boolean-masked d=1 composition with shared output (`(cw_share0, cw_share1)`); 11.1× peak / 31× bit-6 ELMO reduction vs BIT0MASK
`source/permnet_rm17_stage_reordered.c`	Exploratory stage-reordered variant (documented negative result)
`source/permnet_rm18.c`	RM(1,8) encoder for HQC-192/HQC-256 + exhaustive correctness test
`source/verify_theorem_4_2.py`	Brute-force enumeration of the stage-1 per-bit residual
`source/_enc_O3.s`	x86-64 disassembly at `-O3` (gcc 15.2.0)
`source/_enc_O0.s`	x86-64 disassembly at `-O0`
`elmo/`	Thumb harnesses, Makefile, and `run_table5.sh` for the ELMO reproduction pack
`elmo/RUN_2026-04-19.md`	Headline numbers and paper-comparison from the last ELMO rerun
`LIMITATIONS.md`	Scope, simulated-vs-measured, known residual leakage
`PROOF_NOTES.md`	Stage-1 register structure, per-bit residual table, masking as the fix
`CHANGELOG.md`	Phase-by-phase change log
`FIXES_APPLIED.md`	Mapping from review prompt items to code changes
`.github/workflows/ci.yml`	CI: build + exhaustive tests + disassembly grep at `-O0..-Ofast`

Building

# Correctness test (RM(1,7))
gcc -O3 -o permnet_rm17 source/permnet_rm17.c && ./permnet_rm17

# Correctness test (RM(1,8))
gcc -O3 -o permnet_rm18 source/permnet_rm18.c && ./permnet_rm18

# Masked d=1: exhaustive 65,536-pair check
gcc -O3 -o permnet_rm17_masked source/permnet_rm17_masked_d1.c && ./permnet_rm17_masked

# Stage-reordered variant: correctness test (pass `-v` for the per-stage HW trace)
gcc -O3 -o permnet_rm17_sr source/permnet_rm17_stage_reordered.c && ./permnet_rm17_sr

# Benchmark (x86-64 only, uses TSC via <x86intrin.h>)
gcc -O3 -march=native -o bench source/permnet_rm17_bench.c && ./bench

# Stage-1 per-bit residual verifier
python3 source/verify_theorem_4_2.py

# ELMO reproduction pack (requires ../elmo_tool/ + arm-none-eabi-gcc)
cd elmo && ./run_table5.sh

How it works

Injection: Each message bit is placed at a fixed bit position (powers of 2) in an n-bit register, where n = 2^m is the codeword length.
Butterfly propagation: m stages of reg ^= (reg & MASK) << SHIFT where masks and shifts are compile-time constants.
Output: The final register state is the RM(1,m) codeword.

No message bit is ever compared, branched on, or used to index memory.
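The three steps above can be traced by hand on a scaled-down code. The sketch below uses RM(1,3), with 4 message bits and an 8-bit codeword, and checks the butterfly against the definition of the code for all 16 messages. The function names and the bit placement (constant term at position 0, the other three bits at positions 1, 2 and 4) are assumptions chosen for illustration, not the repository's layout.

```c
#include <assert.h>
#include <stdint.h>

/* Toy RM(1,3) encoder: injection, three butterfly stages, output. */
static uint8_t rm13_encode(uint8_t msg)
{
    /* Injection: a0 -> position 0, a1 -> 1, a2 -> 2, a3 -> 4. */
    unsigned reg = (msg & 1u)
                 | (((msg >> 1) & 1u) << 1)
                 | (((msg >> 2) & 1u) << 2)
                 | (((msg >> 3) & 1u) << 4);

    /* Butterfly propagation: constant masks and shifts, no branches. */
    reg ^= (reg & 0x55u) << 1;
    reg ^= (reg & 0x33u) << 2;
    reg ^= (reg & 0x0Fu) << 4;

    /* Output: the register now holds the codeword. */
    return (uint8_t)reg;
}

/* Reference from the definition: codeword bit x is
 * a0 ^ (a1 & x0) ^ (a2 & x1) ^ (a3 & x2), with x0..x2 the bits of x. */
static uint8_t rm13_reference(uint8_t msg)
{
    unsigned cw = 0;
    for (unsigned x = 0; x < 8; x++) {
        unsigned bit = msg & 1u;
        for (unsigned i = 0; i < 3; i++)
            bit ^= ((msg >> (i + 1)) & 1u) & ((x >> i) & 1u);
        cw |= bit << x;
    }
    return (uint8_t)cw;
}

/* Exhaustive comparison over the 16 possible messages. */
void rm13_selfcheck(void)
{
    for (unsigned m = 0; m < 16; m++)
        assert(rm13_encode((uint8_t)m) == rm13_reference((uint8_t)m));
}
```

For the message 0b0101 (a0 = 1, a2 = 1) the register passes through 0x05, 0x0F, 0x03 and ends at 0x33, which matches the definition: codeword bit x equals 1 XOR x1.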

Paper & Citation

Preprint: Zenodo DOI 10.5281/zenodo.19556200

Plain-English summary: https://vaultbytes.com/research-permnet-rm

BibTeX:

@misc{alissaei2026permnet,
  title     = {PermNet-RM: Eliminating Side-Channel Leakage in HQC
               Reed-Muller Encoding via the GF(2) Zeta Transform},
  author    = {Alissaei, Bader},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19556200},
  url       = {https://doi.org/10.5281/zenodo.19556200}
}

License

MIT

Author

Bader Alissaei — VaultBytes Innovations Ltd — ORCID: 0009-0003-5312-383X
