libnvmf: NVMe Controller Ownership Registry (V2) #3425

Open
martin-belanger wants to merge 4 commits into linux-nvme:master from martin-belanger:registry-v2

@martin-belanger martin-belanger commented Jun 3, 2026

This addresses issue #2913. It is a clean replacement for PR #3400, which was closed in favor of this redesigned version.


Problem

Multiple independent orchestrators can establish NVMe-oF controller connections on the same Linux host simultaneously:

  • Manual nvme connect / nvme connect-all (stateless, one-shot)
  • nvme-stas (stafd / stacd) — production-grade daemon with CDC/fabric-zoning support
  • NBFT firmware boot connections
  • nvme-discoverd — ships with nvme-cli, runs by default (future project)

All connected controllers appear in a single flat namespace (/dev/nvmeX) with no indication of which orchestrator created or manages each one. Commands like nvme disconnect-all are therefore indiscriminate and dangerous.

The registry gives well-behaved orchestrators the information they need to avoid accidentally disconnecting controllers managed by another tool. It is a cooperative coordination mechanism, not an enforcement boundary — all participants are assumed to cooperate.


What changed from PR #3400

| Area | PR #3400 | This PR |
| --- | --- | --- |
| Storage format | One JSON file per controller (nvme3.json) | One directory per controller, one plain-text file per attribute (nvme3/owner) |
| json-c dependency | Required in the registry path | Eliminated; registry works in all build configurations |
| CLI commands | Top-level built-ins (nvme registry-list) | Plugin (nvme registry list) |
| API terminology | key | attr |
| Iteration API | for_each(device, owner, user_data) | device_for_each(device, user_data) + attr_for_each(attr, value, user_data) |
| Owner parameter | libnvmf_context_set_owner() setter | Passed at context creation: libnvme_create_global_ctx(fp, level, owner) |
| application field | Left in place | Removed from libnvme_global_ctx and libnvme_subsystem |
| Parallel tests | None | 10-process concurrent write test (C and Python) |

Design highlights

Directory-based storage

/run/nvme/registry/
    nvme1/
        owner          ← contains "stas\n"
    nvme3/
        owner          ← contains "nbft\n"

Mirrors the kernel's sysfs convention. Trivially inspectable with cat. Extensible — new attributes are new files, no schema change needed. No json-c dependency in the registry path.
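
To illustrate how little a reader needs in order to consume this layout, here is a minimal sketch of looking up one attribute with plain stdio. The helper name `read_attr` and its `root` parameter are hypothetical and are not part of this PR; the supported entry point is `libnvmf_registry_retrieve()`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from this PR): read <root>/<device>/<attr>
 * into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
static int read_attr(const char *root, const char *device, const char *attr,
                     char *buf, size_t len)
{
    char path[4096];
    FILE *f;
    int n;

    if (len == 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/%s/%s", root, device, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path would have been truncated */

    f = fopen(path, "r");
    if (!f)
        return -1;              /* no entry: controller is unowned or gone */

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;              /* empty or unreadable file */
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* "stas\n" becomes "stas" */
    return 0;
}
```

With the layout shown above, a call such as `read_attr("/run/nvme/registry", "nvme1", "owner", buf, sizeof(buf))` would be expected to leave `stas` in `buf`.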

An atomic write protocol prevents corruption under concurrent access from multiple processes. Readers see either the old value or the new one, never a partial write:

mkstemp  →  <attr>.tmp.XXXXXX
fchmod   →  0644
write + fsync
rename   →  <attr>
fsync    →  directory
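
The five steps above can be sketched in C roughly as follows. This is an illustrative reconstruction from the step list, not the code in registry.c; the names `write_attr_atomic` and `write_all` are hypothetical, and the real error handling may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: atomically replace <dir>/<attr> with value. */
static int write_attr_atomic(const char *dir, const char *attr,
                             const char *value)
{
    char tmp[4096], dst[4096];
    int fd, dfd, n;

    n = snprintf(tmp, sizeof(tmp), "%s/%s.tmp.XXXXXX", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;
    n = snprintf(dst, sizeof(dst), "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(dst))
        return -1;

    fd = mkstemp(tmp);                      /* step 1: unique temp file */
    if (fd < 0)
        return -1;

    if (fchmod(fd, 0644) < 0 ||             /* step 2: mkstemp gives 0600 */
        write_all(fd, value, strlen(value)) < 0 ||
        fsync(fd) < 0)                      /* step 3: data reaches disk */
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    if (rename(tmp, dst) < 0)               /* step 4: atomic replace */
        goto fail;

    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;      /* file is in place, but durability is unconfirmed */
    n = fsync(dfd);                         /* step 5: persist the rename */
    close(dfd);
    return n < 0 ? -1 : 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);                            /* never leave temp files behind */
    return -1;
}
```

Because rename() replaces the destination atomically within one filesystem, a concurrent reader opening `<attr>` gets either the previous contents or the new ones. When several writers race, the last rename wins.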

Owner is immutable

The owner name is passed at context creation time and fixed for the lifetime of the context:

struct libnvme_global_ctx *libnvme_create_global_ctx(FILE *fp,
                                                      int log_level,
                                                      const char *owner);

This expresses the intent clearly — the owner is the identity of the process, not a per-connect property — and makes it impossible to change accidentally at runtime. Pass NULL to opt out of registry participation entirely.

Removal of application

The application field was a previous attempt to solve the same multi-orchestrator coordination problem. It assumed all orchestrators share a hand-written JSON config file, which is false for every real orchestrator in the ecosystem (NBFT reads firmware tables; nvme-stas uses DNS-SD; nvme-discoverd reacts to udev events). owner solves the same problem through a shared runtime path (/run/nvme/registry/) without any shared config file assumption.

Both libnvme_global_ctx.application and libnvme_subsystem.application are removed, along with libnvme_get/set_application() and the connect-time filtering logic that compared them. This is an intentional v3.0 API break.

Plugin placement

Registry commands live in plugins/registry/ (nvme registry list, nvme registry retrieve, etc.) rather than as top-level built-ins, keeping the core binary lean and signalling that these are orchestration tools rather than general NVMe commands.


Commit series

  1. libnvme: add NVMe controller ownership registry - registry.c, registry.h, Python bindings (registry_retrieve, registry_entries, registry_update, registry_delete), C unit tests with 10-process parallel write test, Python unit tests, udev cleanup rule.

  2. nvme: add registry plugin and orchestrator column - plugins/registry/ plugin with four subcommands, nvme list -v Orchestrator column, CLI integration tests (tests/nvme_registry_test.py).

  3. libnvmf: wire registry into connect and disconnect paths - Connect hook in __nvmf_add_ctrl(), libnvme_create_global_ctx() signature change, connect-all --nbft registers boot controllers as owner=nbft, ownership-aware disconnect-all with --owner / --force, removal of application from both structs and all associated code.

  4. nvme: add man pages for registry commands and update disconnect-all - Man pages for all four registry subcommands, updated nvme-disconnect-all(1).
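
The ownership-aware disconnect-all behaviour introduced in commit 3 amounts to a small per-controller decision. The sketch below models it from the description in this PR (default: only unowned controllers; --owner NAME: only that orchestrator's controllers; --force: everything). The struct and function names are hypothetical, and giving --force precedence over --owner is an assumption, not something the PR text states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option set mirroring the documented flags. */
struct disconnect_opts {
    const char *owner;  /* value of --owner NAME, or NULL if not given */
    bool force;         /* true if --force was given */
};

/*
 * Decide whether one controller should be disconnected.
 * ctrl_owner is the registry owner of the controller, or NULL when the
 * controller has no registry entry (unowned).
 */
static bool should_disconnect(const char *ctrl_owner,
                              const struct disconnect_opts *opts)
{
    if (opts->force)
        return true;    /* assumption: --force overrides everything */

    if (opts->owner)    /* --owner NAME: only that orchestrator's entries */
        return ctrl_owner && strcmp(ctrl_owner, opts->owner) == 0;

    return ctrl_owner == NULL;  /* default: unowned controllers only */
}
```

Under this model, a plain disconnect-all leaves controllers registered by nvme-stas or NBFT untouched, which is the protection the registry is meant to provide.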


API changes

Changed

/* Before */
struct libnvme_global_ctx *libnvme_create_global_ctx(FILE *fp, int log_level);

/* After */
struct libnvme_global_ctx *libnvme_create_global_ctx(FILE *fp, int log_level, const char *owner);

Existing callers that do not participate in the registry pass NULL. The exception is connect-all --nbft, which passes "nbft".

Removed

  • libnvme_get_application() / libnvme_set_application()
  • libnvme_subsystem_get_application() / libnvme_subsystem_set_application()
  • libnvme_global_ctx.application field
  • libnvme_subsystem.application field

Added

/* Public (libnvmf.ld) */
int libnvmf_registry_retrieve(const char *device, const char *attr, char **value);
int libnvmf_registry_update(const char *device, const char *attr, const char *value);
int libnvmf_registry_delete(const char *device);
int libnvmf_registry_device_for_each(void (*cback)(const char *device, void *user_data), void *user_data);
int libnvmf_registry_attr_for_each(const char *device, void (*cback)(const char *attr, const char *value, void *user_data), void *user_data);
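
To show how the callback-plus-user_data shape of the iteration functions is typically used, here is a self-contained sketch that walks a registry-style directory and invokes a callback once per device directory. It is a stand-in written for illustration, not the implementation in registry.c: `device_for_each_in` and its explicit `root` parameter are hypothetical, and the real function may filter entries differently (for example, by skipping stale ones).

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Same callback shape as libnvmf_registry_device_for_each(). */
typedef void (*device_cb)(const char *device, void *user_data);

/*
 * Hypothetical stand-in: call cback for every subdirectory of root.
 * Returns 0 on success, -1 if root cannot be opened.
 */
static int device_for_each_in(const char *root, device_cb cback,
                              void *user_data)
{
    DIR *d = opendir(root);
    struct dirent *e;

    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        int n;

        if (e->d_name[0] == '.')
            continue;                   /* skip ".", ".." and hidden files */

        n = snprintf(path, sizeof(path), "%s/%s", root, e->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;

        /* One directory per controller; ignore stray regular files. */
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            cback(e->d_name, user_data);
    }
    closedir(d);
    return 0;
}

/* Example callback: count devices through the user_data pointer. */
static void count_device(const char *device, void *user_data)
{
    (void)device;
    ++*(int *)user_data;
}
```

A caller would pass a pointer to its own state as user_data, for example `int n = 0; device_for_each_in("/run/nvme/registry", count_device, &n);`. The same pattern applies to the attribute iterator, whose callback additionally receives the attribute value.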

Testing

  • C unit tests: CRUD operations, stale-entry skipping, iteration, and 10-process concurrent write test (libnvme/test/registry.c)
  • Python unit tests: same coverage via Python bindings, including 10-process parallel write test (libnvme/libnvme/tests/test-registry.py)
  • CLI integration tests: all four subcommands, argument validation, error cases (tests/nvme_registry_test.py)
  • No-fabrics build: registry code is correctly excluded
  • Musl-style build (-Ujson-c, static): clean build and tests pass

Martin Belanger added 4 commits June 3, 2026 19:35
Multiple orchestrators (nvme-stas, nvme-discoverd, NBFT) can connect
NVMe-oF controllers on the same host simultaneously, with no way to
tell which tool manages which controller.  Commands like disconnect-all
are therefore indiscriminate and can disrupt running daemons.

Add a registry under /run/nvme/registry/ so orchestrators can declare
and respect ownership of connected controllers.  One directory per
controller, one plain-text file per attribute — no json-c dependency,
trivially inspectable with cat(1), naturally extensible.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]

The registry commands belong in a plugin rather than as top-level
built-ins to keep the core nvme binary lean and to make it clear
these are orchestration tools, not general NVMe commands.

'nvme list -v' gains an Orchestrator column so operators can see at
a glance which tool manages each connected controller.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]

When an owner name is set in the global context, a registry entry is
automatically written after every successful fabrics connect.  This
lets orchestrators declare ownership without any explicit registration
call.

libnvme_create_global_ctx() gains an owner parameter so the identity
is set at construction time and cannot change during the context
lifetime.  All callers that do not participate in the registry pass
NULL.  connect-all --nbft passes "nbft", registering boot-volume
controllers so they are protected from accidental disconnection.

disconnect-all is now ownership-aware by default: only unowned
controllers are disconnected.  --owner NAME targets a specific
orchestrator; --force restores the original unconditional behaviour
and requires confirmation on a terminal.

The application field and its filtering mechanism are removed from
libnvme_global_ctx and libnvme_subsystem.  The application mechanism
assumed all orchestrators share a JSON config file, which does not
hold for NBFT, nvme-stas, or nvme-discoverd.  The registry solves
the same coordination problem without that assumption.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]

Add man pages for the four registry plugin subcommands:
  nvme-registry-list(1)
  nvme-registry-retrieve(1)
  nvme-registry-update(1)
  nvme-registry-delete(1)

Update nvme-disconnect-all(1) to document the new ownership-aware
default behavior and the --owner, --force, and --transport options.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]

martin-belanger changed the title from "NVMe Controller Ownership Registry (V2)" to "libnvmf: NVMe Controller Ownership Registry (V2)" on Jun 4, 2026