From 22954f58a825e4a99b1f33c1d57a283464cb73be Mon Sep 17 00:00:00 2001 From: afourniernv Date: Mon, 15 Jun 2026 10:53:12 -0700 Subject: [PATCH 1/3] docs(openshell): add NAT deployment guide Signed-off-by: afourniernv --- .../run-workflows/existing-agents/index.md | 2 + .../existing-agents/openshell.md | 546 ++++++++++++++++++ examples/a365_example/README.md | 4 + 3 files changed, 552 insertions(+) create mode 100644 docs/source/run-workflows/existing-agents/openshell.md diff --git a/docs/source/run-workflows/existing-agents/index.md b/docs/source/run-workflows/existing-agents/index.md index b5c3481997..b27b4af68f 100644 --- a/docs/source/run-workflows/existing-agents/index.md +++ b/docs/source/run-workflows/existing-agents/index.md @@ -36,6 +36,7 @@ While this new approach simplifies integration, wrapping agents is still the pre NeMo Agent Toolkit currently provides automatic wrappers for the following frameworks: - [LangGraph](langgraph.md): Integrate existing LangGraph agents and workflows +- [OpenShell](openshell.md): Run NeMo Agent Toolkit workloads inside an OpenShell sandbox with explicit auth and egress boundaries ## Benefits of Using Automatic Wrappers @@ -84,4 +85,5 @@ Each framework guide provides complete examples and step-by-step instructions fo :titlesonly: LangGraph <./langgraph.md> +OpenShell <./openshell.md> ``` diff --git a/docs/source/run-workflows/existing-agents/openshell.md b/docs/source/run-workflows/existing-agents/openshell.md new file mode 100644 index 0000000000..5855d197ec --- /dev/null +++ b/docs/source/run-workflows/existing-agents/openshell.md @@ -0,0 +1,546 @@ + + +# Run NeMo Agent Toolkit Agents in OpenShell + +OpenShell is a good fit when you want to run a NeMo Agent Toolkit workload as a sandboxed, long-lived service with tighter runtime controls around network access, filesystem access, and credential delivery. 
+ +Use this pattern when you want to: + +- run a NAT agent as a managed service instead of a one-shot CLI workflow +- expose a frontend such as Teams, webhooks, or another callback-driven channel +- give the agent outbound access to tools, MCP servers, or external APIs without giving it unrestricted egress +- keep long-lived identity material outside the workload when the target system supports brokered or runtime token exchange + +This guide focuses on the NAT side of the integration. OpenShell is the runtime boundary. NAT still owns the workflow, tool configuration, and frontend behavior. + +## Architecture Split + +In this deployment model, the responsibilities are split across three layers: + +| Layer | Responsibility | +|---|---| +| NAT workload | Agent workflow, tool definitions, frontend integrations, tracing, and business logic | +| OpenShell runtime | Sandboxed execution, outbound policy enforcement, provider-backed credential delivery, and service exposure | +| Cloud and identity systems | Tenant-specific identity, callback registration, ingress, and cloud resources | + +The important boundary is that OpenShell should own runtime controls and credential delivery. The NAT image should own only the agent behavior and the agent-side configuration it needs to consume those credentials safely. + +## Package the Agent + +Package the agent as a container image with a deterministic entrypoint. Treat the container as a long-lived service, not just as a development shell. 
+ +A typical image shape is: + +- install the NAT project and dependencies +- include the workflow YAML and supporting assets +- expose the frontend port if the agent listens for inbound traffic +- start NAT with an explicit config file + +For example: + +```dockerfile +ENTRYPOINT ["nat"] +CMD ["start", "a365", "--config_file", "/app/configs/a365_worker.yml"] +``` + +Keep these concerns out of the image when possible: + +- long-lived secrets +- environment-specific ingress hostnames +- cluster-specific service wiring +- cloud identity bootstrap details + +Those belong in the deployment layer. + +## Configure Auth Boundaries + +The safest model is to separate: + +- agent configuration +- runtime credential delivery +- cloud identity setup + +There are two broad auth patterns. + +## How OpenShell Providers Map to NAT Auth Providers + +The most important integration boundary is the handoff between: + +- the **OpenShell provider system** +- the **NAT auth provider or auth configuration** + +These are not the same thing. + +OpenShell providers own the runtime-side identity contract. They decide: + +- where long-lived credential or refresh material is stored +- whether the runtime receives a raw credential or a brokered token contract +- what audiences, resources, or upstream systems the sandbox is allowed to access + +NAT auth providers are the application-side consumers of that contract. 
They decide: + +- how the workflow asks for credentials +- how bearer tokens are attached to outbound requests +- how a frontend or tool-facing workflow resolves auth at runtime + +The clean mental model is: + +- OpenShell providers **produce** an auth contract for the sandbox +- NAT auth providers **consume** that contract inside the workflow + +Typical mappings look like this: + +| OpenShell runtime contract | NAT-side auth shape | +|---|---| +| static env credential | `api_key` or another env-driven NAT auth provider | +| brokered token URL | `openshell_bearer_token` or another callback-driven NAT auth provider | +| workload-specific callback contract | a custom NAT auth adapter | + +For example, in the Microsoft A365 lane: + +- OpenShell owns the `microsoft-agent-s2s` provider +- the runtime injects `A365_TOKEN_PROVIDER_URL` +- NAT consumes that contract through an auth block such as: + +```yaml +authentication: + a365_auth: + _type: openshell_bearer_token + token_url: ${A365_TOKEN_PROVIDER_URL} + audience: "" +``` + +That keeps the long-lived Microsoft identity material out of the NAT workload while still letting the workflow obtain a short-lived token when it needs one. + +### When the Existing Contract Is Enough + +You usually do **not** need new platform work if: + +- OpenShell can already expose the credential as env or a token URL +- NAT already has an auth provider that can consume that shape +- the downstream system only needs a standard bearer token or static credential + +In that case, the work is mainly: + +- provider configuration +- policy for the downstream API +- workflow wiring inside NAT + +### When You May Need to Extend OpenShell or NAT + +You may need additional work when the downstream system expects a more specialized runtime exchange than plain env injection or a simple bearer-token callback. 
+ +Examples include: + +- a cloud or enterprise API that needs stricter token-exchange semantics +- a workload that should never see the raw long-lived credential +- a harness that has no callback seam and only understands static auth +- a deployment that needs a stronger local security boundary than a plain HTTP token URL + +In those cases, the right move is usually: + +- extend **OpenShell** when you need a stronger provider/runtime security boundary +- extend **NAT** when you need a new auth adapter that can consume the runtime contract safely + +Treat this as a design boundary, not a hack. + +If the secure path cannot be expressed with existing OpenShell provider contracts and NAT auth providers, add a first-class integration instead of pushing secrets down into the workflow just to make the demo work. + +### Static credentials + +Use this when the agent or tool expects: + +- an API key +- a bot token +- a client secret +- another stable credential presented directly in env or config + +This is the simplest path, but it means the workload is holding the credential value directly. + +### Brokered runtime credentials + +Use this when the target system supports: + +- short-lived access tokens +- service-to-service identity +- a token callback or token URL contract + +In this model: + +- OpenShell stores the long-lived identity material or refresh material +- the workload receives a local token provider contract +- the workload asks for a token at runtime +- OpenShell validates the request and returns a short-lived token + +This is the cleaner model for cloud APIs and non-human worker identities because the long-lived secret does not have to live inside the NAT container. 
+ +### What NAT needs from the runtime + +For brokered auth to work well, the NAT workload or adapter layer needs an auth seam such as: + +- a token URL +- a token callback +- a pluggable auth provider + +If the agent only supports static bearer tokens or hard-coded credential exchange logic, you may need a small adapter before it can consume brokered runtime auth cleanly. + +## Allow External Tool Calls + +OpenShell sandboxes do not assume unrestricted egress. If your NAT workflow calls external tools or APIs, you must explicitly allow that traffic. + +This is especially important for: + +- MCP servers +- GitHub, Jira, Slack, or ServiceNow APIs +- internal HTTP APIs +- vector stores, databases, or search backends +- cloud control-plane APIs + +There are two separate pieces to configure: + +1. **Policy** + - which destinations the sandbox can reach + - which binaries or processes can make those calls + - whether the traffic is plain L4 or inspected REST/WebSocket traffic + +2. **Auth** + - how the tool or client gets its credentials + - whether those credentials are static or brokered at runtime + +When planning tool access, treat each external integration as its own contract: + +- what host or service does the agent need to reach +- what identity should it use +- does the workload need the raw secret, or just a short-lived token + +## MCP and Tooling + +If the NAT workflow uses MCP or another remote tool host: + +- the MCP endpoint must be reachable from inside the OpenShell sandbox +- the sandbox policy must allow that outbound route +- any tool credentials should still come from the runtime or provider layer rather than being baked into the image + +In practice, the agent image contains the workflow and tool wiring, while OpenShell controls what the workflow can reach and how secrets cross the boundary. + +## Minimal OpenShell Example + +Start with a simple non-A365 workflow before you move to Teams, Entra, or callback-driven deployments. 
+ +This example shows: + +- a NAT workflow with one outbound HTTP tool +- a container image that runs the workflow +- an OpenShell policy that allows only the required API +- exact `openshell` commands to create the sandbox and test it + +The point of this example is not the tool itself. The point is the contract between: + +- the NAT workflow +- the container image +- the OpenShell policy +- the OpenShell sandbox runtime + +### Example workflow + +Save this as `weather-workflow.yml`: + +```yaml +functions: + current_datetime: + _type: current_datetime + + weather_api: + _type: http_function + url: https://api.weatherapi.com/v1/current.json + method: GET + query_params: + q: "San Francisco" + key: ${WEATHER_API_KEY} + +llms: + nim_llm: + _type: nim + model_name: nvidia/nemotron-3-mini-4b-instruct + temperature: 0.0 + max_tokens: 512 + +workflow: + _type: react_agent + tool_names: [current_datetime, weather_api] + llm_name: nim_llm + verbose: true + parse_agent_response_max_retries: 3 +``` + +This is deliberately simple: + +- `weather_api` is the external tool +- `WEATHER_API_KEY` is the auth boundary +- the workflow can only succeed if the sandbox can both reach the API and resolve the credential + +### Example container + +Package the workflow into a container image with a stable startup command. + +```dockerfile +FROM python:3.11-slim + +WORKDIR /app + +COPY . /app + +RUN pip install nvidia-nat + +ENTRYPOINT ["nat"] +CMD ["run", "--config_file", "/app/weather-workflow.yml", "--input", "What is the weather right now?"] +``` + +This is enough for a local smoke. For a long-lived service workload, switch the command to a service-oriented NAT entrypoint such as `nat serve` or a channel-specific `nat start ...` command. 
+ +### Example OpenShell policy + +Save this as `weather-policy.yaml`: + +```yaml +version: 1 + +filesystem_policy: + include_workdir: true + read_only: + - /usr + - /lib + - /etc + read_write: + - /sandbox + - /tmp + +landlock: + compatibility: best_effort + +process: + run_as_user: sandbox + run_as_group: sandbox + +network_policies: + weather_api: + name: weather-api + endpoints: + - host: api.weatherapi.com + port: 443 + protocol: rest + enforcement: enforce + access: read-only + binaries: + - path: /usr/local/bin/python + - path: /usr/local/bin/nat +``` + +This policy does one important thing: it allows the NAT workload to reach only `api.weatherapi.com:443` through the expected binaries. + +If the workflow needs more tools later, add them intentionally instead of broadening egress all at once. + +### Example OpenShell commands + +Create a provider or credential source first. + +For a static env-style credential: + +```shell +export WEATHER_API_KEY= + +openshell provider create \ + --name weather-api \ + --type generic \ + --credential WEATHER_API_KEY +``` + +Create the sandbox from the image and attach the provider: + +```shell +openshell sandbox create \ + --from my-registry.example.com/nat-weather:latest \ + --provider weather-api \ + --policy ./weather-policy.yaml +``` + +Inspect the sandbox: + +```shell +openshell sandbox list +openshell sandbox get +``` + +Check logs: + +```shell +openshell logs --tail --source sandbox +``` + +If the workflow is blocked, iterate on policy: + +```shell +openshell policy get --full > current-policy.yaml +openshell policy set --policy current-policy.yaml --wait +``` + +### What this example proves + +This minimal lane proves the most important parts of the deployment model: + +- the NAT image boots correctly inside OpenShell +- the workflow can read its config +- the workflow can consume a runtime-provided credential +- outbound tool access is controlled by explicit policy + +Once this works, you can add: + +- more
tools +- MCP backends +- brokered runtime auth +- long-lived service commands +- Kubernetes ingress +- Teams, A365, or other callback-driven frontends + +### Brokered-auth variant + +If the downstream system supports brokered runtime auth, the same workflow pattern can consume a token URL instead of a raw key. + +For example: + +```yaml +authentication: + downstream_auth: + _type: openshell_bearer_token + token_url: ${DOWNSTREAM_TOKEN_PROVIDER_URL} + audience: "api://example" +``` + +In that shape: + +- OpenShell owns the long-lived identity material +- the NAT workflow only asks for short-lived tokens at runtime +- the overall workflow shape stays the same + +That is the preferred model when the target API and the NAT auth seam both support it. + +## Deployment Lanes + +There are two common lanes for running NAT agents in OpenShell. + +### Local or container lane + +Use this when you want to iterate quickly on: + +- workflow config +- policy +- auth integration +- local testing of outbound tools + +The usual flow is: + +1. build the NAT image +2. run OpenShell locally +3. create a sandbox from the image +4. attach the required providers or credentials +5. apply policy for outbound tools and APIs +6. 
validate the workflow with a small smoke test + +This lane is the easiest place to verify that: + +- the image boots correctly +- the entrypoint is correct +- the workflow can read the expected config +- outbound tool calls and provider-backed auth work from inside the sandbox + +### Kubernetes or cloud lane + +Use this when the agent needs: + +- stable hosting +- public or tenant-facing callbacks +- integration with cloud identity and ingress +- long-running service operation + +In this lane: + +- OpenShell hosts the workload as a sandboxed pod or workload +- cluster-specific ingress and service wiring expose the agent when needed +- tenant-specific identity setup happens outside the generic NAT image + +For callback-driven agents, this layer often includes: + +- an ingress path or HTTPS endpoint +- bot or webhook registration +- cloud identity resources +- DNS or public hostnames + +These are deployment concerns, not part of the core NAT workflow. + +## Microsoft A365 Example + +One validated example of this pattern is a NAT A365 worker hosted inside OpenShell on AKS. + +In that shape: + +- the NAT worker listens for a frontend channel such as Teams or another Microsoft-triggered event +- the NAT workflow runs inside an OpenShell sandbox +- Microsoft or Entra identity is configured in the deployment and provider layers +- the NAT runtime consumes an OpenShell-managed auth contract rather than minting every token itself +- additional AKS ingress or service plumbing exposes `/api/messages` and related callback paths + +The Microsoft-specific pieces are optional. The broader OpenShell hosting model applies to non-Microsoft agents too. 
+ +## What Belongs Where + +When documenting or implementing this pattern, keep these boundaries clear: + +- **NAT owns** + - workflow YAML + - tool definitions + - frontend behavior + - tracing and evaluation hooks + - agent-specific auth adapters + +- **OpenShell owns** + - sandbox lifecycle + - outbound policy + - provider-backed credential delivery + - service exposure and runtime boundaries + +- **Cloud deployment owns** + - ingress + - bot registration + - tenant setup + - cloud IAM and cluster resources + +Keeping those boundaries clean makes it easier to move the same agent between local, container, and Kubernetes deployment lanes. + +## Recommended Deployment Workflow + +If you are bringing a NAT workload into OpenShell for the first time, use this order: + +1. Package the workload as a container with a stable command. +2. Make the workflow run locally without cloud-specific glue. +3. Identify every outbound tool or API the workflow needs. +4. Decide which credentials can stay static and which should move behind a brokered runtime auth seam. +5. Validate the image in a local OpenShell sandbox first. +6. Move to Kubernetes or cloud deployment only after the local contract is stable. +7. Add ingress, callback registration, and tenant-specific identity last. + +This sequence keeps agent behavior, auth, policy, and cloud deployment concerns separate enough to debug them independently. 
diff --git a/examples/a365_example/README.md b/examples/a365_example/README.md index a35282241f..e61505c23d 100644 --- a/examples/a365_example/README.md +++ b/examples/a365_example/README.md @@ -129,6 +129,10 @@ This example involves one manifest concept that sits outside the repo: ## Documentation +- [docs/source/run-workflows/existing-agents/openshell.md](../../docs/source/run-workflows/existing-agents/openshell.md) + explains the broader pattern for packaging and deploying NeMo Agent Toolkit + agents inside OpenShell, including auth boundaries, outbound tool access, and + local versus Kubernetes lanes. - [docs/SETUP.md](./docs/SETUP.md) explains identities, permissions, licenses, Azure resources, and the rebuild sequence. - [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md) explains image build, worker From 2164fa517cafdd101f7f492a24d0fbf22c1babac Mon Sep 17 00:00:00 2001 From: afourniernv Date: Mon, 15 Jun 2026 11:41:35 -0700 Subject: [PATCH 2/3] docs(openshell): align toolkit naming Signed-off-by: afourniernv --- .../existing-agents/openshell.md | 86 +++++++++---------- 1 file changed, 43 insertions(+), 43 deletions(-) diff --git a/docs/source/run-workflows/existing-agents/openshell.md b/docs/source/run-workflows/existing-agents/openshell.md index 5855d197ec..3aed315ad4 100644 --- a/docs/source/run-workflows/existing-agents/openshell.md +++ b/docs/source/run-workflows/existing-agents/openshell.md @@ -21,12 +21,12 @@ OpenShell is a good fit when you want to run a NeMo Agent Toolkit workload as a Use this pattern when you want to: -- run a NAT agent as a managed service instead of a one-shot CLI workflow +- run a NeMo Agent Toolkit agent as a managed service instead of a one-shot CLI workflow - expose a frontend such as Teams, webhooks, or another callback-driven channel - give the agent outbound access to tools, MCP servers, or external APIs without giving it unrestricted egress - keep long-lived identity material outside the workload when the target system supports 
brokered or runtime token exchange -This guide focuses on the NAT side of the integration. OpenShell is the runtime boundary. NAT still owns the workflow, tool configuration, and frontend behavior. +This guide focuses on the NeMo Agent Toolkit side of the integration. OpenShell is the runtime boundary. The toolkit still owns the workflow, tool configuration, and frontend behavior. ## Architecture Split @@ -34,11 +34,11 @@ In this deployment model, the responsibilities are split across three layers: | Layer | Responsibility | |---|---| -| NAT workload | Agent workflow, tool definitions, frontend integrations, tracing, and business logic | +| NeMo Agent Toolkit workload | Agent workflow, tool definitions, frontend integrations, tracing, and business logic | | OpenShell runtime | Sandboxed execution, outbound policy enforcement, provider-backed credential delivery, and service exposure | | Cloud and identity systems | Tenant-specific identity, callback registration, ingress, and cloud resources | -The important boundary is that OpenShell should own runtime controls and credential delivery. The NAT image should own only the agent behavior and the agent-side configuration it needs to consume those credentials safely. +The important boundary is that OpenShell should own runtime controls and credential delivery. The image should own only the agent behavior and the toolkit-side configuration it needs to consume those credentials safely. ## Package the Agent @@ -46,10 +46,10 @@ Package the agent as a container image with a deterministic entrypoint. 
Treat th A typical image shape is: -- install the NAT project and dependencies +- install the NeMo Agent Toolkit project and dependencies - include the workflow YAML and supporting assets - expose the frontend port if the agent listens for inbound traffic -- start NAT with an explicit config file +- start the toolkit with an explicit config file For example: @@ -77,12 +77,12 @@ The safest model is to separate: There are two broad auth patterns. -## How OpenShell Providers Map to NAT Auth Providers +## How OpenShell Providers Map to NeMo Agent Toolkit Auth Providers The most important integration boundary is the handoff between: - the **OpenShell provider system** -- the **NAT auth provider or auth configuration** +- the **NeMo Agent Toolkit auth provider or auth configuration** These are not the same thing. @@ -92,7 +92,7 @@ OpenShell providers own the runtime-side identity contract. They decide: - whether the runtime receives a raw credential or a brokered token contract - what audiences, resources, or upstream systems the sandbox is allowed to access -NAT auth providers are the application-side consumers of that contract. They decide: +NeMo Agent Toolkit auth providers are the application-side consumers of that contract. They decide: - how the workflow asks for credentials - how bearer tokens are attached to outbound requests @@ -101,21 +101,21 @@ NAT auth providers are the application-side consumers of that contract. 
They dec The clean mental model is: - OpenShell providers **produce** an auth contract for the sandbox -- NAT auth providers **consume** that contract inside the workflow +- NeMo Agent Toolkit auth providers **consume** that contract inside the workflow Typical mappings look like this: -| OpenShell runtime contract | NAT-side auth shape | +| OpenShell runtime contract | Toolkit-side auth shape | |---|---| -| static env credential | `api_key` or another env-driven NAT auth provider | -| brokered token URL | `openshell_bearer_token` or another callback-driven NAT auth provider | -| workload-specific callback contract | a custom NAT auth adapter | +| static env credential | `api_key` or another env-driven toolkit auth provider | +| brokered token URL | `openshell_bearer_token` or another callback-driven toolkit auth provider | +| workload-specific callback contract | a custom toolkit auth adapter | For example, in the Microsoft A365 lane: - OpenShell owns the `microsoft-agent-s2s` provider - the runtime injects `A365_TOKEN_PROVIDER_URL` -- NAT consumes that contract through an auth block such as: +- the toolkit consumes that contract through an auth block such as: ```yaml authentication: @@ -125,23 +125,23 @@ authentication: audience: "" ``` -That keeps the long-lived Microsoft identity material out of the NAT workload while still letting the workflow obtain a short-lived token when it needs one. +That keeps the long-lived Microsoft identity material out of the toolkit workload while still letting the workflow obtain a short-lived token when it needs one. 
### When the Existing Contract Is Enough You usually do **not** need new platform work if: - OpenShell can already expose the credential as env or a token URL -- NAT already has an auth provider that can consume that shape +- the toolkit already has an auth provider that can consume that shape - the downstream system only needs a standard bearer token or static credential In that case, the work is mainly: - provider configuration - policy for the downstream API -- workflow wiring inside NAT +- workflow wiring inside the toolkit -### When You May Need to Extend OpenShell or NAT +### When You May Need to Extend OpenShell or the Toolkit You may need additional work when the downstream system expects a more specialized runtime exchange than plain env injection or a simple bearer-token callback. @@ -155,11 +155,11 @@ Examples include: In those cases, the right move is usually: - extend **OpenShell** when you need a stronger provider/runtime security boundary -- extend **NAT** when you need a new auth adapter that can consume the runtime contract safely +- extend **NeMo Agent Toolkit** when you need a new auth adapter that can consume the runtime contract safely Treat this as a design boundary, not a hack. -If the secure path cannot be expressed with existing OpenShell provider contracts and NAT auth providers, add a first-class integration instead of pushing secrets down into the workflow just to make the demo work. +If the secure path cannot be expressed with existing OpenShell provider contracts and toolkit auth providers, add a first-class integration instead of pushing secrets down into the workflow just to make the demo work. ### Static credentials @@ -187,11 +187,11 @@ In this model: - the workload asks for a token at runtime - OpenShell validates the request and returns a short-lived token -This is the cleaner model for cloud APIs and non-human worker identities because the long-lived secret does not have to live inside the NAT container. 
+This is the cleaner model for cloud APIs and non-human worker identities because the long-lived secret does not have to live inside the toolkit container. -### What NAT needs from the runtime +### What the Toolkit Needs from the Runtime -For brokered auth to work well, the NAT workload or adapter layer needs an auth seam such as: +For brokered auth to work well, the toolkit workload or adapter layer needs an auth seam such as: - a token URL - a token callback @@ -201,7 +201,7 @@ If the agent only supports static bearer tokens or hard-coded credential exchang ## Allow External Tool Calls -OpenShell sandboxes do not assume unrestricted egress. If your NAT workflow calls external tools or APIs, you must explicitly allow that traffic. +OpenShell sandboxes do not assume unrestricted egress. If your toolkit workflow calls external tools or APIs, you must explicitly allow that traffic. This is especially important for: @@ -230,7 +230,7 @@ When planning tool access, treat each external integration as its own contract: ## MCP and Tooling -If the NAT workflow uses MCP or another remote tool host: +If the NeMo Agent Toolkit workflow uses MCP or another remote tool host: - the MCP endpoint must be reachable from inside the OpenShell sandbox - the sandbox policy must allow that outbound route @@ -244,14 +244,14 @@ Start with a simple non-A365 workflow before you move to Teams, Entra, or callba This example shows: -- a NAT workflow with one outbound HTTP tool +- a toolkit workflow with one outbound HTTP tool - a container image that runs the workflow - an OpenShell policy that allows only the required API - exact `openshell` commands to create the sandbox and test it The point of this example is not the tool itself. 
The point is the contract between: -- the NAT workflow +- the toolkit workflow - the container image - the OpenShell policy - the OpenShell sandbox runtime @@ -311,7 +311,7 @@ ENTRYPOINT ["nat"] CMD ["run", "--config_file", "/app/weather-workflow.yml", "--input", "What is the weather right now?"] ``` -This is enough for a local smoke. For a long-lived service workload, switch the command to a service-oriented NAT entrypoint such as `nat serve` or a channel-specific `nat start ...` command. +This is enough for a local smoke test. For a long-lived service workload, switch the command to a service-oriented toolkit entrypoint such as `nat serve` or a channel-specific `nat start ...` command. ### Example OpenShell policy @@ -351,7 +351,7 @@ network_policies: - path: /usr/local/bin/nat ``` -This policy does one important thing: it allows the NAT workload to reach only `api.weatherapi.com:443` through the expected binaries. +This policy does one important thing: it allows the toolkit workload to reach only `api.weatherapi.com:443` through the expected binaries. If the workflow needs more tools later, add them intentionally instead of broadening egress all at once. @@ -403,7 +403,7 @@ openshell policy set --policy current-policy.yaml --wait This minimal lane proves the most important parts of the deployment model: -- the NAT image boots correctly inside OpenShell +- the toolkit image boots correctly inside OpenShell - the workflow can read its config - the workflow can consume a runtime-provided credential - outbound tool access is controlled by explicit policy @@ -434,14 +434,14 @@ authentication: In that shape: - OpenShell owns the long-lived identity material -- the NAT workflow only asks for short-lived tokens at runtime +- the toolkit workflow only asks for short-lived tokens at runtime - the overall workflow shape stays the same -That is the preferred model when the target API and the NAT auth seam both support it.
+That is the preferred model when the target API and the toolkit auth seam both support it. ## Deployment Lanes -There are two common lanes for running NAT agents in OpenShell. +There are two common lanes for running NeMo Agent Toolkit agents in OpenShell. ### Local or container lane @@ -454,7 +454,7 @@ Use this when you want to iterate quickly on: The usual flow is: -1. build the NAT image +1. build the toolkit image 2. run OpenShell locally 3. create a sandbox from the image 4. attach the required providers or credentials @@ -481,7 +481,7 @@ In this lane: - OpenShell hosts the workload as a sandboxed pod or workload - cluster-specific ingress and service wiring expose the agent when needed -- tenant-specific identity setup happens outside the generic NAT image +- tenant-specific identity setup happens outside the generic toolkit image For callback-driven agents, this layer often includes: @@ -490,18 +490,18 @@ For callback-driven agents, this layer often includes: - cloud identity resources - DNS or public hostnames -These are deployment concerns, not part of the core NAT workflow. +These are deployment concerns, not part of the core toolkit workflow. ## Microsoft A365 Example -One validated example of this pattern is a NAT A365 worker hosted inside OpenShell on AKS. +One validated example of this pattern is a NeMo Agent Toolkit A365 worker hosted inside OpenShell on AKS. 
In that shape: -- the NAT worker listens for a frontend channel such as Teams or another Microsoft-triggered event -- the NAT workflow runs inside an OpenShell sandbox +- the toolkit worker listens for a frontend channel such as Teams or another Microsoft-triggered event +- the toolkit workflow runs inside an OpenShell sandbox - Microsoft or Entra identity is configured in the deployment and provider layers -- the NAT runtime consumes an OpenShell-managed auth contract rather than minting every token itself +- the toolkit runtime consumes an OpenShell-managed auth contract rather than minting every token itself - additional AKS ingress or service plumbing exposes `/api/messages` and related callback paths The Microsoft-specific pieces are optional. The broader OpenShell hosting model applies to non-Microsoft agents too. @@ -510,7 +510,7 @@ The Microsoft-specific pieces are optional. The broader OpenShell hosting model When documenting or implementing this pattern, keep these boundaries clear: -- **NAT owns** +- **NeMo Agent Toolkit owns** - workflow YAML - tool definitions - frontend behavior @@ -533,7 +533,7 @@ Keeping those boundaries clean makes it easier to move the same agent between lo ## Recommended Deployment Workflow -If you are bringing a NAT workload into OpenShell for the first time, use this order: +If you are bringing a NeMo Agent Toolkit workload into OpenShell for the first time, use this order: 1. Package the workload as a container with a stable command. 2. Make the workflow run locally without cloud-specific glue. 
From ee4ff20c07f40243b779c235d0881e15254951a1 Mon Sep 17 00:00:00 2001
From: afourniernv
Date: Mon, 15 Jun 2026 15:14:42 -0700
Subject: [PATCH 3/3] docs(openshell): fix doc lint issues

Signed-off-by: afourniernv
---
 .../existing-agents/openshell.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/source/run-workflows/existing-agents/openshell.md b/docs/source/run-workflows/existing-agents/openshell.md
index 3aed315ad4..063b1e904d 100644
--- a/docs/source/run-workflows/existing-agents/openshell.md
+++ b/docs/source/run-workflows/existing-agents/openshell.md
@@ -17,7 +17,7 @@ limitations under the License.
 
 # Run NeMo Agent Toolkit Agents in OpenShell
 
-OpenShell is a good fit when you want to run a NeMo Agent Toolkit workload as a sandboxed, long-lived service with tighter runtime controls around network access, filesystem access, and credential delivery.
+OpenShell is a good fit when you want to run a NeMo Agent Toolkit workload as an isolated, long-lived service with tighter runtime controls around network access, filesystem access, and credential delivery.
 
 Use this pattern when you want to:
 
@@ -35,7 +35,7 @@ In this deployment model, the responsibilities are split across three layers:
 | Layer | Responsibility |
 |---|---|
 | NeMo Agent Toolkit workload | Agent workflow, tool definitions, frontend integrations, tracing, and business logic |
-| OpenShell runtime | Sandboxed execution, outbound policy enforcement, provider-backed credential delivery, and service exposure |
+| OpenShell runtime | Isolated execution, outbound policy enforcement, provider-backed credential delivery, and service exposure |
 | Cloud and identity systems | Tenant-specific identity, callback registration, ingress, and cloud resources |
 
 The important boundary is that OpenShell should own runtime controls and credential delivery. The image should own only the agent behavior and the toolkit-side configuration it needs to consume those credentials safely.
@@ -107,7 +107,7 @@ Typical mappings look like this:
 
 | OpenShell runtime contract | Toolkit-side auth shape |
 |---|---|
-| static env credential | `api_key` or another env-driven toolkit auth provider |
+| static environment-variable credential | `api_key` or another environment-driven toolkit auth provider |
 | brokered token URL | `openshell_bearer_token` or another callback-driven toolkit auth provider |
 | workload-specific callback contract | a custom toolkit auth adapter |
 
@@ -131,7 +131,7 @@ That keeps the long-lived Microsoft identity material out of the toolkit workloa
 
 You usually do **not** need new platform work if:
 
-- OpenShell can already expose the credential as env or a token URL
+- OpenShell can already expose the credential as an environment variable or a token URL
 - the toolkit already has an auth provider that can consume that shape
 - the downstream system only needs a standard bearer token or static credential
 
@@ -143,7 +143,7 @@ In that case, the work is mainly:
 
 ### When You May Need to Extend OpenShell or the Toolkit
 
-You may need additional work when the downstream system expects a more specialized runtime exchange than plain env injection or a simple bearer-token callback.
+You may need additional work when the downstream system expects a more specialized runtime exchange than plain environment-variable injection or a simple bearer-token callback.
 
 Examples include:
 
@@ -154,7 +154,7 @@ Examples include:
 
 In those cases, the right move is usually:
 
-- extend **OpenShell** when you need a stronger provider/runtime security boundary
+- extend **OpenShell** when you need a stronger provider and runtime security boundary
 - extend **NeMo Agent Toolkit** when you need a new auth adapter that can consume the runtime contract safely
 
 Treat this as a design boundary, not a hack.
@@ -168,7 +168,7 @@ Use this when the agent or tool expects:
 - an API key
 - a bot token
 - a client secret
-- another stable credential presented directly in env or config
+- another stable credential presented directly in an environment variable or config
 
 This is the simplest path, but it means the workload is holding the credential value directly.
 
@@ -216,7 +216,7 @@ There are two separate pieces to configure:
 1. **Policy**
    - which destinations the sandbox can reach
    - which binaries or processes can make those calls
-   - whether the traffic is plain L4 or inspected REST/WebSocket traffic
+   - whether the traffic is plain L4 or inspected REST or WebSocket traffic
 
 2. **Auth**
    - how the tool or client gets its credentials
@@ -359,7 +359,7 @@ If the workflow needs more tools later, add them intentionally instead of broade
 
 Create a provider or credential source first.
 
-For a static env-style credential:
+For a static environment-variable credential:
 
 ```shell
 export WEATHER_API_KEY=
@@ -376,7 +376,7 @@ Create the sandbox from the image and attach the provider:
 openshell sandbox create \
   --from my-registry.example.com/nat-weather:latest \
   --provider weather-api \
-  --policy ./weather-policy.yaml
+  --policy /path/to/weather-policy.yaml
 ```
 
 Inspect the sandbox:
@@ -479,7 +479,7 @@ Use this when the agent needs:
 
 In this lane:
 
-- OpenShell hosts the workload as a sandboxed pod or workload
+- OpenShell hosts the workload as an isolated pod or workload
 - cluster-specific ingress and service wiring expose the agent when needed
 - tenant-specific identity setup happens outside the generic toolkit image