[Bug] `node_count` and explicit resource fields (`accelerators`, `vcpu`, `memory`) are incorrectly mutually exclusive #383

`hyp create hyp-pytorch-job` rejects configurations that specify both `node_count` and explicit resource fields (`accelerators`, `vcpu`, `memory`). The CLI raises:

```
❌ Either node-count OR a combination of accelerators, vcpu, memory-in-gib must be specified for instance-type ml.p4d.24xlarge
```

However, both are needed together: `node_count` sets the number of replicas, while the resource fields set the per-pod requests and limits. The underlying `HyperPodPyTorchJob` CRD accepts both in the same spec.
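To make the difference concrete, here is a minimal sketch contrasting the rule implied by the error message with a check that treats the two settings independently. It is not the CLI's actual validation code: `JobSpec`, `current_check`, and `proposed_check` are hypothetical names, and `current_check` assumes the existing rule behaves as an exclusive-or, which the wording "Either ... OR ..." suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSpec:
    """Hypothetical subset of the job options discussed in this issue."""
    instance_type: str
    node_count: Optional[int] = None
    accelerators: Optional[int] = None
    vcpu: Optional[float] = None
    memory: Optional[float] = None  # GiB


def current_check(spec: JobSpec) -> None:
    """Approximation of the v3.7.0 rule: exactly one of the two groups may be set."""
    has_nodes = spec.node_count is not None
    has_resources = any(
        v is not None for v in (spec.accelerators, spec.vcpu, spec.memory)
    )
    if has_nodes == has_resources:  # both set, or neither set
        raise ValueError(
            "Either node-count OR a combination of accelerators, vcpu, "
            f"memory-in-gib must be specified for instance-type {spec.instance_type}"
        )


def proposed_check(spec: JobSpec) -> None:
    """Replica count and per-pod resources are validated independently."""
    if spec.node_count is not None and spec.node_count < 1:
        raise ValueError("node-count must be >= 1")
    for name in ("accelerators", "vcpu", "memory"):
        value = getattr(spec, name)
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be positive")
```

With a spec such as `JobSpec("ml.p4d.24xlarge", node_count=2, accelerators=8, vcpu=90, memory=1000)` (illustrative values), `current_check` raises the error quoted above, while `proposed_check` accepts it. Single-group specs behave the same under both checks.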

Without explicit resource requests, the operator can auto-calculate requests that exceed what a node actually has available after system pod overhead, so Kueue never admits the job. As a result, multi-node jobs scheduled through Kueue can be unusable from the CLI.
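The arithmetic behind this failure mode can be sketched as a simple per-node fit check. The instance capacity below matches the published `ml.p4d.24xlarge` spec (96 vCPUs, 1152 GiB, 8 GPUs), but the system-pod overhead and the assumption that an auto-calculated request equals the full nominal capacity are hypothetical, chosen only for illustration; `Resources` and `allocatable` are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpu: float
    memory_gib: float
    gpus: int

    def fits_within(self, other: "Resources") -> bool:
        # A request fits only if every dimension is within what is available.
        return (
            self.vcpu <= other.vcpu
            and self.memory_gib <= other.memory_gib
            and self.gpus <= other.gpus
        )


def allocatable(capacity: Resources, overhead: Resources) -> Resources:
    # What remains for the job after system pods take their share.
    return Resources(
        capacity.vcpu - overhead.vcpu,
        capacity.memory_gib - overhead.memory_gib,
        capacity.gpus - overhead.gpus,
    )


capacity = Resources(vcpu=96, memory_gib=1152, gpus=8)    # ml.p4d.24xlarge
overhead = Resources(vcpu=4, memory_gib=32, gpus=0)       # hypothetical
available = allocatable(capacity, overhead)

auto_request = capacity                                   # assumed auto-calculation
explicit_request = Resources(vcpu=90, memory_gib=1000, gpus=8)
```

Here `auto_request.fits_within(available)` is `False`, whereas `explicit_request.fits_within(available)` is `True`. Explicit per-pod requests give the user a way to stay under the real headroom while still setting `node_count`.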

CLI version: v3.7.0
