hyp create hyp-pytorch-job rejects configurations that specify both node_count and explicit resource fields (accelerators, vcpu, memory). The CLI raises:
❌ Either node-count OR a combination of accelerators, vcpu, memory-in-gib must be specified for instance-type ml.p4d.24xlarge
These settings are not alternatives, and both are needed at the same time: node_count controls the number of replicas, while the resource fields control per-pod requests/limits. The underlying HyperPodPyTorchJob CRD supports both together.
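To make the requested change concrete, here is a minimal sketch contrasting an either/or check with a relaxed one that treats the two settings as independent. This is not the CLI's actual code: the function names (`validate_exclusive`, `validate_combined`, `has_explicit_resources`) are hypothetical, and the first function is only a guess at logic that would produce the error above.

```python
from typing import Optional


def has_explicit_resources(accelerators: Optional[int],
                           vcpu: Optional[float],
                           memory_in_gib: Optional[float]) -> bool:
    """Return True if any per-pod resource field was set."""
    return any(v is not None for v in (accelerators, vcpu, memory_in_gib))


def validate_exclusive(node_count: Optional[int],
                       accelerators: Optional[int] = None,
                       vcpu: Optional[float] = None,
                       memory_in_gib: Optional[float] = None) -> None:
    """Hypothetical reconstruction of the reported behavior: either/or."""
    if node_count is not None and has_explicit_resources(
            accelerators, vcpu, memory_in_gib):
        raise ValueError(
            "Either node-count OR a combination of accelerators, vcpu, "
            "memory-in-gib must be specified")


def validate_combined(node_count: Optional[int],
                      accelerators: Optional[int] = None,
                      vcpu: Optional[float] = None,
                      memory_in_gib: Optional[float] = None) -> None:
    """Relaxed check (assumed desired behavior): validate each value on
    its own and allow node_count together with resource fields."""
    if node_count is not None and node_count < 1:
        raise ValueError("node_count must be at least 1")
    fields = {"accelerators": accelerators, "vcpu": vcpu,
              "memory_in_gib": memory_in_gib}
    for name, value in fields.items():
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be positive")


# Two replicas, each requesting 8 accelerators, 90 vCPUs and 1000 GiB.
validate_combined(node_count=2, accelerators=8, vcpu=90, memory_in_gib=1000)
```

With the relaxed check, node_count maps to the replica count and the resource fields map to per-pod requests/limits, which matches what the CRD is described as accepting.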
Without explicit resource requests, the operator can auto-calculate resource requests that exceed what is actually available on a node after system pod overhead, so Kueue never admits the job. This can make multi-node jobs with Kueue scheduling unusable through the CLI.
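The arithmetic behind this can be illustrated with a rough sketch. The instance figures are the published ml.p4d.24xlarge specs (96 vCPUs, 1152 GiB memory, 8 GPUs). Everything else is an assumption for illustration only: the overhead values are made up, and the auto-calculated request is assumed to equal full instance capacity, which may not be exactly what the operator computes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpu: float
    memory_gib: float
    gpus: int

    def minus(self, other: "Resources") -> "Resources":
        """Subtract another resource set field by field."""
        return Resources(self.vcpu - other.vcpu,
                         self.memory_gib - other.memory_gib,
                         self.gpus - other.gpus)

    def fits_within(self, capacity: "Resources") -> bool:
        """True if every field is within the given capacity."""
        return (self.vcpu <= capacity.vcpu
                and self.memory_gib <= capacity.memory_gib
                and self.gpus <= capacity.gpus)


# Published ml.p4d.24xlarge specs.
INSTANCE = Resources(vcpu=96, memory_gib=1152, gpus=8)

# ASSUMPTION: illustrative overhead from system pods and reservations.
ASSUMED_OVERHEAD = Resources(vcpu=4, memory_gib=16, gpus=0)

available = INSTANCE.minus(ASSUMED_OVERHEAD)

# ASSUMPTION: the auto-calculated request equals full instance capacity.
auto_request = INSTANCE

# An explicit per-pod request chosen to leave headroom for the overhead.
explicit_request = Resources(vcpu=90, memory_gib=1100, gpus=8)

print(auto_request.fits_within(available))      # False
print(explicit_request.fits_within(available))  # True
```

Under these assumptions the auto-calculated request can never fit, while an explicit request slightly below capacity can, which is why per-pod resource fields need to be settable alongside node_count.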
CLI version: v3.7.0