20 changes: 20 additions & 0 deletions .github/workflows/jobs_build_documentation.yml
@@ -0,0 +1,20 @@
name: Build Jobs documentation

on:
push:
paths:
- "docs/jobs/**"
branches:
- main

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: hub-docs
package_name: jobs
path_to_docs: hub-docs/docs/jobs/
additional_args: --not_python_module
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
21 changes: 21 additions & 0 deletions .github/workflows/jobs_build_pr_documentation.yml
@@ -0,0 +1,21 @@
name: Build Jobs PR Documentation

on:
pull_request:
paths:
- "docs/jobs/**"

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: hub-docs
package_name: jobs
path_to_docs: hub-docs/docs/jobs/
additional_args: --not_python_module
16 changes: 16 additions & 0 deletions .github/workflows/jobs_upload_pr_documentation.yml
@@ -0,0 +1,16 @@
name: Upload Jobs PR Documentation

on:
workflow_run:
workflows: ["Build Jobs PR Documentation"]
types:
- completed

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: jobs
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
48 changes: 48 additions & 0 deletions docs/jobs/_toctree.yml
@@ -0,0 +1,48 @@
- local: index
title: Hugging Face Jobs

- title: Overview
sections:
- local: index
title: Hugging Face Jobs
- local: quickstart
title: Quickstart
- local: pricing
title: Pricing and Billing

- title: Tutorials
sections:
- title: Training
sections:
- local: training1
title: Training Tutorial 1
- title: Inference
sections:
- local: inference1
title: Inference Tutorial 1
- title: Data
sections:
- local: data1
title: Data Tutorial 1

- title: Guides
sections:
- local: manage
title: Manage Jobs
- local: configuration
title: Configuration
- local: frameworks
title: Framework Setups
- local: schedule
title: Schedule Jobs
- local: webhooks
title: Webhook Automation

- title: Reference
sections:
- local: cli
title: Command Line Interface (CLI)
- local: python
title: Python client
- local: api
title: Jobs API Endpoints
66 changes: 66 additions & 0 deletions docs/jobs/api.md
@@ -0,0 +1,66 @@
# Jobs API Endpoints

The Jobs HTTP API endpoints are available under `https://huggingface.co/api/jobs`.

Authenticate using a Hugging Face token with permission to start and manage Jobs under your namespace (your account or organization).
Pass the token as a Bearer token with the header: `"Authorization: Bearer {token}"`.
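
For example, to list the Jobs under your namespace with curl (`my-username` is a placeholder, and `$HF_TOKEN` is assumed to hold such a token):

```bash
curl -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/jobs/my-username
```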

Here is a list of available endpoints and arguments:

## Jobs

* POST `https://huggingface.co/api/jobs/{namespace}`
Run a Job.
Arguments:
* image: string
* command: string
* env, *optional*: object key -> value
* secrets, *optional*: object key -> value
* flavor, *optional*: string
* timeout, *optional*: number
* GET `https://huggingface.co/api/jobs/{namespace}`
List Jobs.
* GET `https://huggingface.co/api/jobs/{namespace}/{job_id}`
Inspect a Job.
* GET `https://huggingface.co/api/jobs/{namespace}/{job_id}/logs`
Fetch the logs of a Job.
* GET `https://huggingface.co/api/jobs/{namespace}/{job_id}/cancel`
Cancel a Job.
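
As an illustration, here is a minimal curl sketch of running a Job and fetching its logs. The request body follows the arguments listed above; `my-username`, the flavor value, and `{job_id}` are placeholders:

```bash
# Run a simple Job on a basic CPU machine
curl -X POST https://huggingface.co/api/jobs/my-username \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"image": "ubuntu", "command": "echo hello", "flavor": "cpu-basic"}'

# Fetch the logs once you know the Job id
curl -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/jobs/my-username/{job_id}/logs
```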

## Scheduled Jobs

* POST `https://huggingface.co/api/scheduled-jobs/{namespace}`
Create a scheduled Job.
Arguments:
* jobSpec:
* image: string
* command: string
* env: object key -> value
* secrets: object key -> value
* flavor: string
* timeout: number
* schedule: string
* concurrency, *optional*: bool
* suspend, *optional*: bool
* GET `https://huggingface.co/api/scheduled-jobs/{namespace}`
List scheduled Jobs.
* GET `https://huggingface.co/api/scheduled-jobs/{namespace}/{job_id}`
Inspect a scheduled Job.
* DELETE `https://huggingface.co/api/scheduled-jobs/{namespace}/{job_id}`
Delete a scheduled Job.
* GET `https://huggingface.co/api/scheduled-jobs/{namespace}/{job_id}/suspend`
Suspend a scheduled Job.
* GET `https://huggingface.co/api/scheduled-jobs/{namespace}/{job_id}/resume`
Resume a scheduled Job.
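
For instance, here is a sketch of creating a scheduled Job that runs every hour. It assumes `schedule` accepts a cron expression (the usual convention, not confirmed above) and shows only a subset of the `jobSpec` fields:

```bash
curl -X POST https://huggingface.co/api/scheduled-jobs/my-username \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "jobSpec": {
          "image": "ubuntu",
          "command": "echo hello",
          "flavor": "cpu-basic",
          "timeout": 600
        },
        "schedule": "0 * * * *"
      }'
```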

## Webhooks

* POST `https://huggingface.co/api/settings/webhooks`
Create a webhook that triggers this Job.
Arguments:
* watched: list of objects
* type: one of "dataset", "model", "org", "space", "user"
* name: string
* jobSourceId: string
* domains, *optional*: list of "repo", "discussion"
* secret, *optional*: string
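
As a sketch, creating a webhook that watches a dataset and triggers a Job could look like the following; the `jobSourceId` value is a placeholder, since its exact format is not documented above:

```bash
curl -X POST https://huggingface.co/api/settings/webhooks \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "watched": [{"type": "dataset", "name": "my-username/my-dataset"}],
        "jobSourceId": "<job-source-id>",
        "domains": ["repo"]
      }'
```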
9 changes: 9 additions & 0 deletions docs/jobs/cli.md
@@ -0,0 +1,9 @@
# Jobs Command Line Interface (CLI)

The `huggingface_hub` Python package comes with a built-in CLI called `hf`. This tool allows you to interact with the Hugging Face Hub directly from a terminal. For example, you can log in to your account, create a repository, upload and download files, etc. It also comes with handy features to configure your machine, manage your cache, and start and manage Jobs.

Find the `hf jobs` installation steps, guides, and reference in the `huggingface_hub` documentation:

* [Installation](https://huggingface.co/docs/huggingface_hub/en/guides/cli#getting-started)
* [Run and manage Jobs](https://huggingface.co/docs/huggingface_hub/en/guides/cli#hf-jobs)
* [CLI reference for Jobs](https://huggingface.co/docs/huggingface_hub/en/package_reference/cli#hf-jobs)
167 changes: 167 additions & 0 deletions docs/jobs/configuration.md
@@ -0,0 +1,167 @@
# Configuration

## Authentication

You need to be authenticated with `hf auth login` to run Jobs, and use a token with the permission to start and manage Jobs.

Alternatively, pass a Hugging Face token manually with `--token` in the CLI, the `token` argument in Python or a Bearer token for the HTTP API.
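
For example (`hf_xxx` is a placeholder token):

```bash
# Log in once, then run Jobs without passing a token explicitly
>>> hf auth login

# Or pass a token explicitly for a single run
>>> hf jobs run --token hf_xxx ubuntu echo "Hello from the cloud!"
```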

## UV Jobs

Specify the UV script or Python command to run as you would with UV:

```bash
>>> hf jobs uv run train.py
```

```bash
>>> hf jobs uv run python -c 'print("Hello from the cloud!")'
```

The `hf jobs uv run` command accepts UV arguments like `--with` and `--python`. The `--with` argument lets you specify Python dependencies, and `--python` lets you choose the Python version to use:


```bash
>>> hf jobs uv run --with trl train.py
>>> hf jobs uv run --python 3.12 train.py
```

Arguments following the command (or script) are not interpreted as arguments to UV. All options to UV must be provided before the command, e.g., `uv run --verbose foo`. A `--` can be used to separate the command from the jobs/UV options for clarity, e.g.:

```bash
>>> hf jobs uv run --with trl-jobs -- trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara
```

Find the list of all arguments in the [CLI documentation](https://huggingface.co/docs/huggingface_hub/package_reference/cli#hf-jobs-uv-run) and the [UV Commands documentation](https://docs.astral.sh/uv/reference/cli/#uv-run).

By default, UV Jobs run with the `ghcr.io/astral-sh/uv:python3.12-bookworm` Docker image, but you can use another image as long as it has UV installed, using `--image <docker-image>`.

## Docker Jobs

Specify the Docker image and the command to run as you would with Docker:

```bash
>>> hf jobs run ubuntu echo "Hello from the cloud!"
```

All options to Jobs must be provided before the command. A `--` can be used to separate the command from the jobs options for clarity, e.g.:

```bash
>>> hf jobs run --token hf_xxx ubuntu -- echo "Hello from the cloud!"
```

Find the list of all arguments in the [CLI documentation](https://huggingface.co/docs/huggingface_hub/package_reference/cli#hf-jobs-run).

## Environment variables and Secrets

You can pass environment variables to your job using

```bash
# Pass environment variables
>>> hf jobs uv run -e FOO=foo -e BAR=bar python -c 'import os; print(os.environ["FOO"], os.environ["BAR"])'
```

```bash
# Pass environment variables from a local .env file
>>> hf jobs uv run --env-file .env python -c 'import os; print(os.environ["FOO"], os.environ["BAR"])'
```

```bash
# Pass secrets - they will be encrypted server side
>>> hf jobs uv run -s MY_SECRET=psswrd python -c 'import os; print(os.environ["MY_SECRET"])'
```

```bash
# Pass secrets from a local .env.secrets file - they will be encrypted server side
>>> hf jobs uv run --secrets-file .env.secrets python -c 'import os; print(os.environ["MY_SECRET"])'
```

> [!TIP]
> Use `--secrets HF_TOKEN` to pass your local Hugging Face token implicitly.
> With this syntax, the secret is retrieved from the environment variable.
> For `HF_TOKEN`, it may read the token file located in the Hugging Face home folder if the environment variable is unset.

## Hardware flavor

Run jobs on GPUs or TPUs with the `flavor` argument. For example, to run a PyTorch job on an A10G GPU:

```bash
>>> hf jobs uv run --with torch --flavor a10g-small python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output:

```
This code ran with the following GPU: NVIDIA A10G
```

Here is another example that runs a fine-tuning script like [trl/scripts/sft.py](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py):

```bash
>>> hf jobs uv run --with trl --flavor a10g-small -s HF_TOKEN -- sft.py --model_name_or_path Qwen/Qwen2-0.5B ...
```

> [!TIP]
> For comprehensive guidance on running model training jobs with TRL on Hugging Face infrastructure, check out the [TRL Jobs Training documentation](https://huggingface.co/docs/trl/main/en/jobs_training). It covers fine-tuning recipes, hardware selection, and best practices for training models efficiently.

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 12/2025 from Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

## Timeout

Jobs have a default timeout (30 minutes), after which they will automatically stop. This is important to know when running long-running tasks like model training.

You can specify a custom timeout value using the `--timeout` parameter when running a job. The timeout can be specified in two ways:

1. **As a number** (interpreted as seconds):

Use `--timeout` and pass the number in seconds (here 2 hours = 7200 seconds):

```bash
>>> hf jobs uv run --timeout 7200 --with torch --flavor a10g-large train.py
```

2. **As a string with time units**:

Or pass `--timeout` with a value in different time units:

```bash
>>> hf jobs uv run --timeout 2h --with torch --flavor a10g-large train.py
```

Other examples:

```bash
--timeout 30m # 30 minutes
--timeout 1.5h # 1.5 hours
--timeout 1d # 1 day
--timeout 3600s # 3600 seconds
```

Supported time units:
- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days

> [!WARNING]
> If you don't specify a timeout, a default timeout will be applied to your job. For long-running tasks like model training that may take hours, make sure to set an appropriate timeout to avoid unexpected job terminations.

## Namespace

Run Jobs under your organization account using the `--namespace` argument. Make sure you are logged in with a token that has permission to start and manage Jobs under your organization account.

```bash
>>> hf jobs uv run --namespace my-org-name python -c "print('Running in an org account')"
```

Note that you can pass a token with the right permission manually:

```bash
>>> hf jobs uv run --namespace my-org-name --token hf_xxx python -c "print('Running in an org account')"
```
1 change: 1 addition & 0 deletions docs/jobs/data1.md
@@ -0,0 +1 @@
🚧 this section is under construction 🚧
28 changes: 28 additions & 0 deletions docs/jobs/frameworks.md
@@ -0,0 +1,28 @@
# Framework Setups

Here is a list of frameworks that provide ready-to-use Docker images with UV that you can use in Jobs.

These Docker images already have UV installed. You can also use another image that doesn't ship with UV, but you'll need to make sure UV is installed first. This works well in many cases; for LLM inference libraries with quite specific requirements, though, it can be useful to use a dedicated image that has the library preinstalled.

## vLLM

vLLM is a widely used inference engine, known for its ability to scale LLM inference.
The project provides the `vllm/vllm-openai` Docker image with vLLM and UV ready to use. This image is ideal for running batch inference.

Use the `--image` argument to use this Docker image:

```bash
>>> hf jobs uv run --image vllm/vllm-openai --flavor l4x4 generate-responses.py
```

You can find more information on vLLM batch inference on Jobs in [Daniel Van Strien's blog post](https://danielvanstrien.xyz/posts/2025/hf-jobs/vllm-batch-inference.html).

## TRL

TRL is a library designed for post-training models using techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). An up-to-date Docker image with UV and all TRL dependencies is available at `huggingface/trl` and can be used directly with Hugging Face Jobs.

Use the `--image` argument to use this Docker image:

```bash
>>> hf jobs uv run --image huggingface/trl --flavor a100-large -s HF_TOKEN train.py
```