Tutorial-style blog posts #32

@ben-z

This ticket is for creating tutorial blog posts. These posts are aimed at WATcloud users, providing practical examples of what can be done with the cluster. Prospective WATcloud members can also write a blog post as their onboarding project.

  • vLLM: PR, Post
  • Bagel (can be similar to https://www.digitalocean.com/community/tutorials/bagel-vlm-gpu-droplet)
  • Multi-node model training (can be similar to https://www.digitalocean.com/community/tutorials/multi-node-llm-training-at-scale)
  • Stable Diffusion web UI, and outpainting using the Stable Diffusion API.
  • Using VS Code with Slurm. There are two major approaches: an sshd server and a VS Code tunnel.
    • my sshd server command (works with any editor that supports SSH; requires setting up a host key and an SSH config entry): ssh wato-login1 '__num=1 && __jobid=$(sbatch --cpus-per-task 4 --mem 8G --gres tmpdisk:20480 --nodelist trpro-slurm2 --time 6:00:00 --job-name sshd1 --wrap "slurm-start-dockerd.sh && /usr/sbin/sshd -D -e -f /dev/null -h \"$HOME\"/.ssh/slurm-sshd-host-key -p $(($(id -u)*20+${__num})) -o SetEnv=XDG_RUNTIME_DIR=/tmp/run -o PidFile=/tmp/sshd.pid" | awk '\''{print $4}'\''); echo submitted job $__jobid; sleep 3; tail -n+1 -f slurm-$__jobid.out'
    • my VS Code tunnel command (the editor has to implement tunnel support; works with VS Code and Cursor, not Windsurf):
      • vscode: ssh wato-login1 '__jobid=$(sbatch --cpus-per-task 4 --mem 8G --gres tmpdisk:20480 --nodelist trpro-slurm1 --time 1:00:00 --wrap="cd /tmp && eval "'\''$(ssh-agent)'\''" && slurm-start-dockerd.sh && curl --silent --location --user-agent '\''Mobile Safari/537.36'\'' '\''https://code.visualstudio.com/sha/download?build=stable&os=cli-alpine-x64'\'' | tar -xzv && VSCODE_CLI_DISABLE_KEYCHAIN_ENCRYPT=true ./code tunnel --name slurm" | awk '\''{print $4}'\''); echo submitted job $__jobid; sleep 3; tail -n+1 -f slurm-$__jobid.out'
      • cursor: ssh wato-login1 '__jobid=$(sbatch --cpus-per-task 4 --mem 8G --gres tmpdisk:20480 --nodelist trpro-slurm1 --time 1:00:00 --wrap="cd /tmp && eval "'\''$(ssh-agent)'\''" && slurm-start-dockerd.sh && curl --silent --location --user-agent '\''Mobile Safari/537.36'\'' '\''https://api2.cursor.sh/updates/download-latest?os=cli-alpine-x64'\'' | tar -xzv && VSCODE_CLI_DISABLE_KEYCHAIN_ENCRYPT=true ./cursor tunnel --name slurm" | awk '\''{print $4}'\''); echo submitted job $__jobid; sleep 3; tail -n+1 -f slurm-$__jobid.out'
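
For anyone turning the sshd approach into a post, here is a rough sketch of the two pieces of the one-liner that are easy to miss: the per-user port (UID * 20 + slot number) and pulling the job ID out of sbatch's "Submitted batch job <id>" line. The function names, the `slurm-sshd<N>` host alias, and the generated SSH config stanza are all assumptions for illustration, not something the cluster provides. The stanza also assumes the compute node is reachable by name through wato-login1. The computed port has to stay at or below 65535 to be a valid TCP port.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the Host alias below are hypothetical.
# The arithmetic and the awk field mirror the sshd one-liner above.

# Per-user sshd port: UID * 20 + slot number, giving each user 20 distinct ports.
slurm_sshd_port() {
  local uid="$1" num="$2"
  echo $(( uid * 20 + num ))
}

# sbatch prints "Submitted batch job <id>"; the job ID is the fourth field.
sbatch_job_id() {
  awk '{print $4}'
}

# Print an SSH config stanza for the sshd job (assumed layout, adjust as needed).
# One possible way to create the host key the job expects:
#   ssh-keygen -t ed25519 -N "" -f ~/.ssh/slurm-sshd-host-key
print_ssh_config() {
  local uid="$1" num="$2" user="$3"
  cat <<EOF
Host slurm-sshd${num}
    HostName trpro-slurm2
    Port $(slurm_sshd_port "$uid" "$num")
    User ${user}
    ProxyJump wato-login1
EOF
}

slurm_sshd_port 1500 1                              # prints 30001
echo "Submitted batch job 12345" | sbatch_job_id    # prints 12345
print_ssh_config 1500 1 alice
```

With a stanza like that appended to `~/.ssh/config`, an editor that speaks SSH could connect to the `slurm-sshd1` alias once the job is running. The UID 1500 and user `alice` are placeholders; on the cluster you would pass `$(id -u)` and your own username.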
