Click the Pod named `CLUSTERNAME-pod-0` to expand the Pod.
Add a `main.py` file to the Pod's main directory, then copy the following code into the `main.py` file:
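The original script is not reproduced in this excerpt. A minimal sketch of a rank-printing script consistent with the description below (it assumes `torchrun` supplies `LOCAL_RANK` and the rendezvous variables; the CPU/`gloo` fallback and defaults are added here only so the script can be smoke-tested on a single process) might look like:

```python
import os

import torch
import torch.distributed as dist


def main():
    # LOCAL_RANK is set per process by the launcher; the global rank
    # comes from the process group once it is initialized.
    local_rank = int(os.environ["LOCAL_RANK"])
    global_rank = dist.get_rank()
    print(f"Global rank {global_rank}, local rank {local_rank}")
    # Your own training code would go here.


if __name__ == "__main__":
    # Defaults allow a single-process CPU smoke test; under a real
    # launcher these variables are already set, so defaults are ignored.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")

    # Use NCCL when GPUs are present, otherwise fall back to gloo.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    if torch.cuda.is_available():
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    main()
    dist.destroy_process_group()
```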
main()
function prints the local and global rank for each GPU process (this is also where you can add your own code). LOCAL_RANK
is assigned dynamically to each process by PyTorch. All other environment variables are set automatically by Runpod during deployment.
Run `main.py` on each Pod with `torchrun`, which launches eight processes per node (one per GPU in the Pod).
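The exact launch command is not shown in this excerpt. A hypothetical `torchrun` invocation for this two-Pod, eight-GPU-per-Pod layout could look like the following, where `$NODE_RANK`, `$MASTER_ADDR`, and `$MASTER_PORT` are placeholders for values specific to your cluster, not names this guide defines:

```shell
# Illustrative only -- flags and variable names depend on your setup.
torchrun \
  --nproc_per_node=8 \
  --nnodes=2 \
  --node_rank=$NODE_RANK \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  main.py
```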
After running the command on the last Pod, you should see output similar to this:
The first number on each line is the global rank, which spans from 0 to `WORLD_SIZE - 1` (`WORLD_SIZE` is the total number of GPUs in the cluster). In our example there are two Pods of eight GPUs each, so the global rank spans 0-15. The second number is the local rank, which defines the order of GPUs within a single Pod (0-7 in this example).
The specific number and order of ranks may be different in your terminal, and the global ranks listed will be different for each Pod.
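The relationship between the two numbers can be checked with a little arithmetic (`node_rank` and `gpus_per_node` here are illustrative names, not variables set by the platform):

```python
def to_global_rank(node_rank: int, local_rank: int, gpus_per_node: int = 8) -> int:
    """Global rank = node index * GPUs per node + local rank."""
    return node_rank * gpus_per_node + local_rank


# Two Pods of eight GPUs each: global ranks run from 0 to 15.
assert to_global_rank(0, 0) == 0    # first GPU on the first Pod
assert to_global_rank(1, 7) == 15   # last GPU on the second Pod
```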
This diagram illustrates how local and global ranks are distributed across multiple Pods: