add quickstart

This commit is contained in:
晓雷 2025-07-21 18:48:10 +08:00
parent 4804b05637
commit 92495df9e3
4 changed files with 261 additions and 111 deletions


@@ -7,9 +7,13 @@ parts:
- caption: Tutorial
chapters:
- file: tutorial/installation
- file: tutorial/quickstart_arealite
- file: tutorial/quickstart
- file: tutorial/eval
- file: tutorial/troubleshooting
- caption: Getting Started with AReaLite
chapters:
- file: arealite/gsm8k_grpo
- caption: References
chapters:
- file: references/benchmark

132
docs/arealite/gsm8k_grpo.md Normal file

@@ -0,0 +1,132 @@
# Code Walkthrough: Running GRPO on GSM8K Dataset
In this guide, we walk through the code of an example that runs the GRPO algorithm on the GSM8K dataset, with training script [examples/arealite/gsm8k_grpo.py](../../examples/arealite/gsm8k_grpo.py) and configuration file [examples/arealite/configs/gsm8k_grpo.yaml](../../examples/arealite/configs/gsm8k_grpo.yaml).
## Launching the Experiment
As shown in the [Quickstart Guide](../tutorial/quickstart_arealite.md), an AReaLite experiment is launched by a standalone launcher with one of the following commands:
```bash
# Local Launcher
python -m arealite.launcher.local <training script> --config <configuration file> <cli args>
# Ray Launcher
python -m arealite.launcher.ray <training script> --config <configuration file> <cli args>
# Slurm Launcher
python -m arealite.launcher.slurm <training script> --config <configuration file> <cli args>
```
In AReaLite, the **training script** is an **SPMD Python script** that serves as the entry point of the experiment.
The launcher runs the training script with its distributed backend (`subprocess` for `LocalLauncher`, `ray.remote` for `RayLauncher`, `srun` for `SlurmLauncher`).
In addition to the training script, the launcher is also responsible for running inference servers (currently only `SGLangServer` is supported).
The distributed launchers (`RayLauncher` and `SlurmLauncher`) run inference servers through a wrapper, [arealite/launcher/sglang_server.py](../../arealite/launcher/sglang_server.py), which manages addresses and ports in distributed settings.
The **configuration file** is a YAML file that sets the options defined in [arealite/api/cli_args.py](../../arealite/api/cli_args.py).
Options can be overridden with CLI arguments such as `actor.path=Qwen/Qwen3-1.7B` and `+sglang.attention_backend=triton`.
The training script uses [load_expr_config(args, config_cls)](../../arealite/api/cli_args.py#L886) to parse the YAML file and CLI overrides into the config class defined in [arealite/api/cli_args.py](../../arealite/api/cli_args.py).
In the example:
```python
config, _ = load_expr_config(args, GRPOConfig)
config: GRPOConfig
```
## Loading and Pre-processing Dataset
In our example, we directly use tools from the `datasets` and `torchdata` packages to load and pre-process the dataset into our dataloader.
We first download `openai/gsm8k` from the Hugging Face Hub, split it by data-parallel rank, and then map it into the format we want.
```python
from datasets import Dataset, load_dataset
from datasets.distributed import split_dataset_by_node


def process_gsm8k_rl_dataset(dataset: Dataset):
    # Convert each sample into the chat-message format consumed by the rollout workflow.
    def process(sample):
        messages = [{"role": "user", "content": sample["question"]}]
        return {"messages": messages}

    dataset = dataset.map(process).remove_columns(["question"])
    return dataset


def get_gsm8k_dataset(split, rank, world_size):
    dataset = load_dataset(path="openai/gsm8k", name="main", split=split)
    # Shard the dataset across data-parallel ranks.
    dataset = split_dataset_by_node(dataset, rank=rank, world_size=world_size)
    return process_gsm8k_rl_dataset(dataset)
```
Then we prepare the training and evaluation dataloaders with `torchdata.StatefulDataLoader`; they will serve as inputs for data rollout.
```python
train_dataloader = torchdata.StatefulDataLoader(
    get_gsm8k_dataset("train", rank, world_size),
    batch_size=config.train_dataset.batch_size // world_size,
    shuffle=config.train_dataset.shuffle,
    num_workers=config.train_dataset.num_workers,
    collate_fn=lambda x: x,
    drop_last=config.train_dataset.drop_last,
)
valid_dataloader = ...
```
If you wish to use your own Hugging Face datasets or datasets from local storage, please refer to [Customization: Dataset](../customization/dataset.md) for further details.
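For instance, here is a minimal sketch (not from the example code; the file path and the `prompt` column name are hypothetical) of a loader for a local JSONL dataset that keeps the same interface as `get_gsm8k_dataset` above:
```python
from datasets import load_dataset
from datasets.distributed import split_dataset_by_node


def get_my_dataset(split, rank, world_size):
    # Hypothetical local JSONL files, e.g. /path/to/my_train.jsonl and /path/to/my_test.jsonl.
    dataset = load_dataset("json", data_files=f"/path/to/my_{split}.jsonl", split="train")
    dataset = split_dataset_by_node(dataset, rank=rank, world_size=world_size)
    # Map your own prompt column into the "messages" format expected by the RLVR workflow.
    return dataset.map(
        lambda sample: {"messages": [{"role": "user", "content": sample["prompt"]}]}
    )
```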
## Rollout
Next, we prepare for data rollout. The life cycle of a piece of data is controlled by an `RLVRWorkflow`, which defines how a prompt is turned into complete rollout data with the fields required for training. Note that a workflow can involve multiple turns of generation, tool calling, and reward computation. In this example, we show a single-turn RLVR workflow with a math reward function.
First, we define a math reward function for GSM8K.
```python
from ... import extract_answer, extract_solution


def gsm8k_reward_fn(prompt, completions, prompt_ids, completion_ids, answer, **kwargs):
    # Return 1 if the extracted final answer matches the reference solution, otherwise 0.
    sol = extract_answer(completions, data_name="math")
    ans = extract_solution(solution_str=answer, method="strict")
    if sol is None:
        return 0
    if ans is None:
        return 0
    return int(sol.strip() == ans.strip())
```
Then we initialize the `RLVRWorkflow` with this reward function.
```python
tokenizer = load_hf_tokenizer(config.tokenizer_path)
workflow = RLVRWorkflow(
    reward_fn=gsm8k_reward_fn,
    gconfig=config.gconfig,
    tokenizer=tokenizer,
)
```
As for generation, we assume that the launcher has already started `SGLangServer` instances and set the environment variable `AREAL_LLM_SERVER_ADDRS` to tell the training script the addresses and ports of the inference servers to connect to.
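As a rough illustration (assuming the variable holds a comma-separated list of `host:port` entries; the exact format is defined by the launcher, so treat this as a sketch rather than the engine's actual parsing code), the training side can read it like this:
```python
import os

# Read the inference server addresses exported by the launcher (format assumed here).
addrs = [addr for addr in os.environ.get("AREAL_LLM_SERVER_ADDRS", "").split(",") if addr]
print(f"Connecting to {len(addrs)} inference server(s): {addrs}")
```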
In the next step, we initialize a `RemoteSGLangEngine` in the training script. Its APIs fall into two categories:
- Sending requests, such as generation and weight updates, to remote inference servers and returning the replies. Related APIs include `agenerate` and `update_weights`.
- Executing rollout workflows. The engine manages the data streaming through a workflow to control the parameter version gap between generation and training (data off-policyness), and collates completed rollouts into a batched training sample. Related APIs include `prepare_batch` and `rollout_batch`.
```python
rollout = RemoteSGLangEngine(config.rollout)
rollout.initialize()
eval_rollout = ...

data_generator = iter(train_dataloader)
for global_step in range(max_steps):
    # rollout batched training data for current step
    if config.async_training:
        batch = rollout.prepare_batch(train_dataloader, workflow=workflow)
    else:
        try:
            data = next(data_generator)
        except StopIteration:
            data_generator = iter(train_dataloader)
            data = next(data_generator)
        batch = rollout.rollout_batch(data, workflow=workflow)
```
If you want to customize your own rollout workflow with custom reward functions or agentic tool calling, please refer to [Customization: Rollout Workflows](agent.md).
## Training
## Utilities

docs/tutorial/quickstart.md

@@ -1,35 +1,10 @@
# Quickstart
# Quickstart (Legacy)
This guide walks you through a simple example of training an LLM to solve math problems.
Please ensure you have properly
[installed dependencies and set up the runtime environment](installation.md) before
proceeding.
> **Note**: This is a quickstart guide for launching AReaL experiments with the legacy code in [realhf/](../../realhf/). We strongly recommend users try AReaLite for a better experience. [Click here](quickstart_arealite.md) for the AReaLite quickstart guide!
## Option 1: Using *AReaLite*
This guide walks you through a simple example of training an LLM to solve math problems. Please ensure you have properly [installed dependencies and set up the runtime environment](installation.md) before proceeding.
AReaLite is an RL training framework that provides the same functionality as AReaL, but
is much easier to use, customize, and understand. It does not depend on AReaL except for
some common core utilities such as logging.
We provide usage examples in the `examples/arealite` folder. To launch an experiment
that trains your LLM to solve GSM8k math problems, run the following command:
```bash
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py --config examples/arealite/configs/gsm8k_grpo.yaml
```
You can modify any options in `examples/arealite/configs/gsm8k_grpo.yaml`, such as the
base model to use and hyperparameters. Note that this example does not support changing
the dataset through configuration modifications. Users can modify the dataset processing
logic using the HuggingFace `datasets` package in the training script
`examples/arealite/gsm8k_grpo.py` to use other datasets.
> **Note**: This command assumes you can connect to the HuggingFace Hub to download
> models and datasets. Use [hf-mirror](https://hf-mirror.com/) if necessary.
## Option 2: Using the old version of AReaL
### Dataset
## Dataset
Use `huggingface-cli` to download our open-source dataset:
@@ -37,24 +12,20 @@ Use `huggingface-cli` to download our open-source dataset:
huggingface-cli download --repo-type=dataset inclusionAI/AReaL-RL-Data
```
> **Note**: The command above will display the path of the downloaded dataset. You'll
> need to pass this path to the training command.
> **Note**: The command above will display the path of the downloaded dataset. You'll need to pass this path to the training command.
### Model
## Model
We train using open-source models available on Hugging Face Hub. You can either download
the model in advance or use the model identifier when running the experiment.
We train using open-source models available on Hugging Face Hub. You can either download the model in advance or use the model identifier when running the experiment.
```bash
# If you want to download it in advance
huggingface-cli download Qwen/Qwen3-1.7B
```
Refer to the
[official documentation](https://huggingface.co/docs/huggingface_hub/guides/cli) for
more information on using `huggingface-cli`.
Refer to the [official documentation](https://huggingface.co/docs/huggingface_hub/guides/cli) for more information on using `huggingface-cli`.
### Training
## Training
From the repository directory, run:
@@ -79,13 +50,11 @@ python3 training/main_async_ppo.py \
max_head_offpolicyness=4
```
::::{important} Running `main_async_ppo.py` with `ppo.recompute_logprob=False`,
`ppo.use_decoupled_loss=False`, and `max_head_offpolicyness=0` will essentially
replicate the behavior of synchronous PPO. Therefore, it's usually not recommended to
run synchronous PPO directly (i.e., `main_sync_ppo.py`). The workflow of asynchronous RL
is more stable and easier to customize. ::::
::::{important}
Running `main_async_ppo.py` with `ppo.recompute_logprob=False`, `ppo.use_decoupled_loss=False`, and `max_head_offpolicyness=0` will essentially replicate the behavior of synchronous PPO. Therefore, it's usually not recommended to run synchronous PPO directly (i.e., `main_sync_ppo.py`). The workflow of asynchronous RL is more stable and easier to customize.
::::
### Command Line Options
## Command Line Options
To view all available options:
@@ -93,97 +62,62 @@ To view all available options:
python3 training/main_sync_ppo.py --help
```
#### Configuration Parameters
### Configuration Parameters
- **`experiment_name`**: The name of your project.
- **`trial_name`**: The name of this trial in your project.
- **`{actor|ref}.path`**: The path to the model files.
- **`dataset.path`**: The path to the dataset JSONL file.
- **`cluster.fileroot`**: The root path for saving training outputs (logs and
checkpoints).
- **`cluster.fileroot`**: The root path for saving training outputs (logs and checkpoints).
- **`n_nodes`**: The number of nodes in the cluster.
- **`n_gpus_per_node`**: The number of GPUs per node.
- **`allocation_mode`**: The GPU allocation strategy and 3D parallelism configuration
for the experiment. Format:
- `sglang.d${DP1}m${TP1}p${PP1}+d${DP2}m${TP2}p${PP2}`: Configures parallel strategies
for SGLang generation and training respectively. Generation and training use
separate GPU sets, and the total GPU count must equal: DP1×TP1×PP1 + DP2×TP2×PP2 =
#GPUs.
- **`allocation_mode`**: The GPU allocation strategy and 3D parallelism configuration for the experiment. Format:
- `sglang.d${DP1}m${TP1}p${PP1}+d${DP2}m${TP2}p${PP2}`: Configures parallel strategies for SGLang generation and training respectively. Generation and training use separate GPU sets, and the total GPU count must equal: DP1×TP1×PP1 + DP2×TP2×PP2 = #GPUs.
#### Training Control
### Training Control
- **`exp_ctrl.total_train_epochs`**: Number of training epochs (complete dataset
iterations).
- **`exp_ctrl.save_freq_{epochs|steps|secs}`**: Frequency for saving model parameters to
persistent storage. Set to null to disable saving.
- **`exp_ctrl.ckpt_freq_{epochs|steps|secs}`**: Frequency for saving temporary
parameters for restart capability.
- **`dataset.train_bs_n_seqs`**: Training batch size (number of prompts sampled per
training iteration).
- **`exp_ctrl.total_train_epochs`**: Number of training epochs (complete dataset iterations).
- **`exp_ctrl.save_freq_{epochs|steps|secs}`**: Frequency for saving model parameters to persistent storage. Set to null to disable saving.
- **`exp_ctrl.ckpt_freq_{epochs|steps|secs}`**: Frequency for saving temporary parameters for restart capability.
- **`dataset.train_bs_n_seqs`**: Training batch size (number of prompts sampled per training iteration).
- **`group_size`**: Number of responses sampled per prompt.
#### Memory and Performance
### Memory and Performance
- **`{actor_train|ref_inf|actor_inf}.mb_spec.max_tokens_per_mb`**: Maximum tokens per
mini-batch for forward/backward passes during reference model inference and actor
model training. Reduce this value to avoid OOM errors.
- **`max_concurrent_rollouts`**: The maximum number of concurrent rollouts. SGLang will
run out of memory if this value is too large. Defaults to `dataset.train_bs_n_seqs`.
- **`{actor_train|ref_inf|actor_inf}.mb_spec.max_tokens_per_mb`**: Maximum tokens per mini-batch for forward/backward passes during reference model inference and actor model training. Reduce this value to avoid OOM errors.
- **`max_concurrent_rollouts`**: The maximum number of concurrent rollouts. SGLang will run out of memory if this value is too large. Defaults to `dataset.train_bs_n_seqs`.
#### Algorithm Configuration
### Algorithm Configuration
- **`max_head_offpolicyness`**: The allowed maximum data staleness. 0 recovers
synchronous training. A large value will increase generation throughput but degrade
final performance. We recommend keeping this value at 8 or below.
- **`ppo.recompute_logprob`**: Whether to compute proximal log probabilities for
training. Defaults to True for asynchronous experiments and False for synchronous
baselines.
- **`ppo.use_decoupled_loss`**: Use decoupled loss to stabilize asynchronous training.
Defaults to True.
- **`max_head_offpolicyness`**: The allowed maximum data staleness. 0 recovers synchronous training. A large value will increase generation throughput but degrade final performance. We recommend keeping this value at 8 or below.
- **`ppo.recompute_logprob`**: Whether to compute proximal log probabilities for training. Defaults to True for asynchronous experiments and False for synchronous baselines.
- **`ppo.use_decoupled_loss`**: Use decoupled loss to stabilize asynchronous training. Defaults to True.
- **`ppo.gen.max_new_tokens`**: Maximum tokens to generate per prompt.
- **`ppo.ppo_n_minibatches`**: Number of mini-batches for dividing data during each PPO
update.
- **`success_rate_ub`**: Upper bound of success rate. Prompts with a higher success rate
will be filtered out.
- **`success_rate_lb`**: Lower bound of success rate. Prompts with a lower success rate
will be filtered out.
- **`ppo.ppo_n_minibatches`**: Number of mini-batches for dividing data during each PPO update.
- **`success_rate_ub`**: Upper bound of success rate. Prompts with a higher success rate will be filtered out.
- **`success_rate_lb`**: Lower bound of success rate. Prompts with a lower success rate will be filtered out.
### Monitoring the Training Process
## Monitoring the Training Process
+ We recommend using [Weights & Biases (wandb)](https://github.com/wandb/wandb) or [SwanLab](https://github.com/SwanHubX/SwanLab) for monitoring—run `wandb login` or `swanlab login`, or set the corresponding environment variable API key (`WANDB_API_KEY` or `SWANLAB_API_KEY`). Set `wandb.mode="online"` or `swanlab.mode="cloud"` in your configuration to upload training statistics. If you cannot connect to the server, you can also use `wandb.mode="offline"` or `swanlab.mode="local"` to save data locally without uploading.
- We recommend using [Weights & Biases (wandb)](https://github.com/wandb/wandb) or
[SwanLab](https://github.com/SwanHubX/SwanLab) for monitoring—run `wandb login` or
`swanlab login`, or set the corresponding environment variable API key
(`WANDB_API_KEY` or `SWANLAB_API_KEY`). Set `wandb.mode="online"` or
`swanlab.mode="cloud"` in your configuration to upload training statistics. If you
cannot connect to the server, you can also use `wandb.mode="offline"` or
`swanlab.mode="local"` to save data locally without uploading.
You can also use TensorBoard by setting the `tensorboard.path` parameter.
The main log will be saved to
`${fileroot}/logs/${USER}/${experiment_name}/${trial_name}/main.log` and contains the
statistics uploaded to wandb.
The main log will be saved to `${fileroot}/logs/${USER}/${experiment_name}/${trial_name}/main.log` and contains the statistics uploaded to wandb.
If SwanLab is enabled, logs will be saved to the directory specified by
`swanlab.logdir`.
If SwanLab is enabled, logs will be saved to the directory specified by `swanlab.logdir`.
#### Key Training Statistics
### Key Training Statistics
- **`Epoch 1/5`**: Indicates the total epochs required and the current epoch being
trained.
- **`step 6/19`**: Shows that the current epoch has 19 steps, with the 6th step just
completed.
- **`Epoch 1/5`**: Indicates the total epochs required and the current epoch being trained.
- **`step 6/19`**: Shows that the current epoch has 19 steps, with the 6th step just completed.
- **`global step 6`**: Step count across all epochs.
- **`ppo_actor/task_reward/avg`**: Average reward value of all sampled responses in this
step. This should steadily increase during training and eventually stabilize.
- **`ppo_actor/importance_weight/avg`**: Average importance sampling ratio across all
tokens in the PPO loss. This is typically close to 1.0.
- **`ppo_actor/actor_clip_ratio/avg`**: Ratio of clipped tokens in PPO loss to total
tokens. This is usually less than 0.1.
- **`ppo_actor/actor_loss/avg`**: PPO loss value. **This does not show clear trends
during training** and should not be used as a performance indicator.
- **`ppo_actor/task_reward/avg`**: Average reward value of all sampled responses in this step. This should steadily increase during training and eventually stabilize.
- **`ppo_actor/importance_weight/avg`**: Average importance sampling ratio across all tokens in the PPO loss. This is typically close to 1.0.
- **`ppo_actor/actor_clip_ratio/avg`**: Ratio of clipped tokens in PPO loss to total tokens. This is usually less than 0.1.
- **`ppo_actor/actor_loss/avg`**: PPO loss value. **This does not show clear trends during training** and should not be used as a performance indicator.
## Next Steps
[Evaluate your model](eval.md) or check the
[troubleshooting section](troubleshooting.md) if you encounter any issues.
[Evaluate your model](eval.md) or check the [troubleshooting section](troubleshooting.md) if you encounter any issues.

docs/tutorial/quickstart_arealite.md Normal file

@@ -0,0 +1,80 @@
# Quickstart
Welcome to the AReaLite quickstart guide!
In this guide, we provide an example that runs an AReaLite experiment training an LLM on the GSM8K dataset with the GRPO algorithm and function-based rewards.
Please ensure you have properly [installed dependencies and set up the runtime environment](installation.md) before proceeding.
## Running the Experiment (on a single node)
To run the experiment, you need a training script and a config YAML file.
- Training script: [examples/arealite/gsm8k_grpo.py](../../examples/arealite/gsm8k_grpo.py)
- Config YAML: [examples/arealite/configs/gsm8k_grpo.yaml](../../examples/arealite/configs/gsm8k_grpo.yaml)
Our training script will automatically download the dataset (`openai/gsm8k`) and the model (`Qwen/Qwen3-1.7B`) for you.
You do not need to prepare any dataset or model files before running the experiment.
To run the example with the default configuration, execute the following command from the repository directory:
```bash
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py --config examples/arealite/configs/gsm8k_grpo.yaml experiment_name=<your experiment name> trial_name=<your trial name>
```
> **Note**: The command above uses `LocalLauncher`, which only works on a single node (`cluster.n_nodes == 1`). For launching distributed experiments, please check out [Distributed Experiments with Ray or Slurm](#distributed-experiments-with-ray-or-slurm).
### Modifying configuration
All available options for experiment configuration are listed in [arealite/api/cli_args.py](https://github.com/inclusionAI/AReaL/blob/main/arealite/api/cli_args.py).
To change the experiment configuration, including models, resource allocation, and algorithm options, you can:
1. Directly modify the config YAML file at [examples/arealite/configs/gsm8k_grpo.yaml](../../examples/arealite/configs/gsm8k_grpo.yaml).
2. Add command-line options. For entries that exist in the config YAML, append the options directly to your command, for example `actor.path=Qwen/Qwen3-1.7B`. For options defined in `cli_args.py` but not present in the YAML, add them with a `+` prefix, for example `+sglang.attention_backend=triton`.
For example, here is the command to launch a customized configuration based on our GSM8K GRPO example:
```bash
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
--config examples/arealite/configs/gsm8k_grpo.yaml \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang.d2p1t1+d2p1t1 \
cluster.n_nodes=1 \
cluster.n_gpus_per_node=4 \
gconfig.max_new_tokens=2048 \
train_dataset.batch_size=1024 \
+sglang.attention_backend=triton
```
::::{important}
Since we are working on a refactor from legacy AReaL to AReaLite, some of the available options in AReaLite differ slightly from those in legacy AReaL. For convenience, we provide a **config converter** that transfers an old AReaL config into an AReaLite YAML file. [Click here](xxx) for the usage of the **config converter**.
::::
## Distributed Experiments with Ray or Slurm
AReaLite also provides standalone Ray and Slurm launchers for distributed experiments. Once you have properly set up your Ray or Slurm cluster, you can launch your experiment with `arealite.launcher.ray` or `arealite.launcher.slurm`, similar to the `LocalLauncher`:
```bash
# Launch with the Ray launcher: 4 nodes with 4 GPUs each (3 nodes for generation, 1 node for training).
python3 -m arealite.launcher.ray examples/arealite/gsm8k_grpo.py \
--config examples/arealite/configs/gsm8k_grpo.yaml \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang.d12p1t1+d4p1t1 \
cluster.n_nodes=4 \
cluster.n_gpus_per_node=4 \
...
# Launch with the Slurm launcher: 16 nodes with 8 GPUs each (12 nodes for generation, 4 nodes for training).
python3 -m arealite.launcher.slurm examples/arealite/gsm8k_grpo.py \
--config examples/arealite/configs/gsm8k_grpo.yaml \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang.d96p1t1+d32p1t1 \
cluster.n_nodes=16 \
cluster.n_gpus_per_node=8 \
...
```
[Click here](installation.md#optional-launch-ray-cluster-for-distributed-training) for a guide on how to set up a Ray cluster. For more launcher options, check `LauncherConfig` in [arealite/api/cli_args.py](../../arealite/api/cli_args.py).
> **Note**: Before launching distributed experiments, please check that your `allocation_mode` matches your cluster configuration. Make sure the number of GPUs allocated by `allocation_mode` equals `cluster.n_nodes * cluster.n_gpus_per_node`.
> **Note**: Ray and Slurm launchers only work for distributed experiments with more than one node (`cluster.n_nodes > 1`). They allocate GPUs for training and generation at the granularity of **nodes**, which means the numbers of GPUs allocated for generation and for training must each be an integer multiple of `cluster.n_gpus_per_node`.
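As a quick sanity check before launching, you can verify that the GPU count implied by `allocation_mode` adds up. The snippet below is an illustrative sketch (not part of AReaLite) that assumes the `sglang.d{DP}p{PP}t{TP}+d{DP}p{PP}t{TP}` form used in the examples above:
```python
import re


def gpus_in_allocation_mode(allocation_mode: str) -> int:
    """Sum DP * PP * TP over the generation and training parts of the string."""
    total = 0
    for part in allocation_mode.removeprefix("sglang.").split("+"):
        degrees = {key: int(value) for key, value in re.findall(r"([dpt])(\d+)", part)}
        total += degrees["d"] * degrees["p"] * degrees["t"]
    return total


# The Ray example above: 3 nodes for generation + 1 node for training, 4 GPUs per node.
assert gpus_in_allocation_mode("sglang.d12p1t1+d4p1t1") == 4 * 4
```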
## Next Steps
Check [Getting Started with AReaLite](../arealite/gsm8k_grpo.md) for a complete code walkthrough of the GSM8K GRPO example.