
# Quickstart

Welcome to the AReaLite Quickstart Guide! This guide demonstrates how to run an AReaLite experiment training an LLM on the GSM8K dataset using the GRPO algorithm with function-based rewards. Ensure you've completed the installation and environment setup before proceeding.
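To make "function-based rewards" concrete: for GSM8K, the reward is typically a plain Python function that checks whether the model's final numeric answer matches the reference. The sketch below is illustrative only; the function name and signature are assumptions, not AReaLite's actual interface (see `examples/arealite/gsm8k_grpo.py` for how the example defines its reward).

```python
# Illustrative sketch, not AReaLite's API: a function-based reward for GSM8K.
import re

def gsm8k_reward(completion: str, answer: str) -> float:
    """Return 1.0 if the completion's final number matches the reference answer."""
    # GSM8K reference answers end with "#### <number>".
    gold = answer.split("####")[-1].strip().replace(",", "")
    # Treat the last number in the completion as the model's final answer.
    numbers = re.findall(r"-?\d+\.?\d*", completion.replace(",", ""))
    if not numbers:
        return 0.0
    try:
        return 1.0 if float(numbers[-1]) == float(gold) else 0.0
    except ValueError:
        return 0.0

# Example: gsm8k_reward("She sold 48/2 = 24 clips. The answer is 24.", "... #### 24") -> 1.0
```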

## Running the Experiment (on a single node)

To run the experiment, you will need the dataset and the model; our training scripts download both automatically (`openai/gsm8k` and `Qwen/Qwen2-1.5B-Instruct`). To run the example with the default configuration, execute from the repository directory:

```bash
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
    --config examples/arealite/configs/gsm8k_grpo.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name>
```

::::{note}
The command above uses `LocalLauncher`, which only works on a single node (`cluster.n_nodes == 1`). For distributed experiments, see the section "Distributed Experiments with Ray or Slurm" below.
::::

## Modifying configuration

All available configuration options are listed in `arealite/api/cli_args.py`. To customize the experiment (models, resources, algorithm options), you can:

1. Edit the YAML file directly at `examples/arealite/configs/gsm8k_grpo.yaml` (see the illustrative excerpt below).
2. Add command-line options:
   - For options that already exist in the YAML file, set them directly: `actor.path=Qwen/Qwen3-1.7B`.
   - For options in `cli_args.py` that are not in the YAML file, prefix them with `+`: `+sglang.attention_backend=triton`.
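The dotted command-line keys map one-to-one onto nested YAML keys. For orientation, the options used in this guide would correspond to an excerpt like the following (an illustrative sketch assembled from the options shown here, not the shipped defaults; consult the actual file for authoritative values):

```yaml
# Illustrative excerpt only -- see examples/arealite/configs/gsm8k_grpo.yaml
# for the real file and arealite/api/cli_args.py for all available fields.
experiment_name: my-gsm8k-grpo
trial_name: trial0
allocation_mode: sglang.d2p1t1+d2p1t1
cluster:
  n_nodes: 1
  n_gpus_per_node: 4
actor:
  path: Qwen/Qwen2-1.5B-Instruct
gconfig:
  max_new_tokens: 2048
train_dataset:
  batch_size: 1024
```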

For example, here is a command that launches a customized configuration based on our GSM8K GRPO example:

```bash
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
    --config examples/arealite/configs/gsm8k_grpo.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    allocation_mode=sglang.d2p1t1+d2p1t1 \
    cluster.n_nodes=1 \
    cluster.n_gpus_per_node=4 \
    gconfig.max_new_tokens=2048 \
    train_dataset.batch_size=1024 \
    +sglang.attention_backend=triton
```

::::{important}
We are currently refactoring from legacy AReaL to AReaLite, which introduces some configuration differences. For convenience, we provide a config converter that transforms an old AReaL config into an AReaLite YAML file. Click here to learn how to use the config converter.
::::

## Distributed Experiments with Ray or Slurm

AReaLite provides standalone launchers for distributed experiments. After setting up your Ray or Slurm cluster, launch an experiment just as you would with `LocalLauncher`:

```bash
# Launch with the Ray launcher: 4 nodes (4 GPUs each), 3 nodes for generation, 1 node for training.
python3 -m arealite.launcher.ray examples/arealite/gsm8k_grpo.py \
    --config examples/arealite/configs/gsm8k_grpo.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    allocation_mode=sglang.d12p1t1+d4p1t1 \
    cluster.n_nodes=4 \
    cluster.n_gpus_per_node=4 \
    ...

# Launch with the Slurm launcher: 16 nodes (8 GPUs each), 12 nodes for generation, 4 nodes for training.
python3 -m arealite.launcher.slurm examples/arealite/gsm8k_grpo.py \
    --config examples/arealite/configs/gsm8k_grpo.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    allocation_mode=sglang.d96p1t1+d32p1t1 \
    cluster.n_nodes=16 \
    cluster.n_gpus_per_node=8 \
    ...
```


Important Notes:

1. Ensure `allocation_mode` matches your cluster configuration: the total number of GPUs must equal `cluster.n_nodes * cluster.n_gpus_per_node` (a sanity-check sketch follows this list).
2. The Ray and Slurm launchers only work with more than one node (`cluster.n_nodes > 1`). For single-node runs, use `LocalLauncher`.
3. The Ray and Slurm launchers allocate GPUs at node granularity, so the number of GPUs assigned to generation or training must be an integer multiple of `cluster.n_gpus_per_node`.
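The GPU arithmetic in notes 1 and 3 can be checked mechanically. The sketch below is a standalone sanity check, not part of AReaLite; it assumes, based on the examples above, that `allocation_mode` has the shape `sglang.d{D}p{P}t{T}+d{D}p{P}t{T}` (generation term first, then training) and that each term occupies `D * P * T` GPUs:

```python
# Standalone sanity check for allocation_mode vs. cluster size (assumed grammar).
import re

def term_gpus(term: str) -> int:
    """GPUs used by a term like 'd12p1t1': the product of its d/p/t factors."""
    d, p, t = map(int, re.fullmatch(r"(?:sglang\.)?d(\d+)p(\d+)t(\d+)", term).groups())
    return d * p * t

def check_allocation(allocation_mode: str, n_nodes: int, n_gpus_per_node: int) -> None:
    gen, train = (term_gpus(t) for t in allocation_mode.split("+"))
    total = n_nodes * n_gpus_per_node
    # Note 1: the total GPU count must match the cluster.
    assert gen + train == total, f"{gen} + {train} GPUs != {total} in the cluster"
    # Note 3: Ray/Slurm assign whole nodes to generation or training.
    assert gen % n_gpus_per_node == 0 and train % n_gpus_per_node == 0

check_allocation("sglang.d12p1t1+d4p1t1", n_nodes=4, n_gpus_per_node=4)    # 12 + 4 = 16
check_allocation("sglang.d96p1t1+d32p1t1", n_nodes=16, n_gpus_per_node=8)  # 96 + 32 = 128
```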

## Next Steps

Customization guides: