change name to AReaL-lite

This commit is contained in:
晓雷 2025-08-01 16:01:51 +08:00
parent 3a97f06be1
commit 04b26f42bb
14 changed files with 110 additions and 101 deletions

View File

@ -1,4 +1,4 @@
name: Test AReaLite
name: Test AReaL-lite
on:
pull_request:
@ -12,7 +12,7 @@ on:
jobs:
test-arealite:
environment:
name: AReaLite-unittests
name: AReaL-lite-unittests
runs-on: ubuntu-latest
concurrency:
group: test-arealite

View File

@ -20,12 +20,12 @@ like how you enjoy real-world milk tea (cheers).
**AReaL Highlights**
- ⚡ <span style="color: red; font-weight: bold;">**\[NEW\] AReaLite:**</span> Our new
release AReaLite is a **light-weight** and **AI-centric** codebase that prioritizes a
better development experience for AI researchers. As a result, AReaLite delivers most
AReaL functionalities while maintaining its high performance with far fewer lines of
code. This allows users to build their own **agentic** and **RLVR** training workflows
with minimal effort.
- ⚡ <span style="color: red; font-weight: bold;">**\[NEW\] AReaL-lite:**</span> Our new
release AReaL-lite is a **light-weight** and **AI-centric** codebase that prioritizes a
better development experience for AI researchers. As a result, AReaL-lite delivers most
AReaL functionalities while maintaining its high performance with far fewer lines of
code. This allows users to build their own **agentic** and **RLVR** training workflows
with minimal effort.
- 🔥 **Asynchronous RL**: With algorithm-system co-design, AReaL supports fully
asynchronous RL for **the fastest training speed**! Experimental support for
multi-turn agentic RL is also provided.
@ -38,14 +38,14 @@ like how you enjoy real-world milk tea (cheers).
## News
**\[2025/07/31\] (AReaLite)** We introduce AReaLite, a **light-weight** version of AReaL
designed specifically for AI researchers and rapid prototyping. AReaLite features an
**AI-centric** API design that prioritizes ease of use and algorithm development, while
inherently supporting fully asynchronous **agentic RL**. With 80% fewer lines of code,
AReaLite maintains 90% of AReaL's high performance and core functionality. Check out
[our AReaLite design doc](/arealite/README.md) and
**\[2025/07/31\] (AReaL-lite)** We introduce AReaL-lite, a **light-weight** version of
AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite
features an **AI-centric** API design that prioritizes ease of use and algorithm
development, while inherently supporting fully asynchronous **agentic RL**. With 80%
fewer lines of code, AReaL-lite maintains 90% of AReaL's high performance and core
functionality. Check out [our AReaL-lite design doc](/arealite/README.md) and
[the quickstart guide](/docs/tutorial/quickstart.md) to begin your journey with
**AReaLite**!
**AReaL-lite**!
**\[2025/06/03\] (v0.3, boba²)** We release **boba²** (double-boba) for fully
asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or
@ -62,20 +62,20 @@ SOTA 7B and 32B models on math reasoning. Check our
**\[2025/02/24\] (v0.1)** Our initial release includes reproducible results for 1.5B and
7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
## AReaLite Release Highlights
## AReaL-lite Release Highlights
New highlights in AReaLite:
New highlights in AReaL-lite:
- Instead of the *system-centric* architecture in old AReaL, AReaLite follows an
- Instead of the *system-centric* architecture in old AReaL, AReaL-lite follows an
**AI-centric** API design that aims to provide the following key features:
- **Light-weight** & **easy-to-write** algorithm and training workflow customization.
- **Easy to scale up** without knowing system and infrastructure details.
- **Adaptable and pluggable:** Smooth integration with other modern AI applications.
These features make AReaLite easy for AI researchers to adopt, understand, and develop
effectively and efficiently. To learn more about the design principles of AReaL,
please read the [AReaLite design doc](/arealite/README.md)!
These features make AReaL-lite easy for AI researchers to adopt, understand, and
develop effectively and efficiently. To learn more about the design principles of
AReaL, please read the [AReaL-lite design doc](/arealite/README.md)!
- A much more *light-weight* codebase compared to the old AReaL codebase, with only
**20%** of the lines of code and a detailed [code walkthrough](/docs/arealite/gsm8k_grpo.md) on an
@ -94,10 +94,10 @@ Good old stuff from AReaL:
- A single command line to launch an experiment, whether on a single node or a
large-scale distributed cluster.
Now, let us run an example experiment with AReaLite following the quickstart guide
Now, let us run an example experiment with AReaL-lite following the quickstart guide
below!
## Getting Started with AReaLite
## Getting Started with AReaL-lite
Our training scripts will automatically download the dataset (openai/gsm8k) and model
(Qwen/Qwen2-1.5B-Instruct). On a single node, run:
@ -125,12 +125,12 @@ python3 -m arealite.launcher.local examples/arealite/eval.py --config examples/a
```
-->
For a more detailed guide on how to run experiments in AReaLite, please check out
For a more detailed guide on how to run experiments in AReaL-lite, please check out
[our quickstart guide](/docs/tutorial/quickstart.md)!
## Switching from legacy AReaL to AReaLite
## Switching from legacy AReaL to AReaL-lite
We also provide a convenient script to convert your AReaL YAML config into an AReaLite
We also provide a convenient script to convert your AReaL YAML config into an AReaL-lite
config in a single command line. First, locate your AReaL config, either one modified
from the files in the `examples` folder or one generated in the
`<fileroot>/<expr_name>/<trial_name>` folder when you ran your experiments. Then run:
@ -139,17 +139,17 @@ from files from `examples` folder, or generated when you run your experiments in
python3 examples/arealite/convert_config.py -f <config_path> -o <output_path>
```
Then you should be able to run experiments with your old settings on AReaLite!
Then you should be able to run experiments with your old settings on AReaL-lite!
## AReaLite vs legacy AReaL
## AReaL-lite vs legacy AReaL
AReaLite is an initiative to fully refactor AReaL, addressing historical issues such as
redundant code and unnecessary system-level abstractions. Currently, AReaLite provides a
lightweight codebase that enables fast prototyping for new RL training workflows and
algorithms on a relatively small scale. For large-scale experiments (1K+ GPUs), we
recommend using the battle-tested legacy AReaL to ensure stability. In the future, we
will continue developing AReaLite by expanding its APIs, migrating legacy features,
introducing new functionality, and validating the system through large-scale
AReaL-lite is an initiative to fully refactor AReaL, addressing historical issues such
as redundant code and unnecessary system-level abstractions. Currently, AReaL-lite
provides a lightweight codebase that enables fast prototyping for new RL training
workflows and algorithms on a relatively small scale. For large-scale experiments (1K+
GPUs), we recommend using the battle-tested legacy AReaL to ensure stability. In the
future, we will continue developing AReaL-lite by expanding its APIs, migrating legacy
features, introducing new functionality, and validating the system through large-scale
experiments.
## Resources
@ -160,17 +160,17 @@ experiments.
### Quickstart
- [Installation](https://inclusionai.github.io/AReaL/tutorial/installation.html)
- [AReaLite Quickstart](/docs/tutorial/quickstart.md)
- [AReaL-lite Quickstart](/docs/tutorial/quickstart.md)
### Code Walkthrough
- [Running GRPO on GSM8K dataset with AReaLite](/docs/arealite/gsm8k_grpo.md)
- [Running GRPO on GSM8K dataset with AReaL-lite](/docs/arealite/gsm8k_grpo.md)
### Customization
- [Customize dataset with AReaLite](../customization/dataset.md)
- [Customize Agentic/RLVR rollout workflows with AReaLite](../customization/agent.md)
- [Customize algorithms with AReaLite](../customization/algorithm.md)
- [Customize dataset with AReaL-lite](../customization/dataset.md)
- [Customize Agentic/RLVR rollout workflows with AReaL-lite](../customization/agent.md)
- [Customize algorithms with AReaL-lite](../customization/algorithm.md)
### AReaL Legacy

View File

@ -1,9 +1,9 @@
# AReaLite Design Doc
# AReaL-lite Design Doc
## TL;DR
Follow our [step-by-step code walk-through](../docs/arealite/gsm8k_grpo.md) to
immediately get started with AReaLite!
immediately get started with AReaL-lite!
## Motivation
@ -27,7 +27,7 @@ Graph). To customize a training workflow, researchers first need to understand t
system-level concepts. Then they are forced to find the code to modify, which is
scattered throughout the codebase. It is also nearly impossible to leverage packages
like `datasets`, since they are not compatible with the workers. This gap is the core
motivation behind
AReaLite: rebuilding AReaL with an AI-centric architecture and APIs.
AReaL-lite: rebuilding AReaL with an AI-centric architecture and APIs.
Beyond architectural concerns, AReaL suffers from accumulated technical debt. The
codebase contains substantial legacy code inherited from previous projects that no
@ -40,13 +40,13 @@ possible to achieve comparable efficiency with significantly fewer lines of code
presents an ideal opportunity to redesign the API and distill the massive codebase into
something clean and maintainable. Rather than pursuing maximum efficiency, our goal is
to deliver 90% of AReaL's functionality while dramatically reducing code complexity and
user burden. This philosophy drives AReaLite — the lightweight version of AReaL.
user burden. This philosophy drives AReaL-lite — the lightweight version of AReaL.
AReaLite serves as the first phase in AReaL's broader refactoring initiative. It
AReaL-lite serves as the first phase in AReaL's broader refactoring initiative. It
functions both as a standalone training library with intuitive interfaces and as the
foundation for AReaL's future core API definitions. The plan is to transform AReaL's
current worker-based architecture into an AI-centric architecture similar to AReaLite,
where AReaL will **extend** AReaLite's APIs and implementations to support additional
current worker-based architecture into an AI-centric architecture similar to AReaL-lite,
where AReaL will **extend** AReaL-lite's APIs and implementations to support additional
backends for efficient large-scale training.
## Design Principles
@ -80,13 +80,13 @@ arealite/
### Component Overview
The AReaLite codebase is structured into four distinct layers: the API layer, backend
The AReaL-lite codebase is structured into four distinct layers: the API layer, backend
layer, customization layer, and entry point layer. As illustrated in the figure below,
workflow and algorithm customization logic resides in separate layers above the backend.
We prioritize keeping the entry point and customization layers clean and intuitive,
isolating them from the complex backend implementation. With AReaLite, users can define
their custom training workflows and algorithms entirely within a single entry point
file.
isolating them from the complex backend implementation. With AReaL-lite, users can
define their custom training workflows and algorithms entirely within a single entry
point file.
![arealite-layers](../assets/arealite_layers.png)

View File

@ -11,7 +11,7 @@ parts:
- file: tutorial/quickstart_legacy
- file: tutorial/eval
- file: tutorial/troubleshooting
- caption: Getting Started with AReaLite
- caption: Getting Started with AReaL-lite
chapters:
- file: arealite/gsm8k_grpo
- caption: References

View File

@ -1,17 +1,17 @@
# Running GRPO on GSM8K Dataset
This guide introduces how AReaLite runs the GRPO algorithm on the GSM8K dataset, using
This guide introduces how AReaL-lite runs the GRPO algorithm on the GSM8K dataset, using
the training script
[examples/arealite/gsm8k_grpo.py](../../examples/arealite/gsm8k_grpo.py) and
configuration file
[examples/arealite/configs/gsm8k_grpo.yaml](../../examples/arealite/configs/gsm8k_grpo.yaml).
## How AReaLite Works
## How AReaL-lite Works
The following figure illustrates the launch and one asynchronous training step of the
GRPO algorithm on the GSM8K dataset on AReaLite. Compared with the old AReaL
implementation, AReaLite runs inference servers and an SPMD training script instead of
a collection of different workers. In a training step, AReaLite:
GRPO algorithm on the GSM8K dataset on AReaL-lite. Compared with the old AReaL
implementation, AReaL-lite runs inference servers and an SPMD training script instead
of a collection of different workers. In a training step, AReaL-lite:
1. Submits prompts from the dataset to `RemoteSGLangEngine`, which runs `RLVRWorkflow` in
a streaming manner.
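Here is a condensed, hedged sketch of how one full step might look in code. Every method name except `set_version` (which appears verbatim later in this guide) is an illustrative assumption, not AReaL-lite's exact API:

```python
# A hedged sketch of one asynchronous training step. All method names other
# than set_version (quoted later in this guide) are illustrative assumptions.
def train_one_step(rollout, actor, prompts, workflow, global_step):
    # 1) Submit prompts; RemoteSGLangEngine streams RLVRWorkflow episodes back.
    batch = rollout.rollout_batch(prompts, workflow)
    # 2) Run one policy-gradient (GRPO) update on the collected batch.
    stats = actor.train_batch(batch)
    # 3) Push fresh weights to the inference servers and bump both versions.
    actor.upload_weights(None)  # signature simplified
    actor.set_version(global_step + 1)
    rollout.set_version(global_step + 1)
    return stats
```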
@ -29,7 +29,7 @@ show you how these steps are done in detail.
## Launching the Experiment
As shown in the [quickstart guide](../tutorial/quickstart.md), experiments in AReaLite
As shown in the [quickstart guide](../tutorial/quickstart.md), experiments in AReaL-lite
are launched using standalone launchers with the following commands:
```
@ -41,7 +41,7 @@ python -m arealite.launcher.ray <training script> --config <configuration file>
python -m arealite.launcher.slurm <training script> --config <configuration file> <cli args>
```
In AReaLite:
In AReaL-lite:
- The **training script** is an SPMD Python script that serves as the experiment entry
point.
@ -111,14 +111,14 @@ details.
### Inference Engine: `RemoteSGLangEngine`
In AReaLite, generation tasks are offloaded to remote inference servers, which operate
In AReaL-lite, generation tasks are offloaded to remote inference servers, which operate
on separate GPUs from those used for training. The `RemoteSGLangEngine` acts as a client
that interacts with the servers. It runs in an SPMD manner on every training process,
without occupying any GPUs.
`RemoteSGLangEngine` provides two core APIs that access the remote servers: `agenerate`
and `update_weights_async`. It is worth mentioning that, in asynchronous RL experiments
in AReaLite, an inference-side weight update can happen **in the middle of** generating
in AReaL-lite, an inference-side weight update can happen **in the middle of** generating
a response to one prompt. That is, a single output sequence can be generated by multiple
model versions. Let us look into the code of `agenerate` and `update_weights_async` for
a better understanding.
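Before looking at the real implementations, here is a hedged sketch of how the two calls can overlap. The method names come from this guide, but the request and metadata arguments are simplified assumptions:

```python
import asyncio

# Sketch only: `agenerate` and `update_weights_async` are the API names from
# this guide, but the argument shapes below are simplified assumptions.
async def generate_with_midstream_update(engine, request, update_meta):
    gen = asyncio.create_task(engine.agenerate(request))
    upd = asyncio.create_task(engine.update_weights_async(update_meta))
    # The weight update may land while generation is still streaming, so the
    # returned sequence can contain tokens sampled from two model versions.
    completion, _ = await asyncio.gather(gen, upd)
    return completion
```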
@ -470,7 +470,7 @@ actor.set_version(global_step + 1)
rollout.set_version(global_step + 1)
```
Now a complete GRPO training step in AReaLite is done! The core logic of our example
Now a complete GRPO training step in AReaL-lite is done! The core logic of our example
training script can be summarized as:
```python
@ -508,8 +508,8 @@ for global_step in range(max_steps):
## Utilities
In AReaLite, we provide a wide range of utilities covering the basic functionality
required to observe and tune your experiments.
In AReaL-lite, we provide a wide range of utilities covering the basic functionality
required to observe and tune your experiments.
### `Saver` and `Evaluator`
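As a purely illustrative sketch of how these two utilities might hook into the training loop, with all constructor and method signatures below being assumptions rather than the documented API:

```python
# Illustrative only: assumed signatures for checkpointing and evaluation
# utilities wired into the main loop.
def maybe_save_and_eval(saver, evaluator, actor, eval_fn, epoch, step, global_step):
    # Each utility decides internally whether its schedule is due.
    saver.save(actor, epoch, step, global_step)
    evaluator.evaluate(eval_fn, epoch, step, global_step)
```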

View File

@ -1,17 +1,17 @@
# Rollout and Agentic RL
This guide shows you how to create custom rollout behaviors for RL training by building
a multi-turn math agent with **AReaLite**. This agent keeps trying to solve math
a multi-turn math agent with **AReaL-lite**. This agent keeps trying to solve math
problems until it finds the correct answer.
You can find the complete implementation in `arealite/workflow/multi_turn.py`.
## Step 1: Define Your Workflow
AReaLite gives you flexibility in how you design your agents to run **an episode**. **An
episode** defines how your agent rolls out a complete training sample from an input
AReaL-lite gives you flexibility in how you design your agents to run **an episode**.
**An episode** defines how your agent rolls out a complete training sample from an input
prompt, using tools, reward functions, and (multi-turn) generation. Instead of rigid
`Agent` classes that might constrain your agent's capabilities, AReaLite captures all
`Agent` classes that might constrain your agent's capabilities, AReaL-lite captures all
rollout behavior in a `RolloutWorkflow` class. This approach allows you to customize
your agent's behavior however you need.
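As a taste of what this looks like, here is a condensed, hedged sketch of a multi-turn workflow. The import path and the `arun_episode` method name are assumptions for illustration; see `arealite/workflow/multi_turn.py` for the real implementation:

```python
from arealite.api.workflow_api import RolloutWorkflow  # import path assumed


class MultiTurnMathWorkflow(RolloutWorkflow):
    def __init__(self, reward_fn, gconfig, tokenizer, max_turns=4):
        self.reward_fn = reward_fn  # scores a candidate answer against the label
        self.gconfig = gconfig      # generation hyperparameters
        self.tokenizer = tokenizer
        self.max_turns = max_turns

    async def arun_episode(self, engine, data):  # method name assumed
        """Roll out one sample: retry until the answer is correct or turns run out."""
        prompt, completion = data["question"], ""
        for _ in range(self.max_turns):
            completion = await engine.agenerate(prompt)  # request details simplified
            if self.reward_fn(completion, data["answer"]) > 0:
                break  # solved; stop early
            prompt += completion + "\nYour answer is incorrect. Please try again.\n"
        return self.to_tensordict(prompt, completion)  # hypothetical packing helper
```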
@ -222,7 +222,7 @@ class MultiTurnWorkflow(RolloutWorkflow):
```
> **Important**: The returned `TensorDict` must follow HuggingFace's padded data format,
> where each tensor has shape `[batch_size, sequence_length, *]`. This allows AReaLite
> where each tensor has shape `[batch_size, sequence_length, *]`. This allows AReaL-lite
> to automatically batch multiple trajectories for training. Since this example returns
> a single trajectory, we use `unsqueeze(0)` to create a batch of size 1.
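For instance, a single trajectory might be packed like this minimal sketch, where the field names are illustrative rather than a required schema:

```python
import torch
from tensordict import TensorDict

# One trajectory of length 7: unsqueeze(0) gives each tensor a leading batch
# dimension of 1. Field names are illustrative, not a required schema.
seqlen = 7
trajectory = TensorDict(
    {
        "input_ids": torch.randint(0, 1000, (seqlen,)).unsqueeze(0),          # [1, 7]
        "attention_mask": torch.ones(seqlen, dtype=torch.long).unsqueeze(0),  # [1, 7]
        "rewards": torch.tensor([1.0]).unsqueeze(0),                          # [1, 1]
    },
    batch_size=[1],
)
```

Batches of such trajectories can then be concatenated along the first dimension for training.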

View File

@ -3,7 +3,7 @@
> **Note**: We recommend that you first read the
> [agent customization guide](agent.md).
**AReaLite** structures RL algorithms around two core components:
**AReaL-lite** structures RL algorithms around two core components:
- **RolloutWorkflow**: Defines what data to generate during rollouts
- **TrainEngine**: Defines how to process the generated data for training
@ -108,7 +108,7 @@ def reinforce_loss_fn(logits, data):
```
```{note}
To decrease memory usage, AReaLite automatically packs multiple sequences into a 1D tensor before forward passes. Hence, the loss function should expect 1D *packed* tensors instead of *padded* tensors.
To decrease memory usage, AReaL-lite automatically packs multiple sequences into a 1D tensor before forward passes. Hence, the loss function should expect 1D *packed* tensors instead of *padded* tensors.
```
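A minimal sketch of what a packed-tensor loss could look like, assuming, purely for illustration, that `data` carries the packed token ids and per-token advantages under these field names:

```python
import torch

# Sketch: all sequences are packed back to back, so logits is
# [total_tokens, vocab] and no padding mask is needed. Field names assumed.
def packed_pg_loss(logits, data):
    labels = data["input_ids"]                             # [total_tokens]
    logps = torch.log_softmax(logits.float(), dim=-1)
    token_logps = logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return -(token_logps * data["advantages"]).mean()      # policy gradient
```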
Next, we implement the training engine. We use a two-class design to maintain backend

View File

@ -1,6 +1,6 @@
# Dataset
**AReaLite** directly integrates with the `Dataset` class from the HuggingFace
**AReaL-lite** directly integrates with the `Dataset` class from the HuggingFace
`datasets` package. This gives you full flexibility to load, process, and filter your
data before training.
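For example, a GSM8K pipeline might look like this short sketch. The `openai/gsm8k` dataset and its `question`/`answer` fields are real; the filtering rule is just an illustration:

```python
from datasets import load_dataset

# Load GSM8K, drop overly long prompts, and map rows into prompt/answer pairs.
# The length cutoff is an arbitrary illustration, not an AReaL-lite default.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.filter(lambda x: len(x["question"]) < 512)
dataset = dataset.map(lambda x: {"prompt": x["question"], "answer": x["answer"]})
```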

View File

@ -1,6 +1,6 @@
# Rollout and Agentic RL (Legacy)
> **Note**: While this legacy approach works, we strongly recommend using AReaLite
> **Note**: While this legacy approach works, we strongly recommend using AReaL-lite
> for new projects. It provides better flexibility, cleaner abstractions, and easier
> maintenance.

View File

@ -1,7 +1,7 @@
# Training Algorithm (Legacy)
> **Note**: The AReaLite approach is recommended for new implementations due to its
> cleaner separation of concerns and better maintainability.
> **Note**: The AReaL-lite approach is recommended for new implementations due to its
> cleaner separation of concerns and better maintainability.
The legacy approach encapsulates algorithms in a `ModelInterface` with three core
methods:

View File

@ -1,6 +1,6 @@
# Dataset (Legacy)
> **Note**: While this legacy approach works, we strongly recommend using AReaLite
> **Note**: While this legacy approach works, we strongly recommend using AReaL-lite
> for new projects. It provides better flexibility, cleaner abstractions, and easier
> maintenance.

View File

@ -10,31 +10,36 @@ The following hardware configuration has been extensively tested:
- **CPU**: 64 cores per node
- **Memory**: 1TB per node
- **Network**: NVSwitch + RoCE 3.2 Tbps
- **Storage**:
- **Storage**:
- 1TB local storage for single-node experiments
- 10TB shared storage (NAS) for distributed experiments
### Software Requirements
| Component | Version |
|---|:---:|
| Operating System | CentOS 7 / Ubuntu 22.04 or any system meeting the requirements below |
| NVIDIA Driver | 550.127.08 |
| CUDA | 12.8 |
| Git LFS | Required for downloading models, datasets, and AReaL code. See [installation guide](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) |
| Docker | 27.5.1 |
| NVIDIA Container Toolkit | See [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) |
| AReaL Image | `ghcr.io/inclusionai/areal-runtime:v0.3.0.post1` (includes runtime dependencies and Ray components) |
| Component | Version |
| ------------------------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| Operating System | CentOS 7 / Ubuntu 22.04 or any system meeting the requirements below |
| NVIDIA Driver | 550.127.08 |
| CUDA | 12.8 |
| Git LFS | Required for downloading models, datasets, and AReaL code. See [installation guide](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) |
| Docker | 27.5.1 |
| NVIDIA Container Toolkit | See [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) |
| AReaL Image | `ghcr.io/inclusionai/areal-runtime:v0.3.0.post2` (includes runtime dependencies and Ray components) |
**Note**: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or shared storage mounting, as these depend on your specific node configuration and system version. Please complete these installations independently.
**Note**: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or
shared storage mounting, as these depend on your specific node configuration and system
version. Please complete these installations independently.
## Runtime Environment
**For multi-node training**: Ensure a shared storage path is mounted on every node (and mounted to the container if you are using Docker). This path will be used to save checkpoints and logs.
**For multi-node training**: Ensure a shared storage path is mounted on every node (and
mounted to the container if you are using Docker). This path will be used to save
checkpoints and logs.
### Option 1: Docker (Recommended)
We recommend using Docker with our provided image. The Dockerfile is available in the top-level directory of the AReaL repository.
We recommend using Docker with our provided image. The Dockerfile is available in the
top-level directory of the AReaL repository.
```bash
docker pull ghcr.io/inclusionai/areal-runtime:v0.3.0.post1
@ -50,9 +55,10 @@ bash examples/env/scripts/setup-container-deps.sh
### Option 2: Custom Environment Installation
1. Install [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install) or [Anaconda](https://www.anaconda.com/docs/getting-started/anaconda/install).
1. Install [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install)
or [Anaconda](https://www.anaconda.com/docs/getting-started/anaconda/install).
2. Create a conda virtual environment:
1. Create a conda virtual environment:
```bash
conda create -n areal python=3.12
@ -67,9 +73,11 @@ cd AReaL
bash examples/env/scripts/setup-pip-deps.sh
```
<!-- NO SGLang patch now
::::{note}
The SGLang patch is applied via `examples/env/scripts/setup-container-deps.sh` or `examples/env/scripts/setup-pip-deps.sh`. To confirm whether it has been applied, run `git status` in the `/sglang` directory (for Docker) or `AReaL/sglang` (for custom setups).
::::
-->
## (Optional) Launch Ray Cluster for Distributed Training
@ -89,8 +97,9 @@ ray start --address $RAY_HEAD_IP
You should see the Ray resource status displayed when running `ray status`.
Properly set the `n_nodes` argument in AReaL's training command; the training script will then automatically detect the resources and allocate workers to the cluster.
Properly set the `n_nodes` argument in AReaL's training command; the training script
will then automatically detect the resources and allocate workers to the cluster.
## Next Steps
Check the [quickstart section](quickstart.md) to launch your first AReaL job.
Check the [quickstart section](quickstart.md) to launch your first AReaL job.

View File

@ -1,7 +1,7 @@
# Quickstart
Welcome to the **AReaLite** Quickstart Guide! This guide demonstrates how to run an
AReaLite experiment training an LLM on the GSM8K dataset using the GRPO algorithm with
Welcome to the **AReaL-lite** Quickstart Guide! This guide demonstrates how to run an
AReaL-lite experiment training an LLM on the GSM8K dataset using the GRPO algorithm with
function-based rewards. Ensure you've completed
[the installation and environment setup](installation.md) before proceeding.
@ -56,14 +56,14 @@ python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
+sglang.attention_backend=triton
```
::::{important} We're currently refactoring from legacy AReaL to AReaLite, which
::::{important} We're currently refactoring from legacy AReaL to AReaL-lite, which
introduces some configuration differences. We provide a **config converter** that turns
an old AReaL config into an AReaLite YAML file for your convenience. [Click here](xxx) to
an old AReaL config into an AReaL-lite YAML file for your convenience. [Click here](xxx) to
learn how to use the **config converter**. ::::
## Distributed Experiments with Ray or Slurm
AReaLite provides standalone launchers for distributed experiments. After setting up
AReaL-lite provides standalone launchers for distributed experiments. After setting up
your Ray or Slurm cluster, launch experiments similarly to `LocalLauncher`:
```
@ -109,7 +109,7 @@ Additional references:
## Next Steps
Check [Getting Started with AReaLite](../arealite/gsm8k_grpo.md) for a complete code
Check [Getting Started with AReaL-lite](../arealite/gsm8k_grpo.md) for a complete code
walkthrough on the GRPO GSM8K Example.
Customization guides:

View File

@ -1,8 +1,8 @@
# Quickstart (Legacy)
> **Note**: This is a quickstart guide for launching an AReaL experiment with legacy code
> in `realhf/`. We strongly recommend that users try AReaLite for a better experience.
> [Click here](quickstart.md) for the AReaLite quickstart guide!
> in `realhf/`. We strongly recommend that users try AReaL-lite for a better experience.
> [Click here](quickstart.md) for the AReaL-lite quickstart guide!
This guide walks you through a simple example of training an LLM to solve math problems.
Please ensure you have properly