mirror of https://github.com/inclusionAI/AReaL
change name to AReaL-lite

Commit: 04b26f42bb
Parent: 3a97f06be1

@@ -1,4 +1,4 @@
-name: Test AReaLite
+name: Test AReaL-lite
 
 on:
   pull_request:
@@ -12,7 +12,7 @@ on:
 jobs:
   test-arealite:
     environment:
-      name: AReaLite-unittests
+      name: AReaL-lite-unittests
     runs-on: ubuntu-latest
     concurrency:
       group: test-arealite

README.md (76 lines changed)
@@ -20,12 +20,12 @@ like how you enjoy real-world milk tea (cheers).
 
 **AReaL Highlights**
 
-- ⚡ <span style="color: red; font-weight: bold;">**\[NEW\] AReaLite:**</span> Our new
-  release AReaLite is a **light-weight** and **AI-centric** codebase that prioritizes
-  better development experiences for AI researchers. As a result, AReaLite delivers most
-  AReaL functionalities while maintains its high performance with much fewer lines of
-  code. This allows users to build their own **agentic** and **RLVR** training workflows
-  with minimal effort.
+- ⚡ <span style="color: red; font-weight: bold;">**\[NEW\] AReaL-lite:**</span> Our new
+  release AReaL-lite is a **light-weight** and **AI-centric** codebase that prioritizes
+  better development experiences for AI researchers. As a result, AReaL-lite delivers
+  most AReaL functionalities while maintains its high performance with much fewer lines
+  of code. This allows users to build their own **agentic** and **RLVR** training
+  workflows with minimal effort.
 - 🔥 **Asynchronous RL**: With algorithm-system co-design, AReaL supports fully
   asynchronous RL for **the fastest training speed**! Experimental support for
   multi-turn agentic RL is also provided.
@@ -38,14 +38,14 @@ like how you enjoy real-world milk tea (cheers).
 
 ## News
 
-**\[2025/07/31\] (AReaLite)** We introduce AReaLite, a **light-weight** version of AReaL
-designed specifically for AI researchers and rapid prototyping. AReaLite features an
-**AI-centric** API design that prioritizes ease of use and algorithm development, while
-inherently supporting fully asynchronous **agentic RL**. With 80% fewer lines of code,
-AReaLite maintains 90% of AReaL's high performance and core functionality. Check out
-[our AReaLite design doc](/arealite/README.md) and
+**\[2025/07/31\] (AReaL-lite)** We introduce AReaL-lite, a **light-weight** version of
+AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite
+features an **AI-centric** API design that prioritizes ease of use and algorithm
+development, while inherently supporting fully asynchronous **agentic RL**. With 80%
+fewer lines of code, AReaL-lite maintains 90% of AReaL's high performance and core
+functionality. Check out [our AReaL-lite design doc](/arealite/README.md) and
 [the quickstart guide](/docs/tutorial/quickstart.md) to begin your journey with
-**AReaLite**!
+**AReaL-lite**!
 
 **\[2025/06/03\] (v0.3, boba²)** We release **boba²** (double-boba) for fully
 asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or
@@ -62,20 +62,20 @@ SOTA 7B and 32B models on math reasoning. Check our
 **\[2025/02/24\] (v0.1)** Our initial release includes reproducible results for 1.5B and
 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
 
-## AReaLite Release Highlights
+## AReaL-lite Release Highlights
 
-New highlights in AReaLite:
+New highlights in AReaL-lite:
 
-- Instead of the *system-centric* architecture in old AReaL, AReaLite follows an
+- Instead of the *system-centric* architecture in old AReaL, AReaL-lite follows an
   **AI-centric** API design that aims to provide the following key features:
 
   - **Light-weight** & **easy-to-write** algorithm and training workflow customization.
   - **Easy to scale up** without knowing system and infrastructure details.
   - **Adaptable and plugable:** Smooth to integrate with other modern AI applications.
 
-  These features make AReaLite easy for AI researchers to adopt, understand, and develop
-  effectively and efficiently. To learn more about the design principles of AReaL,
-  please read the [AReaLite design doc](/arealite/README.md)!
+  These features make AReaL-lite easy for AI researchers to adopt, understand, and
+  develop effectively and efficiently. To learn more about the design principles of
+  AReaL, please read the [AReaL-lite design doc](/arealite/README.md)!
 
 - A much more *light-weight* codebase compared to old AReaL codebase with only **20%**
   lines of code, with a detailed [code walkthrough](/docs/arealite/gsm8k_grpo.md) on an
@@ -94,10 +94,10 @@ Good old stuff from AReaL:
 - A single command line to launch an experiment, no matter on a single node or a
   large-scale distributed cluster.
 
-Now, let us run an example experiment with AReaLite following the quickstart guide
+Now, let us run an example experiment with AReaL-lite following the quickstart guide
 below!
 
-## Getting Started with AReaLite
+## Getting Started with AReaL-lite
 
 Our training scripts will automatically download the dataset (openai/gsm8k) and model
 (Qwen/Qwen2-1.5B-Instruct). On a single node, runs:
@@ -125,12 +125,12 @@ python3 -m arealite.launcher.local examples/arealite/eval.py --config examples/a
 ```
 -->
 
-For more detailed guide on how to run experiments in AReaLite, please check out
+For more detailed guide on how to run experiments in AReaL-lite, please check out
 [our quickstart guide](/docs/tutorial/quickstart.md)!
 
-## Switching from legacy AReaL to AReaLite
+## Switching from legacy AReaL to AReaL-lite
 
-We also provide a convenient script to convert your AReaL YAML config into AReaLite
+We also provide a convenient script to convert your AReaL YAML config into AReaL-lite
 config in one command line. First you need to locate your AReaL config either modified
 from files from `examples` folder, or generated when you run your experiments in
 `<fileroot>/<expr_name>/<trial_name>` folder. Runs:
@@ -139,17 +139,17 @@ from files from `examples` folder, or generated when you run your experiments in
 python3 examples/arealite/convert_config.py -f <config_path> -o <output_path>
 ```
 
-Then you should be able to run experiments with your old settings on AReaLite!
+Then you should be able to run experiments with your old settings on AReaL-lite!
 
-## AReaLite vs legacy AReaL
+## AReaL-lite vs legacy AReaL
 
-AReaLite is an initiative to fully refactor AReaL, addressing historical issues such as
-redundant code and unnecessary system-level abstractions. Currently, AReaLite provides a
-lightweight codebase that enables fast prototyping for new RL training workflows and
-algorithms on a relatively small scale. For large-scale experiments (1K+ GPUs), we
-recommend using the battle-tested legacy AReaL to ensure stability. In the future, we
-will continue developing AReaLite by expanding its APIs, migrating legacy features,
-introducing new functionality, and validating the system through large-scale
+AReaL-lite is an initiative to fully refactor AReaL, addressing historical issues such
+as redundant code and unnecessary system-level abstractions. Currently, AReaL-lite
+provides a lightweight codebase that enables fast prototyping for new RL training
+workflows and algorithms on a relatively small scale. For large-scale experiments (1K+
+GPUs), we recommend using the battle-tested legacy AReaL to ensure stability. In the
+future, we will continue developing AReaL-lite by expanding its APIs, migrating legacy
+features, introducing new functionality, and validating the system through large-scale
 experiments.
 
 ## Resources
@@ -160,17 +160,17 @@ experiments.
 ### Quickstart
 
 - [Installation](https://inclusionai.github.io/AReaL/tutorial/installation.html)
-- [AReaLite Quickstart](/docs/tutorial/quickstart.md)
+- [AReaL-lite Quickstart](/docs/tutorial/quickstart.md)
 
 ### Code Walkthrough
 
-- [Running GRPO on GSM8K dataset with AReaLite](/docs/arealite/gsm8k_grpo.md)
+- [Running GRPO on GSM8K dataset with AReaL-lite](/docs/arealite/gsm8k_grpo.md)
 
 ### Customization
 
-- [Customize dataset with AReaLite](../customization/dataset.md)
-- [Customize Agentic/RVLR rollout workflows with AReaLite](../customization/agent.md)
-- [Customize algorithms with AReaLite](../customization/algorithm.md)
+- [Customize dataset with AReaL-lite](../customization/dataset.md)
+- [Customize Agentic/RVLR rollout workflows with AReaL-lite](../customization/agent.md)
+- [Customize algorithms with AReaL-lite](../customization/algorithm.md)
 
 ### AReaL Legacy

arealite/README.md

@@ -1,9 +1,9 @@
-# AReaLite Design Doc
+# AReaL-lite Design Doc
 
 ## TL;DR
 
 Follow our [step-by-step code walk-through](../docs/arealite/gsm8k_grpo.md) to
-immediately get started with AReaLite!
+immediately get started with AReaL-lite!
 
 ## Motivation
 
@@ -27,7 +27,7 @@ Graph). To customize a training workflow, researchers first need to understand t
 system-level concepts. Then they are forced to find code to modify, which is scattered
 around in the codebase. It is also nearly impossible to exploit packages like `datasets`
 since it is not compatible with the workers. This gap is the core motivation behind
-AReaLite: rebuilding AReaL with an AI-centric architecture and APIs.
+AReaL-lite: rebuilding AReaL with an AI-centric architecture and APIs.
 
 Beyond architectural concerns, AReaL suffers from accumulated technical debt. The
 codebase contains substantial legacy code inherited from previous projects that no
@@ -40,13 +40,13 @@ possible to achieve comparable efficiency with significantly fewer lines of code
 presents an ideal opportunity to redesign the API and distill the massive codebase into
 something clean and maintainable. Rather than pursuing maximum efficiency, our goal is
 to deliver 90% of AReaL's functionality while dramatically reducing code complexity and
-user burden. This philosophy drives AReaLite — the lightweight version of AReaL.
+user burden. This philosophy drives AReaL-lite — the lightweight version of AReaL.
 
-AReaLite serves as the first phase in AReaL's broader refactoring initiative. It
+AReaL-lite serves as the first phase in AReaL's broader refactoring initiative. It
 functions both as a standalone training library with intuitive interfaces and as the
 foundation for AReaL's future core API definitions. The plan is to transform AReaL's
-current worker-based architecture into an AI-centric architecture similar to AReaLite,
-where AReaL will **extend** AReaLite's APIs and implementations to support additional
+current worker-based architecture into an AI-centric architecture similar to AReaL-lite,
+where AReaL will **extend** AReaL-lite's APIs and implementations to support additional
 backends for efficient large-scale training.
 
 ## Design Principles
@@ -80,13 +80,13 @@ arealite/
 
 ### Component Overview
 
-The AReaLite codebase is structured into four distinct layers: the API layer, backend
+The AReaL-lite codebase is structured into four distinct layers: the API layer, backend
 layer, customization layer, and entry point layer. As illustrated in the figure below,
 workflow and algorithm customization logic resides in separate layers above the backend.
 We prioritize keeping the entry point and customization layers clean and intuitive,
-isolating them from the complex backend implementation. With AReaLite, users can define
-their custom training workflows and algorithms entirely within a single entry point
-file.
+isolating them from the complex backend implementation. With AReaL-lite, users can
+define their custom training workflows and algorithms entirely within a single entry
+point file.
 
 
@@ -11,7 +11,7 @@ parts:
   - file: tutorial/quickstart_legacy
   - file: tutorial/eval
   - file: tutorial/troubleshooting
-- caption: Getting Started with AReaLite
+- caption: Getting Started with AReaL-lite
   chapters:
   - file: arealite/gsm8k_grpo
 - caption: References

docs/arealite/gsm8k_grpo.md

@@ -1,17 +1,17 @@
 # Running GRPO on GSM8K Dataset
 
-This guide introduces how AReaLite runs the GRPO algorithm on the GSM8K dataset, using
+This guide introduces how AReaL-lite runs the GRPO algorithm on the GSM8K dataset, using
 the training script
 [examples/arealite/gsm8k_grpo.py](../../examples/arealite/gsm8k_grpo.py) and
 configuration file
 [examples/arealite/configs/gsm8k_grpo.yaml](../../examples/arealite/configs/gsm8k_grpo.yaml).
 
-## How AReaLite Works
+## How AReaL-lite Works
 
 The following figure illustrates the launching and one asynchronous training step of the
-GRPO algorithm on the GSM8K dataset on AReaLite. Compared with the old AReaL
-implementation, AReaLite runs inference servers and a SPMD training script instead of a
-bunch of various workers. In a training step, AReaLite:
+GRPO algorithm on the GSM8K dataset on AReaL-lite. Compared with the old AReaL
+implementation, AReaL-lite runs inference servers and a SPMD training script instead of
+a bunch of various workers. In a training step, AReaL-lite:
 
 1. Submits prompts from the dataset to `RemoteSGLangEngine`, who runs `RLVRWorkflow` in
    a streaming manner.
@@ -29,7 +29,7 @@ show you how these steps are done in details.
 
 ## Launching the Experiment
 
-As shown in the [quickstart guide](../tutorial/quickstart.md), experiments in AReaLite
+As shown in the [quickstart guide](../tutorial/quickstart.md), experiments in AReaL-lite
 are launched using standalone launchers with the following commands:
 
 ```
@@ -41,7 +41,7 @@ python -m arealite.launcher.ray <training script> --config <configuration file>
 python -m arealite.launcher.slurm <training script> --config <configuration file> <cli args>
 ```
 
-In AReaLite:
+In AReaL-lite:
 
 - The **training script** is an SPMD python script that serves as the experiment entry
   point.
@@ -111,14 +111,14 @@ details.
 
 ### Inference Engine: `RemoteSGLangEngine`
 
-In AReaLite, generation tasks are offloaded to remote inference servers, which operate
+In AReaL-lite, generation tasks are offloaded to remote inference servers, which operate
 on separate GPUs from those used for training. The `RemoteSGLangEngine` acts as a client
 that interacts with the servers. `RemoteSGLangEngine` runs in a SPMD manner on every
 training process, without occupying any GPUs.
 
 `RemoteSGLangEngine` provides two core APIs that access the remote servers, `agenerate`
 and `update_weights_async`. It is worth mentioning that, in asynchronous RL experiment
-in AReaLite, inference-side weight update could happen **in the middle of** generation
+in AReaL-lite, inference-side weight update could happen **in the middle of** generation
 of one prompt. With that being said, one output sequence could be generated by multiple
 versions of models. Let us glimpse into code of `agenerate` and `update_weights_async`
 for a better understanding.
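The mid-generation weight update described in this hunk is easier to see in a runnable toy. Below is a minimal, self-contained asyncio sketch; `MockInferenceServer`, `generate`, and `update_weights` are invented stand-ins rather than the real `RemoteSGLangEngine` API, and they only mimic the timing behavior: a weight update lands while one prompt is still decoding, so a single output sequence mixes tokens sampled under two model versions.

```python
import asyncio

class MockInferenceServer:
    """Toy stand-in for a remote inference server (not AReaL-lite code)."""

    def __init__(self):
        self.version = 0  # bumped by update_weights(), read during decoding

    async def generate(self, prompt: str, num_tokens: int) -> list[int]:
        # Record which weight version "produced" each token.
        versions_used = []
        for _ in range(num_tokens):
            await asyncio.sleep(0.01)  # pretend to decode one token
            versions_used.append(self.version)
        return versions_used

    async def update_weights(self, new_version: int) -> None:
        await asyncio.sleep(0.05)  # pretend to transfer new weights
        self.version = new_version

async def main():
    server = MockInferenceServer()
    # Start generation, then push a weight update while it is still running.
    generation = asyncio.create_task(server.generate("1+1=?", num_tokens=16))
    update = asyncio.create_task(server.update_weights(new_version=1))
    versions = await generation
    await update
    # Roughly [0, 0, 0, 0, 0, 1, 1, ...]: one sequence, two weight versions.
    print(versions)

asyncio.run(main())
```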
@@ -470,7 +470,7 @@ actor.set_version(global_step + 1)
 rollout.set_version(global_step + 1)
 ```
 
-Now a complete GRPO training step in AReaLite is done! The core logic of our example
+Now a complete GRPO training step in AReaL-lite is done! The core logic of our example
 training script can be summarized as:
 
 ```python
@@ -508,8 +508,8 @@ for global_step in range(max_steps):
 
 ## Utilities
 
-In AReaLite, we provide a wide range of utilities for basic functionalities required for
-observing and tuning your experiments.
+In AReaL-lite, we provide a wide range of utilities for basic functionalities required
+for observing and tuning your experiments.
 
 ### `Saver` and `Evaluator`
 

docs/customization/agent.md

@@ -1,17 +1,17 @@
 # Rollout and Agentic RL
 
 This guide shows you how to create custom rollout behaviors for RL training by building
-a multi-turn math agent with **AReaLite**. This agent keeps trying to solve math
+a multi-turn math agent with **AReaL-lite**. This agent keeps trying to solve math
 problems until it finds the correct answer.
 
 You can find the complete implementation in `arealite/workflow/multi_turn.py`.
 
 ## Step 1: Define Your Workflow
 
-AReaLite gives you flexibility in how you design your agents to run **an episode**. **An
-episode** defines how your agent rollouts a complete training sample from an input
+AReaL-lite gives you flexibility in how you design your agents to run **an episode**.
+**An episode** defines how your agent rollouts a complete training sample from an input
 prompt, using tools, reward functions, and (multi-turn) generation. Instead of rigid
-`Agent` classes that might constrain your agent's capabilities, AReaLite captures all
+`Agent` classes that might constrain your agent's capabilities, AReaL-lite captures all
 rollout behavior in a `RolloutWorkflow` class. This approach allows you to customize
 your agent's behavior however you need.
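To make that pattern concrete before the real code, here is a self-contained sketch under assumed names: the `RolloutWorkflow` base class, the `arun_episode` signature, and the toy `EchoEngine` are illustrative stand-ins, not the actual AReaL-lite API. The point is that the whole episode, including prompting, the reward check, and retries, lives in one overridable method.

```python
import abc
import asyncio

class RolloutWorkflow(abc.ABC):
    """Illustrative stand-in for the AReaL-lite base class."""

    @abc.abstractmethod
    async def arun_episode(self, engine, data):
        """Roll out one episode and return a training sample."""

class EchoEngine:
    """Toy engine; in real code this would be the remote inference engine."""

    async def agenerate(self, messages):
        # Answer wrongly first, correctly after being asked to retry.
        return "42" if "retry" in messages[-1]["content"] else "41"

class MultiTurnMathWorkflow(RolloutWorkflow):
    def __init__(self, max_turns: int = 3):
        self.max_turns = max_turns

    async def arun_episode(self, engine, data):
        messages = [{"role": "user", "content": data["question"]}]
        reward = 0.0
        for _ in range(self.max_turns):
            answer = await engine.agenerate(messages)
            if answer == data["solution"]:  # exact-match reward check
                reward = 1.0
                break
            messages.append({"role": "user", "content": "Wrong answer, retry."})
        return {"messages": messages, "answer": answer, "reward": reward}

sample = asyncio.run(
    MultiTurnMathWorkflow().arun_episode(
        EchoEngine(), {"question": "6*7=?", "solution": "42"}
    )
)
print(sample["reward"])  # 1.0 after one retry
```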
@@ -222,7 +222,7 @@ class MultiTurnWorkflow(RolloutWorkflow):
 ```
 
 > **Important**: The returned `TensorDict` must follow HuggingFace's padded data format,
-> where each tensor has shape `[batch_size, sequence_length, *]`. This allows AReaLite
+> where each tensor has shape `[batch_size, sequence_length, *]`. This allows AReaL-lite
 > to automatically batch multiple trajectories for training. Since this example returns
 > a single trajectory, we use `unsqueeze(0)` to create a batch of size 1.
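As a concrete illustration of the padded format this note describes, here is a minimal sketch built directly on the `tensordict` package; the key names are illustrative rather than the exact schema the trainer expects.

```python
import torch
from tensordict import TensorDict

# One trajectory of 5 tokens; key names here are illustrative only.
input_ids = torch.tensor([101, 7, 8, 9, 102])
attention_mask = torch.ones(5, dtype=torch.long)

# unsqueeze(0) turns [seq_len] into [1, seq_len]: a padded batch of size 1,
# so many such trajectories can later be padded and stacked along dim 0.
trajectory = TensorDict(
    {
        "input_ids": input_ids.unsqueeze(0),
        "attention_mask": attention_mask.unsqueeze(0),
        "rewards": torch.tensor([1.0]).unsqueeze(0),
    },
    batch_size=[1],
)
print(trajectory["input_ids"].shape)  # torch.Size([1, 5])
```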

docs/customization/algorithm.md

@@ -3,7 +3,7 @@
 > **Note**: We recommend the user to first read the
 > [agent customization guide](agent.md).
 
-**AReaLite** structures RL algorithms around two core components:
+**AReaL-lite** structures RL algorithms around two core components:
 
 - **RolloutWorkflow**: Defines what data to generate during rollouts
 - **TrainEngine**: Defines how to process the generated data for training
@@ -108,7 +108,7 @@ def reinforce_loss_fn(logits, data):
 ```
 
 ```{note}
-To decrease memory usage, AReaLite automatically packs multiple sequences in an 1D tensor before forward passes. Hence, the loss function should assume handling 1D *packed* tensors instead of *padded* tensors.
+To decrease memory usage, AReaL-lite automatically packs multiple sequences in an 1D tensor before forward passes. Hence, the loss function should assume handling 1D *packed* tensors instead of *padded* tensors.
 ```
 
 Next, we implement the training engine. We use a two-class design to maintain backend
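What "handling 1D packed tensors" means in practice can be sketched as follows. The `cu_seqlens` boundary convention is an assumption borrowed from flash-attention-style packing, and the loss shown is a generic REINFORCE-style objective, not AReaL-lite's exact implementation.

```python
import torch
import torch.nn.functional as F

# Three sequences of lengths 3, 2, and 4 packed into one 1D run of 9 tokens,
# with cumulative boundaries marking where each sequence starts and ends.
cu_seqlens = torch.tensor([0, 3, 5, 9])
vocab_size, total_tokens = 32, int(cu_seqlens[-1])

logits = torch.randn(total_tokens, vocab_size)          # [total_tokens, vocab]
labels = torch.randint(0, vocab_size, (total_tokens,))  # [total_tokens]
seq_rewards = torch.tensor([1.0, 0.0, 1.0])             # one reward per sequence

# Per-token log-probabilities of the sampled labels.
logp = -F.cross_entropy(logits, labels, reduction="none")

# Broadcast each sequence's reward onto its own tokens, then average.
lengths = cu_seqlens[1:] - cu_seqlens[:-1]
advantages = seq_rewards.repeat_interleave(lengths)
loss = -(advantages * logp).mean()
print(loss)
```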

docs/customization/dataset.md

@@ -1,6 +1,6 @@
 # Dataset
 
-**AReaLite** directly integrates with the `Dataset` class from the HuggingFace
+**AReaL-lite** directly integrates with the `Dataset` class from the HuggingFace
 `datasets` package. This gives you full flexibility to load, process, and filter your
 data before training.
 
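Because the integration point is the standard `datasets` API, data preparation looks like ordinary HuggingFace code. In the sketch below, `openai/gsm8k` is the dataset the README mentions; the length filter and the `prompt` field are illustrative choices, not requirements.

```python
from datasets import load_dataset

# Load GSM8K exactly as with any HuggingFace dataset.
ds = load_dataset("openai/gsm8k", "main", split="train")

# Filter and map freely before handing the dataset to training.
ds = ds.filter(lambda ex: len(ex["question"]) < 512)
ds = ds.map(lambda ex: {"prompt": ex["question"]})
print(ds[0]["prompt"])
```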
@@ -1,6 +1,6 @@
 # Rollout and Agentic RL (Legacy)
 
-> **Note**: While this legacy approach works, we strongly recommend using the AReaLite
+> **Note**: While this legacy approach works, we strongly recommend using the AReaL-lite
 > for new projects. It provides better flexibility, cleaner abstractions, and easier
 > maintenance.
 
|
|||
# Training Algorithm (Legacy)
|
||||
|
||||
> **Note**: The AReaLite approach is more recommended for new implementations due to its
|
||||
> cleaner separation of concerns and better maintainability.
|
||||
> **Note**: The AReaL-lite approach is more recommended for new implementations due to
|
||||
> its cleaner separation of concerns and better maintainability.
|
||||
|
||||
The legacy approach encapsulates algorithms in a `ModelInterface` with three core
|
||||
methods:
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
# Dataset (Legacy)
|
||||
|
||||
> **Note**: While this legacy approach works, we strongly recommend using the AReaLite
|
||||
> **Note**: While this legacy approach works, we strongly recommend using the AReaL-lite
|
||||
> for new projects. It provides better flexibility, cleaner abstractions, and easier
|
||||
> maintenance.
|
||||
|
||||
|
|
|

docs/tutorial/installation.md

@@ -17,24 +17,29 @@ The following hardware configuration has been extensively tested:
 ### Software Requirements
 
 | Component | Version |
-|---|:---:|
+| ------------------------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
 | Operating System | CentOS 7 / Ubuntu 22.04 or any system meeting the requirements below |
 | NVIDIA Driver | 550.127.08 |
 | CUDA | 12.8 |
 | Git LFS | Required for downloading models, datasets, and AReaL code. See [installation guide](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) |
 | Docker | 27.5.1 |
 | NVIDIA Container Toolkit | See [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) |
-| AReaL Image | `ghcr.io/inclusionai/areal-runtime:v0.3.0.post1` (includes runtime dependencies and Ray components) |
+| AReaL Image | `ghcr.io/inclusionai/areal-runtime:v0.3.0.post2` (includes runtime dependencies and Ray components) |
 
-**Note**: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or shared storage mounting, as these depend on your specific node configuration and system version. Please complete these installations independently.
+**Note**: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or
+shared storage mounting, as these depend on your specific node configuration and system
+version. Please complete these installations independently.
 
 ## Runtime Environment
 
-**For multi-node training**: Ensure a shared storage path is mounted on every node (and mounted to the container if you are using Docker). This path will be used to save checkpoints and logs.
+**For multi-node training**: Ensure a shared storage path is mounted on every node (and
+mounted to the container if you are using Docker). This path will be used to save
+checkpoints and logs.
 
 ### Option 1: Docker (Recommended)
 
-We recommend using Docker with our provided image. The Dockerfile is available in the top-level directory of the AReaL repository.
+We recommend using Docker with our provided image. The Dockerfile is available in the
+top-level directory of the AReaL repository.
 
 ```bash
 docker pull ghcr.io/inclusionai/areal-runtime:v0.3.0.post1
@@ -50,9 +55,10 @@ bash examples/env/scripts/setup-container-deps.sh
 
 ### Option 2: Custom Environment Installation
 
-1. Install [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install) or [Anaconda](https://www.anaconda.com/docs/getting-started/anaconda/install).
+1. Install [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install)
+   or [Anaconda](https://www.anaconda.com/docs/getting-started/anaconda/install).
 
-2. Create a conda virtual environment:
+1. Create a conda virtual environment:
 
 ```bash
 conda create -n areal python=3.12
@@ -67,9 +73,11 @@ cd AReaL
 bash examples/env/scripts/setup-pip-deps.sh
 ```
 
+<!-- NO SGLang patch now
 ::::{note}
 The SGLang patch is applied via `examples/env/scripts/setup-container-deps.sh` or `examples/env/scripts/setup-pip-deps.sh`. To confirm whether it has been applied, run `git status` in the `/sglang` directory (for Docker) or `AReaL/sglang` (for custom setups).
 ::::
+-->
 
 ## (Optional) Launch Ray Cluster for Distributed Training
 
@@ -89,7 +97,8 @@ ray start --address $RAY_HEAD_IP
 
 You should see the Ray resource status displayed when running `ray status`.
 
-Properly set the `n_nodes` argument in AReaL's training command, then AReaL's training script will automatically detect the resources and allocate workers to the cluster.
+Properly set the `n_nodes` argument in AReaL's training command, then AReaL's training
+script will automatically detect the resources and allocate workers to the cluster.
 
 ## Next Steps
 
|
|||
# Quickstart
|
||||
|
||||
Welcome to the **AReaLite** Quickstart Guide! This guide demonstrates how to run an
|
||||
AReaLite experiment training an LLM on the GSM8K dataset using the GRPO algorithm with
|
||||
Welcome to the **AReaL-lite** Quickstart Guide! This guide demonstrates how to run an
|
||||
AReaL-lite experiment training an LLM on the GSM8K dataset using the GRPO algorithm with
|
||||
function-based rewards. Ensure you've completed
|
||||
[the installation and environment setup](installation.md) before proceeding.
|
||||
|
||||
|
@@ -56,14 +56,14 @@ python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
   +sglang.attention_backend=triton
 ```
 
-::::{important} We're currently refactoring from legacy AReaL to AReaLite, which
+::::{important} We're currently refactoring from legacy AReaL to AReaL-lite, which
 introduces some configuration differences. We provide a **config converter** to transfer
-old AReaL config into AReaLite YAML file for users' convenience. [Click here](xxx) to
+old AReaL config into AReaL-lite YAML file for users' convenience. [Click here](xxx) to
 learn how to use the **config converter**. ::::
 
 ## Distributed Experiments with Ray or Slurm
 
-AReaLite provides standalone launchers for distributed experiments. After setting up
+AReaL-lite provides standalone launchers for distributed experiments. After setting up
 your Ray or Slurm cluster, launch experiments similarly to `LocalLauncher`:
 
 ```
@@ -109,7 +109,7 @@ Additional references:
 
 ## Next Steps
 
-Check [Getting Started with AReaLite](../arealite/gsm8k_grpo.md) for a complete code
+Check [Getting Started with AReaL-lite](../arealite/gsm8k_grpo.md) for a complete code
 walkthrough on the GRPO GSM8K Example.
 
 Customization guides:
|
|||
# Quickstart (Legacy)
|
||||
|
||||
> **Note**: This is a quickstart guide for launching AReaL experiment with legacy code
|
||||
> in `realhf/`. We strongly recommend users to try AReaLite for better experiences.
|
||||
> [Click here](quickstart.md) for AReaLite quickstart guide!
|
||||
> in `realhf/`. We strongly recommend users to try AReaL-lite for better experiences.
|
||||
> [Click here](quickstart.md) for AReaL-lite quickstart guide!
|
||||
|
||||
This guide walks you through a simple example of training an LLM to solve math problems.
|
||||
Please ensure you have properly
|
||||
|
|