This commit is contained in:
晓雷 2025-07-30 15:46:50 +08:00
parent 7f876e7182
commit e6bf47f7b0
4 changed files with 50 additions and 46 deletions

View File

@ -20,10 +20,11 @@ like how you enjoy real-world milk tea (cheers).
**AReaL Highlights**
- ⚡ **\[NEW\] Light-weight & AI-centric:** In our new release AReaLite, we deliver
**90%** of AReaL functionalities with only **20%** of the lines of code! AReaLite also
follows an **AI-centric** design that lets users build their own **agentic** and
**RLVR** training workflows with much less effort.
- ⚡ **\[NEW\] Light-weight & AI-centric:** Our new release **AReaLite** follows an
**AI-centric** design that prioritizes a better development experience for AI
researchers. As a result, **AReaLite** delivers most AReaL functionalities with a much
more light-weight codebase, enabling users to build their own **agentic** and
**RLVR** training workflows with less effort.
- 🔥 **Asynchronous RL**: With algorithm-system co-design, AReaL supports fully
asynchronous RL for **the fastest training**! Experimental support for multi-turn
agentic RL is also provided.
@ -37,9 +38,12 @@ like how you enjoy real-world milk tea (cheers).
## News
**\[2025/07/31\] (v0.4, AReaLite)** We introduce **AReaLite**, a **light-weight**
version of AReaL with an **AI-centric** API design that inherently supports fully
asynchronous **agentic RL**. Check out [our AReaLite Design Doc](/arealite/README.md)
and [the quickstart guide](/docs/tutorial/quickstart.md) to begin your journey with
version of AReaL designed specifically for AI researchers and rapid prototyping.
AReaLite features an **AI-centric** API design that prioritizes ease of use and
algorithm development, while inherently supporting fully asynchronous **agentic RL**.
With 80% fewer lines of code, AReaLite maintains 90% of AReaL's core functionality.
Check out [our AReaLite design doc](/arealite/README.md) and
[the quickstart guide](/docs/tutorial/quickstart.md) to begin your journey with
**AReaLite**!
**\[2025/06/03\] (v0.3, boba²)** We release **boba²** (double-boba) for fully
@ -64,7 +68,7 @@ New highlights in AReaLite:
- Follows an *AI-centric* API design instead of the *system-centric* architecture of the
old AReaL, which makes it easier for AI researchers to adopt, understand, and develop
effectively and efficiently. To learn more about the design principles of AReaL,
please read [AReaLite Design Doc](/arealite/README.md)!
please read the [AReaLite design doc](/arealite/README.md)!
- A much more *light-weight* codebase compared to the old AReaL codebase, with only
**20%** of the lines of code, and a detailed [code walkthrough](/docs/arealite/gsm8k_grpo.md) on an

View File

@ -29,8 +29,8 @@ show you how these steps are done in details.
## Launching the Experiment
As shown in [Quickstart Guide](../tutorial/quickstart.md), experiments in AReaLite are
launched using standalone launchers with the following commands:
As shown in the [quickstart guide](../tutorial/quickstart.md), experiments in AReaLite
are launched using standalone launchers with the following commands:
```
# Local Launcher
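# A rough local-run sketch, by analogy with the Ray/Slurm commands later in this
# guide; the `arealite.launcher.local` module path and the overrides shown here
# are assumptions, not part of this document.
python3 -m arealite.launcher.local examples/arealite/gsm8k_grpo.py \
    experiment_name=gsm8k-grpo trial_name=trial0 \
    ...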
@ -72,7 +72,7 @@ config: GRPOConfig
## Loading and Preprocessing Dataset
We use the `datasets` and `torchdata` packages to load and preprocess the dataset into
our dataloader. First, we download `openai/gsm8k` from Huggingface and split it by data
our dataloader. First, we download `openai/gsm8k` from Hugging Face and split it by data
parallel ranks, then map it to our desired format:
```python
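# A minimal sketch of the steps just described (the helper below is illustrative,
# not the exact AReaLite utility):
from datasets import load_dataset

def load_gsm8k_split(rank: int, world_size: int, split: str = "train"):
    # Download openai/gsm8k from Hugging Face and keep only this data-parallel shard.
    dataset = load_dataset("openai/gsm8k", "main", split=split)
    dataset = dataset.shard(num_shards=world_size, index=rank)
    # Map each sample into the chat-message format used by the rollout workflow.
    return dataset.map(
        lambda x: {"messages": [{"role": "user", "content": x["question"]}]}
    )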
@ -116,14 +116,14 @@ on separate GPUs from those used for training. The `RemoteSGLangEngine` acts as
that interacts with the servers. `RemoteSGLangEngine` runs in an SPMD manner on every
training process, without occupying any GPUs.
`RemoteSGLangEngine` provides two APIs, `agenerate` and `update_weights`. It is worth
mentioning that, in asynchronous RL experiments in AReaLite, an inference-side weight
update can happen **in the middle of** generating the output for a prompt. In other
words, one output sequence can be generated by multiple model versions. Let us glimpse
into the code of `agenerate` and `update_weights` for a better understanding.
`RemoteSGLangEngine` provides two APIs, `agenerate` and `async_update_weights`. It is
worth mentioning that, in asynchronous RL experiments in AReaLite, an inference-side
weight update can happen **in the middle of** generating the output for a prompt. In
other words, one output sequence can be generated by multiple model versions. Let us
glimpse into the code of `agenerate` and `async_update_weights` for a better
understanding.
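Before looking at the weight-update path, here is a hedged sketch of how a rollout
workflow might call `agenerate`; apart from the `rid` request ID, the `LLMRequest`
field names and the helper below are illustrative assumptions rather than the exact
AReaLite API.

```python
import uuid

# Hedged sketch: issue one generation request from a rollout workflow.
async def generate_once(engine, input_ids):
    req = LLMRequest(
        rid=uuid.uuid4().hex,  # reusing the same rid across turns reuses the KV cache
        input_ids=input_ids,   # tokenized prompt, e.g. from apply_chat_template
    )
    resp = await engine.agenerate(req)
    # Weights may be updated mid-generation, so tokens in `resp` can come from
    # different model versions.
    return resp
```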
In `update_weights`, the engine first sends `pause_generation` requests to all inference
servers, notifying them that a weight update is about to happen. Upon receiving
In `async_update_weights`, the engine first sends `pause_generation` requests to all
inference servers, notifying them that a weight update is about to happen. Upon receiving
`pause_generation`, the inference servers immediately stop generating and respond with
the tokens generated so far. Then, the engine sends `update_weights_from_distributed`
(for NCCL update) or `update_weights_from_disk` (for disk update). After the update is
@ -133,8 +133,8 @@ start working again.
```python
class RemoteSGLangEngine:
    ...
    def update_weights(self, meta: WeightUpdateMeta):
        # `update_weights` is completely async.
    def async_update_weights(self, meta: WeightUpdateMeta):
        # `async_update_weights` is completely async.
        # It submits a task to a ProcessPoolExecutor and returns a future.
        for addr in self.addresses:
            res = requests.post(f"http://{addr}/pause_generation")
@ -148,7 +148,7 @@ class RemoteSGLangEngine:
        def callback(future):
            for addr in self.addresses:
                requests.post(f"http://{addr}/continue_generation")
        future.add_done_callback(callback)
        return future
@ -285,15 +285,15 @@ class WorkflowExecutor:
rid += 1
# Wait for rollout completion
tasks = list(rollout_tasks.values())
done = []
completed_tasks = []
if tasks:
    done, _ = await asyncio.wait(
    completed_tasks, _ = await asyncio.wait(
        tasks,
        timeout=ROLLOUT_POLL_WAIT_TIME,
        return_when=asyncio.FIRST_COMPLETED,
    )
# Collect done results, put the results into output queue
for task in done:
for task in completed_tasks:
    traj = await task
    task_rid = task.get_name()
    rollout_tasks.pop(task_rid)
@ -432,25 +432,24 @@ weight_update_meta = WeightUpdateMeta.from_disk(config.saver)
```
After a training step is finished, we transfer new weights from the actor engine to the remote
inference servers with steps shown in the following code:
inference servers:
1. The rollout engine needs to stop sending generation requests to the remote servers
   (`rollout.pause()`) to avoid server-side congestion.
1. Since we need to invoke the weight update on the trainer engine and the remote
   inference servers at the same time, the training script asynchronously sends requests
   to the remote inference servers and then immediately uploads weights on the trainer
   engine.
```python
# 1. Pause rollout on remote inference servers
rollout.pause()
# 2. Send requests to remote servers, tell them to update weights
if dist.get_rank() == 0:
    future = rollout.update_weights(weight_update_meta)
    future = rollout.async_update_weights(weight_update_meta)
# 3. Actor begins to transfer weights
actor.upload_weights(weight_update_meta)
# 4. Wait for remote servers to return after finishing updates
if dist.get_rank() == 0:
    future.result()
# 5. Synchronize rollout processes for model version update
dist.barrier(device_ids=[actor.device.index])
torch.cuda.synchronize()
# 6. Resume rollout on remote inference servers
rollout.resume()
# 7. Set version, ensures versions on actor and rollout engine are identical
actor.set_version(global_step + 1)
rollout.set_version(global_step + 1)
```
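Note that only the rank-0 process sends the update request to the remote inference
servers (hence the `dist.get_rank() == 0` guards), while every training rank calls
`actor.upload_weights` and joins the barrier before rollout resumes and the versions
are bumped.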
@ -482,7 +481,7 @@ for global_step in range(max_steps):
rollout.pause()
if dist.get_rank() == 0:
    future = rollout.update_weights(weight_update_meta)
    future = rollout.async_update_weights(weight_update_meta)
actor.upload_weights(weight_update_meta)
if dist.get_rank() == 0:
    future.result()

View File

@ -8,10 +8,12 @@ You can find the complete implementation in `arealite/workflow/multi_turn.py`.
## Step 1: Define Your Workflow
AReaLite gives you flexibility in how you design your agents. Instead of rigid `Agent`
classes that might constrain your agent's capabilities, AReaLite captures all rollout
behavior in a `RolloutWorkflow` class. This approach lets you customize your agent's
behavior however you need.
AReaLite gives you flexibility in how you design your agents to run **an episode**. **An
episode** defines how your agent rolls out a complete training sample from an input
prompt, using tools, reward functions, and (multi-turn) generation. Instead of rigid
`Agent` classes that might constrain your agent's capabilities, AReaLite captures all
rollout behavior in a `RolloutWorkflow` class. This approach allows you to customize
your agent's behavior however you need.
```python
# arealite/api/workflow_api.py
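# A sketch of the interface described in this guide; the method name and signature
# below are assumptions and may differ from the actual file.
class RolloutWorkflow:
    async def arun_episode(self, engine, data):
        """Run one episode for a single prompt and return the packed training sample."""
        raise NotImplementedError()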
@ -40,7 +42,7 @@ interact.
> generated from that prompt—it's not batched. However, you can generate multiple
> trajectories from a single prompt (for example, with GRPO or tree search).
### Setting Up the Multi-Turn Math Workflow
### Setting Up the Multi-turn Math Workflow
Let's build a multi-turn rollout workflow for solving math problems. First, we'll define
the `__init__` method to set up what we need during rollout:
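A rough sketch of what that `__init__` might hold follows; the exact argument names are
assumptions, inferred from what the rollout body uses (tokenizer, reward function,
generation config, turn limit, and a per-turn discount).

```python
class MultiTurnWorkflow(RolloutWorkflow):
    def __init__(self, reward_fn, gconfig, tokenizer, max_turns, turn_discount):
        # Keep everything the rollout loop needs on hand.
        self.reward_fn = reward_fn
        self.gconfig = gconfig
        self.tokenizer = tokenizer
        self.max_turns = max_turns
        self.turn_discount = turn_discount
```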
@ -82,10 +84,11 @@ class MultiTurnWorkflow(RolloutWorkflow):
seq, logprobs, loss_mask, versions = [], [], [], []
messages = data["messages"]
# Run multi-turn rollout until we get the correct answer
t = reward = 0
turn_index = 0
reward = 0
discount = 1.0
rid = uuid.uuid4().hex
while reward == 0 and t < self.max_turns:
while reward == 0 and turn_index < self.max_turns:
    # Convert the conversation into input tokens
    input_ids = self.tokenizer.apply_chat_template(
        messages,
@ -111,7 +114,7 @@ class MultiTurnWorkflow(RolloutWorkflow):
> **Note**: The `rid` field in `LLMRequest` is the request ID. Requests with the same ID
> will reuse the LLM inference server's KV caches for better efficiency.
### Handling Multi-Turn Conversations
### Handling Multi-turn Conversations
Next, we'll check if the current answer is correct using our `reward_fn`. This function
should return 1 for correct answers and 0 otherwise. When the answer is wrong, we'll

View File

@ -75,7 +75,6 @@ python3 -m arealite.launcher.ray examples/arealite/gsm8k_grpo.py \
allocation_mode=sglang.d12p1t1+d4p1t1 \
cluster.n_nodes=4 \
cluster.n_gpus_per_node=4 \
...
# Launch with Slurm launcher. 16 nodes (8 GPUs each), 12 nodes for generation, 4 nodes for training
python3 -m arealite.launcher.slurm examples/arealite/gsm8k_grpo.py \
@ -85,7 +84,6 @@ python3 -m arealite.launcher.slurm examples/arealite/gsm8k_grpo.py \
allocation_mode=sglang.d96p1t1+d32p1t1 \
cluster.n_nodes=16 \
cluster.n_gpus_per_node=8 \
...
```
Additional references:
@ -99,9 +97,9 @@ Additional references:
>
> 1. Ensure `allocation_mode` matches your cluster configuration
> (`#GPUs == cluster.n_nodes * cluster.n_gpus_per_node`)
> 1. Ray/Slurm launchers only work for more than 1 node (`cluster.n_nodes > 1`). For
> 1. Ray / Slurm launchers only work for more than 1 node (`cluster.n_nodes > 1`). For
> single-node scenarios, please use `LocalLauncher`.
> 1. In Ray/Slurm launchers, GPUs are allocated at node granularity, which means #GPUs
> 1. In Ray / Slurm launchers, GPUs are allocated at node granularity, which means #GPUs
> for generation or training must be integer multiples of `cluster.n_gpus_per_node`.
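> For example, the Ray command above uses `allocation_mode=sglang.d12p1t1+d4p1t1`, i.e.
> 12 + 4 = 16 GPUs, which matches `cluster.n_nodes * cluster.n_gpus_per_node = 4 * 4 = 16`,
> and both 12 and 4 are integer multiples of `cluster.n_gpus_per_node = 4`.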
<!--