Installation

Prerequisites

Hardware Requirements

The following hardware configuration has been extensively tested:

  • GPU: 8x H800 per node
  • CPU: 64 cores per node
  • Memory: 1TB per node
  • Network: NVSwitch + RoCE 3.2 Tbps
  • Storage:
    • 1TB local storage for single-node experiments
    • 10TB shared storage (NAS) for distributed experiments
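
If you want to verify that a node matches this profile, a few standard commands are enough; this check is optional, and the /storage path assumes the shared-storage layout used later in this guide:

nvidia-smi -L          # should list 8 GPUs per node
nproc                  # CPU core count
free -h                # total memory
df -h /storage         # capacity of the shared storage mount (once mounted)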

Software Requirements

Component                  Version / Notes
Operating System           CentOS 7 / Ubuntu 22.04, or any system meeting the requirements below
NVIDIA Driver              550.127.08
CUDA                       12.8
Git LFS                    Required for downloading models, datasets, and AReaL code; see its installation guide
Docker                     27.5.1
NVIDIA Container Toolkit   See the installation guide
AReaL Image                ghcr.io/inclusionai/areal-runtime:v0.3.0 (includes runtime dependencies and Ray components)

Note: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or shared storage mounting, as these depend on your specific node configuration and system version. Please complete these installations independently.
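
As a quick, optional sanity check of the software stack on each node (expected values follow the table above; output formats vary slightly across versions):

nvidia-smi             # header reports the driver version (550.127.08) and CUDA version (12.8)
docker --version       # expect 27.5.1
git lfs version        # confirms Git LFS is available
nvidia-ctk --version   # confirms the NVIDIA Container Toolkit is installed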

Runtime Environment

We recommend using Docker with our provided image. The Dockerfile is available in the top-level directory of the AReaL repository.

Pull the Docker image:

docker pull ghcr.io/inclusionai/areal-runtime:v0.3.0

This image includes all training requirements for AReaL.

For multi-node training: ensure the shared storage is mounted at /storage on every node. All downloaded code, datasets, and models are kept under this directory, and the AReaL container mounts it to /storage inside the container as well.
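
A minimal sketch of starting the container interactively on one node follows; the shared-memory size and other flags here are illustrative choices, not values mandated by AReaL:

docker run -it --gpus all --shm-size=16g \
  -v /storage:/storage \
  ghcr.io/inclusionai/areal-runtime:v0.3.0 /bin/bash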

Code Setup

Clone the AReaL project code to /storage/codes:

mkdir -p /storage/codes
cd /storage/codes/
git clone https://github.com/inclusionAI/AReaL
pip install -r AReaL/requirements.txt

Dataset

Download the provided training dataset and place it in /storage/datasets/:

mkdir -p /storage/datasets/
cd /storage/datasets/
wget -O boba_106k_0319.jsonl "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/boba_106k_0319.jsonl?download=true"
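
To verify the download, you can inspect the file; it is a JSON Lines dataset, so each line is one standalone JSON record (this check is optional and assumes python is on the PATH):

wc -l boba_106k_0319.jsonl                             # number of samples
head -n 1 boba_106k_0319.jsonl | python -m json.tool   # pretty-print the first record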

Model

We train using open-source models available on Hugging Face Hub. Here's an example using Qwen3 (ensure Git LFS is installed):

mkdir -p /storage/models
cd /storage/models
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Qwen/Qwen3-1.7B
cd Qwen3-1.7B
git lfs pull

Alternative: You can also use the Hugging Face CLI to download models after installing the huggingface_hub package. Refer to the official documentation for details.
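
For instance, a download along these lines should work once huggingface_hub is installed; the exact CLI syntax can differ between huggingface_hub versions:

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen3-1.7B --local-dir /storage/models/Qwen3-1.7B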