Installation

Prerequisites

Hardware Requirements

The following hardware configuration has been extensively tested:

  • GPU: 8x H800 per node
  • CPU: 64 cores per node
  • Memory: 1TB per node
  • Network: NVSwitch + RoCE 3.2 Tbps
  • Storage:
    • 1TB local storage for single-node experiments
    • 10TB shared storage (NAS) for distributed experiments
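
If you want to verify that a node matches this profile, a few standard commands are enough; this check is optional, and the /storage path assumes the shared-storage layout used later in this guide:

nvidia-smi -L          # should list 8 GPUs per node
nproc                  # CPU core count
free -h                # total memory
df -h /storage         # capacity of the shared storage mount (once mounted)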

Software Requirements

Component                  Version / Notes
Operating System           CentOS 7 / Ubuntu 22.04, or any system meeting the requirements below
NVIDIA Driver              550.127.08
CUDA                       12.8
Git LFS                    Required for downloading models, datasets, and AReaL code; see its installation guide
Docker                     27.5.1
NVIDIA Container Toolkit   See the installation guide
AReaL Image                ghcr.io/inclusionai/areal-runtime:v0.3.0 (includes runtime dependencies and Ray components)

Note: This tutorial does not cover the installation of NVIDIA Drivers, CUDA, or shared storage mounting, as these depend on your specific node configuration and system version. Please complete these installations independently.
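
As a quick, optional sanity check of the software stack on each node (expected values follow the table above; output formats vary slightly across versions):

nvidia-smi             # header reports the driver version (550.127.08) and CUDA version (12.8)
docker --version       # expect 27.5.1
git lfs version        # confirms Git LFS is available
nvidia-ctk --version   # confirms the NVIDIA Container Toolkit is installed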

Runtime Environment

We recommend using Docker with our provided image. The Dockerfile is available in the top-level directory of the AReaL repository.

Pull the Docker image:

docker pull ghcr.io/inclusionai/areal-runtime:v0.3.0

This image includes all training requirements for AReaL.

For multi-node training: ensure the shared storage is mounted at /storage on every node. All downloaded code, datasets, and models are kept under this directory, and the AReaL container mounts it to /storage inside the container as well.
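
A minimal sketch of starting the container interactively on one node follows; the shared-memory size and other flags here are illustrative choices, not values mandated by AReaL:

docker run -it --gpus all --shm-size=16g \
  -v /storage:/storage \
  ghcr.io/inclusionai/areal-runtime:v0.3.0 /bin/bash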

Code Setup

Clone the AReaL project code to /storage/codes:

mkdir -p /storage/codes
cd /storage/codes/
git clone https://github.com/inclusionAI/AReaL
pip install -r AReaL/requirements.txt

Dataset

Download the provided training dataset and place it in /storage/datasets/:

mkdir -p /storage/datasets/
cd /storage/datasets/
wget -O boba_106k_0319.jsonl "https://huggingface.co/datasets/inclusionAI/AReaL-RL-Data/resolve/main/data/boba_106k_0319.jsonl?download=true"
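
To verify the download, you can inspect the file; it is a JSON Lines dataset, so each line is one standalone JSON record (this check is optional and assumes python is on the PATH):

wc -l boba_106k_0319.jsonl                             # number of samples
head -n 1 boba_106k_0319.jsonl | python -m json.tool   # pretty-print the first record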

Model

We train using open-source models available on Hugging Face Hub. Here's an example using Qwen3 (ensure Git LFS is installed):

mkdir -p /storage/models
cd /storage/models
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Qwen/Qwen3-1.7B
cd Qwen3-1.7B
git lfs pull

Alternative: You can also use the Hugging Face CLI to download models after installing the huggingface_hub package. Refer to the official documentation for details.
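
For instance, a download along these lines should work once huggingface_hub is installed; the exact CLI syntax can differ between huggingface_hub versions:

pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen3-1.7B --local-dir /storage/models/Qwen3-1.7B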