[fix] Support local mode without setting CUDA_VISIBLE_DEVICES (#38)

If `CUDA_VISIBLE_DEVICES` is not set, it means that all the
devices will be available for use. If it's an empty string,
it means no device is available for the process to use.

As a result, if `CUDA_VISIBLE_DEVICES` does not exist, the
default value should be equivalent to all the devices instead
of an empty one.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
This commit is contained in:
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-05-16 13:23:28 +03:00 committed by GitHub
parent e9eff73d90
commit 22eb63ef6e
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 2 additions and 1 deletions

View File

@ -13,6 +13,7 @@ from typing import Dict, List, Optional, Tuple, Union
import psutil
import realhf.base.logging as logging
from realhf.base import gpu_utils
from realhf.base.constants import LOG_ROOT
from realhf.scheduler.client import (
JobException,
@ -87,7 +88,7 @@ class LocalSchedulerClient(SchedulerClient):
self._gpu_counter = 0
self._cuda_devices: List[str] = os.environ.get(
"CUDA_VISIBLE_DEVICES", ""
"CUDA_VISIBLE_DEVICES", ",".join(map(str, range(gpu_utils.gpu_count())))
).split(",")
self._job_counter: Dict[str, int] = defaultdict(int)