llama.cpp/tools
Latest commit: c556418b60 by Radoslav Gerganov (2025-07-28 18:59:04 +03:00)
llama-bench : use local GPUs along with RPC servers (#14917)

Currently, if RPC servers are specified with '--rpc' and a local GPU
(e.g. CUDA) is available, the benchmark runs only on the RPC device(s),
yet the backend result column reports "CUDA,RPC", which is incorrect.
This patch adds all local GPU devices as well, making llama-bench
consistent with llama-cli.
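The behavior described above can be exercised with an invocation along these lines; the model path and RPC endpoint are placeholders, not values from the commit:

```
# Hypothetical invocation: with this patch, llama-bench benchmarks the
# local GPU(s) *and* the listed RPC server(s), so a backend column of
# "CUDA,RPC" actually matches the devices used.
./llama-bench -m model.gguf --rpc 192.168.0.2:50052
```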
Name               Latest commit                                                                                 Date
batched-bench      llama : add high-throughput mode (#14363)                                                     2025-07-16 16:35:42 +03:00
cvector-generator  llama : deprecate llama_kv_self_ API (#14030)                                                 2025-06-06 14:11:15 +03:00
export-lora        mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)                       2025-07-25 13:08:04 +02:00
gguf-split         scripts : make the shell scripts cross-platform (#14341)                                      2025-06-30 10:17:18 +02:00
imatrix            imatrix: add option to display importance score statistics for a given imatrix file (#12718)  2025-07-22 14:33:37 +02:00
llama-bench        llama-bench : use local GPUs along with RPC servers (#14917)                                  2025-07-28 18:59:04 +03:00
main               llama : fix `--reverse-prompt` crashing issue (#14794)                                        2025-07-21 17:38:36 +08:00
mtmd               mtmd : add support for Voxtral (#14862)                                                       2025-07-28 15:01:48 +02:00
perplexity         llama : deprecate llama_kv_self_ API (#14030)                                                 2025-06-06 14:11:15 +03:00
quantize           quantize : update README.md (#14905)                                                          2025-07-27 23:31:11 +02:00
rpc                rpc : Fix build on OpenBSD (#13541)                                                           2025-05-25 15:35:53 +03:00
run                cmake : do not search for curl libraries by ourselves (#14613)                                2025-07-10 15:29:05 +03:00
server             server : allow setting `--reverse-prompt` arg (#14799)                                        2025-07-22 09:24:22 +08:00
tokenize           llama : move end-user examples to tools directory (#13249)                                    2025-05-02 20:27:13 +02:00
tts                sync : vendor (#13901)                                                                        2025-05-30 16:25:45 +03:00
CMakeLists.txt     mtmd : rename llava directory to mtmd (#13311)                                                2025-05-05 16:02:55 +02:00