llama.cpp/tools
Latest commit: c556418b60 by Radoslav Gerganov (2025-07-28 18:59:04 +03:00)
llama-bench : use local GPUs along with RPC servers (#14917)

Currently, if RPC servers are specified with '--rpc' and a local GPU
(e.g. CUDA) is available, the benchmark runs only on the RPC device(s),
yet the backend result column reports "CUDA,RPC", which is incorrect.
This patch adds all local GPU devices as well, making llama-bench
consistent with llama-cli.
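The behavior described above can be exercised with an invocation along these lines; the model path and RPC endpoint are placeholders, not values from the commit:

```
# Hypothetical invocation: with this patch, llama-bench benchmarks the
# local GPU(s) *and* the listed RPC server(s), so a backend column of
# "CUDA,RPC" actually matches the devices used.
./llama-bench -m model.gguf --rpc 192.168.0.2:50052
```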
Name               Latest commit                                                                                 Date
batched-bench      llama : add high-throughput mode (#14363)                                                     2025-07-16 16:35:42 +03:00
cvector-generator  llama : deprecate llama_kv_self_ API (#14030)                                                 2025-06-06 14:11:15 +03:00
export-lora        mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)                       2025-07-25 13:08:04 +02:00
gguf-split         scripts : make the shell scripts cross-platform (#14341)                                      2025-06-30 10:17:18 +02:00
imatrix            imatrix: add option to display importance score statistics for a given imatrix file (#12718)  2025-07-22 14:33:37 +02:00
llama-bench        llama-bench : use local GPUs along with RPC servers (#14917)                                  2025-07-28 18:59:04 +03:00
main               llama : fix `--reverse-prompt` crashing issue (#14794)                                        2025-07-21 17:38:36 +08:00
mtmd               mtmd : add support for Voxtral (#14862)                                                       2025-07-28 15:01:48 +02:00
perplexity         llama : deprecate llama_kv_self_ API (#14030)                                                 2025-06-06 14:11:15 +03:00
quantize           quantize : update README.md (#14905)                                                          2025-07-27 23:31:11 +02:00
rpc                rpc : Fix build on OpenBSD (#13541)                                                           2025-05-25 15:35:53 +03:00
run                cmake : do not search for curl libraries by ourselves (#14613)                                2025-07-10 15:29:05 +03:00
server             server : allow setting `--reverse-prompt` arg (#14799)                                        2025-07-22 09:24:22 +08:00
tokenize           llama : move end-user examples to tools directory (#13249)                                    2025-05-02 20:27:13 +02:00
tts                sync : vendor (#13901)                                                                        2025-05-30 16:25:45 +03:00
CMakeLists.txt     mtmd : rename llava directory to mtmd (#13311)                                                2025-05-05 16:02:55 +02:00