llama : fix MiniCPM inference after Granite Four changes (#14850)

MiniCPM models use the llm_build_granite constructor, which the Granite Four PR changed to check hparams.rope_finetuned instead of taking a use_rope parameter. MiniCPM models need RoPE enabled by default, so hparams.rope_finetuned is now set to true when their hparams are loaded. This fixes inference, which produced gibberish instead of correct responses.
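
The failure mode described above can be illustrated with a minimal sketch. All names below (hparams_sketch, arch_sketch, load_hparams_sketch, builder_applies_rope) are hypothetical stand-ins and not the actual llama.cpp types or functions. The sketch assumes only what the message states: a shared builder gates RoPE on a single boolean in the hparams, and MiniCPM must force that boolean on at load time.

```cpp
// Hypothetical, simplified stand-ins; NOT the real llama.cpp definitions.
struct hparams_sketch {
    bool rope_finetuned = false; // assumed default when nothing sets the flag
};

enum class arch_sketch { granite, minicpm };

// Mimics the load-time decision. `flag_from_loader` stands for whatever value
// the loader would otherwise leave in the flag (an assumption of this sketch).
hparams_sketch load_hparams_sketch(arch_sketch arch, bool flag_from_loader) {
    hparams_sketch hp;
    hp.rope_finetuned = flag_from_loader;
    if (arch == arch_sketch::minicpm) {
        // MiniCPM always needs RoPE, so override the flag unconditionally.
        hp.rope_finetuned = true;
    }
    return hp;
}

// Stand-in for the shared graph builder: it no longer takes a use_rope
// argument and instead reads the flag from the hparams.
bool builder_applies_rope(const hparams_sketch & hp) {
    return hp.rope_finetuned;
}
```

In this sketch, a MiniCPM model loaded without the override and with the flag left false would reach the builder with RoPE disabled, which is consistent with the gibberish output the fix addresses.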
2025-07-24 17:50:51 +08:00
parent 39cffdf188
commit 86f5623d90
1 changed file with 3 additions and 0 deletions
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -646,6 +646,9 @@ void llama_model::load_hparams(llama_model_loader & ml) {
                ml.get_key(LLM_KV_RESIDUAL_SCALE,              hparams.f_residual_scale);
                ml.get_key(LLM_KV_LOGIT_SCALE,                 hparams.f_logit_scale);

+                // MiniCPM uses rope by default, unlike Granite which uses it as a switch
+                hparams.rope_finetuned = true;
+
                switch (hparams.n_layer) {
                    case 52: type = LLM_TYPE_1B; break;
                    case 40: type = LLM_TYPE_2B; break;