>>106933415
I've got it working now.
For some reason, only ~13GB of my 24GB of VRAM is used during these benches. Is that normal, or should I be looking to fully saturate it?
$MODEL = "G:\LLM\Models\GLM-4.6-smol-IQ2_KS\GLM-4.6-smol-IQ2_KS-00001-of-00003.gguf"
# === Run llama-bench with the GLM-4.6 settings: full GPU offload, MoE expert tensors kept on CPU (-ot exps=CPU), q8_0 KV cache, flash attention + fused MoE enabled ===
& .\llama-bench.exe `
-m $MODEL `
-mmp 0 `
-ngl 999 `
-p 128,512 `
-n 128,512 `
-b 4096 `
-ub 4096 `
-fa 1 `
-fmoe 1 `
-ctk q8_0 -ctv q8_0 `
-ot exps=CPU `
-t 20
Pause
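Since the comment mentions llama-server: a rough equivalent launch with the same settings would look something like this. Treat it as a sketch, not gospel. -fmoe and -ot exps=CPU assume an ik_llama.cpp build, the server spells some flags differently than the bench tool (-fa with no argument, --no-mmap instead of -mmp 0), and the -c 32768 context size plus host/port are placeholders I picked. Check llama-server --help on your build for the exact forms.
$MODEL = "G:\LLM\Models\GLM-4.6-smol-IQ2_KS\GLM-4.6-smol-IQ2_KS-00001-of-00003.gguf"
# === Serve the same model with matching offload / cache / thread settings ===
& .\llama-server.exe `
-m $MODEL `
--no-mmap `
-ngl 999 `
-c 32768 `
-b 4096 `
-ub 4096 `
-fa `
-fmoe `
-ctk q8_0 -ctv q8_0 `
-ot exps=CPU `
-t 20 `
--host 127.0.0.1 --port 8080
Pause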