am i doing something wrong? (the answer is yes, i'm sure)
i'm getting like 1.5 tok/s from GLM-4.7. i assume i'm missing some sort of MoE flag?
./llama-server \
--model GLM-4.7.Q4_K_M.gguf \
--ctx-size 8192 \
--n-gpu-layers 13 \
--batch-size 512 \
-t 32 \
--temp 1.0 \
--top-p 0.95 \
--min-p 0.01 \
--host 0.0.0.0 \
--port 8033 \
--jinja \
--mlock
or is it just *that* slow?