>>108766573
Works on my machine, but only for code. Free 1.5-3x performance with Qwen 3.6 27b q8 in Cline. Didn't add --spec-draft-n-max though.
llama-server.exe ^
-m "T:\models\Qwen3.6-27B-MTP-Q8_0-.gguf" ^
--threads 10 ^
--threads-batch 18 ^
--tensor-split 24,17 ^
--n-gpu-layers 999 ^
--ubatch-size 1024 ^
--ctx-size 150000 ^
--parallel 1 ^
--ctx-checkpoints 64 ^
--checkpoint-every-n-tokens 8192 ^
--reasoning on ^
--spec-type mtp ^
--no-mmap
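The 1.5-3x range is roughly what the standard speculative decoding math predicts: if the draft (here the MTP head) proposes n tokens per verification pass and each one is accepted with probability alpha, the target model emits (1 - alpha^(n+1)) / (1 - alpha) tokens per pass instead of 1. Quick sketch; the alpha and n values below are illustrative, not measured from this setup:

```python
def expected_tokens_per_pass(alpha: float, n: int) -> float:
    # Expected tokens emitted per target-model verification pass when
    # drafting n tokens with per-token acceptance rate alpha.
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^n.
    return (1 - alpha ** (n + 1)) / (1 - alpha)

# Coding output is very predictable, so acceptance rates run high.
for alpha in (0.6, 0.8, 0.9):
    for n in (2, 4):
        print(f"alpha={alpha}, n={n}: "
              f"{expected_tokens_per_pass(alpha, n):.2f} tokens/pass")
```

High acceptance on code and low acceptance on free-form prose is also why this helps in Cline but not much elsewhere: when the draft keeps getting rejected, you pay for the extra forward passes and gain little.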