I'm trying to use Gemma 4 on Silly Tavern but I can't figure out how to turn on the reasoning with llama.server
These are my settings (3060 12GB + 32GB Ram), I'm not sure if they're the ideal ones.
google_gemma-4-E4B-it-Q8_0.gguf `
--no-host `
--mlock `
--fit on `
--fit-target 512 `
--n-cpu-moe 30 `
--parallel 1 `
--cache-type-k q4_0 `
--cache-type-v q4_0 `
--flash-attn on `
--ctx-size 8192 `
--threads 12 `
--batch-size 512 `
--ubatch-size 256 `
--swa-checkpoints 3 `
--reasoning on `
--reasoning-budget 300 `
--reasoning-budget-message "[Reasoning limit reached, formulating final response...]" `
--gpu-layers all