>>108743686
I vibeslopped it myself since I couldn't find any open-source hardware monitoring software that displays graphs and the information I need while running on that server and being accessible over the network from my main PC.
>>108743866
Nice, that's quite a performance gain indeed and seems pretty good for agentic usage.
I had to slim down the mmproj and context a bit, but here's my entry in mymodels.ini:
model = /home/LLM/google_gemma-4-31B-it-Q8_0.gguf
ngl = 99
c = 60084
port = 12345
a = Google_Gemma-4_31B-it-Q8_0-reasoning_specdec_26B-A4B
fa = true
mlock = true
no-mmap = false
reasoning = true
keep = -1
np = 1
kvu = true
cache-type-k = q8_0
cache-type-v = q8_0
mmproj = /home/LLM/mmproj-google_gemma-4-31B-it-bf16.gguf
no-mmproj-offload = true
model-draft = /home/LLM/google_gemma-4-26B-A4B-it-IQ2_XXS.gguf
ngld = 99
draft-min = 0
draft-max = 16
cache-type-k-draft = q8_0
cache-type-v-draft = q8_0
ctx-size-draft = 60084
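
If you want to sanity-check a preset like this before launching, here's a rough Python sketch. Assumptions, all mine: the `[gemma]` section header is a placeholder added only so `configparser` will read the fragment, `check_preset` is a made-up helper, and the three rules (matching context sizes, draft-min not above draft-max, matching cache types) are just my own consistency preferences, not anything llama.cpp enforces as far as I know.

```python
import configparser

# Subset of the keys from the preset above. The [gemma] section name is a
# placeholder (assumption) so that configparser accepts the fragment.
PRESET = """
[gemma]
c = 60084
ngl = 99
ngld = 99
draft-min = 0
draft-max = 16
cache-type-k = q8_0
cache-type-v = q8_0
cache-type-k-draft = q8_0
cache-type-v-draft = q8_0
ctx-size-draft = 60084
"""


def check_preset(text, section="gemma"):
    """Return a list of human-readable problems found in the preset text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser[section]
    problems = []

    # Personal rule: give the draft model the same context as the main model.
    if cfg.getint("c") != cfg.getint("ctx-size-draft"):
        problems.append("main and draft context sizes differ")

    # The draft window only makes sense if min <= max.
    if cfg.getint("draft-min") > cfg.getint("draft-max"):
        problems.append("draft-min is larger than draft-max")

    # Personal rule: keep KV cache quantization identical on both models.
    for side in ("k", "v"):
        if cfg[f"cache-type-{side}"] != cfg[f"cache-type-{side}-draft"]:
            problems.append(f"cache-type-{side} differs between main and draft")

    return problems


print(check_preset(PRESET) or "looks consistent")
```

To check the real file, read it with `parser.read(path)` and pass whatever section name it actually uses instead of the placeholder.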