>>106763907
Please do. Q4 working really well might just be me deluding myself, but I really haven't noticed any instance where 4.5 at Q4 slipped up and Q8 didn't.
>>106763914
I'm currently using a really basic one on standard llama.cpp server. I'm not even making full use of my A6000 right now.
./llama-server --model ./zai-org_GLM-4.6-Q4_K_M-00001-of-00006.gguf --n-gpu-layers 99 -b 4096 -ub 4096 --override-tensor exps=CPU --parallel 1 --ctx-size 32000 -ctk f16 -ctv f16 -fa on --no-mmap --threads 32 --host 0.0.0.0 --port 5001
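If anyone wants to poke at it once it's up, llama-server exposes an OpenAI-compatible endpoint, so something like this works (a sketch assuming the server above is running on port 5001; the model/prompt values are just placeholders):

```shell
# JSON body for a chat completion request; llama-server only has the one
# loaded model, so the prompt/params are all that really matter here
BODY='{"messages":[{"role":"user","content":"hello"}],"max_tokens":64}'

# POST it to the OpenAI-compatible chat endpoint the server exposes
curl -s http://localhost:5001/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "$BODY"
```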