>>109186188
Thanks.
I settled on the smallest version for this model, with
~/BH/llama.cpp/build/bin/llama-server \
--model ~/CB/models/gemma-4-26B-A4B-heretic-APEX-I-Mini.gguf \
-ngl 999 \
-ncmoe 12 \
-c 122880 \
-np 1 \
-fa on \
-ctk q8_0 \
-ctv q8_0 \
--no-mmap \
--mlock \
-b 1024 \
-ub 256 \
--host 0.0.0.0 \
--port 8080
Getting 60 t/s so far.
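For anyone comparing numbers: t/s here is just generated tokens divided by wall-clock generation time. A rough sketch of that arithmetic, assuming you already have a token count and an elapsed time in milliseconds from somewhere. `tps` is a made-up helper name, not anything shipped with llama.cpp, and the numbers in the example call are illustrative, not measurements.

```shell
# tps TOKENS ELAPSED_MS
# Hypothetical helper (not part of llama.cpp): prints tokens per second
# with one decimal place, given a token count and elapsed milliseconds.
tps() {
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n / (ms / 1000) }'
}

# Illustrative, made-up input: 600 generated tokens in 10000 ms.
tps 600 10000   # prints 60.0
```

Prompt processing and generation speeds differ a lot, so only compare generation against generation.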