>>108925172
Seems to be working great on CUDA so far, but vulkan splitting across both nvidia and amd cards (2x 3060 1x 9060xt) is segfaulting at model warmup, verbose doesnt seem to be giving any extra info but my command is
./llama-server -m ~/models/gguf/gemma-4-26B-A4B-it-heretic-ara.i1-Q4_K_M.gguf -mm ~/models/gguf/gemma-4-26B-A4B.mmproj-f16.gguf -t 16 -c 131072 -fa on --backend-sampling -ngl 99 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --reasoning off -np 2 -sm tensor --verbose