>>109003556
That's odd. It does at least load for me, although I get a weird message at the beginning, and I don't know whether it means anything.
0.01.168.574 E llama_init_from_model: failed to initialize the context: Gemma4Assistant requires ctx_other to be set (this is normal during memory fitting)
0.01.214.852 W srv load_model: [spec] failed to measure draft model memory: failed to create llama_context from model
>>109003564
I tried every value from 1 to 4. 1 gave the best speed on my setup. 2 gave me about 21.6 t/s, 3 about 19.8, and 4 about 18.2.
>>109003568
Which ones?
>>109003623
It's all on my two GPUs. As I said, one of them is in a slow PCIe slot. I believe that also bottlenecks me when I try tensor parallel.