>>109151456
I suppose I should share the warning messages from the shell:
0.02.783.255 W llama_model_loader: tensor overrides to CPU are used with mmap enabled - consider using --no-mmap for better performance
0.25.964.463 W llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
0.25.981.622 W sched_reserve: layer 0 is assigned to device CPU but the fused Gated Delta Net tensor is assigned to device CUDA0 (usually due to missing support)
0.25.981.623 W sched_reserve: fused Gated Delta Net (chunked) not supported, set to disabled
0.27.443.790 W srv load_model: speculative decoding will use checkpoints
0.27.443.804 W common_speculative_init: no implementations specified for speculative decoding
The last one is probably the clue, but what parameter do I need?