When I run
./build/bin/llama-server -m "/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00001-of-00002.gguf" -c 32768 --n-cpu-moe 33 --n-gpu-layers 99
I get:
gguf_init_from_file: failed to open GGUF file '/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00002-of-00002.gguf'
llama_model_load: error loading model: llama_model_loader: failed to load GGUF split from /mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00002-of-00002.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: failed to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.10 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) (0000:01:00.0) - 31588 MiB free
gguf_init_from_file: failed to open GGUF file '/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00002-of-00002.gguf'
llama_model_load: error loading model: llama_model_loader: failed to load GGUF split from /mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00002-of-00002.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00001-of-00002.gguf'
srv load_model: failed to load model, '/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00001-of-00002.gguf'
srv operator(): operator(): cleaning up before exit...
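The loader opens the first shard fine but fails on the second split file it references. A quick sanity check of that path, assuming the filename from the log above (a valid GGUF file should exist, be readable, and start with the 4-byte magic "GGUF"):

```shell
# Path taken from the error log above
f=/mnt/nvsam2tb/llm/GLM-4.5-Air-Q5_K_M-00002-of-00002.gguf
if [ -r "$f" ]; then
    # Print the first 4 bytes; a valid GGUF file begins with the magic "GGUF"
    head -c 4 "$f"; echo
else
    echo "cannot read $f"
fi
```

If this prints "cannot read", the second split is missing, misnamed, or has wrong permissions; if it prints something other than "GGUF", the download of that shard is likely corrupt or incomplete.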