>>108421377
i was trying to be lazy; they're basically identical now
ROCm + rocWMMA FA:
./llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 99 --n-cpu-moe 33 -t 48 -fa 1 --mmap 0
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, VRAM: 24560 MiB
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | pp512 | 263.73 ± 1.08 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | tg128 | 13.61 ± 0.19 |
Vulkan:
(づ◡﹏◡)づ [llama.cpp]$ ./llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 99 --n-cpu-moe 33 -t 48 -fa 1 --mmap 0
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 24560 MiB):
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, VRAM: 24560 MiB
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm,Vulkan | 99 | 48 | 1 | 0 | pp512 | 263.77 ± 1.18 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm,Vulkan | 99 | 48 | 1 | 0 | tg128 | 13.70 ± 0.05 |
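for anyone wanting to reproduce the comparison, a rough sketch of the two build configs (option names are llama.cpp's cmake flags as i remember them; double-check against your checkout's docs before trusting them):

```shell
# hedged sketch, not verified against your exact llama.cpp revision.

# ROCm (HIP) backend with rocWMMA flash attention for gfx1100 (7900 XTX):
cmake -B build-rocm -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON \
      -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm --config Release -j

# Vulkan backend (RADV in the run above):
cmake -B build-vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan --config Release -j
```

note the Vulkan run above still prints `ggml_cuda_init: found 1 ROCm devices` and reports backend `ROCm,Vulkan`, so that binary was apparently built with both backends enabled; build Vulkan-only if you want a clean single-backend number.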