>>108151341
>sub 100t/s prompt processing
You must be holding it wrong.
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | CUDA | 99 | pp20000 | 2717.56 ± 14.52 |
| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | CUDA | 99 | tg128 | 55.08 ± 0.27 |
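
For scale, here is a rough sketch that turns those rates into wall-clock time. It assumes throughput stays constant over the whole run, which the table doesn't guarantee, and it treats "sub 100t/s" as exactly 100 t/s, so that figure is a best case. The function and variable names are made up for illustration.

```python
# Figures copied from the llama-bench table above.
PP_TOKENS, PP_TPS = 20000, 2717.56   # pp20000 row
TG_TOKENS, TG_TPS = 128, 55.08       # tg128 row


def seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to handle `tokens` at a steady rate (assumed constant)."""
    return tokens / tokens_per_second


pp_time = seconds(PP_TOKENS, PP_TPS)
tg_time = seconds(TG_TOKENS, TG_TPS)
# Assumption: "sub 100 t/s" taken as exactly 100 t/s, i.e. the best case for that setup.
slow_pp_time = seconds(PP_TOKENS, 100.0)

print(f"pp20000 at {PP_TPS} t/s: {pp_time:.1f} s")
print(f"tg128 at {TG_TPS} t/s: {tg_time:.1f} s")
print(f"pp20000 at 100 t/s: {slow_pp_time:.1f} s")
```

So the same 20000-token prompt is a few seconds here versus at least a few minutes at under 100 t/s.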