Final results from trying to find good performance for google/gemma-4-31B-it with MTP at TP=2 [spoiler]in vLLM[/spoiler].
fp8 weights, fp8 kv cache, MTP 4
| test | t/s | ttfr (ms) |
|----------------:|-----------------:|-------------------:|
| pp2048 | 2291.16 ± 141.76 | 908.26 ± 57.30 |
| pp2048 @ d4096 | 1238.95 ± 13.10 | 4971.09 ± 52.81 |
| pp2048 @ d8192 | 1296.14 ± 2.19 | 7911.61 ± 13.32 |
| pp2048 @ d16384 | 1086.33 ± 4.07 | 16978.44 ± 63.85 |
| pp2048 @ d32768 | 826.55 ± 1.23 | 42133.18 ± 62.65 |
| pp2048 @ d65536 | 566.73 ± 1.23 | 119266.39 ± 258.69 |
| tg128 | 25.82 ± 2.92 | |
| tg128 @ d4096 | 19.90 ± 2.91 | |
| tg128 @ d8192 | 20.10 ± 0.68 | |
| tg128 @ d16384 | 18.72 ± 2.65 | |
| tg128 @ d32768 | 16.41 ± 3.01 | |
| tg128 @ d65536 | 9.83 ± 0.29 | |
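As a sanity check, the TTFR column lines up with the prefill t/s if you assume the t/s figure counts the whole context (depth d plus the 2048-token prompt), i.e. TTFR ≈ (d + 2048) / (t/s). That's my reading of the numbers, not something the benchmark output states. Here's a quick Python sketch that checks it against the rows above:

```python
# Rows copied from the table: (context depth d, prefill t/s, measured TTFR in ms).
ROWS = [
    (0, 2291.16, 908.26),
    (4096, 1238.95, 4971.09),
    (8192, 1296.14, 7911.61),
    (16384, 1086.33, 16978.44),
    (32768, 826.55, 42133.18),
    (65536, 566.73, 119266.39),
]
PROMPT_TOKENS = 2048  # the "pp2048" prompt length


def estimated_ttfr_ms(depth: int, prefill_tps: float) -> float:
    """Estimate TTFR, ASSUMING the reported t/s covers depth + prompt tokens."""
    return (depth + PROMPT_TOKENS) / prefill_tps * 1000.0


for depth, tps, measured in ROWS:
    est = estimated_ttfr_ms(depth, tps)
    print(f"d={depth:>6}: estimated {est:9.0f} ms, measured {measured:9.0f} ms")
```

Every row agrees to within about 2%. At d65536, that means roughly two minutes of prefill before the first token shows up.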
Good enough for creative writing/RP use cases, so I think I'll stop here.