>>101460948
For me, prompt processing on Llama 3 70B is 882 T/s with exllama vs 630 T/s with llama.cpp; generation speed is nearly identical (13.01 vs 12.44 T/s).
>Llama-3-70B-Instruct-Q4_K_M:
prompt eval time = 12179.96 ms / 7679 tokens ( 1.59 ms per token, 630.46 tokens per second)
generation eval time = 2090.56 ms / 26 runs ( 80.41 ms per token, 12.44 tokens per second)
>Llama-3-70B-Instruct-4.65bpw:
11 tokens generated in 9.53 seconds (Queue: 0.0 s, Process: 0 cached tokens and 7667 new tokens at 882.84 T/s, Generate: 13.01 T/s, Context: 7667 tokens)
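Sanity-checking the arithmetic: throughput is just token count over wall time. A quick Python sketch using only the numbers from the logs above (variable names are my own):

# llama.cpp prompt processing: 7679 tokens in 12179.96 ms
pp_llamacpp = 7679 / (12179.96 / 1000)   # ~630.5 T/s, matches the log
# exllama: 9.53 s total for 7667 prompt tokens + 11 generated at 13.01 T/s,
# so prompt processing took roughly 9.53 - 11/13.01 ~ 8.68 s
pp_exllama = 7667 / (9.53 - 11 / 13.01)  # ~882.8 T/s, matches the log
print(f"llama.cpp {pp_llamacpp:.1f} T/s vs exllama {pp_exllama:.1f} T/s")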