What is this black magic?
I'm trying an i1 Q4_K_M quant, and it's giving responses in just 10 seconds instead of the 5+ minutes the old Q4 quants took.
What the fuck is going on? How is this possible?
llama_print_timings: load time = 1478.94 ms
llama_print_timings: sample time = 223.15 ms / 150 runs ( 1.49 ms per token, 672.18 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 9270.37 ms / 150 runs ( 61.80 ms per token, 16.18 tokens per second)
llama_print_timings: total time = 9718.07 ms / 151 tokens
Output generated in 9.95 seconds (15.08 tokens/s, 150 tokens, context 1728, seed 481338642)
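
For anyone who wants to sanity-check those numbers, here is a rough sketch that redoes the arithmetic from the log. The eval time, token count and wall-clock time are copied from the output above. The 300-second figure for the old quant is only my "5+ minutes" estimate, and it assumes the old replies were the same length, so treat the speedup as a ballpark rather than a measurement.

```python
# Figures copied from the timing log above.
eval_ms = 9270.37   # llama_print_timings: eval time
n_tokens = 150      # tokens generated in this run
wall_s = 9.95       # "Output generated in 9.95 seconds"

ms_per_token = eval_ms / n_tokens   # per-token generation cost
eval_tps = 1000.0 / ms_per_token    # tokens/s counting eval time only
wall_tps = n_tokens / wall_s        # tokens/s counting the whole request

# Assumption (not in the log): the old quant took 5 minutes
# for a reply of the same length.
old_wall_s = 5 * 60
speedup = old_wall_s / wall_s

print(f"{ms_per_token:.2f} ms/token, {eval_tps:.2f} tokens/s (eval only)")
print(f"{wall_tps:.2f} tokens/s (wall clock)")
print(f"roughly {speedup:.0f}x faster than a 5-minute reply")
```

The eval-only rate matches the 16.18 tokens/s in the log, and the wall-clock rate matches the 15.08 tokens/s on the last line. The small gap between the two is the time spent on sampling and everything else outside the eval loop.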