>>102394761
For comparison:
08:44:02-643094 INFO Loaded "Dracones_Midnight-Miqu-70B-v1.5_exl2_4.0bpw" in 19.40 seconds.
08:44:02-644208 INFO LOADER: "ExLlamav2_HF"
08:44:02-645219 INFO TRUNCATION LENGTH: 16384
08:44:02-646098 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
Output generated in 9.07 seconds (14.88 tokens/s, 135 tokens, context 12, seed 1859634060)
Output generated in 2.77 seconds (11.54 tokens/s, 32 tokens, context 157, seed 570974459)
Output generated in 3.24 seconds (15.45 tokens/s, 50 tokens, context 157, seed 2012484294)
Output generated in 3.76 seconds (13.82 tokens/s, 52 tokens, context 218, seed 275116343)
Output generated in 31.88 seconds (16.06 tokens/s, 512 tokens, context 281, seed 1587517452)
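Side note: the tokens/s figure ooba prints is just tokens divided by generation time, so you can recheck the numbers above yourself. A minimal sketch (the seconds in the log are rounded, so the recomputed values can be off by a few hundredths):

```python
# Recompute throughput from the (seconds, tokens) pairs in the log above.
# Displayed seconds are rounded, so results only approximately match the log.
runs = [
    (9.07, 135),
    (2.77, 32),
    (3.24, 50),
    (3.76, 52),
    (31.88, 512),
]

for secs, toks in runs:
    print(f"{toks / secs:.2f} tokens/s")

# Overall throughput across all five generations
total_secs = sum(s for s, _ in runs)
total_toks = sum(t for _, t in runs)
print(f"overall: {total_toks / total_secs:.2f} tokens/s")
```

Worth noting the 512-token run sustains ~16 t/s, so the slower short runs are mostly per-request overhead, not the model.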
Previously I had issues loading llamacpp on two 3090s, but now it seems to work fine. Maybe the ooba update fixed it.
>>102394789
I use Mistral Large 2.75bpw exl2, 16k context. There's an anon who thinks I'm a fool for doing that. Let's hear what he has to say.