>>101509977
Single P40 test with 8-bit Nemo, power limit set to 160W.
Processing Prompt [BLAS] (6587 / 6587 tokens)
Generating (271 / 1024 tokens)
(EOS token triggered! ID:2)
CtxLimit:6858/65536, Amt:271/1024, Process:29.873s (4.5ms/T = 220.50T/s), Generate:21.598s (79.7ms/T = 12.55T/s), Total:51.471s (5.27T/s)
About 17.5GB VRAM use with FA. Without FA the context tops out at a bit over 32K. Not too shabby.
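If anyone wants to sanity-check the T/s figures in that log line, here's a quick sketch that recomputes them from the token counts and timings. The function and variable names are made up for illustration. Note the Total figure appears to be generated tokens divided by total wall time, which is an inference from the numbers and not something the log states.

```python
# Recompute the throughput figures from the pasted log line.
# Names here are hypothetical; the input numbers are copied from the log.

def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput in tokens per second."""
    return tokens / seconds

def ms_per_token(tokens: int, seconds: float) -> float:
    """Latency in milliseconds per token."""
    return seconds * 1000.0 / tokens

prompt_tokens, prompt_seconds = 6587, 29.873   # Process: 29.873s
gen_tokens, gen_seconds = 271, 21.598          # Generate: 21.598s
total_seconds = 51.471                         # Total: 51.471s

prompt_tps = tokens_per_second(prompt_tokens, prompt_seconds)
gen_tps = tokens_per_second(gen_tokens, gen_seconds)
# Assumption: the log's Total T/s counts only generated tokens over total time.
total_tps = tokens_per_second(gen_tokens, total_seconds)

print(f"Process:  {ms_per_token(prompt_tokens, prompt_seconds):.1f}ms/T = {prompt_tps:.2f}T/s")
print(f"Generate: {ms_per_token(gen_tokens, gen_seconds):.1f}ms/T = {gen_tps:.2f}T/s")
print(f"Total:    {total_tps:.2f}T/s")
```

All three rates come out matching the log after rounding to two decimals.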