| model | test | t/s A16 -sm layer | t/s A16 -sm tensor | t/s RTX 3090 |
| ------------- | -------------: | ----------------: | -----------------: | -----------: |
| llama 8B Q4_0 | pp512 | 821.58 | 1487.54 | 5492.16 |
| llama 8B Q4_0 | tg128 | 37.43 | 87.36 | 153.10 |
| llama 8B Q4_0 | pp512 @ d32768 | 357.66 | 868.67 | 2122.05 |
| llama 8B Q4_0 | tg128 @ d32768 | 20.00 | 56.94 | 85.92 |
NVIDIA A16 performance is, I think, still a meme relative to the ~2000€ price.
>>108113530
The data-loading code is currently very bad, so loading a large model takes something like 30 minutes.
I'll test larger models once I've fixed that.
>>108113537