>>106577958
>>106577913
okay i think i got it. the llama console says it's offloading 48 layers to gpu, so i started --n-cpu-moe at 48 and lowered it as far as it would go while still launching without crashing, which ended up being 33
ill probably do some llama-bench runs tomorrow so i can see what the performance difference actually is
-ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on \
--no-mmap
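if you want to automate that trial-and-error, a rough sketch of the search: walk --n-cpu-moe down from the safe value until a launch fails, and keep the last value that worked. `try_launch` here is a stand-in you'd replace with a real short llama-server/llama-cli invocation (model path and everything inside it are hypothetical); the stand-in just pretends 33 is the limit so the loop is runnable on its own.

```shell
#!/bin/sh
# stand-in for a real launch attempt, e.g. something like:
#   llama-cli -m ./model.gguf -ngl 99 --n-cpu-moe "$1" -n 1 >/dev/null 2>&1
# (hypothetical command; here we just pretend values below 33 OOM)
try_launch() {
  [ "$1" -ge 33 ]
}

find_min_ncmoe() {
  # $1 = starting (known-safe) --n-cpu-moe value, e.g. the total layer count
  n=$1
  best=$1
  while [ "$n" -ge 0 ]; do
    if try_launch "$n"; then
      best=$n            # launched fine, remember it and push lower
      n=$((n - 1))
    else
      break              # lower values put even more on gpu, so stop at first failure
    fi
  done
  echo "$best"
}
```

with the stand-in above, `find_min_ncmoe 48` walks down and reports 33, same as doing it by hand.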
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: ROCm0 model buffer size = 17562.93 MiB
load_tensors: ROCm_Host model buffer size = 34892.00 MiB
load_tensors: CPU model buffer size = 254.38 MiB
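for the bench runs, something along these lines should work (model path is a placeholder, and -p/-n are just typical prompt/generation sizes; check your build's llama-bench --help, since flag support like --n-cpu-moe varies by version):

```shell
# hypothetical model path; -p/-n = prompt/generation token counts to benchmark
llama-bench -m ./model.gguf \
  -ngl 99 \
  -t 48 \
  -p 512 -n 128

# if your llama-bench build supports --n-cpu-moe, sweeping a few values
# in separate runs (e.g. 33 vs 40 vs 48) shows the actual perf difference
```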