>>106904820
Quick question: if you were to see the following console output, do you think you would intuitively understand what it's telling you?
llama_params_fit_to_free_memory: projected memory use with initial parameters [MiB]:
llama_params_fit_to_free_memory: - ROCm0 (AMD Radeon Graphics): total=16304 used=39959 free=-24341
llama_params_fit_to_free_memory: - ROCm1 (AMD Radeon RX 6800): total=16368 used=42480 free=-26296
llama_params_fit_to_free_memory: - ROCm2 (AMD Instinct MI60 / MI50): total=32752 used=76200 free=-43626
llama_params_fit_to_free_memory: allocation projected to use too much memory to fulfill margin of 1024 MiB on all devices, need to reduce memory use by 97337 MiB
llama_params_fit_to_free_memory: context size reduced from 65536 to 4096 -> need 13440 MiB less memory
llama_params_fit_to_free_memory: with only dense weights in device memory there is a total surplus of 53432 MiB
llama_params_fit_to_free_memory: set to use 36 dense-only and 21 full GPU layers in total, projected memory use:
llama_params_fit_to_free_memory: - ROCm0 (AMD Radeon Graphics): 36 dense-only layers, 4 full layers, 13373 MiB used, 2244 MiB free
llama_params_fit_to_free_memory: - ROCm1 (AMD Radeon RX 6800): 0 dense-only layers, 5 full layers, 12983 MiB used, 3200 MiB free
llama_params_fit_to_free_memory: - ROCm2 (AMD Instinct MI60 / MI50): 0 dense-only layers, 12 full layers, 28598 MiB used, 3975 MiB free
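For context, here is roughly the kind of fitting heuristic those lines imply, sketched in C++. To be clear, this is not the actual llama.cpp code: the struct, the per-layer and per-token costs, the 4096 floor, and the greedy packing below are all made-up illustrations of the steps the log describes (project memory use against a per-device margin, shrink the context if the projection doesn't fit, then keep only dense weights on-device for as many layers as necessary and distribute the rest).

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct device {
    const char * name;
    int64_t      total_mib;     // physical memory
    int64_t      overhead_mib;  // compute buffers etc. that stay resident regardless
};

int main() {
    // hypothetical cost model -- none of these numbers come from llama.cpp
    const int64_t margin_mib      = 1024; // free memory to preserve per device
    const int64_t full_layer_mib  = 1600; // layer including its expert tensors
    const int64_t dense_layer_mib = 200;  // layer with experts kept in host memory
    const double  kv_mib_per_tok  = 0.22; // KV cache cost per context token
    const int     n_layers        = 57;

    std::vector<device> devs = {
        {"ROCm0", 16304, 2000},
        {"ROCm1", 16368, 2000},
        {"ROCm2", 32752, 4000},
    };

    // memory left for layer weights on one device at a given context size
    auto budget = [&](const device & d, int64_t n_ctx) {
        return d.total_mib - d.overhead_mib - margin_mib - (int64_t)(kv_mib_per_tok * n_ctx);
    };
    auto total_budget = [&](int64_t n_ctx) {
        int64_t sum = 0;
        for (const auto & d : devs) {
            sum += budget(d, n_ctx);
        }
        return sum;
    };

    // step 1: if the fully offloaded model does not fit with the requested
    // context, fall back to a (hypothetical) minimum context size
    int64_t n_ctx = 65536;
    if (total_budget(n_ctx) < n_layers * full_layer_mib) {
        n_ctx = 4096;
    }

    // step 2: everything beyond the dense-only footprint can be spent on
    // "upgrading" layers so that their expert tensors stay in device memory
    const int64_t surplus = total_budget(n_ctx) - n_layers * dense_layer_mib;
    const int64_t n_full  = std::min<int64_t>(n_layers,
                            std::max<int64_t>(0, surplus / (full_layer_mib - dense_layer_mib)));
    std::printf("n_ctx=%lld -> %lld dense-only and %lld full GPU layers in total\n",
                (long long) n_ctx, (long long) (n_layers - n_full), (long long) n_full);

    // step 3: greedily pack layers onto the devices, full layers first
    int64_t full_left = n_full, dense_left = n_layers - n_full;
    for (const auto & d : devs) {
        int64_t room = budget(d, n_ctx);
        const int64_t n_f = std::min(full_left, room / full_layer_mib);
        room      -= n_f * full_layer_mib;
        full_left -= n_f;
        const int64_t n_d = std::min(dense_left, room / dense_layer_mib);
        dense_left -= n_d;
        std::printf("%s: %lld dense-only layers, %lld full layers\n",
                    d.name, (long long) n_d, (long long) n_f);
    }
    return 0;
}

The "dense-only" vs "full" split presumably only matters for MoE models, where the expert tensors are what get pushed to host memory when the projection doesn't fit; the sketch just treats that as a fixed per-layer cost difference.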