Back to GPU split troubles with llama3 70B:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU has a total capacity of 23.68 GiB of which 499.81 MiB is free. Including non-PyTorch memory, this process has 23.19 GiB memory in use. Of the allocated memory 22.81 GiB is allocated by PyTorch, and 72.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
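For what it's worth, the allocator hint in the traceback is easy to try first, though only 72.56 MiB is reserved-but-unallocated here, so fragmentation probably isn't the real problem. If you do want to test it, the variable has to be set before torch makes its first CUDA allocation; a minimal sketch (the script is just an illustration, not exui's actual entry point):

```python
import os

# Must be set before torch initializes its CUDA caching allocator,
# i.e. before the first tensor lands on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

# First CUDA allocation picks up the allocator config set above.
x = torch.zeros(1, device="cuda")
print(torch.cuda.memory_allocated())
```

Or just export the variable in the shell before launching exui.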
I'm testing in exui with a fresh build of torch and exllamav2. Guess I need to turn off auto GPU split and set it manually?
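Probably, yeah. With 22.81 GiB already allocated by PyTorch on a 23.68 GiB card, the auto split is packing this GPU to the brim, and the transient 512 MiB allocation tips it over. exui exposes a GPU split field in the model settings (IIRC), which maps to exllamav2's `gpu_split` argument. A minimal sketch of the manual path via the bare Python API, assuming a two-GPU box; the model dir and GiB figures are placeholders:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config("/models/llama3-70b-exl2")  # hypothetical model dir
model = ExLlamaV2(config)

# gpu_split = per-device VRAM budgets in GiB, in device order.
# Undershoot a couple of GiB on the device(s) that will also hold
# the KV cache and activation buffers.
model.load(gpu_split=[21.0, 20.0])

cache = ExLlamaV2Cache(model)  # KV cache allocated after the weights
```

If I remember the API right, auto split inverts this flow: you create the cache lazily (`ExLlamaV2Cache(model, lazy=True)`) and call `model.load_autosplit(cache)`, which fills GPUs in order until the weights fit, so it can leave very little headroom on the last card it touches.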