Can I trouble you with a technical question? I am trying to get the llama.cpp server to work with a LLaVA model. I had a working solution using an older version of llama.cpp (and an older LLaVA) that used a different syntax: I supplied both a base LLM and an mmproj file on the command line, like:
.\server.exe -m ".\vicuna-13b-v1.5-16k.Q5_K_M.gguf" --mmproj .\mmproj-model-f16.gguf --host 127.0.0.1 --port 8080 --n-gpu-layers 100
However, --mmproj no longer appears to exist; the current llama.cpp documentation doesn't mention it. I have downloaded llava-v1.6-mistral-7b.Q5_K_M.gguf and am now trying to get it working. Based on some Google searching, I am using the following to run the server:
.\server.exe -m ".\llava-v1.6-mistral-7b.Q5_K_M.gguf" -c 4096 --host 127.0.0.1 --port 8080 --n-gpu-layers 100
Although it loads and appears to run, it completely misreads every image I send it, to the point where it describes everything as either a desktop background or a person standing in front of a mirror taking a selfie, regardless of the actual image content. For reference, here is the Python snippet that builds the request parameters. I took them from a llama.cpp GitHub discussion, but I've tried plenty of other options too. What confuses me is that before I had to supply a separate LLM, whereas now it seems like LLaVA has been wrapped up inside the Mistral model? I admit I don't understand what is different:
parameters = {
    "temperature": 0.1,       # low temperature for near-deterministic output
    "repeat_penalty": 1.0,
    "top_k": 40,
    "top_p": 0.95,
    "n_predict": 300,         # max tokens to generate
    "prompt": prompt,
    "cache_prompt": True,
    "image_data": image_data  # base64-encoded image(s) for the server
}
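In case it helps, here is a minimal sketch of how I am encoding the image and posting the request. The file path and prompt text are just placeholders; I am following the [img-<id>] marker convention from the llama.cpp server README, where each entry in image_data carries an id that the prompt references:

import base64
import requests

# Placeholder path; my real script reads whatever image I am testing with.
with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Each image gets an id, referenced in the prompt as [img-<id>].
image_data = [{"data": image_b64, "id": 1}]
prompt = "USER:[img-1] Describe the image in detail.\nASSISTANT:"

# Same dict as above, shown inline so this snippet is self-contained.
parameters = {
    "temperature": 0.1,
    "repeat_penalty": 1.0,
    "top_k": 40,
    "top_p": 0.95,
    "n_predict": 300,
    "prompt": prompt,
    "cache_prompt": True,
    "image_data": image_data
}

response = requests.post("http://127.0.0.1:8080/completion", json=parameters)
print(response.json()["content"])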