llama.cpp CUDA dev !!OM2Fp6Fn93S
08/13/24(Tue)08:26:50 No.101867659 >>101867565
>I can learn what these tokens are for the model from its model card?
Assuming you have an instruct model, it has some instruct format like
USER: how 2 download car
ASSISTANT: You can't download cars.
You would set "USER:" as the stop string so that, when the model decides the text should be continued by more user input, the program returns control to you.
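A minimal sketch of how a frontend can do that check (the function and names here are just illustration, not the llama.cpp API):

#include <string>
#include <vector>

// Returns true if the generated text ends with any of the stop strings.
// A real frontend would also hold back tokens that could be the start
// of a stop string (partial suffix matches) before showing them.
static bool hit_stop_string(const std::string & output,
                            const std::vector<std::string> & stop_strings) {
    for (const std::string & s : stop_strings) {
        if (output.size() >= s.size() &&
            output.compare(output.size() - s.size(), s.size(), s) == 0) {
            return true;
        }
    }
    return false;
}

// In the generation loop, roughly:
//   while (!hit_stop_string(output, {"USER:"})) { output += next_token(); }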
>Kobold has something about instruct tag preset, which I think handles automatically for some models?
I don't know how koboldcpp specifically handles this, but the instruct format (if there is one) is saved in the model files.
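Concretely, GGUF models store it as a Jinja template under the metadata key tokenizer.chat_template. Roughly how you could read it with the llama.cpp C API (names as of recent versions, check the current llama.h):

#include <cstdio>
#include "llama.h"

int main(int argc, char ** argv) {
    if (argc < 2) { return 1; }
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.vocab_only = true; // no need to load the weights just for metadata
    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (!model) { return 1; }

    char buf[8192];
    const int32_t n = llama_model_meta_val_str(model, "tokenizer.chat_template", buf, sizeof(buf));
    if (n >= 0) {
        printf("chat template:\n%s\n", buf);
    } else {
        printf("no chat template in the model file\n");
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}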
>So I can fix that by increasing repetition penalty, correct?
"Fix" is, I would say, too strong a word.
As I said, it's an unsolved problem.
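For reference, what the knob actually does is rescale the logits of tokens that already appeared in the context. A sketch of the common CTRL-style variant (not llama.cpp's exact code):

#include <cstdint>
#include <unordered_set>
#include <vector>

// Penalize tokens that already occurred in the recent context.
// penalty > 1.0 makes repeats less likely; 1.0 disables the penalty.
static void apply_repetition_penalty(std::vector<float> & logits,
                                     const std::vector<int32_t> & last_tokens,
                                     float penalty) {
    const std::unordered_set<int32_t> seen(last_tokens.begin(), last_tokens.end());
    for (const int32_t tok : seen) {
        float & l = logits[tok];
        // Dividing a positive logit and multiplying a negative one
        // both push that token's probability down.
        l = l > 0.0f ? l / penalty : l * penalty;
    }
}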
>But wouldn't that also affect how the model responds in general, like increased propensity towards changing topics?
Yes, you basically have a tradeoff between precise and diverse model responses.
One major parameter to tune here is the so-called temperature: higher values mean more randomness in how the model picks the next token.
With temperature 0 the model always picks the most likely next token.
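Mechanically, temperature is just a division of the logits before the softmax. A sketch (not llama.cpp's actual sampler):

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Sample a token index from logits at the given temperature.
// temp -> 0 degenerates to argmax (always the most likely token);
// higher temp flattens the distribution, i.e. more randomness.
static int sample_with_temperature(const std::vector<float> & logits,
                                   float temp, std::mt19937 & rng) {
    if (temp <= 0.0f) {
        return (int) (std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    const float max_l = *std::max_element(logits.begin(), logits.end());
    std::vector<float> weights(logits.size());
    for (size_t i = 0; i < logits.size(); ++i) {
        // subtract the max for numerical stability; divide by temperature
        weights[i] = std::exp((logits[i] - max_l) / temp);
    }
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}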