couldn't resist pulling the reasoner budget, it's a nice way to cut qwen chatter
https://files.catbox.moe/ng0m1w.patch
here's the patch I'm going to maintain to unslop some of it, along with a vulgar hack to strip the surrounding quotes "" from reasoning-budget-message, because it just so happens that if you have this
reasoning-budget-message = "Reasoning budget exceeded, let's write the answer."
in your presets.ini, it will actually fucking use the quotes and insert them as part of the message when the reasoning budget triggers. It only happens when the arg is read from presets.ini while running llama-server in router mode, not when you pass the --reasoning-budget-message flag on the CLI. This one is more the router's fault than pwilkin's code: they haven't put much effort into the ini parsing, and keeping the value verbatim is desirable for passing json objects like
chat-template-kwargs = { "enable_thinking": true }
in your ini
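rough sketch of what the quote hack amounts to, in python for readability. This is not the code from the patch, and strip_outer_quotes is a made-up name. It assumes the ini value arrives as a raw string and only drops one pair of double quotes that wraps the whole value, so a json object passes through untouched:

```python
def strip_outer_quotes(value: str) -> str:
    """Drop one pair of double quotes wrapping the whole value, if present.

    Hypothetical helper for illustration; not the actual patch code.
    """
    v = value.strip()
    # Only strip when the value both starts and ends with a double quote.
    if len(v) >= 2 and v[0] == '"' and v[-1] == '"':
        return v[1:-1]
    return v


# Raw values as an ini parser that keeps quotes verbatim would hand them over.
raw_message = '"Reasoning budget exceeded, let\'s write the answer."'
raw_kwargs = '{ "enable_thinking": true }'

clean_message = strip_outer_quotes(raw_message)
clean_kwargs = strip_outer_quotes(raw_kwargs)
```

the json value starts with { and ends with }, so the check leaves it alone. That's the reason to apply this only to the message arg instead of stripping quotes in the ini parser globally.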
I also add extra newlines before the message is inserted. It would be very dumb to default to inserting it the "I am thinReasoning budget exceeded" way, pretty sure that would damage the model output
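to illustrate the difference, a toy python sketch. The function name and the two-newline separator are my assumptions here, the patch may use a different count:

```python
def append_budget_message(partial_reasoning: str, message: str,
                          separator: str = "\n\n") -> str:
    """Append the budget message after cut-off reasoning text.

    Hypothetical helper; the separator default is an assumption.
    """
    return partial_reasoning + separator + message


cut_off = "I am thin"  # reasoning truncated mid-word by the budget
msg = "Reasoning budget exceeded, let's write the answer."

glued = append_budget_message(cut_off, msg, separator="")
separated = append_budget_message(cut_off, msg)
```

glued is the "I am thinReasoning budget exceeded" mess, separated puts the message on its own line so it's at least visibly split from the half-finished word.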
anyway, the router passing the literal " through reminds me that many of those vibers don't test a fucking thing for real before they hit the push button