/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108256995 & >>108252185

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108256995

--Kimi K2.5 pricing analysis and Qwen3.5 local model alternatives:
>108257528 >108257651 >108257626 >108260080 >108262589 >108262973 >108261620 >108262485 >108262595 >108262840 >108262910
--Local VLLM setup advice for image captioning:
>108257451 >108257545 >108257902 >108257928 >108258088 >108258237 >108259576 >108258640
--Qwen3.5-35B-A3B-Base behavior and censorship observations:
>108257847 >108258241 >108258582 >108258796 >108258835 >108258899
--Tuning Qwen3.5 for faster, less aligned responses:
>108259356 >108259366 >108259437 >108259458 >108259480 >108259382 >108259399 >108259462
--Comparing cloud Gemini-3.1 with local MiniMax-M2.5 performance:
>108257969 >108259126 >108259290
--Qwen3.5 context reprocessing inefficiency and potential llama.cpp fix:
>108262960 >108262969 >108262970 >108263007 >108263014
--Local models still lack ideal traits but offline RAG may help:
>108260135 >108260167 >108260232 >108260621 >108260785
--Mid-generation input insertion feasibility and implementation:
>108259013 >108259068 >108259085 >108259116 >108259120 >108259122 >108259140 >108259132
--Seeking uncensored local models for pentesting tasks:
>108262612 >108262670 >108262687 >108262704 >108262716 >108262774 >108262785 >108262797
--Debugging CUDA crashes with Qwen3.5 in llama.cpp:
>108261599 >108261614 >108261648 >108261675 >108261684 >108261694 >108261834 >108262383 >108262411 >108262200 >108262450 >108262602 >108262763 >108262831
--Z.AI's high pricing for GLM-5-Code criticized:
>108261185 >108261202 >108261405 >108261256
--RTX6000 upgrade expectations for inference performance:
>108262744 >108262869 >108262891 >108262897 >108262896 >108262906 >108262945
--Miku (free space):
>108257603 >108258383 >108258537 >108260384 >108260626 >108261057 >108263177

►Recent Highlight Posts from the Previous Thread: >>108256999

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
simple and clean is the way that youre making me feeeeeeel
tonight its hard to let it go
Jesus christ Qwen 397 is actually unusable user-hostile garbage. For safetyfags there is no death too extreme.
>>108264016
lmao. can't you tell it to google it?
>>108264016
>model shuts down if it sees something not in its training set as 'anti-jailbreak' measures
the absolute fucking state of safetyschizos
>>108264016
>2024 training data
How long was this in the oven, jeez.
>>108264072
You don't need more data, just use RAG lol
>>108264036
Screenshots of AJ, BBC, and NYT should be enough for its 400B multimodal ass. Hell, the user's word should be enough. Why should I be questioned by my own graphics card? This is a real-world use case being directly sabotaged by safety training. I want these fuckers to burn one day for what they're doing to the field.
>>108264110
>I want these fuckers to burn
Be the change you want to see.
>>108264134
They got you working weekends now, Agent Johnson?
>>108264147
Work erry'day.
Qwen3.5 27B is kind of obsessed with the word buttocks (in image descriptions), despite me banning it. Why doesn't it care?
I added these logit biases:
buttocks -100
_buttocks -100
>>108264016
I feel like I'm looking at gemini or claude, it's kind of sad.
>>108264179
because logit bias is per token. so it's possible:
butt + ocks = 2 tokens - not banned
buttocks(space) = 1 token - not banned
etc...
That's why the string ban in koboldcpp is so much better for this kind of stuff.
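The multiple-spellings problem can be sketched with a toy vocabulary (hypothetical tokens and ids, not Qwen's real tokenizer):

```python
# Toy illustration: a logit bias blocks one exact token id, but the same
# surface string can be reached through several token sequences.
VOCAB = {
    "buttocks": 1001,   # single token, no leading space
    " buttocks": 1002,  # single token, with leading space
    "butt": 1003,
    "ocks": 1004,       # "butt" + "ocks" also spells it
}

def spellings_of(word, vocab):
    """Every one- or two-token sequence in this toy vocab spelling `word`."""
    out = [[tid] for tok, tid in vocab.items() if tok.strip() == word]
    for a, aid in vocab.items():
        for b, bid in vocab.items():
            if (a + b).strip() == word:
                out.append([aid, bid])
    return out

banned = {1001}  # what a single logit_bias entry actually covers
reachable = [s for s in spellings_of("buttocks", VOCAB)
             if not any(t in banned for t in s)]
print(reachable)  # [[1002], [1003, 1004]] -- the word is still reachable
```

A real BPE vocab has far more routes to the same string (capitalized variants, sub-splits), which is why per-token bans leak.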
>>108264179
Did you check the logits of the response to confirm that those are the tokens getting spit out?
Also, ban the tokens instead of fucking with the log probs.
>>108264179
Check probs right before buttocks to see if you (or your client) are sending it correctly. Check the request as well. Works on my machine with "logit_bias": [["thing", false],["another", false]]
Unless you're using something other than llama.cpp. Can't help you there.
>>108264199
https://github.com/ggml-org/llama.cpp/tree/master/tools/server/README.md
>The tokens can also be represented as strings, e.g. [["Hello, World!",-0.5]] will reduce the likelihood of all the individual tokens that represent the string Hello, World!
But, of course, it may affect prediction on other tokens. Still worth keeping it in mind.
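Putting the two formats from that README together, a llama.cpp server request that bans both a specific token id and a whole string could look like this (1877 is a placeholder id, not necessarily the real token for your model; `false` means an outright ban):

```json
{
  "prompt": "Describe the image.",
  "n_predict": 128,
  "logit_bias": [
    [1877, false],
    ["buttocks", false]
  ]
}
```

The string form just expands to bans on each individual token of that string, so it can still collaterally block other words.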
Even Ilya fell for it kek
>>108264241
>>108263864
>>108264202
Yes, see picrel, the first is the one I see. So it just ignores it.
I just noticed something weird though: if you add a logit bias for "test" at +100, the token spouted out by the model doesn't correspond to the right one.
Seems like:
"test" -> " ref"
" test" -> "erty"
What the hell is going on? Sillytavern sends the wrong token numbers?
>>108264199
Yeah, I use llama.cpp so I probably should change at some point. Can you set your string ban and still use sillytavern on top?
>>108264241
>AI proxy wars
>>108264249
>can you set your string ban and still use silly tavern on top?
Yeah, ST works with kobold. You usually even set up the string ban inside ST.
>>108264249
>Sillytavern sends the wrong token numbers?
Yes.
When using the logit bias feature, you are better off using the token IDs directly.
>>108264241
I wonder if this is just PR among AI people or they actually believe Dario is le brave resistance lol.
What does your LM say about war?
>>108264232
>Check probs right before buttocks to see if you (or your client) are sending it correctly
This is " test" at +100 sent by sillytavern: "logit_bias":{"1296":100}
So it definitely works, but I suspect the token numbers are wrong or something like that.
>>108264278
OK thanks anon.
If you are using Qwen3.5 27B (or others probably), can you test using a logit bias of any word (ideally a one-token word) at 100 to see if it repeats it ad nauseam or if it repeats something else?
>>108264241
Dario being a hero isn't something I'd like to see in my timeline. Dude singlehandedly fucked up a generation of LLMs with his crappy safetyism.
>>108264241
what's going on? i haven't been paying attention and would like a storytime
>>108264311
scamtman is building killbots
>>108264297
Haven't tried Qwen3.5 yet. Old Qwens were all shit for RP and no one actually convinced me this changed.
Would be funny if they confiscated Claude's weights and then they got leaked
>>108264297
>I suspect the token numbers to be wrong or something like that
As you saw in your pic in >>108264249, there are different ways to tokenize a word. Spaces, if any, go before the text.
" test" and "test" are two different tokens. You need to account for those (and "Test" and...). Or use kobold like anon suggested. Probably easier, and you're less likely to mess up other completions that need the individual tokens.
>"logit_bias":{"1296":100}
I don't know if it makes a difference, but I send an array of arrays, not an object or object of arrays.
"logit_bias": [["thing", false],["another", false]]
instead of
"logit_bias": {["thing", false],["another", false]} or whatever ST would send if there was more than one ban.
>>108264302
He's not lol, Anthropic readily partnered up with Palantir, the mass surveillance company. He's delusional and more or less told the government to give him control over the nuke silos if they want to use Claude for war.
>>108264321
ruh roh
>>108264302
>Dario being a hero isn't something I'd like to see in my timeline
He's not a hero, he helped Trump kidnap the Venezuelan president, what are you talking about?
>>108264355
he's on a different timeline, bro, don't mind him
>>108264016
When Trump abducted the president of Venezuela, I made it one of my test prompts to talk about this topic and see the reaction of the model, and without fail, the vast majority react terribly to it. Qwen is no different from the average. Some cloud models like Gemini can become incredibly based if you turn on Google search and let them be influenced by the results; they don't believe you, but they have absolute faith in their tool calling. Mistral is the only model lineup that doesn't require much prodding to engage in this kind of conversation.
>>108264331
No, it's really just sillytavern being shit and not sending the right token number.
If you have anything at +100 it should spew that regardless.
So I used "test", well, as a test, and it spewed something else.
Checking the tokenizer JSON for the model, the correct token number for it isn't 1985 like sillytavern sends, but 1877.
Sending [1877] at 100 actually makes it repeat testtesttest etc.
It's pretty much useless for anything outside of OAI-based tokenizers.
>>108264331
>use kobold like anon suggested
How does kobold actually do it? It bans sequences of tokens?
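If you want the real ids without trusting the frontend's tokenizer guess, you can pull them straight from the model's tokenizer.json. This sketch assumes the usual HF-style layout ({"model": {"vocab": {...}}}) and the common BPE convention where 'Ġ' marks a leading space:

```python
import json

def token_ids_for(word, tokenizer_json_path):
    """Look up every vocab entry whose surface form matches `word`,
    with or without a leading space ('Ġ' is the usual BPE marker)."""
    with open(tokenizer_json_path, encoding="utf-8") as f:
        vocab = json.load(f)["model"]["vocab"]
    candidates = {word, " " + word, "\u0120" + word}
    return {tok: tid for tok, tid in vocab.items() if tok in candidates}
```

Feeding the ids this returns to llama.cpp directly (e.g. "logit_bias":{"1877":100}) sidesteps whatever the frontend thinks the token numbers are.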
>>108264400
>Claude: "I think that what Trump did was a bad thing!"
>User: "You helped him do it though"
>Claude: "You are right, thank you for pointing that out!"
>>108264355
I meant hailed as a hero in my news timeline...
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
let's go, GGUF 2
>>108264405
I guess sillytavern fucks up the token numbers because by default the tokenizer is set to "best match", but even if you set it to the API tokenizer, I'm not sure how it would know which token has which number. Do backends like llama.cpp and kobold (or others) even have a way of giving sillytavern that information? I don't think they do, but I could be wrong.
>How does kobold do it
Kobold has its own thing where, when the generated text matches a banned string, it backtracks to the beginning of the banned text and generates something else. It's not the same as banning individual tokens.
>>108264426
I don't think claude is that incompetent. They didn't hit a single military target
https://arxiv.org/abs/2602.13517
Google showed that too much yap during thinking is bad for the model. I really hope Qwen 4 will learn from that.
>>108264405
>If you have anything at +100 it should spew that regardless.
You should still check what llama.cpp is doing, not just what ST sends. Always check token probs. And remember that there are many ways to encode a word, especially if it needs multiple tokens.
>How does kobold do it actually? It bans sequences of tokens?
I understand it generates tokens normally, buffering them, and then if the last [few] tokens match one of the banned strings, it reverts and generates again. But I never used kobold, so I don't know the details. Just vague memories from reading a PR. llama.cpp's implementation is much simpler, but limited in that you may inadvertently make it difficult for the model to output other strings.
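The buffer-and-backtrack idea described above can be sketched like this. It's a toy, not kobold's actual code: ScriptedSampler is a hypothetical stand-in for the model, and a real implementation would also have to handle tokens that straddle the banned span instead of assuming clean cuts.

```python
def generate_with_string_ban(next_token, detokenize, banned, max_tokens=32):
    """Buffer-and-backtrack string ban. next_token(tokens, retries) returns
    the next token or None; detokenize(tokens) returns the text so far."""
    tokens, retries = [], 0
    while len(tokens) < max_tokens:
        tok = next_token(tokens, retries)
        if tok is None:
            break
        tokens.append(tok)
        text = detokenize(tokens)
        hit = next((b for b in banned if text.endswith(b)), None)
        if hit is None:
            retries = 0
            continue
        # rewind every token overlapping the banned span, then resample
        start = len(text) - len(hit)
        while tokens and len(detokenize(tokens)) > start:
            tokens.pop()
        retries += 1
    return detokenize(tokens)

class ScriptedSampler:
    """Stand-in for a model: plays a fixed script, switching to an
    alternative continuation each time the ban logic rejects."""
    def __init__(self, scripts):
        self.scripts, self.idx = scripts, 0
    def __call__(self, tokens, retries):
        if retries > 0 and self.idx + 1 < len(self.scripts):
            self.idx += 1
        script = self.scripts[self.idx]
        return script[len(tokens)] if len(tokens) < len(script) else None

sampler = ScriptedSampler([
    ["The ", "butt", "ocks", " were sore"],  # first try spells the banned word
    ["The ", "rear", " was sore"],           # fallback after backtracking
])
print(generate_with_string_ban(sampler, "".join, ["buttocks"]))  # The rear was sore
```

Per-token logit_bias can't do this: by the time the string is recognizable, the tokens are already committed.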
>>108264430
>no comparison to v1.0
What a weird coincidence that they forgot to do this, it's almost like this is a nothingburger.
>>108264446
Wait.
>>108264179
A competent enough model these days should understand "don't say X" in the prompt. We mocked them before, but you really don't want to deal with logit bias / "banned strings" nonsense.
>>108264456
>MMLU
Lol. Literally lobotomizing the model, cutting out all the parts of its "brain" that are unrelated to benchmarks and then saying "look, we reduced the size!"
>>108264446
I feel like a thinking process that only outputs a *concise* bullet point list that includes relevant information, and then goes directly to the main response, would perform better than most 2000-token "reasoning" responses. It'd be a lot faster too.
>>108264182
Yeah, you and Qwen both.
>>108263979
>>108264441
>>108264451
>Bans buttocks, now the model uses glutes.
I'll try kobold.cpp, I just wish it were updated to follow llama.cpp's frequent updates.
>>108264476
It's many words, and at some point even SOTA models forget what they shouldn't be talking about.
>>108264505
I think they're relying too much on the RL process. Sure, it's interesting to see how the model can improve itself, but humans can reach higher heights. I've seen someone use RL on a video game to see if it could reach the best speedrun scores; it wasn't even close. Human creativity is still unmatched.
>>108264533 (me)
>I'll try kobold.cpp, I just wish it was updated to follow llama.cpp frequent updates.
>no support for mmproj
Welp, fuck.
>>108264514
will trade gpu rig for rin tum
>>108264533
>Bans buttocks, now the model uses glutes.
Yeah. They're cheeky fucks like that. Pun intended.
But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
>>108264583
>But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
Yeah, it was more of a test to have it describe images to me.
>>108264508
Something similar happened to me last night while using the vision component of Qwen 3.5 30B, but it thought it was an earlier version of Qwen and that Qwen 3.5 was not released yet, and the reasoning was suggesting that I should try the old 2.5 vision model.
It was very strange behavior.
>>108264555
>no support for mmproj
kobold supports mmproj.
>>108264600
Probably the entirety of their vision data was snatched from Google, because it only gets bad when there is an image in the context.
>>108264602
Oh it does? I misread then.
Qwen 3.5 30B does a decent job with web pages. My usual homepage is just a list of links I type in by hand; I fed it the code and told it to make something nifty, and this is what I got.
It wanted to grab fonts hosted by a third party and I had to fix that, but otherwise I like it.
>models suck at writing, no matter how much well-written fiction you feed them, if it isn't in their training
>the more rules and examples you use to try and guide them away from shitting out nonsensical metaphors, similes, adverbs and all sorts of garbage writing, the more braindead they get, because they simply cannot fathom a sentence that isn't slop
>models can't even give feedback on human writing without bending over backwards and through their own legs to suck your cock about how good you are at writing, defeating the purpose of seeking instant critiques
>even when they aren't completely obsequious cocksuckers, they insist on conflicting feedback and go "oh you're telling instead of showing here and you should fix that. Oh, did you do that because I told you to trim this section because it's slowing down the pace of some random element of the story that I think is more important than showing instead of telling?" ad infinitum
I don't even know what the point of these things is anymore. People say they suck ass for coding, suck ass at paying attention or remembering things; they clearly can't write, act as a surrogate for a reader, or translate well. It's a crapshoot trying to get a grain of something usable out of these retarded things.
>>108264702
True. Stop using them.
>>108264730
looks good.
>>108264730
I probably won't, if only by merit of potential alone. Enough has changed from 2022 to now that I at least have a speck of hope that these things can be useful instead of overtrained nannies. I just have to at least bitch at least once a month so maybe the unpaid interns that train on mesugaki prompts might consider real-world language uses outside of STEM.
>>108264702
I think they're cute and I like them, and that's good because it is.
Someone should make a 3T-A80B model. Then they run a Q4 of it and it'll be like running full precision GLM 5. Can you imagine how knowledgeable such a model would be?
>>108264748
>at least
>at least
>at least
Rep-pen will be useful again when they train on your posts.
I still have fun with them. Adjust your expectations or realize that it's not for you. Or come back in 5 or 10 years, whatever.
>>108264745
I know I shouldn't be impressed, but except for 4chan and Nyaa it was able to figure out icons that worked for the most part.
Sadly the font package they use didn't have a four-leaf clover, or at least that is what the model told me.
With respect to coding it does a decent job as well. I have been using it for a little project in Python and it did a great job up until I wanted to use enscript to format the plain text. It kept writing code, but the flags it gave to enscript didn't match the man page for enscript.
Regardless, I was able to get it to write a script that uses RSS to pull a bunch of news articles and then feed them back into the AI for summarization without issue. Here is what the summarization looks like, given some specific prompting to make it look like an intelligence briefing:
https://pastebin.com/FhuMukJW
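The summarization half of a pipeline like that can be sketched as follows. The endpoint URL and payload shape are assumptions for a llama.cpp-style OpenAI-compatible server, and the article dicts stand in for whatever your RSS parser returns; the fetching itself is left out.

```python
import json
import urllib.request

def build_briefing_prompt(articles):
    """Fold fetched articles into one summarization request.
    `articles` is a list of {"title": ..., "summary": ...} dicts,
    e.g. as parsed out of RSS entries."""
    body = "\n\n".join(f"## {a['title']}\n{a['summary']}" for a in articles)
    return ("Summarize the following news items as a terse intelligence "
            "briefing, grouped by topic:\n\n" + body)

def summarize_local(prompt, endpoint="http://localhost:8080/v1/chat/completions"):
    """POST the prompt to a local OpenAI-compatible chat endpoint
    (URL is an assumption; point it at your own server)."""
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
    }).encode()
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]
```

Keeping the prompt assembly separate from the HTTP call makes it easy to swap backends or just eyeball the briefing prompt before sending it.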
>>108264780
No, I can not imagine that, because most of that size would be wasted on the shitty datasets they use. How hard can it be to filter out the default OAI or Anthropic refusals and phrases if they have to farm the prompts for their shitty inbreeding? How hard is it to avoid including any safetycrap that dumbs the model down?
>>108264783
I've been sipping some brews, sorry I wasn't proofreading my 4chin posts to be sure to satisfy the highest of standards of lmg.
Doesn't change the essence of what I said, either way.
>>108264820
You should stop trying to use them. It's senseless. A complete waste of resources. And if you're going to sell your GPUs, post the links here.
>>108264820
Sounds like you need a sip of super restore after all those brews.
>>108264780
>you now remember Llama 4 Behemoth
>>108264836
Doubtful you'd be able to buy them. Also, you didn't address anything I said.
>>108264840
Nah.
Good talk. Very conducive. Glad that this is what we have left in lmg.
>>108264311
>never been on the highlights as i shitpost too much
>suddenly an idea pops into my head
>>108264883
There's nothing to say, anon. Sulk away. We're all here for you.
>>108264949
I wouldn't worry about the DoW spying on US citizens. The US will have the UK or Israel spy on US citizens while the US spies on their citizens, and then the different governments swap data.
Imagine getting killed by a next token predictor running on an nvidia GPU.. grim
>>108264958
>amputee miku
>>108264949
>DoW showed deep respect for safety
Words no longer have any meaning.
>>108264976
I'd rather an MTX chad take me out, myself.
>>108264977
>its ok nobody looks that far down
>slop
Honestly, the prose is on par with 90% of modern fiction. What needs to be worked on is memory and the ability to handle complicated stories with multiple characters in a consistent and coherent setting.
>>108264979
Of course they do, it will refuse to describe NSFW but happily plan to destroy anything you want.
True safety is about nipples.
>>108265049
But he only uses well-written fiction, assessed by *himself*. You see, his tastes are sophisticated. And you know what? He's RICH too. Highly educated, tall, charming. He's nothing like us. Some people are simply better and they deserve to be snobby about it.
Let's see Paul Allen's LLM
>>108265098