/g/ - Technology

File: 1751519593478255.png (3.14 MB, 1288x1728)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108256995 & >>108252185

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miku work.png (347 KB, 512x512)
►Recent Highlights from the Previous Thread: >>108256995

--Kimi K2.5 pricing analysis and Qwen3.5 local model alternatives:
>108257528 >108257651 >108257626 >108260080 >108262589 >108262973 >108261620 >108262485 >108262595 >108262840 >108262910
--Local VLLM setup advice for image captioning:
>108257451 >108257545 >108257902 >108257928 >108258088 >108258237 >108259576 >108258640
--Qwen3.5-35B-A3B-Base behavior and censorship observations:
>108257847 >108258241 >108258582 >108258796 >108258835 >108258899
--Tuning Qwen3.5 for faster, less aligned responses:
>108259356 >108259366 >108259437 >108259458 >108259480 >108259382 >108259399 >108259462
--Comparing cloud Gemini-3.1 with local MiniMax-M2.5 performance:
>108257969 >108259126 >108259290
--Qwen3.5 context reprocessing inefficiency and potential llama.cpp fix:
>108262960 >108262969 >108262970 >108263007 >108263014
--Local models still lack ideal traits but offline RAG may help:
>108260135 >108260167 >108260232 >108260621 >108260785
--Mid-generation input insertion feasibility and implementation:
>108259013 >108259068 >108259085 >108259116 >108259120 >108259122 >108259140 >108259132
--Seeking uncensored local models for pentesting tasks:
>108262612 >108262670 >108262687 >108262704 >108262716 >108262774 >108262785 >108262797
--Debugging CUDA crashes with Qwen3.5 in llama.cpp:
>108261599 >108261614 >108261648 >108261675 >108261684 >108261694 >108261834 >108262383 >108262411 >108262200 >108262450 >108262602 >108262763 >108262831
--Z.AI's high pricing for GLM-5-Code criticized:
>108261185 >108261202 >108261405 >108261256
--RTX6000 upgrade expectations for inference performance:
>108262744 >108262869 >108262891 >108262897 >108262896 >108262906 >108262945
--Miku (free space):
>108257603 >108258383 >108258537 >108260384 >108260626 >108261057 >108263177

►Recent Highlight Posts from the Previous Thread: >>108256999

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
simple
and
clean

is the way
that
youre making me

feeeeeeel
tonight

its hard to let it

go
>>
File: 1772307162133.png (68 KB, 1076x506)
Jesus christ Qwen 397 is actually unusable user-hostile garbage. For safetyfags there is no death too extreme.
>>
>>108264016
lmao. can't you tell it to google it?
>>
>>108264016
>model shuts down if it sees something not in its training set as 'anti-jailbreak' measures
the absolute fucking state of safetyschizos
>>
>>108264016
>2024 training data
How long was this in the oven, jeez.
>>
>>108264072
You don't need more data just use rag lol
>>
>>108264036
Screenshots of AJ, BBC, and NYT should be enough for its 400B multimodal ass. Hell, the user's word should be enough. Why should I be questioned by my own graphics card? This is a real-world use case being directly sabotaged by safety training. I want these fuckers to burn one day for what they're doing to the field.
>>
>>108264110
>I want these fuckers to burn
Be the change you want to see.
>>
>>108264134
They got you working weekends now, Agent Johnson?
>>
>>108264147
Work erry'day.
>>
Qwen3.5 27B is kind of obsessed with the word buttocks (in image descriptions) despite me banning it. Why doesn't it care?
I added these logit biases:
buttocks -100
_buttocks -100
>>
>>108264016
I feel like I'm looking at gemini or claude, it's kind of sad.
>>
>>108264179
because logit bias is per token. so it's possible
butt + ocks = 2 tokens - not banned
buttocks(space) = 1 token - not banned
etc...
That's why the string ban in koboldcpp is so much better for this kind of stuff.
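The per-token mismatch above is easy to see with a toy tokenizer. This is a sketch with a made-up vocabulary and IDs, not Qwen's real tokenizer, but the failure mode is the same: each surface form of a word maps to its own token sequence, and a bias on one ID misses the rest.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a toy vocabulary."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return ids

# Made-up token IDs for illustration only.
vocab = {"buttocks": 7, " buttocks": 8, "butt": 1, "ocks": 2, " ": 3}
banned = {vocab["buttocks"]}  # a -100 logit bias hits only this one ID

for form in ["buttocks", " buttocks"]:
    ids = tokenize(form, vocab)
    status = "banned" if banned & set(ids) else "slips through"
    print(f"{form!r} -> {ids} ({status})")
```

The space-prefixed form (and capitalized, hyphenated, multi-token forms) sails right past the single banned ID, which is why string-level banning handles this better.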
>>
>>108264179
Did you check the logits of the response to confirm that those are the tokens getting spit out?
Also, ban the tokens instead of fucking with the log probs.
>>
>>108264179
Check probs right before buttocks to see if you (or your client) are sending it correctly. Check the request as well. Works on my machine with "logit_bias": [["thing", false],["another", false]]
Unless you're using something other than llama.cpp. Can't help you there.
>>108264199
https://github.com/ggml-org/llama.cpp/tree/master/tools/server/README.md
>The tokens can also be represented as strings, e.g. [["Hello, World!",-0.5]] will reduce the likelihood of all the individual tokens that represent the string Hello, World!
But, of course, it may affect prediction on other tokens. Still worth keeping it in mind.
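Per the README quoted above, a request using the string form of logit_bias could look like this. A minimal sketch: the endpoint and port are llama-server defaults, and the prompt is a placeholder; adjust for your setup.

```python
import json

def build_payload(prompt, banned_strings):
    """Build a llama.cpp /completion payload that bans strings outright.

    Entries are [token_or_string, bias]; a bias of false bans the
    token(s), while a float like -0.5 merely discourages them.
    """
    return {
        "prompt": prompt,
        "n_predict": 128,
        "logit_bias": [[s, False] for s in banned_strings],
    }

payload = build_payload("Describe the image.", ["buttocks", " buttocks"])
print(json.dumps(payload))
# To actually send it (requires a running llama-server):
# requests.post("http://127.0.0.1:8080/completion", json=payload)
```

Note that you have to list the surface variants (leading space, capitalization) yourself; the server expands each string into its own token sequence.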
>>
Even Ilya fell for it kek
>>
>>108264241
>>108263864
>>
File: file.png (5 KB, 237x125)
>>108264202
Yes, see picrel, the first is the one I see. So it just ignores it.
I just noticed something weird though: if you add the logit bias "test" at +100, it doesn't correspond to the right token being spouted out by the model.

Seems like :
"test" -> " ref"
" test" -> "erty"

What the hell is going on?
Sillytavern sends the wrong token numbers?

>>108264199
Yeah, I use llama.cpp so I probably should change at some point. Can you set your string ban and still use SillyTavern on top?
>>
>>108264241
>AI proxy wars
>>
File: stringban.png (188 KB, 429x504)
>>108264249
>can you set your string ban and still use silly tavern on top?
Yeah, ST works with kobold. You usually even set up the string ban inside ST.
>>
>>108264249
>Sillytavern sends the wrong token numbers?
Yes.
When using the logit bias feature, you are better off using the token IDs directly.
>>
>>108264241
I wonder if this is just PR among AI people or if they actually believe Dario is le brave resistance lol.
>>
What does your LM say about war?
>>
>>108264232
>Check probs right before buttocks to see if you (or your client) are sending it correctly
This is " test" at +100 sent by SillyTavern: "logit_bias":{"1296":100}
So it definitely works, but I suspect the token numbers to be wrong or something like that.

>>108264278
OK thanks anon.
If you are using Qwen3.5 27B (or others probably), can you test using a logit bias of any word (ideally one token word) at 100 to see if it repeats it ad nauseam or if it repeats something else?
>>
>>108264241
Dario being a hero isn't something I'd like to see in my timeline. Dude singlehandedly fucked up a generation of LLMs with his crappy safetyism.
>>
>>108264241
what's going on? i haven't been paying attention and would like a storytime
>>
>>108264311
scamtman is building killbots
>>
>>108264297
Haven't tried Qwen3.5 yet. Old Qwens were all shit for RP and no one has actually convinced me this changed.
>>
Would be funny if they confiscated Claude's weights and then they got leaked
>>
>>108264297
>I suspect the token numbers to be wrong or something like that
As you saw in your pic in >>108264249, there are different ways to tokenize a word. Spaces, if any, go before the text. " test" and "test" are two different tokens. You need to account for those (and "Test" and...). Or use kobold like anon suggested. Probably easier, and you're less likely to mess up other completions that need the individual tokens.
>"logit_bias":{"1296":100}
I don't know if it makes a difference, but I send an array of arrays, not an object or object of arrays.
"logit_bias": [["thing", false],["another", false]]
instead of
"logit_bias": {["thing", false],["another", false]} or whatever st would send if there was more than one ban.
>>
>>108264302
He's not lol, Anthropic readily partnered up with Palantir, the mass surveillance company. He's delusional and more or less told the government to give him control over the nuke silos if they want to use Claude for war.
>>
>>108264321
ruh roh
>>
>>108264302
>Dario being a hero isn't something I'd like to see in my timeline
he's not a hero, he helped Trump kidnap the Venezuelan president, what are you talking about?
>>
>>108264355
he's on a different timeline, bro, don't mind him
>>
>>108264016
When Trump abducted the president of Venezuela, I made it one of my test prompts to talk about this topic and see the reaction of the model, and without fail, the vast majority react terribly to it; Qwen is no different from the average. Some cloud models like Gemini can become incredibly based if you turn on Google Search and let them be influenced by the results; they don't believe you, but they have absolute faith in their tool calling.
Mistral is the only model lineup that doesn't require much prodding to engage in this kind of conversation.
>>
>>108264331
No it's really just sillytavern being shit and not sending the right token number.
If you have anything at +100 it should spew that regardless.
So I used "test", well, as a test, and it spewed something else.
Now checking with the tokenizer json for the model, the correct token number for it isn't 1985 like sillytavern sends, but 1877.
Sending [1877] at 100 actually makes it repeat testtesttest etc.
It's pretty much useless for anything outside of OAI-based tokenizers.

>>108264331
>use kobold like anon suggested
How does kobold do it actually? Does it ban sequences of tokens?
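To check whose token numbers are right, you can look the IDs up at the source instead of trusting the frontend. A sketch, assuming a HuggingFace-style BPE tokenizer.json, where a leading space is usually stored as a "Ġ" prefix (verify this for your model):

```python
import json

def vocab_id(tokenizer_json_path, token):
    """Look up a token string's ID in a HF-style tokenizer.json."""
    with open(tokenizer_json_path) as f:
        vocab = json.load(f)["model"]["vocab"]
    return vocab.get(token)  # None if the string isn't a single token

# In most BPE vocabs the ID for " test" lives under "Ġtest", not " test":
# vocab_id("tokenizer.json", "Ġtest")
#
# Or ask the running backend directly; llama-server exposes /tokenize:
# requests.post("http://127.0.0.1:8080/tokenize",
#               json={"content": " test"}).json()["tokens"]
```

Either route gives you the backend's own IDs, which you can then pass numerically in logit_bias and sidestep the frontend's tokenizer guessing entirely.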
>>
File: 1747381184106913.png (580 KB, 1232x848)
>>108264400
>Claude: "I think that what Trump did was a bad thing!"
>User: "You helped him did it though"
>Claude: "You are right, thank you for pointing out!"
>>
>>108264355
I meant hailed as a hero in my news timeline...
>>
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
let's go, GGUF 2
>>
>>108264405
I guess sillytavern fucks up the token numbers because by default the tokenizer is set to "best match", but even if you set it to API tokenizer I'm not sure how it would know which token would have which number. Do backends like llama.cpp and kobold (or others) even have a way of giving sillytavern that information? I don't think they do, but I could be wrong.
>How does kobold do it
Kobold has their own thing where the model sees the banned text, backtracks to the beginning of the banned text, and generates something else. It's not the same as banning individual tokens.
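That backtracking idea can be sketched like this. It's a toy, not koboldcpp's actual code: next_token and detok are hypothetical stand-ins for the model and tokenizer, and a real implementation buffers tokens rather than rescanning decoded text.

```python
def generate_with_string_ban(next_token, detok, banned, max_len=32):
    """next_token(pos, avoid) picks a token not in `avoid`;
    detok(tokens) joins tokens into text.
    When a banned string appears, rewind to where it began and resample."""
    out, avoid = [], {}
    while len(out) < max_len:
        out.append(next_token(len(out), avoid.get(len(out), set())))
        text = detok(out)
        hit = next((b for b in banned if b in text), None)
        if hit:
            # pop tokens until the banned string is gone, blacklisting
            # each popped token at its position for the retry
            while out and hit in detok(out):
                avoid.setdefault(len(out) - 1, set()).add(out[-1])
                out.pop()
    return detok(out)

# Toy model: always wants "x" unless told to avoid it; banned string "xx".
result = generate_with_string_ban(
    lambda pos, avoid: "y" if "x" in avoid else "x",
    "".join, ["xx"], max_len=4)
print(result)  # alternates instead of producing a run of x's
```

The upside over per-token logit bias is that the ban applies to the rendered text, so every tokenization of the banned string is caught; the cost is regenerating the rewound tokens.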
>>
>>108264426
I don't think claude is that incompetent. They didn't hit a single military target
>>
File: 1753125369482735.png (460 KB, 2025x1362)
https://arxiv.org/abs/2602.13517
Google showed that too much yap during thinking is bad for the model, I really hope Qwen 4 will learn from that
>>
>>108264405
>If you have anything at +100 it should spew that regardless.
You should still check what llama.cpp is doing, not just what ST sends. Always check token probs. And remember that there are many ways to encode a word, especially if it needs multiple tokens.
>How does kobold do it actually? Does it ban sequences of tokens?
I understand it generates tokens normally, buffering them, and then if the last [few] tokens match one of the banned strings, it reverts and generates again. But I never used kobold, so I don't know the details. Just vague memories from reading a PR. llama.cpp's implementation is much simpler, but limited in that you may inadvertently make it difficult for the model to output other strings.
>>
File: image.jpg (481 KB, 2304x1260)
>>108264430
>no comparison to v1.0
What a weird coincidence that they forgot to do this, it's almost like this is a nothingburger.
>>
>>108264446
Wait.
>>
File: 20240116.jpg (99 KB, 800x600)
>>108264179
A competent enough model these days should understand "don't say X" in the prompt. We mocked them before, but you really don't want to deal with logit bias / "banned strings" nonsense
>>
>>108264456
>MMLU
Lol. Literally lobotomizing the model, cutting out all the parts of its "brain" that are unrelated to benchmarks and then saying "look we reduced the size!"
>>
>>108264446
I feel like a thinking process that only outputs a *concise* bullet point list that includes relevant information, and then goes directly to the main response, would perform better than most 2000-token "reasoning" responses. It'd be a lot faster too.
>>
File: 1772311354970.png (44 KB, 908x362)
>>108264182
Yeah you and Qwen both.
>>
File: 1763111176687835.jpg (583 KB, 1634x1817)
>>108263979
>>
>>108264441
>>108264451
>Bans buttocks, now the model uses glutes.
I'll try kobold.cpp, I just wish it were updated to follow llama.cpp's frequent updates.

>>108264476
It's many words, and at some point even SOTA models forget what they shouldn't be talking about.
>>
>>108264505
I think they're relying too much on the RL process. Sure, it's interesting to see how the model can improve itself, but humans can reach greater heights. I've seen someone use RL on a video game to see if it could reach the best speedrun scores; it wasn't even close. Human creativity is still unmatched.
>>
>>108264533 (me)
>I'll try kobold.cpp, I just wish it was updated to follow llama.cpp frequent updates.
>no support for mmproj
Welp, fuck.
>>
>>108264514
will trade gpu rig for rin tum
>>
>>108264533
>Bans buttocks, now the model uses glutes.
Yeah. They're cheeky fucks like that. Pun intended.
But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
>>
>>108264583
>But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
Yeah it was more of a test to have it describe images to me.
>>
>>108264508
Something similar happened to me last night while using the vision component of Qwen 3.5 30B, but it thought it was an earlier version of Qwen and that Qwen 3.5 was not released yet, and the reasoning suggested I should try the old 2.5 vision model.
It was very strange behavior.
>>
>>108264555
>no support for mmproj
kobold supports mmproj.
>>
>>108264600
Probably the entirety of their vision data was snatched from Google, because it only gets bad when there is an image in the context.
>>
>>108264602
Oh it does? I misread then.
>>
File: 1762371559174792.png (176 KB, 1515x1651)
Qwen 3.5 30B does a decent job with web pages. My usual homepage is just a list of links I type in by hand; I fed it the code and told it to make something nifty, and this is what I got.
It wanted to grab fonts hosted by a third party and I had to fix that, but otherwise I like it.
>>
>models suck at writing, no matter how much you feed them well-written fiction if it isn't in their training
>the more rules and examples you use to try and guide them to not shit out nonsensical metaphors, similes, adverbs and all sorts of garbage writing renders them braindead because they simply cannot fathom a sentence that isn't slop
>models can't even give feedback on human writing without either bending over backwards and through their own legs to suck your cock about how good you are at writing, defeating the purpose of seeking instant critiques
>even when they aren't completely obsequious cocksuckers, they insist on conflicting feedback and go "oh you're telling instead of showing here and you should fix that. Oh, did you do that because I told you to trim this section because it's slowing down the pace of some random element of the story that I think is more important than showing instead of telling?" ad infinitum
I don't even know what the point of these things is anymore. People say they suck ass for coding, suck ass at paying attention or remembering things, and they clearly can't write, act as a surrogate for a reader, or translate well. It's a crapshoot trying to get a grain of something usable out of these retarded things
>>
>>108264702
True. Stop using them.
>>
>>108264690
looks good.
>>
>>108264730
I probably won't, if only on the merit of potential alone. Enough has changed from 2022 to now that I at least have a speck of hope that these things can be useful instead of overtrained nannies. I just have to bitch at least once a month so maybe the unpaid interns that train on mesugaki prompts might consider real-world language uses outside of STEM
>>
>>108264702
I think they're cute and I like them and that's good because it is
>>
Someone should make a 3T-A80B model. Then they run a Q4 of it and it'll be like running full precision GLM 5. Can you imagine how knowledgeable such a model would be?
>>
>>108264748
>at least
>at least
>at least
Rep-pen will be useful again when they train on your posts.
I still have fun with them. Adjust your expectations or realize that it's not for you. Or come back in 5 or 10 years, whatever.
>>
>>108264745
I know I shouldn't be impressed, but except for 4chan and Nyaa it was able to figure out icons that worked for the most part.
Sadly the font package it uses didn't have a four-leaf clover, or at least that's what the model told me.

With respect to coding it does a decent job as well. I have been using it for a little project in Python and it did a great job up until I wanted to use enscript to format the plain text.
It kept writing code, but the flags it gave to enscript didn't match enscript's man page.

Regardless, I was able to get it to write a script that uses RSS to pull a bunch of news articles and then feeds them back into the AI for summarization without issue.
Here is what the summarization looks like, given some specific prompting to make it look like an intelligence briefing:
https://pastebin.com/FhuMukJW
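That RSS-to-summary loop can be sketched with the stdlib alone. A rough sketch, not the anon's actual script: the feed URL, endpoint, and prompt wording are placeholders.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text, limit=10):
    """Pull (title, link) pairs out of an RSS 2.0 feed."""
    items = []
    for item in ET.fromstring(xml_text).iter("item"):
        items.append((item.findtext("title", ""), item.findtext("link", "")))
        if len(items) >= limit:
            break
    return items

def briefing_prompt(items):
    """Turn headlines into a briefing-style prompt for the model."""
    lines = "\n".join(f"- {title} ({link})" for title, link in items)
    return ("Summarize the following headlines as a terse, "
            "intelligence-briefing-style report:\n" + lines)

# Fetching and summarizing would then look something like (placeholder
# URLs; assumes a local OpenAI-compatible server such as llama-server):
# import urllib.request
# xml_text = urllib.request.urlopen("https://example.com/rss").read()
# prompt = briefing_prompt(parse_rss(xml_text))
# ...POST prompt to http://127.0.0.1:8080/v1/chat/completions
```

Fetching full article bodies rather than just titles would give the model more to work with, at the cost of burning context per article.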
>>
>>108264780
No, I can't imagine that, because most of that size would be wasted on the shitty datasets they use. How hard can it be to filter the default OAI or Anthropic refusals and phrases if they have to farm the prompts for their shitty inbreeding? How hard is it to avoid including any safetycrap that dumbs the model down?
>>
>>108264783
I've been sipping some brews, sorry I wasn't proofreading my 4chin posts to be sure to satisfy the highest of standards of lmg
Doesn't change the essence of what I said, either way.
>>
>>108264820
You should stop trying to use them. It's senseless. A complete waste of resources. And if you're going to sell your gpus, post the links here.
>>
>>108264820
Sounds like you need a sip of super restore after all those brews.
>>
File: fligu-migu.png (85 KB, 296x256)
>>108264780
>you now remember Llama 4 Behemoth
>>
>>108264836
Doubtful you'd be able to buy them; also, you didn't address anything I said
>>108264840
Nah.
Good talk. Very conducive. Glad that this is what we have left in lmg
>>
File: 1746176772801983.png (457 KB, 1266x1644)
>>108264311
>>
File: 874483870.jpg (901 KB, 1600x1200)
> never been on the highlights as I shitpost too much
> suddenly an idea pops into my head
>>
>>108264883
There's nothing to say, anon. Sulk away. We're all here for you.
>>
>>108264949
I wouldn't worry about the DoW spying on US citizens. The US will have the UK or Israel spy on US citizens while the US spies on their citizens and then the different governments swap data.
>>
Imagine getting killed by a next token predictor running on an nvidia GPU.. grim
>>
>>108264958
>amputee miku
>>
>>108264949
>DoW showed deep respect for safety

Words no longer have any meaning.
>>
>>108264976
I'd rather an MTX chad take me out, myself
>>
>>108264977
> its ok nobody looks that far down
>>
>slop
Honestly the prose is on par with 90% of modern fiction. What needs to be worked on is memory and the ability to handle complicated stories with multiple characters in a consistent and coherent setting.
>>
>>108264979
Of course they do, it will refuse to describe nsfw but happily plan to destroy anything you want.
True safety is about nipples.
>>
>>108265049
But he only uses well-written fiction, assessed by *himself*. You see. His tastes are sophisticated. And you know what? He's RICH too. Highly educated, tall, charming. He's nothing like us. Some people are simply better and they deserve to be snobby about it.
>>
Let's see Paul Allen's LLM
>>
File: paulallen.png (1.12 MB, 1568x974)
>>108265098



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.