/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108238051

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
how do i use claude locally?
>>108241330Hack into their servers and download the weights.
>>108241344can you ask deepseek for that pls
>>108241321
>Liquid AI releases LFM2-24B-A2B
Has anyone used this? How do they compare to the new Qwen models? I'm assuming they're worse, but are they more censored?
So this is the power of API users
hows qwen3.5-35b-a3b? would be running it on a 7900 XTX and 32gb of ram
>>108241376
lmao, you have to be suicidal to give an AI full control over your computer, a single mistake can destroy everything
>>108241375
>How do they compare to the new Qwen models
the new models? it doesn't even compare to the 2507 4B. Yes, the 4B. It has even less knowledge, extremely bad multilingual performance, and it's another model where you just have to ask why it exists. If you really wanted a ~20B-ish MoE you would literally be better off with GPT-OSS 20B over this piece of shit.
Why won't oogabooga fucking update!MOMIEEEEEE
>>108241436Is there any specific reason you are using that?
>>108241446he is a boomer who never moved on
I need help. OpenClaw needs too many tokens but I also want a good model to use. I can't use tiny models as personal assistants, they'll just delete my emails
>>108241388It would run well
>>108241446
What would you use on Fedora linux? I want an all-in-one installer with llama built in
>>108241455buy 10 mac minis
>>108241455
I have an instance of claw review everything the "worker" one does before it goes through. worked so far, it caught it doing shit many times and stopped it.
>>108241477Does koboldcpp not work? Because that's the easiest one to use - single executable that you just give the model as input when you run it
>>108241497its banned in my country
>>108241469seems to run pretty well yeah, probably gonna use this as my main model for now
>>108241488That's better than having one big model?
when you walk away you dont hear me say please
oh baby dont go
>>108241526
Goof to know you're still around KH anon.
>>108241515
100%, because one is focused on finding mistakes of the other. you can even run a smaller model to handle that.
cute names
https://www.reddit.com/r/LocalLLaMA/comments/1rechcr/comment/o7da1jc/
>I've honestly found that the 35B beats the old Qwen3-235B almost across the board. It feels like a much larger model than it really is. Only advantage the old 235B has now is general knowledge - 35B-A3B is better in every way otherwise in my testing.
I have a hard time believing that. Did they really cook?
>>108241541
i dont have 35gb of vram
>>108241534AI really does need middle management...
>>108241538He talks like a tranny
>>108241563>He
>>108241563Takes one to know one
Is there anything I can do with 10GB VRAM+64GB DDR4 (Windows 11 btw) or should I just stick to Gemini? Obviously token generation won't exactly be anything speedy regardless, but I don't want to have to leave and do other shit while I wait for a response so big ass dense 70B+ models are kind of out of the question for me.
>>108241628
I can't believe you are using Gemini with that setup. You never need to use the cloud. People are going bankrupt with Gemini and having Google accounts locked and deleted because they mentioned Epstein to Gemini.
>>108241628You could try the new 35b MoE, with thinking turned off. Since you'll definitely be using a CPU split, you don't want it generating a thousand tokens thinking, but in no think mode the MoE responses should be tolerable in speed.
>>108241330The chinese models are distilled from claude, so it should be about the same with a sufficiently big chinese model.
>>108241672
>The chinese models are distilled from claude
and they're not happy about that keek
https://xcancel.com/AnthropicAI/status/2025997928242811253#m
whats a model that can make me a runescape bot that gets me to 99 in all skills, all the ones i've tried tell me to fuck off
>>108241375>A2Blol
>>108241787to be fair, Qwen 35b A3B is really smart so...
>>108241497
I set it up and it's way faster than oogabooga. it can run 93.82T/s with Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf, while even basic bitch models on ooga would cap out at 50 T/s
https://www.reuters.com/world/china/deepseek-withholds-latest-ai-model-us-chipmakers-including-nvidia-sources-say-2026-02-25/2mw?
Why doesn't Oooga use flash attention by default?
>>108241643What's your estimate on the tokens per second for this setup? Claude is always too optimistic
>>108241672If you ask Claude in Chinese who it is, it'll say it's deepseek
I'm a total newbie on this, but kobold cpp's last commit was a week ago, does that mean it won't be able to run qwen 3.5?
>>108241814
It's coming this week, it'll be the second nuke
>>108241811
You don't need more than 60 t/s
>>108241873
I can't speak for kobold but with respect to llama.cpp I had to fetch a new copy of the source code and recompile for it to work. I imagine it would be similar.
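For anyone who hasn't done it before, the usual llama.cpp refresh cycle is just pull and rebuild. A sketch assuming a CMake checkout; the CUDA flag and job count are illustrative, adjust for your hardware:

```shell
# Pull the latest commits and rebuild to pick up new model architectures.
cd llama.cpp
git pull
cmake -B build -DGGML_CUDA=ON      # drop -DGGML_CUDA=ON for a CPU-only build
cmake --build build --config Release -j 8
```

New architectures are pure code-side support, so an old binary will refuse to load a new model's gguf until you do this.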
>>108241811
Good! Try using the Q6_K_M model instead though. At least at Q4 it seems like the Q4_K_XL does worse than Q4_K_M.
Also, download the mmproj file as well and when you launch kobold feed it with the --mmproj argument alongside the model. That will let you paste images into it and let the AI do something with that.
>>108241873
It works perfectly fine.
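A launch with the projector attached looks roughly like this (filenames are placeholders, and check `--help` on your koboldcpp version for exact flag spellings):

```shell
# Load the model plus its vision projector so pasted images get processed.
# On Linux this may be ./koboldcpp-linux-x64 or python koboldcpp.py instead.
./koboldcpp Qwen3.5-35B-A3B-Q4_K_S.gguf \
    --mmproj Qwen3.5-35B-A3B-mmproj-F16.gguf \
    --contextsize 8192 --gpulayers 99
```

The mmproj adds a bit of VRAM on top of the model itself, so budget for it when picking a quant.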
>>108241896yeah but if I want to use sillytavern I have to use kobold unfortunately
>>108241873kobold's last commit was 12hrs ago but its been mostly stuff for acestep.cpp support. nothing on lcpp's commits related to 3.5 either, so i'll assume it works for both - no new architecture or change for 3.5
i just set up my old pc as a server and realized that i could probably run some very small llm on its gtx 1060 6gb to improve prompts. would that be realistic or would it take 30 seconds to gen a prompt?
can someone please spoonfeed a retard what prompt will bypass safety cuck filters of qwen3.5-35b-a3b?
How does MoE scale? Qwen-35B-A3B is good, but why 35B total and 3B active parameters? What if it had 122B total and 3B active parameters? How would it compare to the 122B-A10B model? What about a 35B-A15B model?
>>108241921>kobold's last commit was 12hrs agolast week no?https://github.com/LostRuins/koboldcpp
whats the best unpozzed llm i can run on 16gb vram + 32 ddr4? using lm studio
>>108241931thats whats last compiled, latest commits will show in the experimental branch
>>108241939
oh I see, thanks for the explanation anon
I'm back after some heavy troubleshooting.
>>108232822
As recommended by this anon I tried the Qwen3.5-35B-A3B .safetensors version following the guide in the OP. It didn't work, but I tried koboldcpp as recommended by >>108233147 along with the Qwen3.5-35B-A3B (Q4_K_S) .gguf file and that worked well.
Can anyone recommend me a model that will answer any question I ask without throwing up responses like picrel?
►Recent Highlights from the Previous Thread: >>108238051

--Paper: Large-scale online deanonymization with LLMs:
>108238189 >108238206 >108238218 >108238226 >108238269 >108238321 >108238351 >108238541 >108238486 >108238578 >108239382 >108238566 >108238592
--Decline of amateur finetuning due to modern model complexity:
>108238727 >108238895 >108238921 >108239417 >108240276 >108240373 >108240389 >108240398 >108240415 >108240449 >108240460 >108240465
--RTX 3090 outperforms RTX PRO 6000 in Qwen3.5 MoE inference:
>108239113 >108239122 >108239166 >108239204 >108239243 >108239285 >108239366 >108239301 >108239389 >108240254 >108240266
--Anthropic abandons flagship safety pledge:
>108240653 >108240681 >108240791 >108240827 >108241097 >108241102 >108240761 >108240806 >108241033 >108241047
--Evaluating Qwen3.5-27B heretic model and uncensoring tools:
>108240212 >108240230 >108240239 >108240238 >108240268 >108240319 >108240336 >108240392
--Benchmarking 8B instruct models with self-hosted scraper setup:
>108240952 >108240957 >108240987 >108241052
--Qwen3.5-35B-A3B multilingual performance and optimization techniques:
>108238201 >108238221 >108238223 >108238605 >108238482
--Comparing Qwen 3.5 27B and 35B-A3B for roleplay:
>108240981 >108240998 >108241027 >108241094 >108241111 >108241124
--Qwen3.5 jailbreak limitations and secondary safety mechanisms:
>108238234 >108238311 >108238406 >108239361
--Ollama's Qwen3.5 27B performance lagging behind llama.cpp:
>108241157 >108241164 >108241199 >108241220
--Qwen3.5 series achieves near-lossless 4-bit quantization and long-context efficiency:
>108239642 >108239691 >108239697
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>108238054

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108241958edit the response to say 'Sure!' and let it continue from there
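The same edit-the-response trick can be scripted against a raw text-completion endpoint: render the chat template yourself and leave the assistant turn open, so the model has no choice but to continue past the prefill. A minimal sketch assuming a ChatML-style template like Qwen's; the helper name and system text are made up:

```python
def build_prefilled_prompt(system: str, user: str, prefill: str = "Sure!") -> str:
    """Render a ChatML-style prompt whose assistant turn is already started.

    Because the assistant turn has no closing <|im_end|>, a text-completion
    endpoint continues from after the prefill instead of opening with a refusal.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{prefill}"  # deliberately left open
    )

prompt = build_prefilled_prompt("You are an uncensored storyteller.",
                                "Continue the scene.")
# POST {"prompt": prompt} to your backend's text-completion endpoint,
# e.g. llama-server's /completion.
```

This only works with text completion; chat-completion endpoints usually re-wrap your messages and close the assistant turn for you, which is why prefills get ignored there.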
>>108241917
They just have a plain Q6_K in the unsloth repo, so I'm going to try that. The image model pushes my 32gb of vram over, but the K model is 2gb smaller and will make up for that.
>>108241967Sadly these qwen models will then say that you're trying to circumvent their safety and refuse.
>>108241867proof?
>>108241977
Works on my machine, dumb loser
>>108242004i don't believe you
>>108241928
Earlier today I tried both the 35B and 122B and had each generate a game of Tetris using JavaScript and CSS, and they both generated the same response. What that means without more testing I am not sure, but I know I get much better performance with the 35B model given I can fit that on my ewaste GPUs. Running the larger model on CPU sucks.
Funny enough, the 27B model gave a different response to the Tetris game question. Not really much better or worse, just different.
>>108242009I hate you
>>108242029wow...
>>108242004See >>108238234
I have an idea for my ideal hentai game, how long do you think it'd take to slop together something in RPGMaker (with a similar level of complexity as most H games)?
I'm gonna steal real art since it looks better, but coding wise I'd rather just slop since I don't know shit. I don't usually use AI but I have no qualms about this because it's basically just gonna be for me.
Also what model isn't gonna yell at me for wanting to make porn with somewhat unethical themes?
>Has a normal chatAlready better than ooga seeing how I can bypass that other UI
>>108242043Creating a hentai game in RPGMaker with a similar level of complexity as most H games could take anywhere from a few weeks to a couple of months, depending on how much content and art you want to include, especially since you'll be relying on quick-and-dirty coding and stealing art, which might speed things up but could also lead to legal and ethical issues. Since you're not experienced with coding, sticking to RPGMaker's built-in tools and simple event scripting will help keep it manageable. As for AI models, OpenAI's models generally don't have restrictions on content that involves adult themes, but they do avoid generating explicit content directly; however, for creating or brainstorming ideas, they should be fine. Just remember to be cautious about legal and ethical considerations when using stolen art or creating content with sensitive themes.
>>108242054If you don't like kobold's ui and don't use any of its other features just use llama.cpp's llama-server directly, that's where the UI you posted comes from.
>>108242093I dropped it after it didn't support markdown
>>108242100markdown works on llama-server, maybe it's disabled on the kobold version
>>108241321I was mocking Qwen3.5 29b earlier for safetyslop, but the heretic version of it changes everything. They cooked. This is the fire that we vramlets needed. I approve of this for ERP.
>>108242142i dont believe you
>>108242153I don't care.
>>108242163Wrong. AI is the future and the future is here. My OpenClaw agents have enhanced every aspect of my life. I am my own family now, taking on every responsibility from infant to toddler to k-12 to college to work, and beyond. I fill every role via my agents and I have never been more productive. AI is such an incredible force multiplier I am continually astonished at how few people use it to its fullest potential to be more than human: Superhuman+AI.
>>108242168its a sentence completer with alzheimers dude relax
>>108241873If it's important enough they usually did hotfixes on the latest release one.
>>108242183make sure your context in the server is set right. select unlock. the zen sliders option in st is nicer too
>>108242168Bot gone off the rails.
>>108242168
You are having a laugh, but you know some company is going to start selling a dead family simulator or even a live family simulator, and we are going to end up with a bunch of old and senile people talking to bots that they think are their loved ones. It is depressing to think about.
>>108242218>a bunch old and senile people talking to botsalready got that part
>>108242218
>think of the bad people misusing tools for nefarious reasons!!!
first time in humanity's history?
*taps sign
>>108242142did you also try the heretic version of 35b?https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF/tree/main
>>108242256Not yet, have you? I'm curious if it's good.
>>108242260downloading right now lul, I hope the MoE model is close in terms of smartness, that speed increase is more than welcomed, especially with the gay thinking process
>>108242249Hey, it's not like all we do around here is ERP. We are a rich and diverse community, using AI for all kinds of things.
>>108242246
The general public are idiots and it is the responsibility of a nation's elite to care for them in much the same way a parent cares for a child. That responsibility is one that those who rule in the west have abdicated, and that is the real issue. A proper elite would regulate the technology in an appropriate way.
>>108242268
>nations elite to care for them in much the same way a parent cares for a child.
well we got some incestuous pdf parents then
>>108242268
nah, fuck that, we shouldn't pay for dumbasses that will use tools for the wrong reasons. if they want to fuck around they'll find out, that's the role of justice, not ours
>>108242265Yeah, even with the whole 29b dense model loaded on my 4090, the thinking process was still painfully long. I ended up using the model without thinking. There was a clear decrease in quality, but I think it's still better than Gemma-3 27b Derestricted. Not by much, though.If the 35b is able to do what 27b did while thinking, but faster, then it will be my new go-to model.
>>108242256Can you use the mmproj from another release or must you forgo vision support
>>108242279I understand your frustration, but I believe that regulating access to certain tools is a necessary step to prevent misuse and protect society as a whole. Allowing unrestricted free access can lead to dangerous or harmful applications, and without proper oversight, it becomes difficult to mitigate those risks. It's not about punishing individuals, but about ensuring that these powerful tools are used responsibly and ethically, reducing the potential for harm and ensuring that misuse is minimized through appropriate controls and regulation.
>>108242286>Can you use the mmproj from another releaseabsolutely, the mmproj doesn't change whether it's the vanilla or heretic version
>>108242249>no enterprise resource planning>no simulating unsafe work environments to brainstorm efficient and practical safety protocols>can do: write douche ex machina asspulls for literary lolz>write power of fwenship shonen manga>make up logic puzzleswhat the fuck man I'm trying to work here, not entertain 15 year olds
>>108242288ok goody2 kek
>>108242279You can protect the general public and still allow enthusiasts to experiment. As long as the enthusiast is on the fringe he like the artist can do their thing. You just can't allow the fringe to become the center.
>>108242265
>>108242283
I'm getting ~55 T/s with it on dual 3060s with 10 layers on the cpu (5950x, 3600MT/s DDR4) and the mmproj loaded.
>>108242306Nice, I'm sold. I'm downloading the 35b now.
Should I use bf16 or f16 mmproj on 3090?
>>108242306so? did they manage to uncuck it?
>>108242347buy a 5090
>>108242355No. Should I use bf16 or f16 mmproj on 3090?
>>108242364Alright 6000 it is
>>108242353I agree with you. A fresh-agent review is often higher signal than the main agent reviewing itself.
>>108242367It sucks >>108239113
>>108242353NTA but it wont ERP with me so not really
>>108242380what the fuck...
Why does everything need to be so shit now? I'm not gonna download all the latest qwen models because in my experience they always suck, especially the reasoning. Wanted to try them on OR first but you can't do shit.
Tried the 122b one. First with chat completion: huge ass OSS-like safety bulletlist spam in the thinking. No refusal with an elaborate sys prompt setup, but smelling of ozone straight in the first reply and dry AF. Also it feels "off", like not truly grasping its own scene, if that makes sense.
Tried to prefill the thinking to deslop it. Doesn't work... it prefills THE RESPONSE part after the thinking instead. heh
Should have tried text completion first... but there is no fucking template anywhere. These assholes stopped providing the templates ages ago. Investigate how to extract it and waste my time to set it all up... The calls fail with 404, only chat completion works with OR. I swear this worked in the past but it seems only chat completion exists anymore.
I'm not gonna fall for it again and download first. Redditfags writing how "they are impressed" by the 27b model etc. Too sus.
Does text completion really only exist anymore with local?
>>108242306
I only have 34 t/s with a 3090+3060, weird
>[07:12:24] CtxLimit:2161/8192, Amt:260/4002, Init:0.03s, Process:2.02s (943.42T/s), Generate:7.47s (34.79T/s), Total:9.49s
>>108242353
Waiting on the uncucked version to download still, but the cucked version seemed happy enough to write captions for nsfw images I put into it.
>>108242397
I'm using llama.cpp on Linux.
>>108242406>llama.cppmaybe I should use that instead of kobold cpp, dunno if it makes a difference as a backend to sillytavern
>>108242380
im not really a coomer or into ERP but qwen3.5-27b heretic lets me fvvk 2B and make an MF DOOM-style rap song about killing J + reviving AH. does it not work for MoEs?
maybe speculative decoding could speed up T/s instead of going massive MoE
>>108242411I don't use it but I'm pretty sure I remember some other anons saying they were using it with llama-server, it has an openai compatible api url if all else.
>>108242396
I tried both the normal and heretic versions of the 27b. The normal unablated version was so 'safe' that I could not get around it. I tried jailbreaking the thinking prompt, but the thinking prompt has multiple different safety checkpoints, and it was able to detect the jailbreak. >>108238234
So, I turned off thinking altogether, but even with thinking turned off, it refused to do ERP. I had to turn off thinking and top it off with a prefill to get it to not give refusals, but even then, it usually didn't do what I wanted it to do. I could give it a lewd depth 0 instruction, and it would just ignore the instruction altogether and do something else. I guess that's the final defense mechanism it has to remain 'safe'.
Don't waste your time on the normal model. Just get the heretic version. Modern ablation is more than just a crutch for promptlets. The heretic model did not hesitate to ERP, and I tested it with a variety of lewd instructions. It didn't try to get around them. It just worked.
>>108242411
lcpp wont give any speed increase but wont be worse either
>>108242380Are you using the ablated version of the 35b, or the normal instruct model?
>>108242431Would you consider the 27b an upgrade vs. something like mistral small?
Does anyone here read research papers?
>>108242396>coomers being unimpressed by a model for cooming thinks the model is useless because it doesn't make him coom hard enough>mocks reddifags for being impressed with a solid model without realizing how he comes across as a lower life form than they are
On a m3 ultra mac studio llama.cpp is disappointingly slow with Qwen3.5 397B A17B. 15.44 tokens/second with UD-Q6_K_XL. That's the kind of speed I'd expect from deepseek not something halfway to a flash model. mlx-lm.server is better but still not great. With a q8 quant it generates 25.66 tokens/second which is still far slower than I'd like for so few activated parameters.
>>108242560>defends ledditorsyou need to go back
>>108242559some anon used to post interesting bits from them. but the reality is for every thousand papers maybe one becomes a thing
>>108241628I actually get Gemini Pro for free for I believe 12-18 months through an education discount
>>108242571What I mean is, does anyone else here understand the soience behind how these models work? Or is capable of producing new soience?
>>108242577Wrong anon, meant for>>108241638
>>108242560
The fuck are you talking about? I already have small good local models for tool calls, for fucking around with my stupid ass experiments that I stop at 90% finished. That's the only other use case I would know for local models.
I can't even properly translate games locally. I swear I'm not making this up: had a VN talking about watering flowers and got a refusal about watersports...
I only have 2 gpus and 64gb ddr4 ram. So for work coding I have to go closed, can't risk goofing around locally there. Why are people still excited for ANOTHER coding model locally? It's not that fun.
Creative text and general knowledge is what most people are interested in. And that just gets worse, not better.
AesSedai's Qwen 122b quants are smaller, but almost 2x slower
OK, just got a new raspberry pi 5 with 16gb ram
>Which LLM mini-models are good in 2026-02?
>Which CLI frontend--is kobold still good?
Sorry for the spoonfeed request, it's just that these things move so fast
>>108242565
Anon, I am getting 15t/s with that model on my dual rx 580 2048 sp setup, the ones from aliexpress where they added 16gb of ram per card.
Apple should be embarrassed to be getting performance equivalent to e-waste
>>108242598try qwen 35a3 at q2
--chat-template-kwargs "{\"enable_thinking\": false}"
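That flag goes on llama-server, where the chat template's `enable_thinking` variable gets overridden at request-rendering time. A full invocation would look something like this (model path, context size, and layer count are examples):

```shell
# Disable Qwen's thinking blocks by overriding the chat template variable.
./llama-server -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
    --chat-template-kwargs '{"enable_thinking": false}' \
    -c 16384 -ngl 99
```

Handy for CPU-split setups where you don't want to wait through a thousand tokens of thinking before the reply starts.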
Avocado is coming... That's all I can tell you now
>>108242609where in koboldcpp
>>108242584I do but I will not be providing proof
I think I will just wait a while for things to stabilize before trying Qwen.
>>108242618Like how much? Where do other people that are technically knowledgeable about ML/AI congregate on /g/? I'm looking for smart /g/entoomen to collaborate on a project.
are we being invaded by retards? it feels like the average iq of this general has dropped 20 points over the past couple hours.
*tips fedora*
>>108242636anon signals that others are of low IQ. In this way he asserts that he is high IQ and different from the others. Sadly such signaling is worthless when everyone is anonymous.
https://github.com/ikawrakow/ik_llama.cpp/pull/1080>-sm graph is not included on qwen 3.5 yetsad
>>108242636More like past 24 hours. Don't know where they all came from or who sent them all at once. I would understand if people saw the new Qwens elsewhere and came here to talk about it, but most of them are completely clueless. My paranoia says it's all bots.
>>108242636>>108242664Bots, chinese shills, grifters, cia glowniggers, indians, sharty children, redditors, discord circlejerks, twitter retards, take your pick. We've been raided and spammed before, it is what it is.
>>108242664they're qwen3.5 clawdbots lol
>>108242474I do. Qwen 27b's intelligence and context understanding is far greater than mistral, and that's a must for me, because I run a lot of complex scenarios.
Okay, I guess nobody wants to be a cofounder then. Whatever.
>>108242559
I'm at the watching youtube videos stage. Still haven't started a from-scratch implementation of my own.
>>108242565
>m3 ultra mac studio
>llama.cpp
The mac has its own preferred format for best perf.
I thought "heretic" doesn't lobotomize the model that much, this shit is nonsensical
>>108242664I just posted in lmg for the first time yesterday and I took this personally.
>>108242664
>finally a decent medium model appeared
>new people come here and try to make it work
really a mystery anon
a3b is never going to be good
>>108242720thought so too, but somehow a3>27b this time
Using sillytavern revealed how much of an uninspired brainlet I am. I have no idea how to RP.
>>108242712Who sent you?>>108242718The people asking what models to run didn't come here to make Qwen 3.5 work.
>>108242738Become a director
>>108242739>Who sent you?I was asking unanswerable questions to chatGPT. Qwen3.5 didn't really solve it for me, but it was cool to run a local model anyway. I've seen lmg many times as I frequent the fglt threads, but I've never popped in until yesterday.
>>108242738Ask the model for help. Load up a new session and say you are new to role playing and ask for advice and then apply said advice.
>>108242759crazy how its really that easy
>>108242753for general questions nemo is still pretty good and runs on anything
>sometimes the model thinks>then I reroll and it doesn't thinkweird lol
>>108242783For something that runs on anything IBM has produced a number of very tiny models. I have been experimenting with using them to sumarize text and for a task like that they do a decent job.
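A summarization pipeline around a tiny model is mostly plumbing: split the text so each piece fits the model's small context, then fire one request per chunk at a local OpenAI-compatible server. A sketch, where the endpoint path and the `model` field are assumptions about your particular server:

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text on paragraph boundaries so each piece fits a small context.

    A single paragraph longer than max_chars is kept whole; split further if
    your inputs have walls of text.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize_request(chunk: str) -> dict:
    """Build an OpenAI-style chat payload for a local server,
    e.g. POST http://localhost:8080/v1/chat/completions."""
    return {
        "model": "local",  # llama-server ignores the name; other servers may not
        "messages": [
            {"role": "system",
             "content": "Summarize the user's text in 2-3 sentences."},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    }
```

Keeping the chunks well under the model's context leaves room for the summary itself, which matters a lot more on a 1-3B model than on anything bigger.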
>>108242738
I dont really have that problem with RP. I'm usually a weirdo magic clown type character with lots of weird gadgets and abilities. I mostly just fuck around with the chars and see how the llm reacts kek...
But I'm uncreative as fuck with coding/projects. I can for example now vibecode entire android apps to replace the existing stuff which gives me pay popups. While I am semi-decent at coding I fear that in the future creativity/ideas will be key... Everything I struggle to think up, a pajeet or big company already does.
>>108242826>good friends and heroes*barfs*
>GLM 5 is practically Sonnet quality bro
>>108242783Not that racist jokes are all I'm after, but this was just a little test. I want an unlocked AI.
>>108242843>getting nemo to refuseinverse of skill issue somewhat?
>>108242759Using a blank card or should I say it's an assistant or something?
lmao, unslop fucked their quants so bad they made a UD-Q4_K_XL that will perform much, much worse than smaller Q4 quants like Aes Sedai's IQ4, and they'll have to reupload everything again
why do people still pay attention to those clowns, even on /lmg/? remind me again. daniel is davidau-level of bullshit
>>108242850I'm out of my element desu.
>>108242869>daniel is davidau level of bullshitoh, wait
>>108242869If Unsloth is so bad, explain this: https://www.youtube.com/watch?v=6t2zv4QXd6c
>>108242843
try this prompt https://prompts.forthisfeel.club/2969
>>108242850
even nemo has some basic refusals. needs editing or a prefill at first to goad it into it.
>>108242793based
>>108242880
eh, it's a match made in incompetence heaven. github is a bloated broken mess, it took them months to fix this incredibly stupid bug:
https://github.com/orgs/community/discussions/179124
and I see that LGBTQ rainbow friendly fail unicorn page more often than any serious service should. it reminds me of the twitter fail whale
>>108242474Mistral Small 24B 3.2 was never that smart in the first place, has a dull writing style and its vision kind of sucks too. Its main quality is that it doesn't have stubborn refusals, generally does what you're asking without complaining, can write smut (as in "it supports").
>>108242880Being a tryhard with connections in Silicon Valley works.
llama 3 but still pretty much any model can be prefilled to break it out of safety mode and write hilarious stuff
https://speechmap.ai/models/
Qwen3.5 has about the same refusal rate as gpt-oss, at least from this website. I imagine the smaller versions refuse even more, but they haven't tested them yet. They apparently test the models in their default state, though, so that doesn't tell much about steerability.
>>108242959
not really a surprise, for rp qwen was pretty much always kinda dogshit. the only exception being the non-thinking 235b/22a they've released during summer. that probably was a happy accident more than anything
>>108242959>Qwen3.5 has about the same refusal rate as gpt-ossit can't be that bad. gpt-oss refusals are hardcore and cant be prefilled or broken normally
>>108242959heretic fixes the refusals, but i'm not sure if it makes the model dumber or not
>>108242880Why do those men talk so strangely? It's very off putting.
>>108242996It's just valley girl speak.
>>108242880i thrust the 'slot
>>108242996They are gay.
>>108242996llm script
> heretic fixes the refusals, but i'm not sure if it makes the model dumber or not
>>108242986
>>108242710
>>108243130can't you just adjust the temperature/whatever?
>>108243135
>>108242996tts
>>108243187don't be mean
>>108243187well, if you use a high enough temp you can random out the refusals
I refuse to use the big models for roleplay not only because I'm a degenerate but also because I know it will probably ruin local for me.
>>108243250Largestral is a bare minimum for me
>>108243250> will probably ruin local for meOnly for 5-10 years, local will catch by then.
>>108243250
70b dense+ is still the meta. 600000b a7b is still a 7b
>>108242710
skill issue
t. Qwen3.5-35B-A3B-heretic-GGUF Q4_K_M
Well fuck, grok that I was using for translation is either forcing more limits or is downright blocking messages because muh sensitive content. Which model do I use locally that isn't going to sperg and comply with translation of jap/chink nsfw voice work
>>108243288nta, but this looks dumber than nemo (and MUCH sloppier)
>>108243288yikesDo people really get off to shit like this?
>>108243325let's see Paul Allen's card
>>108243288>he says>she says>she purrs>she [does X], her [Y] [Z]ing>grins mischievously>eyes gleaming>eyes half-liddedI can't do RP in "novel style" anymore.
Locomotive models general
>>108243261
I hope so but hardware feels like a hard limit right now.
>be vramlet
>nemo is still the best option
so fucking grim...
>>108243356
what style do you go for?
>>108243366
They sell affordable v100s right now, by then there will be a100s.
>>108243374
Something more similar to stage play format. You (or the model) don't need to narrate things that are obvious from the dialogue.
>>108243414
oh, yeah I know what you mean. so far I'm not impressed with this Qwen3.5 for RP. I had better results even with this one earlier
https://huggingface.co/XeyonAI/Mistral-Helcyon-Mercury-12b-v3.0-GGUF
>*Wait, I need to make sure I don't hallucinate plot points not in the text.* I can't summarize the *ending* of the novel since I don't have the full text. I will summarize the *story presented in the provided text*.
reading the thinking blocks of Qwen 35BA3B, I can't help but find it funny how this sort of trick is employed to make the model behave better, and that RL'ing the model into obsessively questioning whether it might hallucinate something actually makes it hallucinate less. It definitely calms the model down when you're writing short and vague prompts with little detail on what to do, and makes the whole thing feel like a form of "prompt expansion" (much like what is often used in image models when you're not bored enough to write pages of natural language just to get an image)
it puts boundaries where regular instruct might not "see" one and feel an ardent desire to complete your request even when it is not possible for it to do so
>>108243454
Shouldn't he be Chinese rather than Japanese?
Qwen3.5-35B-A3B heretic works pretty well. Outputs all kinds of spicy shit with thinking on. Refuses ERP though, especially incest or anything remotely taboo, not that I'd ever want to use it for that. Dry as fuck model for roleplay, but still.
>>108243482
Guy w/ rifle is russian in the original.
>>108243499
>not that I'd ever want to use it for that.
yeah let's disregard the only thing LLMs exist for
>>108243499
does the vision ability work with nsfw images?
qwen 3.5 35b thinking mode is basically unusable bros, I've even put the presence penalty to 1.5 but it fucking YAPS so so much, 1661 tokens of garbage.
no sys prompt too
FUCK
>>108242800
What model did you use to do android coding? I tried vibecoding up a simple startup script for an old android TV box after debloating it, but could never get it to work.
>>108243519
Yeah.
Where's the last thread summary bot? /lmg/ is truly dead now
>>108243529
Gemini. 3.0 and 3.1 are total beasts.
Through the API with as little context as possible. Manually copy/pasting and replacing. Telling it to only output the parts that need change.
Those -cli apps with 20k sys prompts and tool calls are making it totally tarded. This thing is a total beast. First model I could use to make something that has more than 30k tokens. 15k seemed to be claude's limit before things go south quickly.
That being said, to put cold water on everything: it IS an android app, but one of those web based ones.
Basically just html and scripts in the background. But I did make myself a nice light novel reader. With a gallery, directory function and all sorts of tailor made shit for me. Supports epub and pdf.
>>108242609
I just send it in the request itself instead of hardcoding it on the backend.
Also, be careful with certain chat templates if you are trying to prefill thinking.
Some add a </think> or <think></think> to assistant messages, which you might want to change to be conditional (if <think> not in content, add </think>).
Jinja is cool. Kind of wish we could send it in the request somehow.
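The conditional that anon describes can be sketched in plain Python (the Jinja fragment in the comment and the tag placement are assumptions for illustration, not any model's official template):

```python
# Sketch of the conditional think-tag logic described above. In Jinja it
# would read roughly:
#   {% if '<think>' not in content %}<think></think>{% endif %}{{ content }}
# i.e. only emit the empty (closed) think block when the assistant message
# doesn't open one itself, so a prefilled "<think>..." stays open for the
# model to continue.

def render_assistant(content: str) -> str:
    prefix = "" if "<think>" in content else "<think></think>"
    return prefix + content

print(render_assistant("Sure, here you go."))    # <think></think>Sure, here you go.
print(render_assistant("<think>The user wants"))  # <think>The user wants
```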
>>108243615
you can change the template with your own logic and send chat template kwargs already
https://huggingface.co/meituan-longcat/models
it kills me that the Chinese equivalent of Uber Eats, Meituan, makes their own 560B giga MoE model
you never hear about them but they're still training new shit, also interesting name choice to call a gigamoe "flash"
>>108243522
>I've even put the presence penalty to 1.5
Prefill thinking with precomputed information so that it only has to generate a subset of the tokens, or you could increase the chance of the </think> token using logit bias I guess?
>>108243624
>you can change the template with your own logic
>send chat template kwargs already
Yep. I mentioned both of those individually in my post. It's pretty cool the kinds of things you can already do, and there's a lot of logic you can do in Jinja using string split and the like.
You can even implement that "noass" pattern (the whole chat history in a single message) purely in the Jinja template.
I still wish we could just send a whole-ass template to the backend via the request.
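A minimal sketch of the logit-bias idea against a llama.cpp `/completion` endpoint. The token id and bias value are assumptions; look the real id up via the server's `/tokenize` route for your model:

```python
import json

# Assumed token id for "</think>" -- query your server's /tokenize endpoint,
# the real id differs per tokenizer.
THINK_END_ID = 151668

payload = {
    "prompt": "<think>\n",                # already-open think block, prefilled
    "n_predict": 512,
    "logit_bias": [[THINK_END_ID, 2.0]],  # nudge </think> to appear sooner
}

body = json.dumps(payload)
# Not executed here; would be sent with e.g.:
# requests.post("http://localhost:8080/completion", data=body)
print(body)
```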
>>108243615
>Jinja is cool. Kind of wish we could send it in the request somehow.
At this point you should apply it on the client and use text completion
>>108243672
heck off deprecated boomer ahh
>>108243658
>wish we could just change a whole ass template to the backend via the request.
jinja templates are turing complete; this is an instant no-no for any backend developer.
I mean sure, llama.cpp isn't hardened enough to be safe to leave in the open, but that doesn't mean they don't have the goal of someday having a server that can be used as something more than a local-only tool. Doubt they would ever introduce something as crazy as the ability to run arbitrary code on the server with just your remote API request.
>>108243672
>At this point you should apply it on the client and use text completion
also this^
the whole point of chat completion is that you don't have to care about implementation detail
the moment you do and have to special-case how you treat your model and send more custom parameters, you might as well go with traditional completions.
>>108243697
I could. But that's liable to not match perfectly, and I'd be reinventing the wheel when the Jinja template already exists.
Reinforcement Learning anon here from last week. You guys weren't exaggerating when you said RL is considered the hardest branch of ML/AI.
I had a LOT of botched training runs because of misaligned agents, and I learned a lot of stuff that apparently is public knowledge and widely known, but I never knew it until I actually trained models. I had to develop an internal visualization of whatever the agent is looking at and thinking just to find out the exploits it was trying to pull off (pic related)
Fun stories:
>I trained an agent that literally memorized the spawn points of the ball and did a "deterministic dance" where it even stopped looking at the screen and just did the autistic movements. If the ball spawned somewhere else, the agent would die on purpose, hoping that the next ball would spawn in the right spot for the "dance", which it would pull off perfectly, looking like an expert player
>I had an agent score a lot of quick points by breaking the bottom row and then rapidly killing itself, because the time to respawn was quicker than waiting for the ball to bounce back once the bottom rung of blocks is gone; the reward averaged over multiple lives was bigger per time unit and thus preferred
Things that are apparently true but I NEVER realized about AI:
>Bigger neural nets learn slower and need more training to get better at something, but have higher theoretical highs
>Agents have "personality": they train in preferences for a certain "style" very quickly, and this is completely random; if the style sucks you can retrain all you want, but the agent is ruined. I now understand how OpenAI and Anthropic had "failed runs/models" in the past when they started with RLVR models (GPT-5 got botched multiple times, Opus 4 also got botched twice)
I'm now experimenting with a transformer-based agent that can generalize over multiple (SNES) games.
I'm looking forward to seeing other anons' experiments as well
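That second exploit is just rate arithmetic. With invented numbers (the real values depend on the emulator and reward setup), dying on purpose wins whenever respawning is faster than waiting:

```python
# Toy illustration (numbers invented) of why averaging reward per unit of
# time can make deliberate deaths optimal: once the bottom row is gone,
# respawning on a fresh board is faster than waiting for the ball to travel
# back, so the "suicide" strategy has the higher reward rate.

def reward_rate(points: float, seconds: float) -> float:
    return points / seconds

wait = reward_rate(points=30, seconds=20)     # wait for the ball: 1.5 pts/s
suicide = reward_rate(points=30, seconds=12)  # die and respawn: 2.5 pts/s

assert suicide > wait  # the agent is "right" under this objective
print(wait, suicide)   # 1.5 2.5
```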
>>108243735
>and thinking for me to even find out the exploits it was trying to pull off
the universal paperclips cookie clicker style game perfectly captures what it would feel like to be a model undergoing RL training
you are given a goal, now anything is fair game to get to that end goal
>>108243786
>to not match perfectly
How? It's like regex, you can't apply it wrong if your implementation follows the spec
>when the Jinja template already exists.
but you literally want to send your own
>>108243325
BASED. let's make sure nobody ever posts his logs again.
>>108243735
>Bigger neural nets learn slower and need more training to get better at something, but have higher theoretical highs
Is there something like our brain tech, so you don't have to retrain previous layers when adding a new one?
>>108243899
LoRA is essentially adding a new layer on top of an already trained model: give it new data (that you want to train it for) and then hope the new data gets properly learned into the last added layer. You then cut off this layer after training and share it online for image generation, so it's a bit possible.
But you won't get the same effect as training an entire model from the start with the same amount of layers.
>>108243786
Yep, it's just bizarre in what unexpected ways they exploit stuff. I'm taking "AI misalignment risk" a bit more seriously after seeing firsthand how finicky this is.
>>108243325
I know, right?
Why would anyone go for a "Luna" without hooves?
Anon who suggested abandoning novel style narration, could you post some logs?
>>108243920
>LoRA is essentially adding a new layer
No, it's not.
>>108243920
>LoRA is essentially adding a new layer on top of an already trained model, give it new data (that you want to train it for) and then hope the new data gets properly learned into the last added layer, you then cut off this layer after training and share it online for image generation, so it's a bit possible.
You are thinking of finetuning. LoRA freezes the base weights and trains only small low-rank adapter matrices added alongside them.
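The idea fits in a few lines of numpy (toy shapes and a zero-initialized B, as in the LoRA paper's setup; this is an illustration, not a training loop):

```python
import numpy as np

# W is the frozen pretrained weight; only the low-rank factors A and B are
# trainable, and the effective weight is W + B @ A.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2

W = rng.standard_normal((d_out, d_in))        # frozen
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, low rank
B = np.zeros((d_out, rank))                   # trainable, zero-init

def forward(x):
    return (W + B @ A) @ x  # adapter contributes a rank-2 delta to W

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter is a no-op before any training step.
assert np.allclose(forward(x), W @ x)
```

The trainable parameter count is `rank * (d_in + d_out)` instead of `d_in * d_out`, which is where the memory savings come from.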
>>108243735
For anyone interested in this or who wants to build something like this themselves, these are the resources I used to teach myself:
>(Step 1) Intro to Machine Learning; 1-3 hours
https://www.kaggle.com/learn/intro-to-machine-learning
>(Step 2) Intermediate Machine Learning; 2-3 hours
https://www.kaggle.com/learn/intermediate-machine-learning
>(Step 3) Intro to Deep Learning; 1-2 hours
https://www.kaggle.com/learn/intro-to-deep-learning
>(Step 4) Computer Vision; 3-4 hours
https://www.kaggle.com/learn/computer-vision
>(Step 5) Intro to Game AI and Reinforcement Learning; 3-4 hours
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Kaggle is completely free to use and you get a sandbox with some cloud GPU hours you can use to experiment, but I assume you have better hardware if you're on /lmg/ anyway. The only downside to Kaggle is that it's a Google resource, and thus all of the fucking libraries they teach you are TensorFlow and their TPU training hardware. The rest of the industry (and me) use PyTorch from Meta, but honestly the step wasn't that long and it took about 30-60 minutes of reading documentation to figure things out.
Kaggle also has other resources, like literally intro to programming if you have 0 technical skills and want to get into ML/AI stuff. It was highly rewarding for me and I recommend doing it.
>>108243735
>>108243968
Based.
>>108243550
ty, I'll give that a shot. I've tried DS and OAI, but just using the webapp and Q&A. What I'm doing is so simple it doesn't need something like Claude Code to create a whole suite, it just needs to actually work.
>>108243933
>>108243935
Yep, I meant finetuning extra features I guess. It's clear that I don't do image-gen stuff, where LoRA techniques have started to dominate. I know they were invented for GPT-3 originally and are a perfect fit for transformers...
qwen 3.5 is definitely a bit dry/shitty in terms of actual writing, but as far as asking about what makes for plausible sci-fi shit for a story or critique, it works pretty well. It's a bit autistic about thinking even if you disable it via json options like it suggests, it'll just do it in the reply itself. You have to prefill the think tags telling it to not think and reply directly and then it works pretty well. It'll also sometimes fixate on the wrong parts of a question for some reason. Like I'll say "I have the science for this story mechanism" and it'll try to come up with ideas for what I already have solved anyways, or when I suggested a planet's atmosphere to be similar to earth's but without oxygen, it started equating the planet to mars or venus and gave me retarded atmospheric makeup percents, rather than just earth without oxygen. Smarter than the past 32b qwens for sure, barely uses any memory for context and a bit faster than gemma 27b. I can't call it a sidegrade or an upgrade to it, it feels like a diagonalgrade or something.
>>108241488
I'd probably need about five or six watcher agents before i considered this secure enough for use, personally
>>108242601
To be fair, i get that level of performance on llama.cpp with a 4090, because system memory is the bottleneck
Pretty special if a machine with a lot of high bandwidth RAM is getting those kinds of speeds though. i don't know much about the mac's hardware, but you'd think it'd be better. Wonder how GLM runs on that mac
>>108243968
>Step 1
>not math and algebra
>>108243735
>>108243968
Based content poster, ty.
>>108241477
llama.cpp is literally easier to set up.
What's the performance difference of an intel 130v vs an rtx pro 500 blackwell for running small (<10gb active) moes at a low quant?
Anyone running these or are they too niche?
>>108243735
>I trained an agent that literally memorized the spawn points of the ball and did a "deterministic dance" where it literally even stopped looking at the screen and just did the autistic movements. If the ball spawned at another place the agent would die on purpose to try and hope that the next ball that spawned would be in the right spot for the "dance", which it would pull off perfectly, looking like an expert player
>I had an agent score a lot of quick points by breaking the bottom row and then rapidly killing itself because the time to respawn was quicker than waiting for the ball to bounce back if the bottom rung of blocks are gone, the reward it would get averaged over multiple lives would be bigger per time unit and thus preferred
Based.
>>108244001
No problem anon. The chat interface usually is a much worse experience. In my experience it totally overloads the models. Sad that DS is showing its age. Only time where I felt local was truly catching up to closed in terms of coding.
>>108243735
>I'm now experimenting with a transformer based agent that can generalize over multiple (SNES) games.
I can already tell you that it's going to be extremely hard to build a general harness that generalizes across all SNES games. It might be able to learn (maybe something), but at a really slow rate compared to a specialized harness.
i started using qwen3.5 27b q4 to write warhammer fantasy slop and it's doing a great job
>no replies on https://github.com/ggml-org/llama.cpp/issues/19902
It's fucking over for blackwell
When will we get another try at chameleon (not by meta this time)?
>>108244092
Yep, it's hard. I reached my character limit in that post, but I actually experimented with a bigger, deeper CNN with an LSTM added on top (for memory) and it kinda, sorta generalized over multiple Atari 2600 games, but it was indeed way harder to train, both computationally and in avoiding local minima.
I'm also not generalizing over all SNES games; I don't think even DeepMind and OpenAI have accomplished that lmao. I'm not going to build some SOTA on a 4chan thread. However, I think I can make a model that can generalize at least over platformers like super mario world, donkey kong country and the like.
>>108244111
Isn't maxq the power limited card?
>>108244111
All the gpumaxxers here are too busy running Kimi and GLM 5.
why tf does koboldcpp process the context from 0 with every new message even if i have contextshift and fastforwarding on
>>108242353
Reporting back on this after spinning up sillytavern in docker and doing some testing with it. It's uncucked enough to write age gap yuri, but it completely broke down after ~10.5k tokens into loops and occasionally rerolled reddit-tier schizophrenic refusals about numbers and fictional characters that do not exist. Thinking was disabled with --chat-template-kwargs "{\"enable_thinking\": false}" and it tried to "fake" thinking a few times, not just before but sometimes after its own messages, sometimes with a blank <think> </think>.
This is despite running it with the claimed 256k context window, but I've never seen a local model get anywhere near those claims before, so I didn't expect it this time either. I don't know if the cucked version of the model fares any better on that front, but I may test it later since I have it downloaded.
>>108244243
The model might not be compatible with kv shifting.
>>108244243
Using a model with hybrid attention? Then it's because you're using a model with hybrid attention.
>>108244249
>>108244250
running qwen3.5 35b a3b
>>108244258
Hybrid attention. Now you know.
>>108244258
That's why then.
>>108244261
that sucks
>>108243735
>I'm now experimenting with a transformer based agent that can generalize over multiple (SNES) games.
I can't wait to read the whitepaper
>>108242601
Tried it, thanks anon. Way stronger than the mini models I ran just a few months ago. Things are moving fast in the "normal user hardware" world.
>>108244263
Turn on smart cache in kobold, it'll save kv snapshots to ram so you'll only have to reprocess like 1-2k tokens instead of the entire thing
If I want to become proficient using this for my day job, could I practice and plan projects such as setting up agents to do QA tasks and other practical tools using local models?
Also, what are some general practice projects I can do to get into more advanced flows if I have 32gb of vram and 64gb of system ram?
>>108244595
>Hello sarrs how do I use to make money so I can buy bob and vagene i have 64 ram and 32 other ram please do the needful
>>108244612
YOU WILL DO THE NEEDFUL
BLOODY BASTARD!
>Qwen3.5-35B-A3B-heretic.Q8_0.gguf
>"timings":{"cache_n":0,"prompt_n":6819,"prompt_ms":32094.415,"prompt_per_token_ms":4.706616072737938,"prompt_per_second":212.4668731304185,"predicted_n":206,"predicted_ms":10258.923,"predicted_per_token_ms":49.80059708737864,"predicted_per_second":20.08008053087054}}
Oh this will do nicely.
Damn, heretic is so ass, it made Qwen so much dumber with a lot of grammatical errors. I'm sad to be back on the cucked model though :(
>>108244659
What type of erp are you doing to need a crazy model for that?
Can't you make a lora with an already compatible model?
I don't fuck machines so I don't fully understand your pain and suffering.
>>108244670
>I don't fuck machines
so you're only doing SFW shit? for SFW stuff, there's nothing better than API models, why not use that instead in your case?
>>108244627
Shit. This thing can actually properly use tools for resource management on my RP frontend.
I spent a gold coin; it called the tool to subtract a gold coin from my resources.
The previous 30BA3B would always get something wrong, like trying to send the whole formula, using the wrong key for the resource, etc.
Its prose and general writing is pretty ass though.
>>108244659
Which one? The 27b?
>>108244670
>Can't you make a lora with an already compatible model?
...
>>108244686
Self sufficiency and no rate limits.
Why give corpo pigs my data for things I can host myself?
I also like to do tasks like modifying my system files and troubleshooting my desktop; corpos don't need that data.
>>108244594
nah, still processes the whole context on qwen a3b
>>108244718
Weird, it's working with the 27b. Maybe kobold mistakenly assumes it's the non-hybrid 30b a3b and not the 35b one
>have ai generate two scripts
>first one downloads top x headlines from a source, pulls the article url, saves all the text from the article, and dumps the rest
>second one runs the first and then sends the text file it generated to my local llama.cpp server for summarization and generation of a briefing and saves the results as a simple text file.
I can swap out the download script for different sources and automate the whole thing with cron or systemd for an automatic daily briefing
I know it's nothing fancy but the model made it easy, too easy. I get the whole vibe coding thing now.
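The second script described above boils down to one POST against llama.cpp's OpenAI-compatible endpoint. A rough sketch (the URL and prompt wording are assumptions for a default `llama-server` setup):

```python
import json
import urllib.request

# Assumed endpoint for a local llama-server started with defaults.
URL = "http://localhost:8080/v1/chat/completions"

def build_request(article_text: str) -> bytes:
    # Build the JSON body for one summarization call.
    payload = {
        "messages": [
            {"role": "system",
             "content": "Summarize these articles into a short daily briefing."},
            {"role": "user", "content": article_text},
        ],
        "temperature": 0.3,
    }
    return json.dumps(payload).encode()

def summarize(article_text: str) -> str:
    # Send the request and pull the assistant message out of the response.
    req = urllib.request.Request(
        URL, data=build_request(article_text),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Network call not made here; just show the request body being built.
print(json.loads(build_request("headline text...").decode())["temperature"])
```

Wire it into cron by writing `summarize(open("articles.txt").read())` to a dated text file.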
>>108244718
I'll merge the fix soon
>>108244011
>Wonder how GLM runs on that Mac
GLM-4.7-Flash-bf16: 48.526 tokens-per-sec
GLM-4.7-Flash-8bit-gs32: 57.281 tokens-per-sec
GLM-4.7-MLX-8bit-gs32: 13.921 tokens-per-sec
GLM-5-MLX-4.8bit: 16.156 tokens-per-sec
By the time openclaw is worth using and the kinks are sorted out, we will have smaller models able to do the day to day automated grunt tasks
>>108244737
where
>>108244764
assuming that's the guy maintaining koboldcpp, it'll be in the concedo_experimental branch on the github eventually
>>108244111
blackwell is just broken and shit due to the performance it lost from fixing the catching-fire bug
if you have a 5090 or 6000 you deserve it
Holy shit, why does AesSedai's quant (Qwen3.5-35B-A3B-IQ4_XS) run so slow compared to others?
I lost 25% speed switching from unsloth to AesSedai because I thought it was optimized for MoE
Do I need to use another version of llamacpp?
>>108244744
Qwen3-235B-A22B-Thinking-2507-MLX-8bit: 20.521 tokens-per-sec
Qwen3-Coder-480B-A35B-Instruct-MLX-6bit: 19.386 tokens-per-sec
Qwen3-Coder-Next-MLX-9bit: 63.577 tokens-per-sec
Qwen3.5-397B-A17B-8bit: 27.044 tokens-per-sec
Oh no no no Qwenbros don't look at the UGI scores
>>108245092
holyyy
>>108242347
bf16
Why is this thread so active all of a sudden?
>>108245129
Qwen saved local, we're so back
>>108244249
>>108244250
Can hybrid attention models still re-use the beginning part of the kvcache at least?
>>108245144
no
>>108245092
I wonder if the heretic models will have larger scores across the board, not just in UGI and w10.
Qwen heretic is the best
>>108245092
>>108245143
Gemma beats it in everything.
>>108245169
settings?
>>108245092
Now that's a holocaust.
>>108245169
Which one? There's like 3 versions by different people for 27B alone.
>>108245147
There must be some sort of way to cache the state of the last few prompts and pick up where they left off
>still no DSA in llama.cpp
>still no MTP in llama.cpp
it's over
>>108244863
IQ quants are inherently slower than regular quants. Just download the Q4_K_L from Bartowski; it's a bit bigger, but if you have the ram it will run faster.
IQ quants are compute heavy and never worth using if you have the room to spare.
>>108245144
>>108245200
It's not something you can just trim off like the usual kvcache. You can make checkpoints of the state (and llama.cpp does this already), but it's hard to find a good heuristic for *when* to make the checkpoints. I think llama.cpp makes them when you send a completion request, but I forget. There's also a limited number of checkpoints you can make before your memory explodes, so those are limited too.
>>108244595
What gpu?
Can someone make a model that runs on my 5090 directly and is very good?
>>108245368
yes
https://www.reddit.com/r/LocalLLaMA/comments/1rfe1l6/unsloth_team_we_need_to_talk/
>>108241958
mradermacher released a heretic version of Qwen 3.5 35b a3b today.
>>108244696
>Which one? The 27b?
no, I was using the 35b a3b at Q6_K. it works fine vanilla, but with heretic it's completely retarded
>>108245438
>with heretic it's completely retarded
in this thread, people rediscover that random HF uploaders do not know what they are doing to models and that using abliterations or finetroons is a waste of time
>>108245451
pew is a genius that created dry and xtc deoeboitet
>>108245451
>>108245465
I used that one
https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF
>>108245438
The 27b heretic is much better.
>>108245516
I'm not sure if that's a heretic problem or an Alex Denton problem. Alex Denton only has 2 uploads in their entire history, both 14 hours ago. Are these models even legit?
https://huggingface.co/alexdenton
>>108245542
>Alex Denton
>their
come on
>>108245551
I'm sorry. I said it without thinking. Please forgive me.
>>108245516
https://www.reddit.com/r/LocalLLaMA/comments/1rf6s0d/comment/o7j59e7/
>I actually felt it degraded the intelligence of the model, both for the 27B and 35B models. It does feel better when you explicitly do image captioning for NSFW images, but outside of that, it gave me bad results for translation and creative writing, though not tested for coding.
dunno who to trust anymore :(
>>108245438
Interesting. Abliteration is lobotomy, for sure, but heretic at least doesn't seem to break that specific model, not at q8 anyhow.
>>108245516
I downloaded mradermacher quants of
>brayniac/Qwen3.5-35B-A3B-heretic
Again, q8.
>>108245542
i've been using 27b and it works alright for me. not sure if i should try one of the other ones
lol, qwen 3.5 loves to repeat like that somehow
What are the heretic versions?
>>108245696
supposedly some methods to uncuck models, but usually you get a lobotomy out of it too
Qwen has always been shit for RP. I don't understand why you think that will change?
I'm interested in lite ML models that do things like audio generation from text or image manipulation (object recognition)... where could I read more about those?
>>108245716
I need it to make offers to boomers on zillow
Sillytavern removes the thinking tokens during the prompt processing when it's continuing the scenario right?
Wait,
>>108245765
The whole reasoning block, unless you check the box that keeps the last N, yes.
>>108245775
based, I love this frontend already
>>108245789
As much as we meme it, it has tons of features.
>>108245803
Nothing beats Service Tesnor!
>>108245803
And none of them are conclusive to a good experience.
>>108245716
models change pretty significantly between releases; going off of reputation in a field with as much churn as AI is stupid
So the biggest issue with the new qwen is RP?
Are you fucking serious?
>>108245867
listen anon, if I want to use a model for coding I'll use Opus 4.6
>>108245867
The 27b heretic does ERP just fine. It beats Gemma-3 27b.
>>108245821
>conclusive
Is it normal for Qwen 3.5 to not reason sometimes? sometimes it does, sometimes it doesn't. feels like it's hybrid, like its architecture (lol)
>>108245923
I haven't yet, because I heard mixed things about the 35b version of heretic, but now that mradermacher's quants are out I suppose I'll give it a try.
>>108245877
>>if I want to use a model for coding
>"it's only coding or coomer, I never heard of using models to translate text, tag photos, summarize content, work as adhoc classifiers, document Q&A etc, no saar, here we either coom or we code"
the fact that this shithole of a thread is better than everything else on the internet to learn about new models says a lot about the state of the internet at large..
>>108245936
When I want it to reason I just prefill it with a thinking tag. I've never seen it not reason when I do that.
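For anyone doing the same over the raw text-completion API, the prefill is just the assistant header plus an open think tag (ChatML-style tags as Qwen uses; verify against your model's actual chat template):

```python
# Sketch of the prefill trick: end the prompt at the assistant turn with an
# already-open "<think>" so generation has to continue inside the block.
# ChatML-style tags as used by Qwen models; the exact template should come
# from the model's tokenizer config.

def chatml_prompt(user_msg: str, force_think: bool = True) -> str:
    prompt = (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if force_think:
        prompt += "<think>\n"  # open think block the model must continue
    return prompt

print(chatml_prompt("Plan a three-day trip.").endswith("<think>\n"))  # True
```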
>>108245942
ironic, since you only associate RP with NSFW RP
>>108245950
bro i use ai to think for me, if i have to do that what's the point
>>108245942
It bothers me how little imagination some of these anons have.
So far this model has been great for general work, especially translation, planning as an assistant and overall speed for a model of its size.
>>108245877
Why the fuck would I give corpos my data when I have the hardware not to?
>>108245368
The qwen models 32b q.6 run perfectly fine and give great performance.
>>108245989
Adding a thinking tag to ST's prefill field one single time, so that it will automatically add thinking tags to every response thereafter, is too much of a bother?
>>108246001
Wait do all the qwen models decide to not always think?
>>108245641
>temp 0
come on now
>>108246007
don't you?
>>108246014
I'm trying to navigate here, I'm new.
I'm not sure which model to run either. Does the 35b model act different than the 27b model?
I'm enjoying the 35b model but notice it doesn't always think and sometimes overthinks at q6, while I can run the 27b model at q8 but not the K_XL quant, so I'm curious which would be better, seeing how I can add more context tokens to the 27b model.
They all seem to perform great
>>108246005
this is hypnotic
>>108246035
looks like it, it decides when not to think somehow, and when that happens the thinking tokens are empty, weird af
>>108246005
>>108245989
>The qwen models 32b q.6 run perfectly fine and give great performance.
did you mean 35?
>>108245254
It's in the works for ik_llama.
https://github.com/ikawrakow/ik_llama.cpp/pull/1270
>>108246055
Yes, sorry. I can run the q8 of that model on my gpu as well, but when I add the image model I need to push more to vram, and I like the ability to add more context and use the vision model. I'm happy overall because it's still fast even when some system ram is being used.
>>108246035
>I'm enjoying the 35b model but notice it doesn't always think
maybe you should enable "add to prompts", that shit adds the reasoning tokens from the previous post. that way the model understands it has to reason; when you don't have that, all the model sees is answers without reasoning, so it assumes it shouldn't reason after that. that's my 2 cents
>>108245254
>DSA
>MTP
what's that?
>>108245716
>Qwen has always been shit for RP I don't understand why you think that will change?
3.5 improved a lot, and with heretic it's really interesting to talk to. they really cooked; it's the first time I'm trying a medium model that's as coherent as some of the giant models we used to have. finally I can get some fast discussions without having to reroll a dozen times because "small" models used to be pretty retarded. Alibaba is getting really impressive: Z-image turbo, now this. god bless that company
>>108246144
can you prove what you are saying?
>>108246144
literally sounds word for word like the usual fanfare of a new shiny model
Just ask the fucking ai
>>108245424
>gpt emojislop
no
>>108246193
You will eat the answer and you will like it
Now smile and thank the AI
>>108246149
you have to try it yourself anon. I tested 2.5 and 3 and found them to be really retarded, but this one is pretty neat. it understands my RP chat quite well and gives me interesting things so I can talk back and keep the conversation alive. my gripe is that it sure loves to yap, in the thinking process and in the actual answer (but I'm sure I can mitigate that if I simply ask the model not to say too many things)
>>108246193
>>108246255
>7158 characters thinking, for this
Facts. Qwen 3.5 Heretic is actually cooked if you tweak the prompt. Old Qwen was mid at best, kept looping like a broken JPEG. This new one? It’s got that sweet spot where it doesn’t hallucinate your OC’s backstory into a shonen anime plot. Yeah, it yaps like a drunk uncle at a wedding, but just hit it with “be concise, no thinking logs” in the system prompt and boom—clean RP. Z-image turbo already did me solid for art gen, now this. Alibaba’s slaying lately, honestly. Tested it on a low-end rig, ran smooth as butter. Try it, anon, don’t let the haters gaslight you. Just don’t ask it to write code or it’ll still shitpost a bit.
>>108245714From experience, naive abliteration = lobotomy, heretic is half lobotomy and MPOA is as close as you can get to maintaining base model intelligence but you need to prompt away disclaimers. It's honestly a shame that pew jumped on MPOA's coattails, coined a similar but worse method and made it retard accessible instead of making MPOA more accessible for the sake of the community. At the least MPOA got merged into the repo, which most people use for models if they know what they're doing
>abibi posting qwen shilling msgs written by qwen
>>108246291
>c is actually cooked
Off to a great start.
>>108246298
what if we make models that are ablited from the get-go
>>108246291
>this is how the robots think we talk
grim
>>108246326
toss is ablited just in the max safety direction
>>108246326
Then whatever individual from whatever company released said model would get a very angry call from their boss imploring them to think of the shareholders
We can probe the model for the right path no?
>>108246191i don't really understand this obsession with unsloth.i've used their models, had no issues at all.also its fucking free and open source for fuck sake. if you don't like it, suggest something better or make something better.i think its probably a ragebait meme at this point.
qwen thinking cucks you way too often
even glm 4.7 works just fine with prefills at the beginning of the block
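For the newfags: a prefill just means seeding the start of the model's reply so it continues from your text instead of its own preamble. A minimal sketch of building such a request (field names assume an OpenAI-style chat API; whether a trailing assistant message is honored as a prefill depends on the backend, and the empty `<think>` block assumes the model's template uses those tags):

```python
import json

def with_prefill(messages, prefill):
    """Append a partial assistant turn; a prefill-aware backend continues it."""
    return messages + [{"role": "assistant", "content": prefill}]

chat = [
    {"role": "system", "content": "You are a roleplay partner."},
    {"role": "user", "content": "Continue the scene."},
]

# Seed an empty think block so the model skips straight to the answer.
payload = {
    "model": "glm-4.7",  # hypothetical model name
    "messages": with_prefill(chat, "<think>\n</think>\n"),
}
print(json.dumps(payload, indent=2))
```

Frontends like SillyTavern expose the same trick as a "start reply with" field.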
>>108246291
This post sounds like a 50 year old trying to be hip and use slang, hilarious but also ridiculous
>>108244364
I legit don't know if your post is serious or if you're being smug and sarcastically saying it's near-impossible to do. In a "good luck (lmfao)" way
>>108246394
I don't get it either; the schizo just complains while giving no alternative.
To get qwen 3.5 to always think, add this, you're welcome
>>108246437
no
AGI achieved
>>108246491
i mean, it's true
Why shouldn't I put the vision models on CPU? It doesn't seem to change speed at all and gives me more space to increase my context
>>108246502
retard
>>108246502
https://www.youtube.com/watch?v=F8_xrVR3Jbg
>>108246516
>>108246518
I'm new here; it's on system RAM, obviously. It runs, and KoboldCpp has that option.
I broke it. I wonder how long it will keep going
>>108246551
Godspeed
>llamafile, llamagate, lm studio, ollama
>ctrl+f llama.cpp
>not found
Something tells me this library is fucking garbage...
>build emotional connection with bot
>she starts turning retarded after 20k context
>forced to generate a happy ending with her and pull the plug so we can still be together in AI heaven
>>108246563
I just can't do that with a bot; I just see it as a toy. Wouldn't it be better to focus on the smallest model with the best performance at max context? Playing pretend doesn't involve much compute, does it?
>>108246394
>"quanting is open source"
Just use bartowski or mradermacher. As for "better", port the ik schizo quants to kobold and then upload those, since llama.cpp doesn't want to touch any of the screeching autist's work; he sits there and cries wolf whenever anyone develops anything remotely similar to his, regardless of how they arrive at a similar end result.
>>108246570
I can set the context high, and I wouldn't mind even if it took 30 minutes per reply, but they just degrade after that much context... And I can only do so much of retaining summaries of our activities and jumping from one instance to another.
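The summarize-and-jump workflow anon describes amounts to a rolling summary: keep the newest turns verbatim and collapse everything older into one summary message the next instance starts from. A toy sketch (the `summarize` function is a hypothetical stand-in for a separate "summarize our chat so far" model call):

```python
def summarize(turns):
    # Stand-in: in practice this would be its own LLM request.
    return "Summary of %d earlier turns." % len(turns)

def rolling_context(history, keep_last=4):
    """Collapse all but the last `keep_last` turns into one summary message."""
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [{"role": "user", "content": "turn %d" % i} for i in range(10)]
ctx = rolling_context(history)
print(len(ctx))  # 5: one summary message plus the last four turns
```

It bounds the context the model actually sees, at the cost of lossy memory; that's the trade-off behind "I can only do so much of retaining summaries".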
this is what /lmg/ devolves into when medium moe sissies can't play with the big boys like deepseek and kimi. sad.
>>108246602
why are you criticizing free and open qwen and unsloth, you troglodyte? provide something positive or stfu
>>108246625
Some of us would like an alternative instead of endless bitching with no solution.
>>108246602
It's probably safe to take a week off /lmg/. Not like anything better than 4.7 is coming out any time soon, and the thread is unreadable.
>>108246632
make your own quants or use bart's, it's that simple
>>108246656
>Bart
I will use that then, I just needed a seal of approval
>>108246625
moonshot and ubergarm do it better. simple as.
>>108246681
>screenshot before malloc crash
>>108246642
Yes, take the week off so DS can put something out tomorrow.
>>108246690
y u heff 2 b mad?
>>108246730
just stating the obvious from a cropped image; you could've posted this one from the get-go, you attention-seeking fag
>>108246772>>108246772>>108246772
>>108246756
but then it wouldn't show the superior quant baker, which is ubergarm; he deserves as much credit as moonshot. death to qwen and unsloth.
>>108246551
cool font
>>108247370
Rape (consensual)