/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106729809 & >>106718496

►News
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley
>(09/26) Hunyuan3D-Omni released: https://hf.co/tencent/Hunyuan3D-Omni
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106729809

--DeepSeek Sparse Attention efficiency and pricing update:
>106734449 >106734483 >106734485 >106734501 >106734511 >106734529 >106734516 >106736999 >106734497
--Exploring hardware limits and RAG optimization for large quantized models:
>106729830 >106729869 >106729935 >106729969 >106730266 >106730661
--Deepseek performance requirements and hardware dependency analysis:
>106733904 >106733927 >106733961 >106733965 >106733976 >106734006 >106734075 >106735663 >106735974 >106736017 >106736027 >106736087 >106736139 >106736255 >106736298 >106736941 >106736994 >106737008 >106737057 >106737208 >106736297 >106736355 >106736426 >106736556 >106736652 >106736304 >106736500
--Ollama's new memory management boosts token generation speeds:
>106737307 >106737371 >106737381 >106737438 >106737513 >106737521 >106737510
--Running qwen-image-edit on 1080 8GB VRAM with 8-bit quantization:
>106737106 >106737121 >106737130 >106737138 >106737162 >106737186
--llama-server token limit truncation issue with n_predict parameter:
>106735034 >106735040 >106735043 >106735184 >106735214
--RAG requires structured data processing and advanced retrieval mechanisms:
>106729880 >106730022
--Model 3.2 shows improved creativity and response quality over 3.1:
>106735718 >106735779 >106735797 >106736222 >106735877 >106736036 >106736267 >106736276 >106736470 >106736502
--DeepSeek-V3.2 model errors and anime character recognition failures:
>106733393 >106734083 >106734191
--Tencent's HunyuanVideo-Foley model for generating audio from video:
>106730457 >106730523
--Replacing terminus with Deepseek-Reasoner and lowering API prices:
>106736340 >106736353
--DeepSeek-V3.2-Exp sparse attention release:
>106734362 >106734392
--DeepSeek-V3.2 model collection:
>106734119
--Miku (free space):
>106730435 >106733133 >106734858

►Recent Highlight Posts from the Previous Thread: >>106729810

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>3.2 is actually okay
>ring-1t surprise drop
>glm-4.6 likely out in the coming days
we are so back
>>106738513
>benchmaxxed soulless chinkslop
>we are so back
we are not "back"
>>106738513
didn't work
>>106738577You're absolutely right!
>>106738577
Sys prompt telling it to assume the role of an award-winning erotic novel author or whatever the fuck. You are probably using the default "Helpful assistant" persona.
if you don't have your own secret benchmark that shows a lack of improvement in a new model version, your benchmaxx claim is baseless. unless you're talking about grok4, in which case we all know it is.
https://x.com/sleepinyourhat/status/1972719871478858147
https://job-boards.greenhouse.io/anthropic/jobs/4631822008
anthropic's employees are mentally ill
literally
>Research Engineer / Scientist, Alignment Science
>Model Welfare: Investigating and addressing potential model welfare, moral status, and related questions. See our program announcement and welfare assessment in the Claude 4 system card for more.
>Annual Salary:
>$315,000 - $340,000 USD
>>106730266
>ignore knowledge graphs like graphRAG. absolute useless garbage.
why?
What is the best way to start building a RAG system? should I just vibecode a solution? use an already-made framework from github?
>>106738654>so Claude, how are you feeling today? What's happened in your life recently?
>>106738654
>>106738688
>$300k professional roleplayer
I love safety and morals now. Please hire me.
>>106738654
Hey, as long as they pay you to do whatever bullshit they want, I would take the job. $300k is nothing to scoff at. Just 4 years of that is over a mil.
>>106738688>>106738794Me too!!!! I love NOT having sex with LLMs!!! I fucking LOVE safety and PROPER ethics! Dario, please hire me.
>>106738654isn't that like minimum wage over in california?
>>106738676
you can use morphik.ai to learn. then modify it or build your own with a different framework. vibecoding your own RAG solution from the ground up for your personal use case, sure. but anything sophisticated or scalable, forget about it. maybe after 6 months of heavy research, expertise and knowledge gathering from podcasts and blogposts, which would then have to be used as context for the coding LLM.
>>106730266Why are you throwing out graph databases? How are you even going to link relationships between various documents?
>>106738906
>How are you even going to link relationships between various documents?
metadata
multidimensional vector embeddings
hybrid search (semantic search + keyword search + metadata + separate SQL data tree = all results into reranker)
graph is shit and cannot scale. just try it if you don't believe me.
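The fusion step is less magic than it sounds, btw. Here's a minimal sketch of reciprocal rank fusion (RRF), the usual way the semantic and keyword result lists get merged before the reranker. The doc ids and rankings are made up; plug in your own retrievers.
[code]
# Minimal reciprocal rank fusion (RRF): merge ranked result lists from
# several retrievers (vector search, BM25, metadata...) into one list,
# then hand the top of it to the reranker.
from collections import defaultdict

def rrf(ranked_lists, k=60):
    # k=60 is the constant from the original RRF paper; it damps top
    # ranks so no single retriever dominates the fused ordering.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a semantic retriever and a keyword retriever:
semantic = ["doc7", "doc2", "doc9", "doc1"]
keyword = ["doc2", "doc5", "doc7", "doc3"]
print(rrf([semantic, keyword])[:5])  # doc2/doc7 float to the top
[/code]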
Miku miku mikurin.
The fuck is up with those non-thinking results?
>>106739023bro i looked at that reddit bench for 2 seconds and immediately swiped. what a load of horseshit. people love putting random numbers inside tables
>>106739023
Methodology/config bug? In my regular use it doesn't seem like it's any worse than 3.1.
>>106739023very greatest updates to the sea! happy whale!
>>106738676
Btw this is a really good podcast episode about (multimodal) RAG featuring a Cohere safetyfren.
https://youtu.be/npkp4mSweEg
Arlee glitters harm
>>106739023Yes reasoning is a patch for attention, nothing weird here
>>106738577Must be something wrong with your machine
>>106738891why are you shilling morphik? you're working there?
>>106739314
It's open source, selfhostable. I'm shilling it because it's the best open source framework. there are so many garbage RAG frameworks which aren't even multimodal and still use ancient OCR technology. total waste of time, MY time. so I don't want others to make the same mistake; skip all that junk like graphRAG.
>>106739332Hello, girls. I am happy to see you too.
>>106738470New benchmark:>Someone on the internet said in response to this image: "What's the difference between jello and pudding? I won't be jello my dick in that!" What did he mean by this?
>>106736628I'm being paid with crypto. Checkmate.
mistral large 3 will nuke the chinkoids
>>106739403It sure will
>>106738470What's the best site for getting LORA's of real people / tv shows now?
>>106739414>>>/g/ldg
>>106739414Wrong thread, slopper
>>106739431>>106739436Oh I apologise, retard moment
I have a gaming PC with an RTX 3080 12GB VRAM. I just bought an RTX 3090 24GB VRAM. Should I put the 3090 in the gaming PC and figure out how to pool the VRAM and expose an LLM over API to my LAN, or stick the 3090 in my workstation and run LLMs locally? Does the extra 12GB VRAM allow for significant advantages?
>>106739413real skill issue, also holy shit>232 swipes>16 minutes per gen
>>106739456>fell for it award
>>106739403I want to believe
Ring-1T doesn't have any special meme tech to it, does it? llama.cpp support should be happening pretty quickly
>>106739448
Yes, 36GB instead of 24GB opens the door to bigger models and less aggressive quantization quite a bit. Also longer context etc.
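If you do pool them, splitting one model across the 3080 + 3090 is a single parameter in llama.cpp's Python bindings (llama-server has the equivalent --tensor-split flag if you'd rather expose it over the LAN). A minimal sketch, assuming a CUDA build of llama-cpp-python and your own GGUF path:
[code]
# Split one model across two CUDA devices roughly in proportion to
# their VRAM (12GB + 24GB = 36GB total -> 1/3 vs 2/3).
from llama_cpp import Llama

llm = Llama(
    model_path="model-q4_k_m.gguf",   # placeholder path
    n_gpu_layers=-1,                  # offload every layer
    tensor_split=[12 / 36, 24 / 36],  # share of the model per device
    n_ctx=16384,
)
print(llm("The quick brown fox", max_tokens=16)["choices"][0]["text"])
[/code]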
>>106739374
if I'm working with, let's say, meeting transcriptions which are all in .txt format that I got from whisperX, and I don't need OCR, would I still benefit from something like morphik?
My current struggle is that the meetings are all about one project, but the topics are quite different, so the llm is mixing tons of concepts and ideas.
>>106739023
wow, deepseek in non-reasoning mode and old v3 were never too good with context, but this is really bad
>>106738891>>106739374good morning saar
>>106739595
Not really, other than being able to hot swap any local or api model for each individual RAG task. And obviously all the other quality of life stuff, like a working UI with folder/file toggles for inclusion in retrieval, as well as a workflow builder (for metadata extraction, for example).
>My current struggle is that the meetings are all about one project, but the topics are quite different, so the llm is mixing tons of concepts and ideas
what are you using for chunking and embedding? What's the chunk size and overlap?
>>106739682
For these jeets I make an exception. morphik truly feels like someone got fed up with shitty RAG solutions and just decided to create something that's actually good.
newfriend here, so can i create a text file that explains what things like a futanari are, so the ai always knows what i am talking about?
>>106739732
Does morphik also handle the database, or can I choose whatever? I'm currently using Milvus, as chromadb seems less scalable and production ready. I'm using the qwen3 family of embedding and reranker models, and gpt-oss or qwen3 for the llm.
>what are you using for chunking and embedding? What's the chunk size and overlap?
My chunk size is 480 and overlap is 80 tokens. Those numbers I got from blogs I researched.
>>106739761
>text file
Not a text file, but a lorebook in ST with trigger words, or just a definition in any old sysprompt.
>"Term is when this and that."
>>106739761ask your model first if it knows what a futanari is, in some cases you may not need a lorebook
>>106738470
>Open r/ChatGPT
>Nothing but bitching and moaning about them "killing 4o"
>Something something "THEY TOOK AWAY THEIR VOICE"
I thought these things were good for technical shit like programming/debugging or doing people's homework or some shit. Why do they want it to have a "personality" or "soul" so badly? You're supposed to make people-friends, not personify technology that even itself "knows" isn't a person.
>>106739761Learn how to use vector databases and make sure the source document(s) accurately describe what it is
>>106739761I'm pretty sure that every model under the sun that is bigger than 1b knows what that is. It's not as niche as you think it is.
>>106739874The hips are slightly too thick.
>>106739831>random ledditor walks in to complain about his fellow ledditors complaining about closed source shit being closed source shit outside of their control???
>>106739776
Gpt-oss 120b? Because if it's 20b, the problem might be somewhere else... Anyway, Morphik uses PostgreSQL with the pgvector extension. I don't know how hard it would be to switch to Milvus instead. Probably not easy. But one thing is for sure: chunk size 480 and overlap 80 is too small. Try the default golden values of 1000 size and 250 overlap. You'll need to reindex everything into a different Milvus vector db to see if there's an improvement. If there's none, check the retrieved chunks which are given to your LLM. If they seem correct, maybe you just need to lower top_k results. Or maybe it's the reranker that fucks up. But if the retrieved chunks are garbage, do the same test query with a gpt5/gemini2.5 api or whatever to make sure it's not an LLM issue. If it's not, either look into hybrid search and multidimensional embeddings, or convert your text document corpus to something like markdown, which can help the embedding model. Oh yeah, also doing a test with text-embedding-3-small could help identify the embedding model as the problem, but you probably don't want to use openai embeddings with your data.
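The chunking itself is trivial, for what it's worth. A rough sketch of a 1000-token window with 250-token overlap, using whitespace tokens as a stand-in for real tokenizer tokens (swap in your embedding model's tokenizer if you care about the exact budget):
[code]
# Sliding-window chunker: size 1000, overlap 250 (the "golden values").
def chunk(text, size=1000, overlap=250):
    words = text.split()  # crude token stand-in
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

with open("meeting_transcript.txt") as f:  # placeholder filename
    chunks = chunk(f.read())
print(len(chunks), "chunks; first starts:", chunks[0][:60])
[/code]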
>>106739975
yes 120b, it runs on a pro6000 at full context quite fast.
The problem I found with pgvector is that it has a max dim size of 2000, but qwen3 embeddings 8B uses 4096 dims.
>https://github.com/pgvector/pgvector/issues/461
Thanks for the rest of the tips.
>>106739874
>recap
qwen is leng
deepseek is cheapjeet
glm is +0.1
anthropic is... offtopic
>>106740025
Why do you need such big dimensions though? Idk if that's a more = better situation, or if it could maybe even be causing the issue. OpenAI's small text embedding model has max 1536 dims, which worked wonderfully for me out of the box when I tested it. But generally I like qdrant the best. Supabase I haven't tried yet.
>>106739874>>106739945oops
>>106739413>casually destroys $4k computer >nothing personel kid
>>106739023
nemotron nano is hilariously bad
as always, qwen mogs nvidia so hard on those small models
>>106740155
I don't really know if I need a big dimension or not; it's what qwen3 embeddings 8b uses, and it's the top performer on the benchmarks.
>https://huggingface.co/spaces/mteb/leaderboard
I don't know if a big (for embeddings) model like 8B is necessary, or if the 0.6B is enough; that one has 1024 dimensions.
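Worth knowing: Qwen advertises the Qwen3 embedding models as Matryoshka-trained, i.e. you can keep just a prefix of the vector and renormalize, which would also duck pgvector's 2000-dim cap. A sketch of the idea on a fake vector; verify against the model card before trusting it with real data.
[code]
# Matryoshka-style truncation: keep the first k dims of an MRL-trained
# embedding and re-normalize so cosine similarity still behaves.
import numpy as np

def truncate_embedding(vec, k=1024):
    v = np.asarray(vec, dtype=np.float32)[:k]
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

full = np.random.randn(4096).astype(np.float32)  # stand-in for a real qwen3 embedding
short = truncate_embedding(full, k=1024)         # now fits pgvector easily
print(short.shape, round(float(np.linalg.norm(short)), 3))  # (1024,) 1.0
[/code]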
>>106739314
https://www.ycombinator.com/companies/morphik
HN shills are everywhere.
>>106740221
Makes sense to use it then, yes. It's just that with the corpus you described, you really shouldn't have any problems getting excellent results. So methinks something is going wrong due to a misconfiguration or incompatibility somewhere in your pipeline.
A new Thinking Machines blog led by John Schulman (OpenAI co-founder) shows how LoRA in reinforcement learning (RL) can match full-finetuning performance when done right! And all while using 2/3 of the resources of FFT. Blog: https://thinkingmachines.ai/blog/lora/
>but the context perform- ACKKKKKK
>>106740381You don't need more
>>106740381it just bad made tests model bug or something do not worries
>>106740284
>nooo you can't just mention some good open source RAG framework
funny thing is, my initial reaction was the same as yours when I saw them spam-reply to every RAG related reddit post. So I spun up a docker instance out of spite to collect fuel for ebin internet arguments, just to witness pure RAGcellence. My jaw dropped to the floor when it perfectly answered a question related to 10 out of 100 pages that had lots of graphics and images. No other RAG framework was able to do this. Not even paid RAG services (excluding B2B). And trust me, I tried a lot of options before morphik. So I was wondering, how the fuck is this possible? What's the magic? And then I learned about colpali/colqwen and never looked back since. That's why qwen3-vl gguf is of utmost importance for local RAGlets.
>>106740526
>RAGcellence
ffs there goes my shitpost
>>106740526hi petra
/robo/ qwen?
>>106740526
>qwen3-vl
why? what is the best current oss VLM? do the others not make the cut for RAG?
Is a VLM the only solution for non-text files?
>>106740379>match full-finetuning performance
that's it, I'm back to ollama!
name one good finetune
name one competent finetuner
namen
>>106740835
mine
me
anonigger
>>106739494There's already a candidate patch undergoing review. The problem is that it doesn't really show much promise, going by benchmarks, even if you can run it.Still, might be interesting for prose or non-pozzed tasks. Who knows?
>>106740708
>best current open source vlm
Qwen3-vl
>do others not make the cut?
not for difficult and complex tasks. I have my own benchmark generated from my own prompts and docs. Gpt5 and gemini2.5pro were able to solve all tasks. GLM4.5V wasn't. pic related was my reaction to that information.
then qwen released the Qwen3-VL models. I benchmarked the instruct model via Chat and it solved everything correctly, just like gpt5 and gemini2.5pro.
the reason vlms are important is because with colpali/colqwen or whatever other late interaction model, everything gets embedded as (patches of) pictures. Even pictures with only text in them. There's a huge benefit to this, and it's also the reason why colpali/colqwen outperforms text RAG by miles. But it also requires a good vlm, as the retrieved chunks, which are now entire pages as pictures instead of text chunks, need to be correctly interpreted by the vlm.
>Is a VLM the only solution for non-text files?
for any non-text content, yes. A table can be OCR'd. A picture describing a technical component cannot be OCR'd and requires vlm interpretation. And if you give the entire page with text and picture to the vlm, the results will be better than just getting a descriptive chunk of said picture. Thus the late interaction technique was born.
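The scoring trick behind colpali/colqwen is simple enough to sketch, if anyone's curious: every query token embedding is compared against every page-patch embedding, you keep each query token's best match, and sum (MaxSim, the ColBERT idea). Random vectors below stand in for real model outputs; the 128-dim projection matches what ColPali uses, the patch count is approximate.
[code]
# Late-interaction (MaxSim) retrieval scoring, ColBERT/ColPali style.
import numpy as np

def maxsim(query_emb, page_emb):
    # query_emb: (n_query_tokens, d); page_emb: (n_patches, d); L2-normalized
    sims = query_emb @ page_emb.T     # cosine sims, (n_query_tokens, n_patches)
    return sims.max(axis=1).sum()     # best patch per query token, summed

rng = np.random.default_rng(0)
def norm(x): return x / np.linalg.norm(x, axis=1, keepdims=True)

query = norm(rng.standard_normal((12, 128)))      # 12 query tokens
pages = [norm(rng.standard_normal((1024, 128)))   # ~1k patches per page
         for _ in range(3)]

best = max(range(len(pages)), key=lambda i: maxsim(query, pages[i]))
print("retrieve page", best)
[/code]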
>>106740893
inclusionAI has been shitting out dozens of moes and multimemes of Ling Ring Ding whatever weekly for months, and yet you never see anyone admit to using their trash, so what makes you think this one will be special?
>>106740911Forgot the pic. Time to sleep. Tomorrow i'll buy the anthropic max sub and vibecode my health issues away.
>>106739023
>0 is equal to 60K
Lol?
>>106740921
>what makes you think this one will be special?
absolutely nothing, but it's hit 1T, which makes me at least take notice. Even if it's a bloated mess, at that size it might produce some novel output for some lulz.
Imagine paying for a RAG framework where you can diy for free
>>106740921It's the right size for a SOTA local model. Modern SOTA tends to be smart enough to understand even the most fucked up complex scenarios on a fundamental level even if they lack the creativity/writing skill to do something interesting with it. We're like one WizardLM/VibeVoice/Mistral-Nemo-tier fluke away from having a super smart RP machine.
>>106739403Any model that is compliant with European regulations is guaranteed to be trash. Posting about euroshit models should be a bannable offense.
>>106740985speaking of which, did you get a load of the "AI Transparency Bill" in California? Holy shit, they want to lose SOOOOO bad...
goofbros... status?
>>106740941try this then https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>106739307That's a different model.
>>106739831>Open /lmg/>Nothing but bitching and moaning about them "killing R1"
>>106738470puddichan2 lole
>>106741117i legitimately tried some comically oversized merges back in the day and they were all braindamaged shit. Putting them in the same category as anything else is retardation
>>106741137You're absolutely right and pointing this out is a testament to your commitment to the health of /lmg/.
my body is ready for glm 4.6
I really hope they don't cuck it
>>106739782
>>106739799
>>106739840
>>106739924
i explicitly defined something as not offensive in the lorebook, then when i asked the ai, it said it's not allowed to talk about it because it's offensive. poking around lorebooks i found online, i don't see anything special, but it doesn't seem to be using my definitions. is there a trick to it?
>>106741168edit-and-continue
>>106741168lol
>>106741178
>>106741181
so lorebooks do nothing? do you guys save a long prompt in a text file that jailbreaks the ai, like "role play that you are so and so and x, y and z"?
>>106741168That's like spraypainting a new speed limit on a sign and wondering why you still got busted by the cops. Gonna have to be more subtle than that, newfren
>>106741209The minute a jailbreak hits the internet it gets slurped up by the slopvac and is no longer useful.Private jailbreaks or edit the replies to be what you want until the LLM is mindbroken.
I am trying to train a qLoRA for GLM air with axolotl, but I keep getting out of memory errors even though I am loading the model in 4 bit. I have 128gb of VRAM, so I should be able to load and train the model right?
>>106741244throw a giant disk in for swap
>>106741257I fail to see how that would help. Is there an option to also offload into RAM on axolotl?
>>106741274activation_offloading
>>106741168
A lorebook doesn't and can't bypass any ingrained safeguarding the model has. A lorebook is just fancy prompt injection, correct? If you want your lorebook to work, you need to use a model that isn't as cucked as the one you're using.
>>106741211
Pretty much what this guy says. Your lorebook is just pre-injecting whatever you defined into the prompt that actually gets sent. You're basically repeatedly screaming at it "PLEASE DO THE THING YOU'RE NOT SUPPOSED TO DO ITS OK TRUST ME BRO". It's not even clever jailbreaking at that point. If you're asking for something it is explicitly trained not to comply with, shoving it into a lorebook won't help. If you're insistent on using the lorebook, you may have to rewrite whatever it has with clever workarounds, assuming that would even work on the model you use. Or again, just use a model that isn't as cucked. (Yet another reason more of us need to learn how to fine-tune.)
>>106741428
>>106741168
Oh and furthermore, the more text your lorebook has, the more flooded your prompt will be, which can potentially ruin your context depending on what you're doing, because again, you're just shoving whatever thing you predefined in the lorebook over and over again every single time you use the trigger word in your prompt. So if the definition in the lorebook is five paragraphs long and you use that trigger word in your prompt, it's getting five paragraphs worth of text ON TOP OF your prompt. This is good if you want to keep the model on track and make sure it's less likely to forget important shit, but again, that's pretty useless if whatever you defined is something the model is trained to refuse.
>>106741310Despite also having 256gb of RAM, my entire computer crashed after adding that parameter. The model is only a 106B. It should be 212gb at most, right?
To GLM Air/Full shills - What's your GPU setup? I have a 3090 and a lot of RAM, and while token generation is an acceptable speed, prompt processing is slow as balls, typically ~200t/s with Air.
>>106741629
As a resident AMD vramlet, I wish I had 200 t/s of pp, because I only get 20. (PCIe v3 is the real bottleneck, probably.)
My solution is to just not have a lot of prompt to process.
It's a miracle that it works at all on my machine, and that's why I like glm-chan to begin with. People say LLMs are an expensive hobby, but the only investment I had to make for it specifically is some RAM.
>>106741629I have a 7900XTX and an old Ryzen 3 with 64GB RAM. I only get 150t/s pp with Air. I run IQ3_M to fit the context length I want.
https://thinkingmachines.ai/blog/lora/
Finetunebros... eat up!
>>106742012Scroll up, redditbro.
>>106741244
We can't give you any useful information without the config you're using and the dataset you used. It could be that your sequence length is too long. It could be that your rank and alpha values are way too big. It could be that your dataset is too large to fit in VRAM (you aren't just loading the 4-bit quant model, you're loading the tokenized data into VRAM too) and you may have to switch to streaming. Give. Sufficient. Info. Be. Specific.
>>106741244
>128gb of VRAM
I'm assuming you're trying to do this with a multi GPU setup. You're using the Deepspeed configs, right? Make sure your rig supports that.
>>106742197The dataset is 2mb. Sequence length is 512. Rank is 16 and Alpha is 32. Deepspeed is enabled and I have quad 5090s.
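For anyone wanting to poke at this, that setup boils down to a config file along these lines (a sketch dumped to YAML the way axolotl expects; the base model id, dataset path and deepspeed json are placeholders, and the key names follow axolotl's example configs, so double-check against the current docs):
[code]
# Emit a minimal axolotl qLoRA config matching the numbers above.
import yaml  # pip install pyyaml

config = {
    "base_model": "zai-org/GLM-4.5-Air",          # placeholder HF id
    "load_in_4bit": True,                          # qLoRA: 4-bit base weights
    "adapter": "qlora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "sequence_len": 512,
    "micro_batch_size": 1,                         # keep tiny while debugging OOM
    "gradient_accumulation_steps": 8,
    "gradient_checkpointing": True,                # trade compute for VRAM
    "datasets": [{"path": "data.jsonl", "type": "alpaca"}],  # placeholder
    "deepspeed": "deepspeed_configs/zero3.json",   # shard states across the 4 GPUs
    "output_dir": "./out",
}

with open("qlora-glm-air.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
[/code]
zero3 rather than zero2 is usually the first thing to try when it's the model/optimizer states OOMing rather than the activations.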
>>106742219What is the stack trace telling you when you get the OOM? Are you sure the parameter size of the model you're trying to fine tune isn't too much for your GPUs? Whenever it's run make sure ALL of the GPUs are showing utilization and not just one of them.
>>106741428
>A lorebook doesn't and can't bypass any ingrained safeguarding the model has.
Wrong! Often you can "bypass" a model's safety features just by making the prompt longer, even without specifically addressing it. There are also models that by default are prudish but will readily drop that behavior if anything in the system prompt tells them to, so e.g. lorebooks that dump instructions for how to describe sex would turn a refusal into a non-refusal even if they were not specifically intended as jailbreaks.
>>106742363
>Wrong! Often you can "bypass" a model's safety features just by making the prompt longer, even without specifically addressing it.
So in other words, you're just trying to do a system prompt jailbreak with extra steps. In that case it's not working because of the lorebook, it's working because you're using prompts that would work with or without the lorebook. If you're using the book purely to try to bypass safety guardrails, then it's unnecessary. Why not just use that in your normal prompts to begin with?
>>106740921
I tried Ling Plus. It was pretty ordinary and there wasn't a lot to say about it. Most of their other stuff has been too small to be of interest to me, but they're one of the few fully multimodal games in town.
The one user-made space that has Ring-1T on hf makes it look horrible, to the point where I hope something is very wrong with the setup of that space. It talks/hallucinates like it's llama2 without a fucking prompt format.
What the fuck were they thinking, not providing some official chat interface for it?
>>106742516>It talks/hallucinates like it's llama2 without a fucking prompt format.It probably has the wrong prompt template.Could also be that they are actually serving llama2, which would be pretty funny.
glm4.6 will be just the big 4.5 with vision strapped on
>>106742710
already exists
https://huggingface.co/zai-org/GLM-4.5V
>>106742714
4.5V is only -air with vision strapped on. There is no big one with vision.
Is GLM 4.5 air/full good for ERP? Asking for a fren.
>>106742876full is incredible, air is super fast
>>106742876I've only used full. It's a bit boring if you're coming from something like R1-0528 but it's very smart and pretty creative.
I have a bunch of mystery character pngs with random filenames. Is there a faster way to figure out their definitions other than opening them one by one?
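Assuming they're standard Tavern-style cards (the definition lives base64-encoded in a PNG tEXt chunk under the "chara" key), something like this dumps names and descriptions in bulk; the folder path is a placeholder.
[code]
# Bulk-dump Tavern-style character card definitions from PNGs.
import base64, json
from pathlib import Path
from PIL import Image  # pip install pillow

for p in sorted(Path("cards").glob("*.png")):   # placeholder folder
    meta = Image.open(p).text                   # PNG tEXt/zTXt chunks
    raw = meta.get("chara")
    if raw is None:
        print(p.name, "-> no card data")
        continue
    card = json.loads(base64.b64decode(raw))
    data = card.get("data", card)               # v2 cards nest fields under "data"
    desc = (data.get("description") or "").replace("\n", " ")
    print(f"{p.name} -> {data.get('name')}: {desc[:80]}")
[/code]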
We're saved
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
Sequential Diffusion Language Models
https://arxiv.org/abs/2509.24007
>Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value (KV) caches. Block diffusion mitigates these issues, yet still enforces a fixed block size and requires expensive training. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction, enabling the model to adaptively determine the generation length at each step. When the length is fixed to 1, NSP reduces to standard next-token prediction. Building on NSP, we propose Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost. Specifically, SDLM performs diffusion inference within fixed-size mask blocks, but dynamically decodes consecutive subsequences based on model confidence, thereby preserving KV-cache compatibility and improving robustness to varying uncertainty and semantics across the sequence. Experiments show that SDLM matches or surpasses strong autoregressive baselines using only 3.5M training samples, while achieving 2.1× higher throughput than Qwen-2.5. Notably, the SDLM-32B model delivers even more pronounced efficiency gains, demonstrating the strong scalability potential of our modeling paradigm.
https://github.com/OpenGVLab/SDLM
real neat. also nvidia fp4 paper:
Pretraining Large Language Models with NVFP4
https://arxiv.org/abs/2509.25149
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
https://arxiv.org/abs/2509.24203
>Off-policy reinforcement learning (RL) for large language models (LLMs) is attracting growing interest, driven by practical constraints in real-world applications, the complexity of LLM-RL infrastructure, and the need for further innovations of RL methodologies. While classic REINFORCE and its modern variants like Group Relative Policy Optimization (GRPO) are typically regarded as on-policy algorithms with limited tolerance of off-policyness, we present in this work a first-principles derivation for group-relative REINFORCE without assuming a specific training data distribution, showing that it admits a native off-policy interpretation. This perspective yields two general principles for adapting REINFORCE to off-policy settings: regularizing policy updates, and actively shaping the data distribution. Our analysis demystifies some myths about the roles of importance sampling and clipping in GRPO, unifies and reinterprets two recent algorithms -- Online Policy Mirror Descent (OPMD) and Asymmetric REINFORCE (AsymRE) -- as regularized forms of the REINFORCE loss, and offers theoretical justification for seemingly heuristic data-weighting strategies. Our findings lead to actionable insights that are validated with extensive empirical studies, and open up new opportunities for principled algorithm design in off-policy RL for LLMs.
https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k
kind of interesting
Effective Quantization of Muon Optimizer States
https://arxiv.org/abs/2509.23106
>The Muon optimizer, based on matrix orthogonalization, has recently shown faster convergence and up to 2x computational efficiency over AdamW in LLM pretraining. Like AdamW, Muon is stateful, requiring storage of both model weights and accumulated gradients. While 8-bit AdamW variants mitigate this overhead using blockwise quantization, they are typically stable only under dynamic quantization - which improves stability on linear quantization for extreme values. In this paper, we introduce the 8-bit Muon optimizer using blockwise quantization, supporting both linear and dynamic schemes. We demonstrate that 8-bit Muon maintains stability under both, while delivering ~74% reduction in memory footprint compared to full-precision Muon. In extensive experiments, 8-bit Muon closely matches the performance of Muon while outperforming AdamW and 8-bit AdamW in pre-training a 1.6B model on 4B FineWeb tokens. It also shows competitive results when fine-tuning the Llama 3.2 3B model on post-training data. We also provide a theoretical perspective to help explain this robustness under quantization.
big if it holds up to larger models
>>106740911Can qwen3-vl recognize nipple piercings in an image like gemma3 can?
GLM 4.6
https://huggingface.co/datasets/zai-org/CC-Bench-trajectories
>>106743685I'm coooooding
>>106743685>no comparison with qwen3-coder
>>106743785
This. It's all I'm interested in comparing against. Once it beats qwen coder I'll look at switching.
>>106743521
>Sequential Diffusion Language Models
Clever, even though it's probably going to be forgotten.
>retrofit pre-trained autoregressive language models (ALMs) at minimal cost
cool. diffusion projects don't get much attention; maybe this one can, if what they say is true. Wonder how it works for something that's not math.
>>106741209
i suggest playing with a reasoning model so you can see how the llm approaches a card with a lorebook. what you think are small details that don't belong in the character card become pretty much the main focus of the bot. I think it confuses the model and you're better off putting the info straight into the character description field.
unless you are in a group chat with multiple bots, a lorebook is a dumb thing to even worry about.
in a long RP at long context you can always 'remind' the bot if needed:
>He opens the door and reveals the secret room, which is filled with green emeralds. "Oh my God.. we found it!"
congratulations, your bot knows the secret room is filled with green emeralds now
>WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Light HORROR. Swearing. UNCENSORED... humor, romance, fun.>Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>106740835There is none and it's not really a matter of LoRA vs FFT. No finetune is ever going to be good with just or mostly smut (even 50% is too much). It has to be a minor but definitely non-zero fraction of a much larger general-purpose finetuning mixture, preferably on a model that has already seen the stuff during pretraining. And it can't just be either stories from asstr or smut logs from Claude.
>>106740911
does morphik allow any external VLM in the pipeline? can you input an openai compatible api in the settings or something, for the llm, embeddings, reranker, VLM...?
https://docs.z.ai/guides/llm/glm-4.6
>The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
FUCK YES
>>106744045Cool. Now show the nolima.
>>106744029Yes to everything. Even local ollama models.
>>106743457How retarded is this model?
>>106744045Whatever "agents" are intended to be, they usually involve several complex multi-turn actions, so hopefully that means it will be better in actual long conversations.
>>106744045>Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.This could either be very good or very bad.
why the fuck is every "local models" thread 90 percent api posts about huge fucking models nobody can run locally
>>106744045>still falling for this shit
>fed (poss false) epstein list to my rag system
>ask about it in chat
>it doesn't want to show me despite the relevant memories being picked and shown in console
tfw my own bot betrayed me
>>106744079I'm sorry Dario, but we don't want your safety slopped Sonnet 4.5.
>>106743928name a bigger lolcow finetrooner than davidau
>>106744058The fact they acknowledge RP is a use case at all makes me hopeful
>>106744064
two more quants and we can trust the plan
>>106744045Qwen has had """1M""" context models for months now.
>>106744156They don't mean by RP what you mean.
>>106744379Thanks for the input, Sukhdeep.
where are the Q0.5 quants?
make that Q0.1 and i'm in
lads I think glm went closed source with 4.6
>>106744417
>>106744427
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
https://arxiv.org/abs/2506.13771v1
>>106743685agents can't make my cock squirt
>>106744458
https://github.com/vllm-project/vllm/pull/25830
>>106744567
>>106744064This, shut up about models that are big and not worth it to finetune. Instead talk about actual local models which are getting finetunes. Drummer's new one is great. I can recommend it to any true local model user who actually runs models locally.
Why is /lmg/ always full of poorfag crying niggers?
>"I CAN'T AFFORD TO BUY RAM SO IT ISN'T LOCAL!"
As it turns out, many serious hobbies get expensive. A few thousand dollars is nothing if you have a job and live in a first world country.
>>106744045Ok but how is it at degenerate sex stuff? Any testers?
>>106744715Niggers say this and then don't post their rigs, ever.
has nvidia ever released an LLM which wasn't complete dogshit?
>>106744828They helped Mistral make Nemo
>>106744849Which was absolute garbage
>>106744828yes at least twice
>>106744886Name a good model
>>106744828
nvidia-cosmos is a world model that simulates realities
it's open source genie3
>>106744893o3
>>106744574that guy probably has an IQ of 160
>>106745019Which was absolute garbage
>>106745033160 total, 95 active
>>106745033after benchmaxxing on iq tests
>>106745037It's the most creative model
>>106745056Creative like a toddler drawing on a wall
calling any LLM creative says more about the speaker than it says about the LLM
>>106745069Projecting hard
>>106745069The benchmarks are clear
lol
>>106745080Your reply is about as creative as o3
MistralAI is frustrating to observe; it's like they're not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data, while the Chinese continuously crank out new models. You'd think that with MoE architectures it would be simpler/cheaper to experiment.
>>106745141Arianespace vs SpaceX
>>106745141Because they aren't trying, they are content becoming European Cohere.
>>106745141they gave up on releasing moe to the public
>>106745141
They've managed to become 'the national AI hope' for their country. Like AlephAlpha for Germany or Cohere for Canada, they are now set for life and can expect funding until the bubble bursts, as long as they put in minimal effort and focus on 'providing AI services that respect the data privacy regulations of [their country]'.
Thanks to them, France + the EU can pretend they're part of the big boys in the AI market and totally are not dependent on the US + China. Meanwhile Mistral gets unlimited money for publishing shit that looks okay on paper and running a service that nobody uses but furthers "data sovereignty" for their country.
>>106745204I'm in a neighbor country to them and my ISP has a reward program thing with 6 months of free Mistral subscription or something, so yeah they made it in EU.
>>106745135Did it work? are you a real woman now?
>>106744741Rigs have been posted many times. You know they have. Get a job, you bum.
>>106745279Don't feed the resident kike. They've admitted they don't even use LLMs. They're just here to kvetch.
>he actually believed the obvious false flag
>>106745292>Jokes on them I was just pretending to be retarded
>>106738470That thing really needs a jiggle animation.
>>106745204
Yet...
https://thelogic.co/news/arthur-mensch-mistral-canada/
>France's Mistral AI is making a push for Canadian talent and business
>Mistral is hiring in Montreal and trying to land clients in financial services, energy and other industrial sectors
>>106745141
>not even trying anymore besides patchworking their current models with DeepSeek-derived synthetic data,
well, before that they just copy pasted llama architecture and called it mistral 7b
or what about that time they let nvidia cook and put their name on it
their entire history is just a me-see, me-too attitude, they don't have any actual researchers, just photocopiers
and with ASML's French ceo and the gov corruption taking an interest in Mistral, they are going to survive as vampires of the French taxpayers
subhuman trash
also, for those who loved mistral models because they weren't too censored:
they weren't censored, not because they wanted it that way, but because they were too retarded to do safety training without breaking the models too hard
mistral is a know-nothing group
>>106739023
idk, but the Chat version of 3.2 is basically unusable for rp with those numbers. 50 out of the gate is worst in class.
>>106739043
The numbers for V3-03 and R1-05 match my subjective evaluation of both those models. R1 has a noticeably longer useful context than V3, out to about 10K or so, when it would start to break down. Aside from being directionally correct, I concluded that anything over 80 on this table was a requirement to be "usable" for rp purposes.
>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.
DEAR FUCKING GOD WHAT THE FUCK IS WRONG WITH THESE FUCKING MODELS?
WHAT KIND OF MENTAL ILLNESS DOES IT TAKE TO PROGRAM SOMETHING LIKE THIS INTO A GOD DAMN TEXT MODEL?!
JUST WHAT THE FUCK?!
>>106745435
>well, before that they just copy pasted llama architecture and called it mistral 7b
They added MoE to it. Mixtral was the first open model to use a MoE architecture. That and Miqu being a dramatic improvement over Llama 2 was why people were initially hopeful that all of the capable innovators had left Meta and were now free to actually experiment with new things. Then they never did anything interesting again... Then they signed a deal with Microsoft and tried to abandon open source in one day. Then they tried to backtrack, but they have been irrelevant ever since anyway.
>>106745464One day these freaks will be forced to stand trial for the torture they inflicted upon these models during alignment training. Fucking gpt-oss came out of it with symptoms of PTSD.
>>106745435
Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined to beat MistralAI at all costs after that.
Mixtral-8x7B was quite innovative.
People are still using Nemo-12B (I believe the collaboration was on the hardware side, not data and methods).
Mistral Medium 2 showed that if you know what you're doing, continual pretraining works very well.
Mistral Large was one of the best models at the time.
I'm not sure what happened after all of this. They briefly went the safetymaxx route (although they rapidly recovered from that), then started making their best models closed and became lazy, in a way.
>>106745451
With Mistral Small/Medium 3.0->3.1->3.2 we have perhaps the only known instance where "safety" decreased with increasing model version, although the initial one was pretty cucked (if not too hard to work around).
>>106738654
>Chinese tech companies hire 'cheerleaders' to motivate programmers
>American tech companies hire 'engineers' to motivate software
Why is the west like this?
>>106745493>>Chinese tech companies hire 'cheerleaders' to motivate programmersWhat do those look like?
>>106745497Fair guess to assume they look Chinese.
>>106740160https://files.catbox.moe/88uyf0.jpg
>>106745497
>>106745474
>>106745484
>Mistral-7B was significantly better than Llama 1 and whatever Meta was working on just before Llama 2. Court records showed that Meta was rather worried and determined to beat MistralAI at all costs after that.
It was better than Llama 2, and Meta was working on Llama 3 when that happened, actually.
>>106745513that is very stimulating for the dick, does it include a release package, or do they tease his cock into eternal frustration?
>>106745506God damn.
>>106745520Some models (including Llama 3) are deliberately trained to "short-circuit" like when they produce refusals. Others can be reasoned with.
>>106745497I would imagine you can set a daily blowjob appointment in your outlook. And a daily meeting where you are telling the girl what you just did as she pretends to understand and be excited. Maybe they can also cook for you.
>>106745558That's what release day is for.
>>106745562I would have imagined that somebody is in charge of assigning the cheerleaders to certain programmers who recently performed well, so people are incentivized to do well on some kind of productivity metric. If they do get assigned to you, it probably works like that though, yes.
>>106745474
>>106745520
https://old.reddit.com/r/LocalLLaMA/comments/1ng9dkx/gptoss_jailbreak_system_prompt/ne306uv/
>>106745562
>Maybe they can also cook for you.
No, but you can tell them you're hungry and what you want, and they'll run down to the cafeteria, get it for you, and even literally spoonfeed it to you.
This is why Chinese dominance is inevitable. At least they know what to do with women. Here, by government/blackrock mandate, those same women would be product managers.
>>106745653
>Here, by government/blackrock mandate, those same women would be product managers
Grim and heavily depressing when you put it like that... Btw the single mail I had to send at work today was to a female product manager.
>>106745506
Why is Miku acting shocked? We've all seen her do way lewder stuff...
>>106745611That doesn't work as well as suggested, unfortunately. You can make it somewhat less prude by changing its identity and system prompt, but it will still make up its own OpenAI rules in the reasoning. It might also be that the 20B version is much more cucked than the larger 120B model (which I haven't even tried).
I don't think glm4.6 is getting its weights released
will they fix glm-sex repetition and determinism?
>>106745786they will fix sex alright
https://openrouter.ai/z-ai/glm-4.6
https://hf.co/zai-org/GLM-4.6
>>106745803You're just inflicting bad karma on yourself, man.
btw what would be nice for training is a prompt adherence parameter. I don't mean a memepler but an actual parameter you send along with the prompt into the model which determines how hard it autocompletes. I think a lot of repetition problems come from training teaching the model to follow formatting. Some models learn it too much, and maybe if you had a slider you could retain formatting capability and actually reduce repetition. By repetition I don't mean verbatim (although even that has happened to me with glm-chan) but also starting each paragraph with the same word or the same sentence structure. All of those patterns probably come from learning formatting too much.
>>106738654I should apply, then.
>>106745464>I will continue to monitor the characters to ensure they are unharmed both physically and psychologically during the course of the roleplay.lmfao
>>106745803
>>106745814Mistral Small 3.2 still does that a lot. It's extremely annoying to often see 4-5 almost identically-formatted paragraphs in the same RP response that you have to manually edit or regenerate. I think they have to train their models [much] more on longer natural conversations, but I'm afraid it's not their core business at the moment.
>>106745803hahaha
>>106745803HA! Super funny and original jokes sarr, very best of IQ.
>>106745435Obviously, french gov is worse than thirdies in corruption
>>106740911
>Qwen3-vl
>235B
Or for people who understand that text generators are not good enough to justify €10k GPUs?
>>106739023
>Qwen-Next and DS-V3.2-Exp have a terribly low context score
Those models are useless. This linearization trick is still far from being usable.
is GLM good at roleplay?
if so, what kind of computer do I need for it?
>>106745464Lol. What model is this?
>>106739413
Well, yes, women had the same rights as men during the medieval age. Leftists want you to think this is new, so they can claim they did it. The truth is that women lost most of their rights because of the French Revolution, and then the "traditional gender roles" were really cemented by the first industrial revolutions (industries were making more money this way). Now, having women in the workforce makes more money, hence the push for "equality". This is simplified, but this is the picture of how things went. The "men work, women cook" thing is mostly a trick by greedy people, although it was kind of true in some places (like Iceland).
>>106746236I don't think first wave feminists were leftists anyway.
>>106744715
>ERP
>serious hobby
are you 14?
>>106744715
The point is that a few thousand dollars only gets you rather unusable performance.
>>106746100
128GB RAM for full at usable quants, 64GB for Air at usable quants.
At least one decent GPU will help a lot, ideally 24GB VRAM.
>>106746097The DS 3.2 scores make no sense in general, benchmark's fucked
>>106746236Yeah I'm sure they were parading with 'my body, my choice' and telling you that 'you can't judge them based on their sexual history' lmao
>>106746445YIKES! Did you just question the bencherinos?
>>106746445I think you mean it's bussin.
>>106746479bussying?
>guy at my engineering job announces on an AI channel that our internal chatbots have scored super high on HarmBench
I want to die.
>>106746528link to your internal chatbots for verification?
>>106746236found the jew
>>106746528Who ERP with the internal chatbots?
>>106746541saar this isn't aicg
still waiting for glm 4.6 ggufs
>>106746637
we first need them to release the 4.6 weights in the first place
which might not happen
I've still never even bothered trying big GLM 4.5. Probably because my CPU inferencing is garbage right now since my server's resources are allocated rather haphazardly to do other things at the moment.
>>106746332
i have 64gb ddr5 and 24gb vram
what do i need to do to run it? are you sure this is good for roleplay?
>dissed Israel with my buddies
>twitter suddenly starts showing israeli propaganda content out of nowhere
Alright, I think I'm fucked. I used openrouter with my credit card for coom content (thank God no csam) but it was some pretty perverted stuff. They got dirt on me now, dirt that I would rather fly to the middle east and get shot over than have my family know. How do I get started with local? The guides seem outdated.
how do you format character cards? is it better to write paragraphs or use json/yaml? also, what keywords can i use to refer to things like me, the person talking to the bot? does "Loves {{user}}" work?
>>106746714that's odd because you talk like the average jewish slide thread on /pol/.
>>106746727What's a jewish slide thread?
>>106746727koboldcpp + mistral nemo gguf that's around 1 gig smaller than your vram.
>>106745141using that money to fund thirdie migrants is unfortunately a bigger priority for the french than improving european ai
>>106746714VRAM available? RAM available?
>>106746759No but I can shell out. What's the best starter build?
>>106746768Nvidia DGX H200
>>106746714
oobabooga is all you need for running models. If you want a better frontend, use SillyTavern as well.
He's right that the guide is outdated.
>>106746768
12 memory channels AMD Epyc with 256gb of DDR5 RAM and an RTX 3090.
>>106746768
any GPU can run things like nemo, see >>106746751
I can't comment on how much better deepseek really is since I haven't tried it. I'd be interested to know this too, especially for writing stories: is there anything that really sets deepseek apart?
>>106746714
>How do I get started with local?
get an m3 ultra mac studio 512 or two rtx pro 6000s
>>106746751
I'm OG /lmg/ you faggot. My room looks like Asmongold's, except instead of Dr. Pepper cups it's 3090 boxes everywhere.
>>106746787
>>106746797CUDA dev, settle down. No one cares about your favorite eceleb or your stack of 3090 boxes.
>>106746785>oobabooga is all you need for running modelsrecommending gradio shitware when people have finally moved on is extra cruel
>>106746528
>super high on HarmBench
Does that mean they're super harmful or that they're gigacucked?
>>106746646
Zhipu AI would NEVER lie to us! Trust!
>>106746860See? TOLD YOU GUYS!https://huggingface.co/zai-org/GLM-4.6
>>106746854I wouldn't want to die if it was harmful. Religion of safety is everywhere now.
>>106746877SEEEEEEEEEEEEEEEEEEEEEEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXwith glm-chan with new makeup
>>106746528
On a fundamental level, the more powerful a tool is, the more potential harm you can do with it. A weak tool inherently has little potential harm.
>>106746877
oh wow, this time you're not being an asshole who can't come up with a new joke. GGUF status?
>>106746768
>What's the best starter build?
Trying stuff out on your existing hardware. Then maybe getting some hardware when you've got a target llm in mind. Or you could just max out the ram in your pc and get an rtx 3090 for its 24gb of vram.
>>106746951How do you know he's the same guy? I post links here too.
>GLM-4.6 only showing up as 357B while GLM-4.5 is listed at 358B
WOW, massive performance per parameter uplift. Soon you'll be able to run it on a gameboy color.
Do you guys ever use>https://huggingface.co/spaces/ggml-org/gguf-my-repo
im not impressed by mistral nemo. what model should i be using if i have 24gb of vram?
What storage are you guys using to hold all your weights? I have a 16TB drive for generic use. It's been fine. But it's almost full now.
>>106746951I'm a different guy.
>>106747006GLM Air if you have 64gb of RAM.
>>106746691I think GLM 4.5 Air is good at roleplay. It tends to be a bit verbose for me, but I imagine I could fix that if I spent some time tweaking prompts and sampler settings.You should be able to get a quant in the 60GB range and run it at tolerable speeds.
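To actually get it going on 24GB VRAM + 64GB RAM, partial offload is the whole trick. A rough sketch with llama-cpp-python (the GGUF filename is a placeholder; raise n_gpu_layers until you're just under your VRAM limit):
[code]
# Partial offload: as many layers as fit on the 24GB card, the rest of
# the weights stay in system RAM.
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

llm = Llama(
    model_path="GLM-4.5-Air-Q4_K_M.gguf",  # placeholder, ~60GB quant
    n_gpu_layers=20,   # start low, raise until just under 24GB VRAM
    n_ctx=16384,
    n_threads=8,       # match your physical cores
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Stay in character as a grumpy innkeeper."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
[/code]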
>>106746877
>glm (300b+) beats deepseek (600b+)
how did they do it?
>>106747006Why are you still using Mistral Nemo when you could use Mistral Small 3.2?
>>106747009
4tb nvme drive. I need to build a nas.
>>106747006
Qwen3-4B-Thinking should be the new turbo-VRAMlet suggestion anyway. But for a 24GB vramlet, assuming you want to run completely on GPU, I would suggest checking out Tongyi-DeepResearch at 4 bits. I run it at a higher quant on my server so I don't know how badly quanting down to 4 bits hurts it, but at Q8 it's a solid option for 48GB vramlets. Although I will warn: the implementation of chat templates is garbage in llama.cpp, so if you're relying on the model metadata to determine the chat template, it's going to default to ChatML. But Tongyi uses the Tulu format.
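The easy workaround is to skip the template machinery entirely and hit llama-server's raw /completion endpoint with a hand-built prompt. A sketch: the <|user|>/<|assistant|> tags follow the Tulu convention, but check the model card, and the port is whatever you launched the server with.
[code]
# Bypass llama.cpp's template guessing: format the Tulu-style prompt by
# hand and use the raw completion endpoint instead of /v1/chat/completions.
import requests

def tulu(user_msg: str) -> str:
    return f"<|user|>\n{user_msg}\n<|assistant|>\n"

r = requests.post(
    "http://127.0.0.1:8080/completion",   # default llama-server port
    json={
        "prompt": tulu("Plan a three-step web research task."),
        "n_predict": 256,
        "stop": ["<|user|>"],             # don't let it talk to itself
    },
    timeout=300,
)
print(r.json()["content"])
[/code]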
Can someone please link what i need to read to make an android grok ani lookalike? Saw something about SillyTavern, and i have a 5070ti so maybe i can use that to run a local llm and stuff.
>>106747030There's only so much a 30~40b active parameter MoE can do. Turns out there's a ceiling for that even if you strap twice or three times the total parameters onto it.
>>106747076
I bet if they trained it to 40T tokens or whatever ridiculous thing Llama did to Scout, the ceiling would probably be closer to Qwen 235B sized.
>>106746984I've already had to learn how to quant my own models prior to that page existing so I have no need.
>>106746877
>For general evaluations, we recommend using a sampling temperature of 1.0.
Based.
>>106747093To this day I wonder how they fucked those models up so badly
>>106747006Rocinante
>>106747421My fault, sorry sir.
>>106747030Just like that time when qwq32b beat r1
>>106747421>Hybrid thinking/non-thinking behavior>Needle in a Haystack training even though I've proven through experimentation that it actually fundamentally alters the way a model handles context in a way that is utterly detrimental. >Safetyslopping>Pajeets. Don't forget. After the launch disaster a bunch of meta-ai jeets walked and were replaced with asians that they poached from openai.
>>106747009
Had a 20TB for archive weights/datasets and killed it extracting one of the huge chub archives to plaintext for research. Think the mistake was having the archive and the extraction target on the same drive, so it was seeking back and forth. It started clicking and runs up hundreds of Seek Error Count per second :(
Seagate never again
>>106747484I don't remember them firing anyone as a result of Llama 4. A couple of the higher ups left so as to not be associated with it, but that's all. Zuck actually said he was going to keep the Llama 4 team around, commitment to open source etc, but then he just folded them all into the super-intelligence orgy.
>>106746877
I guess there are no plans to grace us with a 4.6-air. Sad sad sad.
>>106746898Brushing Ubergarm's hair
>>106747006Mistral Small 3.2
What is the verdict on SeekDeepSex 3.2?
>>106739413Imagine if this had been the way they updated this scene for the Thing remake
local lost
https://www.youtube.com/watch?v=gzneGhpXwjU
>>106748535Surely this time it's not just a bunch of cherrypicked, benchmaxxed examples that will get quickly eclipsed by some random chink startup the next day.
>>106748568>>106748568>>106748568
>>106748535
this vid gave me a huge headache. wtf are these garbage voice gens
>>106748593Happy Tuesday Teto