/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108956323 & >>108949851
►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108956323
--Intel Crescent Island GPU's high VRAM capacity and bandwidth specifications:
>108956813 >108956855 >108956867 >108956870 >108956887 >108956903 >108956945 >108956964 >108956979 >108957315
--Comparing mistral.rs and llama.cpp performance on B200 GPUs:
>108956708 >108956745 >108956760 >108956809 >108957775 >108958023 >108958036 >108958048
--Comparing Nvidia N1X memory bandwidth against AMD Ryzen AI Max:
>108958059 >108958069 >108958082 >108958089 >108959414 >108961327 >108961628
--llama-bench results for Qwen 3.5 and Gemma 4 on M4 Max:
>108960068 >108960632
--Mistral.rs benchmarks showing poor UGFF output quality vs llama.cpp:
>108957878 >108957885 >108958096 >108958129
--Addressing Gemma 4's repetitiveness in roleplay:
>108960336 >108960455 >108960593 >108960708 >108962888 >108962990
--Proprietary status and open-source promises of MiniMax M3:
>108956662 >108956673 >108956733 >108956692 >108960423 >108956722
--Coding agents preferring shell commands over built-in tool actions:
>108957947 >108957967 >108957980 >108957985 >108958007
--Local TTS recommendations for long-form narration and PDF reading:
>108961085 >108961152 >108961188 >108961212 >108961282 >108961744
--Mixed reports on llama.cpp PR for limiting llama_context outputs:
>108957117 >108957200 >108957226 >108957588 >108960370
--Using DuckDB and local datasets for offline information retrieval:
>108962182 >108962270 >108963394
--OS power plans and GPU clock locking for faster offloading:
>108958954 >108959002 >108959506
--ROCm support and stability issues with v620 GPUs:
>108956495 >108956554
--Comparing Go and Python memory usage for TTS server startup:
>108962253
--Logs:
>108957062 >108957878 >108958548 >108961425 >108962253 >108962759 >108963127 >108963543
--Miku (free space):
>108956410 >108960487 >108962255 >108962716
►Recent Highlight Posts from the Previous Thread: >>108956325
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Tetolove
Tetomarriage
nth for QoL
I ripped out Orb's slop detector and ran it on the c2 logs dataset
Now I need to make deepseek 4 rewrite the flagged sentences until no more slop is detected, then try training some shit on it
Can I force more of a moe model onto ram? If I just leave it on auto I can only fit Q4 moe qwen despite being able to fit Q6 moe gemma. And I have ram to spare.
>>108964085just j-j-jam it in
Fuck, Marry, Kill
Miku, Teto, Neru
Go.
>>108964103Kill, Marry, Fuck
>>108964103Marry then kill, fuck then kill, kill
>>108964079Interesting idea. I mean if I cared I would implement something like this too. > Scan output for sneed words -> generate second pass. This can be automagic too.
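A rough sketch of the scan-then-second-pass loop described above, for anyone who wants to try it. Everything here is an assumption for illustration: the phrase list is made up (it is not Orb's detector), and `generate` is a hypothetical callable standing in for whatever backend you use.

```python
import re
from typing import Callable, List

# Hypothetical slop phrases; a real detector would ship its own list.
SLOP_PATTERNS = [
    r"shivers? (?:ran |running )?down (?:her|his|their|my) spine",
    r"barely above a whisper",
    r"a testament to",
]
SLOP_RE = re.compile("|".join(SLOP_PATTERNS), re.IGNORECASE)


def find_slop(text: str) -> List[str]:
    """Return every flagged phrase found in the text, in order."""
    return [m.group(0) for m in SLOP_RE.finditer(text)]


def generate_clean(prompt: str,
                   generate: Callable[[str], str],
                   max_passes: int = 3) -> str:
    """Generate once, then re-prompt while flagged phrases remain.

    `generate` is a placeholder for your own model call.
    Gives up after `max_passes` rewrites and returns the last attempt.
    """
    text = generate(prompt)
    for _ in range(max_passes):
        hits = find_slop(text)
        if not hits:
            break
        retry_prompt = (
            f"{prompt}\n\nRewrite the reply below without these phrases: "
            f"{', '.join(hits)}\n\n{text}"
        )
        text = generate(retry_prompt)
    return text
```

Note that this only removes the listed phrases. It does nothing about whatever the model substitutes for them.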
>>108964079We can anticipate the results: it will be more difficult to train compared to the raw logs, and the model will exhibit different slop, while also degrading in capabilities anywhere else than roleplay in the format of the logs.
>>108963996
In what state do they benchmark closed source models? They fiddle with them and change system prompts almost daily.
>>108964085yes, with -ngl and -ot args
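For reference, a minimal sketch of what that can look like, written as a small Python helper that assembles the command line. The model path, the layer count and the tensor regex are assumptions: the pattern shown is a commonly used one for keeping MoE expert tensors in system RAM, but check the tensor names in your own GGUF before relying on it.

```python
import shlex
from typing import List, Optional


def build_llama_args(model_path: str,
                     n_gpu_layers: int = 99,
                     cpu_tensor_regex: Optional[str] = r"\.ffn_.*_exps\.",
                     binary: str = "llama-server") -> List[str]:
    """Build an argv list for llama.cpp.

    -ngl asks for (up to) n_gpu_layers layers on the GPU.
    -ot overrides placement for tensors whose names match the regex,
    here sending them to the CPU buffer (system RAM).
    """
    args = [binary, "-m", model_path, "-ngl", str(n_gpu_layers)]
    if cpu_tensor_regex:
        args += ["-ot", f"{cpu_tensor_regex}=CPU"]
    return args


if __name__ == "__main__":
    # "model.gguf" is a placeholder path.
    print(shlex.join(build_llama_args("model.gguf")))
```

A narrower regex (for example one that only matches some block numbers) keeps fewer experts on the CPU, so you can trade VRAM for speed until it just fits.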
>>108964079
won't that just converge to the next thing you'll find annoying?
also, how long are these?
Is Vulkan good enough nowadays that I can pick up a second AMD card with lots of VRAM to pair with my 5080?
>>108964143All I hear is seething from the AMD camp
>>108964143
It works decently enough on my 7900 XTX, but that's just one GPU; I haven't tried a multi-GPU setup with it.
>>108964128
>>108964140
You're actually absolutely right
That would be no different than just distilling deepseek directly, now that I think about it
>>108964143absolutely sir please to buy! supports is very the good
ComfyOrg is a grift company. We need cumfart alternatives not engrain grifter projects into our chats. Absolutely disgusting
>>108964167You should be posting in that pewdiepie thread instead.
>>108964167
You lost boy?
>>108964162
I don't understand the point of deepseek now that Qwen is here.
>>108964162
i mean, you could change the slop profile, but not remove it entirely
maybe divide the dataset into x parts and each element have different guidelines for rewriting?
like 25% one style 25% other, etc.
at least it would be different
>>108964147
>>108964155
>>108964163
Guess I'll just try finding a cheap (lol) 16GB 4060Ti or 5060Ti
>>108964182
Your speeds are going to go to shit. Target a used dupe card or just get a unified memory system to cope
even in strokes gemmy is pure sovl
>>108964140
These anti-slop methods will never work properly except for the low-hanging fruit, because the samples will get fixed independently from each other, and that's where new slop will be introduced.
Trying to fix the problem just by finetuning is not the solution. A big source of the problem is that during inference the various conversations and message swipes are independent from each other, and current samplers do not fix this. LLMs do not have memory of past messages for avoiding frequently used patterns.
>>108964171
why?
>>108964172
OP image
>>108963996
use sdcpp instead of that bloated malware. fuck comfy
>>108964176
>maybe divide the dataset into x parts and each element have different guidelines for rewriting?
Good luck getting it to converge
>>108964197this is a masterpiece, ex tier writing
>>108964204Because you are infatuated with youtubers.
>>108964190
I was going off the figures this >>108956026 anon posted
Mid 20s t/s+ with Gemma MTP coming soon sounds good enough
>>108964201
>because the samples will get fixed independently from each other, and that's where new slop will be introduced.
What if I tested it against the whole dataset blob? Not a single expression will be repeated
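One way to check a rewritten dataset as a whole rather than sample by sample is to count word n-grams that show up in more than one sample. This is only a sketch under assumptions: whitespace tokenization, a fixed n, and a made-up threshold. It will also flag perfectly ordinary phrases.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> List[NGram]:
    """Lowercased whitespace-token n-grams of a single sample."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_ngrams(samples: Iterable[str],
                    n: int = 4,
                    min_samples: int = 2) -> Dict[NGram, int]:
    """Map each n-gram to the number of samples it appears in.

    Only n-grams present in at least `min_samples` different samples
    are returned; repeats inside one sample count once.
    """
    doc_freq: Counter = Counter()
    for sample in samples:
        doc_freq.update(set(ngrams(sample, n)))
    return {gram: count for gram, count in doc_freq.items()
            if count >= min_samples}
```

Sorting the result by count gives a crude "slop profile" for the whole set, which can then be compared before and after the rewrite pass.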
>>108964228As long as you're good with it that's the only thing that matters.
>>108964197Gemma copies are personalized
>>108964103Marry, Fuck, Kill
>>108964167>>108964204You are still a raped retard, Julien
many such cases
>>108964223why would I be?
>>108964167>>108964271Can you both fuck off back to your containment general? Thanks.
>>108964244damn, she got you there
>TO STATES
>• Implement a prohibition on standalone generative AI systems that have been built using unlawful web scraping, defined as the bulk and mass collection of training data through the World Wide Web, without protection against non-consensual collection of personal data.
>• Enact legislation requiring transparency regarding training data collection practices and accountability across AI supply chains, and further:
>• Require in law that technology companies, including those developing and deploying generative AI systems, carry out ongoing and proactive human rights due diligence to identify and address human rights risks and impacts related to their global operations. This must include clear regulatory frameworks requiring mandatory human rights impact assessments before the deployment of generative AI systems.
>[..]
>• Ensure meaningful consultation by independent bodies with affected communities, particularly those historically marginalized or discriminated against, throughout the lifecycle of the product.
>• Where AI deployments are identified as exacerbating existing inequalities or creating new forms of discrimination, to cease their use.
>• In all development, deployment and use of any AI system, guarantee access to effective remedy for human rights abuses linked to the impacts of technology companies, wherever the harms occur, including harms resulting from the operations of their subsidiaries, whether foreign or domestic. Redress mechanisms should be made easily accessible and understandable to enable individuals to file complaints when their rights have been infringed.
>>108964298
Does Amnesty International exclusively hire retards?
>>108964278catjak is here too so you can share his mental illness
>>108964103
marry, fuck, kill
it makes the most sense if you think about it
>>108964278comfyorg should die for the enshitification of the ui. They killed a good app
>>108964103fuck fuck fuck fuck fuck fuck fuck fuck
>>108964298
>they'll protest this but not the age verif everywhere absolutely raping any inch of privacy one might have
>>108964298
thankfully this is so retarded on its face that it will be rightfully ignored. a ban on training on bulk data from the web is basically a blanket ban on LLMs kek. and as for the rest
>our idea of effective regulation is, um... you have to fill out a lot of paperwork that no one will read about heckin' systemic injustice!
look at my progressives dawg, we are never having an effective left wing movement ever
>>108964259Miku's still holding my post :)
>>108964298>some jewish ngo has an opinion on something
>>108964298basically eu ai act lol
minimax m3 is soon(tm) but i feel nothing..
>>108964465it's gonna be 1t and the arch will never be implemented in llama.cpp
>>108964465Right after Q3.7 release for sure
>>108964228
That includes a 5070 ti, which is basically a 3090 equivalent with 16gb vram. You're probably not getting those speeds with tensor parallel even with two 5060 ti.
>>108960896
Under qwen 3.6 27b direction it chose more trip hop and R&B, same seed and settings in ace step. Will still dock a point for no mention of kitsune in the song
https://vocaroo.com/1fvHCXj0Vp2m
>>108964465
I tried it over openrouter and it's certainly another minimax model.
I don't have a lot more to say about it.
>>108963996
llama.cpp performance went to shit over the last couple of months, older version I am using concurrently is twice as fast on qwen3moe
>>108964613any concrete metrics like llama bench and kld or just posting shit feels?
>>108964626
>kld
That's a good point actually.
It could be that it was faster, but also that something was broken and the outputs were degraded.
I think something like that happened back in the 80B A3B days, IIRC.
>>108964298
> prohibition on such systems. lol. ofc you can fill out a bunch of forms to get an Amnesty Int'l seal of approval. Fuck these rent seeking mfer's. Also link so other anons can point and laugh: https://www.amnesty.org/en/documents/pol40/0996/2026/en/
>>108964322
It's an NGO. So yes. Also this: >>108964406
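For context on why KLD separates "faster" from "faster but broken": it compares the next-token distribution of a reference build against the build under test at the same positions. A minimal pure-Python sketch of the math follows; it assumes you already have per-position logits dumped from both builds (how to get them is out of scope), and the function names are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(ref_logits: Sequence[float],
                  test_logits: Sequence[float]) -> float:
    """KL(ref || test) in nats for a single token position.

    Assumes both vectors cover the same vocabulary and that the test
    distribution does not underflow to exactly zero anywhere.
    """
    p = softmax(ref_logits)
    q = softmax(test_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref: Sequence[Sequence[float]],
             test: Sequence[Sequence[float]]) -> float:
    """Average KL divergence over all positions of a test text."""
    return sum(kl_divergence(r, t) for r, t in zip(ref, test)) / len(ref)
```

A mean close to zero suggests the two builds produce the same distributions, so a speed difference is a real regression. A clearly nonzero mean suggests one of them is computing something different.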
>>108964517I'd be pairing the 5060Ti with a 5080 albeithough
>>108964298Based. Ban all large scale training and deployment until regulations on lawful data use are developed and implemented. Open source all prior existing models trained on unlawfully obtained data. Put the technojews who orchestrated it all behind bars. Models trained on humanity's accumulated cultural output should be free, only models trained on novel data should be allowed to be closed.
>>108964626Of course 'llama cuda dev' defense force is here in action.
>>108964143
It should be fine for -sm layer, which just pipelines the GPUs; you can compile multiple ggml backends at once and then mix and match them at runtime.
For -sm tensor, which attempts to run the GPUs in parallel, mixing NVIDIA and AMD is a non-starter I think, since there is no vendor support for synchronization between them.
>>108964626
Posting my disbelief
> llama-bench
I'll build it later/tomorrow and post some results, got other stuff running on that machine rn
>>108964748will there be an option to combine tensor and pipeline parallelism at some point? I'd like to run 3 groups of TP 4 or 6 groups of TP 2 if that's faster.
>>108964794
My ultimate goal is to have support for the combination of tensor and pipeline parallelism but that will require a refactor of the graph allocator.
One usecase will be to pipeline multiple copies of tensor parallelism with itself in order to hide the latencies of transfers between GPUs (unclear whether that will actually work out).
>run llama/kobold on host
>run vibe slop agent in vm
Is this the way to do it?
>cuda dev
>cuda dev
>cuda dev
when will they hire a rocm dev? aren't they being propped up by huggingface?
>>108964919this but save yourself the system resources and just use a docker container instead of a full-blown vm. hermes-agent has this built-in as a one-click setup.
>>108964939That would require rocm devs being a thing, sadly jensen and his cousin have conspired to have all of them disappear. Buy more to save more.
>>108964950Besides restricting file access what is sandboxing supposed to protect you from? If it's doing something malicious then isn't being on the same network already a risk?
>>108964962
You don't have to give it network access to anything other than the inference endpoint
But restricting file access is the main point. If it's doing something nasty from inside the VM, you can just stop the VM. But if it has free rein to fuck with your .bashrc and such, it can persist itself in all sorts of ways that will be hard for you to detect.
>>108964962
nta, it's more limiting the blast radius if your model does something stupid (trying to delete stuff it shouldn't, breaking your config/env, etc) rather than active malice
it's a gate to stop the baby from falling down the stairs, not a home security system
>>108958925Broken for me, shows up as name style. Archive still shows them correctly though.
Weird behavior I get with qwen 3.6 27b mtp, dmesg says Time jumped backwards, rotating. And my podman containers say they exited 292 years ago. Does anyone else have this or is this unrelated to llama.cpp? I usually run gemma (no mtp), and haven't had this happen. Ran qwen (no mtp) for a couple of weeks and didn't have this issue. Ran qwen (mtp), and 170k tokens in, this happens. I reboot, and try again and the server kills itself at 40k tokens. Ran without mtp and it crunched 250k fine.
>>108965068n-no, nothing ever happens!
>>108965068Did they forget to pay their bribes on time?
3.3 70b uncs what are you running
>>108965068finally, the models are going to be BASED as fuck now!!! MAGA!
>>108965047you’re gonna have to do more debug research than dmesg saying time jumped backwards. what did your ntp client do?
>>108965128GLM 4.6 IQ3KS/4.7 IQ2KL ubergarm for co-op writing story stuff, Gemma 4 31b Q8 for RP, Qwen 3.6 27b MTP Q8 for code.
>>108964960yeah but can’t some poor vibe coder shit put some functional support? lack of driver software should be a solved problem pretty soon. all in on intel and amd!
>>108965128Gemma 31B F16
>>108965161how's the speed if you don't mind me asking. Also how do you feel about smaller models making these jumps?I think it would allow you guys to run some crazy workflows no?
>>108965141I don't know how to check what my ntp client did, so I went to ask qwen (mtp) and it instantly killed my server lmao.
>>108965171you’re in over your head if you can’t read your system journal
>>108965177
>you’re in over your head if you can’t read your system journal
Evidently.
I let gemma take a look and she says my logs are all clean. Too clean - they just cut off at the time of the crash. Some kind of hardware fault that only rears its head with qwen (mtp)?
I think I'll stick with gemma for now.
>>108965167
2x 3090 128GB ddr4 3200 windows: 4.5-5.3 t/s, 19-25t/s, 40-60t/s. pp for gemma and qwen 1000-1700, forgot for glm.
>Also how do you feel about smaller models making these jumps?
They're just much better at following creative instructions, Gemmy especially, while having the same problems as before like slop and context rot, but the feeling of context rot is now noticeable to me around 8-16k instead of 2-4k for stories and RP.
>I think it would allow you guys to run some crazy workflows no?
Only new thing I'm doing is using Qwen in Cline because it's good enough to do so 90% of the time, meaning it can use Cline not that it doesn't fuck up the code occasionally. Works best if you have some knowledge or come up with a plan and tell it to do specific things. "Make so and so that do exactly this and wire to this" and not pure vibecode "make this feature".
>>108965275I had 35 tks pp for glm 4.5 on two 3090s and ddr4 3200.
When will powerful local LLMs be accessible to people with consumer hardware?
>>108965068>Situational Awareness is now 2 years old>people still haven't read itGovernment taking control over AI is inevitable. Leopold's prediction that it will happen in 27/28 seems accurate. I hope you people aren't retarded enough to be surprised when open source AI will become heavily regulated and largely outlawed.
>>108965295
qwen3.6 exists
>>108965302You need at least Q8 and 200k context to do anything useful with it.
>>108965295within the decade
>>108965314
Why are you spreading misinfo?
q5 and up are fine
>>108965295
powerful is a moving target and datacenters will always be better than consumer setups
you'll probably be able to run something mostly as powerful as today's best stuff in a year or two, but by then there will be even better stuff in the cloud
>>108965345*it will be banned
>>108964278
>both
It's actually just petra. She has this tactic of accusing herself with other names to deflect the blame to people she doesn't like.
>>108965384shut up nerd
>>108965161>GLM 4.6Why? Because NovelAI told you to not use 4.7? Kill yourself fucking shill.
kek!
>he's back
>>108965403do not to whine:!
>>108965403
3 days fly by so fast
>>108965292Just checked it, getting 120-160t/s with ubatch 4096 for 7500 new tokens on ik_. But of course it depends on new token count, PCIe bandwidth (i'm running 3.0 x8), and `-cuda offload-batch-size=` if low new token count.
>>108965298
They can't unpublish models so they're going to just sabotage the inference engines like llama through normal FOSS social engineering vulnerabilities. Primarily solodevs like KoboldGOD are the only real path forward.
>Vibecode your own
Doesn't build sustainable infrastructure longterm.
>>108965454
>They can't unpublish models
>makes hf illegal to access in your path
>>108965132fuck yeah, no more antisemitic models! praise israel!
>>108965465Torrents still exist.
>>108965465
>hf illegal to access
And herd everyone over to a chinese replacement site?
It's a no-win scenario for them trying to kill open source genAI. All they can do is make it a pain in the ass to get data and stop big corps from releasing.
>>108965403Do you like FUD? Did you like getting spammed about 4.7 being more censored without anyone ever offering proof? Just because there's a fucking shitty company with a paid subscription stuck with it? Fucking worthless shill. Go try making money somewhere else fucking asshole.
>>108965489lol good luck getting goog to drop gemma5 via torrent like mistral used to do i guess
>>108965465>make crime illegalgee, guess I'll just give up
>>108965512all that matters is labs giving up, not like toones or anything community led ever did anything for us
>>108965521yes, every lab across the planet would simultaneously give up
>>108965500You're out of your mind if you think Goog ideologues won't open source and torrent the weights of everything they can if they think DRUMPF is coming for them.Despite spending the past half-decade beating their chests about muh safety, censorship, and all that gay shit, they will absolutely release le scary dangerous AI to empower the brave trans folx and peeohsees against voldemort megahitler. Some might argue that's why 31b released as good as it did as testing the waters for an open Gemini flash release.
>>108965521Remember how all the EU labs got gigafucked by legislation and it didn't matter at all to the rest of the world's labs? Same thing if the US does it. None of the US's best labs even release open source other than Google.
>>108965555EU doesn't have anything worth releasing doe.
>>108965068
I'd rather have him oversee them than sam or dario desu
I want my models without feminism trained into them
>>108965566Like the US, they had one lab worth a fuck to open source, in their case: Mistral.
>>108965572you'll get that, as well as no porn, monkey paw type shit
>something older from before the age of man
SHUT UP GEMMA, YOU CAN'T SENSE AGE WHEN CASTING SPELLS
>>108965572>>108965582I just hope everyone of these niggers loses desu
is there any place where i can try different models at different quants to check what is good enough for the jobs i want to do before investing in ewastemaxxing?
i dont mind needing to upload the models myself, but i would prefer to not need to make a virtual machine and install everything
>>108965599I guess rent a cuckpod (runpod) or something like that?
Why do we need programmers to write code using AI if the future is supposed to involve using agents that are designed to render that very software obsolete and automate it in the background?
Just so they don’t lose their jobs for a little while longer?
If we went straight to using agents, couldn't we save ourselves all the computing power we are currently pouring into software that will be obsolete tomorrow?
>>108965617You make money directing agents moron, human conductors are needed
>>108965596
read the order. it's a nothingburger
https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/
it's just some shit about vuln scanning and codifying a retarded early access mechanism for le uber haxxor models to be used first by cyber security. basically just encouraging them to partner with the nsa to backdoor their shit or whatever
>Nothing in this section shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement for the development, publication, release, or distribution of new AI models, including frontier models.
friendly reminder that EOs do precisely nothing other than be public facing text for whatever bullshit policy they are already following internally
>>108965646> We need conductors to orchestrate agents to develop Excel for office bitches so we can make moneyAre you attached to your job? Why should the office bitch use your Excel when agents can do her work too? Why do we need you conductors to orchestrate your Excel for her?
>>108965696
You're not a bright one
People like you are why I sleep well at night
>>108965696
>give the wheel to american corpobot
>reports you to authorities for tax evasion
>>108965721
What end-user software do we need that we couldn't replace with an end-user agent interface?
Facebook and Candy Crush?
>>108965737Infrastructure retard
>>108965748
Why do we need so many of you for that? AI isn't getting any worse - on the contrary, surely most of that can be streamlined away.
And why so aggressive? Isn't that the master plan behind AI?
Back to base(d) Gemma I go.
>>108965572> begging for government regulationPic related
>>108965165Well ask geohot and you will discover the horror that is AMD's software division and how isolated they are from the rest of the company.
I think gembrain might be alright for a finetroon
most others just feel like a downgrade or schizo
this one feels like an actual sidegrade to base gemma though
Reminder to not fall for Nvidia's propaganda, that new notebook of theirs is a Mediatek and those suck for local LLMs and anything that actually uses the GPU
>>108965572>It's fine when I like the boot on my face
https://huggingface.co/google/CircularNet
sirs what is this?
>>108966039
>see poster
>see poster linkedin.com/in/ link on hf profile
>Bengaluru, Karnataka, India
of course
>>108966039
hmm
this is just a dataset but an open image model by googl... me thinks could be cool
>>108965962I mean, if they're going to sell it at a good price (lol) it can be a good product even if the performance is lacking.
>>108966052>Change the text "NAUGHTY" to "KINDNESS"based
>>108966039
https://sustainability.google/stories/circular-economy-marketplace/
>CircularNet: How Recykal built Asia’s largest circular economy marketplace using Google AI
>September 2023
>India produces around 62 million metric tonnes of waste a year
>CircularNet, Google’s open-source machine learning model for waste management
>now operating in more than 30 Indian states and union territories
>>108966072holy lamo
>>108964201isn't the problem itself intractable to begin with? Once you start hitting real phrases people use, you're just detecting and flagging cultural noise lmao, and there's always going to be that.
>thread culture mentioned, melt incoming
I tried to make a self supervising language trainer for a non-language model. It didn't work, but this excerpt from the logs cracked me up
>Child: % i=aiiyie c2:s$is&o a eleP\ e in Xrhy l ao te e onrieii notr e aa sh
>Parent (GPT-2): yeah
>Child: i wintlnh,hate oilsho tieeieecRrnihruoe wi otapnmocnmany.peiiu wn2 e.eblntbriw, dwoX.sow
>Parent (GPT-2): I love my son
> The AI Alliance wants to train a frontier base model by sharing weight deltas instead of data, so contributors keep their corpora local >https://thealliance.ai/blog/project-tapestry-the-path-to-frontier-sovereign-ai
>>108965919So, you’re telling me intel has a chance?
>>108966284Same chance as OpenAI releasing all their models for free with MIT license.
What is my opinion on Ed Zitroon?
>>108965607i guess it should be a good option to try