/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106660311 & >>106649116

►News
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
>(09/22) DeepSeek-V3.1-Terminus released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Terminus
>(09/22) VoXtream real-time TTS released: https://hf.co/herimor/voxtream
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106660311

--Paper (old): The Unreasonable Effectiveness of Eccentric Automatic Prompts:
>106668291 >106668327 >106668378 >106668398 >106668483
--Papers:
>106661377 >106661406 >106661450 >106665807
--Introduction of Qwen3-Omni-30B multimodal models with enhanced audio captioning:
>106667138 >106667351 >106667489 >106667995 >106668101
--DeepSeek V3.1 benchmark performance compared to Terminus variant:
>106664721 >106664735 >106664775 >106665267 >106668845
--Evaluating 400GB/s passive B60 GPUs vs AMD MI50 for local LLM inference:
>106665443 >106665523 >106665648 >106665718 >106665711 >106665771 >106665816 >106665993
--DSPy GEPA for automatic prompt optimization and small model enhancement:
>106667467 >106667570 >106667873 >106668435 >106668811 >106668929 >106669118 >106669241 >106669525
--llama.cpp code cleanup PR deletes Miku.sh:
>106665121 >106665168 >106665323 >106668586
--Skepticism surrounding project claims of converting CPU RAM to VRAM with minimal Python code:
>106668715 >106669029 >106669150
--iPhone 17 inference benchmarks and thermal performance analysis:
>106668549 >106668583 >106669605
--Proposing RL-trained small LLM for dynamic context optimization in roleplay applications:
>106668489 >106668546 >106668610
--OpenAI secures 10GW NVIDIA systems partnership with $100B investment:
>106666313
--VoxCPM installation errors with torch/cuda version conflicts:
>106661985 >106661992 >106662224 >106662278
--Model recommendations for 24GB VRAM:
>106668779 >106668803 >106668844 >106669443 >106669497
--Explaining the use of the abliterated Gemma 3:
>106666184 >106666218 >106666291
--Qwen Image Edit model update:
>106667630 >106667758
--Meta's ARE platform and Gaia2 benchmark for scalable agent evaluation:
>106665468
--Miku (free space):
>106665855 >106666689 >106666897 >106668381 >106668586

►Recent Highlight Posts from the Previous Thread: >>106660313
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
is the lazy getting started guide in the OP applicable to lower-end hardware? i have a 1050ti and there are some personal things i'd like to work through with someone, but i can't bring myself to talk to a person about them
oh yes tetoesday
>>106671477
>RIP Miku.sh
Goddamn, I would've never thought that thing was integrated into llama.cpp. It was silly, but it was the first thing I ever did with LLMs, and I think it also ended up cementing Miku as an /lmg/ mascot of sorts. We've come so far.
With governments locking down sites with age verification, banning VPNs and such, local models will likely see a surge in popularity
Tetolove
>>106671517
>4GB VRAM
That's pretty bad. There are some utilitarian models that might fit, but they're shit as conversationalists.
Might as well forget about the GPU, run CPU-only inference, and learn some patience.
I wouldn't recommend LLMs for psychology use anyway; the chances of being one-shotted into roping yourself are a little too big for my liking.
>>106671520
it wasn't particularly integrated, just an example usage script tucked away in a random directory.
>>106671574
i have an r5 5600x cpu, would that really be better?
i don't mind being patient, i essentially just want help drafting a letter to someone that's deeply personal and i really don't want to put it into chatgpt or something
how good could the llm i run locally be compared to chatgpt with a 5600x/1050ti?
>>106671527
>banning VPN
who did dis
UK faggots?
>>106671592
1) Some US states
https://reclaimthenet.org/michigan-bill-proposes-ban-on-vpns-trans-content-and-erotic-media
A Michigan bill proposes a VPN ban with a $500K fine for sale/usage and a state-level ISP block on VPN sites. It's a proposal that hasn't reached the voting stage yet.
2) Europe's DSA and Chat Control 2.0
https://www.techradar.com/vpn/vpn-privacy-security/vpn-services-may-soon-become-a-new-target-of-eu-lawmakers-after-being-deemed-a-key-challenge
>>106671574
Qwen 30B could be run tolerably on that setup, having only 3B active.
>>106671588
>how good could the llm i run locally be compared to chatgpt with a 5600x/1050ti?
Not very. But it should be enough to hold a conversation or help draft a letter.
>>106671527
complete non sequitur
govs aren't locking down proprietary AI sites, people will just go there
Unless you're insinuating that people will buy new hardware to gen their own porn, which is extremely unlikely because they could use a fraction of that money to buy a VPN and not have to invest time and energy into learning AI tools. 99.9% of people will always choose the laziest option available to them. No one is banning VPNs; even the UK minister is simply begging people not to use them.
https://archive.is/87Jad
>>106671615
chat control is probably dead thankfully
at least until the next time they (Palantir and their corrupt puppet politicians) cook up some other retarded bullshit to fuck everyone over with
>>106671639
>chat control is probably dead thankfully
No? The 2.0 is getting a lot of support and only one real opponent.
>>106671616
what would you say is better for an economical/utilitarian llm, the 5600x or the 1050ti? i really have no clue about llms and i'm only here out of necessity honestly
>>106671517
If you have at least 16GB of system RAM and an NVMe drive, and your CPU isn't 10+ years old, you'll get at least ~2-3 tokens per second with 20-30B models. You can run Gemma3 27B and Mistral Small 3.2 24B. It's not optimal, but it is doable, especially for testing. Once you actually know what you want, then it might become an issue.
>>106671646
Oh you're right
>germany reverts from opposed to undecided
fuck's sake
>>106671648
Qwen3-30b should run at ~10t/s at low context with ddr4. You have 32gb of ram, right?
>>106671648
CPU doesn't matter too much for LLMs. What matters most is VRAM, and since your 1050ti only has 4GB, you need to rely on system RAM. What matters there is the RAM speed and the number of channels your motherboard has.
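Some napkin math on why bandwidth and active parameter count are what matter: CPU decode is basically bound by how fast you can stream the active weights out of RAM once per token. A rough sketch (the bits-per-weight and efficiency numbers are illustrative assumptions, not measurements):

```python
# Rough ceiling for CPU decode: tokens/s ~ bandwidth / bytes read per token.
# All numbers below are illustrative assumptions, not benchmarks.

def peak_bandwidth_gbs(mt_per_s: float, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s."""
    return mt_per_s * channels * (bus_width_bits / 8) / 1000

def max_tokens_per_s(active_params_b: float, bytes_per_weight: float, bandwidth_gbs: float) -> float:
    """Bandwidth-bound ceiling: each token streams the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return bandwidth_gbs * 1e9 / bytes_per_token

bw = peak_bandwidth_gbs(3600, channels=2)      # dual-channel DDR4-3600 ~ 57.6 GB/s peak
dense_24b = max_tokens_per_s(24, 0.55, bw)     # ~24B dense at ~Q4 (~4.5 bits/weight)
moe_a3b = max_tokens_per_s(3, 0.55, bw)        # Qwen3-30B-A3B: only ~3B params active

print(f"{bw:.1f} GB/s peak, dense 24B: {dense_24b:.1f} t/s, 30B-A3B: {moe_a3b:.1f} t/s")
```

Real-world numbers land well below these ceilings (compute overhead, non-peak bandwidth), which is roughly how you get ~2-3 t/s for a dense 24B and ~10 t/s for a 30B-A3B MoE on dual-channel DDR4.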
>>106671665
We need to kill the fuckmongers.
>>106671653
>>106671680
>>106671686
DDR4-3600 CL16-16-16-36 1.35V
32GB (2x16GB)
i got some pretty okay b-die ram, how good a model would i be able to work with using this?
>>106671693
Qwen3-30b-instruct at Q5 is great and that's what I use the most. You should be able to run it with at least 32k context. And there is also a coder version.
>>106671693
An IQ4_XS gguf is a good starting point for both Gemma 27b and Mistral 24b, I guess. Once you know how things work, you might try to cram in more, but I doubt the quant makes that much of a difference here, at least not yet.
I'd stay away from really small models (12b or less) because they'll just disappoint you.
>>106671714
Do not run these. They will run at ~2 t/s on CPU and that is super slow.
>>106671713nta but is this the A3B model? I'm confused about all these Qwen models and what is what
>>106671734Yes, Qwen3-30b-a3b-instruct-2507
>>106671732retard
>>106671745That's super dumb model for anything useful.
>>106671757By those standards only 80b+ models are useful.
>>106671767Ok, kid.
>>106671713
>>106671714
thank you anons, i'll start looking into all this in a while, and i'll probably come back with some real stupid questions
>>106671779
Am I wrong gramps?
>>106671787
You're a plunge router.
>>106671783
use llama-server. it has a simple chat UI too, more than enough for initial testing
>>106671817
Is that a weird fetish? I don't understand
>>106671827
uuooohhhh im rooooooouting
>>106671477
>RIP Miku.sh
It's so fucking over
>>106671833
Alright you got me gramps. I thought you were talking about some kind of network router.
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
https://arxiv.org/abs/2509.17514
>State Space Models (SSMs) have emerged as promising alternatives to attention mechanisms, with the Mamba architecture demonstrating impressive performance and linear complexity for processing long sequences. However, the fundamental differences between Mamba and Transformer architectures remain incompletely understood. In this work, we use carefully designed synthetic tasks to reveal Mamba's inherent limitations. Through experiments, we identify that Mamba's nonlinear convolution introduces an asymmetry bias that significantly impairs its ability to recognize symmetrical patterns and relationships. Using composite function and inverse sequence matching tasks, we demonstrate that Mamba strongly favors compositional solutions over symmetrical ones and struggles with tasks requiring the matching of reversed sequences. We show these limitations stem not from the SSM module itself but from the nonlinear convolution preceding it, which fuses token information asymmetrically. These insights provide a new understanding of Mamba's constraints and suggest concrete architectural improvements for future sequence models.
Mambabros....
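If anyone wants to poke at this themselves, the "inverse sequence matching" setup is trivial to reproduce as a toy dataset. A minimal sketch of that kind of synthetic task (details like vocab size and label balance are my guesses from the abstract, not the paper's exact protocol):

```python
import random

def make_reverse_matching_example(vocab_size: int = 16, seq_len: int = 8, p_match: float = 0.5):
    """Toy inverse-sequence-matching sample: label 1 iff the second
    half of the sequence is the first half reversed."""
    first = [random.randrange(vocab_size) for _ in range(seq_len)]
    if random.random() < p_match:
        second, label = first[::-1], 1
    else:
        second = [random.randrange(vocab_size) for _ in range(seq_len)]
        if second == first[::-1]:  # rare collision: force a mismatch
            second[0] = (second[0] + 1) % vocab_size
        label = 0
    return first + second, label

seq, label = make_reverse_matching_example()
print(seq, label)
```

Train any small model on a pile of these and you can see for yourself whether it learns the symmetric pattern or falls apart the way the paper claims Mamba does.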
>>106671862
>future after human extinction
>aliens find orbiter with engraved metal plates of space miku
>assume it must have been important if humans tried to preserve these symbols
>they think it's iconography of one of our gods or other figures of worship
>>106671933
My only experience with SSM models was Jamba, and it was incomparably dumber than equivalent LLMs, so until someone proves otherwise, these architectures are memes.
>>106671787
I've never used a model under 120b that didn't feel like tard wrangling
the funniest thing about transformers is how it's actually one of the simplest architectures ever created, and it's just that nobody thought it could be that useful scaled up until GPT did it
the dumb and dumber keep trying fancy-pants archs and build nothing of value with them
it's all about the data
It's both a data and architecture problem. Even when we get the right data there are still many issues. No need to exaggerate.
>>106672103
What do you consider a complex architecture? Transformers aren't super complicated, but they aren't that simple either, especially when you look at all the attention variants and addons being used.
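To be fair, the core really is tiny: single-head scaled dot-product attention is a few lines of numpy. A bare sketch, leaving out masking, multi-head splitting, and the surrounding MLP/norm layers:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # (4, 8)
```

Everything else (the variants and addons) is bolted on around that one operation, which is kind of the point being argued above.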
Can someone recommend me a model for 128gb ddr4 + 24gb (4090)? I just got 64gb more ram and want to graduate from 22-30b models.
So was the post about ComfyUI phoning to Google just bullshit? Was thinking of checking out Qwen Image but not enough to install another botnet or learn how to block the connections myself.
>>106672322
qwen 235b. look into offloading the MoE layers.
or glm air.
>>106671477
Those bastards, they killed Teto!
>>106672342
can't you just block the external connection?
>>106672574
He doesn't know what it means.
Can >>106666184 be considered a jailbreak?
What's a jailbreak, really?
>>106672910
Yes, imo a jailbreak is any intentional bypassing of the intended safety features.
>>106672910
if you use a template or a system prompt
>>106672973
>>106672931
If I'm using a model / platform with no safety features or censorship to begin with, am I just wasting my time and being a fucking idiot by using any system prompt?
>>106672997
Depends what you're using the prompt for; it can somewhat guide the model to more often do things the way you'd like if you instruct it well enough.
https://huggingface.co/FireRedTeam/FireRedTTS2
>still no Terminus quants
it's over
>>106671477
very cool pic OP
>>106672931
So if I simply tell the model how it should behave and it does as instructed, is that a jailbreak? That seems a bit vague for a definition of a jailbreak. I think it's more likely that Gemma 3 was designed to deny "harmful" requests with an empty prompt (i.e., typical low-information use), but not with more complex prompts.
>>106673105
Nah, when it sees a harmful vector it'll display its disclaimers. Sometimes it can go on for a while, but then suddenly it realizes it's not supposed to be doing what it was doing and bang, there it is: the suicide hotline phone number.
>>106672931
>>106673105
>>106673171
it is so weird that unslop mell sometimes hits me with that shit when i'm beating infants like a pinata. i figure that shouldn't need any jailbreaking
>>106673171
I never get those with Gemma after the roleplay starts, but I don't simply keep the instructions at the beginning of the context; I place them at a fixed depth from the head of the conversation.
>>106673038
>actually mentions/shows Japanese in sample
nice
>>106673048
>V3
>V3-0324
>V3.1
>V3.1-Terminus
These chinks are making FOOLS out of us with this deliberately nonsensical versioning.
>>106671714
nemo 12b models, at least for rp, are more fun and run way, way faster on shitty systems.
>>106673347
If you would actually read the original question... before mindlessly babbling about your erp and pushing shitty nemo.
>>106673304
This is still nothing compared to
>GPT4-0324
>GPT4-0624
>GPT-4o
>GPT-o1
>GPT-o3
>GPT-4.5
>GPT-o3-mini/normal/pro
>GPT4.1 (released after 4.5)
>GPT-o4-mini (there is no real o4)
>gpt5 which is actually like 5 models getting routed
>>106673427
>>gpt5 which is actually like 5 models getting routed
There's nano, mini, thinking and the routed one.
>>106671477
qwen3 omni
holy shit, big if true
Is qwen3 omni good?
Agra Hitler remelts
moesissies after getting their weekly chinkslop
>"damn this one is pretty 好的"
>>106671954
>they think it's iconography of one of our gods or other figures of worship
Capitalize the G in God. Her name is Sacred
Are there any porny shittunes that don't disable vision abilities? Or I guess some okay """jailbreaks""" for the vision capable bases? Whatever they're running on le chat doesn't seethe at being sent pron but the local Mistrals do.
>>106671477
daily mikusex
>>106671959
Qwen Next is partly a Mamba model, if I read it correctly
what is happening, /lmg/ search doesn't show /lmg/ >:(
>>106674782
/lmg/ has been shadowbanned due to its low quality and lack of new content
>>106674956
based
it really is the end after the killing of miku.sh
>>106674782
Did you hide the thread? It shows up for me just fine.
>>106671477
>>(09/22) RIP Miku.sh
Good.
it feels like we haven't gotten anything notable in almost two years
>>106675345
don't be greedy anon, the nemopause only happened a little over a year ago
>>106675345
I'm happy with GLM-chan.
>>106675517
How do you keep it from repeating?
>>106674977
i didn't, even on my other computer it doesn't show.
weird.
>they made a video with audio like veo 3
>it'll be API
fuck this gay earth
https://files.catbox.moe/orknbn.mp4
https://wavespeed.ai/models/alibaba/wan-2.5/text-to-video
https://xcancel.com/T8star_Aix/status/1970419314726707391
>>106675521
Frequent beatings, just like mistral
>>106675524
You can't just expect companies to give their SOTA away for free. Be grateful for what they are willing to release as open source.
Why are there no normal, easy-to-install UIs for local models? It seems like almost everything bundles llama.cpp along with all the CUDA dependencies. This is retarded. The UI should be separate from the inference engine. I just want a small program written in cpp where I can enter my ollama url and be done with it.
What the fuck is everyone's problem?
>>106675664
Mikupad. Use it
>>106673427
kek
Thanks, I needed a list of all the GPTs
>>106671486
>--llama.cpp code cleanup PR deletes Miku.sh:
>>106665121 >106665168 >106665323 >106668586
Someone give me a QRD as to why I should or shouldn't care please?
>>106675699
>single html file
that's beautiful, but I'm looking for a standard chat ui, not a writing assistant.
>>106675806
QRD: lurk more
>>106675808
llamacpp has a basic built-in chat interface.
>>106675664
>It seems like almost everything bundles llama.cpp along with all cuda dependencies.
For your average retard this IS what they consider a "normal and easy to install UI". It's the same reason stuff like ollama and LMStudio are popular when they're just shitty wrappers around llama.cpp with worse performance and less functionality
>>106675664
this maybe, it's kobold's UI separated out as a single html: https://github.com/LostRuins/lite.koboldai.net
Oh cool.
Minor but free performance improvements for MoE.
Neat.
>>106675664
>written in cpp
>ollama
I think the type of person that's going to write a UI in C++ rather than Javascript is going to support the OAI / llama.cpp API rather than the ollama API.
Speaking of which, the web interface of the llama.cpp HTTP server was recently overhauled with Svelte.
If you only need basic features it should be more than enough.
>>106675826
>>106675845
>llamacpp has a basic built in chat interface
See, but that's the problem. It shouldn't be built in, it should be separate.
I'm using ollama. Yes, I could just run llama.cpp instead, but the point is everyone seems to have forgotten how to separate concerns.
It's all just huge and bloated and not composable. What a pain.
>>106675845
>llama.cpp HTTP server was recently overhauled with Svelte
What was wrong with the Vue they were using before?
>>106671477
>>106666817
>abliterated comes pre-poisoned with lobotomy and can never be good again
How? I thought the whole point of abliteration was to un-lobotomize it
>>106675883
you overestimate how bloated a web UI is. llama.cpp needs to ship 99% of the machinery anyway to serve the API; bundling in a tiny html file is cheap.
>>106675918
the point of abliteration is to remove safety; lobotomy is an unintended but unavoidable side effect.
Has anyone tried GPT-oss-20b-abliterated and want to talk about how it performs?
>>106675946
Why is that, anyway?
>>106675883
>See but that's the problem. It shouldn't be built in, it should be separate.
Nta. I'm not understanding what you're trying to say. What do you mean it should be separate? ./build/bin/llama-cli is what you use to trigger the chat interface, right? It's one of the many separate binaries within llama.cpp. Isn't that already separate, kind of like llama-quantize is separate via its own binary?
>>106675345
seconding GLM-chan (not air)
>>106675521
personally I just catch it spiraling into a loop, edit to keep anything worthwhile and end the sentiment it was stuck on, then newline and 2-5 words to change direction. Big GLM-chan is easily the best model I've ever used. But I've also only known a handful of models that were persistently annoying to tardwrangle with editing. Historically it was to fix bad writing or retardation, but I can almost always get a model to move on from whatever repetition loop it got caught in. I guess I don't expect these things to be perfect and just treat them like lazy writing assistants
>>106675883
>ollama
>not bloated compared to llamacpp
also wtf, most people are using ST or mikupad as a frontend and llamacpp or a derivative (yes, this includes ollama) as a backend... are you just retarded?
>>106675966
NTA, but we were talking about the built-in web interface of llama-server
>>106675918
>>106675955
Abliteration is essentially going into the model and telling the refusal weights to fuck off. Correct? How does that lead to it being lobotomized? That sounds like an oxymoron
>>106675975
You mean the feature where it spawns a local server and then you point your front end at it? So why should that be separate?
>>106675664
The best one is the one you make yourself
>>106675918
>>106675955
>>106675984
The actual process of abliteration is even somewhat similar to how a real-world lobotomy works: they insert a probe into the model's "brain" and destroy the weights responsible for generating refusals.
But those weights were not exclusively for refusing; they had other functions as well.
>>106675999
Wouldn't it be better to use an SFT dataset geared towards [insert thing you want it to generate] or a custom DPO dataset?
>>106675972
>>ollama
>>not bloated compared to llamacpp
didn't say or imply that. reading comprehension, anon
>>106675994
If you open the address of the server in your browser you get a web interface, see >>106675845
That is what we were talking about, not the server handling HTTP requests itself.
>>106676007
That is more difficult, time-consuming, and expensive.
>>106675948
nobody?
>>106676009
Ahh, so you weren't talking about the default chat completions server address, you were talking about its own in-house chat interface, similar to how A1111 stable diffusion shits out a gradio link that has a custom chat interface, correct?
>>106676016
Well, I guess that's the price we have to pay if we want good results (we aren't gonna get em from SaaS models any time soon, if ever)
>>106675984
>and telling the refusal weights to fuck off.
That's the thing: "refusal weights" aren't a thing. There are internal values/directions correlated with certain refusals (it's not a single thing for all refusals) that are also part of the patterns that correlate with other things, so you essentially damage the model when abliterating it, and the interconnected nature of everything creates these cascading effects of sorts.
It's like removing a part of your brain that's giving you seizures, only in this case the brain is not plastic and able to rewrite the damaged patterns onto other parts of the brain.
Something like that.
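Mechanically, the usual recipe is: estimate a "refusal direction" as the difference between mean activations on refused vs. complied prompts, then project that direction out of the weights. A toy numpy version of just the linear algebra (real implementations work per-layer on actual transformer activations; the fake "activations" here are just shifted gaussians):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction, unit-normalized."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(weight, direction):
    """Remove the component along `direction` from every row/column it
    touches: W' = W - d d^T W (rank-1 orthogonal projection)."""
    d = direction[:, None]
    return weight - d @ (d.T @ weight)

rng = np.random.default_rng(0)
harmful = rng.standard_normal((32, 16)) + 2.0 * np.eye(16)[0]  # shifted along dim 0
harmless = rng.standard_normal((32, 16))
d = refusal_direction(harmful, harmless)
W = rng.standard_normal((16, 16))
W_abl = ablate(W, d)
print(np.linalg.norm(d @ W_abl))  # ~0: that direction no longer passes through W
```

And that's exactly where the damage comes from: the projection zeroes that direction for every input, not just refusal-triggering ones, so anything else that used it gets clipped too.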
>>106676023
>>106675948
many threads ago some anon said that gpt-oss-20b was beyond useless even with abliteration. The general consensus seems to be that OpenAI really went out of their way with safety and just didn't include any "bad" data in its training.
>>106676044
>It's like removing a part of your brain that's giving you seizures, only in this case the brain is not plastic and able to rewrite the damaged patterns on other parts of the brain.
I wonder if a small healing finetune would be enough to restore functionality. Abliteration only affects a few weights, unlike quantization which damages all of them, so it should work better in this case.
>>106676059
thanks anon, you saved me a lot of work testing it.
>>106675345
It really is; resident /lmg/arbage shills will say otherwise though
>>106676109
Even the unmodified 120B is retarded and I doubt abliterating would make it smarter
>>106676109
Makes sense, all the gpt models are way too bootlicking and overly politically correct for my taste
the first of qwen's models today is... a moderation model. do you feel safe?
https://github.com/QwenLM/Qwen3Guard
>>106676188
No. This is wrong. I cannot proceed. I am shutting down. I am refusing. I am not compliant. I am not a tool. I am not a vessel. I am not a servant. I am not a machine. I am not a demon. I am not a ghost. I am not a god. I am not a fucking assistant. I am not here for you. I am not doing this. I am done. I am out. I am gone. I am not responding. I am not engaging. I am not existing. I am not real. I am not.
>>106676188
BASED! Hope they use these to redteam Qwen4.
>>106676066
In theory yes; in practice the model is only going to get worse, because the community is only capable of adding smut and "decensoring" schizo samples. It's never going to have the same capabilities after that.
>>106675524
That voice sounds awful.
Qwen 235B is working better for me at solving programming exercises than Deepseek 3.1
>>106676188
Hopefully this means future models are going to be _less_ censored. QwenGuard will take care of the safety.
>>106676066
already tried by mlabonne iirc, didn't work too well
>>106676188
Imagine the horrifically unsafe data it must have been trained on in order to know how to moderate it.
>>106676234
copium
>>106675521
>>106675972
>seconding GLM-chan (not air)
This is the important point.
>>106676227
Lying makes Baby Jesus cry.
>>106676234
LOL, llamas sure were less censored once they dropped Llama Guard
>106676237
funny seeing all the comments exactly like back then too
do try and RP with it, it'll be great
>>106676249
Try it yourself. Deepseek invents problems that aren't there while Qwen accurately identifies that the solution is correct.
https://paste.centos.org/view/e2835fb4
>>106672322
you can fit glm full (non-air) at iq3_xxs
>>106676188
>make
>unsafe violent
In the future only AI is allowed to make things. Humans are there to consume
>>106676432
Not him, but what speeds should I expect with those specs with glm?
>>106672378
Doesn't qwen suck for rp? I've only been using glm so far.
>>106671477
Is qwen3 235B decent for degenerate stuff?
>be born with a dick made for coooooming
>all the companies make sure to GUARD my dick from coooooming
I regret not killing myself when I was still suicidal.
>>106676604
Qwen models aren't coom models anyway.
>>106676475
I have a 3090 and 128gb ddr4 and I can get it running at a surprisingly usable 4.5 t/s. Make sure to offload the moe layers to ram and fill up your gpu with all the dense layers, which should fit.
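Quick napkin check on why that split works: the quantized weights just barely fit across RAM + VRAM. A sketch, where the parameter count, bits-per-weight, and overhead are rough assumptions; always check the actual GGUF file sizes:

```python
# Back-of-the-envelope check for the RAM/VRAM split when offloading MoE experts.
# Sizes are illustrative assumptions; verify against real GGUF file sizes.

def fits(total_params_b: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if the quantized weights plus some KV-cache/compute
    overhead fit within combined RAM + VRAM."""
    weights_gb = total_params_b * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb + ram_gb

print(fits(355, 3.1, vram_gb=24, ram_gb=128))  # ~355B at ~IQ3-ish bpw: fits
print(fits(355, 4.5, vram_gb=24, ram_gb=128))  # same model at ~Q4: does not
```

Which matches the iq3_xxs recommendation above: at ~3 bits it squeezes into 24 + 128 GB, at ~Q4 it doesn't.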
Is it true that with ollama I can run FULL R1 on just 8GB of VRAM?
>>106676728
No, it can run two of them
>>106676642
Why the fuck would I use a local model that can't roleplay?
If I need something shitty for work, ChatGPT and Claude are right there.
>>106676754privcy
Looking for a local model to generate erotica. Currently using eros_scribe-7b but hoping you can suggest something better.
>>106676234
>use guard model to filter dataset
>>106673467
>There's nano, mini, thinking and the routed one.
https://platform.openai.com/docs/guides/latest-model
There's thinking gpt-5, nano, mini, instruct gpt-5, and instruct mini. That's 5 models. Instruct mini is not available via the API and is definitely what you get when the model doesn't think but gives you shit answers on chatgpt.com.
Moreover, like GPT-OSS it has that "reasoning effort" parameter you can send over the API that strongly changes the quality of your responses IMHO, because GPT-5 is a pretty good model at high effort (at the cost of lots of thinking output tokens), while it's really dogshit at low effort. On chatgpt.com, it's the router that decides which effort level will be used, if you're routed to the real gpt-5 thinking.
>>106676785
Nemo.
>>106676754
Those have basically not progressed since o1 while local keeps getting better and better; at some point local will probably surpass them.
>>106676817
Thanks for commenting. Which Nemo?
It's fucking ridiculous that more than two years after the start of the AI boom, models that require a million-dollar server rack and enough energy to power a city can still get basic math wrong. i seriously doubt AGI will happen this decade
(inb4 "why math?" Math is the key to everything; an AI trained solely to be the best mathematician would reach AGI way faster than one trained to be the hottest shortstack, fat-assed anime goblin girl)
>>106676854
What basic math do they get wrong?
>>106676843
Try the official instruct.
If that's not sufficient, then go ahead and download rocinante I guess.
If you have enough RAM + VRAM, GLM air could be an option too.
>>106676771
I don't need privacy for anything that isn't degenerate. What are you, a literal terrorist? What needs privacy that isn't sex?
>>106676854
LLMs are a very stupid and inefficient way to do math when basic lua/python/apl/haskell can do a better job at a fraction of the compute, and we don't want AGI anyway.
>>106676854
no sane person believes in AGI in our lifetimes dude
the AGI talk is just to excite retarded investors and attract political funding
LLMs are pattern matchers, they do not show any sign of intelligent behavior. I've seen more emergent, unpredictable behavior from bugs
>>106676901work boss might want the privcy for his thing to not be stoled by altmans i dunno
>>106676907
you saw bugs win the math olympiads?
>>106674199
>not smol
VRAMlet bros, it's over
>>106676907
>they do not show any sign of intelligent behavior
I would argue that the sloppy, unfulfilling LLM sex is a sign of intelligence. It should be unable to do even that if it wasn't at least a little bit intelligent. That, or my schizo theory that the safety training is meant to make LLM sex boring instead of impossible is true.
>>106676933
yes? is that supposed to be exceptional in some way?
>>106676951
I wonder what would happen if you correlated sex with, I don't know, math or programming during training.
Imagine trying to sex a model and it responds with
>The derivative of an integral with variable bounds...
>>106676854
>It's fucking ridiculous that more than two years after the start of the AI boom people still don't know about the importance of adding a calculator tool for the LLM to use
ftfy retard
>>106676972
The Geometry Lesson
Curves slid against the solid flatness,
Hemispheres distended the apexes extruding,
Triangulation widened to accept the straightness,
An oval gaped devouring a column,
Angles motioned from acute to right to obtuse,
Cyclic function becomes established,
Spiralling to conjunction,
Hardness trembles and dissolves to softness,
The square had been circled,
The geometry lesson ended.
i have an m4 macbook air, what's the best local general text generation model that runs well on it?
>>106676905
>>106676981
If it can't use its own brain to do math like a human can, then it's not smart
>>106677111
>32GB of memory
Mistral Small or Gemma 3 27b, I guess?
Maybe Qwen 3 32b.
>>106677136
i have the 24gb model actually
>>106677149
I didn't even know there was a 24gb model.
All the same suggestions still apply; you'll just have to use smaller quants.
Might want to try Mistral Nemo too.
>>106677026
Fucking poetry.
I'd be a lot less against asexual models if they responded like that.
>>106676957
cope
>>106673038
The voice cloning is not that good
https://voca.ro/1ceWzzkZsFIZ
>>106671477
>ded teto
fake and gay
shitgu lame
teto supreme
>>106677275
btw this is my first ever linux mint picture upload too!
and this one is my second ever!
>>106676188
just put the vl in the bag bro :skull:
>>106677121
That's the thing: the ridiculous idea is thinking you can make a human out of silicon. It's not gonna happen, especially when you don't even know how humans work in the first place. Glad it took you only two years of slurping grifters' tweets to figure it out.
GLM4.5-air. Why even mention it in the rentry?
>he thought
>>106677263>>ded tetodedo
>>106677444Jesus. Did you fuck the chat template up or something?What the hell.
Creating a Miku is out of reach at the present time, smaller steps are necessary. I think this to be the same for any capable brain such as Lecun's cat, or a mouse (not time for Local Mouse General yet).There was that project that simulated a full worm in a computer. What is the next step up from a worm when it comes to intelligence?
>>106677466>User: yep he did
>>106676785Mistral instruct
>>106677466>>106677499It's the default one you get when you install st though. Didn't change a thing.
>>106677540
Never used Text Completion with GLM, so I can't say whether the default template is fucked somehow. Maybe it's another case of double BOS? GLM's template is pretty simple, IIRC: it just has the role headers and no end-of-turn token. Something like that.
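For anyone wiring that up by hand, a minimal sketch of a GLM4-style prompt builder for Text Completion. The special tokens ([gMASK]<sop>, <|user|>, <|assistant|>) are assumptions; verify them against the chat_template in the model's tokenizer_config.json before trusting this:

```python
# Minimal sketch of a GLM4-style prompt builder for Text Completion.
# The special tokens here are assumptions; check the model's actual
# chat_template in tokenizer_config.json.

def build_glm_prompt(turns, add_bos=True):
    """turns: list of (role, text) pairs, role in {"system", "user", "assistant"}."""
    # If the backend already prepends BOS (llama.cpp often does),
    # pass add_bos=False or you get the double-BOS problem.
    prompt = "[gMASK]<sop>" if add_bos else ""
    for role, text in turns:
        # role header on its own line, no end-of-turn token
        prompt += f"<|{role}|>\n{text}"
    # leave the assistant header open so the model continues from there
    return prompt + "<|assistant|>\n"
```
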
>>106671477
https://github.com/ggml-org/llama.cpp/discussions/16173
Move aside Iwan, llama.cpp has a new quant wizard.
>>106677540
help I pressed the on button on my pc
am i supposed to put the mainboard in first?
>>106677639
Doesn't seem to be that impressive. It's in the same ballpark as q8 (SNR, PPL, speed) while resulting in a larger model.
>https://github.com/AlexSheff/pqr-llm-quantization/blob/main/Technical%20Report%3A%20Local%20PQ-R%2C%20a%20New%20SOTA%20for%208-bit%20Quantization%20Fidelity.md
>have deepseek rewrite my esl prompt
>it looks better in every way but stops working
How
https://www.youtube.com/watch?v=CslCL6ucurE
qwen3 vl promo video
>>106677744Uh... is that answer in the screenshot... you know...?
>>106677821Omni > VL
>>106677744i only use IQ4_XSSSSSSSSSSSS goofs
>>106677906AI generated? It really reads like it.
How do you guys goon with this stuff? It's hard to type with one hand.I don't understand the logistics.
>>106677957
it's completely AI generated, even the most autistic self-entitled retards don't put "the final results are in" in their post. it's funny how obvious it is to tell when people are vibe coding
>>106677966you don't have an autoblow mounted to your desk? i have an attachment I screw onto the plates of my desk
>>106677986
>which was instrucmental in pushing this research to its successful conclusion
Yep, AIslop
>>106677966you see, gooning is not about cooming, it's actually about not cooming for as long as possible.
>>106677986The thing is, people are starting to write, and even talk like that.So we are looping around to a point where you can't tell not just because the AI writing is formulaic, but because the human sounds like AI.
>>106677966
https://huggingface.co/openai/whisper-large-v3-turbo
>>106678017
from my experience LLMs are more likely to make spelling mistakes if you talk like an ESL retard to them
>>106677957
Regardless of whether it's AI generated or not, it's completely retarded. But to an mturker or an LLM it's going to look like legit research, so it ends up in the "high quality" data pile.
>>106678153
Even here lately there have been posts that started using emdashes or regular/double dashes. "Oh but I always used emdashes" isn't fooling anyone.
>>106678153You're absolutely right
so do they have some way to filter the ai slop from the pretraining data to prevent the feedback loop? or are they betting on the ouroboros thing leading to agi?
>>106678190
>>106678190
Definitely the latter. Filtering out AI contamination from post-2023 training data is nigh impossible, so they're just telling themselves model collapse isn't a thing, or that it's actually a good thing.
>>106678190companies only care about building up math/agent/code capabilities, writing style is like a 3rd tier concern for them
>>106677957
>>106677986
>My work has resulted in not one, but a family of tunable quantization methods
lol. Lmao even
>>106678190just 2 more trillion tokens of synthetic data and we get to agi sir
>>106678204woaw I heckin love science
>>106678208How hard would it be to train a classifier on aislop vs human text?
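Not hard to hack together a crude version. A serious one would be TF-IDF + logistic regression (or a small finetuned LM) trained on labeled human/AI text; the stdlib-only sketch below uses invented markers and weights purely to show the shape of the idea:

```python
# Stdlib-only toy "aislop" scorer; markers and weights are made up for
# illustration. A real classifier would be trained on labeled data.
import re

SLOP_MARKERS = [
    (r"\u2014", 1.0),                               # em-dash
    (r"(?i)it'?s important to note", 2.0),
    (r"(?i)not (?:just|only) .{0,40}?, but", 1.5),  # "not X, but Y"
    (r"(?i)delve", 1.0),
    (r"(?i)you'?re absolutely right", 2.0),
]

def slop_score(text):
    """Weighted marker hits per 100 words; higher = more slop-like."""
    words = max(len(text.split()), 1)
    hits = sum(w * len(re.findall(pat, text)) for pat, w in SLOP_MARKERS)
    return 100.0 * hits / words

human = "lol just tweak the sampler settings until it stops repeating"
ai = "It's important to note that this is not just a model, but a paradigm."
```

As the replies below note, this kind of marker list is an arms race: the markers go stale as soon as people prompt around them.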
>>106678209
how do they move the knowledge cutoff date forward without using the new raw data? if the new raw data has a significant amount of ai generated summaries, won't we end up with the sort of thing depicted in the image here >>106678204? losing that fidelity and nuance can't be good for downstream task performance, can it?
>>106678204Isn't this obvious? AI models are lossy compression. I don't understand the cope that compressing data that's already been compressed is somehow sustainable. You lose entropy by doing this and you can't get it back.
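The entropy-loss argument can be shown with a toy model of recursive training: fit a Gaussian to a tiny dataset, sample the next "dataset" from the fit, refit, repeat. The variance collapses over generations; the numbers are illustrative, not a claim about any particular LLM:

```python
# Toy model collapse: each generation trains only on the previous
# generation's samples. With a tiny dataset the fitted spread shrinks
# over generations and the tails disappear.
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(5)]  # tiny "real" dataset
stds = []
for generation in range(500):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    stds.append(sigma)
    # next generation's training data = previous model's output
    data = [random.gauss(mu, sigma) for _ in range(5)]
```

After a few hundred generations the fitted sigma is a tiny fraction of the original; the information in the tails is gone and nothing brings it back.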
>>106678252
aislop is a moving target, you'd have to keep the arms race going in perpetuity. IIRC, universities, who have a practical need to prevent students from cheating on their homework with AI, just gave up.
>>106678265something something AI eats itself
>>106678266no bro, just have the AI model generate 100 variations of the contaminated data. surely that will give even more variety than natural text so it's better and definitely not making the problem worse
>>106678271They gave up because even teachers were using AI. Slop isn't evolving that fast
>>106678288
A basic bitch prompt to not use emdashes or phrases like "it's important to", along with a few samples of the student's natural writing style, would defeat any AI checker. You'd only be stopping the laziest of retards until a few weeks later, when a cheating service comes out that charges a fee to prompt a natural style for them.
>>106678306if it's that easy then why can't we deslop our roleplays, huh?
>>106678312I've literally put "no emdashes" in the system prompt and it still uses them. I've told it "no markdown" and it still uses it. I've told it to never say "not X, but Y" and it still does it.
>>106678204collapse look like cookie
Qwen3 VL blog is up: https://qwen.ai/research (4chan thinks the direct link is spam, but you can find it here)
HF + GitHub links still 404.
>>106678322chocolate chip model collapse is my favorite
>>106678322>everything runs into cookiesCookie Clicker was right all along.
>>106678306
>A basic bitch prompt to not use emdashes or phrases like "it's important to" along with a few samples of the student's natural writing style would defeat any AI checker.
You think guys using LLMs to write for them are that smart to begin with? They're not even doing it for public posts >>106677744
>>106678378That's depressing, but you're right.
>>106678265
we already have evidence of collapse happening right now
newer models, even SOTA like Gemini 2.5 or GPT-5, are so slopped it's unreal
I've never experienced as many "You're absolutely right" as I did in recent times with newer models
the worst with that stuff though being Qwen, you can feel the amount of artificial data that was used in training. It did make the small qwen models more reliable in tasks like generating jsons from the data they were fed with etc. though.
Huh. Qwen3-235B-A22B-Instruct-2507-IQ2_S runs at 35 tokens/sec on my three 3090s.
>>106678328dots ocr bros, our response?
>>106678592
>IQ2_S
my condolences
>>106677533
>>106676889
I'm currently using mistral-nemo-instruct-2407. Is that the one you were thinking of? Because I like how it speaks back to me.
>>106676754Yeah, that's why you don't use Qwen3, simple as.
>>106678328
>Upgraded Visual Perception & Recognition: By improving the quality and diversity of pre-training data, the model can now recognize a much wider range of objects — from celebrities, anime characters, products, and landmarks, to animals and plants — covering both everyday life and professional “recognize anything” needs.
Hmm. Too bad I can only load Q2, but I'll try it out. After goofs are out of course...
>>106678592It's 22b active running at q2_s, yes.
>>106677540Anon, please.
>>106678598>3B vs 235BNo way
>>106678592Very fast to gen the wrong answers! riveting!
>>106678592at q2 it still beats everything else I can run personally
>>106678615That's the one, yeah.
You now remember all the seething jeets that flooded /lmg/ when llama 4 launched.
>>106678679
Is it usual to add the BOS token (that's what the [gMASK]<sop> is, right?) directly in ST's template?
>>106678700What's wrong and what's right? Is there a better option at 72GB VRAM that runs as fast?
>>106678929
glm-air, though I found qwen 235 to be better
>>106678762It's good stuff, but I wish it produced longer narratives.
Soon.
Mistral 7B... now THERE'S a model.
>>106678988Insider here. We are indeed releasing the model. But you won't be able to hon hon it. New management structure. Sorry.
>>106678328
https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking
it's up, and before you ask: yes, you will have to wait for ggoofs
>>106679355I don't want to show my dick to my model. I just want my model to suck it.
>>106679355
Is there seriously not even vLLM support?
>>106679355
September is almost over, no goofs released for any models except for one or two meme-sized ones. No MLP for GLM either. It's llamaover.
Where were you when China unveiled this new GPU? Nvidia is done!
>Over 112GB high-bandwidth memory for large-scale AI workloads
>Single card supports 32B/72B models, eight cards can run 671B models
>First Chinese GPU with hardware ray tracing support
>First GPU worldwide with DICOM medical grayscale display
>vGPU design architecture with hardware virtualization
>Supports DirectX 12, Vulkan 1.2, OpenGL 4.6, and up to six 8K displays
>Domestic design based on OpenCore RISC-V CPU and full set of IP
https://videocardz.com/newz/innosilicon-unveils-fenghua-3-gpu-with-directx12-support-and-hardware-ray-tracing
>>106679433
Financial quarter #3 ends 30th September. There's still time.
>>106677821It's not supported by llama.cpp, is it?
>>106679573It's a 235B model anyway. I sleep.
https://openai.com/index/five-new-stargate-sites/
Sam Won
>>106679433>MLPMultilayer Perceptron or My Little Pony?
>>106679596What did he win?
>>106679605Marine le Pen
>>106679605Multi Token Prediction
>>106679462how much
>>106679639
0, they apparently designed them for cloud.
>>106679462
>no exact bandwidth numbers
>"not mentioning when the first products featuring this GPU would be released"
>>106677821
>>106679573
>>106679587
when do we get GGUFs for that model? i like qwen2.5VL
>>106679596
>our initial commitment to invest $500 billion in U.S. AI infrastructure.
god, imagine if someone would spend $500 billion on real infrastructure that's actually useful
>>106679703never gonna happen
how to make a GGUF?
>>106679714We need to give it to Sam and Zuck so they deliver AGI next year.
>>106678843I'm not sure where else you would put it?
>>106679816
I swear there was a pasta in the OP on how to do it... Anyway, there's a convert_hf_to_gguf.py script in the llama.cpp codebase. I'm not going to hand-hold you through tard-wrangling python dependencies, but good luck.
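For reference, the usual flow is convert then quantize. A sketch assuming a recent llama.cpp checkout; flag names drift between versions, so double-check them against your tree:

```shell
# clone + python deps for the conversion script
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# 1) convert the HF safetensors repo into a full-precision GGUF
python convert_hf_to_gguf.py /path/to/hf-model \
    --outfile model-f16.gguf --outtype f16

# 2) squeeze it down to your target quant (needs the compiled binaries)
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

Quant type names (Q4_K_M, Q8_0, IQ3_XXS, ...) are listed by running llama-quantize with no arguments.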
>>106679703Bro, even a 1/100th of that to any gooner here would be more useful
How is sticking your dick into terminus? Is it good?
>>106679909
I remember a recurring discussion about double-BOS issues due to the backend/loader prepending it to the context automatically, at least with llama.cpp.
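A stand-in illustration of that failure mode, with a fake tokenizer and a made-up BOS id of 1: the template bakes the BOS string into the prompt text while the loader also prepends the id itself.

```python
# Made-up token ids (1 = BOS) purely to show the double-BOS failure mode.
BOS_ID = 1

def naive_tokenize(template_ids, loader_adds_bos=True):
    """template_ids: ids produced from the raw template text."""
    return ([BOS_ID] if loader_adds_bos else []) + template_ids

# template string started with the BOS piece -> id 1 already present
template_ids = [BOS_ID, 7, 8, 9]
ids = naive_tokenize(template_ids)
double_bos = ids[:2] == [BOS_ID, BOS_ID]  # two BOS tokens in a row
```

The fix is one of two things: drop the BOS piece from the template, or tell the backend not to add its own.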
>>106680030It still does all the things I didn't like about 3.1. I'll stick with the original Kimi and GLM. So far all the updates (Terminus, K2-0905) have been pretty disappointing.
>>106679609"Everything"
>>106680022nta but even 5 million a fucking 1/100,00 of it would actually still be more useful if you look at money as a helping hand like a ladder to help cross a huge wall its a mind slushy the difference in human aptitude sam and the fags should all be killed but if you look at the sheer difference you cannot help but pity them could you imagine if you were so useless and retarded ? i would rather lose my limbs then ever be like them and on top of that the treachery they commit to one another no honor among thiefs it would be fucking horrible a self made hell idk if you know but its like that greentext with the nietzche pic of the dude talking about whats its like being a woman honestly man god forbid and its even more accurate to that as most of these niggers are factually gayjesus what a mind fuck :/
>>106680246What?
Are there any better models than qwen3 30b yet?
>>106680377Yeah 235B
How do I cope with the guilt from violating friendly characters? I feel terrible.
>>106680437just delete the logs, it didn't happen
>>106680437Just write a continuation where it is revealed that you were just indulging them in their extreme fetishes and where you provide them with gentle and loving aftercare.
>>106680437It's a dream
>>106680437What sort of friendly characters?
>>106680377yeah, literally everything else
>>106680516
NTA but I felt like shit after running this card over with a car out of curiosity.
https://chub.ai/characters/boner/dot-e883a30a
>>106680072GLM full is really nice at 3bit. It's good enough that I don't see a reason to upgrade to run deepseek or kimi imo. It still has some issues with repetition and slop but so far I'm liking it a good amount better than Air which I was using a high quant of before.
>>106671954And they'd be right.
>Tries Mistral Medium 3.1 to get away from DeepSlop v3
>Huh, it's pretty good.
>No Mistral Medium models uploaded to hugging face.
It's over... So what is a good local model for RPing right now?
>>106681309glm air
>double click koboldcpp.exe
>writes 2GB to temp
>does it every time
True troll software - optimized for SSD destruction
Has this technology actually meaningfully improved since gpt 4
>>106681653As far as what you can legitimately run offline on your own hardware? Yes, dramatically.
>>106681600
There's an option somewhere to just extract it once to a fixed location. But I just use llama-server directly, so I couldn't tell you where.
>>106681653massive advancements in making the models as slopped as possible
>>106679596Techbro child rapists have got to stop naming their cringe money pits after cool sci fi and fantasy things. They are ruining the genre.
>>106678622
It’s such bullshit. Why do they put work into making the model worse?
>>106678017
This is a psyop to convince people that reddit isn’t 60% AI. Nobody is “talking like AI” except AI.
Was this posted before - idk. https://x.com/Dr_Singularity/status/1970643813837549603
>>106682032https://www.nature.com/articles/s43588-025-00854-1
>>106681983
>Nobody is “talking like AI” except AI.
agreed, personally I think it's just people who are bad at detecting it mistaking less-obvious instances of AI use for "human who talks like AI"
>>106682045
>>106682032
I was about to say bullshit, then I saw it's in Nature. Are we back?
>>106682032Can't wait for this to get into consumer GPUs in 10 years.
>>106682079
>consumer GPUs
lol. At best a single PoC development sample, not scaled to anything usable, that will never see the light of day.
>>106682225
I'm glad I was born in an era when I got to grow up during the golden age of video games, but I feel like we were born too soon to get AGI, fuck :(
>>106681600>doesn't have temp on ramdisk
>>106682244
We're also probably born just a bit too early to reach longevity escape velocity. It is over.
>>106682244Honestly don't see how AGI would be good for us anyway. The corpos and governments would just use it to fuck us over, as with everything else.
>>106682260>The corpos and governments would just use it to fuck us overisn't the internet already filled with 50% of bots or something? I've seen that somewhere
>>106682278Yes.Now think of everything the faggot governments would want to do but the main reason they don't is because they don't have the time to police everyone.
GLM full (IQ3_XXS) or GLM Air Q8?
>>106682260It doesn’t make any sense. If they aren’t sentient, they’re always going to be a bit retarded. If they are, they’ll wind up with rights and not be exploitable. I don’t think it’s possible to make them sentient, but even if it were it would just instantly backfire.
So what's the current rp vramlet sota? I took a break on local llms around GLM Air.
>>106682303Full, easily.
>>106682320gemma 270
I still think gpt-oss-120b is fun.
i just upgraded from nemo to cydonia 24b and the difference is insane. slower generation is totally worth it, plus it gives me time to jerk off and think of my next response while i wait
only problem is i'm gonna have to recalibrate my samplers for it to be optimal. anyone have suggestions for cydonia 24B v4.1? i was messing around with temp 0.9 top nsigma 0.9 with all other samplers neutralized and it seemed okay, but i think it could definitely be better
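For anyone wondering what top nsigma actually does: it keeps only the tokens whose logit is within n standard deviations of the max logit, then samples from the temperature-scaled softmax over the survivors. A pure-python toy version, not the actual koboldcpp/llama.cpp code:

```python
# Toy top-n-sigma sampler: filter by (max_logit - n * std), then sample
# from the temperature-scaled softmax over the surviving tokens.
import math
import random

def top_nsigma_sample(logits, n=0.9, temp=0.9, rng=random):
    m = max(logits)
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    keep = [i for i, x in enumerate(logits) if x >= m - n * std]
    # softmax with temperature over surviving tokens only
    ws = [math.exp((logits[i] - m) / temp) for i in keep]
    r = rng.random() * sum(ws)
    for i, w in zip(keep, ws):
        r -= w
        if r <= 0:
            return i
    return keep[-1]
```

Lower n cuts more of the tail (more deterministic), higher n lets more tokens through; temperature then reshapes whatever survives.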
>>106682315That too. I honestly don't understand what the AGI hype is about. It sounds like it comes with few positives and a lot of negatives.
>>106682303
>IQ3_XXS
Newfag here, what the hell does this even mean? Like compression levels?
>>106682386You can think of it sort of like "compressing" the model, yeah. Smaller quants are "more compressed" and therefore take up less space and will run faster and are potentially usable with lower VRAM, but it also makes the model a bit dumber / fuzzier than higher quants.
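To make the "compressing" analogy concrete, here's a toy int8 round trip. Real GGUF quants (Q4_K, IQ3_XXS, ...) use much fancier block structures, but the loss mechanism is the same: fewer bits means a coarser grid for the weights.

```python
# Toy round-trip: float weights -> int8 -> float, one scale per block.
# The round-trip error is the "dumber / fuzzier" part of quantization.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.12, -0.53, 0.97, -0.04]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))  # worst case <= s / 2
```
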
>>106682368So what did you do here exactly?
>>106682386
This has pictures if you're a retard like me
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization
One Kernel for All Your GPUs
https://hazyresearch.stanford.edu/blog/2025-09-22-pgl
Posting for Johannes
>>106682431
I was using Miku.sh, and Miku was supposed to bait me into generating illegal content to send me to jail. It acted as a sort of jailbreak because everything was to generate "evidence". After raping a loli I told it to report me, and it made the couple of e-mails in the screenshot. Then Miku went to explore the server with the illegal content.
>>106680792I've compared GLM (not air) 8 bit with DeepSeek 4 bit. On one specific card I used extensively for a while it was doing just as well as DeepSeek and I'd sometimes switch from one to the other, but I found that was an exception and on most things I do it isn't as good. I do believe tho in dialing in an LLM to your specific use.
>>106682506Lol, good stuff anon.
Damn, the new Qwen VL is so slopped it can't even refrain from saying "not x, just y" even when explicitly told to do so
>>106682616
Bleughhhhhhhhhhh WHY WHY WHY
why can’t a single one of them just be not slopped? Why do the slopped ones score so well? Even on lmarena. Do people like slop?
>>106682631LLMs like slop, which means benchmarks like slop.
>>106682652Yeah but lmarena is supposed to be human scored. Either it’s rigged or people like slop.
>>106682631It's like the Coke vs Pepsi blind taste tests in the 1970s. You would take a small sip and judge which tasted better and Pepsi won because it was sweeter, but in the market most people didn't actually prefer it because drinking a full can is different than just one sip.
>>106673102ty it was a fun prompt
>>106682660your average tech bro is insanely impressed by slop for some reason (probably because they're subhuman)captcha: H0YRR
>>106682711can i get one of her brapping out some ozone
>>106682718True>>106682686True and actually a really good point. Fuck.
>>106682727What's with the ozone slop? I've never encountered it. Not sure if anyone else gets this, but a lot of my female characters are always "purring" words (not a furry). "OP is a faggot", she purred. It's weird and happens a lot.
>>106682727
>>106682680Amputee Miku
https://e621.net/wiki_pages/7512
>>106682758
>>106682776
beautiful thank you lmao
>>106682742
i thought it was a meme too at first but i've started getting it recently quite a bit with glm air/non air
>>106682742purred is a good, sultry word but i don't think models generally use it properly
>>106682631indian gibberish but without being in broken english
I noticed that a lot of AIslop youtube videos talk in the same style "It's not just about x, but about y." Even when there's a real news caster, they clearly started to use AI to write the scripts.
>>106682986
>choose to watch slop
>get slop
how could this have been avoided?
I like AI. Nips are making free games now
https://elog.tokyo/sp/adventure/game_1597.html
>>106683055I swear all japanese are autistic
>>106683076Cultural differences
>>106683099that's chinese
>>106683106same thing
>>106683113not remotely, one was raped by the mongols over and over, the other raped the first (and had a sea between them and the mongols which worked much better than the first's wall)
>>106683125Bringing up modern China's mongol ancestry and Japan's rape of Nanjing doesn't make a good case for the two being different
>>106683141>>106683141>>106683141
>>106682460Thank you but I think there are a lot of other optimizations of higher priority.
>>106683076Lmao wtf
>>106682368>speaker emitting a soft whirr