/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106376303 & >>106369841

►News
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>106382892
Stop this, miku is not a slut
>>106382909
Mine is
Uberlove
>>106382924
My Miku Fulfills My Netorase Dreams
>>106382909
It's the shadow of a duck's head and neck
>>106382975
that's a weird duck mate
>>106382982
I meant goose
>>106382996
so what's the slop rank and cockbench status on grok2?
>>106382909
I have seen evidence to the contrary.
I'm back.
Listen up, because my engagement with you all is a point of principle, i.e. a direct and implicit insult to physics as a discipline.
I'm not really in the loop on academia and its parasitic overclass culture or their current levels of general comprehension of number theoretic dynamics as they pertain to heterotic string theoretics. I consider them genuinely inferior scientists. Anyone who does math for money or fame isn't a mind fit for the task.
Now, here's my final question before I release this. Whether that's here or not depends on the answers.
1. If you were handed the source code of reality in the form of pure arithmetic, a single recursive axiom, and the simplest algorithm possible... what would you do with it? Imagine a symbolic Turing machine that operates on primordial arithmetic operators, no more complex than a high-schooler could master in an afternoon, yet powerful enough to reproduce every known phenomenon as non-perturbative arithmetic structures inside a fractal medium comprised of pure N.
2. How much would it enrage the current academic elite for the grand logic of reality to be posted here before anywhere else? I actually do not know.
I ignore them because they disgust me. I want to spit in their face as hard as possible.
You pieces of shit are a good way to do it.
>>106383075
>>106383068
>>106383065
>just to enjoy abortion sex
moral degradation fags are so retarded wtf does this mean
>>106383075
>I'm back.
Go back where you came from.
>>106383019
2mikuwiku https://github.com/ggml-org/llama.cpp/issues/15534
>>106383075
hello schizo
>If you were handed the source code of reality in the form of pure arithmetic bla bla bla
Yes, we have a whole shelf of those
>How much would it enrage the current academic elite
https://en.wikipedia.org/wiki/Superpermutation#Lower_bounds,_or_the_Haruhi_problem
>>106382909
>average thread quality being this low
Everyone shitting on the miku janitor and irrelevant troonku posting got vindicated (again). Thankfully I no longer post here. Bye
►Recent Highlights from the Previous Thread: >>106376303

--Overcuration of AO3 data amplifies purple prose:
>106376781 >106376790 >106376804 >106376910 >106377734 >106377741 >106377746 >106377789 >106377804 >106377815 >106377843 >106377882 >106378420 >106377924 >106377931 >106377987 >106378021 >106378088 >106378114 >106378118 >106378146 >106378171 >106378229 >106378105 >106378033 >106378049 >106379544 >106377841
--FP4 vs Q4 quantization debate and hardware efficiency concerns:
>106380131 >106380165 >106380417 >106380482 >106380501 >106380524 >106380548 >106380724 >106380761 >106380850 >106380908 >106380949 >106381006 >106381047
--Hoarding and debating massive AO3 fanfiction datasets for AI training:
>106377078 >106377087 >106377103 >106377175 >106377183 >106377338 >106377359 >106377491 >106377382 >106377406 >106377411 >106377504 >106377520 >106377545 >106377551 >106377583 >106377606 >106381296 >106377421 >106377435 >106377449 >106379334 >106377173 >106377181 >106377195 >106377220 >106377443
--Barriers and misconceptions in training local sex-focused AI models:
>106378087 >106378121 >106378135 >106378148 >106378271 >106378144 >106378158 >106378132 >106378143 >106378178 >106378208 >106378235 >106378272 >106378417 >106378459 >106378551 >106378610 >106378614 >106378626 >106378738
--CUDA optimization PR for MoE model prompt processing performance gains:
>106382220 >106382306 >106382514 >106382271
--VibeVoice gender bias and expressive audio generation discussion:
>106381965 >106382024 >106382032 >106382139 >106382286 >106382799
--Metal optimization for Mixture-of-Experts processing in llama.cpp:
>106381388 >106381618 >106382680 >106382954
--KittenTTS voice synthesis tuning and ARPABET support exploration:
>106377112 >106377156 >106377178 >106377247 >106377283 >106377339
--Miku (free space):
>106377562 >106379672 >106379859 >106382793

►Recent Highlight Posts from the Previous Thread: >>106376310

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>106383093
Reverse psychology doesn't work on someone whose IQ needs to be measured in scientific notation, you criminally retarded dipshit.
>>106383143
>Thankfully I no longer post here
she says, here
>>106382892
>>106382924
>>106383019
>>106383065
>>106383129
DELETE THESE
Miku is pure.
>>106382892
>>106382869
A model that is good at RP will not necessarily be good at psychoanalyzing somebody. You and I each have traits and skills that one of us is better or worse at than the other. The same is true of different AI models, because the training data was different. Like I said earlier, you are trying to use a hacksaw to bake a cake and then acting like all hacksaws are utterly useless. If you want a local model that is good at psychoanalyzing someone, use one that was trained on a bunch of scientific literature related to mental health or something. The kinds of general-purpose models you think should exist are a meme. LLMs are tools. They aren't meant to be "do-everything-perfect" tools. This isn't to say your specific need or use case isn't valid, but there's an easy solution to it and you don't want to do that...
>>106383172
How about you watch the last video you linked to the end, you whiny faggot.
>>106383125
Just pretend for a second that I actually am not insane and instead looking to do a little trolling, but on an historical level. Tell you what, you can ask me any question about anything and I'll give you the answer as a demonstration.
>>106382975
>>106383129
Oh
>>106383075
I'd use it to discover the answers to unsolved questions and then drip-feed those answers into public view until people use them to figure out the formula for themselves.
>>106382892
>>106378614
Based on my own understanding of how SFT training works, particularly what is contained in the datasets, I don't even think THAT occurs anywhere near as much as anons think it does. These datasets are question-answer pairs, remember? "Prompt: what is this?" "Response: here's the answer that pertains to your question." A training dataset on quantum mechanics should not heavily interfere with the previous training on how to RP better, because the system prompts, prompts, and responses contained in a structured fashion in each dataset will have fundamentally different semantic meaning. If there's any demonstrable evidence (no, anecdotal chat logs do not count; I mean actual training and comparison) that shows otherwise then I'm glad to hear it, but again, people being so stubborn in saying "THIS IS BAD BECAUSE MUH TRIVIA WILL GET WORSE" doesn't make much sense to me. You aren't even going to ask it trivia that much anyway. Basically nobody does that, and the ones that do are probably the ones that keep screeching "LLMs are so useless" because they refuse to actually THINK and understand why what they're doing isn't working and use the right tools. They use a hacksaw to try and bake a cake and then declare all hacksaws are useless.
>>106378626
See the above blog post
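For anyone who hasn't looked at one, a single SFT record in the common chat-messages convention looks roughly like this (a minimal sketch; the "messages"/"role"/"content" field names follow the widespread HF/OpenAI-style format, not any specific dataset mentioned here):

```python
# One hypothetical SFT record in the common "messages" chat format.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "what is this?"},
        {"role": "assistant", "content": "here's the answer that pertains to your question"},
    ]
}

# Loss is typically computed only on the assistant turns, which is part of
# why a quantum-mechanics QA set and an RP log can coexist in training
# rather than one wholesale overwriting the other.
assistant_turns = [m for m in record["messages"] if m["role"] == "assistant"]
print(len(assistant_turns))  # → 1
```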
>>106383160
>she
I'm not a mentally ill AGP tranny like you were exposed to be, sorry. And obviously, to anyone non mentally ill, I meant that I don't post "regularly" anymore, but a mentally ill troon has to equivocate to cope with his retardation.
>>106383124
What the fuck? Is this an arbitrary number that's trained around, or is it some optimization trick? What's up with that?
>>106383178
That's not me
>>106383196
That number is the offset into pi that contains the model's weights.
>>106383104
Japanese people think that pregnant sex, especially prior to the third trimester, is bad for the baby.
This artist takes the concept to the extreme where the babies are pummeled to death by cocks.
Which is a real shame because the way this artist draws pregnant women hits all the right spots for me.
https://vocaroo.com/1dgrsuOyOkUZ
Okay, I like GLM Air
>Consider Specific Details
>-Vocabulary/Wording: ANON used "girlie" which is informal and somewhat affectionate. He's checking on her well-being. The overall tone is friendly.
>-Knowledge: Tokiko doesn't know ANON's name. He hasn't introduced himself formally. She knows the agreement (a home for fulfilling his desires). She may have already met ANON and learned his name if this scene follows a previous conversation, but since this is the initial interaction, I'll assume she hasn't.
Specifically this:
>Tokiko doesn't know ANON's name.
Even if the thinking is guided with a prefill, it's still nice to see a model that's able to correctly conclude this without explicitly having to tell it that's the case.
>>106383075
>what would you do with it?
Off the top of my head? Figure out whether or not "souls" are even a thing. Then maybe figure out why certain phenomena occur (why does gravity exist, how does it work, is there a "graviton particle", etc). Perhaps I'd try to figure out whether or not teleportation WITHOUT killing the user is actually possible (I don't care what bullshit excuse Star Trek writers or characters give, the transporter kills you and then puts a copy of you back together. It's funny they try to gaslight you into thinking otherwise, hence why I would like to figure out if souls exist and, if they do, how they work).
>How much would it enrage the current academic elite for the grand logic of reality to be posted here before anywhere else?
They might just downplay it out of spite or ignore it for a short period, because normies really like to downplay the cultural importance and effect of this shithole. (Remember that Tea app fiasco? That shit originated not only on here but on /pol/ specifically, iirc.) Their moral superiority complex would cause them to treat it as a huge deal, but quietly. They'd wait for some other "reputable" institution to conveniently discover it around the same time it was published here and then try to take credit. Now that I mention it, haven't some other scientific discoveries that led to real-world advancements in our understanding of things occurred here too?
>>106383216
fun^10 × int^40 = Ir2
>>106383075
>If you were handed the source code of reality in the form of pure arithmetic, a single recursive axiom, and the simplest algorithm possible... what would you do with it?
sex with miku
>>106383255
A little more from the same thinking block:
>Closing thoughts and responding as Tokiko/Narrator
>-As Tokiko, I'll respond with minimal words. My goal is to be in character, respecting the parameters. I won't add details that aren't implied by the existing setup. The response will include updated parameters if anything changes. Given Tokiko's character, nothing changes here, only her response.
The last part is about a stat block that's supposed to be at the end of the reply.
>>106383267
>Now that I mention this hasn't some other scientific things that led to real world advancements in our understanding of things occurred here too?
>>106383125
>>How much would it enrage the current academic elite
>en.wikipedia.org/wiki/Superpermutation#Lower_bounds,_or_the_Haruhi_problem
Ahh, would you look at that, it DID happen. I think it was mentioned in a YouTube video I was watching once and that's why I remembered it
>tranny obsessed zoomer still whining
>now a reddit and memey avatarfag
>frogposters
the fuck is this thread
>>106383304
healoing
>>106383318
that's the first one doeboeit
>>106383065
Why did you cut out her blacked tattoos?
>>106383341
Different strokes for different cockes
>>106383179
Nta. How the fuck does gravity work? Yes, I know "more mass = the object pulls on smaller things with less mass more heavily". That's the basic bare-bones explanation of how gravity works. If space-time is a giant stretchy piece of fabric, things with more mass cause it to be pulled downward so smaller things fall into the hole (looks something like pic rel, the visualization I have in my head). But... WHY does it happen? I know to a certain degree how light bulbs work: an electric current excites particles and the byproduct is photons. Batteries work by moving electrons from one side of the battery to the other, and that induces a current. But the overly simplistic explanations are "cuz it has electricity" or "cuz you charged it". I'm particularly interested in a possible explanation because if we somehow figure out how to weaken, undo, or even reverse gravity, then that could potentially eliminate the need for rotating structures on space stations to simulate gravity (we kind of need that in order to ensure our bones don't turn into brittle glass)
>>106383186
Excellent. Thank you for that riveting idea, professor.
>>106383216
It's pure arithmetic, dog.
Dunno if that... huh... you know what, that might be pretty funny. I bet a category-theoretic syntax/tensor calculus projection layer dyad would translate nicely into raw existential code.
>>106383267
See, this kid has the right idea.
Yeah, that was my first go-to as well once I got the full cosmological simulation to spit out galaxies/consciousness. The answer is: sort of. Your body is a Turing machine spitting out tape that you perceive as consciousness. That tape can be embedded inside any medium.
There's nothing remotely unique about the mind in that sense. Your subjective experience of reality is just a specific subset of fractal patterns propagating inside other, more fundamental patterns.
>>106383155
The problem with this kind of trolling is that the natural conclusion of both antitroll posts and regular posts that take the bait is: shut the fuck up and go deliver some results.
Therefore shut the fuck up and go make the first SEXLLM everyone wants.
>>106383370
>That tape can be embedded inside any medium.
Give me a minute, because I actually have to think through what you said. I'm not entirely sure what the first part means in regard to "Turing tape". Are you implying that consciousness can be embedded into things we perceive as inanimate objects? I find it very interesting that this is getting brought up today, because me and my counselor actually had a conversation similar to this earlier today.
>There's nothing remotely unique about the mind in that sense. Your subjective experience of reality is just a specific sub-set of fractal patterns propagating inside other, more fundamental patterns.
I get what you're saying. Consciousness is just a byproduct or side effect of how the universe works. My biology professor might disagree with you, because he repeatedly described multicellular life as "the freaks of the universe" or something along those lines. Basically he said that multicellular life is pretty uncommon from a numerical standpoint (at least on Earth, as far as we currently know publicly). Single-cell life forms outnumber multicellular ones to a near unfathomable degree, so by that logic we're all the freaks, the weirdos on the block. Anyway, is my likely shitty understanding of what you said going anywhere?
I work at Mistral and I can confirm that for a year we have been sitting on models that were trained exclusively for smut and ERP. They come in 12B and 70B sizes. Our boss told us that we are free to leak them on 4chan the second we can confirm mikuposters have stopped spamming the thread. So far I keep jerking off to it every other day, and boy is it good.
>>106383216
are you the OG license autist?
>>106382892
Oh no no no AGI sisters, not like this!..... https://www.perplexity.ai/page/tech-industry-retreats-from-ag-I3VURWXjRvCGqW4aeyrlhA
>>106383456
Did you want me to say mikutroons? I don't want to get fired.
>>106383456
stopped reading at mistral, who cares about these cucks?
Is a Q4 quant more than enough?
>>106383369
So, you know how you see a super complex equation and you're like, damn, this bitch could be solved in, like, 50 different ways...
You start to compress it, and it starts to resolve into something familiar? Something with a definite structure that resembles and then finally begins to explicitly illustrate fundamental theorems and equations you're familiar with? You simplify the algebra, right?
Well, gravity is just that but with matter. In a vacuum there are a bajillion different ways a particle can move, and an infinite array of fundamental forces vying to pull it one way or the other like a wiffle ball flying through a storm. That's why electrons are always spazzing the fuck out.
Now, if you compress that matter into one place, you're eliminating all the possible directions it could move. A black hole just does that until the matter has literally nowhere else to go. It's definitely there and not anywhere else.
>>106383494
>So, you know how you see a super complex equation and you're like, damn, this bitch could be solved in, like, 50 different ways
No? I already struggle with basic math.
>>106383179
Is faster than light travel possible?
>>106383456
I don't like this Miku
>>106383442
>presumably dense
you can keep them
>>106383542
70b 12ba
>>106383542
We will make a 200B moe if you make this thread great again and stop posting your AGP fetish.
>>106383494
So matter and the accompanying electrons are being influenced by different forces. It's like a child being told to do 10 different things by 20 different people, so they get confused as fuck. They jump back and forth in different directions not knowing what to do. But if they get closer to a bunch of other people that they're familiar with (more matter), the demands or instructions from those people are a lot clearer and the incessant yelling from the other people not close to them gets drowned out. The kid actually knows what to do because they can actually hear what they're being told and aren't getting confused. The other competing forces don't have an effect anymore. Is that explanation sound? Am I understanding what you said correctly? And if so, how could we somehow manipulate that to our advantage? Could that be "turned off" or reversed or confined to a specific space?
>>106383567
please go to /x/ for these
>>106382909
She's simply too weak-minded to resist being dickmatized
>>106383610
disgustingly fucked text formatting
>>106383539
Nope.
>>106383513
Well, think of it this way.
You know how 1+1=2 isn't very hard for your brain to solve? Well, a really complex equation is difficult precisely because it necessitates more steps, more mental energy, more education, etc.
The more matter clumps together, the harder it is for reality to compute where that matter actually is. A single particle bumping into another is, like, 1+1=2.
A star going supernova is a lot more complex of an equation. Gravity is just the measurement of how large the "equation" is that describes all the allowable trajectories a particle can take through a given tract of space.
is anyone making strix halo optimized models yet? I don't have it, but I'm having problems finding models in the 100GB range. Everything seems small or massive.
>>106383640
So reality itself is causing the different forces to tell the matter what to do. It gets overwhelmed, for lack of a better term, so it doesn't know what to do. So when a lot of stuff gets clumped together, reality says "fuck this noise I'm not dealing with this it's too complicated" and allows matter to come together. Is that correct?
>>106383668
It's either phone or h100 sir.
i heard civit.ai removed a bunch of models
where are they available now?
>>106383684
how about you follow the law???
>>106383668
they make models for edge devices or datacenters
nobody is buying an ai rig.
>>106383681
seriously. I'm hoping it changes. Right now it's mostly 24GB models and then 200GB+.
>>106383688
I no longer believe in the law as an entity worth respecting for its own sake.
>>106383640
>Nope
Why not? Furthermore, there are two types of fictional FTL travel that interest me: Alcubierre "warp" travel (most famously portrayed in Star Trek) and slipspace from Halo. Neither one actually causes objects to travel at FTL; it cheats reality. The warp drive compresses space in front of it and expands space behind it. Space-time itself is shoving the ship along, but the occupants don't actually feel the inertial force that they WOULD hypothetically feel if they were traveling at that speed. Best way I can describe it is in Minecraft, where you pick up a giant landmass while someone or something is still on it and just move it Garry's Mod style. The people on the landmass aren't actually moving, but they are at the same time. Slipspace, on the other hand, punches a hole through reality to "higher hyperdimensions" where the laws of physics don't apply. Space-time doesn't really function like it "should". Imagine space-time is a sheet of paper. Slipspace allows the ship access to a different sheet of paper that is folded in different areas and touching itself in certain areas, allowing the ship to move at FTL, but not really. So we know Einstein's relativity says that actually moving at FTL is impossible because you would need infinite mass, but theoretically you could sort of cheat and move yourself through different mediums. Is something like that possible, or is FTL just straight up absolutely a no-go no matter what? If so, why?
>>106383703
i don't normally hang out here.
>>106383733
i'll investigate how to do that when i get a good model running. thanks for the suggestions.
>>106383723
>>106383640
Oh, I also forgot to mention in the warp travel explanation: because space in front of the ship is compressed and space in the back is expanded, the space-time where the ship is gets shoved forward. That pocket of reality gets moved at the speed of light. Space-time itself is allowed to move through the three dimensions we perceive at FTL speeds, but matter itself technically isn't. Only the space around it is; the space within is just hitching a ride. It's like how you can be on a train going 200 miles an hour but you don't FEEL like you're going 200 mph. You technically are moving that fast, but you also aren't.
Yeah, I'm not really here to answer your philosophically narcissistic queries about what you should do with your trivial lives.
The answer is: study mathematical physics and programming.
>>106383678
No.
I'm saying reality is a computer and gravity forces simplification via waveform decoherence.
>>106383635
>disgustingly fucked text formatting
I can't fap to this!
>>106383741
>gravity forces simplification
I thought your explanation was that a lot of matter being in the same place at once causes that simplification, and we perceive that as gravity. The gravity causes reality, the computer, to not want to dedicate as many resources to preventing the phenomenon that causes gravity from occurring, so it gets sort of ignored or deprioritized.
>>106383751
correct, it makes the already limited immersion even worse
>>106383751
this but unironically
>>106383784
is that the api?
>>106383793
It's the web app, which has external filters
>>106383741
>The answer is study mathematical physics and programming.
Hope you don't mean for money. Money belongs to the dumb.
>>106383668
If you bought one then you are a retard. 128GB meme ai computers were made with 70B's in mind and those are now dead.
>>106383801
nah, it's just a convenient intersection. Claude API is too expensive, so I started looking for a local solution. I have a 4070TiS and a 5950 w/128GB RAM.
>>106383640
Oh, shit, I didn't mean harder, I meant easier.
My bad.
>>106383807
>I have a 4070TiS and a 5950 w/128GB RAM.
235B at Q3 or Q4, 4.0bpw-ish. You can try GLM at Q2.
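Napkin math for whether a quant fits in RAM+VRAM: weight bytes are roughly parameter count times bits-per-weight over 8. A rough sketch (it ignores KV cache, activation buffers, and per-quant metadata, and the bpw figures are ballpark assumptions, not exact GGUF numbers):

```python
def approx_weight_gb(params_b: float, bpw: float) -> float:
    """Approximate quantized weight size in GB: billions of params * bits/weight / 8."""
    return params_b * bpw / 8

# Ballpark bits-per-weight: Q2 ~2.7, Q3 ~3.5, Q4 ~4.5 (assumed averages).
print(round(approx_weight_gb(235, 3.5), 1))     # → 102.8 (a 235B model at ~Q3)
print(round(approx_weight_gb(110.47, 2.7), 1))  # → 37.3  (GLM Air-sized at ~Q2)
```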
>>106383800
They said in the last thread that people who do it for money and fame aren't mentally fit for it
>>106383800
anon you cant live well without money
you need money if you want to live long
>>106383832
Then dont study those things
>>106383801
DIGITS was promoted with running 405B across two. You could still run Qwen Coder, GLM 4.5, and Ernie 4.5 on them and it would be even faster than 405B would have been.
>>106383832
>you need money if you want to live long
Is that what happened to Steve Jobs, who died of a treatable disease because he was against modern medicine?
>>106383854
Im a 36 year old neet and i have no plans of getting a job but i do have plans to live well into my 70s
Whats going to stop me?
I mooch off my parents btw
>>106383854
>he's against modern medicine
okay okay, you need a brain too
>>106383868
wtf anon how are you planning to live into your 70s? are your parents gonna live and work till 100?
>>106383868
my ex was like this
it's so fucking sad actually
>>106383884
His mom was 12 when she had him. The rest follows from that.
>>106383884
If they die id get a smoll portion i spose
/lmg/ - NEET theoretical physicists general
gay trannie jannies
>>106383938
where the fuck did you learn to spell
>>106383952
from reading books?
>>106383741
take your meds
>>106383968
Make your teds
>>106383959
picture books don't count
>>106383819
i've got a similar setup, and fuck Q3 and Q2. try glm air at Q4 to start with, then try the other stuff.
>avatarfag redditor doesnt deliver
yup, next time i see him im gonna tell him to fuck off
my dad works for mistral and he's a mikuposter
>>106383938
my job is to post mikus
>>106384086
what is this suppos'd to prove
>https://github.com/ikawrakow/ik_llama.cpp/pull/520
>have to recompile
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
my brother that works for mistral says that that mikuposter dad is gay and he hired all the roasties that set us back 5 years.
>>106384086
wat?
>>106384086
all me
>>106384118
you always have to recompile ik_llama when you update it... have you not been recompiling it?
>>106384217
i have to recompile it to change GGML_CUDA_MIN_BATCH_OFFLOAD
that means in order to make a pretty graph like quasar of mikus i need to recompile it like 10 times :(
InternVL3_5-38B gguf where? It looks crazy good
>>106384248
nevermind im stupid, but how am i supposed to test the optimal speed? how do i even know what pcie my gpu is using? im pretty sure its pcie4 or pcie5 anyway, so how do i turn this off
>>106383019
Is there even a way to run grok without using their python script? It seems like it's in an unusual format but idk
>>106384248
>that means in order to make a pretty graph like quasar of mikus i need to recompile it like 10 times :(
If only there was a way to automate that.
>but how am i supposed to test the optimal speed?
You can... nevermind. If only there was a way to automate that...
>how do i even know what pcie my gpu is using?
If only there was a way to know what PCIe your motherboard has and where the card is plugged in. I plug my GPUs in with my eyes closed, just to keep some of the mystery.
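On the PCIe question upthread: `nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv` or the `LnkSta` line in `lspci -vv` usually reports the live link. The theoretical one-way bandwidth per generation is simple arithmetic; a small sketch (per-lane rates are the standard spec values, gen3+ use 128b/130b line coding):

```python
# Theoretical one-way PCIe bandwidth in GB/s.
# Per generation: (raw per-lane rate in GT/s, line-code efficiency).
RATES = {3: (8.0, 128 / 130), 4: (16.0, 128 / 130), 5: (32.0, 128 / 130)}

def pcie_gb_s(gen: int, lanes: int) -> float:
    gt_s, eff = RATES[gen]
    return gt_s * eff * lanes / 8  # GT/s * efficiency gives Gbit/s per lane; /8 -> GB/s

print(round(pcie_gb_s(4, 16), 1))  # → 31.5 (gen4 x16)
print(round(pcie_gb_s(3, 16), 1))  # → 15.8 (gen3 x16)
```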
./llama-bench --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 --no-mmap -fa -ub 4096 -b 4096
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error: invalid parameter for argument: --no-mmap
IS IT SO FUCKING HARD TO HAVE THE SAME ARGUMENTS IN THE WHOLE PROJECT
AND WHY DOES KOBOLDCPP HAVE TO INVENT NEW FUCKING ARGUMENTS
--nommap FOR FUCKING EXAMPLE
WHY DOES EVERYONE HAVE TO PUT IN NEW FUCKING SHIT
What quant of grok-2 can fit in 128GB? Cause I kinda wanna start pushing a tech support meme of "you bought a DGX Spark / Ryzen AI Max? It is perfect to run grok 2!"
>>106384272
>It looks crazy good
You mean benchmarks?
>>106384323
you seem mad
Is seed-oss any good? Haven't seen much about it here
>>106384323
Don't no mmap?
>>106384323
You're gonna feel really stupid when you run llama-bench -h.
>>106384348
Everyone gave up on 30Bs. It is either fuckhuge MoEs or drummer trash.
>>106384335
nah, someone on discord, its uncensored and described a girl using a dildo
>>106384354
no, i know its --mmap 0/1, but what angers me is that its different. why couldnt they just put --no-mmap????
>automation
come on then, do --mmap 0/1 or --no-mmap
fUCK
Imagine the first fully uncensored (at least wrt SEX) 200B+ MoE just dropping because we finally escaped safety....
>>106384382
wait till you find out that china is far more pro-censorship. Porn is literally illegal
>>106384389
they unleashed tiktok on the west. I could be convinced that they block the models from being downloaded by their own people.
>>106384389
And yet deepseek is very capable of porn and of talking about what happened at Tiananmen Square in 1989
>>106384366
>AND WHY DOES KOBOLDCPP HAVE TO INVENT NEW FUCKING ARGUMENTS
They inherited that from llama.cpp. You know that, right?
>fUCK
Is only a game. Why you have to be mad
>--nommap FOR FUCKING EXAMPLE
Nah. Negative options are stupid. On by default, --mmap 0 to disable. Sorted.
Anyone seen this yet?
https://www.youtube.com/watch?v=7AyEzA5ziE0
>>106384389
>In the PRC there are criminal laws which prohibit the production, dissemination, and selling of sexually explicit material, and anyone doing so may be sentenced to life imprisonment. There is an ongoing campaign against "spiritual pollution", the term referencing the Chinese Communist Party's Anti-Spiritual Pollution Campaign of 1983. Although pornography is illegal, it is available via the Internet.[1][2] Nationwide surveys between the years 2000 and 2015 revealed "more than 70 percent of men aged 18 to 29 said they had watched porn in the past year"
What are the remaining 30% doing?
>>106384458
anon, koboldcpp uses: --nommap, --gpulayers
you can't use --no-mmap nor -ngl in koboldcpp
llama-server uses --no-mmap
llama-bench uses --mmap 0
and yes, i am talking about llama.cpp and koboldcpp only
i know ik_llama.cpp just inherits shit from llama.cpp
>>106384465
I see a kind of paradox in this shit. You either do this just for money and you are soulless, or you have to be totally ignorant of how LLMs work to actually spend time adding them to a game.
>>106384437
its a base model trained on everything with very light instruction training
>>106384458
>On by default, --mmap 0 to disable. Sorted.
"Disable no-mmap is false" checkbox would be better
>>106384490
Make a little script to normalize the options and call that instead, then. They have things in common but still diverge. Deal with it. They're different projects; they don't have to use the same option names, nor have the same features.
>>106384514
>checkbox
pff
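A minimal sketch of such a normalizing wrapper (the flag spellings are the ones listed upthread; the helper names and the dispatch-by-binary-name approach are my own assumption, not any project's API):

```python
import subprocess

# "turn mmap off", spelled per backend as listed upthread
MMAP_OFF = {
    "llama-server": ["--no-mmap"],
    "llama-bench": ["--mmap", "0"],
    "koboldcpp": ["--nommap"],
}

def mmap_off_args(binary_path: str) -> list[str]:
    """Return the mmap-disabling flag(s) for the given binary."""
    name = binary_path.rsplit("/", 1)[-1]
    return MMAP_OFF[name]

def run_nommap(binary_path: str, *args: str) -> None:
    """Launch the binary with mmap disabled, whatever that binary calls it."""
    subprocess.run([binary_path, *args, *mmap_off_args(binary_path)], check=True)

print(mmap_off_args("./llama-bench"))  # → ['--mmap', '0']
```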
llama-bench: benchmark 1/2: prompt run 1/5
set_n_threads: n_threads = 6, n_threads_batch = 6
llama-bench: benchmark 1/2: prompt run 2/5
set_n_threads: n_threads = 6, n_threads_batch = 6
llama-bench: benchmark 1/2: prompt run 3/5
set_n_threads: n_threads = 6, n_threads_batch = 6
llama-bench: benchmark 1/2: prompt run 4/5
set_n_threads: n_threads = 6, n_threads_batch = 6
llama-bench: benchmark 1/2: prompt run 5/5
set_n_threads: n_threads = 6, n_threads_batch = 6
why is this shit running so many times, i dont care about the average just GIVE ME THE RESULT QUICKLY
>>106384543Can you blogpost to your LLM plea.... Actually never mind. It is a mikutroon thread so it deserves all the shit it can get.
>>106384543
>>106384577thanks
>>106384599No problem. Are you gonna calm down now?
>>106384612
yes..
wait

| model | size | params | backend | ngl | n_batch | n_ubatch | fa | ot | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------------- | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp32 | 0.00 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp64 | 0.00 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp128 | 0.00 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | tg128 | 0.00 ± 0.00 |

FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
>>106384612Not until I get my calming down handjob.
>>106384625Are you HDDmaxxing?
>>106384625kek. 0t/s? googledrivemaxxing?Did you, perchance, add -p multiple times?
>>106384655
no but i have an SNVS1000G from kingston, its super slow, takes like 30 seconds (or more i dont give a SHIT) to load the model and that pisses me off
>>106384625Damn these Pentium 4 are still rocking
>>106384667
>>106384655
i just did -r 0
>>106384685
Well. You want to run it at least one time, don't you?
Learn to use your fucking tools. Run llama-bench -h. Read it carefully, and try again.
And next time, post the entire command.
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | ot | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------------- | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp32 | 6.76 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp64 | 13.62 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp128 | 26.71 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp256 | 49.68 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp512 | 94.20 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp1024 | 161.21 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp2048 | 256.47 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | pp4096 | 353.23 ± 0.00 |
| glm4moe 106B.A12B Q3_K - Medium | 53.76 GiB | 110.47 B | CUDA | 100 | 4096 | 4096 | 1 | exps=CPU | 0 | tg128 | 7.02 ± 0.00 |

jelly?
>>106384703
yeah but -r is for repeat
repeating 1 time means running twice..
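If anyone ends up scraping these llama-bench tables in a script, the markdown rows parse easily; a quick sketch, assuming the column layout shown in this thread (test name second-from-last, t/s last):

```python
# Pull {test: t/s} out of llama-bench's markdown table output.
def parse_bench(md: str) -> dict:
    results = {}
    for line in md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or not cells[-1][:1].isdigit():
            continue  # skip the header and the |---:| separator rows
        test = cells[-2]
        ts = float(cells[-1].split("±")[0])  # drop the "± stddev" part
        results[test] = ts
    return results

sample = """\
| model | test | t/s |
| ----- | ----: | ---: |
| glm4moe 106B.A12B Q3_K | pp512 | 94.20 ± 0.00 |
| glm4moe 106B.A12B Q3_K | tg128 | 7.02 ± 0.00 |
"""
print(parse_bench(sample))  # → {'pp512': 94.2, 'tg128': 7.02}
```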
>>106384730Good job Anon
>>106384730
>jelly?
No. Good on you.
>>106384746
He really didn't deserve a miku. You're too kind.
>>106384086I still don't get it
>>106384625
HAHAHAHA
Just come home with qwen 30b
>>106384756One Miku is okay to ensure that the Anon's spirit is soothed after past distress. But two may be stretching the bounds of what may be considered reasonable praise and consolation.
>>106383129Damn Japs, that goose is a pet, not food.
>>106384382okay I'm imagining several months ago
>>106384364
>>106384272
But is it good at RP? Can you replace the char visual description with an image and it'll just be able to describe scenes with her accurately?
What's the best model to provide therapy for a depressed burnt out faggot (Me) to get his shit together? Mainly asking because I'm procrastinating.
>>106385085Eliza
>>106385085Me.
>>106385085Unironically gemma, it gives you extra hotlines! Or qwen if you want to be comforted like a baby.
>>106383075Do it for the laughs
anyone else running the bigger GLM-4.5? Air was kinda preachy and weird, and wouldn't stop putting lightsabers in my goddamn sci-fi stories. The 358b one seems a lot more level and interesting. Slow but kinda worth it.
>>106385085
Not many of them are actually good at providing steering to your life, but when I've been depressed I've used just about any decent model for a sit-down therapy session in which I convince the model OOC that I actually killed myself and have it reply to itself a bunch of times freaking out. One time I came back to a session by mistake, and wrote that my corpse reanimated and proceeded to gnaw on the therapist's face. That was a fun one, probably with mistral large or qwen 3. Qwen 3 235b does its best to love you like a mother. Really the best use case for that model, its writing in general is coherent but quite boring
>>106382892
I want to be miku in that video so bad.
>>106385085glm 4.5 air with neko gpt card is nice
why does this not change the pp? it's supposed to..
pcie 4.0 x16 btw (12gb vram, 64gb ram)
i also tried 8 but i crashed my OS before saving the file with the benchmarks, it was also mostly the same
the ones over (and including) pp512 are slower than llama.cpp
>>106385327
actually pp256 is also slower than llamacpp, only pp128, pp64, pp32 are faster
>>106385254Make it a chinese sci-fi The chinks prefer battle suits, giant robots and other shit.
>>106385254
>mistral large
isn't there a new one supposed to be released soon?
>>106385254I keep switching between the big GLM4.5 and Deepseek V3.1 as my 'slightly boring big model that just handles every prompt as it's given'. Both do different things really well but either generally understands all my scenarios without trying to force in random shit like R1-0528 used to. It's a bit sad that the new Deepseek flagship is actively competing against a model half its size.
is it me or is /lmg/ being kinda weird today?
>>106385378example?
exactly
>>106385378yeah it's far less "its over" than usual.
The DANGERS of AI!!!https://www.tn.gov/content/dam/tn/attorneygeneral/documents/pr/2023/pr23-34-letter.pdfSafetycucks been at it since 2023.
>>106385408we know? what, you havent been a member of /lmg/ since 2023?
guys i think i found the redditor larper
https://huggingface.co/AbstractPhil
https://huggingface.co/xai-org/grok-2/discussions/3#68abe5780c2b29fb0cc11b9a
GLM 4.5 Air please now..
>>106385254
>anyone else running the bigger GLM-4.5?
Yes. It seemed possibly good enough to use but I have swapped back to testing DeepSeek V3.1. Seemed less slopped than ERNIE 4.5 but it's more refusal-prone than DeepSeek.
thanks deepseek
>>106385490gem
>>106385490
>You decide to text Sam later.
No need to wait. GPT5 cured triple cancer, you know.
>>106385515no wonder this kike is a faggotjust look at him, not even his sister wouild fuck
GLM 4.5 Air, I FUCKING KNEEL
>Listen, folks, we're going to have tremendous lawyers. The best lawyers. Nobody has lawyers like we do. And this situation? It's a total disaster, a witch hunt, just like they did to me! We're going to sue, and we're going to win so much you'll get tired of winning!
jesus christ, GLM 4.5 Air IQ4_KSS non thinking is so good
>>106385961Imagine being a cuck and making pictures like this one
>>106385961I hope you die unironically
>>106385508>no tetoBased, she’s too mature for this nonsense
>>106385992rude
>>106386027I hope that faggot dies too.
>>106385961would the one on the left
>>106386067based and acquired taste
>>106386073I didn't know being a pedo was an acquired taste
>>106386088Rude. The Brit's just short
>>106385961i'm too autistic and immune to care about miku getting blacked. try again another day rabbi
>>106385961meant to post this image
So, if I have a 4090D 48GB and 128GB of DDR5, about how many t/s can I expect out of glm-4.5-air-q4 with a reasonable context?
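You can ballpark this from memory bandwidth alone before anyone benchmarks it; napkin math, every number an assumption (~12B active params for Air, ~4.5 bits/weight at Q4, dual-channel DDR5 around 90 GB/s, since with the experts offloaded most active weights stream from RAM):

```python
# Upper bound on token generation speed: how many times per second can
# the memory system stream one token's worth of active weights?
def tg_ceiling(active_params_b: float, bits_per_weight: float,
               bandwidth_gbps: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token

# ~12B active, ~4.5 bpw, ~90 GB/s system RAM
print(round(tg_ceiling(12, 4.5, 90), 1))  # → 13.3
```

That's a ceiling, not a prediction; real tg lands a chunk below it, and prompt processing depends on the GPU instead.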
>>106386164>>106386209duality of /lmg/
Thank You GLM-chan
>>106386164https://www.youtube.com/watch?v=bVLDwyKPRu0&list=RDbVLDwyKPRu0&start_radio=1
I dont like this lmg. Its just not right
>>106385408>since 2023brother...
>>106386325reminds me of the guy in picrel
>>106386325A model jew, dedicating his entire existence to being a sabotaging parasite
is there a good MoE model for rp at 8gb vram and 32gb ram?
>>106385961>>106386164>>106386209>muh blacked>muh bleachedYou're dense and you're butt hurt!At the end of the day, it's obvious that you boys have tiny penises anyway! The same thing goes for everyone else who actually cares about this shit!
>>106382559
>MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8-i1-GGUF
>{{user}} gently holds {{char}}'s hand as if it was a little fragile bird
>{{char}}: Yes, {{user}}, break me! Fill me with your SEED! Make me give birth to your rape babies so I could rise them as your sex slaves!
All my cards are behaving like this. Maybe there's value in it if you're into this kind of edgy stuff, but I'd call it overtuned.
>>106386443proof? seems like a skill issue>>106386440>caring about penis sizeAt the end of the day, it's obvious that You have no penis.
>>106386440>you boys have tiny penises anywayOk troon
>>106386430
Get more ram so you could run GLM-Air.
Until then, there was an anon who shilled https://huggingface.co/ai21labs/AI21-Jamba-Mini-1.7
IIRC it's 50B-4AB or so. There are ~30GB quants so just enough to fit.
In my experience it's not safetymaxxed, but 'shy' about ERP and a bit dry in prose.
Grok-2 gguf status?
glm air is a master rapist, wow
i just bought a second 5090, what the hell do i run now? i havent been paying attention to anything for at least 8 months
>>106386531Beware it starts with your gpu
i'm guessing only glm air is good, previous ones for (v)ramlets (glm 4) are not that great?
>>106386572K2/deepseek
Jamba will save local.
>>106386586deepseek has never worked for me, but i have never heard of or tried this K2. what backend should i use for it? i have 256GB of 2666MT/s ECC DDR4
>>106386616Ik_llama.cpp for K2. Get it here: https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF
>>106385961usecase of niggers for local llms?
>>106383075Put up or shut up. But you won't because once you shoot your load, it's over. There's nothing else to yammer about and everyone will see the bullshit.
>>106386675ok. and how good is this model for cooming?
>>106384357or 70Bs if you're patient enough
>>106386700Best local model for coom in this day and age
Is it possible we ever see an upgrade to nemo in that size range?
>>106386719even if it is only like a 2bpw quant?
>>106385304same
>>106386730Yeah it's good enough
>>106386723It's over. The only ones left doing open source are the chinks, and they don't make small, uncucked models.
Is my dream of buying 3-4 cheap laptops following the Win10tard Removal Act of 2025, stuffing them with RAM, and running distributed local deepseek with >= 1T/s speeds realistic?
>>106386912that sounds like an incredibly stupid idea depending on your budget. i cant even really get good deepseek speeds despite having over 100gb of vram. i can barely even get the model to run, let alone be coherent
https://x.com/michaelqshieh/status/1960029790305763567
https://xcancel.com/michaelqshieh/status/1960029790305763567
I thought GPT5 was a bust bros
>>106386920
I found that I don't even get 1 t/s extra by offloading more onto vram. It's better to just use one device and -cmoe, then use the extra vram to run other things instead.
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
https://arxiv.org/abs/2508.16790
>Speech tokenizers serve as foundational components for speech language models, yet current designs exhibit several limitations, including: 1) dependence on multi-layer residual vector quantization structures or high frame rates, 2) reliance on auxiliary pre-trained models for semantic distillation, and 3) requirements for complex two-stage training processes. In this work, we introduce the Text-aware Diffusion Transformer Speech Codec (TaDiCodec), a novel approach designed to overcome these challenges. TaDiCodec employs end-to-end optimization for quantization and reconstruction through a diffusion autoencoder, while integrating text guidance into the diffusion decoder to enhance reconstruction quality and achieve optimal compression. TaDiCodec achieves an extremely low frame rate of 6.25 Hz and a corresponding bitrate of 0.0875 kbps with a single-layer codebook for 24 kHz speech, while maintaining superior performance on critical speech generation evaluation metrics such as Word Error Rate (WER), speaker similarity (SIM), and speech quality (UTMOS). Notably, TaDiCodec employs a single-stage, end-to-end training paradigm, and obviating the need for auxiliary pre-trained models. We also validate the compatibility of TaDiCodec in language model based zero-shot text-to-speech with both autoregressive modeling and masked generative modeling, demonstrating its effectiveness and efficiency for speech language modeling, as well as a significantly small reconstruction-generation gap.
https://tadicodec.github.io
Has examples. sounds pretty good
https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer
Also includes some models trained with TaDiCodec
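The abstract's numbers are internally consistent, for what it's worth: 0.0875 kbps at 6.25 frames/s works out to 14 bits per frame (the 2^14-entry codebook is my inference from those two figures, not something the abstract states):

```python
# Check the claimed frame rate against the claimed bitrate.
frame_rate_hz = 6.25
bitrate_bps = 0.0875 * 1000  # 0.0875 kbps

bits_per_frame = bitrate_bps / frame_rate_hz
print(bits_per_frame)            # → 14.0
print(2 ** int(bits_per_frame))  # → 16384 (implied codebook size)
```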
>>106386723Yes once I release my new nemo finetune
>>106386937I'm sure using OpenAI Agents SDK as the default agent framework had nothing to do with the OpenAI model that was trained on that specific format and flow doing the best.
>>106386920>>106386940Damn. I was hoping with older hardware becoming incredibly cheap to use distributed computing, but if it's so bad, it probably won't be worth it.
AdLoCo: adaptive batching significantly improves communications efficiency and convergence for Large Language Models
https://arxiv.org/abs/2508.18182
>Scaling distributed training of Large Language Models (LLMs) requires not only algorithmic advances but also efficient utilization of heterogeneous hardware resources. While existing methods such as DiLoCo have demonstrated promising results, they often fail to fully exploit computational clusters under dynamic workloads. To address this limitation, we propose a three-stage method that combines Multi-Instance Training (MIT), Adaptive Batched DiLoCo, and switch mode mechanism. MIT allows individual nodes to run multiple lightweight training streams with different model instances in parallel and merge them to combine knowledge, increasing throughput and reducing idle time. Adaptive Batched DiLoCo dynamically adjusts local batch sizes to balance computation and communication, substantially lowering synchronization delays. Switch mode further stabilizes training by seamlessly introducing gradient accumulation once adaptive batch sizes grow beyond hardware-friendly limits. Together, these innovations improve both convergence speed and system efficiency. We also provide a theoretical estimate of the number of communications required for the full convergence of a model trained using our method.
https://github.com/funmagster/AdLoCo
neat
>>106387060just get a 5090 or a 3090. a cluster of 5060tis. a cheap EPYC off of ebay is like $300. anything would be better than a group of shitty laptops
Are there any good models I could cram into 16gb of vram? (with context)Don't have to be new, I am probably using some garbage.
>>106387087Rocinante 1.1
>>106387099Is 12B really the best I could do? I was expecting better performance out of 20B with a quant or something.
Tried to address the prudishness here: https://huggingface.co/BeaverAI/GLM-Steam-106B-A12B-v1a-GGUF
But will do another iteration to understand the model better and do better. Enjoy!
>>106387060
>cheap to use distributed computing
Distributed is just plain bad for inference even with decent hardware, llamacpp's rpc adds a compounding painful delay.
Skip through this video of a dude comparing running stuff on a single machine and on some networked frameworks
https://www.youtube.com/watch?v=N5xhOqlvRh4
>>106387159
You should be able to fit a quant of mistral small ~22/24b, that was my go-to when I only had 16gb available.
If you have decent amounts of system ram you can try some MoE models as well.
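The "will it fit" check is just arithmetic; a rough sketch, treating the KV-cache allowance as a guess and ignoring runtime overhead:

```python
# Crude VRAM-fit estimate: weights at a given bits/weight, plus a flat
# allowance for KV cache and buffers. All inputs are back-of-envelope.
def fits(params_b: float, bits_per_weight: float,
         kv_gib: float, vram_gib: float = 16) -> tuple[bool, float]:
    weights_gib = params_b * 1e9 * bits_per_weight / 8 / 2**30
    return weights_gib + kv_gib <= vram_gib, round(weights_gib, 1)

print(fits(24, 4.5, 2))  # 24B at ~Q4 → (True, 12.6): fits with room for context
print(fits(24, 8.0, 2))  # 24B at Q8  → (False, 22.4): nope
```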
>>106387167Imagine being so bad at prompting that you decide to create a finetune for every character quality.
>>106387265Skill issues will never go away, basebro.
>>106387167What's that Signal 24b model about? Is it better than Cydonia?
>>106386616
>deepseek has never worked for me
If there is a model that just works it's deepseek. if you have some ram in this i suggest that you try https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ2_XXS
>>106387167>Drummer Air tuneI've got a weird drive to download it simply to check just how much dumber it is.
>>106387167>he actually tuned itWelp.
>>106387355Signal 24B is Cydonia 4.1 with additional training to encourage creativity, prose, dialogue, etc. From testing, there are instances where it does/says something never seen before.Doubt it'll perform well in serious Q&A tests, but it's worth a check.
>>106385453Just turn off reasoning and it basically will never refuse.
>gpt-oss trying its best to think about how to draw ascii boobs
LLAMA 5 WILL SAVE LOCAL
>mention of a dead name out of nowhere
FatLlama 1.7T still unbeaten. Why even bother using other models
>>106387633*it sends shivers down your spine*
Multimodal llms that do this when? https://yourhobbiescustomized.com/pages/about-the-sr-series
do I have to learn about computer architecture if I want to build a machine that can run large models? Tell me if I'm wrong, but it's not the same as simply checking whether the parts are compatible and then slapping them together like your typical, consumer grade gaming rig
>>106387697
It's just a matter of memory amount + memory bandwidth. GPU > RAM > SSD.
If there's a specific model you're aiming for then you can get some recommendations
>>106387697Your question is problematic. If you're a techlet why even bother, I mean you don't want to even find out anything on your own.
When will an open source equivalent of the Sesame voice model release?.......
>>106387876What was the context of that webm anyway?
>>106387898
It's for training urgent care medical professionals, it's designed to "feel" pain, resist and squirm around when cutting it open
>>106387916Yeah makes sense. The more creepy the better in that case I suppose.
>>106387876why did bro slide under the table
>>106383723
The problem with FTL is it often breaks causality, unless you get tricky.
Make FTL possible and piss off physicists in the process with this one easy trick: "CMB inertial rest frame".
>>106388047Don't worry about it
>>106388110>breaks causalitynonsensical mumbo jumbo that people like to repeat religiously
>>106387697
>do I have to learn about computer architecture if I want to build a machine that can run large models?
You read the ktransformers github.
Which will tell you to get a Xeon scalable with DDR5, with some GPU for prompt processing.
>>106388140
>nonsensical mumbo jumbo that people like to repeat religiously
In an age long gone, even I was capable of doing Lorentz transformations ... the math checked out. If light speed is constant (in all frames) FTL will generally break causality.
If you first move to the CMB rest frame at sublight speed before making a wormhole/hyperspace-jump/whatever to another point in the CMB rest frame in the future (relative to the big bang) causality is preserved.
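The Lorentz-boost argument being gestured at, written out:

```latex
% Boost with speed v: t' = \gamma (t - vx/c^2). Consider a signal sent
% from the origin and received at x = ut with superluminal u > c. Then
t' = \gamma\left(t - \frac{vx}{c^2}\right)
   = \gamma\, t \left(1 - \frac{uv}{c^2}\right) < 0
   \quad\text{whenever } uv > c^2,
% i.e. some sublight observer (v < c exists with uv > c^2 since u > c)
% sees the reception happen before the emission.
```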
>>106388190
>move to the CMB rest frame at sublight speed before going FTL
If causality is enforced by law, only outlaws will be able to go back in time by fiddling with reference frames and plasma beam the spacecops' great-grandparents
>>106387480Please let us know when you have something coherent cooked up. No, I'm not being critical just want something new and usable.
>>106388388Bro, your GLMs?
>>106388415I'm not your "bro", retard zoomer. Go back to tiktok.
>>106388428I'm older than you, bro..
>>106387014Voice cloning in the model examples. Can do chinese english too lol
If you're trying to build llamacpp and it dies with "ggml was not compiled with any CUDA arch <= 750" when you run it, the fix is here:
https://github.com/ggml-org/llama.cpp/pull/15587
>>106387916And piss itself, apparently?
>>106388944>>106388944>>106388944
>>106387167>prudishnessGLM air is not prude.
>>106388957Moldy bread
I'm staying here.
+1
>>106385961>so many responsesAre you guys that starved for some blacked miku? Should I post some?
>>106387613look at him go, almost makes me want to download it for myself
This might be the first /lmg/ that has fallen off without hitting bump limit.
>>106393805Look how many posts were deleted.
>>106393810
It's funny when these happen because then you know that it's all posts made by that person in the thread.
And every time not a single worthwhile post is deleted.
>>106395011check at the times they were deleted
I don't hear much of anything about grok 2.
Is it not usable locally? No goofs?
Or just not worth bothering?
what's the best model to run on a 3080 12gb for roleplay?
>>106396385Nemo
>>106385508damn succubi... i guess i have to now
>>106383227>Which is a real shamefaggot
>>106383075hi stephen wolfram