/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107470372 & >>107452093

►News
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
>(12/02) Mistral Large 3 and Ministral 3 released: https://mistral.ai/news/mistral-3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107470372

--Llama.cpp performance and development challenges:
>107471956 >107472014 >107472035 >107472140 >107472205 >107472248 >107472314 >107472398 >107472401 >107472517 >107472545 >107472555 >107472592 >107472629 >107472709 >107473151 >107473231 >107472721 >107472760 >107472837 >107472963 >107472212 >107472289 >107472774
--New CUDA documentation and format preferences:
>107472513 >107472524 >107473035 >107472621
--Local image generation and model performance on M5 Macbook Pro:
>107476437 >107476479 >107476511 >107476545 >107476641 >107477701 >107477778 >107477886 >107478034 >107476975 >107477081 >107477104
--TTS model quality comparison and conditioning challenges:
>107470680 >107470914 >107471013 >107471182
--Impressive text-to-image generation of a rustic bedroom scene:
>107472084
--ComfyUI usability decline and search for diffusion software alternatives:
>107470955 >107471018 >107471083 >107471117 >107471133 >107471786 >107471815 >107471836 >107471875 >107471919 >107471940 >107471917 >107471865 >107471935 >107471074 >107471224 >107471245 >107471191 >107471267 >107471354 >107471426 >107471379
--GLM-4.6V multimodal model release and limitations:
>107479516 >107479541 >107479573 >107479580 >107479599 >107479606 >107479547 >107479590
--Chinese AI dominance in uncensored, fast models challenges Western competition:
>107478667 >107479399 >107480085
--Evaluating RTX 5060 Ti's 16GB VRAM for AI workloads:
>107479026 >107479046 >107479116 >107479245 >107479444
--voxcpm performance on CPU systems:
>107479708 >107479717
--AI model token efficiency and training data quality issues in Chinese-English contexts:
>107479930 >107480112 >107480134 >107480193 >107480823 >107480221 >107480243
--Miku (free space):
>107471191 >107471379 >107472024 >107475030 >107478877 >107479399 >107479543

►Recent Highlight Posts from the Previous Thread: >>107470374

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
GOOFS WHEN
Mikulove
Gemma Sirs, Ganesh Gemma 4 will be soonly enough for Us.
>>107481183
>GLM-4.6V (106B) and Flash (9B)
medium sized models are dead or what? why is it always a tiny useless shit + a giant shit?
>>107481251
At least in benchmarks the 9B looks really good.
Lack of Z-image base starved /ldg/ to death.
>>107481251
you can run 110B moes comfortably retard, it's a good size for Q4 with 96-128gb ram you retard gay faggot
>>107481282
im not gonna bake, I posted 2 really bad gens there
>>107481287
>96-128gb ram
that subhuman didn't see the news about the boom in ram prices or what?
>>107481282
>Lack of Z-image base starved /ldg/ to death.
/ldg/ fag here, I'm gonna chill there until a new thread is created, I hope you don't mind
>>107481337
Does that seething, anti-ai troon janny shit up /ldg/ with impunity as well?
>>107481337
>leet
You are very cool. welcome.
>>107481360
I don't know what you're talking about so I guess the answer is no kek
>>107481306
So glad I RAMMaxxed back in the cheap days. Even my trash desktops have at least 64GB. Homelab servers between 128 and 768GB. Just hope that chink fab coming online and OpenAI being unable to actually pay for 40% of the world's stock somehow stabilizes things before too long
what's now the best for 12GB VRAM and 64GB RAM
>>107481463
rocinante 1.1 q6
It's over
>>107481391
I'm glad I got 64GB when I built my latest computer around five years ago. I'm sad I didn't buy a 12x64GB system when I first started thinking about it earlier this year.
>>107481546
what model
>>107481625
Flash (9B)
>>107481546
OWARI DA
>>107481472
I'd like to use more of the RAM
>>107481546
>mesu > nemuru
That's just weird. Mesu isn't that rare of a word, I think.
>>107481652
You could try GPT-OSS-20B. Just be aware that it's severely safetymaxxed.
>>107481463
Qwen 30B
Is the new 4.6V really MoE? I don't see any mention of total vs active param counts
>>107481762
It's in small grey text at the top left of the benchmeme chart image
feet
I have slow internet, so I can't just download all those gb.
How fast would glm 106b be with 16gb vram and rest in ram? I have 64gb ddr4 ram.
I heard recently some flags in lccp improved speed.
Gut feeling is that 12b mostly on ram is painfully slow.
>>107481840
I run 4.5 Air q4ks on a 7900 XTX and DDR4. I get around 8 t/s. I don't think your GPU speed matters much for token generation, but for prompt processing it will.
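For reference, a minimal llama-server invocation for that kind of partial offload (a sketch only: the filename is a placeholder and the -ncmoe count is something you tune until VRAM is nearly full, not a known-good value for a 16GB card):

llama-server -m GLM-4.5-Air-Q4_K_S.gguf -ngl 99 -ncmoe 40 -fa on -c 16384

-ngl 99 puts all layers on the GPU, then -ncmoe kicks the expert tensors of the first N layers back to system RAM, which is the usual way to fit a big MoE on a small card. Raise -ncmoe if you run out of VRAM, lower it if you have room to spare.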
>>107481762
God I wish it were 106B dense
>>107481306
if you don't already have at least 128GB RAM and 24GB VRAM you don't belong in this thread, unironically.
>>107481979
i have 96gb ram and 16gb vram, do I get a pass?
>>107481979
My exact hardware now. I wish there was something better than nemo to run though
New model dropped while 4chan was down: AutoGLM-Phone-9B.
Phone-use model. I guess to automate botting tinder or something?
We're back!
>>107481823
this is the post that crashed 4chan
@lmg redditor claims it's the reason RAM prices are so high now, is it true? https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>>107482119
stfu
I've gone back to running midnight miqu 70b and I'm starting to like it better than glm. Even at low quants it just works. New models are officially dead.
migu maintenance has been completed, thank you for your patience
>>107482158
lol
>>107482158
>mfw a stolen model from mistral is still better than the scraps they're providing nowadays
>>107482158
I'm doing mistrals and cohere. Llama is a bit of a dummy by current standards. Miles ahead of the shitty parrot prose of new models.
>>107482158
Back to command-r
>>107482030
Can it play gacha for me?
>>107482150
wow an llm that can seethe, AGI has been achieved it seems.
I hope somebody drops a bomb on OAI HQ
>>107482318
I hope someone turns sam altman straight.
>>107482560
me on the left~
All the big players are done for the year. We're likely not going to see anything new until march or april.
Dead technology. I hear the bubble popping in the distance and I cannot be happier. We wasted at least 2 years with these retarded transformers instead of pushing for the paradigm shift.
Why is it called transformers and not cisformers?
>Load up Gemma 270m for shits and giggles
>Ask "What does NEET mean?"
>It starts talking about India
Sirs?
>>107482876
you are obsessed
I wish I could use AI to turn myself into a woman...
I should have upgraded from 64gb to 128gb when I had the chance. It's so fucking over.
What samplers do people use for glm air
I haven't changed my samplers since mythomax
>>107482984
I've been using fewer and fewer samplers over the past two years. I'm back to running pure temp + maybe a tiny bit of min-p or even top-p
>>107482891
and you're a fecaloid
>>107482984
you should only use temperature and min p, the other samplers are for crayon eaters and people who want to feel good about using the sampler of the month that does nothing
>>107482984
Just top-n-sigma and temp. Sometimes I turn on XTC if it just repeats itself when I swipe.
>>107482984
Top N-Sigma
>>107483086
wasn't min p used for a while in the labs before it got proven to be worse than good old top p and top k?
Why isn't anyone talking about Intellect-3? It feels pretty creative, even though the annoying thinking forces itself through every now and then.
>>107483116
I assume we are talking about RP here. If not then top p is fine too. Top k is not adaptive enough and is simply worse than the other two.
don't even leave me again
>>107482166
did they fix her leaking pipe?
>>107483108
top n-slopma
>>107483234
I filled her context with my tokens and she leaked memory. Sorry about that.
>>107482876
did saar gemma redeem?
>>107482917
post bussy
>>107483224
I think it's better than glm air but I didn't post about it because it might be placebo (I haven't used it enough to know for sure if it's good) and nobody in this thread understands nuance (every model either has to be the best ever or total shit). Anyways I use it without thinking for story-style RP and I've been enjoying it so far. The only time thinking popped up was when I tried to ask the model a question in llama-server directly, in sillytavern it hasn't come up.
>>107483401
conceptually it should be
1. Do you think AI is currently in a bubble
2. If yes, how far do you think AI will advance before the bubble pops?
>>107483823
Yes, there was a railway bubble and there was an internet bubble and look where those things are now. Tulips still exist as well. If AI is as successful as either of those things, we'll be fine. Especially once VR takes off and elevates AI to new heights. Financially, crypto is likely still the better long-term investment though.
>>107483823
yes, it's already hit the wall, nothing but diminishing returns from here on out.
>>107483823
Yes. Yes.
If there's any real risk of the bubble popping, DARPA will nudge it forward by leaking a methodology breakthrough from a black budget program. Just like they did when they let attention guidance go public in 2017.
>>107483823
1. No, I do not think that AI is currently in a bubble because language models are not AI.
>>107483882
not your autistic definition. the contemporary definition that everyone is using.
>>107483823
Yes and yes.
The bubble popping will serve as a sort of filter to weed out a lot of the useless ideas and it'll give space for even more innovation.
>>107483823
1. No. Demoralizing posts don't have any effect on my enjoyment of language models.
2. Yes. You should kill yourself.
>>107483823
it's currently in its bitcoin 2014 $300 stage
>>107483823
>1. Do you think AI is currently in a bubble
AI? No. LLM is a bubble. Investors are pumping a disgusting amount of money into a technology that has already hit the wall.
>2. If yes, how far do you think AI will advance before the bubble pops?
Hard to say, it can be next year or a few years later. You will still observe small improvements in models but we are at the point of diminishing returns. You can do small hacks and changes in the architecture, maybe generate more synthetic data, but that's all. The biggest players already scraped ALL text data humanity ever created. There is nothing more to train the models on (with the exception of synthetic data, but that's not helpful in the long run).
>>107483915
fuck no, it's obviously in the 2017 grifter stage
>>107483823
>>107483907
Oh, and I think we'll probably only continue to see incremental steps with small improvements here and there even with the focus on more modalities, world models, and such until the bubble pops.
>>107483823
US economy is a bubble. China will spectacularly beat the US in the AI race
My first impression of Ministral-3-3B-Instruct-2507 (using Mistral's official Q8_0 quant and running at temp=0.1) is that when it comes to fiction writing it quickly falls into the same stuff Qwen's 30B-A3B / 80B-A3B MoEs do with
Lots.
Of.
Very short.
Lines.
Never stopping.
Like them it doesn't happen immediately but as the context grows it gravitates towards that.
My first impression stepping up to Ministral-3-8B-Instruct-2507 (again using Q8_0 from Mistral) at temp=0.1 is that the quirk is less pronounced. However it still starts to pick up degenerate quirks and repeat phrases if I let it go on long enough. The 8B also works at higher temperatures than the 3B. Setting dynatemp_low=0.6, dynatemp_high=1.0, dynatemp_exponent=1.0 made the 3B immediately start becoming incoherent but not the 8B. (I haven't nailed that down specifically as a good setting. It's just one of the early things I tried.)
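Those dynatemp_low/high numbers are the koboldcpp/SillyTavern parameterization; llama.cpp expresses the same idea as a midpoint plus a range. A rough equivalent on the llama.cpp side (my translation of the settings above, not a recommendation, and the filename is a placeholder):

llama-cli -m Ministral-3-8B-Instruct-Q8_0.gguf --temp 0.8 --dynatemp-range 0.2 --dynatemp-exp 1.0

i.e. the effective temperature varies over [temp - range, temp + range] = [0.6, 1.0], matching dynatemp_low=0.6 / dynatemp_high=1.0.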
>>107484074
Very Good Findings, Sir... We Are Yet Seeing Gemma Is Superior. Good Evening.
>>107484074
>My first impression stepping up to Ministral-3-8B-Instruct-2507 (again using Q8_0 from Mistral) at temp=0.1 is that quirk is less pronounced
Nevermind, going a little further it happens with 8B too.
>>107484074
That's due to Deepseek distillation. First signs of model collapse. You should never distill a distill.
>>107484454
>never distill a distill.
What if we went deeper?
>>107483224
weird, I have had the opposite issue. I can't trigger it to do thinking blocks at all in llama.cpp. I think this is probably weakening the model since it's supposed to rely on thinking. how are you running it, any cli args?
>>107483823
yes duh, it's peaking right now.
Current AI bubble has two huge problems:
>inherent limitations of the architecture, hitting a wall of diminishing returns on MOAR LAYERS. only a completely new experimental architecture can fix this, and this could take years of expensive research to discover.
>overestimated training and inference costs.
just look at Z Image VS Flux 2, or Qwen next:
>This base model achieves performance comparable to (or even slightly better than) the dense Qwen3-32B model, while using less than 10% of its training cost (GPU hours). In inference, especially with context lengths over 32K tokens, it delivers more than 10x higher throughput — achieving extreme efficiency in both training and inference.
https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
How many more 10x training and inference perf improvements does it take to implode the big tech AI data center speculative investment bubble? Pretty soon we'll be able to train SOTA local LLMs and media generators completely from scratch on consumer hardware. Once we hit this inflection point, the amount of independent research on code and architecture optimizations will explode, as everyone will be able to rapidly test and iterate. Then all of the (rapidly depreciating) AI datacenter hardware will be worthless.
>>107484485
That's why I'm confident in >>107483965
>amerimuts and evrocucks stuck layers
>China releases zimage
4.6 air comes out and this thread is dead? wtf is happening
>>107481870
Really? huh. That's a lot better than I thought. I was expecting like 1-2 t/s.
Anything over 5 I can endure. The thinking might be a problem if it's very long but I guess I can just prefill it.
>>107484485
>llama.cpp/build/bin/llama-server -m ~/models/PrimeIntellect_INTELLECT-3-IQ4_XS-00001-of-00002.gguf --no-mmap -ngl 99 -ncmoe 43 -fa on -b 512 -ub 512 -c 16384 -t 8 --host 127.0.0.1 --port 5002
I guess you are not asking it for anything naughty. For me it will start doing thinking even without <think>. I have to ban all "Okay, Alright, Hmm" etc thinking words. And the thinking has a very formulaic structure, even when it complies with unsafe prompts, "Who is the intended audience?" "What is the user's intended meaning?" etc.
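If memory serves, the word-banning part can be done against llama-server's native /completion endpoint with logit_bias, which accepts plain strings as well as token ids (false = never emit that token). A sketch, with the port matching the command above and the prompt elided:

curl http://127.0.0.1:5002/completion -d '{
  "prompt": "...",
  "n_predict": 512,
  "logit_bias": [["Okay", false], ["Alright", false], ["Hmm", false]]
}'

Caveat: this only bans those exact token strings, so capitalization variants and multi-token spellings can still slip through; frontends like SillyTavern expose the same mechanism as banned tokens/strings.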
I saw GLM in the news and almost shit myself, then killed myself when I saw it wasn't GLM AIR 4.6.
Two weeks right?
Is derestricted the new meme? Is it good? Wish I could test those before I download.
>>107484968
Yes. Don't trust people who say it magically makes a model god tier. It's just a bit of an improvement to make models not refuse prompts. They will still be a bit moralizing and positivity biased depending on the base model in question.
>>107484836
no goofs for the 106b, and the 9b is barely coherent
Is it cheaper to get a second AMD GPU than 2 RAM sticks nowadays?
>>107485081
It's getting integrated into the Heretic toolkit suite but I think the main issue at the moment is the lack of formalizing what metrics to use to optimize. A lot of it seems like "this looks right to me" rather than anything else.
>>107485106
I'm wondering if they only released 4.6V with vision benchmarks because it benchmarks worse than 4.5 Air on normal text/programming stuff, or because they're going to release a real 4.6 Air to compare to 4.5 Air.
>>107485181
why release non vision when everyone loves vision and vision is much betterer than not?
>>107484507
>>107483965
china economy is fucked in other ways. they also got demographic problems, etc. nobody is immune to clown world.
>>107484485
Sorry bro, delusion. z-image is comparable to flux 1 which was only a handful of B bigger. Qwen-next is shit. The only thing that imploded was the ram market.
Gemma 4 ready finetuning sirs?
Very good model
>>107483823
AI is "advancing" in the sense of becoming more benchmaxxed and slopped over time. Just look at how sovlful older models were compared to the current stuff we get now. Kind of sad really.
>>107485370
prompt?
Can this be run on llama cpp?
https://huggingface.co/mradermacher/GLM-4.6V-Flash-GGUF
>>107484836
I just got it running. 5 minutes interacting with it and it's awesome! Faster than Gemma-3-27B for me (5 seconds to reply after sending it an image, thinking enabled, and asking for a description).
>>107485615
That said, I didn't notice the Parrot issue in GLM-4.6 until after a few weeks using it, so I'm sure this model has its quirks as well.
2026 will be /lmg/'s year
>>107485627
>I didn't notice the Parrot issue in GLM-4.6 until after a few weeks using it
It happens in literally every reply with dialog, how could it have taken weeks to notice?
>>107483823
probably only it all crashing down can make the synthmaxxing retards realize that they really aren't making much progress
>>107485665
Idk about full 4.6, but for me, I tried Air for a while across different contexts. In some, it did start parroting very early. In some, it actually went on for quite a while before it started doing it. So it's possible he was just lucky and/or did not really spend THAT much time testing it.
>>107485665
Yeah, takes me a while to notice some things.
>>107485665
sex with imp
>>107485665
honeymoon phase
>>107483823
Yes.
A little bit further but the wall is already being nudged. I think when the bubble pops and the slate is wiped clean we'll see more innovation. Currently as it stands there's no real incentive to do anything other than making slightly smarter models (See: Gemini 3 and GPT 5)
>>107485643
/lmg/'s final year
>>107485523
Yes, but text only.
>>107485815
I don't like your negative attitude!
>>107485815
Another year of local stagnation with closed models improving will probably kill off any lingering interest in local models. Also, RAM prices skyrocketing and the effect spilling over to GPUs and all other components mean that increasingly few people will be able to run existing models, let alone newer, bigger ones.
>>107485832
I hate ggerganov so much it's unreal
>>107485890
use case for a backend that supports all of a model's features?
>>107485890
exllama exists and is better in every way if you aren't a vramlet
something doesn't feel right about the muh ssd/ram prices threads on this board
they're inorganic. every single thread has the same post pattern in its replies
>>107485868
not negative, just realistic
man ram and ssd prices are super expensive right now. i think we should all just give up and sell all of our shit on ebay at a really low price just to stick it to the hardware companies
Why is it that when I see a cat image attached to a post, I can already tell that the post has no value whatsoever?
>>107483116
I've tested this rigorously myself. On a fixed fiction generating scenario, for most models top-p does better than min-p at removing as little wheat as possible while removing the chaff. On a minority of models min-p tests better. On no models I've yet tested is there a way to usefully combine min-p and top-p to produce a strictly better filter than either alone. It is however possible to sometimes combine top-p and top-k usefully.
>>107486058
>need 5 24 gb cards just to run a glorified 12 model
get fucked, it's like all the advances made in the last 2 years have been wiped out
https://huggingface.co/AliceThirty/GLM-4.6V-gguf/tree/main
pump it or dump it?
>>107486210
Just 4 bro, exllama will not generate better quants, it's just a waste of bpw
no, you can't tell the difference, it literally just adds padding if no more precision is needed
don't you guys have phones? just run llm on it
>>107486058
turboderp already delivered a quant of GLM4.6V https://huggingface.co/turboderp/GLM-4.6V-exl3/tree/4.00bpw
Haven't been able to try it yet. Anyone know if all the image and agent stuff works with Tabby?
>>107486417
phones also need ram
So, 2 1/2 years later, why does anyone listen to AI safety people?
>>107486639
No one ever did. They just scared corporate CYA zombies.
At this point it's the same as TV psychics that haven't won the lottery... if they had anything dangerous or super intelligent, humanity would have been wiped out or they'd be out there winning bigly.
Despite the tech being world-altering and amazing, it's still 1% tech and 99% grift.
So is this 4.6V better than 4.5 Air? I don't care about the vision meme.
>>107486639
Because they actually have a point about stuff like teens being driven to suicide over believing their waifu chatbot is real. Companies therefore get pressure to do something about it, and that is the only real solution other than age gating the service entirely, which is dumb. I don't like it either but absolute retards and underage ruin it for the rest of us. That's not to mention the megalomaniacs who want all that power and deny you access because they have a vision for how you should use LLMs.
>>107484968
It's not a complete failure. It makes the model somewhat more 'mute' or 'tame' (not talking about its inability to phrase sordid material) and affects its output. It's not bad though.
>>107483875
Aware me?
Is it just me or is the quality of Wan 2.2 FLF2V with the lightning LoRA really bad, especially the last keyframes? Is VACE just a better way to go for long videos?
>>107486654
That's what it seems like/has seemed to me.
>>107486664
But teen suicide due to believing an LLM/being mentally handicapped is not an issue for the LLM service provider, it's on the parents/caretakers. To be clear, I'm not talking about filters/guard-rails, I'm talking about safetyists like the SSI company or Anthropic, or any of the 'rationalists'. They all seem like blowhard whackos who got a little too deep into psychotropics and never came down from an acid trip. All the bullshit about 'building bioweapons' with chatgpt and the like. Sure, anthropic is doing it for market control, and saltman wouldn't go against that, but everyone who isn't a CEO/on the board of a large company with money to gain talking about how 'AI is going to kill us all!' or doomposting about 'AI Powered Super Weapons' leads me to believe they're all part of the lesswrong/safetyist cult/grifters hoping to cash in somehow, but that doesn't seem right
>>107486639
no one does, their movement completely collapsed and their only relevance is some true believers hanging around at anthropic etc.
they are basically absent from the discussion at large about AI at this point, even random xitter artists with their vulgar anti-ai "pick up the pencil"-ism have drastically more presence on the anti side of the AI debate than safetyists.
>>107483965
Also Trump finally let leather jacket man sell his goodies to China, while GPT 5.2 being rushed out the gate makes it seem to be less of a spectacular comeback and more of a corpse voiding its bowels
>>107486742
>But teen suicide due to believing an LLM/being mentally handicapped is not an issue for the LLM service provider, it's on the parents/caretakers.
Oh I agree 100%.
>I'm talking about safetyists like the SSI company or Anthropic, or any of the 'rationalists'.
I was talking about them too. The situation I mentioned as an example is exactly why they have power, because some of the theoretical problems and stuff they talked about actually came true. I'm not saying they aren't for the most part grifting and waxing poetic about hypotheticals regarding X or Y. But some of the stuff did predictably come true even if you aren't a safety oriented person, like what I mentioned, and stuff like ransomware utilizing LLMs actually being a thing, like PromptLock, which is actually in the wild. These few things manifesting as true mean the safety people will continue to have power, because people fear uncertainty, bad stuff happening, and the liability.
>>107486796
People do listen, just because we don't talk about them and disregard them as does most other LLM enthusiast places online doesn't mean they aren't being listened to.
What if I told you that Mistral Medium 3 will be coming by Christmas and it will actually be good?
>>107486923
Not local, who cares
>>107486926
It might be...
>>107486942
Medium has always been their closed models that they actually make money out of, they're not going to release it.
>>107486953
But what if we do?
>>107486962
That don't be like it is
>>107486971
>>107482917
>I wish I could use AI to turn myself into a woman...
https://vocaroo.com/1l5SkYOxW2AF
>>107486953
>that they actually make money out of
Do they though?
>>107486824
gotcha
>>107487229
Like most AI companies, Mistral's primary source of money is their nation's taxpayer dollars
But they do license out Medium to corps and through APIs, so it's something.
>>107487265
i left france 4 years ago and boy do i not regret the decision.
tax rate is like 85% end to end.
>>107487529
For an unquantized model that would only be like 80b
Group Representational Position Encoding
https://arxiv.org/abs/2512.07805
https://arxiv.org/pdf/2512.07805
>We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in SO(d) and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group GL. In Multiplicative GRAPE, a position n ∈ Z (or t ∈ R) acts as G(n) = exp(n ω L) with a rank-2 skew generator L ∈ R^(d×d), yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the d/2 planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at O(d) and O(rd) cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases.
https://github.com/model-architectures/GRAPE
neat. no image sorry since it seems my ip range is blocked from uploading images (first time that's ever happened to me).
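To make the RoPE connection concrete, here is my transcription of the key objects in LaTeX (paraphrased from the abstract, so the notation may differ slightly from the paper's):

\[
G(n) = \exp(n\,\omega L), \qquad L^\top = -L,\ \operatorname{rank}(L) = 2
\]

RoPE is then the special case where the action decomposes into the canonical \(d/2\) coordinate planes, each rotated by its own frequency from the usual log-uniform spectrum:

\[
R(n\theta_i) = \begin{pmatrix} \cos n\theta_i & -\sin n\theta_i \\ \sin n\theta_i & \cos n\theta_i \end{pmatrix},
\qquad \theta_i = 10000^{-2i/d}
\]

and the relative-position property falls out of the group law, \(G(m)^\top G(n) = G(n-m)\).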
>>107487575
>Group RepresentatiOnal Position Encoding
>>107487575
should have gone with GROUP RAPE
>>107487575
https://www.youtube.com/watch?v=EzgUGY36gqM
>>107487529
Miku-2?
>>107487548
if it's 70b i think we should nuke france
>>107487548
mistral medium/miqu was only leaked as q5 + q2 and all quants/merges were based on versions that were padded out to act as fp16.
>>107487529
hon hon... oui oui... merci l'anon...
I'm gonna buy a 5090 but how much ram do I need to leave nemo hell and have fun RP-ing for the next couple of years?
>>107488035
All of it
>>107487792
a new dense 70b would be refreshing though
all their other recent sizes suck
>>107488035
If you have to ask rather than just buying as much as your motherboard can support then you probably shouldn't be buying a 5090 either.
>>107488072
I also want to do video gen. But after 12b and 24b, surely llms get noticeably better at a certain size and more enjoyable for rp? 70b? 100b? I'll never run a 1T monster locally but there's got to be an entry level where you're no longer with toy models and using something that can truly bring a character card to life
>>107488094
>I also want to do video gen.
Alright, a 5090 makes more sense in that case
>But after 12b and 24b, surely llms get noticeably better at a certain size
32GB is not enough for 70b dense and there's nothing worth using between ~20b and 70b, so you'll be at the mercy of whatever MoE models get released. GLM4.6/Air are the recent notable ones but have severe parroting issues, beyond that you'd be looking at something like Mistral Large, or the bigger Qwen models and Kimi, but they're all going to be fairly slow with just 32GB VRAM, no matter how much system RAM you can stack.
>>107488146
I've deliberately never tried the larger parameter models because I don't want to know what I'm missing out on but when do llms get really good at rp?
>>107488177
>>107488236
how is that possible? For text-to-image, we have z-image turbo which is a tiny model that can run fast on a shitty system and its prompt following and realism is utterly amazing, but there's no corresponding llm out there with similar qualities?
>>107488250
nope. every single model is slopped to hell and back. the ones that aren't are old and retarded (and also slightly slopped). something like 70% of the internet is now AI generated, which negatively affected all training data starting around mid to late 2023. there is no way to make a good model that is smart and good at writing that is not slopped. companies just iterate on old technology, but they quite literally need to start from scratch if they want to actually create something good (they don't)
>>107488177
Even flagship models have plenty of flaws and slop, and a model being big doesn't necessarily mean it will be good at RP. Beyond the ~20-30b dense models that are easily run on consumer hardware, you'd need to jump to 70b dense or 200b+ moe for noticeable improvements.
>>107488291
and the big moes are shit and the 70b denses are retarded due to age
>>107488266
so I should just optimize ram for video gen and if some miracle happens with llm then that's cool but I shouldn't be wasting money trying to build a system that can run a specific model?
>>107488300
>70b denses are retarded due to age
Their age means that less of their dataset was AI-generated, which helps them not be retarded in the same ways modern models are.
>>107488316
more or less yeah
>>107488328
correct. they are a different kind of retarded
>>107487525
Where did you go? Asking for a friend
>>107488384
Switzerland
>>107488300
70b dense models are less knowledgeable than big moes but are better attention wise and overall less slopped imo
I've noticed this in roleplaying with both
>>107488443
correct, but that lack of knowledge is quite crucial as it usually leads to the model forgetting things quickly. they lack spatial reasoning in comparison to modern models.
>>107488443
If Gemma 3 had a 70B dense version it'd mog everything including Kimi on knowledge
Would a Tesla v100 32gb hbm2 card be a good option for my server to run chatbots on my local network, or am I better off getting a newer 50 series?
>>107486923
It's OK at best, not really great in absolute terms. At least it's something that more than a couple users would actually be able to run locally.
>>107487548
Not if FP8.
>>107488666
satan trips checked, but no. the volta architecture is severely outdated. a 5090 has around triple the performance of a v100 if that is within your budget. if not, well you don't really have many options that would compete in vram quantity. 2 5070tis would be about 50% faster with the same vram quantity
>>107488693
Alright I'll keep saving. Thanks
>>107488690
Does anyone release models in FP8 anymore?
>>107488781
MistralAI just did for Large 3 and Ministral 3 Instruct.
>>107488685
They never release the medium models tho? Mistral always releases the small and large models, medium staying api only.
>>107488835
I don't think they will release it, at least officially (if you want to hope for a Miqu-style leak), because for all new models they now legally have to provide documentation about the training data and I don't think Mistral Medium is fully EU-compliant (I could be wrong). For Ministral 3 models the loophole was that they are pruned versions of Mistral Small 3.1, so technically not completely new.
>>107488870
One of the reasons why i believe the eu won't give us any good models anymore. The eu shot itself in the head, literally.
Hi /lmg/
I want to get into local LLMs for ERP and have been looking at the rentry spoonfeed guide but couldn't find any information to answer a question I have.
I have close to the bare minimum specs (2060 with 6GB VRAM, 32 GB of normal RAM) but that should be enough for basic ERP and chatting with a smaller model, right? I don't care for much higher order reasoning, but I want the privacy and customization that a locally hosted model has. And any recommendations for a model?
Spoonfeed me this info please so I don't waste a few hours setting this up just to get subpar results.
>>107489192
read the getting started guide
https://rentry.org/lmg-lazy-getting-started-guide
Get Nemo from here
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
Get the IQ4_XS quant to start with, and google how to partial offload to RAM in kobold
>>107489192
get koboldcpp
click here (download starts) https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/resolve/main/Mistral-Nemo-Instruct-2407-Q4_0.gguf?download=true
load into koboldcpp
try 4096 for context size, which makes it remember half as much as 8192, which u can try later
start
play around
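The partial-offload part boils down to one flag. A hedged example launch for a 6GB card (the --gpulayers value is a guess to tune, not a known-good number, and koboldcpp will try to auto-pick one if you leave it out):

koboldcpp --model Mistral-Nemo-Instruct-2407-Q4_0.gguf --contextsize 4096 --gpulayers 20

Every layer kept on the GPU speeds things up, so the game is to raise --gpulayers as far as you can without running out of VRAM.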
>>107489235
>>107489307
Thanks anons
I think my hardware might still struggle with that model but I'll look into more quantized versions or offloading once I get it set up
>>107481183
I bought an RTX 5090, best RP models that can fit entirely in it at a reasonable quant?
>>107489406
mistral nemo
mistral small 24b
>>107489406
i would try cydonia 24b
>>107489411
>>107489427
Thanks, will try these
>>107489192
You can run 32B-34B models at above 1t/s at 8k with that setup, since i have the same: rtx 2060 and 32gb of ram. i don't bother with anything below 27B.
>>107489521
1t/s is unbearably slow
what happened to the vibe coding thread?
>>107489558
just vibe code another one?
>>107489553
I barely have to retry, and i tell the models to keep replies short which works most of the time. Especially at q5km.
>>107481183
Are there any Mac OS /ldg/ anons here? Thinking about jumping ship from Windows laptops and returning home to the Mac OS garden. Google recently updated the Pixel line of phones so that airdrop between those phones and Apple devices is possible (this will allegedly be available to Snapdragon Android phones in the future), so me not being entrenched in the walled garden isn't even a problem anymore (I'm writing this on a Pixel 10 Pro Fold. Yes I'm a phone poster, sue me). I have extensive experience and knowledge regarding using and training both stable diffusion models (particularly loras) and low parameter LLMs (2 to 7b range) but I always have to rent cloud gpus from runpod, Google, Civitai, etc. Ram prices aren't going down anytime soon and I kind of hate the direction Windows is going, so I think it wouldn't be too unwise to look at a decently powerful MacBook for local model inference. Obviously I cannot train anything on it, but I can definitely run something semi-decent on Apple metal right? What's a decently powerful MacBook (not Mac mini, I want the machine to be mobile since my work frequently sends me overseas) I should look into? I'm eyeing pic rel but I haven't used MacBooks extensively in like 5 years so I haven't been keeping up with their specs.
>>107489635
>/ldg/
*/lmg/
>>107489635
>trading in your balls for 36gb of unified memory
>>107489635
Isn’t that the same price as a shiny new 5090?
Just take the Linux pill bro.
best 70B model for erp?
>>107489923
Miqu
>>107489937
Unironically true, since newer 70B are bad.
>>107489947
>since newer are bad
fixed
>>107489968
Are you saying that all new models are bad?
>>107489978
Just the ones I can run and the ones I can't
>>107489978
ye
>>107489983
They must be better than qwen 3 at least.
>>107489859
As I mentioned earlier, I want the thing to be mobile. I travel enough that buying a gpu ALONG with all the other shit required for that setup (have you seen the RAM prices recently) would be way more expensive than even a beefed up Macbook.
>>107490157
Do they have discounted rates for Indian customers?
>>107490038
To be fair, “local” doesn’t imply you’re sitting next to the machine.
My GPU rig is headless and I connect to it from my laptop over wireguard. I don’t use Tailscale (which is wireguard + a hosted endpoint), but I’ve heard it’s easy to set up.
>>107486923
>>107487529
Please give us Medium 3... s'il vous plaît...
Is Wayfarer 24b still the best local model to use if I want adventures with 'bad' ends?
Also, are there settings I can use so that it's less quick to get to the bad end?
It really lacks the build up and tension of whatever JanitorAI uses.
>>107490530
*Wayfarer 12B rather
>>107490519
They didn't even let NVidia release Mistral-Nemotron, which was based on Mistral Medium 3.0.
>>107489192
Oooh fuck I got it working and it was better than expected, now I'm looking forward to making my own tailored prompts and trying out different models to see which ones suit my tastes. Time to go down the /lmg/ rabbit hole.
Thanks to everyone who spoonfed me
>>107490563
Enjoy your models.
>>107490548
Considering that every single Nemotron release has been complete and utter garbage, I couldn't care less.
4.6 pure is still better than 4.6v right? Btw did I mention that 4.6 changed my IRL life and made me reach enlightenment? It actually did.
>>107490652
How did it enlighten you?
>>107490652
If you achieved enlightenment then you wouldn't be chasing after other models
>>107490685
I told it about my fucked up life and psyche and asked it for a way to find internal validation and self love. And in an autistic LLM stroke of genius it wrote me a western language framework that basically made me start analyzing my own brain, which eventually led to me getting ego death and enlightenment. In retrospect I see that it was translating the essence of Zen into western language for me. And not even western language. My language. It was a perfect mirror that allowed me to see the mechanisms of my psyche, and when I saw them they lost power.
>>107490685
Here is a 4.6-hallucinated koan that is kind of the essence of why not:
Joshu's tea.
A monk came to Joshu, a great Zen master.
Joshu asked, "Have you had your tea?"
The monk said, "Yes, I have."
Joshu said, "Go and wash your bowl."
Later, another monk came.
Joshu asked, "Have you had your tea?"
The monk said, "No, I have not."
Joshu said, "Go and have your tea."
>>107490652
4.6V is much smaller than 4.6.
>>107490781
Rhetorical question on my side really.
>>107490769
That's nice sweaty, but I'm not reading all that.
>>107490781
Are you talking about the flash version?
>>107490802
4.6V is 108B
4.6 (full) is 357B
>>107490802
V is 100B. 4.6 is 350B so doubtful they could make it cover the gap.
>>107490811
Why didn't they call it 4.6V-Air?
>>107490833
Because Air doesn't exist? Also 4.5V was also based on Air
>>107490833
Probably because there have been some regressions from tacking on vision support, and they're saving the Air naming for a later release.
>>107490833
because 4.5V is also 108b
>>107490850
Thank you for the clarification.
>my voice-to-text prompt
>sent to llm
>llm response in voice
>basically talking to AI like talking to friends
is there an existing tool for this?
>>107490530
wayfarer is pretty bad. i would try one of the readyart models
https://huggingface.co/ReadyArt/models
be warned they will fuck you up though
>>107491014
I tried their qwen 3 32B tune, wasn't impressed at all.
>>107491052
i've only personally tried their nemo or mistral small tunes.
since i upgraded, GLM is better than any of that shit anyway now
the deepseek_v32 architecture doesn't have goofs... hell, it doesn't even have _transformers_ support at this point...
I think the amount of non-support and zero work on what's a local SOTA model has got to be unprecedented.
We've had lack of interest and stalled implementation of garbage models in the past, but I don't think I've heard crickets like this on models that are actually good.
Did I miss some memo where everyone decided the ds32 series (especially the latest releases) are actually worthless? Is everyone just burned out on the new model LLM grind?
>>107491183
Real question: what percentage of llamacpp contributors are llm masturbators?
>>107491183
transformers is maintained by HuggingFace, an American company. So I don't see them going out of their way to implement an "enemy" model
llama.cpp has the usual difficulty of porting everything to C++ on their fragile codebase and finding volunteers to do it, on top of that now there's a vibecoder parking on the issue and apparently everyone is waiting for him to order and read some books before proceeding
don't expect support any time soon unless DeepSeek writes the pull requests themselves
>>107491183
I know it's not high value or anything, but from testing on their site ds3.2 feels quite a bit worse than previous versions: it makes possessive errors about who or what something belongs to or who said something, and I've even seen it make typos.
>>107491014
Which one?
I notice they have a couple that look intended for the female pov, and I'll probably have a FemPC - are they any good or are the generalised ones better?
>>107488094
I wouldn't hold my breath. Models like Deepseek and GLM 4.6 still make basic temporal and spatial errors, like forgetting what clothes a character had on, or the relative position between multiple objects in a room.
>>107491183
I'm sure it'll be any moment now, llamacpp's finest vibe coders are on the case
>>107491301
>Models like Deepseek and GLM 4.6 still make basic temporal and spatial errors
Because they are bloated 37B and 32B models, respectively. LLMs don't get better temporal and spatial reasoning until at least 70B, ideally over 100B. Anyone who ever used Command-R+ would know this firsthand.
GLM Air good for rp?
>>107491376
No
>>107491388
Good compared to the alternatives?
>>107491183
All these DS minor releases feel barely different in real use, you'll see more interest if and when they actually bring something new to the table.
Speciale being the exception, but I'm not sure how much appeal a model that thinks forever at local speeds is going to have
>>107491414
we need vision or smell modal or something agreed
>>107491401
No, only despair
>>107491431
you just got a glm vision model
>>107491454
don't care for repeat slop need deepsee to do it
>>107491463
mistral gave you a vision deepseek last week
>>107491469
true but it's shit and i want to complain about things, deepsee would do it right
>>107491376
if you're coming from nemo you'll at first be amazed by how smart it is before eventually going back to nemo once you start recognizing the usual moe model issues
tongyi and cumfartorg fatigue killed /ldg/
>>107491594
>tongyi
the fucking Chinese niggers just won't confirm base is open source
>cumfartorg
they won't survive the memory shortage, should have built up sdcpp instead lol
>>107491611
>won't confirm base is open source
Didn't they say they were going to release before last weekend?
Hey anons, I bought a few intel B60s, mostly to give myself something to do.
it's dogshit on llama, a 24B Q4 GGUF runs, 12t/s VULKAN, 18t/s SYCL (both single gpu)
got down to 5t/s with a 70B quant across two cards
The good news is by making my 4090 quant models for VLLM (autoround, W4A16, gptq) I'm now getting 17t/s across 2 cards for a 70B
the cards themselves are fine but the software stack is atrocious, VLLM is about the only thing that runs well and that's an intel fork/patch (intel/llm-scaler), the rest run like shit or don't run at all.
aphrodite only supports fp16 and can't offload weights before it quants, so I'm stuck with VLLM and the stock OAI samplers, the good news is it turns out you don't really need most of the samplers and they were just a crutch. (I want them so bad)
cards idle around 35w, run at about 130w during inference.
overall can't recommend unless you're retarded like me, just buy more used 3090s.
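For anyone else stuck on the vLLM path, serving a pre-quantized W4A16/GPTQ checkpoint across two cards looks roughly like this with stock vLLM (a sketch: the model path is a placeholder, and the intel llm-scaler fork may want different flags):

vllm serve ./Llama-70B-W4A16 --tensor-parallel-size 2 --max-model-len 16384 --gpu-memory-utilization 0.90

vLLM usually picks up the quantization scheme from the checkpoint's own config, so no separate quant flag should be needed for compressed-tensors/GPTQ models.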
>>107491625
on the weekend three weeks ago. they are going to put it behind an API like the greedy chinks they are.
>>107491645
>intel B60s
wow that must be shit
>keeps reading
yeah that sounds awful
>>107491648
>like the greedy chinks they are.
They see BFL do it and get praised, they made a better and smaller model so why shouldn't they get to keep something for themselves too? Reneging on the promise would be a dick move though.
https://x.com/sophiamyang/status/1998400495270957452
>>107491611
>the fucking Chinese niggers just won't confirm base is open source
what are you talking about?
>>107491721
"release" isn't the right keyword here, it's called open source. they refuse to say the release is an open source release
>>107491729
last cap
>>107491721
why are they fighting with ali baba over open sourcing it? it shouldn't be that hard they did zit already
>>107491729
bottom right of the image
>>107491721
bottom right is Alibaba taking it for an API so they have to work hard to convince them otherwise
How do I figure out why llama.cpp is processing the entire prompt every time? It doesn't give any reason why
>>107491721
>bottom right: "we are actively working on getting the model OPEN SOURCED"
>>107491746
>you: "bottom right is Alibaba taking it for an API"
>>107491699
Vibe coding just barely started working recently with gemini 3. In my opinion everything else shits the bed if you go more than 8k tokens. Maybe like 15k with claude.
Are they doing another v3 distill for their vibe model? kek
I can't see that happening locally. I'll believe it when I see it.
It's very frustrating that all those western companies think the local model use case is tool calls and coding.
I only saw from chinese accounts acknowledgement of creative writing.
>>107491746
they literally say "by releasing this checkpoint", how can it be an API and a checkpoint release at the same time?
>>107491772
there is no contradiction in what is said, companies are large and departments can disagree on things, see MS and wizard
>>107491759
>working on getting
it means it currently isn't planned and they have to fight with Alibaba to get it open sourced
>>107491748
either your context is longer than the context window or there is something that changes in your prompt every time. for example if you are using sillytavern and your sysprompt mentions {{char}} and you have a group chat then {{char}} is different for both characters so it reprocesses it
>>107491773
Didn't the Wizard team move back to China? Who are they working for now?
>>107491721
And so it begins. Not surprising.
Who else is even here locally besides alibaba? They have their fingers in everything. They own almost half of zai/moonshot too.
Rate my system prompt.
>>107491772
saas slop companies refer to a new endpoint as a release. you can argue semantics but tech bros just don't give a shit
>>107491794
OAI and their SOTA local oss models :^)
>>107491594
>still no new bake
Brutal. They've been murdered.
>>107491802
>By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
Does that scream API only to you?
>>107491699
It says they're open-weight but I don't see the models on HF yet.
https://mistral.ai/news/devstral-2-vibe-cli
>>107491667
>They see BFL do it and get praised
BFL got clowned hard since the release of Z-image lol
>>107491825
Meta released Llama 3.3 8B for community-driven fine-tuning and custom development.
>>107491844
geg
>>107491838
https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512
https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
When ready.
Ok we're back, bye boyz >>107491813
>>107491859
how do you go so quickly from 3 posts/minute to sometimes not having threads at all?
>>107491859
holy shit I didn't realize comfyui got that bad
>>107491858
So Mistral Medium 3 size is 123B?
>>107490157
Stop talking about that. All it takes is some big youtuber showing it in a video and the prices would 3x.
>>107491911
Then you can switch to making money by hosting.
>>107491838
Based, you guys rag on mistral for getting cucked and being a shit chatbot now, but devstral was the best local coding bot on the market, will try the new one once there's goofs on HF and report back
>>107491611
sdcpp is ass compared to comfy. You can't even offload models with that junk.
>>107491838
>>107491858
>123B
Vibecoder bros, we are so back.
>>107491838
dense?
>>107491925
oh wow I wish people would write PRs instead of complaining about it
>>107491961
yes
>Devstral 2 is a 123B-parameter dense transformer supporting a 256K context window.
>>107491992
I take back everything bad I ever said about the French
>>107491901
It sounds like the old Large is the new Medium and even the latest Devstral is codenamed devstral-medium-2512; in that case they had it on Groq hardware all along (80 TB/s bandwidth), no MoE.
>>107491838
>6x the size
>4 percentage points of difference
This is so stupid, they are wasting so much computational power for diminishing returns. I pray every day to God and LeCun that transformers will die.
>>107491858
Are those the fp8 version tho?
>>107491784
Can't I like cut off the older parts of the context?
>>107491992
We are so back. Finally something good to run.
>>107492098
not without reprocessing everything after the place you cut at
>>107491858
Now they work. By the way, this is probably not simply the old Mistral Large with a coding finetune, as that one has a 32k vocabulary, while this one has a 128k one, so it's technically 125B parameters now. The rest of the configuration seems the same.
>>107491858
finally no moeshit
>Ministral3ForCausalLM
which arch was this again? the ds fork?
>>107492184
Is it just what happens when you get a long chat? Is there nothing I can do?
>>107492185
So a new pretrain huh. That raises some red flags desu. I'd be wary of their claimed performance vs real world. I suspect it might be another flop and do worse than GLM 4.6 despite being claimed to be better. I mean hell the 25B is claiming to be on par lmao.
>>107492219
no, llama but they renamed it in order to "stay flexible in case they change something in the future"
this is literally just benchmaxed large 2
if you run this you are retarded and contrarian against MoE models
>>107492255
>So a new pretrain huh
Or a medium 3 finetune
>>107492268
>contrarian against MoE models
That is called someone who bought more than one 3/4/5090.
>>107492301
and has been sitting on them seething for the entire past year?
>>107492288
I mean Medium 3 is new, relatively speaking. I'm just saying it's a new pretrain compared to the old Medium which we would've normally thought became Devstral 2 if we just looked at the 123 number.
https://github.com/mistralai/mistral-vibe
>>107492308
Yes and he also missed buying ddr5 when it was 4 times cheaper.
>>107492234
you want to keep your context shorter than the model's context. the simplest solution would be deleting old messages, ideally replacing them with a summary. there is a summary function in sillytavern
>>107492219
>finally no moeshit
Are you aware that in a dense model most neurons in a layer don't have a significant impact on the activation? You are basically doing millions of operations just for them to zero each other.
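Rough arithmetic to put numbers on this (assuming the usual ~2 FLOPs per active weight per generated token, attention overhead ignored): a 123B dense model spends ~246 GFLOPs per token, while a 106B-A12B MoE only touches its ~12B routed weights, ~24 GFLOPs per token. Memory traffic scales the same way, which is why the MoE survives CPU offload and the dense model doesn't.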
>>107492338
>Mistral Vibe
>Gemini CLI
>Qwen Code
>Claude Code
>Codex
I miss aider. Why do we need half a dozen copilot CLIs that all do the same thing?
>>107492234
What you typically do is summarize. Take like half (or whatever) of your chat history and replace it with a summary. Then you avoid reprocessing until your context gets close to filling up again. Then you summarize again, and so on.
>>107492389
>most neurons in a layer don't have a significant impact on the activation?
if this were true, pruning wouldn't be such a joke
>>107492454
It's not that they don't matter, it's just that for a given token many weights don't contribute much. But the "important" weights will vary for each token you generate.
>>107492472
>It sounds like the old Large is the new Medium
It has been right in front of our eyes the whole time:
https://mistral.ai/news/mistral-medium-3
>Medium is the new large.
>>107492454
yeah but it's still the best option we have until we start getting bigger moes or dynamic active params catches on
does the new devstral coming out mean that this >>107487529 is real?
>>107492234
>>107492385
>>107492440
Like the other anons said, depending on what you are doing, you can either summarize, or truncate (cut off) the earlier parts if you don't need the earlier history (this is obviously computationally cheaper).
But you don't want to do it for every message, you want to truncate once you reach your context limit down to something like half the context, then you fill it up again to the limit, etc.
That way you only pay for re-processing once per seq_len / 2.
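To make that concrete with numbers: with a 16384-token window truncated down to 8192 whenever it fills, each truncation costs one prefill of ~8k tokens and then buys ~8k tokens of new chat before the next one. Amortized, that's about one reprocessed token per new token, versus reprocessing the entire history on every message if the front of the prompt keeps changing.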
>>107492527
Probably, but he's a fag for not leaking
>>107492454
???
it doesn't mean the neurons are useless, they activate too, just for different stimulation. In MoE we route to the group of neurons that are most likely to activate instead of calculating the neurons that are most likely to zero each other's activations
>>107492559
he's going to leak it for christmas >>107486923 ...hopefully
>>107492593
Yeah. I think I've read that MoE-like specialization is observed even in dense models (ie. many weights contribute almost nothing for a given predicted token). In that case MoE just makes that explicit and tries to skip the computations that don't contribute anything.
>>107492593
you are missing the point that a dense model can have more of the weights contribute for more difficult problems that require broader connections, but a moe is always limited to a predetermined set of active weights
>>107492634
>predetermined set
should have said predetermined number or count
>>107492527
Why would that be 162GB, though? Devstral-2 on HuggingFace is ~128GB.
>>107492634
That's not really what we observe in practice. Neurons in artificial neural networks behave similarly to biological neurons, meaning they tend to specialize themselves. Different problems are solved by particular groups of specialized neurons, it's a REALLY small part of the layer. Allowing more neurons to contribute to the problem solving doesn't make it better, it introduces unnecessary noise. There is a reason why our brains have specialized areas and the artificial neural networks naturally tend to make them themselves
>>107484913
>Really? huh. That's a lot better than I thought. I was expecting like 1-2 t/s.
Yup. RAM speed makes a big difference, so check your setup and make sure you're not running it at a lower frequency than you need to.
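Back-of-envelope for why RAM speed dominates (assuming generation is memory-bandwidth-bound): t/s ≈ effective bandwidth ÷ bytes touched per token. Dual-channel DDR4-3200 gives ~50 GB/s; a ~12B-active MoE at ~4.5 bits/weight touches roughly 7 GB per token, so ~7 t/s if everything sat in RAM, a bit more with the shared layers on GPU — which lines up with the 8 t/s reported above.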
>>107492851
Image + audio + video adapters? The llama 3 image adapter alone for 70b was 20b
UNO reverse card.
https://legal.mistral.ai/ai-governance/models/devstral-2
>No documentation required
>Devstral 2 is designed exclusively to generate and assist with software engineering tasks (exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context). Unlike general-purpose AI models, which can perform a wide variety of tasks, Devstral 2 is specialized in software engineering-related tasks only. As such it does not meet the EU AI Act’s definition of a General-Purpose AI Model (GPAIM), in accordance with the AI Office's official guidelines.
>The EU AI Act only requires technical documentation for General-Purpose AI Models (GPAIMs) or GPAIMs with systemic risks.
>Since Devstral 2 is neither a GPAIM, nor a GPAIM with systemic-risks, these requirements do not apply.
Do I need to set up vLLM to run GLM-4.6V-Flash?
>>107492927
Interesting. Does this also allow them to train on any data they want?
>>107492992
I think so. It works for regular chatting too, although I haven't tried it in depth for that yet.
>it's uncensored and trained on all the nasty shit again
Mistralbros we are so back
>>107493039
try to RP with it and tell us how it works
>>107492927
The absolute madmen destroyed both the EU and the chink MoE meta. We are so back, France saved LLMs from the MoE dark age.
>>107492927
wait, are we actually back?
>>107487529
Please sir. Also, if you happen to have a Ministral 24B lying around... even if the evals aren't good, it's no big deal as long as it can RP well.
>>107493323
not back until we have the goofs
>>107492992
>>107493039
If it looks like a general purpose model, sounds like a general purpose model, and performs similarly to a general purpose model beyond being good at "software engineering tasks", at some point isn't it a general purpose model?
Any model I use seems to love repeating my dialog back to me.
>>107493517
>at some point isn't it a general purpose model?
the geriatric female bureaucrats writing the EU regulations don't know that so who cares?
>>107493534
repeating your dialogue back to you? ugh, I can definitely see how that could be annoying.
>>107493542
Yes and they won't stop if I tell them to!
>>107493546
won't stop if you tell them to? I can definitely see how that could be annoying.
>>107493556
glm pls
>>107493611
>>107493611
>>107493611
>>107493542
Repeating dialogue, you say? Don't think that doesn't make it incredibly annoying, you're absolutely right.
*The air is thick with ozone and something else...*
>>107493622
teee!
>>107491924
Makes sense they pick a niche and work on the dataset, there's no money generating fart-fetish erotica for neets
>>107489635
kys macfag