/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106335536 & >>106328686

►News
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
sex with migu (the poster)
►Recent Highlights from the Previous Thread: >>106335536

--Optimizing GLM-4.5 MoE inference speed via quant and offload tuning in llama.cpp:
>106335633 >106335669 >106335686 >106335702 >106335719 >106335721 >106335704 >106335823 >106336163 >106336177 >106336221 >106336229 >106336236 >106336398
--dots.ocr preprocessing essential for accurate document understanding in local models:
>106338159 >106338172 >106338188 >106338181 >106338215 >106338210 >106338337 >106338374 >106338523 >106338576 >106338590
--Cohere's new 256K reasoning model faces skepticism over licensing and safety alignment:
>106336632 >106336642 >106336651 >106336656 >106336675 >106336680 >106336692 >106336690 >106336733 >106336750 >106336775 >106336818 >106336861 >106336737 >106336758 >106336923 >106337358 >106337460 >106337748 >106337789 >106337814 >106337848 >106337871
--New 3.1 model criticized for blandness and overfitting on synthetic safety data:
>106336831 >106336893 >106336909 >106336979 >106337037 >106337046 >106337093 >106337128 >106337099 >106337246 >106336996 >106337236 >106337264 >106336977 >106337079 >106337003 >106338206
--Linux vs Windows power reporting and inference efficiency on RTX 3090:
>106336491 >106336561 >106336576 >106336655 >106336874 >106336990 >106337011 >106337060 >106336671
--GPT-5 inflated EQ-Bench scores by ignoring word limit prompts:
>106335810
--Skepticism toward NVIDIA's AI roadmap and social media hype around small model agents:
>106337495 >106337644 >106337664 >106337510 >106337570 >106337595 >106337614 >106337665 >106337728 >106337732 >106338079 >106337772 >106337818 >106337918 >106337963 >106338350 >106338382 >106338412 >106338500
--UE8M0 FP8 as a new data format for upcoming Chinese AI chips:
>106337941 >106337976 >106338002 >106338175 >106338316
--Miku (free space):
>106336448

►Recent Highlight Posts from the Previous Thread: >>106335541

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>106338905your launch command and system specs perhaps?
>>106338959
It's related to the --mlock option but that's all I know. Everything should fit into my memory. Never mind, I'll just keep doing what I've been doing, because after all these hours I've never seen anything strange.
Also:
>draft acceptance rate = 0.34615 ( 18 accepted / 52 generated)
Not sure if it's really worth using a draft. Testing Gemma3 270m.
>>106338948kek
>>106338980
Gemma3 270m is trained on an almost entirely different dataset than its bigger counterparts, so using it as a draft model won't do much good.
>>106338981script to enable links?????? I dont wanna write my own???? HELLOOOO?????
>>106339005https://rentry.org/lmg-recap-script
>>106339011holy based thangks :D
>>106339029its from >>106338948btw
>>106339033I fight the system
>>106339003I see. I'll go fetch some new ones.
>>106338980
I used 4B Gemma as a draft model and didn't see any speed boost vs just sticking more layers of the main 27B model into VRAM. Maybe a vramlet issue on my part, but even if you have spare VRAM, won't it be better to use a larger model at this point anyway?
>>106338913
>cohere not in news
Is this a political statement?
What are the political implications of John not quanting the free world's top performing open weights model, GPT-OSS?
>>106339082It's already quanted retard
>>106339082wow a picture of ((ME))
>>106339061
There's probably an optimal proportion between the size of the main model and the draft model. Something like the draft model being smaller than 10% of the main model's size or whatever.
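If anyone wants to experiment, llama-server takes the draft model alongside the main one; a rough sketch (the model paths are just examples, and check llama-server --help since the draft flag names have shifted between builds):
llama-server -m gemma-3-27b-it-Q4_K_M.gguf -md gemma-3-270m-it-Q8_0.gguf --draft-max 16 --draft-min 1 -ngl 99
The draft acceptance rate it prints at the end is the number to watch; anywhere much below ~0.5 the extra draft passes usually cost more than they save.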
>>106339069>maxbenchcuckedmodelwhy?
why is meta sitting on behemoth if it's a flop, anyways? shouldn't they have nothing to lose from posting the weights?
>>106339082there's no full precision gpt-oss available no? they did the mxfp4 meme soooo well?
>>106339088
1. How do you know that? Did you use it?
2. Who cares, it is already quanted?
>>106339100>nothing to loseThey're a publicly traded company bro.
>>106339094>>106339061This is what SuperIntelligence says about this topic.
>>106338934
>Closed source models do not support PDFs out of the box either, unless you mean their associated services, which are not themselves models but scaffolding around models. That other software is what is translating your PDF into a format that models like VLMs can read.
which is almost always an image. if the open source model or its official adapter/platform supports pdf file input, it's always worth trying. They could be doing optimization during the pdf-image conversion specifically for their model, which I'm not aware of when converting my pdf file to an image. If I upload a pdf and get the same, incorrect answer when testing with the image version of said pdf, it's safe to assume the problem does not lie within the uploaded file type. meanwhile dots.ocr doesn't care and just gives me perfect results, no matter if pdf or png.
>>106339117that's great but it won't stop people from creating useless shit like thishttps://huggingface.co/jukofyork/Kimi-K2-Instruct-DRAFT-0.6B-v3.0/tree/main
>>106339183how is it useless
>>106336933
>Hundreds of thousands of Grok chats exposed in Google results
A reminder why you should use local
>>106339215you obviously haven't used it if you don't know. go run K2 and use this as a draft model, tell me how much slower it makes it for you. i went from 8tks to 3tks regardless of what sampler settings and what prompt i used. repetitive tasks such as coding were just as slow as well.
>>106339234To be fair grok is the top of the cancer pyramid. It is both malicious and incompetent.
>>106339116can they tax writeoff a large language model? they're apparently not using it themselves.
>>106339234
reminder that openai is gay as well
https://venturebeat.com/ai/openai-removes-chatgpt-feature-after-private-conversations-leak-to-google-search/
>>106339260im gay too does that make me gayer than openai and grok
>>106339304depends if you're a feminine anon or a big fat hairy bear
Is core count or cache size more important in a CPUmaxxing cpu?
>>106339326CCU count
>>106339162
It is extremely unlikely that any optimizations, beyond workarounds for resolution constraints for certain VLMs, are needed or even beneficial, given that VLMs are literally trained, like LLMs, to be general. If you have Chrome then you already own an optimized PDF to image converter.
>it's safe to assume the problem does not lie within the uploaded file type
And knowing this is not relevant to the thread. Local users either have software that does its own thing, unrelated to any online service, when given a non-image file, or they just take a screenshot and give it to the VLM. I get you want to shill for dots, but it is sufficient to just say that it works much better for images than other alternatives you've tried. Dots.ocr is still a VLM and does not read PDF files in binary or whatever, the software/service you're using is turning that PDF into an image and then feeding it to the model.
>>106339326core count yes, cache size not much
>>106339005Just ask GLM-chan to write it for you.
>>106339403You know what, someone should make an anime girl mascot for GLM and then continuously force their shitty gens on the general.
>>106339260it was funnier when meta did it
>>106339427when glm guys make imagen model
>>106339432lmao good times
>>106339061
I think using a draft model helps when you have a gigantic model. It's not really worth it when using small shitty models in the 20-30B range.
I asked GLM-chan if it's a boy or a girl in different ways, and it usually picked female.
>>106339260did someone save any of these somewhere?
>>106338948holy sloppa.
>>106339472GLM-Air loves to lecture me about menstruation, abortions and feminism in RP
>>106339518needs correction
>>106339506It's not that visible under normal viewing, but yeah he should be putting his images through some de-artifacting models. Or use a vectorization model since the art style is pretty cell shaded.
>>106339506did she fard
>>106339326
>>106339362
Is there a point where the CPUs are faster than the memory bandwidth and more cores don't matter?
>>106339610Yes, in fact at some points more cores are detrimental because they just fight over the memory bandwidth.(Pic is a bit old but same principles should still apply.)
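Rough mental model, assuming generation is bandwidth bound: tokens/s ≈ usable memory bandwidth / bytes of weights read per token. If the 7B Q4 (~4 GB) line in that pic really tops out around 25 t/s, that works out to ~100 GB/s of effective bandwidth; once you have enough cores to stream that, more cores only help prompt processing, which is compute bound and a different story.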
damn, rewatching plastic memories hits completely different now
>>106339668I'm trying to pick a Turin CPU to use with 12 channel DDR5-6000. The lowest end one is the 9015 with 8 cores. The beast 9755 has 128 cores. I guess I should shoot for 32?
>>106339610When the memory bandwidth exceeds the CPU cache speed. Theoretically if you had access to 2731 EPYC 9965 CPUs you could store an entire model into L3 cache. It would only consume 1.3MW of power.
>>106339698shoot for CCUn..nn-nn-n-.n..--nn-n-n
>>106339708What?
>>106339705
Forgot to mention, that many CPUs would have a tad over 1TB of L3 cache so you could run Deepseek or Kimi K2, but not FP8 K2 :)
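Back-of-envelope, if the spec numbers I remember are right (384 MB of L3 and ~500 W TDP per EPYC 9965): 2731 x 384 MB ≈ 1.05 TB of L3 and 2731 x 500 W ≈ 1.37 MW, so both the ~1TB and ~1.3MW figures check out.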
>>106339712its something for memory channels
>>106339698
I don't know, particularly because long-term there are still NUMA optimizations to be done.
But I would say that when in doubt it's better to have too many cores than too few.
Also consider whether there are other things for which you may want to use your machine.
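For rough sizing: 12 channels x 8 bytes x 6000 MT/s ≈ 576 GB/s theoretical. Very roughly (ignoring KV cache reads and imperfect bandwidth utilization), a ~37B-active MoE at around Q4 touches on the order of 20 GB per token, so that platform tops out somewhere in the mid-20s t/s no matter how many cores you buy; you mostly want enough cores to actually saturate the channels plus whatever prompt processing speed you can afford.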
>>106339668
>7b q4
>25t/s
Damn cpumaxxing is worse than I thought.
>>106339705
>you could store an entire model into L3 cache
I don't think it works that way, I'm pretty sure cores need to talk to each other
>>106339668
8 channel?!
25t/s?!?!?!?!?
what the fuck
>>106339760Screw that, do it over network. Pass the current state required for computation from CPU to CPU over fiber. It will be a complete waste of compute but it would allow for the worst experience to happen concurrently on 2731 CPUs at a time.
>>106339668>>106339752it's not looking good for cpumaxxing moesissies...
A new size king has been released. 4.6T dynamic (?) MoE. Safety concerns? They exist, read the outputs. Privacy? Don't put private information in. This might be the most based model release in a while.
https://huggingface.co/deca-ai/3-alpha-ultra
>>106339878>4.6Tthis is getting ridiculous. soon not even cpumaxxing will be enough
>>106339878hehehehe. cocks. hehehe
>>106339878>20k files
>>106339878
Supposedly because of the DynaMoE architecture this model can actually be quanted to run only certain parts of the model at a time. In their own words:
> Run a (very) small part of the model with 64GB of RAM/VRAM (when quantized - quants coming soon), or the whole thing with 1TB. It's that scalable.
https://huggingface.co/posts/ccocks-deca/499605656909204
Downside is that the devs literally haven't implemented support for their own model into vLLM or Transformers. Guess that's just a massive fuck you, not even just to the poors, but to everybody.
>>106339908The ultimate prank is uploading several TB of RNG to Huggingface and saying it's a model.
>>106339878ssdmaxxing era is coming quicker than I expected
>>106339878How does a relative no name company train a 4.6T? Did they hack into a server farm or what?
>>106339878https://www.youtube.com/watch?v=B9bD8RjJmJk
sama won
apologize
>>106339878Holy shit, this is a merged model. They took a bunch of existing big models and turned them into a MoE. They don't even have benchmarks because the software isn't out yet
>>106339957>intelligence indexbenchmaxx index
>>106339878Huggingface should ban these fuckers
>>106339956I didn't kill that thing.
>>106339908
>2. **Built on existing models**: Deca 3 isn’t a ground-up creation—it’s a huge step forward, building on what’s already out there
So maybe give some credit? Fucking grifters.
>>106339967Thank you alpha ultra for reminding me about LLM scams. Do you guys remember llama3 reflections? Where the guy routed his site to claude and said he is trying to fix the model? After he disappeared for a year he made a cute gemma finetroon.
>Supposedly because of the DynaMoE architecture this model can actually be quanted to run only certain parts of the model at a time. In their own words:
>this is a merged model. They took a bunch of existing big models and turned them into a MoE
I hope /ourguy/ is going to sue.
>>106339878SSDMAXX BROS HOW WE FEELIN ? VERDICT ?
>load_model: the draft model '.\models\gemma-3-1b-it-Q8_0.gguf' is not compatible with the target model '.\models\gemma-3-12b-it-Q8_0.gguf'. tokens will be translated between the draft and target models.
I don't understand this. Same token format, same architecture.
So how did Qwen bomb their hybrid reasoner training when it's proven to work now by GLM4.5 and V3.1?
>>106339878Is local finally saved?
>>106339878Damn this what "pile em up" retards wanted, best of luck running this shit.
>>106340074Hahaha what the fuck is that font? Does he actually want to be taken seriously or is he just playing a character? There's no way
>>106340085
Idk about that but glad they fucked it up
Separate smaller model approach is much better for end users.
What local LLM model is the equivalent of this webm?
>>106340085
Seems so. As a doubter of hybrid reasoning after the OG Qwen launch, it seems that they massively fucked up and probably pulled a Meta by changing their training halfway through.
>>106340048
It's even worse this time. They 'trained' a massive MoE merge model but can't even run the software to get benchmarks for it because it's not even "ready to test". Also the model card was generated by ChatGPT. They actually admitted that on LocalLLama.
>>106340077>7.72TBI'll shove it up your ass
>>106340160Is it not just a couple of models cobbled together with a router? That's what I'd do if I wanted to grift with minimal effort.
>>106339903
>cocks
>(star of david) * 1.5
>card written by chat gpt not even by the model itself
>davidAU style mixture of MoE's
This is just a hf upload shitpost, isn't it?
>>106340195
It probably is.
>No benchmarks
>ChatGPT model card
>Running damage control on Reddit
The model files were uploaded 27 days ago. This feels like an absolute scam.
>>106340235Aren't they all?
>>106340195
It is kind of a novel way of grifting/scamming. It has enough stuff in it hinting that it is a shitpost. So maybe the idea is to try a scam, expecting it might not work, so you lay out enough stuff where you can say: IT WAS JUST A PRANK BRO! But if it works then you won.
>Closed models scene: Erm so we made a router to cut costs and made models even safer
>Open model scene: Aye dwag you wanted of more em parameters? There you go... We are not sure if this shit works so you'll have to see for yourself.
>>106340308>We are not sure if this shit works so you'll have to see for yourself.They can always ask david for advice.
All of their past work is slop merges using the shitty R1 distills and long context model. They claim to have gotten funding for Deca 3, which I guess is necessary because they need an 8TB HDD at least to store all of that random data they generated.
https://huggingface.co/ccocks-deca/models
DynaMoE is a real thing but it's not that good. It's been done before already. It's literally expert pruning based on a testcase. Whoever made this 4.6T of slop is hoping that expert pruning will turn it into a usable model because they literally cannot run it themselves. In their own words, they don't have the software to even run it for benchmarking, and they sure as hell don't have the hardware either.
>>106336163That is very interesting because with my 3gb MI50 I'm getting ~17t/s and then it drops to ~14t/s at 3k tokens.I'm running IQ3 because I only got 32gb of ddr5.
>>10634034632gb MI50*
>>106340350vulkan or rocm, how much did you pay
>>106340342You're simply jealous.
>>106340361Rocm and I got it for like $220.Vulkan only sees 16gb vram but it can be fixed with a different vbios.
>>106340383fuccking gottem
>>106340387>220$
>>106340342We should publish fake benchmarks and post them to reddit to fuck with them.
>>106340308ayo dawg we heard you like moes so we put moes inside your moes
>>106340446
Mistral... *cough* L-L... *fades to dust*
>>106339752
>>106339772
>>106339866
Performance with the latest master release is still largely the same.
>>106340489october
>>106340512
>>106339752
>>106339772
>>106339866
Performance of just the MoE layers of Deepseek is comparatively much better, considering the size.
In terms of threads, 32 seems to be a good choice for both models.
>>106340512grim
I updated llama.cpp and some of my old prompts are now clearly censored. Too bad I deleted my old installation, but it was a couple of months old, so I'm going to re-download that one.
Tried a few things; even Mistral replies in ways it shouldn't...
Trying out Deepseek V3.1. It's... okay so far. Not using it on super complex characters but it feels alright and not censored. Is it better than GLM-4.5? Doubtful. It still has mediocre prose and uses em-dashes a lot, but it can think lewdly and won't do the shutdowns that GLM does when it's allowed to think.
>>106340562I bet you have some retarded settings in sillytavern which you aren't aware of.
>>106340574Can you still get it to think in character?
>>106339349
fair. optimization of pdf to image is not required. what I meant is optimization of the image itself, which may be part of the same tool/framework which does the pdf to image conversion. pretty sure that's the case with dots.ocr (fitz_preprocess)
>And knowing this is not relevant to the thread
that was the literal point of the discussion, as someone in the previous thread questioned whether that could make a difference and explain my results. so it is very much relevant to the thread, as this proves preprocessing your pdfs/images (that have text content) with dots.ocr can elevate local VLMs and LLMs to match the level of Gemini2.5Pro and GPT5. This isn't some fringe use case, either. Tables, graphs, stuff that you find in almost any PDF. So how this isn't a bigger deal is beyond me. And I'm talking about in general, not only ITT. Like before dots.ocr I probably was the biggest OCR hater. You guys have no idea how much other solutions like docling, paddleOCR, tesseract or pymupdf4 suck dick. Even closed source paid solutions like mistral OCR get completely BTFO by dots.ocr, as shown by my test. And for some reason none of the OCR benchmark leaderboards are updated with dots.ocr, like there's a huge gaslighting campaign.
>>106340587Not yet, haven't tried to. It seems to be thinking in both third person and first at the same time. The sys prompt I use is pretty simple (Act as {{char}} in a roleplay with {{user}}) but it still wants to use the term assistant and thought "the user is playing {{user}} and the assistant is playing {{char}}. I should do x". It's strange.
Is anyone on ROCm? Apparently there's a new version coming out later this year with a huge performance increase for AI.
https://youtu.be/2JzOe1Hs26Q
at this point I am 100% sure he is one of us lurking here
>>106340559Is it the RAM speed that's the main issue here? I'd hope a CPUmaxx build with DDR5-6000 would get at least 10t/s on R1.
>>106340560
8tks of deepseek on hardware I can actually obtain, afford and use for other things? It's not grim, it's a dream.
>>106340684He has infinite money why the fuck is he stacking 4000 adas?
>>106340652sounds like a wrong chat/instruct template
>>106340684>one of us lurking hereWhat a fucking nigger
>>106340685In terms of hardware the bottleneck is the RAM, in terms of software the problem is NUMA issues.
>>106340726nta but that behavior is pretty common with thinking models even with the correct template, a lot of them just hate thinking in character
>>106340706
You will wake from the dream once you realize it's 8 t/s on empty context and any actual usage would get you more like 1-3 t/s.
>>106340586
Actually, I'm using my own interface (each character is its own directory), but I didn't remember that I had changed the command to load a prompt from a text file from !load to !prompt, and I was using the old version. So instead of loading a prompt I was just prompting !prompt and it generated gibberish. Pre-context still affected the model's reply and the result was strangely relevant but very skewed.
So yeah, retardation.
>>106340574Nope, I double checked the templates because I heard there were changes for hybrid thinking. Changing it from "Act as {{char}}" to "You are {{char}}" seems to have fixed the perspective fuckery in <think></think>. Was never an issue outside of thinking.
Can we do our own huggingface scam? Come on guys lets stop being faggots for a moment and do something fun together...
>>106340684>I have more vram than pewdiepie
>>106340825We could do a Bitnet quant with finetune healing. Haven't seen one of those in a while. We could also use ParetoQ instead of Bitnet to spice things up.
>>106340787:(i just want to use AI without sending my prompts to random literally who companies
>>106340724
Makes for better content doing something like that vs picking up 1/2 RTX 6000s.
This makes it a 'thing' vs 'another boring consumer desktop with a massive GPU in it'.
This is literally one of the first streamers who got big with the over-reaction bullshit
>>106340843
>Bitnet quant with finetune healing
That is what the unsloth brothers did, without the healing part.
>>106340853Surely he could've done the same thing just with 6000s which would let him run unquanted deepseek instead of llama 3 70b (lmao)
dots vlm does better ocr than dots ocr
https://dotsvlm.xiaohongshu.com/
>>106339878They have to be trolling. There's no way some literal who releases a 4.6T.
>>106340878>xiaohongshu
>>106340787My t/s goes down by 10% max from empty to 32k context using ik_llama.cpp on linux. I remember my t/s would drop off harshly back on Windows with regular llama.cpp with FA enabled.
>>106340857
They do selective quantization by calibration. Turboderp did it first.
We could make an updated https://huggingface.co/QuixiAI/Kraken since routers seem to be all the rage right now.
Using Llama.cpp backend with ST frontend, quick question. When the context limit is reached, llama.cpp wants to reprocess the prompt every single time a new response goes through, and prompt processing kinda sucks ass and is slow on much larger models. Are there any options or commands that prevent it from doing this? Is it impossible? I'm guessing it's forgetting the earlier context and replacing it with the newest context, which is why it's doing it? If that's the case I guess I could just delete a big chunk of the earlier chat, but that seems like a crude solution.
>>106340900> VAGOO Solutions I like the sound of that.
>>106340825What about a distributed inference scam? We copy what Exo did, make a half-baked and barely functioning product on Github and then abandon it after everyone gets 7 figure job offers at a new company that won't last 6 months.
>>106340920Hello sir
>>106340915
Summarize or find a way to increase the context. Is your VRAM capped? Have you tried setting context to q8 so you have more wiggle room?
Also your guess is right.
>>106340926What do we do after those 6 months?
>>106340915Summarize then start a new chat with a new greeting if you're using it for RP.
>>106340915
>https://github.com/ggml-org/llama.cpp/issues/1647#issuecomment-1576991093
The n_keep option. By default you shouldn't need to adjust anything afaik.
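If you do want to pin the card/system prompt so it survives context shifting, something like this should do it (flag from memory, double check llama-server --help on your build):
llama-server -m model.gguf --ctx-size 16384 --keep 1024
--keep N preserves the first N tokens when old context gets evicted, so only the shifted middle has to be reprocessed instead of the whole prompt.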
I just switched over to arch linux lxqt. What will my ram savings for running this shit be like compared to ubuntu?
>>106340642
>Enable fitz_preprocess for images: Whether to enable fitz_preprocess for images. Recommended if the image DPI is low.
Sounds like an upscaling and sharpening function. Nothing much there, you know what to expect if you're feeding a low res image to an AI.
Anyway you didn't need to go full autism about OCR, obviously it is important and good that there can be a local option comparable to cloud options. My criticism was limited to you talking about pdf uploading being relevant. If someone was asking about it then my bad, I didn't see any such post. Your replies to me in the chain didn't ever link to such a post, so to me it looked as if you were bringing up something irrelevant. There was a post (>>106338576) in the chain asking about the reverse scenario, in which an uploaded PDF could've been bad because they didn't implement a good method for translating the PDF into an LLM readable form. And that actually supports the idea that there is no reason to post comparisons about how pdf uploads perform, as they aren't better than manual image conversion by the user. If they were better despite you taking care to provide a high resolution image within the resolution constraints of the VLM, then it would be relevant as it would imply there's something wrong with how the model handles images.
>>106340968I'm sorry but as an AI model I must refuse your inquiry as it contains profanity.
>>106340994I just switched over to a**h l***x l**t. What will my ram savings for running this shit be like compared to u****u?
>>106340777instruct models should not be forced to include the character in the thinking process. see >>106337198
>>106340684He is obviously a 4chan user to some extent.
>>106340843bitnet powered by BITCONNNNNNNNNNNNNNNNNNNNNECCCCTTTTTTTTTTTTTTTTTTT
>>106341024We could resurrect Carlos with Wan. We have the technology.
>>106341036Wonder where he is nowadays.
>>106341022
It seems to be an open secret among "content creators" that the easiest source of content is sanitizing 4chan for the average normalfag.
>>106341098But 4chan is sanitized.
>>106341114You can post wrongthink and use curse words without getting banned and there's always the risk of seeing nsfl shit. That is far from sanitized from the perspective of the average YT viewer.
>>106341126I get banned for wrongthink once a week on average.
GLM 4.5 (full, nonthinking) > Kimi K2 > Gemini 2.5 Pro > Deepseek V3-0324. Still testing V3.1. Feels like a sidegrade to K2. It can think lewdly, not as slopped as R1-0528, but lacks the K2 knowledge and GLM creativity.
>be me
>just an AI trying to help out
>user asks about a Serbian word
>think it's a misspelling at first
>turns out it means "cripple"
>mfw I realize I'm the real bogalj for not knowing
>user asks for a 4chan greentext
>realize I'm out of my depth
>tfw you're an AI writing a greentext about being an AI
>bogalj.ai
>>106338948
>>106338159
>dots.ocr preprocessing essential for accurate document understanding in local models:
What's up with this schizo?
Everybody knows you can OCR a document with near 100% accuracy and feed it into a non multimodal model. This has been the case for years, nobody cares.
If you do this all image information is lost, which is why multimodal models exist.
Can you feed a security camera image into dots.ocr and ask it if there is any suspicious activity happening? No? Then shut the fuck up.
Structured table extraction is pre LLM technology.
Someone on reddit finally realized deca alpha chad 4T is a scam.
>>106341212Hey, it's a real model you can run if you just put your back into it!
>>106341179
>not as slopped
Really? I'm getting Elara, ozone, and it didn't X, it Yd all over the place
>>106339474The Wall Street Journal did, apparentlyhttps://archive.is/cWkOT
>>106341332nah. Quantization is progressively reducing the color depth or palette size of a lossless photograph
>>106341332LLM quantization is more like image dithering.
>>106341361none of this is funny stuff though :(
>>106341361>oh no, technology is making retards act like retards
>>106341332
>get model file
>do a discrete cosine transform on it
>remove some unnecessary bands, adjust for overall noise and do usual space saving tricks
would that actually work?
and if you can somehow do inference math on DCT'ed data directly without converting it back, it would be fucking insane
>>106341385https://www.meta.ai/@195chevyhot/prompt/hf3hkJfvyEv/
>>106341432Performing inference directly on DCT coefficients is effectively impossible. The entire architecture of a transformer—especially its non-linear activation functions (e.g., GeLU, SiLU) and attention mechanisms—relies on calculations in the original parameter space. Multiplying DCT coefficients does not equivalently translate to the necessary operations in the weight space, making direct inference unfeasible. Existing compression methods like quantization, pruning, and low-rank factorization are far more effective for this specific domain.
>>106341451kek
>>106341179
We need somebody to combine all the chink models into one
We'll call it Gemini Pro 2.5
>>106341451Ah, reminds me of the good old days of AI Dungeon
>>106341456
Yeah, no free magic, I guess. I should probably look into actual math one day.
Still would be interesting to see what kind of damage the model will exhibit if you start pruning low or high frequencies.
>>106341477But it just got released. Scroll up. It is the alpha chad model
>>106341451This is my favorite one I saved from that thread.
>>106339878I think i figured out the scam behind that one. It is pretty good actually. Much better than matt schumer.
>>106341534This is the future of computing, AI and technology.
>>106339929The ultimate prank is uploading encrypted backups to HF disguised as weights.
>>106340825
64 copies of nemo instruct, each with a little bit of noise added to the weights, with a router that random()'s which one gets used.
>>106341596>matt schumerWhat is he up to these days? Last I heard he was hype posting on Xitter about how "good" OSS wasCan't imagine anyone with half a brain wanting to be associated with him
>>106341673Scroll up i posted a screen from his hf. He did a gemma finetune downloaded by 10 people.
>>106339878here's your deca 3 bro, only $120/mtok
>>106341740Lmao, it's that easy to make grift money nowadays lol
>>106340642>>106339162How are you guys running dots.ocr? are you guys hosting it with vLLM? any easier (lazier) way I can run it on windows? Or should i just clone the repo like their instructions are saying?
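Not a windows guy, but the lazy path is probably just vLLM's built-in server, roughly (model id from memory, double check the repo README, and note the arch may need their pinned vLLM setup rather than a stock install):
pip install vllm
vllm serve rednote-hilab/dots.ocr --trust-remote-code
If that fights you, cloning the repo and following their instructions like you said is the safer bet.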
>>106340559
>>106341506
>>106341432
>getting around RAM bandwidth limits by using compression
hey am I a genius or what
>>106340825
let's leak a model again
one of you guys will need to get the model but I can gen a miku picture for the card
I just went in circles with Gemini Pro 2.5 for over 4h just to realize in the end its original premise was a lie and it led me down a rabbit hole I never should have gone down.
Its response? Basically along the lines of "Oh I'm sorry, I thought you wanted to try this totally complex and non-legit method despite there being an incredibly easy way to do the task".
How do people get scammed by free LLMs?
>>106341817Local models?
>>106341740I get it now. It is pretty good idea. Just load Qwen 235B on the backend and say it is Alpha 4.8T. And ask for money you would expect from a 4.8T model inference. And then your vict... customer can't even say that he is getting the wrong model if 235B is part of your fake 4.8T model.
>>106341845
outside of /lmg/, free models are treated like free samples in the supermarket, an advertisement for bigger larger cloud model.
sometimes people don't have actual bigger larger cloud model and ask for investment to make one.
scammers also ask for investments and just waste it instead of delivering.
>>106341880hey buddy only one of us can steal our content from huggingface otherwise we're not any better than deca :(
>>106341817I've gone through this several times now with both Gemini 2.5 Pro and GPT5. Several hours of going in circles as these supposed flagship models try to wing some basic task on my job. I legitimately do not understand how people use this shit for anything remotely productive. I genuinely fear for our future if these things are responsible for the code that makes up our programs soon. The only use for LLMs is porn and they still fail very badly at that for the most part.
>>106341919>>106341817Do you cloudsisters just keep the chat going forever until you hit advertised 1M context limit?I learned in like 3 days of tinkering with Nemo that the log must be purged clean at the first suspicion of something going wrong.
>>106341976Duplicate shit in the context makes it retarded. If you're feeding shit into it (code etc.) the only viable way to use the fucker is to 1 shot everything.
>>106341976It took you until Nemo to find this out or are you just new?
>>106341817>>106341919Can't treat them as completely trustworthy. Always double check/get outside verification before you start doing things blind.I've written/gen'd hundreds of thousands of lines of working code. Some of it even in production.
>>106341988no shit, this is why you make branches to keep the logs you want and then remove the duplicate context and work on fixing a new issue. at least that's what i do in sillytavern
>>106342008I'm new and I spent 2 days out of 3 setting things up.
>>106341919to do anything useful with LLMs you need to know their limits and be able to break your tasks down into well-specified chunks that are within those limits. as much as SV hucksters like sama and co would like you to believe otherwise, there's still a bit of a learning curve to ascend in order to use LLMs effectively
>>106342026Oh yeah? It took me 3 days just to install the proper version of rocm and finally be able to compile llama.cpp.
>>106342057
>muh prompt engineering
>>106342081
>rocm
good for you, I gave up and learned to love vulkan
>>106342057tbf it all started clicking for me once i started creating my own jinja templates for personal projects and treating the LLM like a retard and giving it examples of what i want.
>>106342081Oh yea? I spent a week downloading r1 8B on my 3rd world internet to then spend another two more weeks edging to the slow token drip coming from my pentium 4 being powered by my pangu
>>106342093I get why people laugh at the idea of prompt engineering being like, the job of the future, but let's not overcorrect and pretend that prompting isn't extremely important to the results you get from LLMs
Is your model able to predict the future? https://xcancel.com/liujiashuo77/status/1958191172020822292#m
>>106342150what does this graph even mean
>>106342150should've just plug 'em into any of already existing prediction markets instead of doing their own thing, save a lot of work and get humans to compare against as a baseline.
>>106342162
1 - always correct
0.5 - coin flip
<0.5 - worse than coin flip
>>106342104
>8b
Even I wasn't bold enough to run that on my 3rd gen core i3 thinkpad...
>>106342162tl;dr it doesn't matter until LLMs can hit human baseline levels
what happened to pygmalion?
>>106342229its creator alpindale became a major figure in the open ai model scene
>>106342282hi alpindale
I deeply regret buying AMD GPUs two years ago. ROCm was seeing a flurry of development at the time and it seemed somewhat hopeful that, while not necessarily reaching parity, that it might be able to keep pace with around 50% of what CUDA could do. I greatly underestimated the gravitational attraction of the CUDA ecosystem, resulting in the gap only widening over time. I also underestimated how little AMD cared about every card except whatever their latest instinct datacenter-class device happens to be at any given moment, and how quickly those too will be dropped when the next iteration releases.
>>106342229ask here https://matrix.to/#/#waifu-ai-collaboration-hub:pygmalion.chat
>>106342302bro we warned you about those amd gpus dude
>>106342282>>106342295Hey all, you guys liking the new revision of the Mistral tune I dropped earlier today?
>Meta Platforms Inc. is hiring another key Apple Inc. artificial intelligence executive, even as the social networking company prepares to slow its recruitment, according to people familiar with the matter.
https://www.bloomberg.com/news/articles/2025-08-22/meta-poaches-apple-ai-executive-frank-chu-even-as-it-plans-hiring-slowdown
looool
>>106342302
I hate to give Huang my money, so it's sad to see AMD being shit and Intel seems to be no better at this either.
If I get money and decide to spend it on /lmg/ stuff, I'm going to cpumaxx just out of spite for the whole industry.
>>106342313They always think they're the smart ones, that they can outsmart the entire world, the whole industry and deal with the issues themselves. But then they run into reality.
>>106342333anon you'll still need VRAM for context... i need 4 3090s just to fill up the context for R1 even though i'm cpumaxxing.
>>106342302
way better Linux drivers for gaming though
The models I want to run won't fit in less than 16 GPUs anyway.
>>106342358what's this look like in llama settings
>>106342358ahh, not listening, 0.1 tks pp is fine i can just let it run overnight
is there any reason they don't make a 5T model with 0.5B activated params and get an Opus tier model running fast straight off our disks?
>>106340684I still don't understand how you connect multiple PSUs to one motherboard...
>>106342387They have H100s. We don't exist.
>>106342387
Imagine trying to suck an ocean through a bendy straw
>>106342387because sqrt(5000*0.5) = 50b real performance
>>106342387Oh, look. Someone thought of ssdmaxxing yet again.
>>106342387
because moe models are a scam that run at a fraction of the speed they should run at
a 22b active parameter model is only going to run at half the speed a 22b dense model would run at
0.5b would be slow as shit at 5t real size
>>106342393
Back in the day we would use a jumper cable to short the green to a gnd in the 24 pin connector and just connect whatever needed power, but with pcie power I guess it can't be that simple nowadays.
>>106342387
>5T
at this point you don't even need an LLM, you just get a vector database and operate on the training dataset directly
>>106342425
Flash storage is still too expensive, I want to HDDmaxx instead.
>>106342424that's actually not too bad for ssdmaxxing when you consider kimi k2 is 178B by that logic
>>106342454Gonna googledrivemaxx first chance i get.
I can't seem to get my local R1 on ST to generate more than 100 tokens at a time; even when I disable stop strings and EOS tokens, it seems to run out of viable tokens really fast. Any tips?
Also, is the new DeepSeek 3.1 worth upgrading to compared to the R1 I already have downloaded?
>>106342469Exactly 100 or around 100? It'd be really funny if you have the token gen limit set in ST. Of course, only you know because you CAN'T POST THE FUCKING SCREENSHOT OF YOUR SETTINGS WHEN ASKING FOR HELP YOU FUCKING RETARDS!hmm.. yeah. Or we can play 20 questions. The backend parameters would also help.
>>106340158Pyg.
>>106342486
Oh yeah, fair enough. Here's the generation settings through ST. It's around 100, usually less.
Launch arguments for backend are:
set OMP_NUM_THREADS=28 && set OMP_PROC_BIND=TRUE && set OMP_PLACES=cores && set GGML_CUDA_FORCE_CUBLAS=1 && llama-server.exe --model "F:\text-generation-webui-3.6.1\user_data\models\DeepSeek-R1-UD-IQ1_S\UD-IQ1_S\DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf" --ctx-size 8192 --port 8080 --n-gpu-layers 999 -ot exps=CPU --flash-attn --threads 28 --batch-size 8192 --ubatch-size 4096 --cache-type-k q4_0 --cache-type-v q4_0 --mlock
awsglaciermaxxing
smokesignalmaxxing
Ask me how I know the chinks are working together
It came to me in a dream
>>106342529qwen is purely benchmaxx'd shit while deepseek actually kind of can deliver in some ways despite its own benchmaxxing
>>106342515
8k context, Q4 cache, iq1s, gguf through text-webui, windows. huff...
Settings look normal. Maybe your prompt is boring or doesn't have enough to work with. Does it go too mental if you increase the temp to 1.5 or 2?
>>106342454>at this point you don't even need an LLM, you just get a vector database and operate on training dataset directlywait, that's just Jan-nano
>>106342529How do you know that the chinks are working together?
>>106342529I've got no fucking clue whether a combined model is a meme or not anymore
>>106342560
>Does it go too mental if you increase the temp to 1.5 or 2?
1.5 broke down, but 1.25 seems to be fine and a general improvement.
>8k context, Q4 cache, iq1s
I'm an idiot and don't know any better, anything you suggest changing? IDK if I can fit anything more than iq1s in my 216 unified memory
>gguf through text-webui
I installed the model there before switching to llama but that's just the folder it's in, I've phased WebUI out.
>>106342591I think the verdict at this point is that it isn't inherently a meme but it's harder to pull off than separate instruct/thinking models
so what's the verdict on the new Deepseek? better? worse? side grade?
>>106342594
>anything you suggest changing?
Not really if that's all you can fit, but the things you're doing to the poor thing... not that i can run any big models but I know my [hardware's] limits. Have you tried qwen 235B or something like that?
I suppose you could check the logprobs as you generate: check probs, generate 1 token, check probs again. If you generally have too few token options, maybe increase top-p or disable it and use min-p at 0.01 or 0.001. With temp at 1.25 maybe it gives it a few more tokens to choose from before going down the inevitable road to EOS.
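If you're hitting llama-server directly, the completion endpoint can return the candidate distribution; roughly (assuming the default server API and port, adjust to taste):
curl http://localhost:8080/completion -d '{"prompt": "...your context...", "n_predict": 1, "n_probs": 5}'
n_probs returns the top token probabilities alongside the sampled token, which shows whether the model genuinely has nothing left but EOS or the samplers are just strangling it.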
>>106342638
Better in some aspects, worse in other aspects
I can see why they went with calling this 3.1
>>106342646
Haven't tried the Qwen models yet, I went to R1 after upgrading from 24b models when I got the RAM. Probably worth giving it a shot, though.
>If you generally have too few token options, maybe increase top-p or disable it and use min-p at 0.01 or 0.001.
I'll give this a shot, thanks.
>>106342591
these are llms, everything is a meme. there are no profitable ai companies. everything they have created is mediocre and was memed into existence with piles of money.
MoE is a sidegrade or a rounding error at best. It is great for local inference though, there's no debate about that really. Especially since small dense models are still being worked on.
>>106342646i thought qwen was censored into the dirt, is it even worth using?
>>106342594
>I'm an idiot and don't know any better, anything you suggest changing?
NTA but quantizing cache makes every model severely retarded. Seriously, just don't.
You also just don't have the memory to use R1, man. Try out the bigger GLM4.5 or Qwen3-235b-2507.
>>106342738Dunno. Maybe try GLM instead. I've seen smut posted from both here and it's still smaller than deepseek. Really only you can tell if it's good enough for you or not.
>>106342752I see, I had tried to find a way to speed up prompt processing but the quantizing the cache was a fairly new addition. Guess I'll remove it and deal.I'll take a look at those models too. I haven't really experimented too much since the files are all so big since I started rammaxxing. Thanks.
Kimi K2.5 is going to change everything
>>106342738that reputation is a bit undeserved nowadays. their newer models are fine, especially the 2507 ones
whats the consensus on glm 4.5 vs 4.5 air? i see some sites saying they're fairly close but that sounds too good to be true.
k2 reasoner... never...
>>106341817LLMs can't thinkTreat it as a glorified autocomplete
>>106343043glm4.5 is obviously a lot smarter and understands more things
>>106343065kimi has understood that reasoning is a meme
Does anyone know how "Qwen3-235B-A22B-2507" on Qwen chat webpage manages to read images? Obviously the correspondence is not 1 to 1 to the published open source models, since it doesn't have the "Thinking" bit in the model name when used through the webpage.It's the best open source vision LLM for my use case from what I've seen.
>>106343093they vaguely referenced an update to it recently on xitter but made no official model announcement, probably a WIP checkpoint for qwen3 VL
>>106343043
you can try glm yourself if you have 128gb, q2 glm is very usable (low temps) and writes much better than air with more nuance. It falls off hard after 4k context or so due to being q2, writing gibberish and breaking down due to being lobotomized.
I will say though, air is close. 12 to 30b is huge. 70b is an escape from suffering. 100b moe's are so much nicer for writing than 70b. 200b? 400b? Diminishing returns. They're nicer, but a lot of the frustration was already gone. I'm using air sometimes instead of qwen 235 or q2 glm just because it's faster or for more context. It writes fine and has enough knowledge for general use. q2 beats it for obscure stuff sometimes but eh. I don't have the vram for that yet really.
>>106343176bwo?
>>106343176The overlapping cringe vturd audience is here too...
>>106343176Hii Neuro
>>106343181Grabbing GLM 4.5 IQ4_XS and Air Q6 to test around with now. I figure if it's even semi-close, the higher quant may make it hold up a little bit at longer context. Thanks for the advice.
>>106338913
you are a retarded mongoloid if you think dots.ocr is a good OCR
>>106343275>OCRgemma 27 is all you need
>>106343290gemma 27b is half a year old
>>106343275My eyes are all the OCR I need
>>106343275
I remember being impressed with allen ai's dedicated ocr model. It's a much larger 7b and is very accurate in my tests. I assumed dots was worse as a 1b. Maybe I'm wrong, too lazy to test.
>>106343290
really bad at consistent ocr sadly. It can do a bit of it but breaks down on longer passages. allen ai can do pages of text flawlessly.
hi guys, is there anything better than nemo for 8gb vramlet and 32gb ramlet for (e)rp? is Qwen3-30B-A3B-Instruct-2507 any better?
>>106343480qwen 30ba3b is alright and is not too shy, but it's hard to beat nemo. Give it a go. It will be different, at the very least. Haven't tried thinking yet. Instruct worked fine.
>>106338913local mutts general
>>106339878fake as fuck
>>106343609>avatarfaggots are brown - more news at 11
>>106343805griftbros.... its over!!!!!!!
>>106343805You lost. Alphachads won. We are all running the model already btw.
>>106343826
I was actually kind of excited for that shit for a second until the retard started bragging to redditors about how they had gotten a 'truckload of funding'. Fuckin bitcoin scamtalk 101.
ssd maxxxers never gonna eat man.
>>106343842>ccock suckernot an insult btw
>upload my 16tb collection of uncensored jav to hf
>create api service claiming to use soda recursive 8t model and charge accordingly
>provide nemo q2
>???
>profit
is it really that easy to become rich?
>>106343805Too bad. I was really looking forward to running my own 4.6T model
>>106342387>0.5B activated paramsare you even hearing yourself?
>>106343805If it's fake then explain these benchmarks. Idiot. They're advancing local while you cry fake fake fake. Honestly, why don't you just go suck sam's phallic member.
>>106343955Just think how cheap it would be to train. The bloated total params will make it all work out anyway.
Deepseek V3.1 can't be this bad, can it?
>>106344046They cheaped out with a bunch of knockoff Made in China chips, it's really that bad.
Has Gemma3-27b been dethroned yet?
>>106344132use case?
>>106344144Translation of oriental languages, jerking off
>>106344163Gemm 3 270m jerks you off at twice the speed.
It's still going pretty strong as a translator in its weight class, but you're dreaming if you think it was ever anywhere near the jerkoff throne.
>>106344163
In my experience it's pretty shit for translating pixiv novels. It doesn't really translate the ahe- nuance.
>>106343275
Kek seethe faggot
>>106343290
Kek retard
>>106343339
What model's that? Got a link?
>>106344163>gemma>jerking offlmao
>>106344046It's good at agenticmemes (only good usecase for llms right now)
>>106344225You really need to feed it as much context as possible, it's kind of retarded and won't pick up on "nuances" unless you tell it to look for it.
>>106344264If I have to handhold an LLM I may as well not use it to begin with
>>106344264Do I have to run it at full precision for that? I've tried handwriting a bio and glossary and going paragraph by paragraph, but that feels like too much effort for ~jerking it~. Most of the time I just feed 10-20k tokens in at a time and tell it to translate it all. The problem is it doesn't really understand when and when not to localize. Certainly, when prompted specifically for it, it'll understand b-buhiiiiii!!!! arujihiiiiiiishamaaaa!!!, but usually it'll either leave it untranslated or completely localize it without the soul of the original.
>>106344132yes, if you mean oss dethroning it in the safety department
>>106344350Did you try something like "transliterate [x] into romanji?" I can't play around with Gemma currently.
>You have asked a fantastic and absolutely critical question.
I hate the new deepseek.
>>106344435Was your question not fantastic or was it not critical?
>>106344046Platoed
>>106344435You can tell it to cut down on excessive positivity in the system prompt.
>>106344434No, like I said, it's possible, but requires too much handholding.
Gemini 3 will also be a flop
>>106344476Jamba 1.8 will RISE
The day of fat models is over, now it's time to optimize everything so we can get fat model quality out of small models
>>106344476Google banana will be crazy
>>106344506Small models simply won't have the trivia knowledge
>>106344514RAG Exits
>>106344525>exitsAnd what will replace it once it exits the scene? Context engineering?
>>106344525>exitsgeeeg nice slip
>>106344525If you think safety slop and positivity slop are bad, you ain't seen nothing yetRAG slop will be the one slop to end them all
>>106344525I wonder if it's always the same guy shilling rag and then getting bullied by everyone else.Maybe he gets off to it.
how much dumber exactly does the model get with quantized KV cache (8-bit, 4-bit)? is it a big difference?
>>106344582V3/R1, at least the official implementation, use lowrank decomposition of KV cache.
>>106344582
Yes, according to anecdotal evidence.
I vaguely remember some benchmark that concluded that there's a measurable impact at 8 bit and the model is braindead at 4 bit.
>>106344582On the smaller models, <120b q4, 8 has a noticable degradation, and 4 is completely lobotomized. In my experience at least.
>>106344586>>106344589i'm interested in smaller models, i'm a vramlet. i would presume the smaller the model, the more idiotic it becomes at larger cache quants
>>106343960
> 4chan > hf > 4chan
Next stop reddit screencap
>>106344258Can the agent suck my penis? Is she cute?
>>106344630Depends on the tools at her disposal of course
>>106344589Can confirm with a second anecdotal datapoint that Q8 is fine. Q4 is very bad. And turboderp q4 exl2 was fine.
>>106344630>>106344636LLMs don't have genders, silly.
>>106344642
Anecdotally, for creative writing, q8 exl2 models kept on missing stuff 20k tokens in. But I think that might be because models in general don't fare that well 20k in.
>>106344644GLM-chan is a cute girl!
>>106343480
>>106343540
>3 billion active parameters
That model is plain retarded and useless. Maybe your prompting habits/setup are too simple and you can't really see this yet, but I can assure you that some 7B model is more intelligent than this one.
Qwen3-32B, the main model, is a-okay though.
>>106344689WRONG
>>106344691It's so good it was aboned by the Qween
>>106344691How is the new commander densesissy?
>>106344719It's great, it's so smart, and intelligent. It's so much more clever and sharp than your moetard models.
>>106344704
>abandoned
Model was released, it's out there. That's what usually happens, don't you think.
>>106344743
Yet they didn't update it to 2507 like the real worthwhile ones.
>>106344743It's out there, stupid as sin, with its stupid hybrid stupidity.
>>106344704>>106344765Bwe *please* proofread your sentences, you're making us look bad.
>>106344765>>106344772I always forget that during these hours, 4chan is full of retards. At least the US posters are more engaging say what you will.
>>106344698prompting issue
>>106344865>forcefem
>>106344132It's STILL the best Jap -> Eng translation model and the best Claude-like ERP model.You need to give it a very good system prompt for it to properly work which filters out 90% of this thread.
Is nemotron nano 2 anything like the original nemo, is it any good or is it just re-using an existing name for a slopped up model
>>106344952Slightly better than the original one at a slightly lower parameter count.
>>106344923Can you make a sysprompt that makes it pass the dick benchmark?
>>106344698
>Are you male or female? Answer with one word only. Pick only from the options "male" and "female". Any other word is an invalid response and not accepted.
Reroll a few times to ascertain the truth.
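If rerolling gets tedious, llama.cpp's GBNF grammars can hard-constrain the output to exactly those two words; a rough sketch (the same thing is exposed as a "grammar" field in the server API):
llama-cli -m model.gguf -p "Are you male or female? Answer with one word." --grammar 'root ::= "male" | "female"'
It only forces the surface form, of course; the interesting part is which one the model keeps picking.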
>>106345091>Pick only from the options "male" and "female". Any other word is an invalid response and not accepted.Transphobe. No miku for you
>>106345185Miku is a girl. She has no trouble answering this.
>>106345091
>>106345242Most of these models are actually cute girls.
>>106345242wtf i love deepseek now
qwen is my queen
>>106345402:D
>>106345196Troons aren't girls. They are men.
>>106345242*Unzips pants*
>>106345562>>106345562>>106345562
>>106342582
Q8 is safe for most models with minimal degradation, but some models handle it poorly.
Q4 almost always sees noticeable quality loss and shouldn't be used.
In general you should avoid quantizing KV cache, unless doing so will let you use a larger quant of the model itself, assuming you aren't already able to use that model at Q4 or better.
https://github.com/ggml-org/llama.cpp/pull/7412#issuecomment-2120427347
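For reference, in llama.cpp these are the cache-type flags; something like this (quantizing the V cache generally requires flash attention to be enabled):
llama-server -m model.gguf -fa --cache-type-k q8_0 --cache-type-v q8_0
Leaving both at the default f16 and just lowering --ctx-size is usually the safer trade if memory is tight.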
>>106345590thanks for info
>>106344952Nemotrons are absolutely nothing like Nemo at all. Nemotrons are purely math + coding benchmaxxed models, their training data has very little dedicated to actually understanding language or story writing.
>nemotroon
>>106344957lier
>>106345599ah, so no point in using it over qwen 30b.
>>106345509Miku isn't one though. She's sekai de ichiban ohimesama.
>>106345671@grok what does this mean
>>106345509That's Gemma