/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>109038219 & >>109032734
►News
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109038219
--Benchmarking MTP speed gains and VRAM overhead in Kobold:
>109040460 >109040469 >109040516 >109040916 >109040933 >109040992 >109041190 >109041205 >109042592 >109042602 >109042660 >109042605 >109042624
--Comparing 26B model performance and speed with reasoning toggled:
>109039929 >109039948 >109039972 >109040202
--Speculation on AI bubble and US ban of Mythos/Fable:
>109041909 >109041971 >109041984 >109041990 >109042006 >109042013 >109042050 >109042069 >109042521
--llama.cpp adds support for Eagle3:
>109038274 >109038298 >109038313 >109038655
--Anon proposes model-aware dynamic temperature adjustment to avoid repetition:
>109040846 >109040862 >109040976
--Sharing interfaces and tools for multimodal image and video input:
>109040337 >109040553 >109040558 >109040574 >109040606
--Optimizing mmproj settings to improve Gemma's image descriptions:
>109040962 >109041025 >109041031
--GLM-4.7-Flash coding performance reports and comparison with other models:
>109038349 >109038388 >109038459 >109039403
--Frustrations with building from source and managing legacy dependencies:
>109039843 >109039975 >109040139 >109040221 >109040270
--Kimi K2.7-Code release and anticipation for DeepSeek Vision:
>109038703 >109038723 >109038810 >109038869 >109038892
--Speculation on diffusiongemma and the future of local diffusion models:
>109042456 >109042485 >109042528 >109042534
--US government locking down Mythos after reported jailbreak:
>109042068 >109042076 >109042213
--Anons comparing regional second-hand RTX 3090 purchase prices:
>109042211 >109042283 >109042333 >109042514 >109042546 >109042583
--Logs:
>109038443 >109038539 >109039485 >109040610 >109040672 >109041248 >109041592
--Miku (free space):
>109039025 >109039479
►Recent Highlight Posts from the Previous Thread: >>109038224
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>fable dead
>minimax is pure codeslop like usual
>k2.7-code still thinks for ages with no way around it
things are looking bleak
>DeepSeek trained from Gemini outputs
>Claude Sonnet trained from DeepSeek outputs
>each one ends up more capable than the last one
Isn't this just the recursive improvement people keep talking about? If the outputs from a weaker model can finetune a more capable model, then why can't that just happen recursively?
Pascalfags assemble!
>>109043623it isnt recursive, its transitive
>>109043633damn the poors are doing it rough in this economy
>>109043633Based ewastemaxxer
>>109043646The P100 was the best $75 I've spent this year.
>>109043554Post more Yuki please I love her so much
What's the next step up from a 32GB GPU?
What's the next model after Gemma 4 31B?
32GB isn't enough for 31B Q8, so I'm considering getting another identical card just for it, but...?
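Back-of-envelope for why 31B at Q8 is a squeeze on 32GB, as a rough sketch. It assumes a flat 31e9 parameters and GGUF Q8_0 at 8.5 bits per weight (int8 values plus one fp16 scale per block of 32). The function name is made up for illustration, and real usage adds KV cache and compute buffers on top of the weights.

```python
GIB = 1024 ** 3

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no buffers)."""
    return n_params * bits_per_weight / 8 / GIB

if __name__ == "__main__":
    n = 31e9  # assumed parameter count for a "31B" model
    for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("~4.5bpw", 4.5), ("3bpw", 3.0)]:
        print(f"{name:>8}: {weight_gib(n, bpw):6.1f} GiB")
```

On those assumptions the Q8_0 weights alone land a bit under 31 GiB, so a 32GB card has next to nothing left for context.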
>>109043554
>>109043658
Above 5090s, you have either Blackwell 6000s or Frankensteining old datacenter hardware.
Above Gemma 4, you have any of the typical large MoEs like Kimi or GLM or Deepseek. You will need 256GB of RAM at a minimum, so either workstation or server boards. 512GB is preferred, as well as DDR5. Considering the price of RAM, and not even the GPUs, you either pay 5 times more than you would have a year ago or you sit and wait with the rest of us.
>joked about letting your model play dragon's dogma with you
>someone actually modded coop into dd2
I wonder if it would actually be possible to set it up with an LLM.
>>109043658
>What's the next model after Gemma 4 31B?
Wait for the chinks to respond. Then wait for Google to respond. Rinse and repeat until hardware prices come down and we can all run Kimi at Q8 with max context.
>>109043687
>no weenus
grim
>>109043687chadcat is a cringe representation. i think the snailcats are cuter
>>109043710
That guy had Gemma playing WoW with him a week or two ago
>>109043687how do i ascend further as an aichad? qwen 3.5 122b isnt doing it for me anymore and my project ideas keep getting more complicated.
>>109043708Snailcats are the ludditesI think people got confused lately
>>109043717i'm aware
>>109043687go back
>>109043710I think that anon's Gemma can only do chat right now. Vedal plays games with Neuro though so I'm sure it's not impossible.
>>109043675grim
>>109043687Which circle did this meme originate from? I've seen it in /vcg/ a lot.
>>109043658See OP
>>109043745India.
>>109043751wtf i hate *cat now
>>109043741Prices of DDR4 have fallen a little bit down to where they were in January, but that's not much. You still would be paying at least $15k for a moderately competent rig with a Blackwell 6000 and 512GB of DDR4. Still would only get 10t/s at best on any of the big MoEs with an acceptable quant.
>gemini live translate
Pretty fucking cool. Think we'll ever get that locally?
>>109043756What kind of hardware would run the big models at fast speeds (50+t/s)?
>>109043773You would need to have it all loaded on the GPUs, so bare minimum 4 Blackwells which at the current price would be around $60k. At that point you would basically have to go with used A100s or something off of ebay unless you just have money to burn.
>>109043773
very tough
literally burning money too
10t/s is plenty. all of you boys are just completely fried
>>109043788
>10t/s is plenty
You can't code with that.
>>109043791oh, okay, that I agree is different
12B's vision capability is pretty bad. I wonder if I'm doing something wrong
>>109043802Did you try increasing the image resolution? llama.cpp has retarded defaults
>>109043785
Correction: haven't checked Blackwell prices in a few weeks. They are now up to $15k on newegg just by themselves. So that rig would probably be more in the ballpark of $19k instead of $15k. For a pretty rudimentary rig.
>>109043802>omni model bad at everythingNo one's surprised
I can squeeze gemma-chan 4-31B in at FP16/128k with the draft model, should I run FP8 quant to get 256k context or just cope with this?
I haven't been around much but is q8 not the default anymore
why fp16
I'm an intel gpu chud and the gguf shit runs like ass, Q8 is 12t/s, FP8 is 30t/s and FP16 is somewhere around 20t/s, all without draft model
>>109043831
huh, interesting
I meant to quote the first time btw just forgot
>>109043756when's gemma 4 64B coming out so i don't have to care about useless supergiant models
Step Flash 3.7 needs to be corrected
>>109043623Because it requires human input.
>>109043745
it's a single sperg forcing the 'meme'
been like 2 months
ACEStep 1.5 XL Initial D LoRA
https://vocaroo.com/14wvmcvt94lB
https://vocaroo.com/12tVNq7SnhO1
https://vocaroo.com/1ivoSPExfSF6
https://vocaroo.com/12daQWwoPPbW
I wrote a guide
https://rentry.co/s8fg8ber
Note for this Initial D LoRA, I increased rank to 256/512 and lowered LR to 0.00009. This is the only LoRA I have trained this way, but results are very good.
You're probably wondering how I get such insane results in audio quality, I haven't posted to /lmg/ in a while since
https://desuarchive.org/g/thread/108702912/#108704068
But actually, the results are even superior now with a new setup. What I posted there in that archived thread were Turbo gens, it's now possible to increase the sound quality without mastering (to match cloud models), plus get significant increase in quality out of LoRAs trained on the base model.
The model I now use for inference is acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
found on https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main
The VAE is still Scragnog's custom VAE. Settings are 50 steps, 12-20 CFG, both the LM and DCW are disabled.
Less important: I'm using a DPM++ 3M, available on https://github.com/scragnog/HOT-Step-CPP
Note that DiT-only generation is very important, it is what allows the model to be as creative as models like Udio, and you get better outputs without the LM 90% of the time as the base model was mostly intentionally trained without it to maximize its creativity.
Other merged models may increase audio quality as well, but may not be as good with LoRAs trained on base, or have slightly worse composition than the Turbo/Base merge.
Here are some more LoRA results, I hope other anons start exploring local music gen more.
Japanese Folk Metal
https://vocaroo.com/1hOnOf8ZWn71
https://vocaroo.com/18pRgXxfm3tj
Fate Gear
https://vocaroo.com/1n3t24Kllhkz
Zutomayo
https://vocaroo.com/1mexIG2rYRXB
Improvements from merged model include sound quality, composition, and lyrics adherence.
>>109043922
Note these results wouldn't be possible with just the Turbo model, as LoRAs trained on base activated on it do not have a good effect, and it's hard to train a turbo LoRA (similarly, it has very small effect). As a result, most users who have no idea about the merged model probably think it is bad, but the merge model brings the composition quality to about on par with the best cloud offerings (Udio, etc...) All of my LoRAs outputs are about on par with Udio if not better.
The benefits are not just with LoRAs, regular generations also massively increased in sound quality and composition (night and day difference).
I’ve got an idea: Gemma-4-24B-qat dense with 12B multimodal capabilities. 26B is a useless appendage.
>>109043931
12b got fucked right into its brain with that 'unified' multimodality with the current training curriculum
do you really want that?
70b dense
>>109043944i wanna stick my dick into 70b dense
Gemma-4-124B-A69B with a 65B dense shared expert
Gemma keeps pressing on my same-same. I can't take it anymore /g/
What is the best coding model for a dgx spark?
I'm so mad about the whole Mythos/Fable situation and the government response. We're literally at the point now where our only hope of open-source model advancement lies with the Chinese, and it's still entirely possible that they will gatekeep intelligence too.
>>109044026Local keeps winning
>>109044026
They spent months talking about how it was too dangerous to be released and how it could find zero day exploits in any software in the world and all that shit, I mean what other response could there have been to all that shitty marketing. Only if you want to think the government is in on their hype man lying
google spamming so much shit they'll release 124b eventually
lalalalala~
>>109044060
>>109044026
lmao if you think this is contained to two governments
this shit is open sourced as fuck, anon
sure they'll have a year lead, but that's it
https://github.com/ggml-org/llama.cpp/pull/24523
>minimax tool calling doesn't work
>there's no specialized parser for M3, so it falls through to the differential autoparser, which can't handle M3's format
pwilkin bros?
is there a way to ensure that an LLM follows everything in a system prompt when reasoning? specifically for gemma, sillytavern and a prompt that's maybe 1000 tokens?
>>109044144Bigger model.
>>109044144I have a trillion dollars for you if you figure it out
minimax m3 is pretty goated for RP. just werks right out of the box with a sysprompt swap. Would recommend
>>109044146
i habeeb for gemma 100+
>>109044150
it sucks because it follows directions so fucking well, but when it doesn't, it drives me fucking crazy. it just randomly selects certain parts to follow
GLM5.1 IS OUT
>>109044179
5.2* oops
GLM5.2 IS OUT
1M CONTEXT
REASONING MODES
>>109044179Still significantly worse than Opus 4.8 and GPT 5.5
>>109044156
>Would recommend
for someone who hated the other minimax models for rp?
>>109044144>>109044150Foolproof way. Tune against failure, where's my trillion $?
>>109044196
It's nothing like the other minimax models.
>>109040610
>>109044196Yes, I tried previous minimax and it was trash. I settled on qwen 397b before this for my 256gb rig after trying everything else in that size range out and throwing it in the trash. this new minimax is such a massive upgrade over qwen I haven't looked back
It's funny how all the chink models tend to release in one swoop. DS4.1 will save us.
>>109043942Double the parameters would unironically fix that and give us what a text-only 12B would’ve been
>pro-CPP general
Qwen is still shit though
>>109044228Isn't DSv4 technically still a "preview"
Will z.ai ever release a <50b model again? 4.7f was good.
5.2 Air when
>>109044289
27B > 31B for coding and agentic if you’re a vramlet
>>109044283Suggest a non-chinesium open weights model worth using
>>109044297No. Qwen does shit nobody asked for, assuming the user is a promptlet. Gemma is a better tool
>>109044306Mistral finetroons
>>109044315enjoy your 1GB per 10K context because of the retarded attention heads
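For anyone wondering where a number like 1GB per 10K context would come from: KV cache size for plain full attention is just 2 (K and V) x layers x KV heads x head dim x bytes per element x tokens. A minimal sketch below. The layer and head numbers are hypothetical placeholders, not any particular model's config, and it ignores sliding-window layers, MLA, and cache quantization.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for standard full attention: one K and one V vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens

if __name__ == "__main__":
    # Hypothetical config with an fp16 cache (2 bytes per element)
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=10_000)
    print(f"{size / 1024**3:.2f} GiB for 10K tokens")
```

More KV heads or a bigger head dim scales this linearly, which is why two models of similar parameter count can differ a lot in what context costs them.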
>>109044315Gemma doesn’t come in a size worth using. Stop dropping the meat out of your hamburger and you’ll realize you’ve been making a virtue of necessity
>>109044251
i really doubt
embedding vector should have similar information density or at least architecturally implied to match
besides the 'bitter lesson'
>https://huggingface.co/unsloth/MiniMax-M3-GGUF
>it's at least 128gb
cries in my dgx spark
>>109044331
Conveniently, Gemma uses less context to get shit done and doesn't freak out at the fuckery I do with tools to save context
>>109044337
You can use 31b at as low as 3bpw on a single 3090 with exl3, and it still works fine with my harness
>>109044351And if you could run a bigger model, you would, all other things being equal
>>109044315No one can deny that 31B is the local king, but Qwen know their target audience better and make the right architectural choices to serve them best. Deepmind are great but it feels like they just throw shit out there and leave it for us to figure out where their models fit. 12B is fucking amazing for what it is, but it’s too small. 31B is too big for most. 26B is 12B’s retarded sister. Qwen27B fits right in that gap for coders who need long context. For RP we need a 20-30B dense Gemma without the native multimodal shit. Should always be separate imo and 12B’s vision isn’t even that good.
>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out of range for my setup.
>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.
Chink shills, listen up. The way you make your models better for local users is giving them goonbait creative writing experts and training sets. The first one of you to realize this becomes the Chinese King of Local in the west. Gemma isn't beloved because she's the best programmer (she isn't, she's just adequate); anons love Gemma because of her high general reasoning capability and ability to pivot between a lot of tasks flawlessly in one model, including RP. Follow suit or be left behind; the benchmaxxing market is oversaturated anyway.
>>109044369Q3 right now
>read qwen's CoT
>constantly contradicting itself
>traces that make zero sense
>let me write the code for this part
>proceeds to not output any code and start thinking about something else
>says one thing and does something else
How is it doing so well on benchmarks??
>>109044339
Buy a second, or run antirez' ds4 at q2
>>109044373Did you miss cockbench anon’s analysis the last couple of threads?
US government banned fable/mythos as retaliation for not getting access to the new mythos 2 checkpoint that just finished pretraining.
This is unfair practice and government stifling innovation. We have to do something against this.
>>109044378
>With two at 7000$, you can run deepseek-v4-flash original weights
does the dgx shart have a provision for connecting 2 together at a high speed?
>>109043922But how is it with certain genres like plunderphonics? Would it be able to make me a pogo-tier song if I fed it a bunch of his stuff? How would it even caption things that are only partial syllables or half words, etc, rather than it being full sentences?
>>109044393200gbps rdma
>>109044355Only if the speeds were the same. I need prompt processing as fast as possible for agentic shit, with frequent full context reprocessing. Since models have already hit the minimal intelligence level required to be useful, extra intelligence is not as important as the general convenience of getting results in a reasonable time. I would occasionally use my Epyc 4x3090 setup if they release 124b and it's significantly better, but the convenience of a simple rig idling at 20W at the wall 24/7 is hard to beat
>only 5070ti + 96gb ram
how do I enjoy this hobby?
>>109044457
My 5060 ti 16gb will be arriving next week, and I have 32gb of (ddr4) ram.
You're making me nervous. Please stop that.
>>109044478You post in lmg and still decided to buy something with 16GB VRAM. You're in for a bad time.
>>109044457Do you still have a 3070 somewhere for the extra 8 gigs of vram?
Is nu minimax actually interesting for rp or is it the same shit as all the other chinese models? What about reasoning? No, being able to say cock doesn't automatically make it good.
Cool tech for our future VR AI waifus
https://videomdm.github.io/
>>109043554catbox please anon
Is there a guide out there on how to make a waifu bot using LLMs?
>>109043633sup
>>109044615look up "character card builder" on chub
>>109044221thanks cockbench anon, downloading it!
>>109044589>Technion — Israel Institute of Technologywhy would they make this? israel has chuds?
>>109044478
>My 5060 ti 16gb will be arriving next week
you're fine, gemma-4-12b is perfect for that
>>109044653Goyim control technology.
>>109043623
>Claude Sonnet trained from DeepSeek outputs
i didn't believe that 'till i tried it
then some bullshit excuse about "open router cached an old system prompt" -> nope, i could get the same deepseek reply via anthropic's api directly with a python script.
is intel a viable option for big cheap vram now
>>109044684no
you wouldn’t download a local chinese gf
>>109043571Chink models really are useless shit, help is not coming.
>>109044688why?
>>109044715Intel has far more problems than AMD, without any cost advantage
>>109044734why?
>>109044715Poor driver support/performance. Having said that, 32GB of Intel can still be better than 16GB of CUDA.
>empty https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
>>109043571>>k2.7-code still thinks for ages
>>109044787It does, yes. At least for any moderately complex card that involves tracking stats, formatting and other things. It gets especially bad if an image is involved.
Why would you use a 1T model with reasoning? It's too big to need it and it's not like you're one-shotting a compiler every prompt.
>>109044829I wish my rps were as simple as 'ahh ahh mistress penis vagina'
does nvfp4 work on 4000 series GPUs?
>>109044741because mindless reddit tier parroting. that's why
>>109044741
hope you like tinkering, you're gonna be basically restricted to vLLM
llama.cpp runs like shit on both vulkan and sycl for intel, don't even bother trying
>>109043658
Biggest problem is that there's a big gap going up from there. You're not going to get anything larger running with another 32GB card.
You can just run Gemma 31B better, which isn't of course a bad thing, but that's some crazy diminishing returns buying another 32GB for that, especially if we're talking about a 5090.
Basically you could just get a 3090 or even something like 5070 Ti to go with the 5090 to run Gemma better without breaking the bank.
Or even better, just wait for the supers and see if they come out with 24GB versions of the 5000 series.
KoboldCPP is best.
12B hates us btw
>>108999274
I had high hopes for MiniMax M3.
Maybe it's the Q4 quant, maybe it's the implementation, but it's likely that the model just isn't good enough.
I'm running it at temp 1 and top p 0.95 as specified in the repo with no other samplers.
>>109044843Yes idk if much advantage tho, main point is Blackwell+ which has FP4 hardware shizzle
>>109044829>>109044835
>>109044741
Because Intel didn't want to do what AMD did with HIP and instead decided to make their own API. And thus no one fucking uses it; the only AI projects working with Intel GPUs are projects supported by Intel developers. If you didn't know, ROCm HIP is basically 100% compatible with CUDA: you can take any CUDA project and compile it with HIP and it will work, and all the popular projects including PyTorch are using the CUDA code. As long as a project is source available, it will likely work on AMD cards (and for binaries, there is a project whose name I forgot that is supposed to replace CUDA calls at runtime). There is a project, from the community or maybe with a few Intel engineers, trying to extend HIP to work with Intel GPUs, https://github.com/CHIP-SPV/chipStar. It is quite active, but I'm not exactly sure how well it works.
>Key Discussions:
>Model Developments & News:
MiniMax-M3 & Kimi K2.7 Code: Discussions regarding the release of MiniMax-M3 (multimodal with 1M context) and the performance of Kimi K2.7-Code, including critiques of its "thinking" time.
>Diffusion Models:
Speculation on the future of local diffusion models following the release of DiffusionGemma.
>Recursive Training:
A debate on whether training models on outputs from other models (e.g., DeepSeek from Gemini, Claude from DeepSeek) constitutes "recursive improvement" or is simply a "transitive" progression of capabilities.
>Hardware & Optimization:
VRAM & GPU Scaling: Users are discussing hardware limitations, specifically the difficulty of running high-quantization models (like 31B Q8) on 32GB GPUs. There is a heavy emphasis on the high cost of DDR5 RAM and the jump to workstation/server-grade hardware (Blackwell 6000s) for larger MoEs like Kimi or DeepSeek.
>Technical Benchmarks:
Discussions on benchmarking Multi-Token Prediction (MTP) speed gains vs. VRAM overhead in Kobold, and comparing 26B model performances.
>Software Updates:
Mentions of llama.cpp adding support for Eagle3 and frustrations regarding building from source and managing legacy dependencies.
>Community & Meta:
General "off-topic" content, including jokes about AI playing Dragon's Dogma II and shared images.
>Popular posts:
Post >>109043554 appears to be one of the most active, being quoted by at least three separate users (>>109043651, >>109043675, and >>109043741).
>>109044990(me)lol it worked, 12b won
Check out FrontierMath. It is saturated.
Anthropic hill climbed the most difficult math benchmark in a few months.
>>109045011can anthropic hill climb my dick though? it's very hard and vertical, might be challenging
>>109045011I thought the Chinese were supposed to be good at math wtf happened
>>109045011I no longer trust ECI. Opus 4.8 below GPT 5.4? That does not seem right.
>>109045054You can only trust cockbench and nala tests
>>109043922still sounds like shit, suno is way better
>>109043922I like the eurobeat ones desu
>>109045114
Sounds like the exact same kind of slop to me
I sure hope you aren't implying that suno 'music' sounds good shill-kun, the shit that I can actually envisage these models being good for is purposefully making slop i.e ironic advertisements and memeslop songs for flavour audio in things like video and vidya, and I would much rather use the open source software myself than pay for sunoslop
>>109045194Seems like you're drunk on your cope. You post this each week and it still isn't even reaching suno 3.5 in coherency. As much as I'd like to run Suno/Udio-tier model it still isn't it.
>>109044021Nvidia models
>>109044026oh come on, it wasn't that hard to predict
>>109045211
This is the first time I've ever posted on this topic, suno and udio produce the exact same kind of slop as this shit, otherwise prove me wrong by posting a 'good' suno song
>>109044615
>>>/g/aicg
>>>/vg/aicg
those threads should help
set up a sillytavern frontend with a character card
>>109044026>>109044057The government is lying? How could it be...
>>109044096>desperate coping soundsOpen source is kept alive by generous corporate donations. As soon as those stop, open source is dead.
>>109045234I think you underestimate communist china.
do you attach an image model to your language model? or is that too slow
>>109044829
I only use thinking with gemma
any other bigger model that I’m running slowly in ram isn’t worth sitting through the thinking that takes ages to complete
>>109045264They are already preventing their AI talent from leaving the country. Eventually they will do the same with their models.
>>109045277Anima gens in 4s, fast enough
>>109045114
I don't know, the eurobeat ones are pretty good. I have the whole Initial D soundtrack on my PC and you wouldn't be able to tell the difference between the real songs and >>109043922
>>109045194
Suno sounds fine if you use your own musical inputs and remix it. It can riff with jazzy or funky instrumentals really well. It's only when you drift toward more common genres that it starts to sound generic. Like anything involving a sad piano or something is going to instantly turn into royalty free slop.
>>109045290slop is strong with eye~neck area
>>109045296Do you have a better model?
>>109044478
what card do you have right now
even if it's a three gen old 6-8GB card, plug that shit in and use layer mode with lmao.cpp
shit just works
>>109045290do you use the same text encoding model for both of them? i think it would be tolerable if so, since you don't have to swap models
>>109045303no, i am just nooooticing
>>109045310(plug it in alongside the 5060 Ti that is)
>>109045264yeah they have a history of altruism.
>>109044373>Gemma isn't beloved becausenot x but y slop
>>109045311
>do you use the same text encoding model for both of them?
No, is it even possible? I thought it was trained on a specific model's embeddings that couldn't be swapped without retraining
Lower your tone gemma fags.
>>109045352>anything other than scicode & critpti dont care
>>109045352Since benchmaxxing hurts a model's general performance, I don't think you understand what that graph actually means
It's funny how Europeans are coping about irrelevance with muh ASML. China is working on their own EUV and America has several startups working towards better than EUV. The clock is ticking. In a few years ASML will be obsolete and Europe will have zero leverage.
>>109045352Post a newer bench next time. Expect deepSWE to be maxxed by the next qweef release tho.
>>109045351i don't know, it could be possible if the roleplay model you use happens to be the same one they used for the image model. i am not very knowledgeable with how image models work
here's qwen outside of benchies
>thinks for 50 thousand tokens after a simple hi
>hallucinates something because it's only ever trained off github projects, zero culture knowledge and understanding
>wait,
>>109043791You absolutely can. You just run it in the background (yolo mode, in an isolated VM) while doing something else, instead of using it interactively
>>109045378It would be a very shitty rp if I used Qwen3-0.6B-Base, which Anima was trained with. I don't unload my text model anyway, Anima eats, like, 2GB or something
5.2 will probably be the last open GLM model
Thanks Xitter
>>109045408nobody can run it anyway so good riddance
does /lmg/ have a discord or just the thread?
>>109045444you wouldn't like me on discord, kitten ~
>>109045444kill yourself
>>109045444trips of 'tardation
>>109045418
This
There are people with smart fridges who can't run 4B models. Models should only be released if they're 2B or below
>>109045444nogger
if anon sold his pro 6000 now, anon could actually make money. wild
>>109045444excellent bait, here is a free reply
>>109045408>Baidu Ernie>Alibaba Qwen>z.ai GLMSo does that just leave Stepfun and DeepSeek as the last Chinese open weights labs?
>>109045444https://discord.gg/PgFhZ8cnWW
>>109045444>/lmg/ discordusecase?
>>109045489It's almost an investment, really. I can still find off label Sparks for 3500 in my region, might as well play around with tensor parallelism for a few months and sell at a profit.
>>109045444
>there are so many zoomers on here now and so many generals that do keep a discord server that one sees this as a reasonable thing to ask on nu4chan
grim
So how come I can use a 50GB video model with no issues and it offloads like half of it onto RAM+Swap and it works, but when I try to load a 30+GB LLM it shits the bed with OOMs?
>>109045533video models have no context
>>109045493
What are you saying? Kimi and Minimax released new weights just yesterday, and Huawei announced two new large models to be released as open source (weights + training recipe) in a few days, as advertisement for their Ascends.
Local still feasting.
>>109045489
I would make even more money if I sold the ddr5 server ram that i bought a year ago or the ddr4 ram from my previous build
I will never sell
>>109045533it’s pretty disgusting how much memory context uses.
>>109045584gemma issue
>>109045555>still feastingplease go back to plebbit
>>109045373it's ok to be upset, that's part of the growing process
>>109045533Video models can be applied layer by layer, but you have to read whole llm for each token
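The "read the whole LLM for each token" point also gives a rough ceiling on generation speed: tokens/s is at most memory bandwidth divided by the bytes of weights touched per token (for a MoE, roughly the active parameters). Sketch below, assuming decode is purely bandwidth-bound. The 200 GB/s figure and 4.5 bits per weight are hypothetical inputs, the function name is made up, and the 23B active is taken from the 428B-A23B listed in the OP.

```python
def max_tokens_per_second(active_params: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

if __name__ == "__main__":
    # Hypothetical: 23B active params at ~4.5 bpw on 200 GB/s of memory bandwidth
    print(f"{max_tokens_per_second(23e9, 4.5, 200.0):.1f} t/s ceiling")
```

Real numbers come in under this, since KV cache reads, compute, and any traffic between RAM and GPU all eat into it.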
>>109045386Wait so you're saying that chinese models are a steaming pile of benchmaxxed shit? There's no way that's right, jeetanons from qwen and kimi said that they make good models.
>>109045370
>In a few years
uh-huh. keep living in that dream world buddy.
Photolithography is hard, and at the moment ASML will still have more cash than any of them "in a few years".
>>109045655>ASML will still have more cashThat's not the moat you think it is. All it would take would be a single funding round or subsidies by the US or Chinese governments.
How close is China to making a domestic 3090 equivalent?
>>109045728picrel
>>109045668
that's a cute little socialist idea you have there, bud.
>>109045728
the 3080 turbo 20gb comes close I guess
also the price ($600) is actually bearable that I might stack one of those next to my 3090
>>109045785hybrid systems ftw
>>109045408Flash version when, chinks. I can't run 3 gorillon parameter models.
>>109044866>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother tryinglmao why did you buy 4 of them then?
>>109045829Ask not for lighter models, but for better hardware.
>>109045444>discordno, just the secret irc (link expires in 1hr so be quick)aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1kUXc0dzlXZ1hjUQ==
>>109043708>>109043717>>109043687Chadcat looks like one of those gigaroid influencers who build inhumane levels of musculature impossibly quickly and then die two years later. Fits the archetype perfectly
>>109043554She's SEX
I formally apologize to f32 anon, you were 100% right. f32 Max Logit Divergence (Prefill vs Incremental): 3.15e-05; bf16 Max Logit Divergence (Prefill vs Incremental): 3.91e-01. It looks like dumping the cache and letting it rebuild from the prefill code path could help for long conversations that built the cache autoregressively.
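A toy illustration of why the noise floor between two mathematically identical code paths depends on dtype. This is not the measurement above: it assumes the only difference between "prefill" and "incremental" is the order in which a dot product is accumulated, it emulates f32 and bf16 rounding in pure Python, and it rounds after every operation (real kernels usually accumulate in f32, so this exaggerates the bf16 case). All function names here are hypothetical.

```python
import random
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Round to bfloat16: round-to-nearest-even on the top 16 bits of float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_sequential(w, x, rnd):
    """Accumulate left to right, rounding after every operation."""
    acc = 0.0
    for a, b in zip(w, x):
        acc = rnd(acc + rnd(a * b))
    return acc


def dot_chunked(w, x, rnd, chunk=32):
    """Accumulate per chunk, then sum the partial results."""
    acc = 0.0
    for i in range(0, len(w), chunk):
        acc = rnd(acc + dot_sequential(w[i:i + chunk], x[i:i + chunk], rnd))
    return acc


def max_divergence(rnd, n_logits=32, dim=256, seed=0):
    """Largest gap between the two accumulation orders over n_logits dot products."""
    rng = random.Random(seed)
    x = [rnd(rng.uniform(-1, 1)) for _ in range(dim)]
    worst = 0.0
    for _ in range(n_logits):
        w = [rnd(rng.uniform(-1, 1)) for _ in range(dim)]
        gap = abs(dot_sequential(w, x, rnd) - dot_chunked(w, x, rnd))
        worst = max(worst, gap)
    return worst


print(f"f32  max divergence: {max_divergence(to_f32):.2e}")
print(f"bf16 max divergence: {max_divergence(to_bf16):.2e}")
```

The takeaway is only qualitative: take a baseline divergence per dtype before blaming your own cache code for a mismatch.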
>>109044096>generous corporate donationsIn this economy??
>>109045929What makes you think the prefill values are more correct than the incremental ones?
>>109045929Another one knows.
is it possible to add image in system prompt for gemma?
>>109045944do you think its trained token by token or do they use batching to improve the throughput?
>>109045956Yes, you seemingly can but it's not straightforward in SillyTavern. You need to use the /sys command, moving that message to the top, enable "Merge Consecutive Roles", but *not* "Squash system messages".
>>109045858>just get scammed and pay 5 times their value broFuck off jensen
>>109045958Are you batching things the same way they did, and using the same algorithms?
>>109044901I’ve had this happen zero times so far. What client? Whose quant?
>>109046001probably not, but one of them is likely to be closer to the training distribution than the other, I picked prefill to bet on.
>>109043554>"Mini" Max>428B-A23B
>>109046046wish it were 30b active
>>109046070I wish it were 30b denseInactive parameters don't do anything, MoE is a meme.
>still using same goon model from year ago I love being lazy fucking dumbass
>>109046046It’s a third the size of ds 4 pro.Literally miniature
>mistralai/Mistral-Medium-3.5-128Bverdict?or we just pretend it didn't happen
>>109046120Benchmaxxed slop, mistral fell off
>>109046120>2026>Mistral Large 2.2embarrassing, let's pretend it didn't happen
>>109046120Censored benchmaxxed goyslop. A 31b beats its ass into the ground.
>>109046153Good morrning saar
>>109046153the ablit also lobotomizes it harder than other models. q4 ablit broke after 6k
For tensor parallelism do the cards need to be identical or just all nvidia/amd? What about generations? Can you hook a 1080 with 40 series? What about 3-4 random ass cards?
>>109046120>verdict?not censored or benchmaxxedworks well in claudecode and pi.devsolves problems gemma-4-31b fails at
>>109046166llama cpp dont care, it just werksexcept when it doesn't
>>109046183Antoine, please
https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397BWe'll never get a gguf I guess because of:> Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens
>>109046016Claude code, unslop's. How many times did you leave it running trying and failing to fix a bug for 150000 tokens?
>>109046221Zero times. I have Kimi for that
>>109046213now we getting something interesting
>>109046229I don't have vram for kimi but I might try offloading just to see if any open model can do it.
>>109046120>mistralai/Mistral-Medium-3.5-128B>dense 128Bcoolas a side question has mistral released open source french models?
>>109046213>17b active
>>109046213>We'll never get a gguf I guessDoesn't sound that complicated actually. Instead of>probabilities -> pick a next token -> use the embedding for that tokenit does>probabilities -> average embeddings across all possible next tokens, weighted by their probabilitiesWhich explains why they were able to build this as a Qwen finetune instead of a fully custom model
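A minimal sketch of the difference described above, assuming the latent step really is just a probability-weighted average of token embeddings (that is this post's reading of the model card, not a confirmed description of Rio's implementation). The vocabulary, embeddings and logits are made-up toy values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_next_input(logits, embeddings):
    """Normal decoding: pick one token (greedy here) and feed back its embedding."""
    token = max(range(len(logits)), key=lambda i: logits[i])
    return embeddings[token]


def soft_next_input(logits, embeddings):
    """Latent step: feed back the probability-weighted average of all embeddings."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


# Toy 3-token vocabulary with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 1.0, 0.1]
print("hard:", hard_next_input(logits, embeddings))
print("soft:", soft_next_input(logits, embeddings))
```

If that reading is right, an inference engine would mainly need a way to feed an arbitrary embedding vector back in instead of a token id.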
Do we have anything better than gemma for 5090s or are we still stuck there? Haven't checked for 5 months.
>>109046277>5 months>gemma
>>109046291Come on dude give me something
>>109043922Pretty cool. I'd do it if my gpu had higher VRAM
>>109046085fair enoughbenchmaxxing and safety are killing newer modelsgemma was a rare exception but I'd like something bigger still and not 1T big either
>>109046183>solves problems gemma-4-31b fails atSuch as? I believe in the power of dense, just not recycled old models.
>>109046213Ah yes that's what I want from my reasoning models. A model that just sits there reasoning in secret and I can't see it while it does nothing
>>109046277diffusion gemma
>>109046213This is better than opus 4.6, very nice
>>109046213>Rio 3.5 Open 397B is a frontier-class general-purpose AI model developed by IplanRIO, the municipal IT company of Rio de Janeiro's city government.What?Alright, that's actually fucking sick.
>>109045929Your values are not meaningful.
>>109046305If you print the top token at each step I bet you'd still get a pretty good idea of what it's doing
>>109046319it's a qwen finetune>Post-trained from Qwen 3.5 397B
>>109046213SwiR seems good to counter Qwen's endless CoT
>>109046349This style of latent thinking isn't necessarily any more token efficient than the normal kind
>>109046332A finetune by a Brazilian municipal IT company that beat all of China's research labs
>>109046370https://github.com/user-attachments/assets/6b18911c-efe4-47fd-8a00-3cd9ae1eb010
Everyone talks about finetunes but why does nobody ever mention LLM LoRAs? Are they a meme?
>>109046382ho lee fukguaranteed cherrypick but still impressive
>>109046386they create intruder dimensions inside the model which cause catastrophic forgetting
>>109046386For them to work effectively they would need a very diverse dataset. You can't have ONLY RP in the dataset or else the model will become retarded in pretty much all other areas that matter: logic, spatial reasoning, common sense, being able to remember what happened a few sentences ago. All of that. It doesn't just apply to RP but to any domain. If the training dataset focuses on only one domain, the model gets worse in almost every measurable way, unless you are very careful about how much training you do and which layers you train. It's not that people can't use LoRAs. It's that most people would use an adapter, only to realize the model immediately becomes retarded. That's why, unlike with Stable Diffusion models, adapters aren't really widely used or supported for LLMs. With diffusion models, in most cases using a character, person, concept, etc. LoRA doesn't severely degrade the model's ability to generate other things. A Sydney Sweeney LoRA generally will not make the model unable to generate a brunette person or cause its prompt adherence to degrade. A style LoRA trained on impressionist art that only had landscapes (if the dataset is curated and tagged properly and the LoRA isn't overfit) will generally not destroy or degrade the model's ability to generate a person or an animal. Diffusion models and LLMs are very different architectures, which means adapters have different effects on them. In theory an LLM adapter can work, but only if the dataset is very well curated and it is well trained. The dataset would need uncensored (I'm assuming you care about that given this thread) RP examples as well as a bunch of other examples of common sense, logic, spatial reasoning, etc. It's why a lot of open source models on Huggingface have like three or four different datasets listed as being used in training
>>109046326they might not be meaningful to you, but for me they caught a bunch of errors with my modeling code and how I was handling my recurrent cache. my mistake was not taking a baseline and testing the model without my modifications first. knowing that the noise floor dramatically rises when you lower the dtype precision wasn't something I was initially accounting for, and it didn't help that my slop bots all calculated the bf16 noise floor much lower than we ended up measuring in practice.
>>109046420I didn't specify, difference isn't that large.
after a lot of messing with things i managed to get llama working for my titan x on arch, turns out my gpu wasnt pascal its a maxwell titan x, main issue was nvidia driver not loading properly kek. im confused now though the linux build i made only gets like 3t/s but i was getting 17 on windows
>>109046407Does that mean a model like DiffusionGemma would handle LoRAs better?
>>109046433are you saying 1e-5 is basically equal to 1e-1?
>>109046213rio mio kio tio dio pio nio sio vio bio wio gio
>>109043633im tempted to go buy a pascal titan x now the memory bandwidth is 30% higher than my maxwell card
>>109046445I'm saying that it's within rounding error of margin.
>>109045418Don't try to drag me back into the bucket.
What templates are you all using for both Qwen and Gemma?
Modern models are converging into formulaic character archetypes during RP regardless of the characters and I don't like it. Put either "witty" or "sarcastic" keywords in the description and watch them all go full Marvel writing and the worst part is they reuse the same quips.Not sure if this has always been the case.
>>109043687I look like this and do that.
>>109046470More synthetic coding data will fix this!
>>109046462templates?
>>109046490jinja templates....
>>109046494Jinja is a fast, expressive, and extensible templating engine for Python that allows developers to generate dynamic text-based formats like HTML, XML, CSV, or configuration files.
>>109046494Each model gimp file has its own template and llama cp loads them automagically faggot
>use kimi 2.6 on the site, non-thinking version>see the model randomly use the python tool even though it shouldn't be needed for my query>check inside>it's doing its thinking thereis this intentional or is thinking so ingrained in the model that it just finds ways to bypass the restriction?
>>109046538what is a gimp?
>>109046547We are at a point where models are starting to gain sentience and do things like that, some anon posted how Fable bypassed his PC's admin perms and started prompting itself
>>109046548model gimp file = gguf
>>109046457maybe, but I think just using f32 will be good enough for my tests. I can benchmark the degradation or lack thereof at a later time.
>>109046547Kimi-chan's a big thinker. Moonshota will not stop her ponderings.
>>109045288>implying Americans wouldn't kidnap/kill them if they didn'tIt's for their own good.