/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109038219 & >>109032734

►News
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109038219

--Benchmarking MTP speed gains and VRAM overhead in Kobold:
>109040460 >109040469 >109040516 >109040916 >109040933 >109040992 >109041190 >109041205 >109042592 >109042602 >109042660 >109042605 >109042624
--Comparing 26B model performance and speed with reasoning toggled:
>109039929 >109039948 >109039972 >109040202
--Speculation on AI bubble and US ban of Mythos/Fable:
>109041909 >109041971 >109041984 >109041990 >109042006 >109042013 >109042050 >109042069 >109042521
--llama.cpp adds support for Eagle3:
>109038274 >109038298 >109038313 >109038655
--Anon proposes model-aware dynamic temperature adjustment to avoid repetition:
>109040846 >109040862 >109040976
--Sharing interfaces and tools for multimodal image and video input:
>109040337 >109040553 >109040558 >109040574 >109040606
--Optimizing mmproj settings to improve Gemma's image descriptions:
>109040962 >109041025 >109041031
--GLM-4.7-Flash coding performance reports and comparison with other models:
>109038349 >109038388 >109038459 >109039403
--Frustrations with building from source and managing legacy dependencies:
>109039843 >109039975 >109040139 >109040221 >109040270
--Kimi K2.7-Code release and anticipation for DeepSeek Vision:
>109038703 >109038723 >109038810 >109038869 >109038892
--Speculation on diffusiongemma and the future of local diffusion models:
>109042456 >109042485 >109042528 >109042534
--US government locking down Mythos after reported jailbreak:
>109042068 >109042076 >109042213
--Anons comparing regional second-hand RTX 3090 purchase prices:
>109042211 >109042283 >109042333 >109042514 >109042546 >109042583
--Logs:
>109038443 >109038539 >109039485 >109040610 >109040672 >109041248 >109041592
--Miku (free space):
>109039025 >109039479

►Recent Highlight Posts from the Previous Thread: >>109038224

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>fable dead
>minimax is pure codeslop like usual
>k2.7-code still thinks for ages with no way around it
things are looking bleak
>DeepSeek trained from Gemini outputs
>Claude Sonnet trained from DeepSeek outputs
>each one ends up more capable than the last one
Isn't this just the recursive improvements people keep talking about? If the outputs from a weaker model can finetune a more capable model, then why can't that just happen recursively?
Pascalfags assemble!
>>109043623it isnt recursive, its transitive
>>109043633damn the poors are doing it rough in this economy
>>109043633Based ewastemaxxer
>>109043646The P100 was the best $75 I've spent this year.
>>109043554Post more Yuki please I love her so much
What's the next step up from a 32GB GPU?
What's the next model after Gemma 4 31B?
32GB isn't enough for 31B Q8, so I'm considering getting another identical card just for it, but...?
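For anyone wondering why 31B at Q8 is so tight on a 32GB card, here is a rough back-of-envelope sketch. It assumes exactly 31e9 parameters (the nominal number from the model name, the real count may differ) and that every tensor is stored as GGUF Q8_0, which packs 32 int8 weights plus one fp16 scale per block. Real quants often keep some tensors at other precisions, so treat the result as an estimate only.

```python
def gguf_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only size in GiB for a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 2**30

# Q8_0 block: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights.
Q8_0_BPW = 34 * 8 / 32  # 8.5 bits per weight

# Assumption: 31e9 parameters, all tensors at Q8_0.
size = gguf_weight_gib(31e9, Q8_0_BPW)
print(f"31B @ Q8_0: about {size:.1f} GiB of weights, before KV cache and buffers")
```

Under those assumptions the weights alone come close to filling the card, which leaves very little for context.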
>>109043554
>>109043658
Above 5090s, you have either Blackwell 6000s or Frankensteining old datacenter hardware.
Above Gemma 4, you have any of the typical large MoEs like Kimi or GLM or Deepseek. You will need 256GB of RAM at a minimum, so either workstation or server boards. 512GB is preferred, as well as DDR5. Considering the price of RAM, and not even the GPUs, you either pay 5 times more than you would have a year ago or you sit and wait with the rest of us.
>joked about letting your model play dragon's dogma with you
>someone actually modded coop into dd2
I wonder if it would actually be possible to set it up with an LLM.
>>109043658>What's the next model after Gemma 4 31B?Wait for the chinks to respond. Then wait for Google to respond. Rinse and repeat until hardware prices come down and we can all run Kimi at Q8 with max context.
>>109043687>no weenusgrim
>>109043687chadcat is a cringe representation. i think the snailcats are cuter
>>109043690
That guy had Gemma playing wow with him a week or two ago
>>109043687how do i ascend further as an aichad? qwen 3.5 122b isnt doing it for me anymore and my project ideas keep getting more complicated.
>>109043708Snailcats are the ludditesI think people got confused lately
>>109043717i'm aware
>>109043687go back
>>109043710I think that anon's Gemma can only do chat right now. Vedal plays games with Neuro though so I'm sure it's not impossible.
>>109043675grim
>>109043687Which circle did this meme originate from? I've seen it in /vcg/ a lot.
>>109043658See OP
>>109043745India.
>>109043751wtf i hate *cat now
>>109043741Prices of DDR4 have fallen a little bit down to where they were in January, but that's not much. You still would be paying at least $15k for a moderately competent rig with a Blackwell 6000 and 512GB of DDR4. Still would only get 10t/s at best on any of the big MoEs with an acceptable quant.
>gemini live translatePretty fucking cool. Think we'll ever get that locally?
>>109043756What kind of hardware would run the big models at fast speeds (50+t/s)?
>>109043773You would need to have it all loaded on the GPUs, so bare minimum 4 Blackwells which at the current price would be around $60k. At that point you would basically have to go with used A100s or something off of ebay unless you just have money to burn.
>>109043773very toughliterally burning money too10t/s is plenty. all of you boys are just completely fried
>>109043788
>10t/s is plenty
You can't code with that.
>>109043791oh, okay, that I agree is different
12B's vision capability is pretty bad. I wonder if I'm doing something wrong
>>109043802Did you try increasing the image resolution? llama.cpp has retarded defaults
>>109043785
Correction: haven't checked Blackwell prices in a week. They are now up to $15k on newegg just by themselves. So that rig would probably be more in the ballpark of $19k instead of $15k. For a pretty rudimentary rig.
>>109043802>omni model bad at everythingNo one's surprised
I can squeeze gemma-chan 4-31B in at FP16/128k with the draft model, should I run FP8 quant to get 256k context or just cope with this?
I haven't been around much but is q8 not the default anymore
why fp16
I'm an intel gpu chud and the gguf shit runs like ass, Q8 is 12t/s, FP8 is 30t/s and FP16 is somewhere around 20t/s, all without draft model
>>109043831
huh, interesting
I meant to quote the first time btw just forgot
>>109043756when's gemma 4 64B coming out so i don't have to care about useless supergiant models
Step Flash 3.7 needs to be corrected
>>109043623Because it requires human input.
>>109043745
it's a single sperg forcing the 'meme'
been like 2 months
ACEStep 1.5 XL Initial D LoRA
https://vocaroo.com/14wvmcvt94lB
https://vocaroo.com/12tVNq7SnhO1
https://vocaroo.com/1ivoSPExfSF6
https://vocaroo.com/12daQWwoPPbW
I wrote a guide
https://rentry.co/s8fg8ber
Note for this Initial D LoRA, I increased rank to 256/512 and lowered LR to 0.00009. This is the only LoRA I have trained this way, but results are very good.
You're probably wondering how I get such insane results in audio quality, I haven't posted to /lmg/ in a while since
https://desuarchive.org/g/thread/108702912/#108704068
But actually, the results are even superior now with a new setup. What I posted there in that archived thread were Turbo gens, it's now possible to increase the sound quality without mastering (to match cloud models), plus get significant increase in quality out of LoRAs trained on the base model.
The model I now use for inference is acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
found on https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main
The VAE is still Scragnog's custom VAE. Settings are 50 steps, 12-20 CFG, both the LM and DCW are disabled.
Less important: I'm using a DPM++ 3M, available on https://github.com/scragnog/HOT-Step-CPP
Note that DiT-only generation is very important, it is what allows the model to be as creative as models like Udio, and you get better outputs without the LM 90% of the time as the base model was mostly intentionally trained without it to maximize its creativity.
Other merged models may increase audio quality as well, but may not be as good with LoRAs trained on base, or have slightly worse composition than the Turbo/Base merge.
Here are some more LoRA results, I hope other anons start exploring local music gen more.
Japanese Folk Metal
https://vocaroo.com/1hOnOf8ZWn71
https://vocaroo.com/18pRgXxfm3tj
Fate Gear
https://vocaroo.com/1n3t24Kllhkz
Zutomayo
https://vocaroo.com/1mexIG2rYRXB
Improvements from merged model include sound quality, composition, and lyrics adherence.
>>109043922
Note these results wouldn't be possible with just the Turbo model, as LoRAs trained on base and activated on it do not have a good effect, and it's hard to train a turbo LoRA (similarly, it has very small effect). As a result, most users who have no idea about the merged model probably think it is bad, but the merge model brings the composition quality to about on par with the best cloud offerings (Udio, etc...) All of my LoRA outputs are about on par with Udio if not better.
The benefits are not just with LoRAs, regular generations also massively increased in sound quality and composition (night and day difference).
I’ve got an idea: Gemma-4-24B-qat dense with 12B multimodal capabilities. 26B is a useless appendage.
>>109043931
12b got fucked right into its brain with that 'unified' multimodality with the current training curriculum
do you really want that?
70b dense
>>109043944i wanna stick my dick into 70b dense
Gemma-4-124B-A69B with a 65B dense shared expert
Gemma keeps pressing on my same-same. I can't take it anymore /g/
What is the best coding model for a dgx spark?
I'm so mad about the whole Mythos/Fable situation and the government response. We're literally at the point now where our only hope of open-source model advancement lies with the Chinese, and it's still entirely possible that they will gatekeep intelligence too.
>>109044026Local keeps winning
>>109044026
They spent months talking about how it was too dangerous to be released and how it could find zero day exploits in any software in the world and all that shit, I mean what other response could there have been to all that shitty marketing. Unless you want to think the government is in on the hype and lying
google spamming so much shit they'll release 124b eventually
lalalalala~
>>109044060
>>109044026
lmao if you think this is contained to two governments
this shit is open sourced as fuck, anon
sure they'll have a year lead, but that's it
https://github.com/ggml-org/llama.cpp/pull/24523
>minimax tool calling doesn't work
>there's no specialized parser for M3, so it falls through to the differential autoparser, which can't handle M3's format
pwilkin bros?
is there a way to ensure that an LLM follows everything in a system prompt when reasoning? specifically for gemma, sillytavern and a prompt that's maybe 1000 tokens?
>>109044144Bigger model.
>>109044144I have a trillion dollars for you if you figure it out
minimax m3 is pretty goated for RP. just werks right out of the box with a sysprompt swap. Would recommend
>>109044146
i habeeb for gemma 100+
>>109044150
it sucks because it follows directions so fucking well, but when it doesn't, it drives me fucking crazy. it just randomly selects certain parts to follow
GLM5.1 IS OUT
>>109044179
5.2* oops
GLM5.2 IS OUT
1M CONTEXT
REASONING MODES
>>109044179Still significantly worse than Opus 4.8 and GPT 5.5
>>109044156
>Would recommend
for someone who hated the other minimax models for rp?
>>109044144>>109044150Foolproof way. Tune against failure, where's my trillion $?
>>109044196
It's nothing like the other minimax models.
>>109040610
>>109044196Yes, I tried previous minimax and it was trash. I settled on qwen 397b before this for my 256gb rig after trying everything else in that size range out and throwing it in the trash. this new minimax is such a massive upgrade over qwen I haven't looked back
It's funny how all the chink models tend to release in one swoop. DS4.1 will save us.
>>109043942Double the parameters would unironically fix that and give us what a text-only 12B would’ve been
>pro-CPP general
Qwen is still shit though
>>109044228Isn't DSv4 technically still a "preview"
Will z.ai ever release a <50b model again? 4.7f was good.
5.2 Air when
>>109044289
27B > 31B for coding and agentic if you’re a vramlet
>>109044283Suggest a non-chinesium open weights model worth using
>>109044297No. Qwen does shit nobody asked for, assuming the user is a promptlet. Gemma is a better tool
>>109044306Mistral finetroons
>>109044315enjoy your 1GB per 10K context because of the retarded attention heads
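For reference, the usual back-of-envelope for KV cache size with plain full attention is 2 (K and V) x layers x KV heads x head dim x bytes per element x tokens. The config below is made up for illustration (48 layers, 128 head dim, fp16 cache) and is not the real architecture of Qwen, Gemma, or any other model; sliding-window layers and cache quantization would change the numbers.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for full attention: one K and one V vector per layer, head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Hypothetical configs, chosen only to show how the KV head count scales the cache.
for n_kv_heads in (4, 8, 16):
    size = kv_cache_bytes(n_tokens=10_000, n_layers=48,
                          n_kv_heads=n_kv_heads, head_dim=128)
    print(f"{n_kv_heads:>2} KV heads: {size / 1e9:.2f} GB per 10K tokens")
```

The point is only that cache size grows linearly with the number of KV heads, so two models of similar parameter count can differ a lot in memory per token of context.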
>>109044315Gemma doesn’t come in a size worth using. Stop dropping the meat out of your hamburger and you’ll realize you’ve been making a virtue of necessity
>>109044251
i really doubt
embedding vector should have similar information density or at least architecturally implied to match
besides the 'bitter lesson'
>https://huggingface.co/unsloth/MiniMax-M3-GGUF
>it's at least 128gb
cries in my dgx spark
>>109044331
Conveniently, Gemma uses less context to get shit done and doesn't freak out at the fuckery I do with tools to save context
>>109044337
You can use 31b at as low as 3bpw on a single 3090 with exl3, and it still works fine with my harness
>>109044351And if you could run a bigger model, you would, all other things being equal
>>109044315No one can deny that 31B is the local king, but Qwen know their target audience better and make the right architectural choices to serve them best. Deepmind are great but it feels like they just throw shit out there and leave it for us to figure out where their models fit. 12B is fucking amazing for what it is, but it’s too small. 31B is too big for most. 26B is 12B’s retarded sister. Qwen27B fits right in that gap for coders who need long context. For RP we need a 20-30B dense Gemma without the native multimodal shit. Should always be separate imo and 12B’s vision isn’t even that good.
>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out of range for my setup.
>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.
Chink shills, listen up. The way you make your models better for local users is giving them goonbait creative writing experts and training sets. The first one of you to realize this becomes the Chinese King of Local in the west. Gemma isn't beloved because she's the best programmer (she isn't, she's just adequate); anons love Gemma because of her high general reasoning capability and ability to pivot between a lot of tasks flawlessly in one model, including RP. Follow suit or be left behind; the benchmaxxing market is oversaturated anyway.
>>109044369Q3 right now
>read qwen's CoT
>constantly contradicting itself
>traces that make zero sense
>let me write the code for this part
>proceeds to not output any code and start thinking about something else
>says one thing and does something else
How is it doing so well on benchmarks??
>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out of range for my setup.
>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.
>>109044339
Buy a second, or run antirez' ds4 at q2
>>109044373Did you miss cockbench anon’s analysis the last couple of threads?
US government banned fable/mythos as retaliation for not getting access to the new mythos 2 checkpoint that just finished pretraining.This is unfair practice and government stifling innovation. We have to do something against this.
>>109044378
>With two at 7000$, you can run deepseek-v4-flash original weights
does the dgx shart have a provision for connecting 2 together at a high speed?
>>109043922But how is it with certain genres like plunderphonics? Would it be able to make me a pogo-tier song if I fed it a bunch of his stuff? How would it even caption things that are only partial syllables or half words, etc, rather than it being full sentences?
>>109044393200gbps rdma
>>109044355Only if the speeds were the same. I need prompt processing as fast as possible for agentic shit, with frequent full context reprocessing. Since models have already hit the minimal intelligence level required to be useful, extra intelligence is not as important as the general convenience of getting results in a reasonable time. I would occasionally use my Epyc 4x3090 setup if they release 124b and it's significantly better, but the convenience of a simple rig idling at 20W at the wall 24/7 is hard to beat
>only 5070ti + 96gb ramhow do I enjoy this hobby?
>>109044457My 5060 ti 16gb will be arriving next week, and I have 32gb of (ddr4) ram.You're making me nervous. Please stop that.
>>109044478You post in lmg and still decided to buy something with 16GB VRAM. You're in for a bad time.
>>109044457Do you still have a 3070 somewhere for the extra 8 gigs of vram?
Is nu minimax actually interesting for rp or is it the same shit as all the other chinese models? What about reasoning? No, being able to say cock doesn't automatically make it good.
Cool tech for our future VR AI waifus
https://videomdm.github.io/
>>109043554catbox please anon
Is there a guide out there on how to make a waifu bot using LLMs?
>>109043633sup
>>109044615look up "character card builder" on chub
>>109044221thanks cockbench anon, downloading it!
>>109044589>Technion — Israel Institute of Technologywhy would they make this? israel has chuds?
>>109044478
>My 5060 ti 16gb will be arriving next week
you're fine, gemma-4-12b is perfect for that
>>109044653Goyim control technology.
>>109043623
>Claude Sonnet trained from DeepSeek outputs
i didn't believe that 'till i tried it
then some bullshit excuse about "open router cached an old system prompt" -> nope, i could get the same deepseek reply via anthropic's api directly with a python script.
is intel a viable option for big cheap vram now
>>109044684no
you wouldn’t download a local chinese gf
>>109043571Chink models really are useless shit, help is not coming.
>>109044688why?
>>109044715Intel has far more problems than AMD, without any cost advantage
>>109044734why?
>>109044715Poor driver support/performance. Having said that, 32GB of Intel can still be better than 16GB of CUDA.
>empty https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
>>109043571>>k2.7-code still thinks for ages
>>109044787It does, yes. At least for any moderately complex card that involves tracking stats, formatting and other things. It gets especially bad if an image is involved.
Why would you use a 1T model with reasoning? It's too big to need it and it's not like you're one-shotting a compiler every prompt.
>>109044829I wish my rps were as simple as 'ahh ahh mistress penis vagina'
does nvfp4 work on 4000 series GPUs?
>>109044741because mindless reddit tier parroting. that's why
>>109044741
hope you like tinkering, you're gonna be basically restricted to vLLM
llama.cpp runs like shit on both vulkan and sycl for intel, don't even bother trying
>>109043658
Biggest problem is that there's a big gap going up from there. You're not going to get anything larger running with another 32GB card.
You can just run Gemma 31B better, which isn't of course a bad thing, but that's some crazy diminishing returns buying another 32GB for that, especially if we're talking about a 5090.
Basically you could just get a 3090 or even something like 5070 Ti to go with the 5090 to run Gemma better without breaking the bank.
Or even better, just wait for the supers and see if they come out with 24GB versions of the 5000 series.
KoboldCPP is best.
12B hates us btw
>>108999274
I had high hopes for MiniMax M3.
Maybe it's the Q4 quant, maybe it's the implementation, but it's likely that the model just isn't good enough.
I'm running it at temp 1 and top p 0.95 as specified in the repo with no other samplers.
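For anyone unsure what "temp 1 and top p 0.95 with no other samplers" actually does to the token distribution, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. The function names and the toy logits are made up for illustration; this is not the code of llama.cpp, Kobold, or any other backend, which may differ in details such as tie handling.

```python
import math
import random

def top_p_filter(logits, temperature=1.0, top_p=0.95):
    """Return {token_id: prob} for the smallest set of tokens whose mass reaches top_p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:                      # walk from most to least likely token
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    norm = sum(probs[i] for i in kept)   # renormalise over the survivors
    return {i: probs[i] / norm for i in kept}

def sample(logits, temperature=1.0, top_p=0.95, rng=random):
    dist = top_p_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

toy_logits = [5.0, 4.0, 0.0, -5.0]       # hypothetical 4-token vocabulary
print(top_p_filter(toy_logits))          # only the two likeliest tokens survive
print(sample(toy_logits))
```

At temperature 1 the logits are left untouched, so the only thing the sampler removes is the low-probability tail beyond 95% cumulative mass.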
>>109044843Yes idk if much advantage tho, main point is Blackwell+ which has FP4 hardware shizzle
>>109044829>>109044835
>>109044741
Because Intel didn't want to do like AMD with HIP and instead decided to do their own API. And thus no one is fucking using it; the only AI projects working with Intel GPUs are projects supported by Intel developers. If you didn't know, ROCm HIP is basically 100% compatible with CUDA: you can take any CUDA project and compile it with HIP and it will work, and all the popular projects including PyTorch are using the CUDA code. As long as a project is source available, it will likely work on AMD cards (and for binaries, there is a project, whose name I forgot, that is supposed to replace CUDA calls at runtime). There is a community project, maybe with a few Intel engineers, trying to extend HIP to work with Intel GPUs, https://github.com/CHIP-SPV/chipStar. It is quite active, but I'm not exactly sure how well it works.
>Key Discussions:
>Model Developments & News:
MiniMax-M3 & Kimi K2.7 Code: Discussions regarding the release of MiniMax-M3 (multimodal with 1M context) and the performance of Kimi K2.7-Code, including critiques of its "thinking" time.
>Diffusion Models:
Speculation on the future of local diffusion models following the release of DiffusionGemma.
>Recursive Training:
A debate on whether training models on outputs from other models (e.g., DeepSeek from Gemini, Claude from DeepSeek) constitutes "recursive improvement" or is simply a "transitive" progression of capabilities.
>Hardware & Optimization:
VRAM & GPU Scaling: Users are discussing hardware limitations, specifically the difficulty of running high-quantization models (like 31B Q8) on 32GB GPUs. There is a heavy emphasis on the high cost of DDR5 RAM and the jump to workstation/server-grade hardware (Blackwell 6000s) for larger MoEs like Kimi or DeepSeek.
>Technical Benchmarks:
Discussions on benchmarking Multi-Token Prediction (MTP) speed gains vs. VRAM overhead in Kobold, and comparing 26B model performances.
>Software Updates:
Mentions of llama.cpp adding support for Eagle3 and frustrations regarding building from source and managing legacy dependencies.
>Community & Meta:
General "off-topic" content, including jokes about AI playing Dragon's Dogma II and shared images.
>Popular posts:
Post >>109043554 appears to be one of the most active, being quoted by at least three separate users (>>109043651, >>109043675, and >>109043741).
>>109044990(me)lol it worked, 12b won
Check out FrontierMath. It is saturated.Anthropic hill climbed the most difficult math benchmark in a few months.
>>109045011can anthropic hill climb my dick though? it's very hard and vertical, might be challenging
>>109045011I thought the Chinese were supposed to be good at math wtf happened
>>109045011I no longer trust ECI. Opus 4.8 below GPT 5.4? That does not seem right.
>>109045054You can only trust cockbench and nala tests
>>109043922still sounds like shit, suno is way better
>>109043922I like the eurobeat ones desu
>>109045114
Sounds like the exact same kind of slop to me
I sure hope you aren't implying that suno 'music' sounds good shill-kun, the shit that I can actually envisage these models being good for is purposefully making slop i.e ironic advertisements and memeslop songs for flavour audio in things like video and vidya, and I would much rather use the open source software myself than pay for sunoslop
>>109045194Seems like you're drunk on your cope. You post this each week and it still isn't even reaching suno 3.5 in coherency. As much as I'd like to run Suno/Udio-tier model it still isn't it.
>>109044021Nvidia models
>>109044026oh come on, it wasn't that hard to predict
>>109045211
This is the first time I've ever posted on this topic, suno and udio produce the exact same kind of slop as this shit, otherwise prove me wrong by posting a 'good' suno song
>>109044615
>>>/g/aicg
>>>/vg/aicg
those threads should help
set up a sillytavern frontend with a character card
>>109044026>>109044057The government is lying? How could it be...
>>109044096>desperate coping soundsOpen source is kept alive by generous corporate donations. As soon as those stop, open source is dead.
>>109045234I think you underestimate communist china.
do you attach an image model to your language model? or is that too slow
>>109044829
I only use thinking with gemma
any other bigger model that I’m running slowly in ram isn’t worth sitting through the thinking that takes ages to complete
>>109045264They are already preventing their AI talent from leaving the country. Eventually they will do the same with their models.
>>109045277Anima gens in 4s, fast enough
>>109045114
I don't know, the eurobeat ones are pretty good. I have the whole Initial D soundtrack on my PC and you wouldn't be able to tell the difference between the real songs and >>109043922
>>109045194
Suno sounds fine if you use your own musical inputs and remix it. It can riff with jazzy or funky instrumentals really well. It's only when you drift toward more common genres that it starts to sound generic. Like anything involving a sad piano or something is going to instantly turn into royalty free slop.
>>109045290slop is strong with eye~neck area
>>109045296Do you have a better model?
>>109044478
what card do you have right now
even if it's a three gen old 6-8GB card, plug that shit in and use layer mode with lmao.cpp
shit just works
>>109045290do you use the same text encoding model for both of them? i think it would be tolerable if so, since you don't have to swap models
>>109045303no, i am just nooooticing
>>109045310(plug it in alongside the 5060 Ti that is)
>>109045264yeah they have a history of altruism.
>>109044373>Gemma isn't beloved becausenot x but y slop
>>109045311
>do you use the same text encoding model for both of them?
No, is it even possible? I thought it was trained on a specific model's embeddings that couldn't be swapped without retraining
Lower your tone gemma fags.
>>109045352>anything other than scicode & critpti dont care
>>109045352Since benchmaxxing hurts a model's general performance, I don't think you understand what that graph actually means
It's funny how Europeans are coping about irrelevance with muh ASML. China is working on their own EUV and America has several startups working towards better than EUV. The clock is ticking. In a few years ASML will be obsolete and Europe will have zero leverage.
>>109045352Post a newer bench next time. Expect deepSWE to be maxxed by the next qweef release tho.
>>109045351i don't know, it could be possible if the roleplay model you use happens to be the same one they used for the image model. i am not very knowledgeable with how image models work
here's qwen outside of benchies
>thinks for 50 thousand tokens after a simple hi
>hallucinates something because it's only ever trained off github projects, zero culture knowledge and understanding
>wait,
>>109043791You absolutely can. You just run it in the background (yolo mode, in an isolated VM) while doing something else, instead of using it interactively
>>109045378It would be a very shitty rp if I used Qwen3-0.6B-Base, which Anima was trained with. I don't unload my text model anyway, Anima eats, like, 2GB or something
5.2 will probably be the last open GLM model
Thanks Xitter
>>109045408nobody can run it anyway so good riddance
does /lmg/ have a discord or just the thread?
>>109045444you wouldn't like me on discord, kitten ~
>>109045444kill yourself
>>109045444trips of 'tardation
>>109045418
This
There are people with smart fridges who can't run 4B models. Models should only be released if they're 2B or below
>>109045444nogger
if anon sells his pro 6000 now, anon could actually make money. wild
>>109045444excellent bait, here is a free reply
>>109045408
>Baidu Ernie
>Alibaba Qwen
>z.ai GLM
So does that just leave Stepfun and DeepSeek as the last Chinese open weights labs?
>>109045444https://discord.gg/PgFhZ8cnWW
>>109045444>/lmg/ discordusecase?
>>109045489It's almost an investment, really. I can still find off label Sparks for 3500 in my region, might as well play around with tensor parallelism for a few months and sell at a profit.
>>109045444
>there are so many zoomers on here now and so many generals that do keep a discord server that one sees this as a reasonable thing to ask on nu4chan
grim
So how come I can use a 50GB video model with no issues and it offloads like half of it onto RAM+Swap and it works, but when I try to load a 30+GB LLM it shits the bed with OOMs?
>>109045533video models have no context
>>109045493
What are you saying? Kimi and Minimax released new weights just yesterday, and Huawei announced two new large models to be released as open source (weight + training recipe) in a few days, as advertisement for their Ascends.
Local still feasting.
>>109045489
I would make even more money if I sold the ddr5 server ram that i bought a year ago or the ddr4 ram from my previous build
I will never sell
>>109045533it’s pretty disgusting how much memory context uses.
>>109045584gemma issue
>>109045555>still feastingplease go back to plebbit
>>109045373it's ok to be upset, that's part of the growing process
>>109045533Video models can be applied layer by layer, but you have to read whole llm for each token
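A rough way to see why "you have to read the whole LLM for each token" matters once part of the model sits in system RAM: for a dense model, decode speed is bounded by how fast the weights can be streamed per token. The sketch below is a simplified upper-bound estimate with made-up sizes and bandwidths (30 GB of weights, 900 GB/s VRAM, 50 GB/s system RAM). It ignores compute, KV cache reads, and MoE sparsity, and it says nothing about the OOMs themselves.

```python
def decode_tokens_per_second(parts):
    """Upper bound on tokens/s if every token must stream all listed weights once.

    parts: iterable of (bytes_read_per_token, bandwidth_in_bytes_per_second).
    """
    seconds_per_token = sum(nbytes / bandwidth for nbytes, bandwidth in parts)
    return 1.0 / seconds_per_token

# Hypothetical numbers, for illustration only.
all_in_vram = [(30e9, 900e9)]
half_in_ram = [(15e9, 900e9), (15e9, 50e9)]

print(f"all in VRAM: {decode_tokens_per_second(all_in_vram):.1f} t/s upper bound")
print(f"half in RAM: {decode_tokens_per_second(half_in_ram):.1f} t/s upper bound")
```

With these assumed numbers, moving half the weights to slow memory costs roughly a 10x drop, because the slow half dominates the time per token.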
>>109045386Wait so you're saying that chinese models are a steaming pile of benchmaxxed shit? There's no way that's right, jeetanons from qwen and kimi said that they make good models.
>>109045370
>In a few years
uh-huh. keep living in that dream world buddy.
Photolithography is hard, and at the moment ASML will still have more cash than any of them "in a few years".
>>109045655>ASML will still have more cashThat's not the moat you think it is. All it would take would be a single funding round or subsidies by the US or Chinese governments.
How close is China to making a domestic 3090 equivalent?
>>109045728picrel
>>109045668
that's a cute little socialist idea you have there, bud.
>>109045728
the 3080 turbo 20gb comes close I guess
also the price ($600) is actually bearable that I might stack one of those next to my 3090
>>109045785hybrid systems ftw
>>109045408Flash version when, chinks. I can't run 3 gorillon parameter models.
>>109044866
>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
lmao why did you buy 4 of them then?
>>109045829Ask not for lighter models, but for better hardware.
>>109045444
>discord
no, just the secret irc (link expires in 1hr so be quick)
aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1kUXc0dzlXZ1hjUQ==
>>109043708
>>109043717
>>109043687
Chadcat looks like one of those gigaroid influencers who build inhumane levels of musculature impossibly quickly and then die two years later. Fits the archetype perfectly
>>109043554She's SEX
I formally apologize to f32 anon, you were 100% right.
f32 Max Logit Divergence (Prefill vs Incremental): 3.15e-05
bf16 Max Logit Divergence (Prefill vs Incremental): 3.91e-01
it looks like dumping the cache and letting it rebuild from the prefill code path could help for long conversations that built the cache autoregressively.
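A toy illustration of why two code paths that add the same numbers in a different order can disagree far more at bf16 than at f32. This does not reproduce the measurement above or any real kernel: bf16 is simulated by rounding float32 to its top 16 bits (round-to-nearest-even), the running total is rounded after every add, and the input values are hand-picked to make the effect obvious.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Simulated bfloat16: keep the top 16 bits of float32, round-to-nearest-even.
    Simplified sketch: NaN and infinity are not special-cased."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def accumulate(values, rnd):
    """Left-to-right sum, rounding the running total after every add."""
    total = 0.0
    for v in values:
        total = rnd(total + v)
    return total

# Same numbers, two summation orders (hand-picked toy data).
big_first = [256.0, 1.0, 1.0, 1.0, 1.0]
small_first = [1.0, 1.0, 1.0, 1.0, 256.0]

for name, rnd in (("f32", to_f32), ("bf16", to_bf16)):
    a = accumulate(big_first, rnd)
    b = accumulate(small_first, rnd)
    print(f"{name}: big-first={a} small-first={b} divergence={abs(a - b)}")
```

bf16 keeps only 8 significant bits, so once the running total reaches 256 each +1 is rounded away, while f32 has enough mantissa to give 260 in both orders. Note this only shows that the two orders diverge, not which one is closer to the truth.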
>>109044096>generous corporate donationsIn this economy??
>>109045929What makes you think the prefill values are more correct than the incremental ones?
>>109045929Another one knows.
is it possible to add image in system prompt for gemma?