/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>108404935 & >>108400151►News>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108404935--ASUS Ascent GX10 purchase debated for inference workloads:>108407066 >108407095 >108407101 >108407129 >108407096 >108407138 >108407137 >108407143 >108407169 >108407222 >108407272 >108407309 >108407327--Project Ani design debate: camera interaction vs 3D autonomy:>108408547 >108408584 >108408592 >108408627 >108408640 >108408710 >108409169 >108408585 >108408601--Evaluating budget 4x V100 32GB setup for local LLM use:>108405178 >108405196 >108405211 >108405228 >108405250--Model comparisons for RP, vision, and NSFW:>108407751 >108407779 >108407788 >108407819 >108407839 >108408031 >108407888 >108407928 >108407994 >108408012 >108408115 >108407902 >108408355 >108407787 >108407797 >108407832 >108407833--Criticism of over-tuned safety in modern AI models:>108404958 >108404965 >108404991 >108405298 >108405352 >108405402 >108407336 >108407538 >108407597 >108407700 >108407721--Kimi K2 vs K2.5 performance and prompting techniques:>108408025 >108408073 >108408144 >108408209 >108408258 >108408313 >108408418 >108408252 >108408377 >108408397 >108408439--Qwen 27B preferred over 35B despite speed tradeoffs:>108407396 >108407427 >108407591 >108407619 >108407627 >108407661 >108407771 >108407803 >108407828 >108408416 >108407617 >108407648 >108407678 >108408228 >108407650 >108407970 >108408630 >108408781 >108408894 >108408921 >108408983 >108409004 >108408935--Qwen3.5 27b Heretic v3 recommended for 24GB GPUs:>108408663 >108408753 >108408774 >108408828 >108408851 >108408937--Hugging Face Agentic Evaluations Workshop livestream:>108408324--Qwen 3.5 9B generates functional C code:>108407763--Logs:>108409077 >108409247 >108409442--Rin, Miku, and Teto (free space):>108405043 >108405091 >108406177 >108406814 >108407696 >108407782 >108407933 >108407989 >108408792 >108409216►Recent Highlight Posts from the Previous Thread: >>108404937Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
Pls share any armpit related research
>>108410173Tiktok marketers found that videos starting with a visible armpit get 50% more views.
falseflagkun is back ^_^
>>108410173Men can't resist
Just fucking kill yourselves you worthless spamming mikutroons
jannies are in on iti hope they die, not in minecraft, in real life
>>108410225I don't think you understand the situation.
Baker really got mindbroken, huh?
>>108410241That isn't even miku dumbass.
>>108410240I think it is about time you got a dedicated thread for your waifu spam. I just want local model news and not your disgusting autism.
>>108410253I'm not even the miku guy.
>>108410246>>108410253Have a Miku!
>>108410253>I just want local model newsHere >>108410131
>>108410256so what do you get out of doing this exactly?
>>108410261I get to post cute Mikus! Become Miku today!
>>108410268i get that you are a faceblind autist, but that is not miku.
>>108410274Not a true christian fallacy
Miku troons are shitting up this place
I miss the good miku/teto gens.
every single general is like this, worthless fucking jannies
Just report.
>>108410321For what?
>>108410305Same, wonder what happened to that guy
>>108410321I hope you can provide a justification why a tealhaired, twintails anime girl is now suddenly offtopic in this thread. /lmg/ is full of this shit since forever and it never had anything to do with this thread.
>>108410115https://litter.catbox.moe/x7czk5s7o0jcdhyb.jpg
I am happy that we got more people posting miku.
>>108410338
Nobody click on that catbox link.
>>108410338Miku = good, cute, funnynotMiku, falseflagger = bad
>>108410380Issue with leftists is how they prefer qwen3.5 to grok.It's the same reason they didn't vote for the invasion of Iran.
If anyone's wondering, it's march break for grade-schoolers (children). That's why this week's been so bad.
>>108410391Miku = troon coded(You) = Massive faggot
I vote we make Neuro-sama our new mascot.
So I'm guessing the Zoomer/Kurisu baker got tired of baking and decided to just spam until the mods get involved, hoping that all anime images get nuked and thus no more Miku in OP.Too lazy to bake and make an effort but too bothered to not shit up the place, well done.
>>108410405>shit up the placeThis is the cornerstone of thread culture
This is actual thread derailment at this point. 43 replies and not one post with actual local model information. Holy fuck. Find something better to do, seriously.
>>108410404It should have always been le cunny.
>>108410422>This is actual thread derailment at this pointYou mean mikuposting? Yeah. Always has been.
ProjectAni guy here. I'm keeping the cum jar. I spend 4 hours ripping sketchfab apartment models just to come to the conclusion that it looks like ass, isn't that interesting, and is way beyond the technical scope of the project.I don't really want the project to turn into a dating simulator game. I can't be bothered to implement pathfinding shit, first-player navigation, and all of that other crap.For the vision stuff I'm just going to have it work via a webcam. Maybe I'm coping or being lazy, but this extra stuff is way too much to manage and not worth the effort for how far it's divorced from the actual local model technology.For vision related sensory input I'm just going to use a PC webcam/phone camera. Thank you for your attention to this matter - PRESIDENT DONALD J. TRUMP.
damn I should've proofread. whatever.
>>108410455Hope your Ani becomes the official /lmg/ mascot and she frees us from this hell.
>>108410405I think it's a different guy. He's been around since nov or so. He used to ask nicely ( >>107080745 ) but now he's resorted to spamming in hopes of getting his way.
>>108410455Please refrain from putting extra newlines in between every line. Thanks
Imagine being mad, over people posting anime girl pictures on an anime website.
Is this the local Miku general?
>>108410589Her threadHer boardHer world
>>108410589>hiding the feet next to the already delicious thighsprison
>>108410606Shill.
Best thread honestly. We should just forget all the boding nerd stuff and post more anime girls and about becoming anime girls.
>>108410625>sparkling with mischief>mixture of
this is a mockery of /lmg/ culture
>>108410641Explain what's wrong with it in 10 words or less. No buzzwords like "slop".
>>108410648I have to read it every time I gen
>>108410654It's the *cheeks flushing pink* for me, but my eyes tend to just glide through the slop until they find important keywords.
>>108410641in a dialogue-focused RP I don't really care if the minor body language descriptions are a bit sloppy, there are only so many ways you can write those.
>>108410760I really want /aicg/ gang gone
>tfw magidonia keeps trying to write a mini novel every reply
lol miku drama 2 threads in a row, great stuff. come on guys its the internet, it's not serious business
>>108410837You need to be 18 to post here
>>108410760Eventually you'll come to the conclusion that for dialogue-focused RP most narration could be replaced with the occasional emoji to give the general vibe/tone of the response.Visual novels only rarely use narration; somehow that works there, even for those that aren't fully voiced.
>>108410868>visual novelsAre you retarded? Where are the visuals in a fucking RP on ST?
>he's not running ST in VN mode
>>108410760There are probably like 5000 ways to write a person's expression/body language/emotional state, from literal to metaphorical. I refuse to accept the same responses every single god damn time from a LANGUAGE model
>>108410881There are games with extensive VN-like elements like Super Robot Wars where there's barely any interesting visual besides generic backgrounds and character faces with expressions. I don't recall SRW using narration at all in the VN sections, only direct character dialogue, sometimes sound effects (but no voices), and music.
So local utterly lost huh. I never thought I'd see it happen so soon.
>>108410920Change your sampling flags then. Quit whining.
>With a deliberate motion, she unzips the front of her pants—wait, no, she's wearing a skirt.I've never seen a model self correct like this in character.
>>108410924>Just crank up the temperature or use meme samplers!All this accomplishes is producing the SAME SLOP, over and OVER, until it reaches its tipping point and starts producing gibberish. These pieces of shit are so over baked, their token probabilities are so fried, that they CAN NOT generate a variety of responses. Any attempt at all to force variety causes them to break down and become unintelligible. So basically, fuck you
>>108410922>I never thought I'd see it happen so soon.huh? when was ever local good or had any hope whatsoever? unironically?
>>108410963https://arxiv.org/abs/2510.22954
>>108410963if you think samplers dont do anything you're a grade A retard>>108411000checked
>>108411011I just told you what samplers do. Try reading, dumbass
>>108410115I might be slow but I just understood that using uv sync makes your dependencies magically get along and does away with the need to pip reinstall vllm, transformers and flash-attention in various permutations to try to figure out in which magical update order will make them work.
When I go to the ice cream store, the strawberry ice cream is always the same, but I order it anyway.
>>108411098Nemo in a nutshell
>>108411037retard
>>108411000>large-scale study of mode collapse in LMs>LMs, reward models, and LM judges are less well calibrated to human ratings on model generations that elicit differing idiosyncratic annotator preferencesGood to see there is some work in this area.
So how bad exactly is the new mistral 119B? Would it be a suitable replacement for largestral 2411 for RP?
How are images encoded before being turned into tokens?Just a big byte array/bitmap?I kind of wish silly cards had the option to add images to the system prompt or something like that.
>>108411283It's a waste of time and disk space don't bother.
>>108411292Are there any good recent RP models in the 70B to 125B range?
cozy breas
>>108411184dumbass
air status?
Is Qwen 3.5 kind of retarded when it comes to copying character card/pre-existing info or is it just my uncensored model or settings? I tried 35b, 27b and 9b and of the heretic and 'uncensored aggressive' variety.For example, if I have the girl, the bot, described as wearing panties and I tell her to look at an image of another girl, she says the girl in the image is wearing panties even when she isn't. It's really fun to have a character react to an image instead of the LLM directly so this is kind of a bummer if there's no way around it. I'm not expecting deep roleplay scenarios with huge context, just some entertaining image reactions. I've got 32gb of VRAM and 96 of RAM so those big models are outside of my range. Man, I wish I didn't procrastinate on getting more RAM a year ago.
>>108411492you could probably try a q4 of the 122b. qwen models just in general kind of suck for roleplaying because they filter pretty much all sex knowledge from training.
Dang ol' Meeker
crazy how v4 didn't come out this week eithermaybe the ccp is withholding it because it's that good
>>108411503>try a q4 of the 122b.I'll try that, thanks. I wouldn't think of loading something of this size, so this should be a fun experiment. Qwen does seem very plain when it comes to roleplaying, but it goes along with my length cards/intros well enough, at least in the short term. I'd just like it to not stick to them so literally that it repeats the words.
Hey guys, I got tired of that anon making the AI companion girl flip-flopping between release dates and whether he was going to release it Open-Source or not, and decided to make my own.Meet Vionna. My full-featured Open-Source AI companion.https://vionna.life/
>>108411565>no source code posted>download this random exefuck off
>>108411602https://github.com/vionna/vionna-ai-companion
>>108411602is it portable
>>108410115fyi:>>108411349
>>108411602It's a literal malicious actor. can the mods handle this?
>>108411637In case you haven't noticed, they don't give a shit.
>>108411637
>>108411639I think they're all mossad and are a tad busy.
>>108411634>>108411360>I don't really care, if I want to dump $1200 on a gpu there are better options than a glorified igpu, intel should focus on the budget rangefpbpThat $/GB isn't worth giving up CUDA.
>>108411615dead linkscam "software" - malware.WILL FUCK YOU UP
>>108410455Glad you came to a decision that works for you. Yours is the only good post itt.
>>108411565Buy an ad.
>>108411676That's malicious software, not a product.
>>108411648>That $/GB isn't worth giving up CUDA.who gives a fuck about cuda.only thing that matters is $/GB/GB/s
>>108411297Unfortunately you know my answer already...
>>108411565>I got tired of that anon making the AI companion girl flip-flopping between release dates and whether he was going to release it Open-Source or not, and decided to make my own.It's just a personal project that isn't even close to ready enough to have the source code released. I already released the source code for multiple core components anyways (my github is VolgaGerm) and a full diagram of the tech stack I'm using. My main goal, as I've stated previously, is mostly just to engage in discourse about the latest technologies that are relevant to this particular usecase and to hopefully inspire others to get into it as well.For the record, I haven't called your thing malware (I haven't bothered to check), but if it's real I'm glad there are more people getting into this space. Wishing you luck.>>108411661Thanks man
>>108411738>Germvirus moment
>>108411767https://en.wikipedia.org/wiki/Volga_Germans
>>108411738breh I went to your github all giddy expecting something juicy instead its 2 repos with barely anything, fuck you for pretending you released anything open source
>>108411732>who gives a fuck about cuda.Anyone who wants a card that can be used for anything more than llama.cpp's vulkan backend.>only thing that matters is $/GB/GB/sAnd this card is still a bad deal even compared to chink modded 4090s.
>>108411781Check out the emage-onnx-export repo. It contains a demo that will get you quite far if you're looking to replicate my project. That's why it's listed as being mostly html instead of python. The bulk of it is the demo.
>>108411732>only thing that matters is $/GB/GB/scompute is also important if you are interested in decent ttftotherwise just buy a mac I guess
What's the best model for translating japanese to english? Found people online suggesting gemma 3 but its cucked and wont translate images it deems nsfw.
>>108411781Also my Pocket TTS runtime is the fastest TTS with voice cloning that runs on cpu in the world, so it's not exactly "nothing"
>>108411816if its so amazing why does it only have 7 stars? lmao
>>108411814Have you tried e.g. the Heretic or other abliterated version of gemma 3?
>>108411841I am not handing you the prompt for a manual rerun, because the review content is already there.
Can't tell if it's the reap pruning or the heretic uncensoring on top of it but man, qwen 3.5 loves to loop.
>>108411846Nah the official one does it too. Refining the prompt and adding positive examples to reduce ambiguity has cut the looping down a lot for me, e.g. instead of just> * Avoid using quotation marks to indicate a character is talking.adding> * Avoid using quotation marks to indicate a character is talking. Action: *italics*. Speech: plain text.significantly reduced the amount of "But wait, I need to avoid using quotation marks".You might also out using the new reasoning budget feature :')
how do I test how smart a model is?
>>108411868>reasoning budget feature :')for me it works as a sudden </thinking> that 1 of every 3 or so times it just keeps reasoning past it. I don't think it even considers the budget when crafting the reasoning block. If it successfully ends the reasoning mid sentence it "worked" which is pretty shitty
>>108411841That did the trick. Silly question in hindsight but im still new to this. Thanks.
Interesting. It looks like the Vionna AI thing might actually be real. They have a youtube page. Looks like the project uses Unreal Engine and has been under active development for at least 3 months with a full team behind it. Idk why I'm getting trolled ITT by these Indians as a solo software dev.https://youtu.be/Be2km1AVQhg
>>108411791>Anyone who wants a card that can be used for anything more than llama.cpp's vulkan backend.what even is zluda and hip / rocm.massive skill issue.>And this card is still a bad deal even compared to chink modded 4090s.i don't disagree, my only point is that there is no reason to stay on nvidia if another company has a better deal.>>108411812llms are bandwidth bound not compute.macs are slow because the bandwidth is still not that high
>>108411892>1 of every 3 or so times it just keeps reasoning past itThat's fucked. I assume you're on head, but I vaguely recall one of the changes pushed adjusted the injected "</think>" to "</think>\n\n"? I might be hallucinating.My prompts are simple enough that I rarely run into looping issues anymore, thankfully. Usually when it happens it's because a fuckup on my end (e.g. "You are not wearing any pants. Take off your pants.") which I can just fix and re-send.
>>108411933>llms are bandwidth bound not compute.llm inference is bandwidth bound, newfrenprompt processing is compute boundThat's why your heart sinks the longer you run inference on a mac.Imagine the ttft after 128k of context...
>>108411933>llms are bandwidth bound not compute.Compute is needed for prompt processing.
>>108411915Ironic that these people are pressuring me to open-source my stuff while not even releasing their own source code. Very malicious behavior. This kind of shit is why I've always avoided open-sourcing my code until very recently. Typical opportunistic brown faggots.
>>108411933>what even is zluda and hip / rocm.We are talking about an Intel card, not AMD.
>>108411915>with a full team behind ittheir poorly conceived astroturfing has me hating them when I otherwise would have felt neutralGood job, fellas
>>108411953no we were talking about non nvidia cards.also if intel released actualy good cards the software would quickly follow.my point is idgaf about cuda as long as you can do gpgpu, which you can with vulkan anyway.
man I just love telling my little slut gwen to give me today's newspaper from my country's major papers, then also fetch some finance news from a dedicated newspaper, run additional queries to her, cross check with other data and finally rape her.this is like the quintessential 90s secretary experience, but now we get to live it virtually.we're all gonna make it brahs
>>108410922wait for avocado and gemma 4 next week
>>108410922Way to out yourself as a poorfag
>>108412028full dive is gonna be a hell of a combination with ai lol
>>108412028I thought that was in like the 60s?
>>108411947>inference on a mac.>Imagine the ttft after 128k of context...https://omlx.ai/benchmarks?chip=&chip_full=M5%7CMax%7C40&model=&quantization=8bit&context=131072&pp_min=&tg_min=&page=1https://omlx.ai/benchmarks/fgig386mttft: 305 secondsmodel: qwen 3.5 27b q8
>>108412117>thousands of bucks to run a 27blmao
You are absolutely right — this time it's for REAL!
>>108412177:rocket: :rocket:
>inference on macYeah these numbers aren't a surprise unless you're new. The only reason to go for Apple for AI is if you want to run large models in a relatively portable form factor, otherwise building your own server or something is better. Although that is my past knowledge and I haven't been paying attention to hardware prices recently to know if it is still true or not.
so what's the verdict on mistral small 4? worth bothering with?I tried qwen 3.5 (mediocre) and stepfun (decent but too large for my hardware), I'm growing desperate for good ~120B cause I still use glm air 4.5 which is old as balls by now
>>108412346mistral4 is llama4 tier. complete garbage.
>>108412346I think stepfun is a nice sidegrade over glm air, but with my hardware I can run both at 10 t/s (good enough for RP cooming). sadly I moved on from rp lately and have been using qwen35-35 moe for agentic stuff, and man the t/s with FULL cpu moe gets me 30 t/s + 256k context and man it feels like it's the first time you can do local memes at a decent level (for coding/agentic stuff) on normal hardware (96gb ram + 16gb vram)
>>108412385What sort of tasks are you doing with 35 A3B? Does it really work well enough for programming for example?
>>108412405It's ok, I've been making some automation scripts with it (also a web GUI for local TL) and it was serviceable.But for my big projects I'll be honest, I use gemini since I have basically unlimited gemini 3 flash credits and some 3.1 pro.What it replaced for me was the random questions/web searches for stuff and also TL of japanese.
>>108412405NTA, but 35B A3B was fucked for me at Q8_0. It made all kinds of basic syntax errors (missing semicolons, commas, etc). 27B Q8_0 works fine for one-shotting my tasks like>[upload charedit.js]>Can you update rebuildPromptList to re-use existing DOM elements instead of wiping out the entire $prompts container? You'll probably need to update appendPrompt to add prompt.id into a data attribute of the char-edit-prompt node.>I only need the code for the updated/changed functions.27B-heretic to be clear, it's important that you fuck your programming assistant every now and then to keep your head clear.
>>108412440>>108412427I did test 3.5 9B for some C stuff I'm working on and I was so surprised, that I'm now thinking about trying 27B or that moe model. Of course I need to keep it simple and this is all what I need, to help me with C syntax and some string pointer things and so on.
I WALKED TO THE PIZZERIA WITH PRACTICED EASETHEN ATE A PIZZA WITH PRACTICED EASETHEN I WENT BACK HOME WITH PRACTICED EASEAND I USED AN LLM TO RP, WITH PRACTICED EASE
>>108412177>Theslop
>>108412551based
>>108411283That it hallucinates almost everything if you give it illustrations to describe makes me think its training data has been cleaned of pretty much anything that might have been blatantly covered by copyright.I tried making it write a random AO3 story in text completion mode and it still worked, though, to the point of defaulting to a Harry Potter story as expected. Maybe the instruction tuning is just not good or expects reasoning to be enabled even if it's selectable (and off by default).
>>108412688It's like gpt-oss then. Clean and tepid.
>>108412097For zoomers 90s is the new 60s
qwen 3.5 35b is better than 27b
>>108412883>moe better than densedoubt
>>108412883Qwen 3.5 27Bhidden size=5120intermediate size=17408num hidden layers=64Qwen 3.5 35B-A3Bhidden size=2048num hidden layers=40moe intermediate size=512experts per token=8(equivalent intermediate size = 512*8 = 4096)Qwen 35B is approximately designed like a dense 3B/4B model, even if it might have the knowledge of a 35B model. There's no free lunch here.Similarly, Mistral Small 4 has the dimensions of a dense 7B model.
>>108413019Qwen 35 a3b is like a cute little layered cake.
>>108412927>>108413019ur're all wrongi use gwen 3.5 35b is much better experience then 27b denseits called DENSE for a reason because it is dense synonym STUPID
>>108413019By the way, you can make up for the lack of depth/layers (i.e. capability of doing knowledge manipulation) with reasoning/chain of thought, but even if you can sort of increase model width with the number of active experts (although it's not exactly the same), there's nothing that can be done with the hidden size.Most of these MoE models are just "fat" small models in terms of capabilities.
When are the good models releasing?
>>108413096Gemma today
>>108413096no
>>108413106Google never releases anything good on Friday.
>>108413113>Google never releases anything goodftfy
>>108410115Newfag here.Is it possible to use LTX 2 and 2.3 for RTX 5070ti GPU ?