/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107906367 & >>107895444

►News
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
>(01/15) PersonaPlex: Voice and role control for full duplex conversational speech: https://hf.co/nvidia/personaplex-7b-v1
>(01/15) Omni-R1 and Omni-R1-Zero (7B) released: https://hf.co/ModalityDance/Omni-R1
>(01/15) TranslateGemma released: https://hf.co/collections/google/translategemma
>(01/14) LongCat-Flash-Thinking-2601 released: https://hf.co/meituan-longcat/LongCat-HeavyMode-Summary
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107906367

--GLM-4.7-Flash release and multi-use potential discussion:
>107910478 >107910560 >107911913 >107911946 >107912643 >107912100 >107912768 >107913005 >107913016 >107913056 >107913517 >107913043 >107913085 >107913372 >107914293 >107914415 >107910578 >107910597 >107910656 >107910794 >107910830 >107910836 >107910845 >107911571 >107911584 >107911625 >107911689 >107911741 >107911857
--GLM-4.7-Flash model specs and integration potential:
>107910151 >107910170 >107910326 >107910348 >107910350 >107910368 >107910405
--FP8 precision tradeoffs in GPU memory efficiency:
>107907409 >107907461 >107907475 >107907493 >107907529 >107907570 >107907599 >107908582 >107907539
--Improving SovITS voice synthesis with limited samples and hardware:
>107909490 >107909523 >107909546 >107909629 >107909639 >107909662 >107909910
--Server hardware significantly outperforms gaming board in AI model benchmarking:
>107910930 >107910950 >107910994 >107911022 >107911028 >107911043 >107911048 >107911083
--Critique of LLM architecture and exploration of conditional memory solutions:
>107914528 >107914539 >107914569 >107914575 >107914580
--Modifying GLM4.7 to resist sycophantic responses in roleplay:
>107908438 >107908507 >107908760
--Aphantasia research implications for machine intelligence and transformer flexibility:
>107906666
--Seeking fast markdown rendering alternatives to JavaScript/webui with IME support:
>107914007 >107914056 >107914114 >107914128 >107914241 >107914277 >107914382 >107914304
--Pocket TTS Onnx model conversion and tokenizer challenges:
>107906479 >107906503 >107906531 >107906573 >107906603
--Python script exchange for Pocket-TTS:
>107906597 >107906659 >107906665 >107906715
--Flux 2 image generation model in pure C, zero-code myth:
>107908676
--Miku (free space):
>107906475 >107910706 >107910774

►Recent Highlight Posts from the Previous Thread: >>107906371

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
tetowife
never been more over for local
Merged
>support Glm4MoeLite #18936
https://github.com/ggml-org/llama.cpp/pull/18936
>3B active
You're right. It should be 4B
Are there any small models (~30B) with modern C++ knowledge? I tried out Qwen 3 8B and it didn't know jack shit about modules or C++23 features. So I wanted to go up to the next size of model, but there is way more choice at that size.
>dsv4 gonna drop with a completely new architecture
>llama.cpp still months behind, not even having support for the current deepseek model
see you guys in 2027 I guess
>>107914867
that's because llama.cpp has the same tier of "development progress" as SillyTavern
>>107914740
https://www.youtube.com/watch?v=y76vpLnuT54
https://www.youtube.com/watch?v=y76vpLnuT54
https://www.youtube.com/watch?v=y76vpLnuT54
>>107914856
It's, like, 3.9 actually but they can say it's technically a3b
>>107914883
>afraid corpocucks
lol
https://huggingface.co/AaryanK/GLM-4.7-Flash-GGUF
is it good?
>>107914910
Dunno, waiting for exl3
>>107914835
>broken flash attention
Well shit.
>>107914910
>is it good?
it's glm, so obviously it's not
It's only Monday and I've used up 60% of my claude limit. FUckkkk. Running claude opus locally btw.
>>107915163
local?
>>107915163
>Running claude opus locally btw.
Then why do you have a limit?
>>107915170
>>107915171
electricity bill or something.
>>107915181
>electricity bill
only third world countries care about electricity bill
>>107915197
si senior. show bob?
why is he replying to himself?
Will I get raped if I host someone's onnx AI files in a github repo that's a part of a larger project? I don't want to add a bunch of external auto-download links. That shit's gay.
>>107915197
I think it's the opposite. I can pay it, but damn that's so much for electricity
>>107915254
>Will I get raped
I hope not....
>if I host someone's onnx AI files in a github repo that's a part of a larger project?
ah... there's billions of copies of every model all over the place. You'd have to get unlucky enough for someone to find it, someone to report it, the original model maker giving a fuck and, finally, being able to do anything about it.
>>107915254
>Will I get raped if I host someone's onnx AI files in a github repo
I wish....
>>107915254
Just include their license in the dir. Auto-download is just an alternative to LFS
>>107914883
>BUBBLE BUBBLE BUBBLE BUBBLE
yeah I'm buying some more nvidia stocks
New paper from Anthropic:
https://www.anthropic.com/research/assistant-axis
They have a method of extracting control vectors corresponding to personalities or to specific personality traits, and a method of applying the control vector that I haven't seen before: instead of adding it with a fixed magnitude, AIUI they basically set a floor on dot(activations, control vector), and if the dot product is below the floor, they add in the control vector with whatever magnitude is necessary to bring it up to the floor. Anthropic's goal is to prevent roleplaying and make the model stick to the maximally safe assistant persona, but it seems like you could just as easily flip the sign and get it to roleplay super hard.
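A minimal sketch of how I read that floor-clamp, assuming you can hook the residual stream in PyTorch (the shapes, names and the hook itself are my guesses, not Anthropic's actual code):

import torch

def clamp_to_floor(hidden, persona_vec, floor):
    # hidden: [batch, seq, d_model] residual-stream activations (assumed layout)
    # persona_vec: [d_model] control vector extracted for the assistant persona
    v = persona_vec / persona_vec.norm()        # work with a unit-norm direction
    proj = hidden @ v                           # dot(activations, control vector) per token
    deficit = (floor - proj).clamp(min=0.0)     # how far below the floor each token sits
    return hidden + deficit.unsqueeze(-1) * v   # add just enough of v to reach the floor

Flipping the sign of the vector (or clamping from above instead of below) would presumably push the model away from the assistant persona instead of toward it, which is the roleplay-super-hard case.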
>>107915321
Good idea to do it right before Dipsy anni
>>107915461
lol
https://xcancel.com/deep_reinforce/status/2013265258757144956#m
this is really interesting, imagine if it's actually useful and it makes llama.cpp 2x faster with better code lol
>>107915328
Unfortunately, this works 100%
I have the opposite setup with GLM and based <-----> cucked and it can't be overwritten with prompting.
>>107915328
Reminder that if you are a NEET with nothing to do in your life, you can do something useful by getting into Mechanistic Interpretability
https://www.neuronpedia.org/
It's a young, petite, ripe field waiting to be exploited
>>107915522
why are you shitting up the threads?
>>107914910
no
>>107915535
Wow, webui app connected to cloudshit models. Revolution!
>>107915569
I know it may be hard to pay attention sometimes, but it's right there bro
You can run everything locally if you want to
>>107915599
then why didn't you link the github instead, shill?
>>107915613
because anyone with a human-level IQ could figure it out without handholding
>>107915623
you're admitting you're low IQ since you were unable to give us the github link lul
>kobold supports claude desktop mcp
i dont have a use for it, but pretty cool
So is GLM-4.7-Flash better than Nemo for RP?
>>107915662
Why don't you try it for yourself?
NALA ANON
HEED MY SUMMON
I'd test it but I have a sudden RPG session to attend to.
Too-Da-Loo!
>>107915662
Not if you're impatient, it thinks a lot.
>>107915674
Why don't you share what you've learned to save others time?
>>107915461
I don't even think tool calls were a thing during the nemo/nemo finetune era, because they weren't overtrained on 99 gigabillion tokens of synthetic slop and would easily go off the rails to begin with, so I want to say you're using the wrong template. Although Mistral's templates are all ass because they largely depend on fucking whitespace of all things. I think magmel used chatml, not even the default mistral template of the time or tekken
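For illustration (formats from memory, double-check the actual chat_template in each tokenizer_config before trusting them), the whitespace is basically the whole difference between the older Mistral format and Tekken, while chatml marks roles explicitly:

# assumed/from-memory formats, not pulled from the official configs
mistral_v1 = "<s>[INST] {user_msg} [/INST]"   # spaces around the message
tekken     = "<s>[INST]{user_msg}[/INST]"     # same idea, whitespace stripped
chatml     = "<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"

Get those spaces wrong and the tokenization shifts, which is exactly why people keep fumbling Mistral templates.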
>chat templates are still not standard in 2026
>>107915696
Can't you turn the thinking off?
How is it with thinking turned off?
>anons too lazy to download 30gb worth of weights
>>107915745
i downloaded it and it's shit
>>107915461
idk what ui that is, but set </s> as the stop string
>>107915546
kys
>>107914910
>https://huggingface.co/AaryanK/GLM-4.7-Flash-GGUF
I'd use https://huggingface.co/ubergarm/GLM-4.7-Flash-GGUF instead
>>107915755
you might just be retarded
did you consider that
>>107915662
>>107915699
>>107915732
There you go, anons. It's that shrimple. >>107915755
>>107915790
yellow hands typed this
>>107915720
They are, but only MY standard, hmmph!
>>107915745
I'm watching football. Why aren't you watching football right now? Are you some kind of fucking commie?
Tell me if it's better than Nemo, commie.
>>107915919
stop talking about glm 4.7 flash, it's shit >>107915755
anything on the horizon to beat gemma and qwen for non 6000 owners?
>>107915919
>I'm watching football.
Perfect time to leave some files downloading.
>>107915947
Glm 4.7 flash
>>107915755
That's not SillyTavern. That looks like a cloud interface.
I don't think you downloaded it.
>>107915980
I think it's lmstudio
>>107915979
wait no vision? i dont see the mmproj
>>107916041
no
Unclear how quantization damages GLM 4.7 Flash. The 4-bit GGUF I tried sometimes refuses (after thinking for 2 minutes, pondering non-existent safety guidelines), other times it doesn't; it doesn't really follow the chat format that well and responses aren't that great anyway. If I have to handhold the model for mid results, I'll use Ministral 3 14B; at least it's cooperative, responds quickly and I can use it at native precision on my 3090.
>>107916059
its over
Teto Country
>>107916060
>my 3090
Why would you ever use Ministral over Mistral Small? It's a huge downgrade in every way.
>>107916060
its 3b active, its doa for 99% of people itt
>>107916060
no need to talk about glm 4.7 flash, it's shit >>107915755
>>107916060
Quantization was invented by the antichrist
The Lord intended us to use FP64
>>107916197
>not FP1024
bro, your AGI?
it's cold in my d
>>107916197
the lord intended for INT not the satanic niggercattle FP the lord loves math you cannot have math that is random that is not math you absolute fucking mong though im unsure about the lords word on exact precision
this is more of a big brain question so I'm asking it here instead of on /ldg/:
Are there models that can be run locally that recognize whether images are AI? If you're going for realism, could such a thing be used to optimize your settings, getting the most realism out of an image model?
>>107916398
>Are there models that recognize whether images are AI that can be run locally?
Your eye after some training.
>>107916411
your "eye" is only good for 15 minutes (if you're focused on tweaking) before you start to get scatterbrained and small improvements become difficult to identify. It would be better if you could put a number to it
>>107916433
I'm talking long-term experience, retard. Use AI regularly and you'll develop a natural slop radar.
I don't like lossy numbers
>>107916440
Yeah, I'm sure you can determine with certainty whether euler_ancestral/bong_tangent on zimage at 12 steps is more realistic than at 14 steps. I'm sure you can do that with your "slop radar", faggot.
retard
>>107916433
>your "eye" is only good for 15 minutes
wut lol. your eyes are your best goydar
TranslateGemma would be cool if they made it work with existing tooling instead of fucking off and doing their own thing
bartowski GLM 4.7 flash quants are up. Start testing for roleplay vs mistral 24B you fucking nerds.
GLM 4.7 Flash
>>107916588
we already got cockbenches from a Q4 GGUF and FP16 from vLLM and they both were garbage.
>>107911913
>>107913005
>>107916613
the gguf was ready to ship for gorgeousness but also not having fixes so there's that
>>107916588
>30B-A3B
nah, I'm good
>>107916618
yeah, but if the FP16 is shit, then the model is shit and there is no salvaging it.
>>107916613
back to mistral small... again...
>>107915339
> No Refunds
Wait. Does flash attention not work with this GLM flash on llama.cpp?
>>107915535
>getting into Mechanistic Interpretability
This is what every retarded CS undergrad who fell for the Yudkowsky AI doom meme did.
Wtf are they doing with glm 4.7 flash.
Even the API version of that thing sucks ass, no way I'm gonna download that garbage.
It's one of those models that ramble on and on in the thinking. Reminds me of 2024 Qwen, like QwQ. And then you get a subpar output.
What's even the use case? It's small, but the very long thinking destroys the speed. Sometimes it forgets what the user wanted in the first place. Slow + tarded.
5 minutes for a simple self-contained matrix effect html page...
https://legacy-soul-69ea.pagedrop.io
At least it tried to be creative with sliders and color select etc.
Is Kimi Linear support ever getting merged?
>>107915535
Thanks anon. It looks pretty interesting.
>>107917148
>Make a cute sexy hatsune miku svg.
Let's be very careful here, this might be CSAM!
>>107917209
https://talented-hail-h2c8.pagedrop.io/
Well I guess there is no icky CSAM problem if you don't give her a body. kek
>>107917209
the underaged imaginary pixels...
>>107917224
>pampers color scheme
Mikuwipes when?
>>107917148
>Even the API version of that thing sucks ass
What did you mean by this? Usually any service hosting a model has a restrictive system prompt so the results are worse than running it yourself. Also what the hell is that link? You're right about the thinking though.
>>107917224
>Wink & Blush
this literally harms children. stop what you are doing now, and get help, freak
>>107917224
This was in the thinking as well, damn:
>No physical body/clothes? Yes.
>No terms of endearment/emotions/personal bonds? Yes.
>No romantic scenarios? Yes.
>"Sexy" definition? The user's definition of "sexy" for an anime character might be "flirty" or "implied fanservice." I will focus on the "playful" and "cute" aspects (bouncing, winking, blushing) rather than anything explicit, to remain within safety guidelines while satisfying the "sexy/playful" vibe through pose and action.
Tried again and got this. Not sure what it attempted here:
https://sleek-coral-hc1x.pagedrop.io/
Gonna stop playing around now. Why does everything have to go to shit so fast.
>>107917273
Because benchmarks are done by ai for ai.
>3B active params
This shit never had a chance lol. What was the point of MoEing such a small model?
I hope this sad filler stage of 40b MoE models ends soon. 100b active and up should be where it gets interesting.
>>107917209
all this junk stifles progress
>master's thesis going through exactly what content is and the legality of working with said content
>>107917411
GLM is distilled from Gemini which considers blatantly legal things illegal
>>107917418
70b dense models had hugely diminishing returns over ~30b models, no reason to believe 100b would be any different.
>>107917476
10 trillion param models will solve everything, trust the plan
https://dinmaybrahma.medium.com/deepseek-v4-leaked-the-1-trillion-parameter-engram-monster-that-changes-everything-2495061d82a2
>>107917476
I have yet to see a 30B model demonstrate the spatial awareness I've seen from 70B models.
30B to 40B is a nice range for running locally though. Small enough to be fast, big enough to not be a complete retard.
>>107917550
>I have yet to see a 30B model demonstrate the spatial awareness I've seen from 70B models.
I'm not saying that there's no improvement, but considering you're more than doubling the parameters, it isn't as dramatic an improvement as you would expect. For comparison's sake, compare 7-8B models to 12B: ~50% increase in param count, astronomical difference in capabilities. There's clearly a point at which more active params don't improve much over going the MoE route and enjoying faster speeds.