/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108032910 & >>108024966

►News
>(02/02) Step 3.5 Flash 196B-A11B released: https://hf.co/stepfun-ai/Step-3.5-Flash
>(01/29) Qwen3-ASR 1.7B and 0.6B released with support for 52 languages: https://hf.co/collections/Qwen/qwen3-asr
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108032910

--Papers:
>108037623 >108037665
--Quartet II: 4-bit LLM training in NVFP4 with FP8/FP16 quality and full hardware acceleration:
>108044022
--Testing abliteration layer selection for dataset overfitting patterns:
>108035620 >108036110 >108036143 >108036499
--Anon seeks Devstral 2 settings after 80GB VRAM upgrade:
>108037329 >108037342 >108038272 >108038524 >108037364 >108037408 >108037437
--llama.cpp postponing LongCat ngram implementation pending mainstream adoption:
>108037744 >108037767 >108037825 >108037913 >108037939 >108037945
--Gemma 3n and prompt repetition recommended for JP-EN manga translation:
>108037473 >108037533 >108037557 >108037727
--Anon asks for human-like models (SAGE, HER, UserLM):
>108034412 >108034423 >108034451 >108034547 >108034891 >108034942 >108034556 >108034730
--Anon benchmarks Step-3.5-Flash on dual RTX Pro 6000s:
>108044196 >108044231 >108044236 >108044363 >108044423 >108044429 >108044513
--Kimi K2.5 outperforms Qwen3 Max on /pol/ memes and muffin tests:
>108034522 >108034672 >108035669 >108035696 >108035755 >108035783 >108035903 >108036007 >108036037 >108036067 >108035902 >108035932 >108038149
--ComfyUI Qwen TTS nodes for JP-to-EN audio generation:
>108035458 >108035471 >108035499 >108035542 >108035574
--llama.cpp lacks FP8 support despite GGUF format capability:
>108036017 >108038186
--Stepfun releases Step-3.5-Flash 198B-A11B:
>108040588 >108041288 >108041387 >108042008
--Anima LLM anime model and e621 tagging debate:
>108034966 >108034988 >108034993 >108034999 >108035015 >108035120 >108035148 >108035178 >108035192 >108036210 >108036439 >108036455 >108036611
--K2.5 vision model accurately recognizes anime characters:
>108036188 >108036450
--Logs: Step-3.5-Flash cockbench:
>108042145
--Miku (free space):
>108036210 >108036611 >108036719 >108045895

►Recent Highlight Posts from the Previous Thread: >>108033093

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Teto sex
SATAN HAIRED MIKU BEGONE FROM THIS HALLOWED PLACE
>>108046563
I gave SillyTavern a try and I hate to say it, but I was disappointed. Any other alternatives?
>>108046119
Claude (but Claude and Gemini are very similar nowadays and might be using the same datasets or distilling from each other)
>>108046140
You can for classic abliteration, but norm preservation apparently ends up being very high rank. You could use the LoRA adapter and also add an extra per-token value per layer for norm preservation, but that requires a lot of custom code.
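For anyone curious what the runtime version of that looks like: a minimal PyTorch sketch of projecting a refusal direction out of a layer's hidden states, then restoring each token's original norm. All names here are illustrative assumptions (refusal_dir, where you hook it), not anyone's published implementation.

[code]
import torch

def make_abliteration_hook(refusal_dir: torch.Tensor, preserve_norm: bool = True):
    # refusal_dir: assumed to be a unit vector of shape [hidden_dim],
    # e.g. a difference-of-means between "refusing" and "complying" activations
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        norms = hidden.norm(dim=-1, keepdim=True)            # original per-token norms
        proj = (hidden * refusal_dir).sum(-1, keepdim=True)  # h . r
        hidden = hidden - proj * refusal_dir                 # classic abliteration
        if preserve_norm:
            # the extra per-token, per-layer scalar the post mentions:
            # rescale each token back to its pre-edit norm. Baked into the
            # weights this correction is high rank, hence no plain LoRA.
            hidden = hidden * norms / hidden.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# hypothetical usage on a HF-style decoder stack:
# for layer in model.model.layers:
#     layer.register_forward_hook(make_abliteration_hook(refusal_dir))
[/code]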
I like my LLMs how I like my women >:)
>>108046763
Naked in groups of 8 and chained to a radiator?
>>108046747
>might be using the same datasets or distilling from each other
What is that subgenre of incest called?
>>108046693
Nyoo~!
radical (2mw) wait loss
>>108046763
https://www.justice.gov/epstein
>yann lecun
>3 pages of results
CAT INTELLIGENCE SISSIES ?!?!??!?!
these new gens don't quite hit the same as the old ones
apparently some anon registered a nonprofit to remake Anima under Apache 2.0 with a larger dataset and a better encoder
>>108046922
is he going to change to LLM-style prompting or keep the tag retardation?
I need an image editing model benchmaxxed in typesetting manga
>>108046817
Half of that is just the same e-mail over and over again. You lost, chud.
>>108046964
tags make more sense; then just train controlnets. the NLP in Anima is broken and tends towards slopstyle anyway. I'm pretty sure the LAION dataset the original model used is the only thing tagged in NLP, which is why it gets so 2.5D when using them
How much data would I need to train models on natural-language tasks (mostly for understanding the structure of text in a document) while also providing enough data for them to infer that "Jane, Doe" is a name and "Los Angeles, California" is a place, and things of that nature? I've trained a small (I think ~1B parameters?) BERT model to do natural-language classification, but the task/problem was very simple and I think I made only ~500 examples to fine-tune it on.
>>108046964
https://huggingface.co/circlestone-labs/Anima/discussions/9#69812bd9511f2d67952084ae
>>108047028
nevermind, this is much more retarded than I thought
>>108046829
Catbox?! PLEASEEEEE
>>108047020
Grab the checkpoints from EleutherAI and find out.
Or see what people have done training models from scratch.
But the answer is probably a few gigs of text?
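For scale reference: if you only need token classification (names, places) rather than pretraining from scratch, you need way less than a few gigs. CoNLL-2003 is ~14k tagged sentences and a BERT-base learns PER/LOC/ORG from it fine. A minimal sketch with HF Transformers; model choice and hyperparameters here are just defaults, not a tested recipe.

[code]
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

# CoNLL-2003: ~14k tagged training sentences, the usual yardstick
# for "how much data does NER need".
ds = load_dataset("conll2003")
label_names = ds["train"].features["ner_tags"].feature.names

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_names))

def tokenize_and_align(batch):
    enc = tok(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        # subword pieces inherit their word's tag; special tokens get -100
        all_labels.append([tags[w] if w is not None else -100
                           for w in enc.word_ids(i)])
    enc["labels"] = all_labels
    return enc

ds = ds.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("ner-out", per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    data_collator=DataCollatorForTokenClassification(tok),
)
trainer.train()
[/code]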
>>108047028
that isn't the apache2 dev
>>108047095
really couldn't care less about your failbake, ani. come back when you have a trained model ready to show
>>108047028
that author wants to grift his license onto all derivative models
SimpleGPT: Improving GPT via A Simple Normalization Strategy
https://arxiv.org/abs/2602.01212
>In this work, we revisit Transformer optimization through the lens of second-order geometry and establish a direct connection between architectural design, activation scale, the Hessian matrix, and the maximum tolerable learning rate. We introduce a simple normalization strategy, termed SimpleNorm, which stabilizes intermediate activation scales by construction. Then, by analyzing the Hessian of the loss with respect to network activations, we theoretically show that SimpleNorm significantly reduces the spectral norm of the Hessian, thereby permitting larger stable learning rates. We validate our theoretical findings through extensive experiments on large GPT models at parameter scales 1B, 1.4B, 7B and 8B. Empirically, SimpleGPT, our SimpleNorm-based network, tolerates learning rates 3-10x larger than standard convention, consistently demonstrates strong optimization stability, and achieves substantially better performance than well-established baselines. Specifically, when training 7B-scale models for 60K steps, SimpleGPT achieves a training loss that is 0.08 lower than that of LLaMA2 with QKNorm, reducing the loss from 2.290 to 2.208.
https://github.com/Ocram7/SimpleGPT
No code yet. Might be cool. On a second look, they only report loss and no benchmarks for the actual models, so it's a little iffy.
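Since there's no code and the abstract gives no formula, the mechanics are anyone's guess. If "stabilizes intermediate activation scales by construction" means pinning the residual stream's RMS after each block, the whole trick might be something as small as the following. This is purely speculation on my part, NOT the paper's actual SimpleNorm.

[code]
import torch

def simple_norm_guess(x: torch.Tensor, target_scale: float = 1.0) -> torch.Tensor:
    # Speculative stand-in for SimpleNorm (real code unreleased):
    # force the residual stream to a constant RMS after every block,
    # so activation scale, and with it the Hessian spectral norm the
    # paper analyzes, can't drift with depth, allowing larger LRs.
    rms = x.pow(2).mean(dim=-1, keepdim=True).sqrt().clamp_min(1e-6)
    return x * (target_scale / rms)
[/code]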
Sorry, but as punishment for something on another board I am going to post furry story slop here to trigger a panic attack in a Russian shitposter and ruin his "comfy" hangout for him.
Does anyone care about this thing? I fail to see how it can be useful to anyone.
>>108047301
kill it with fire
I'm actually interested in this:
https://huggingface.co/stepfun-ai/Step3-VL-10B
https://huggingface.co/seanbailey518/Step3-VL-10B-GGUF
there's already someone working on a llmao.cpp PR... I really needed something to replace Qwen3 VL 8B, and this looks like a major upgrade. Have any anons tested it?
>>108046922
based open source chad
Woops
huggingface.co/zai-org/GLM-OCR
http://ocr.z.ai
>With only 0.9B parameters, GLM-OCR delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.
https://x.com/Zai_org/status/2018520052941656385
>>108047412
DeepSeek-OCR-2 obsolete already after only a week.
>>108047412
we need the Japanese PC-98 or whatever screen captioning test
>>108047431
found it
>>108047418
oofs where?
>>108047455
>>108047484
trash
>>108047484
shame; on the first line, 1 wrong char. everything else is good
>>108047484
I'm only seeing one fuck-up. End of first line. Ba instead of Po.
>>108047484
せっかく労働を券ってやったのに無視された……(しょばん)
まあ、警視庁が都案を快く思ってない事くらい、よおおおくわかってますよ!
I'll include the text here too. 券 on the first line is wrong.
(rough TL: "I went to the trouble of buying your labor and got ignored... (sulk) Well, I know fuuull well the Metropolitan Police don't think kindly of the proposal!")
>>108047484
I count 5-6 mistakes.
>>108047513
How many mistakes did DeepSeek and dots make?
>>108046563
https://medium.com/@cooksusan482/deepseek-engram-explained-2026-guide-452deb903589
man, if only DeepSeek saved local.
though at that point RAM may become more expensive than GPUs kek
>>108047531
>ai slop medium article
>>108047513
Oh wait, nvm, I was looking at the wrong text (I had transcripts locally). Looks like it's just three mistakes. Not the worst. Not the best.
>>108047523
I don't know/remember.
>>108047574
yeah, I don't really care. I shared the first thing mentioning Engram, which is what you should care about:
https://github.com/deepseek-ai/Engram
Can someone recommend to me what models I should be using for chatbot + image generation?
Specs:
RTX 3090 24GB, RTX 5080 16GB
i7 12700K
64GB DDR4 3200 MHz
Currently using DeepSeek R1 70B Q3KS & PonyXL.
Thanks bros
>>108047607
GLM Air and Anima
>>108047412
Are there any decent multimodal models that are strong in OCR and document understanding as well as natural language?
>>108047783
you could theoretically set up a pipeline where you have OCR models (deepseek/glm/dots) feed their output to an actual LLM. why do you want one model to be able to do everything? specialization > generalization
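Roughly like this, assuming both models sit behind OpenAI-compatible servers. The URLs, model names, and the stage-1 prompt are placeholders; the deepseek/glm OCR models each have their own preferred transcription prompt, so treat this as a sketch of the wiring, not a drop-in script.

[code]
import base64
import requests

# Hypothetical local endpoints; point these at wherever you serve each model.
OCR_URL = "http://localhost:8001/v1/chat/completions"   # OCR specialist
LLM_URL = "http://localhost:8002/v1/chat/completions"   # general LLM

def ocr_then_ask(image_path: str, question: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    # Stage 1: the OCR specialist only transcribes.
    ocr_text = requests.post(OCR_URL, json={
        "model": "ocr-model",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Transcribe all text in this document."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    }).json()["choices"][0]["message"]["content"]

    # Stage 2: the general LLM reasons over the transcription.
    answer = requests.post(LLM_URL, json={
        "model": "general-llm",
        "messages": [
            {"role": "system", "content": "Answer using only the document text."},
            {"role": "user",
             "content": f"Document:\n{ocr_text}\n\nQuestion: {question}"},
        ],
    }).json()["choices"][0]["message"]["content"]
    return answer
[/code]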
>>108047635
apache2 Anima, right? it's not out yet
>>108047788
fuck off retard
>>108047802
why am I retarded?
https://x.com/ComfyUI/status/2018442042859540602
What will the announcement be?
>>108047868
acestep prolly
>>108047301
What's it called when you sell open-source shit but don't actually provide the information to complete the project without paying for it? Apparently the software's available and it uses an RPi 4, but there's no info on the hardware aside from cutting them a check.
>>108047961
it's 100% a grift to extract money from investors
looks like Step 3.5 Flash is getting llama.cpp support; tokens per second look promising: https://github.com/ggml-org/llama.cpp/pull/19283
>>108048416
>Same active params as Trinity Large
>Half the total params
>Fast
It's gonna be even more retarded than Trinity was. But it'll be retarded at like 6 times the speed if one of the two 6-month-old PRs for MTP ever gets merged (they won't).
>>108047868
Gender reveal
>>108048416
>tfw no PR open for the vision model
>>108048599
>parallel reasoning
so, implemented in llama.cpp never ever
is the LLM the ultimate form of rote learning?
>>108048473
What's the current meta? Is Trinity close to GLM?
>>108047868
Who cares, I'm still maintaining my 2023 install from before it got sloppified
>>108048639
nobody fucking knows yet. case in point:
>>108048473
>It's gonna be
>>108048646
Your plan is to gen exclusively with SDXL for the rest of time?
>>108047360
I'm currently only testing speed. On an RTX Pro 6000 + 2x 5090, at ~12K tokens:
prompt eval time = 4892.51 ms / 11315 tokens (0.43 ms per token, 2312.72 tokens per second)
       eval time = 12991.86 ms / 1339 tokens (9.70 ms per token, 103.06 tokens per second)
      total time = 17884.38 ms / 12654 tokens
>>108048674
oh wait, that's the VL model; I'm testing https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4
>>108048639
>What's the current meta?
GLM. Nemo if you're poor. Kimi if you're rich.
>Is Trinity close to GLM?
Not even close. It's unaligned but it's dumb as dogshit. Side by side you might actually not be able to tell the difference between it and Nemo, which is ~40x smaller.
>>108048656
>nobody fucking knows yet
It can be run in the forked version of llama.cpp or if you pull and compile from the PR, plus it's been up on OR since release. It's not impressive. Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLMs.
>>108048699
Grim. Even toss-20 knows about the thread.
>>108048699
>not trained on 4chud
into the trash
>>108048783
Weirdly enough though, it passes the mesugaki test.
>>108048661
You can update support for newer models yourself. In any case, SDXL/Pony-based models are still the best out there if you don't care about making catfish profiles with zits for your Mumbai-based scam centre.
Hell, I still use 1.5 for some things; there are 1.5 workflows that have their own unique strengths. image gen is a creative endeavour
>>108048882
>SDXL/pony based models are still the best
LOOOOOOOOOOOOOOOOOL
>>108048887
>But saar, you cannot redeem the photorealistic 1girl to farm Google Play cards on the internets
Okay, here's your last (you) from me lest we derail the thread
>>108048918
NoobAI/Illustrious are good, not Pony
Oh it's a shill
>>108048929
>Both SDXL based models
Retard
>GLM 5 comes out
>it's even more censored than GLM 4.7
NAI stays winning.
>>108048983
>>108048953
>Can't tell the difference with pony
Retard
>>108048918
weird poorfag cope but ok
>>108048983
The only Lunar New Year release that is worth being excited for is V4.
>Join back to lurking thread after hiatus
>Still posts about GLM
Is it really just one or two guys shilling this dogshit? Even reddit has wised up after the initial shilling. I will continue to shit on GLM until the parroting is fixed in a future version.
>>108048699
>Both GLM and Qwen3 know that /lmg/ is a 4chan thread about LLMs.
They're here.
>>108049125
What model should I use instead?
>>108049151
DeepSeek V3
DeepSeek R1
Kimi K2
Qwen3 (Yes, I know. Just give it a lot of Min P)
Mistral 2411 123B
Llama 3.3
Take your pick.
>>108049125
>I will continue to shit on GLM until the parroting is fixed in a future version.
Dogshit? I'm more surprised the main complaint is the parroting. It is genuinely not as bad as people say, especially with thinking on; whoever says it does not matter for RP cannot be saying it in good faith.
The bad part isn't the parroting; it's the amount of slop it produces. Its prose faintly smells of ozone and... something else—disappointment?—with long shadows being cast and knuckles whitening. Most people would have noticed this.
I want to strangle this slop machine. Just kidding. Mostly. Unless you ask me to.
But it's the most coherent thing we have in this parameter range.
So, what model are we waiting for next? Or are you just going to keep complaining about it on an imageboard for losers? Go on, I'm waiting.
>>108049183
>Dogshit? I'm more surprised the main complaint is the parroting.
>Dogshit?
This nigga just used GLM to reply to me.
>>108048639
Trinity is fucking retarded
>>108049183
>;
>—
>>108049169
I personally use Qwen3 235B because I can run it at my reading speed while GLM is just under it, but in every test I've ever run while trying to boost that speed, GLM's responses have been noticeably smarter.
I've also yet to see any of this parroting behavior mentioned here, but that may be because my tests were either one-shots or additions to full-context logs.
There's a possibility it's also because my default system prompt explicitly bans responses from including or repeating anything the user says, because the 2501 Mistrals were cunts for that.
>>108049125
I had ego death because of GLM. I will shill it till I die.
>>108049169
Which has the least lobotomized decensor? I use K2 for assistant stuff, but I just want an easy drop-in replacement for personal stuff, and GLM 4.7 Prism works the best for me at the moment.
It's sloppy, which I hate, but it seems to have better understanding than various random Llama 3.3 70B finetunes / Mistral 2411 123B / abliterated MiniMax M2.1.
>>108049197
>>108049207
And that was all you noticed?
we should go for world models, not LLMs. a world model could be a simulation of life and the world, with NPCs that talk to you. it would make a great RPG.
>>108049218
DeepSeek and Qwen3 yield good results, but DeepSeek demands a lot of RAM, and Qwen3 235B (the one I'm suggesting) takes a lot of troubleshooting to get rid of the purple prose, but at least it's possible to get rid of in the first place.
Step 1 of making a model that is good at writing is to simulate the universe.
>>108049233
I'm skeptical but I'll try again.
My previous experience with 235B 2507 Instruct was not very good. It kept inserting random Chinese characters in various places where it shouldn't, although perhaps this was exacerbated because I used both Chinese and English text in my prompt. I did request it to answer in English only at the end of the prompt, though, and GLM (Q4) and K2 (Q3) didn't have any issues with that. I also encountered that issue with other Qwens: 30B, 32B, and 2.5 72B.
Quantization shouldn't have been the issue, right? I was running Qwen at Q8 and GLM at Q4 was fine.
Maybe I'll try DeepSeek instead, but I heard the non-thinking DeepSeek was inferior to the thinking version? GLM and Kimi can barely hit 12 tokens/s on my system, so I don't want to use thinking if possible, especially since DeepSeek has more active parameters.
>>108049285
>Quantization shouldn't have been the issue right?
It's more likely to be your samplers.
>>108048983
you dropped this
>>108049295
Currently temp 0.6, top-p 0.95, top-k 20 for all models I'm using. What do you recommend?
>>108049285
Q8 is only 2% error IIRC. Random Chinese is usually an issue with your samplers. It happens in other models too when the settings are too crazy.
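Easy way to confirm it's the samplers: run the same prompt and seed against llama-server's native /completion endpoint, once with neutralized samplers and once with your settings, then eyeball where the stray Chinese appears. Field names below are llama.cpp's own; the prompt is whatever triggered it for you.

[code]
import requests

URL = "http://localhost:8080/completion"  # llama-server's native endpoint
PROMPT = "..."  # the mixed Chinese/English prompt that produced stray characters

# Neutralized vs. suspect settings; llama.cpp's /completion accepts
# these sampler fields directly in the request body.
configs = {
    "neutral": {"temperature": 1.0, "top_k": 0, "top_p": 1.0, "min_p": 0.0},
    "mine":    {"temperature": 0.6, "top_k": 20, "top_p": 0.95},
}

for name, samplers in configs.items():
    r = requests.post(URL, json={"prompt": PROMPT, "n_predict": 256,
                                 "seed": 42, **samplers})
    print(f"--- {name} ---\n{r.json()['content']}\n")
[/code]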
>>108048983
>ahead of Lunar New Year
That's in June
>clueless retards are calling Chinese New Year "Lunar" for political reasons
>>108049325
>for all models
You are why people crying about models sucking is just noise.
>>108049325
>What do you recommend?
Depends on what exactly you're wanting. I'm messing with these settings for erotic fucking. It's not perfect but it's getting there.
>>108049349
k thx
>>108049366
Thanks, I'll try this.
I'm cooking with Qwen3 TTS using the voice designer. Anyone find anything better for gooning?
https://voca.ro/1hgXFe2ZzeHX
>>108049366
>ALL the penalties
>minp 0.4
wow
>>108049385
he's an expert that knows better than the people that trained it, so leave him alone
>>108049400
>Using rep pen at the same time as DRY
>Using rep pen at all
>Min P on a qwen3 model
>no top k
>DynTemp
>8k context
>>108049400
he's not using DRY, actually
>>108049385
>>108049400
Qwen3 writes like an ADHD child on a sugar high. I have to whip it like an abusive father to get it to focus.
>>108049416
Post output side-by-side with zeroed-out samplers. I bet all you've done is make it retarded.
>>108049430
Fuck it.
System prompt:
>Your response must be one paragraph between 100 to 150 words. Keep the story engaging and interesting. Do not decide what {{user}} says or does.
>>108049536
Top is better; bottom is still full of slop but drier and more schizo BS.
Shadows lengthen around her like submissive attendants? Really?
>>108049536
>>108049732
Actually, re-reading, top and bottom are equally schizophrenic and full of slop, but top has more interesting descriptions; bottom feels dumber.
https://github.com/archi-physics/archi/blob/main/examples/deployments/basic-gpu/config.yaml
MIT particle physicists use Qwen2.5-7B-Instruct-1M. Let me guess: you need more.
>>108049806
Modern physics is mostly just hallucinating random shit that barely explains anything, so it checks out.
GLM 5 is going to be a finetune of GLM 4.7.