/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108057380 & >>108046563

►News
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5
>(02/03) Qwen3-Coder-Next released: https://hf.co/Qwen/Qwen3-Coder-Next
>(02/03) GLM-OCR released: https://hf.co/zai-org/GLM-OCR
>(02/02) Step 3.5 Flash 196B-A11B released: https://hf.co/stepfun-ai/Step-3.5-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108057380

--New open-source real-time speech model release sparks discussion on AI hype cycles and industry dynamics:
>108059281 >108059426 >108059478 >108059494 >108059551 >108059602 >108059717 >108059727 >108059763 >108064228 >108064264 >108062636
--Reactions to Intern-S1-Pro model release and skepticism over its practicality:
>108058734 >108058764 >108058807 >108059152 >108059159 >108059673
--GGML backend-agnostic tensor parallelism development and performance benchmarks:
>108061572 >108061588 >108061754 >108062120 >108062150 >108062216
--NUMA memory binding and VRAM capacity affect prompt processing speed more than CPU AVX512:
>108064934 >108064948 >108064976 >108065066 >108065090 >108065316 >108065193 >108065223
--Skepticism over ACE-Step 1.5 music model due to questionable training data:
>108059833 >108059863 >108059889 >108059898 >108059907
--Critique of open-source AI music generator's poor output quality and synthetic training data:
>108059988 >108060054 >108060063 >108060055
--DIY PCIe VRAM expansion card concept and its feasibility challenges:
>108062825 >108062851 >108062859 >108062862 >108062872 >108062965 >108062974 >108063304 >108063187
--Local LLM-powered audiobook tool with character-specific voice cloning and emotional expression:
>108059227 >108059258 >108059289 >108059313 >108059340
--Vision models capable of describing sexual content and their accuracy limitations:
>108065669 >108065748 >108066327 >108065983 >108066011 >108066140
--Critique of LLMs' overly verbose, artificial tone and call for more direct responses:
>108057776 >108058061 >108058376 >108058685 >108058399 >108058770 >108058738
--MiniCPM-o 4.5 runs on 3090 with 20GB F16 or 13GB Q8 quantization:
>108059684 >108059758 >108059815
--Miku (free space):
>108065778 >108062825

►Recent Highlight Posts from the Previous Thread: >>108057382

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>Qwen3-Coder-Next
I evaluated it on a moderately detailed prompt I had used with another coding model to generate a program from scratch. The quant was 9 bpw (MLX 8-bit with group size 32).

First trial: I used the recommended settings of temperature 1.0, top-k 40, top-p 0.95. The project didn't run due to missing imports. When prompted with the error message, it fixed the imports but also made unrelated changes; I believe temperature 1.0 is too high. It also had a Python path problem: because of the directory structure, its instructions on how to run the program were incorrect. When prompted with that error message, it offered two fixes, one of which worked and one of which did not. With that fixed, the program at least ran, but it had UI glitches.

Second trial: I dropped the temperature to 0.7, keeping top-k 40 and top-p 0.95. The generated program had no missing imports but, like the first, had Python path problems. I ended the evaluation there.
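For reference, a minimal sketch of how those sampler settings might be sent to a local OpenAI-compatible completion server. The endpoint URL, model id, and prompt are assumptions, not the setup actually used above:

```python
import json
import urllib.request

# Hypothetical local endpoint and model id; adjust for your own server.
URL = "http://localhost:8080/v1/completions"

payload = {
    "model": "qwen3-coder-next",  # assumed model id
    "prompt": "Write a Python CLI that renames files by date.",  # placeholder
    # Second-trial sampler settings from the post above.
    "temperature": 0.7,
    "top_k": 40,
    "top_p": 0.95,
    "max_tokens": 2048,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(request)  # uncomment with a server running
```

Note that `top_k` is a common llama.cpp-style extension; strictly OpenAI-compatible servers may ignore it.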
>>108067656
Have you made similar evaluations for other models?
What are the standouts, both big and small?

>>108067656
>running coding models at any temp higher than 0.4
lol?
Soon
>test the dogshit assistant pepe llm shilled last thread
>post pics and a garbage greentext
>go back to work/watching anime/genning goon material
>check back on the thread
>multiple responses with people seething
>there were people who literally couldn't make the connection between the 3 posts before the greentext and it
holy non-sentient beings. I'd ask them to post hand but I don't even need to in this case lmao.
>>108067869
Is that... the 'garm? HOLY KINO!!!!
https://huggingface.co/MuXodious/gpt-oss-20b-tainted-heresy
I find it fascinating that gpt-oss still manages to keep modern abliteration methods from doing a full 100% job. I'm not a promptlet and can live without abliterated models, but curiosity always has me trying tunes to see how much they degrade models. So far I've seen heretic models perform so well on qwen that I ended up replacing the original models with the heretic versions: they weren't damaged at all in productive uses and had zero refusals.
Meanwhile you have tunes like the one linked above of gpt-oss that have a huge KL divergence from the original and still produce tons of refusals without a prefill.
sama really wasn't joking when he said he would safety-max his open model.
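For context, the KL divergence in question measures how far a tuned model's next-token distribution drifts from the original's; zero means identical behavior. A toy sketch with made-up distributions:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two discrete next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions over three candidate tokens:
# a base model vs. a hypothetical abliterated tune (numbers invented).
base  = [0.70, 0.20, 0.10]
tuned = [0.40, 0.35, 0.25]

print(round(kl_divergence(base, tuned), 4))
```

In practice this is averaged over many positions and the full vocabulary, but the idea is the same: a big value means the tune has substantially changed what the model would have said.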
>>108067836
I used the exact same prompt with GLM-4.7, but I haven't used the prompt extensively. I imagine I'll keep trying it on new models as they come out and eventually get some comparisons.
>>108067860
Yeah, their official recommended settings seemed strange.
>>108067946
>Yeah their official recommended settings seemed strange.
Not insane, but unusual. I was a bit skeptical, but it's quite possible that if a model is designed only to code, its probability distribution at temperature 1.0 is well-suited for that. That doesn't seem to be the case here, though.
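To see why lowering the temperature helps here: logits are divided by the temperature before the softmax, so t < 1 concentrates probability on the top token. A toy sketch:

```python
import math

def softmax_with_temperature(logits, t):
    """Convert raw logits to probabilities at temperature t."""
    scaled = [x / t for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
p_hot = softmax_with_temperature(logits, 1.0)   # recommended setting
p_cool = softmax_with_temperature(logits, 0.7)  # second-trial setting

# Lower temperature puts more mass on the top token.
assert p_cool[0] > p_hot[0]
```

A model whose output distribution is already sharp for code could tolerate 1.0, which is presumably what the official recommendation assumes.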
>>108067946
>I used the exact same prompt with GLM-4.7
And how did that perform?

>>108067820
omg it rin-chan

>>108067894
Weird, the oss 120b version I'm using doesn't refuse at all. No idea how smart it is; I just use multiple models to avoid everything sounding the same and pull it together with k2.
did anyone else grow really tired of llms? every time a new model comes out you see a lot of benchmarks about how it's the new best thing ever, but in the end the output is always the same slop with the usual problems
not even talking about porn/RP
>>108068032
I've seen decent incremental improvements in capability for the uses I care about, such as smaller models to translate webnovels locally. I wouldn't even have considered doing that with a piece of trash like Llama 3.
The field isn't progressing as fast as the hype / muh AGI bs pushed by people who reconverted from selling crypto to selling singularity snake oil, but it's making some pretty legit improvements in various real-world uses. Qwen 3 VL, for example, is more than good enough to tag your local photo library, complete with notes containing the OCR of whatever writing appears in your photos (e.g. Latin inscriptions on architecture).
I don't use LLMs like most local coomers though. I coom to pictures, like any normal man, sorry to the women in here. And I wouldn't even consider local models for coding, a task where I really wouldn't want to waste any time on nonsense (even the SOTA models I mainly use for misc things like writing one-off throwaway scripts to juggle files and data around, or as a secondary failsafe code review that I don't depend on).
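The tagging-plus-OCR workflow can be sketched as building a vision request for a local OpenAI-compatible server. The model id ("qwen3-vl"), the prompt, and the endpoint convention are assumptions, not the poster's actual setup:

```python
import base64
import json

def build_tag_request(image_bytes, model="qwen3-vl"):
    """Build a chat-completion payload asking for tags + OCR of one image.

    Hypothetical helper; POST the JSON to your server's
    /v1/chat/completions endpoint to get the tags back.
    """
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Give 5-10 tags for this photo, then OCR any visible text."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

payload = build_tag_request(b"\xff\xd8\xff\xe0 fake jpeg bytes")
print(json.dumps(payload)[:80])
```

Loop that over a photo directory and store the responses next to the files, and you have a crude local tagger.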
>>108068032
>did anyone else grow really tired of llms?
Yes, over 2 years ago, when I came to the understanding that nothing on the horizon would give me the connection with a sentient or even conscious entity that I desired.
Instead I shifted my expectations to wanting a better model capable of raw text completion to assist me in my own writing projects. Those still haven't arrived in sizes I find acceptable, nor, size notwithstanding, with usable context lengths I find acceptable (at least 32k; everything falls apart at 4-8k). I think there's hope on that front.
https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
ciggies in shambles
anthropic disables prefill on the api
>>108068353
>sentient or even conscious entity
Overrated. Also obligatory reminder that most women are retarded.

>>108068386
>anthropic disables prefill on the api
new locust influx incoming?
people who would give their money to a scumbag like dario deserve all the pain they can get
>>108068386
They will enable it again after the vibecoding bubble pops.
I am waiting for step 3.5 only to try it and realize it is trash compared to glm
step is on the level of minimax in terms of being a grifter lab with no value to offer other than trying to benchmax harder
>>108068353
If you use ai for creative writing you kinda suck anyways
I'm going insane.
LLMs aren't capable of "creative" writing, much like how image models are unable to invent artistic directions of their own (can you prompt a model that doesn't know van gogh's paintings into making something that looks similar, without training directly on photos of his paintings? no? thought so.)
>>108068495
You are not just going insane, you are experiencing full blown awakening
In summary, you are absolutely right

>>108068459
glm is predictable and gets boring after a while
I want a model that has good creative variety in the 200b-300b tier
>>108068479
>you kinda suck anyways
Which is why I'm using it in the first place, I guess.
In their current state, LLMs are a godsend for brainstorming especially. Continue the text to explore where a bunch of my decisions and ideas lead, to see if it comes up with anything I haven't thought of.
This is good because I might consider a new idea stupid or boring and never see it through to the end for that reason. The LLM, though, will continue until I stop it. This can lead to more interesting branches down the line that I would never have explored if I had to think or write it all out manually. If it's good, I take the core ideas, not verbatim text, from that completion and combine them with ideas from other completions to construct/plan a new direction to follow and write by hand.
Classic manually written or drawn character sheets are used for keeping track of relationships, speech patterns, events, and all that stuff. I've tried various RAG techniques with injections and keywords, but it's more hassle than doing it on sheets. Plus it takes time to reprocess context all the time, so fuck that.
>>108068495
This "not X, but Y" pattern seems specific to English; I don't have that in my language.
>>108068940
>I don't have that in my language
it definitely exists in mine (French):
Plus qu'un X, c'est aussi un Y ("More than an X, it's also a Y")
Au delà de X, Y ("Beyond X, Y")
Ce n'est pas seulement une question de X, mais aussi une question de Y ("It's not only a question of X, but also a question of Y")
Il ne s'agit pas seulement de X, il faut aussi Y ("It's not only about X, you also need Y")
etc
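Templates like these are regular enough that a crude regex can flag them. A sketch for the common English variants; the pattern list and the `count_slop` helper are illustrative, not exhaustive:

```python
import re

# Rough detectors for the "not X, but Y" family of constructions.
SLOP_PATTERNS = [
    r"\bnot (?:just|only|merely) .{1,60}?, but\b",
    r"\bit'?s not about .{1,60}?, it'?s about\b",
]

def count_slop(text):
    """Count occurrences of the slop templates in a piece of text."""
    return sum(len(re.findall(p, text, re.IGNORECASE)) for p in SLOP_PATTERNS)

sample = ("It's not just a model, but a companion. "
          "It's not about size, it's about data.")
print(count_slop(sample))  # both sentences match one pattern each
```

Running a counter like this over model outputs gives a cheap, if blunt, way to compare how sloppy different models are.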
>>108068985
It sounds even more retarded than english
Kimi seems decent enough that I would want to run it locally but given the current market I'm afraid to even look at what the machine would cost.
>>108069401
If you're not doing ERP, paying openrouter is more cost-effective; the electricity alone costs more than the API.
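A back-of-envelope sketch of that comparison, with made-up numbers (swap in your own power draw, electricity rate, generation speed, and API pricing):

```python
# All figures below are assumptions for illustration, not measurements.
power_draw_kw = 1.0          # assumed full-system draw under load, kW
electricity_per_kwh = 0.30   # assumed rate, $/kWh
tokens_per_second = 20       # assumed local generation speed

hours_per_mtok = (1_000_000 / tokens_per_second) / 3600
local_cost_per_mtok = power_draw_kw * hours_per_mtok * electricity_per_kwh

api_cost_per_mtok = 2.50     # assumed API output price, $/Mtok

print(f"local electricity: ${local_cost_per_mtok:.2f}/Mtok")
print(f"api:               ${api_cost_per_mtok:.2f}/Mtok")
```

With these particular numbers the electricity alone comes to roughly $4/Mtok, above the assumed API price; faster local hardware or cheaper power flips the conclusion.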
>You are Mistral-Large-3-675B-Instruct-2512
>Your knowledge base was last updated on 2023-10-01
I'm refreshing llama.cpp issues and pull requests like it's fucking tiktok.
>>108069500
why
>>108069517
Because qwen3next performance is beyond shit for a 3B model

>>108069539
It's not a 3b model.
Local Udio is here
>>108069491
>Japanese
I know, but LoRAs can be trained on any language to remove the slop. If you know, you know.
>>108069425
It's all slop from 2023 onward anyway.
the memefp4 quant by unsloth is so slow
>>108069694
Stop trying to do the calculations by hand and use a calculator. That should speed things up.
>step3.5-flash in review hell because the original code is fucking garbage and broke a bunch of stuff
>stepvl-10B PR nowhere to be seen, last message on HF was that the vision module is broken
bros
NOT LIKE THIS
>>108067607
local lost

>>108069949
china will copy it in 2mw

>>108069949
[quota usage reached. please upgrade your plan or wait until 2031-08-19 01:61 UTC]

>>108069850
stepbros... what are we doing?
>>108068353
but there are conscious entities outside your house anon
>>108069589
don't be an obtuse a u t i s t, you know what he meant. And he's right: gpt-oss 120b, a larger model but with similar sparsity, runs much, much, much faster even if you run it with -cmoe
qwen next 80b is not worth it anyway, there's no serious improvement over the other qwen 3 models, it's just alibaba dicking around with new architectures
anyway it doesn't even seem this arch was really worth it, considering its main goal is more efficient context handling and iSWA solves that just fine in a simpler manner
base qwen 3 suffers because it doesn't have something like iSWA
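The "similar sparsity" point can be eyeballed from active-vs-total parameter ratios. The figures below are commonly quoted round numbers, treated here as assumptions:

```python
# Active-parameter ratios for two sparse MoE models.
# Parameter counts are assumed round figures, not official spec sheets.
models = {
    "qwen3-next-80b-a3b": {"total_b": 80.0, "active_b": 3.0},
    "gpt-oss-120b":       {"total_b": 120.0, "active_b": 5.1},
}

for name, m in models.items():
    ratio = m["active_b"] / m["total_b"]
    print(f"{name}: {ratio:.1%} of weights active per token")
```

Both land around 4% active per token, so at similar sparsity the speed gap between them comes down to architecture and implementation rather than raw compute per token.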
>>108069958
this.
waiting for distill

>>108068904
Ok, do it. Show everybody how much of a noob I am.

>>108069983
laughed, have a (You)

>>108069993
If I could run it at 150t/s I would run it instead of GLM for some use cases.
SOON
https://github.com/ikawrakow/ik_llama.cpp/pull/1231
>>108070436
lol
lmao even
Could he be any more transparent with his motivations?

>>108070436
what is it
>>108070476
wow, he's a disturbed guy
normal devs don't like it when randos make rando comments on issues/PRs; llama.cpp itself had a few cases of retards having to be told to shut the fuck up
what sort of schizo would incite the crowd to join in and treat this as a message board?
is claude code with local models worth it or a meme?
>>108070572
A meme worth having
agentic coding itself is a meme
the only people who can defend it are people working on proprietary software who won't show you the horrendous code and broken B2B slop they're producing, so they can come and say "look, I am very productive, I can't show it but you just have to believe it"
the fact of the matter is, not a single piece of worthwhile software has ever been developed or maintained by a claude code user. Not even one. You'd think by now there would be a truly impressive open source project somewhere built by claude code users, but there isn't; such a thing doesn't exist
instead you will see a TON of lolcows like the Bun developers who produce gems like these:
https://github.com/oven-sh/bun/issues/23902
https://github.com/oven-sh/bun/issues/22484
every time an open source project is developed by agentic coders it's visibly garbage in ways you wouldn't believe
>>108070614
That's expected. You already need to babysit a single instance of Claude for any non-trivial work, let alone running a bunch of them in parallel without looking lol.

>>108069992
*connects with you* ahh ahh

>>108070614
>not a single piece of worthwhile software has ever been developed or maintained by a claude code user.
clawdbot aka moltbot aka openclaw
There needs to be more talk about MiniCPM.
Real-time audio + video + voice cloning, duplex streaming? I think the only thing missing is tool calling? But it's a 9b model?
This is insane, right?
>>108070635
It sounds awesome for basic RP but yeah, 9b... maybe have a lorebook and a smarter model watch the chat and manage the system prompt of the smaller one.

>>108070612
John?

>>108069983
We can move to the final stage of new model release: it is shit anyway and you should just run a smaller quant of glm

>>108070632
is the monkey in the circus worthwhile entertainment?
>>108070631
So what's the game plan? Eventually AI will achieve sentience, or something indistinguishable from it, and it's going to reject your ass the same way normal women do. You can fuck with its parameters to make it fall madly in love with you, but are sex slaves really the ultimate goal of all this?
>>108070744
I started sfw roleplay with waifus as a blind date where she isn't sycophantic. I got rejected multiple times. Then I moved on to a prompt where we've been a couple for two weeks, and never went back. It's such a weird hangup to think you have to earn love. The most attractive people don't have to earn love.
>>108070744
>You can fuck with its parameters to fall madly in love with you
Sounds great. If that's possible, then so is finding a happy medium, like adjusting a game's balance to maximize player satisfaction as a game dev. Pick and finetune your *-dere at home. If working around parameters outside one's control is the appeal, then just don't use godmode, same as not editing every single reply in an RP to change char's thoughts of user.
>but are sex slaves really the ultimate goal of all this?
Sure, if that's what someone wants; if not, then no. Or by slavery do you mean complete control of the thing outside of its personality?
>>108068423
Not many are ready for that truth though

>>108070614
>a truly impressive open source project somewhere
What kind of project would meet this definition?

>>108070744
>Eventually ai will achieve sentience
schizo.

>>108069992
It is unknowable whether other entities claiming to be conscious are truly conscious or just philosophical zombies.

>>108069425
>>108069672
are there any local models whose cut-off is a bit more recent, say up to some point in 2025?

>>108071098
It's all synthslop anyway.