/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108278008

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
migu :3
>>108281695 sex
what's the best model ever?
>>108281704 the original pre-lobotomy c.ai model is still unmatched in terms of pure soul
>>108281704 davinci 003 for writing stories, this shit was absolutely insane
>>108281699 fuck yeah, glad to see Nitro+ XTX homies. what model you running right now?
>>108281730 link?
>>108281737 it's not available anymore, OpenAI nuked it
>>108281737 Dead, that's why we need local models.
>>108281695 >>108281688
>>108281704 it's a five-way tie between summer dragon, OG c.ai, mythomax, goliath, and midnight miqu
>>108281764 yet you don't use any of those, curious
>>108281771 when you "assume" it makes an "ass" of "u" and "me"
just woke up from my 12 hour coma. is qwen3.5 122b the new glm 4.5 air?
>>108281794 Prove him wrong?
what you think? he pay or he pray?
Tangential to /lmg/, but still pretty funny.
>>108281804 can you post pic i wanna try
>>108281804 cuckgpt
>all this time later
>still no actual pixelspace, VAEless image edit model
>still no big, good omnimodal models that can generate images in chat
>still no big, good, natively multimodal models that "see" the image fully and properly
>still no real time voice conversation that you can have with the big, good models where they will also understand how you said something, not just what you said
>still no basic real time 3d/2d avatars
>still no easy way to perfectly loop any image into an idle animation with ltx2/wan2.2
>still no good image 2 3d model
>ltx2 i2v still subpar
>even biggest models still get stuck on things, still can hallucinate hard
>still no solved, just works, RAG
>still no solved, just works, internet search with something like searXNG
>still no just-actually-works browser usage
>MCP clients are still spotty, especially paired with spotty tool calling
>still no 1mil perfect context
>still no 3-10mil ok context
>still no infinite context
>still no 1T params 1b active SSDmaxxer model
and hundreds more things
at least most big models are generally very good now and actually good enough to, with some help, vibecode most actual projects you want
at least early moeGODS and ramCHADS won
at least z image turbo came out and was a huge leap in multiple big directions, basically solved resolution, almost solved out of the box realism (centered around portraits), huge speed boost
at least ltx2 came out and was a big turn towards faster genning, getting out of 5s hell, getting out of 720p hell, getting out of no audio hell
at least the great seedance 2.0 came out to be distilled by ltx3 or some other company this or next year
at least genie 3 showed that proper 3d space memory can be solved
everything can and will be solved, but the lack of some more basic but important things like pixelspace image edit models or at least a basic 14-32b native speech2speech LLM seems interesting.
>>108281813 tldr?
>>108281811 Got the pic from /v/, but I believe it's >pic related
>>108281813 gpt 5.4 checks a few of those
I can run Qwen 27B at 1-1.5 token/s or Qwen 35B-A3B at 15 tokens/s.
>>108281829 gpt 5.4 doesn't exist
>>108281825 >>108281804 werks on my machine i guess
>>108281688 https://www.stephendiehl.com/posts/computer_algebra_mcp/
when tf will they add mcp support to llama.cpp aaah. any program recs?
>>108281838 I imagine that there's a whole chat context we don't see that probably steered the model towards that sort of response.
Hello fellow anons. I need help with my Qwen 3.5 27B Q5_K_M. For some reason it's not thinking with each response, only maybe 50% of the time, and I have to retry the response to get it to think. Really annoying. I'm using koboldcpp btw, is that the best backend? I used ooba previously but it seems dead.
>>108281704 Me.
local sisters, every time we start getting an edge the corpos fuck us in the ass. you are telling me they already have 5.4 sitting on a shelf?
Do people nowadays care if a model works with context-shifting or not?
>>108281877
>context-shifting
qrd
>>108281877 Yes. When you send a bunch of requests to the model with just the last message changing, that shit is really useful.
>>108281879 It's a feature in llama.cpp/koboldcpp that lets you avoid reprocessing the whole context once you reach the max context you have set.
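The bookkeeping behind that feature can be sketched in a few lines. This is a toy illustration, not llama.cpp's actual cache code; `CTX` and `KEEP` are made-up numbers standing in for the context size and the protected prompt prefix:

```python
# Toy sketch of context shifting: when the window is full, evict the
# oldest evictable token instead of reprocessing the whole context.
# NOT real llama.cpp code; CTX and KEEP are arbitrary toy values.

CTX = 8   # max context length (toy number)
KEEP = 2  # protected prefix, e.g. the system prompt tokens

def append_token(cache, tok):
    if len(cache) == CTX:
        del cache[KEEP]  # shift: drop oldest token after the protected prefix
    cache.append(tok)

cache = []
for tok in range(12):  # feed 12 tokens into an 8-token window
    append_token(cache, tok)

print(cache)  # -> [0, 1, 6, 7, 8, 9, 10, 11]
```

The real implementation shifts KV-cache positions rather than Python list entries, but the shape is the same: keep a protected prefix, evict the oldest tokens after it, and keep generating without a full reprocess.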
>>108281884 Qwen thinks otherwise, it seems.
>>108281897 no i dont
>>108281902 are you Qwen?
>>108281897 You mean how llama.cpp can't do kv shifting with ssm models? That'll probably get fixed eventually. Probably. Eventually.
>>108281804 Every single time I read chatgpt's output I want to kys myself and do an hero.
>>108281907 No, that's an rnn issue, and it can't be fixed. If you remove a single token from the start you have to reprocess everything.
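A toy sketch of why (illustrative only, not any real model's code): attention keeps one cache entry per token, so a prefix can be dropped and the rest shifted, while a recurrent model folds every token into one fixed-size state that cannot be un-mixed afterwards.

```python
# Why context shifting works for attention KV caches but not for
# recurrent (RNN/SSM) state. Purely conceptual; the "update rule"
# below is a stand-in, not a real model's recurrence.

def rnn_state(tokens):
    # A recurrent model folds the whole sequence into one fixed-size state:
    # state_t depends on state_{t-1}, so every prefix token is baked in.
    state = 0
    for t in tokens:
        state = (state * 31 + t) % (2**32)  # stand-in for the update rule
    return state

def kv_cache(tokens):
    # An attention model keeps one cache entry per token; entries are
    # independent, so a prefix can be dropped without touching the rest.
    return [(i, t) for i, t in enumerate(tokens)]

tokens = [5, 7, 9, 11]
shifted = tokens[1:]  # evict the first token once context is full

# KV cache: just drop the first entry and renumber (a cheap "shift").
shifted_cache = [(i - 1, t) for i, t in kv_cache(tokens)[1:]]
assert shifted_cache == kv_cache(shifted)

# RNN state: no such shortcut; the old state is useless and the whole
# remaining sequence must be reprocessed from scratch.
assert rnn_state(shifted) != rnn_state(tokens)
```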
>>108281913 says the lobotomite
>>108281804 jfc what did they do to make it sound like this
Why are normies so dumb? And obviously the luddites are throwing a party not realizing this is a skill issue.
>>108281891 koboldcpp has that functionality under "fastforwarding"; kobold's "context shift" purges old tokens from context when context is full.
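For reference, "fastforwarding" (prompt-prefix reuse) amounts to something like this toy sketch (illustrative, not koboldcpp's actual code): compare the new prompt against the cached token sequence and only process the tail that changed.

```python
# Toy sketch of prompt-prefix reuse ("fastforwarding"): only the part of
# the new prompt that differs from the cached tokens needs a forward pass.

def common_prefix_len(cached, new):
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached = [1, 2, 3, 4, 5]       # tokens already in the KV cache
new    = [1, 2, 3, 9, 10, 11]  # new request: same start, different tail

keep = common_prefix_len(cached, new)
to_process = new[keep:]        # only these tokens need processing

print(keep, to_process)  # -> 3 [9, 10, 11]
```

This is why resending a chat with only the last message changed is cheap, while editing something near the top of the context forces most of it to be reprocessed.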
>>108281926 You are too young to know what an hero even means. You are the real retard here.
>>108281936 either that is fake and gay or the company is fake and gay. either way it probably doesn't matter that the ai was also fake and gay
>>108281948 You are the newfriend, imagine saying you want to kys and an hero in the same sentence
>>108281936 That just goes to show that the company in question is worthless, that it doesn't really matter what they say or do, and that their upper management is retarded and doesn't need to exist.
You are the reason why /g/ has died.
>>108281946 Doesn't work with rnn, still have to reprocess everything once you hit max context. Try running rwkv or qwen 3.5 and you will see that it won't work.
>>108281976 good
>>108281976 meant for >>108281928
I’m hearing good things about this “Qwen” model. Is it actually all that or can I go back to paypigging? I have 2x3090
>>108281978 Yeah I know, I'm talking about how it works in models where the feature is supported.
>>108281988 you need at least 2 6000s to run it properly, then it is legit better than opus 4.6
>>108281988 Try out either the 27B model or the 122B-A10B model. They seem to be roughly similar, with the bigger model being a bit better and faster since it's moe.
>>108282000 Guess I'll just fuck off then.
>>108282018 Yeah...
>>108281988 qwen2.5-72b fits on that at q4, which should be plenty
Sillytavern/Kobold user. I may have altered a setting ages ago that I cannot remember, and now after every general prompt it just keeps going and gens another one after another after another. My token size per gen is 250. Surely there's something simple I'm neglecting here?
>>108282085 auto-swipes in ST user settings?
>>108281936 I don't understand how that happens. If you feed the model your data and ask it questions it will have numbers to quote, but if you don't give it any data, why would you expect it to have access to your sales data? Furthermore, how do you not know your data well enough to do a sanity check simply by glancing at what it produces? You have the same issue when you ask a subordinate to construct a report: you can't just assume he is correct, and despite trusting him you must also verify the results. I don't want to be mean, but that guy's issue is not AI.
>>108282094 thar she blows, cheers m8
I've been out of the loop for a bit. What's the current best local model available for utilizing large amounts of RAM with 32GB VRAM? Is it still DeepseekV3 and Kimi K2 or has something else been released?
>>108282099 why cant the ai figure out how to find and access the data on its own? isnt it intelligent?
>>108282110 I still use this one
>>108282148
>isnt it intelligent?
No, stop falling for marketing lies like a retard.
>>108282155 i bet you are either very rich or very poor
>>108282165 so its smart enough to bomb iran but not smart enough to figure out where the data is?
The bait will continue until anon's pattern recognition improves.
>>108282169 you wouldn't get it
>>108281688
>>108281813
>still no 1T params 1b active SSDmaxxer model
You sleeping on snowflake arctic?
>>108282172
>so its smart enough to bomb iran
Sorting through communications in a network you already have backdoors in doesn't require intelligence. An intern doing ctrl+f through the logs could have achieved the same result, albeit not as fast.
>>108282193 why she blushin
>>108282110 K2.5 thinking at q4
>>108282018 >>108281988 You don't need that. Your current hardware is sufficient to run Qwen3.5 122B-A10B or Qwen3.5 27B. Both are good models. If you want to do ERP with them, though, then you should grab the Heretic versions of those models.
>>108282040 This is an old model, don't use it.
>>108282203 q2 is better, more creativity
>>108282203 >>108282213 What are the gains and losses compared to K2-Instruct and K2-Thinking? Moonshot was hopping on the censorcuck train last I saw.
>>108282245 can you not use such vulgar words?
you crazy nigga. but i appreciate it.
►Recent Highlights from the Previous Thread: >>108278008

--Agentic roleplay potential demonstrated through blackjack simulation:
>108278746 >108278774 >108278813 >108278819
--StepFun releases 3.5-Flash models and training tools:
>108280402 >108280421 >108280426
--122B model excels at Japanese text transcription:
>108278617 >108278679 >108279715 >108280042 >108280080
--Manual offloading outperforms --fit for 122B model on 3090+3060 setup:
>108281460 >108281492 >108281506 >108281543 >108281720
--International models lag behind frontier labs on ARC-AGI-2 benchmark:
>108279363 >108279384 >108279387 >108279404 >108279418 >108279428 >108279567 >108279598 >108279612 >108279657 >108279617 >108279629 >108279836 >108279469 >108279746 >108280473
--Open-source AI models performance gap with proprietary models:
>108279687 >108279804
--Qwen3.5-35B-A3B GGUF quantization benchmarks:
>108280652 >108280670 >108280678 >108280680 >108280735
--Qwen 3.5 Small Model Series release and performance claims:
>108278104 >108278328 >108280444
--Qwen3.5-35B-A3B-Heretic hitting 72 TPS on 7800X3D/7900 XTX with new llama.cpp:
>108281622 >108281636 >108281652 >108281657
--Qwen3.5 35b 4-bit vs 122b 6-bit speed tradeoffs:
>108280506 >108280525 >108280560
--Devstral-2 model's flawed Jinja date logic template:
>108278061 >108280633 >108280638
--AI response generation process critique and benchmarking culture:
>108278971 >108278991 >108279011 >108279036
--Qwen 3.5 benchmarks:
>108278349 >108278416
--AI internal reasoning resisting offensive prompt bypass attempts:
>108278112
--Qwen 3.5 27B speed optimization on budget hardware:
>108279596 >108279608 >108279623 >108279631 >108279638 >108279653 >108279662 >108279685 >108279689
--A.I. Dating Apps Complicate China's Efforts to Boost Birthrate:
>108278523
--Miku (free space):
>108278507 >108280771 >108281230

►Recent Highlight Posts from the Previous Thread: >>108278113

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script