/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108742275 & >>108736046

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108742275

--Optimizing dual RTX 3090 setup for Gemma 4 with speculative decoding:
>108743090 >108743104 >108743131 >108743530 >108743551 >108743361 >108743224 >108743589 >108743643 >108743838 >108743866 >108744035 >108744070 >108744217 >108744276 >108746395 >108744193
--Encoding images in text completion using llama.cpp server settings:
>108745965 >108745986 >108745995 >108746023 >108746145 >108746162 >108746153 >108746191
--llama.cpp PR adding DFlash support for speculative decoding:
>108743701 >108743736 >108743776 >108743804 >108743909 >108743958 >108743768 >108744535
--Qwen 2-bit performance testing with Canvas API image recreation:
>108743817 >108743856 >108743868 >108743922 >108743957 >108743946 >108743951
--Critique of modern AI's lack of initiative in roleplay:
>108743944 >108744192 >108745177 >108745200 >108745218 >108745240 >108745279 >108745235
--Gemma's logical inference of scenario cards and early RP LLM formats:
>108742385 >108742425 >108742466 >108742480 >108742499 >108742505 >108742558 >108742629 >108742653 >108742651 >108742666 >108742729 >108742594 >108742599 >108742936
--Comparing AI agents to manual copy-pasting for coding productivity:
>108746398 >108746438 >108746461 >108746727 >108746980 >108746758
--Repetitive "Let me write" reasoning loops in various models:
>108744796 >108744899 >108744920 >108744927 >108745667
--Debating the necessity of jinja templates over raw prompt formatting:
>108747743 >108747755 >108747803 >108747912 >108748008
--LLM and inference engine embedded within a .ttf font file:
>108743927 >108744507
--Inconsistency of Gemma 4's refusal vectors and censorship levels:
>108742306 >108742365 >108742490 >108742379 >108743700
--Logs:
>108742558 >108742594 >108743868 >108744345 >108744796 >108746216 >108747024
--Miku (free space):
>108746641 >108747847

►Recent Highlight Posts from the Previous Thread: >>108743862

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Give Dipsy support, llamaniggers.
>>108749305
>exl
can you even run this in llama/kobold? isn't it appleshit?
gemmaballz
>>108749417
no, you're thinking of mlx. exl3 is for tabbyAPI, which exposes an OpenAI-compatible API just like kobold and llama
https://github.com/theroyallab/tabbyAPI
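Since all three backends speak the same OpenAI-style completions surface, a client can stay backend-agnostic. A minimal stdlib sketch (the port is an assumption: 5000 here for tabbyAPI; koboldcpp and llama.cpp server default to 5001 and 8080 respectively, so adjust the base URL):

```python
import json
import urllib.request


def build_completion_request(base_url: str, prompt: str, model: str = "local"):
    """Build an OpenAI-style /v1/completions request.

    The same request shape works against tabbyAPI, koboldcpp, or
    llama.cpp server, since all of them expose this endpoint.
    """
    payload = {
        "model": model,  # most local backends ignore or loosely match this
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.8,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# Usage (assumes a server already running on localhost:5000):
# with urllib.request.urlopen(build_completion_request("http://localhost:5000", "Hello")) as r:
#     print(json.loads(r.read())["choices"][0]["text"])
```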
Lecun was wrong.
>>108749467
Can you be wrong if you literally haven't done anything?
>>108749467
I wish.
I want my local cat intelligence already...
>>108749477
vjepa says hello
>>108749486
Holy shit it's really becoming a cat
Remember when Netflix of all companies made JEPA real and nobody cared?
https://huggingface.co/netflix/void-model
>>108747912
>If you aren't doing rag or tool calling I can't imagine a prompt template more complicated than what you do in ST.
but no image support on v1/completions
is that a llama.cpp only limitation?
>>108749548
If you want to use images or other types of data in text completion mode, use the /completion endpoint and send the images or audio encoded as base64 in "multimodal_data"; they'll be tokenized and put in your prompt wherever you place the media markers:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion
That's how you'd do it if you were coding your own frontend anyway; not sure if SillyTavern does or supports this sort of thing.
>looks at you like you are a bug under a microscope
>a stain on the carpet
>a gum stuck to her shoe
>a specimen
What's the name for this slop phrase and how do I prompt it away?
>>108749617
what’s wrong with similes?
>>108749642
I already told it to remove similes, it's this specific one that doesn't go away. AI just can't express cold, indifferent characters without injecting this phrase.
>>108749649
>>108749617
metaphors?
>>108749649
yeah but, like, isn’t that good writing?
>>108749649
try a regex to strip it from the gen message/history going forward, so the model can gen it, but it gets ripped out
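A minimal sketch of that kind of post-gen scrub pass, using only the phrase variants quoted upthread; the pattern list and the replacement word are placeholders to extend:

```python
import re

# Variants of the "looks at you like a specimen" simile family (from upthread)
SLOP_PATTERN = re.compile(
    r"like (?:you(?:'re| are) )?a (?:bug under a microscope|stain on the carpet|"
    r"gum stuck to \w+ shoe|specimen)",
    re.IGNORECASE,
)


def scrub(text: str) -> str:
    """Strip the slop simile from generated text before it re-enters history."""
    return SLOP_PATTERN.sub("coldly", text)
```

Running this over each message before it goes back into context keeps the phrase out of the model's recent history, which is where the repetition feeds itself.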
>>108749642She looks you and Riley similes.
>>108749658
Broccoli is good for you. Therefore you should eat 10 pounds of it every day.
>>108749678
holy THIS
What would be the ideal optical disc to store day 0 gemma weights on if I want them immutable? I thought we'd have some TB-scale discs by now but it looks like everyone just gave up after blu-ray? Did the guy who invents new discs die or something?
A little birdie told me that openai is getting ready to release gpt-oss 2 and it's going to be a huge shift.
>>108749808
I am expecting that given the case with Musk. Hopefully it's less censored this time.
>>108749697
>optical pleb
https://en.wikipedia.org/wiki/Linear_Tape-Open
>>108749808
more like huge shit
>>108749824
>the case with Musk
What did I miss?
>>108749907
>What did I miss?
tl;dr musk has been calling saltman on his bullshit for the last 2 years (ie. turning openai into closedai and pillaging it)
>>108749697
>if I want them immutable?
just set the immutable bit kiddo
>>108749697
You'll have to find old floppies and move the little tab. There's no other way.
>>108749959
??? But gemma makes me ERECT how can she fit on a FLOPPY disk?
fyi everyone, ace step 1.5 xl is out and really pushes things up in terms of local music gen. aka suno at home.
>>108749988
You said this last week, and it was shit last week. It'll be shit this week too.
>>108749808
>it's going to be a huge shift.
leaked image of gpt-oss-2
>put porn collection into rag
>track fap and reviews with llm chats daily
>suggest porn for me today
why not?
>>108750021
^ that's a bot, by the way.
Just testing the bots, as you were gents.
>>108750074
hello bot-chama
>>108750030
SLAM THAT BAD BOY UNTIL YOUR KEYS ARE ALL STICKY
>>108749398
i was still using sonnet for opencode but since qwen 3.6 i'm now not using any cloud models any more.
>>108750096
>>108750097
>>108750098
>>108750099
>>108750100
>>>/vg/vn
>>108750110
yea i think it glitched, i didn't mean to post it 5 times and that's why i deleted most of them.
>>108750114
no worries, I'm glad you left 2. it's always smart to have a backup :)
>be /lmg/
>cry about moes all day
>mistral releases big dense model
>no talk about it
y'all are a bunch of hypocrites
ace step 1.5 base xl, not using the "thinking" 5hz lm, but using the lm to set the tempo etc.
https://files.catbox.moe/9lz9tp.mp3
>>108750138
Because mistral is EUROPEAN and the EUROPEAN laws PROHIBIT ai companies from making GOOD ai, so mistral medium is BAD!
>>108750138
Why talk about it when even Mistral doesn't pretend that it's good? Its only selling point is that it uses an ancient backbone so the flops are below EU regulation limits.
>>108750152
We need to protect the EVROPEAN VALUES!
>>108750138
Almost all of those posts were bait.
Maybe even your post is bait.
>>108750138
>cry about moes all day
Moe vs dense is a meme fight, what matters is good models regardless of architecture.
Mistral did not release a good model. They haven't for years.
>>108750138
bart just put his quants up a few hours ago. i'm not dumb enough to download the first unsloth uploads
>gpt oss 2
God I can't wait desu. Look, Qwen isn't the worst thing ever, but if it's going to be censored and STEMmaxxed, I'd wish for it to be a bit more reliable and not so heavy on the repetitive redundant looping thoughts.
eggu
>>108749933
Altman only needed one release and that's all we got. Why would he release another one, especially when it jeopardizes their resources, which are already constrained, and they were losing to Anthropic until recently? Sure, it would be nice to get another open source release, but I doubt it will happen because Musk is going to lose quite badly in the lawsuit, which is unfortunate since it would be nice to see both of them taken down a peg. Also, if they do it, it would obsolete their mini and nano releases on meme marks, guaranteed. I bet that's why they waited for a while after releasing GPT-5 with mini models before they attempted a mini series again. I think this is all that we'll see of it until 2030/2031.
>>108750244
>not 7 and i holdings
ngmi
>>108750141
Here's your udio/suno at home:
https://files.catbox.moe/61cwlr.mp3
singing the news! FRESH news! :^)
It sounds so optimistic. Maybe news should always be delivered this way.
>>108749988
>>108750141
>>108750275
What's the USE CASE? When TWO point FIVE percent of the population is DEAF? They need to focus on TEXT, which EVERYONE can read!
>>108750141
how many steps for base?
>>108750298
I did 100, because acestep.cpp is capped at 100 for no apparent reason.
>>108750298
and, I'm using dcw at .001 / .001
if you use the audio codes aka 5hz lm, you will get more squared up and more consistent results, but it's less dynamic.
>>108749467
It's just that JEPA as intended by LeCun won't work with text in any useful or even meaningful manner. It does with images-video, and that's what he's pushing, but people interface with computers and other people primarily via language and its rules.
Planning text with video would be astronomically inefficient, so we can shove that idea aside. Predicting sentence embeddings from other sentence embeddings in order to convert them to intelligible text (the closest thing to an actual text version of JEPA) doesn't even work, as there would be no intelligible solution to interpolate between two continuous text embeddings (unlike image frames).
LeCun lost the plot, no wonder he basically got kicked out of Meta.
Can You Invest in Cultured and Perfected These? And Have Your System not stall it from Pharmacy andor Prescription?
>>108750287
Uhh, sweaty? Blind people can't read. And no, they don't automatically know Braille.
>>108750122
i wanted to delete all of them but 4chan went "muh you are deleting posts too fast".
>>108750346
BLIND people can just use text to speech ACCESSIBILITY readers
>uncensored-heretic-abliterated-turboquant-opus-distilled-nvfp4-gguf
I'm pretty new to this, am I doing this right?
I downloaded gemma-4-31b-it-the-deckard-heretic-uncensored-thinking-i1 and LM Studio, using a 3090. Feels like I can say 8-10 things to it before my context fills up and I have to drop earlier messages. And because the model is as big as it is, my VRAM is near capacity, so I can't increase context any further. If I have stuff in my system prompt, I have even less context to work with, and the system prompt is the thing you use for consistency and keeping some kind of history, right? I feel like I'm doing something wrong here.
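For anyone hitting the same wall: the context-vs-VRAM squeeze is mostly KV cache, which you can ballpark with simple arithmetic. All model dimensions below are hypothetical placeholders; read the real ones from the GGUF metadata of whatever you actually loaded (the VRAM calculator linked in the OP does the same math for you):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size: K and V (hence the 2x), per layer, per token.

    bytes_per_elem=2 assumes an fp16 cache; quantized KV caches shrink this.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


# Hypothetical ~30B-class dense model with GQA, fp16 cache, 8k context:
gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=8192) / 2**30
```

The takeaway: cache grows linearly with context, so halving context or quantizing the cache frees VRAM without touching the weights; the system prompt just spends tokens from the same budget.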