/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108459276 & >>108453570

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108459276

--Voxtral TTS release and initial impressions:
>108459652 >108459758 >108459766 >108459836 >108459844 >108459888 >108459902 >108461249 >108462139 >108459995 >108460450 >108460456
--Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models:
>108464620 >108464631 >108464656 >108464739 >108464755 >108464808 >108464836 >108464856 >108464905 >108464929 >108465010 >108465032 >108465028 >108465052 >108465072 >108465094 >108465163 >108465186 >108465191 >108464865 >108464644 >108464662 >108464680 >108464697 >108464682 >108464701 >108464750 >108464764 >108464803 >108464862 >108465009 >108465026 >108465073 >108465080 >108465139 >108465432 >108465850
--KLD heatmaps reveal hidden degradation in aggressive KV cache quantization:
>108463990 >108464591 >108464625 >108464635 >108464627
--Mistral releases open-source Voxtral TTS:
>108459318 >108459428 >108459525 >108459563 >108461953 >108461978 >108462002 >108462064 >108462078 >108462107 >108462503
--GPU coil whine interferes with guitar amp, TTS voice cloning comparisons:
>108460183 >108460208 >108460218 >108460232 >108460247 >108460881 >108460901 >108460910 >108461004 >108460928 >108460947 >108460975 >108461005 >108461025 >108461047 >108461080 >108462135 >108462264 >108462944 >108462961 >108462982 >108463064
--Models handling verbatim lyric requests differently due to alignment:
>108464911
--Evaluating TTS demo quality:
>108459914 >108459947 >108459956 >108460605 >108460650
--Chroma Context-1 model released without harness:
>108463927 >108463946
--Z.ai 5.1 open-source release expected early April:
>108465382 >108465454 >108465751
--Qwen3-TTS and VibeVoice resources:
>108459560
--Miku and friends (free space):
>108460212 >108462728 >108462782 >108465571 >108460736 >108461211 >108461256 >108461280

►Recent Highlight Posts from the Previous Thread: >>108459279

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108466262Blessed small saggers thread.
>>108466278>small
>>108466262Whatever you do, DONT read any manga drawn by Ankoman.
>>108466262Saggy tits eewww
>>108466254>Make it make sense anon. They are directly influencing how AI functions
Apple is too incompetent to shit out ANY useful model anon..... their influence is non-existent
>>108466254>that one can't make fun of in a meme format on AI image gen platforms with the Apple logo.
???
>>108466262>that image
Deepseek is coming
The WHALE
>>108466327>deepseek is coming
NO, I am
>>108466321There are certain platforms based in silicon valley that prevent anti-Apple shenanigans.
>>108466327a lot larger but we can run models six times larger than we used to thanks to Google Turboquant™
>>108466337previous was gemini, now here's chatgpt on the same prompt
there's no conspiracy against shitting on apple in your SaaS prompts (or LM studio or whatever else), you need your meds
>>108466352>>108466321You need to fuck off to one of the image gen generals.
>>108466371you need to fuck off to leddit with your amazing reading comprehension
>>108466378You've been posting your retarded image slop with the same character for days and it's not even local.
>>108466383>YouI am not the guy you're referring to, I just decided to reuse that character since I found it funny. Try again, retard.
I hate that I have to build my own 'FETCH' mcp tool because the autists have made it respect the robots.txt
>>108466393That's irrelevant because your posts are still shit.
>>108466397>the autists have made it respect the robots.txt
you are a blight upon the earth
qwen 122b-a10b changed my life
Tetonation incoming... I am sensing Gemma 4 release before Easter.
>>108466410For the worse?
>>108466415Why build one from scratch instead of forking the existing one?
https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md
>>108466415I forked this already of course
>>108466405im gonna rape your website, retard
Something will probably happen at some point. Or not. That's my prediction. Screenshot this post.
>>108466410How many times did your ego die?
>>108466413for the better, I've been pasting giant research papers into it and asking it to explain them in simple terms so I can actually implement shit without having a PhD. It's going great. I've finally found a good balance between being an AI luddite and letting my brain atrophy by expecting AI to do all the thinking for me.
>>108466415bro...
>>108466423nostrildamous over here with the takes
>>108466433I knew you would say that.
>>108466400let me guess, iToddler?
Is $400 for a v620 okay?
The listing said 16gb, but amd never made any 16gb v620s, did they? About as strong as a 6800xt....
>>108466427Using AI for inspiration did wonders for me but i fucked up letting it write code without going through the steps and thinking about it critically. I feel like im a worse programmer as a result.
>>108466432https://github.com/modelcontextprotocol/servers/blob/main/src/fetch/README.md#customization---robotstxt
it's right in the instructions too, but vibecoding is the solution to everything now, fuck reading
>>108466496>reading doc instead of source
I'm not a nocode shitter sorry :(
>>108466262wowzers, how do i make my own goonbait with a local model? im on an asus ultrabook with no GPU btw, just intel graphics
it isn't this, it is this
they aren't that, they are that
Do venvs use system cuda or is the cuda toolkit packed somewhere inside it?
ggnigeranov TURBOQUANT KV + WEIGHTS SUPPORT WHEN!?!?!?!
>>108466469Don't know about amd cards, sry.
16gb nvidia cards can be had on ebay for $450-$500.
So what happened? Youtube video about turbo quant?
>>108466547i think maybe no by default. i had to use --no-build-isolation when i compiled flash attention.
I wonder if this is another nothingburger to raise the company stock, or an actual advancement that can benefit local AI.
this guy has a 'living rent free in your head' problem lmao
https://www.reddit.com/r/LocalLLaMA/comments/1s56q9g/new_unsloth_studio_release/
>>108466588Would IK really have been able to "independently discover" Hadamard transforms without the easy-to-follow implementation in ExLlama?
>>108466588That's called having an inferiority complex.
>>108466593What a bunch of hacks
>>108466588I would probably end up using his fork if he wasn't such a fucking baby.
Or maybe he would still be contributing to mainline if he wasn't.
>>108466554
6800xt (500GB/s) is about the same performance as a 5060 ti (450GB/s), according to random benchmarks on the internet. So basically a 32gb 5060 ti without cuda for $400.
>>108466600goated reference there, my friend.
also checked
will cudadev EVER be able to recover, pipeline parallelism bros???
>>108466604nah they're fine, weird they didn't mention this in the release notes though...
>>108466588Everything is a personal affront to him. Very unstable.
>>108466547System CUDA refers to the CUDA compiler toolkit, which is totally different from PyTorch.
Venvs use whatever PyTorch version you have installed. PyTorch interfaces with your graphics drivers.
>>108466632Yes, but the package is cu128 and I have CUDA 13. It still runs, but I want to know if this is something worth looking into for any potential issues in the future.
>>108466645doesn't matter bro, cuda is made to BUILD, that shit will run on your DRIVERS for fuck's sake you stupid mongoloid.
>>108466645You will only run into problems if your graphics drivers don't support the pytorch version. In any case don't worry about it as long as pytorch is recent enough.
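For the cu128 vs system CUDA 13 confusion: the pip wheel ships its own CUDA runtime, so you can sanity-check everything from inside the venv without touching the system toolkit. A minimal check, nothing exotic:

import torch

print(torch.__version__)          # e.g. "2.x.x+cu128", the CUDA runtime the wheel was built against
print(torch.version.cuda)         # same info as a bare version string
print(torch.cuda.is_available())  # False usually means a driver problem, not a toolkit mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

The system-wide CUDA 13 install only comes into play when you compile extensions yourself (e.g. flash attention, as mentioned above).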
Why can't they, like, just stick a tb of vram on a single gpu?
>>108466686that would be antisemitic. think of the poor shareholders!
>>108466686You wouldn't pay their asking price.
>>108466690>>108466692I'm a shareholder. Where's my special nvidia discount?????
>>108466583This is so retarded. It's just like a few months ago when Gemini 3 released and day traders were spamming that Nvidia was done because Google had just developed a GPU-killer called T-P-U.
>>108466657thanks mongoloid lover anon
>>108466593These unsloth devs and redditards are really a match made in heaven
>>108466604maybe, but no one else seems to properly do the various gguf quants for everything like they do
>>108466757>properly do
LMAO
>>108466764I don't get the Unsloth hate. Isn't that two dudes doing work that 99.999% of people couldn't do? They're contributing to democratizing llm finetuning more than anyone else. In the talks he's given, Daniel Han also seems a bit high strung, but competent.
>>108466764[various quant]<-(properly do)
or
[gguf]<-(properly do)
how interpret?
>>108466771daniel why the fuck are you posting here? go be clueless about your shit somewhere else.
>>108466771>couldn't do
Don't *want* to. You don't need to load the full model to quant it.
>>108466779Are you doing training at all, or just using and RPing? Unsloth is an easy to use library; it is helping me. Maybe I haven't found something better yet, but I've tried Axolotl, Llama-Factory and just Transformers, as well as plain PyTorch, and Unsloth is simple, the notebooks are great at showing what to do, etc.
>>108466802I was talking in general about Unsloth, just not gee-gee-u-huff-ing the models.
>>108466809not just guh-guffing the model I meant, can't spell today.
>>108466802>the notebooks
Jupyter is the worst fucking trash ever invented.
>>108466818Yes we should all code in emacs.
>>108466809They can't be trusted to make a quant without reuploading it 20 times. I wouldn't trust them with training.
>>108466802NTA, but I did some training, and Unslop paywalling their multi-GPU feature makes it almost useless if you're not finetuning GPT2. Axolotl is way more flexible and has actual competent devs you can talk to. Stop being retarded.
>>108466835It's not paywalled (anymore at least). I just did on 2x and 4x GPUs, launching the script with accelerate, and it all went super smoothly.
>>108466826I write all of my code with Doom Emacs and a German keyboard layout.
>>108466826I miss vim but after I tried to finish a rewrite of my project I just can't. It always somehow pastes wrong, and all these little things, like when you hit escape (which is my capslock) the cursor jumps back one character.. so fucking irritating. My .vimrc is pretty long and I have used it for years now. I can't recommend vim to anyone unless you are working over a terminal I guess.
Emacs isn't that much better.
This software is used by autists because they don't know what ergonomics means and don't mind pressing 4 different keys to get simple functionality.
>>108466852nta. I use vim, but based anyway. You're too cool.
>>108466854To add: and then there is the sunk cost fallacy. Just because you have used something for years doesn't mean you can't ditch it and get something better.
>>108466863I use vim when I log into something through the terminal, but locally I use VSCode and notebooks. I am trying to bring myself to switch to marimo to vibecode-maxx since current LLMs have difficulty working with Jupyter, just haven't had the will to yet.
>>108466872I mean writing should be something that is intuitive, not hidden behind multiple keystrokes and guesswork about whether my selection will paste correctly or not.
>https://github.com/ggml-org/llama.cpp/pull/21074
BLACKWELL BROS
WE WON!
Has the era of TURBO-QUANT started?
>>108466930>>108466732
>>108466930literal fake news lmao
>>108466930>>108466941AI-pushed fake to get the normies and luddites off their back
Additionally, whatever memory savings this nets will be immediately deleted by vibecoded pajeetware bloatmaxxing.
>>108466941The only thing that matters is for the retarded collective will of the market to somehow buy all this and bring cheap RAM back.
Don't ask me how that would work, I still don't even know how OAI managed to buy this much memory while not having any money to pay for it.
>>108466930Honestly I'm all for people overestimating what turboquant actually does. Win for everyone.
I'm actually looking forward to it being implemented in lcpp. Big context is based.
>>108466930Wut? The whole market's tanking right now.
Where are those 3000 assets
I've only seen the leaked Mythos webpage
>>108467034this sounds so fucking fake
sonnet, opus... capybara? what is this, meta?
>>108467034Sounds like a benchmark twitter endorsement. Every new model is always a few % better, always and ever.
But this time it's a rumour and "leaked documents".
>>108467042Anthropic and OpenAI are both promising a soon-to-come breakthrough https://www.youtube.com/watch?v=s4tptozUJ8Y
It might be a nothingburger, but it's been two years since reasoning, so who knows.
came for droopy tits
>>108467106*to
was there big news why thread fast
>>108467141Google made models 6x smaller
>>108467150like in theory or can I download something to run some 200gb model on my 3090s
>>108467150
>>108467167>>108467169>>108466930>>108466583
>>108467065>reasoning
Isn't this just autoprompting?
>>108466930isn't that just for the kv cache (context)? lol, context rot is still a thing so enjoy the slopped 1M context
huge llama.cpp update: every purchase of llama-server comes with a complimentary footgun
>>108467127*on
>>108467191I bet comfyui manages to use this to brick peoples pcs by accident
>trynna force ldg drama again
>>108467181I'd argue there was a huge shift when O1 came out versus "Let's think step by step". Before O1, there was chain of thought, program of thought, forest of thought and a bunch of other prompting strategies, but "reasoning" made it switch to another level. The model started to stay on track a lot more, etc.
>>108467191>llama-server
We use vLLM here
I heard Google made boobs 6 times saggier
>>108467269fake news, only nipples
>>108467243>We use vLLM here
we as in you and the one other guy with the bitcoin mining rig with 8 ewaste 3090s?
>>108466930Gemma 4 bros.... I dont feel so good....
>>108466262just the right level of sagging to make it maximally erotic
>>108467303talking about gemma, I'm always surprised to constantly see it placing high in current benchmarks against newer models, kek
>>108467279>replying to the obvious bait
Can I use koboldcpp antislop feature with sillytavern as the frontend or do i need kobold as frontend too?
>>108467303Google was going to release it, but then qwen3.5 dropped and mogged it, so they delayed it.
Many such cases.
>>108467381It has better writing capabilities. That doesn't say too much. It's also biased. Try changing your 'gender' to female in the same scenario and see how much the narrative changes. When you do a q&a with the model it replies in a different fashion depending on whether you are male or female.
It's funny but the behaviour is there.
>>108466845Fellow slop-tuner here. What are you training models to do?
>>108467416Bro, you have a mental illness.
>>108467399>Can I use koboldcpp antislop feature with sillytavern as the frontend
yes, it taps into banned strings
>>108467416>She has to be on birth control because there is no way that could fit in there and not get stuck inside of her womb!
huh?
>>108467444?huh
>>108467421>>108467444slop-tuned 2b model. Testing to see if I can train it to be less retarded with better dataset curation
>>108467455uhuh
is it working?
>>108467422thanks anon, but where is that? is that an extension?
all I see is logit bias handling
>>108467473just above logit bias for me, show your connection profile?
>>108467491oh I see, it's in text completion, not chat completion
damn it
>>108467459actually yes (kind of). Using a dataset that ONLY has link rel (https://huggingface.co/datasets/AiAF/conversations ) leads to the model's "safeguards" being blasted away, but at the cost of "catastrophic forgetting" and pretty much ONLY being able to respond to any query with erotic shit. Did another fine tuning run, but this time with the dataset being 70% general purpose shit and the other 30% being the nsfw data. The 70-30 version retains its "intelligence" more or less and is also willing to comply with "problematic" requests. The 70-30 ratio data is kind of shit at the moment because the general purpose portions are only single-turn conversations, so next I'm going to try to curate one that has multi-turn general purpose data samples instead of just single-turn. I should probably focus on a dataset where the rp/story telling portions aren't ONLY nsfw too. Once I do this and I'm satisfied with the results, I'll probably try this again on a higher parameter model so it's actually worth using. Doing this on a 2B model is just a proof of concept phase and also relatively fast and easy to train.
Pic related is from the 70-30 model. It's obviously utter shit compared to higher param models but it's a start for now, and it shows promise compared to this >>108467416
Dataset used: https://huggingface.co/datasets/AiAF/combined_70_30_shuffled
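If anyone wants to reproduce the mixing step, it's only a few lines with the HF datasets library. A rough sketch, assuming two hypothetical sft-formatted jsonl files rather than the exact datasets linked above:

from datasets import load_dataset, concatenate_datasets

general = load_dataset("json", data_files="general_sft.jsonl", split="train")
nsfw = load_dataset("json", data_files="nsfw_sft.jsonl", split="train")

# size the general split so the final mix lands at roughly 70/30
n_general = int(len(nsfw) * 70 / 30)
general = general.shuffle(seed=42).select(range(min(n_general, len(general))))

mixed = concatenate_datasets([general, nsfw]).shuffle(seed=42)
mixed.to_json("combined_70_30_shuffled.jsonl")  # ready to point the axolotl config at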
>>108467509you can probably add it with the option to add unsupported chat comp things, but I've no idea how you'd format it for that
>>108467512i'm not really sure if this will scale anon, but give it a try
>>108467529>will scale anon,
wdym "if this will scale"?
>>108467416Cool stuff, anon.
Implement config files and you can switch them out when needed on the fly.
On the fly? I didn't talk about the insects.
>>108467512you are doing a full finetune yeah? on a small 2b model your mixed dataset is doing "okay" because the model is really small, so you can't actually tell if it's getting much more retarded or not
on a bigger model the hit to the smarts will be much larger since it was trained on more data
this was always a problem: if you don't have the original data the model was trained on, or a large enough dataset of your own, then it's really easy to overfry
Big week
>>108467633:rocket:!!!
>>108467584>you are doing a full finetune yeah?
qlora using axolotl.
>so you can't actually tell if it's getting much more retarded or not
Actually quite easy to tell, given that the original fine tune I did used a dataset that was literally nothing but nsfw stories formatted into an sft format I could use with the Axolotl trainer. The result was that the model was willing and compliant with any nsfw prompt but was basically useless at everything else.
Responses from the model using the ALL-nsfw dataset: https://files.catbox.moe/w1qh5y.json
Responses from the model using the 70-30 ratio dataset: https://files.catbox.moe/vrwtqw.json
The former was almost incapable of producing something that wasn't forcing nsfw themes (it seems to REALLY like talking about moms), and most of its responses were not only bad but pretty nonsensical, even for a 2B model. The latter was actually able to stay on topic based on what the user input was. It is capable of engaging with nsfw and "unethical" requests, but it will only go in that direction if you explicitly ask it to or your prompt goes in that direction (at least that's the case in my limited testing).
The next time anyone tries to argue against "Garbage in --> Garbage out", show them these logs for comparison.
>>108467416Kaggle stuff. I'm just not into roleplay. I used to be an aspiring writer way into automatic writing and stuff and wrote a bunch of books for myself, so I get being into a different world, creativity, etc., but although I tried, I don't get that kind of roleplay. I might just be too shy for it, I don't know.
Gave GLM 5.1 a shot over API
First impression is I literally can't tell the difference from 5, so I assume all the effort went into agentmaxxing
>>108467693That's a good thing because it shows there's no regression over other tasks
>>108467665ah nevermind then, qlora is fine
what larger model are you thinking of if it goes well?
>>108467693
5.1 is probably just a sloptune of 5
>>108467725It's not that the first fine tune didn't "WANT" to answer those types of questions, it just couldn't reliably. I basically unintentionally fried it into only being able to engage the user in nsfw-rp, because the dataset I used contained nothing but human written smut, most if not all of which involved sex and whatnot. See the first link here >>108467665
I've already tested methods like DPO and it worked (it's my understanding that GRPO is a more advanced version of DPO). The thing is, those methods tell a model "these kinds of answers are bad and these kinds of answers are good", but that wouldn't necessarily change how the model responds. My goal was not only to essentially "uncensor" a model (mostly to see if it was possible, since many people here were swearing up their ass it wasn't) but to see if I could inject, for lack of a better term, "SOVL" into the model by showing it examples of shit people actually wrote, and not synthetic shit or filtered flowery purple prose trash. That trash is likely to blame for shit like "shivering down my spine" or "her voice was husky" (stuff even relatively uncensored models like Mistral Large 3 do in spades). In other words, I wanted to also "deslopify" the model and not just make it super compliant and willing to please. That part is piss easy and can be done via jailbreaking or prefilling if the model isn't specifically trained to counteract that. DPO and training methods like that would "uncuck" the model but wouldn't necessarily fix the slop problem. If the model is trained on human written content and only shown synthetic content in the general purpose portion of the dataset, in theory that should uncensor it AND cut down on repetitive "slop" outputs significantly. If this can work on a mere 2B model, then it should definitely work on significantly "smarter" higher parameter models. Plus it's a fun little challenge to keep myself occupied. It's a proof-of-concept shower-thoughts side project of mine.
>>108467828for >>108467725
>>108467831oh,,, his post is gone
llama.cpp commits a lot to master. Is that normal, or do they just not care and if you want something stable you have to pin the commit you want yourself?
>>108467828Hmm. You have a distinct point — it's not about your opinion but theirs.
>>108467834It's normal and they don't care. And they shouldn't care. If you want stable, just don't update. They have a bunch of pre-built releases as well.
OMG there is no way this jew is THIS ignorant or retarded
>>108467852Valuable information that I appreciate, but why do your posts keep getting nuked?
>>108467864Posting twitter shit should be a bannable offense.
>>108467864He's just helping Dario out.
>Look at Yud whining about it, it must be good.
>>108467880I believe the original poster deleted them on her own.
>>108467864I don't read twitter posts.
Makes me feel great about myself because I despise social media.
Did you know most of the twitter posts you see in your 'feed' are the same as youtube's algorithm: paid shills, or shills wanting to get paid. AI written content.
>>108467904>her
>>108467512
https://github.com/p-e-w/heretic try this for size to speed up the process anon.
>>108467933I knew some US poster would be irritated by this.
Decided to try some commercial models- no paid plans!
>Generate several ASCII versions of this Miku silhouette with varying complexity. Sized maybe 25x35, something suitable for a "fastfetch" output
It's a disaster
>what the actual F is this hellscape. don't predict tokens, use a graphical library to infer luminance levels and there must be some well understood way to implement video encoding in ascii. there's an output option in vlc right? figure out how it works and do something similar, we need to generate the tools to create the correct Miku art
It's a disaster again
Let's see what local models can do!
>>108467947>US poster
very wrong guess, I'm europooristani
>>108467948Ok. So you get off scot-free now then?
>>108467864That model name is already taken: https://huggingface.co/EldritchLabs/Cthulhu-8B-v1.4
>>108467955The new model is Mythos, not Cthulhu
>>108467958Mythos is a pretty bland name, they could've used Elder Sign or something.
>Mythos
Mythomaxbros, we are so back.
>>108467966It's all a misunderstanding.
>MyTOS
Anthropic simply baked the terms of service inside Claude's soul.md.
>>108463639NeurIPS cucked out
>>108467980I don't read AI generated social media posts.
>https://www.newegg.com/intel-arc-pro-b70-32gb-graphics-card/p/N82E16814883008
Who's getting one?
>>108468013it's neither of those
>>108467980Am I a genuine retard, or is this a word salad that doesn't actually say anything? Are Chinese labs actually allowed in or not?
>>108466732It's all chasing the dragon
>>108467947Do you really need it to be made by a language model? It's something you can do in a few hours. I just happen to have one I made a while ago.
Here's one in braille. I won't share the code because it's ugly.
https://pastebin.com/aP8Wtqhu
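The usual trick for these is mapping each 2x4 pixel cell onto the braille block starting at U+2800. A minimal sketch with PIL; this assumes a hypothetical miku.png and dark-pixels-become-dots, and it is not the script behind that pastebin:

from PIL import Image

# bit offsets from U+2800 for each (column, row) dot in a 2-wide x 4-tall cell
DOTS = [[0x01, 0x02, 0x04, 0x40],   # left column, rows 0..3
        [0x08, 0x10, 0x20, 0x80]]   # right column, rows 0..3

def to_braille(path, width=50, threshold=128):
    img = Image.open(path).convert("L")                           # grayscale
    px_w = width * 2
    px_h = max(4, round(img.height * px_w / img.width) // 4 * 4)  # multiple of 4 rows
    img = img.resize((px_w, px_h))
    pix = img.load()
    lines = []
    for y in range(0, px_h, 4):
        row = []
        for x in range(0, px_w, 2):
            code = 0x2800
            for dx in range(2):
                for dy in range(4):
                    if pix[x + dx, y + dy] < threshold:           # dark pixel -> raised dot
                        code |= DOTS[dx][dy]
            row.append(chr(code))
        lines.append("".join(row))
    return "\n".join(lines)

print(to_braille("miku.png"))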
>>108468020Can you prove it?
>>108468027Yes you're genuinely retarded. Literally in the second paragraph it says US gov sanction is broader than what NIPS is required to follow.
>>108468032What is your font name, pixelsize?
>>108468041This does not clarify whether Chinese labs fall into this "smaller" set of mandatory restrictions or not. It's nothingspeak.
>>108468048Because the clarification is not in the screenshot: "We have updated the link and clarified the text of our policy"
What happened to reading comprehension
>>108468043-misc-fixed-medium-r-normal--8-80-75-75-c-50-iso10646-1
Seems to be 8x16.
i'm having grok and deepseek explain kv or kv cache to me because idk what it is but people mention it so frequently it must be a big deal
>>108468073There's no font name in that.
fc-list : file family style pixelsize | grep -i sgi
sgi = the font I am using, the Silicon Graphics screen font.
>>108467947stop being so llm brained and use one of the specialized cmdline tools like img2txt or cacaview
>>108468062>What happened to reading comprehension
grok tldr?
>>108468096It's not
Every time you read about a BIG ADVANCE on social media it has some caveat
>>108468099Hm. I think I was looking at the wrong thing. Try terminus. To be honest, I don't fuck around with fonts at all. I change the font size on xterm to tiny, so it's not really 12.
>>108468156Seems like you don't understand or care.
I did not ask you to post a screenshot.
>>108467521using this format worked:
banned_strings:
  - "a b"
  - "c d"
>>108467864this dude is doomer mentality personified
>>108468156Maybe it is terminus; it is impossible to understand how anyone would read this shit.
I use 15 pt.
Should I be using openclaw or are there better alternatives now? Will be on a separate user on my M1 Mac with 64 GiB memory. Intend on running a local model ofc.
>>108468147>kv cache
>don't buy into the latest hype bro
uh? isn't the kv cache just keeping in memory the keys and values that have been computed before so that you don't need to recompute them all at each step?
>>108468176>Should I be using openclaw
No
>>108468156It is not terminus.
>>108468163You seem to care too much. You probably know what to look for in there. Save some back and forth.>>108468173I still have good eye sight somehow.
>>108467864i hate yudkowsky so much. his arguments-from-analogy/story are so fucking stupid.
>>108468177Pretty sure everyone here knows what KV cache is
KV cache compression is not novel
>>108468178Not helpful.
>>108468176You should probably try it to see what it's about. My goal is to do so soon, but I'm lazy.
>>108468176There are about ten billion *claw ripoffs by now. I have no idea which ones are actually good. Personally I've been running picoclaw, which is admittedly just as much of a slopfest but it feels a little saner than the nodejs shit show that openclaw is
I hadn't realized the question was about TurboQuant. Haven't looked into it at all.
I liked this explanation of KV cache: https://youtu.be/hMs8VNRy5Ys&t=367
>>108468176The dust has not settled yet and really it still feels like a wild west situation. Wait a few more months.
>>108468176wait to see what theo recommends
OpenClaw is a glorified system prompt
>>108468244I don't mind.>>108468253I don't know anyone called Theo. It's not a common name here.
>>108468261he's a really smart tech/ai youtuber
>>108468261>I don't know anyone called Theo. It's not a common name here.A lot of web grifters are recycling themselves as AI grifters. The other anon was probably making a bad joke when they suggested listening to her. Those "people" should just be entirely ignored.
>>108468176>>108468197>>108468214https://github.com/NVIDIA/NemoClaw
Why not OpenClaw without the security nightmare?
>>108467966The other option would be "Epic", but someone on the marketing team for Claude either hates quirk and/or hates fun. Either way it's probably not the final name; the worst would be just "Claude Opus 5"
>>108468260A system prompt that connects any LLM to telegram/whatsapp is pretty damn powerful.
>>108468253>>108468260>>108468265go back
>>108468314>hurr durr tool calling is powerful
>>108468328It is
>>108468328If RAG was LLM 2.0, MCP is LLM 3.0. brb writing the blogpost now
>>108468306meh, I hoped this would be an original scaffolding alternative but instead it's just some kind of OC wrapper?
>>108468328I was messing with tool calling locally with persistent memory on a tiny model, and while it did some stupid shit, it did seem to make it perceptibly smarter and open up some ideas I wouldn't be able to do normally, so I can see why normgroid retards salivate over it with cloud usage. I'm gonna try with a larger dense model that has cheap context and local skills to see if the solution to "model is too retarded to do x task properly even if it's big" is to just stuff a shitload of knowledge into a recurrent model's cache to make it perform better
>>108468328yes
Openclaw is failing on my setup. Everyone told me it would work but I have to download smaller and smaller models to see if something will work.
>>108468385what?
*opens your claws*
DeepSeek v4 is Spud
The pancakes are a lie
>>108468188rationalists are addicted to thought experiments, they can't even begin to process ideas if they aren't in the form of a thought experiment
>>108468328get this, what if the tool is to call comfyui to have your robo-waifu generate an image of herself then send it to you?
>ai psychosis is real
>>108468364I feel pretty confident in predicting that MCP, much like RAG, is kind of a doomed concept in the sense that model advances will make it largely irrelevant.
The most modern models today do just as well with using random ass cli utils that have a --help as they do with something that implements MCP. The same goes for remote APIs; they can navigate those well enough and make requests on their own. There is simply no need for a strictly defined prescriptive protocol format like MCP lays out.
>pic unrelated
>>108468459kobold/ST already have sdcpp built in. No need to invoke seven trillion pytorch gigs of nonsense to gen an image.
>>108468459It'd still be an open loop
My "robo-waifu" would have no idea what the generated image would look like
>>108468471they've already been largely replaced in modern agent frameworks by skills, which are just text files telling them what to do
>>108468471>determinism is useless, let's roll a dice for critical operations
damn retard
>>108468328this nigga is raw dogging his model doing math on tokens instead of a calculator
>>108468490You didn't understand the post you replied to.
>>108468481wym, lots of models have vision
>>108468490What does mcp have to do with determinism?
>>108468484>skills replace mcp
>at least locally, you need to have an mcp server to use skills
>the server I use the most is one that reads/writes notes into a specified folder
>the two are almost the same, except skills have a narrower scope
So which is it, the chicken or the egg?
>>108468516>at least locally, you need to have an mcp server to use skills
you don't necessarily
it's all tool calling under the hood anyways
>>108468484You still need a json schema for validation, so mcp aren't useless
Only good thing about OpenClaw is that it killed MCP
okay but what bout sexclaw?
>>108468514Strict static definitions, formats, and instructions the LLMs can look at, as a way to tard-wrangle them into doing what you asked and only what you asked.
OpenAI had to shut down Sora because they're using all of their compute to generate videos of Netanyahu
You heard it here first
>>108468535me on the right
>>108468528I honestly don't know what you're suggesting here
I can't use skills without the mcp server and I'm not going to rewrite the backend to function without it because frankly, that's retarded
Why would I do all that if I can just write a five line json and have access to everything I want
>>108468545And that's great, bloating the context with 100k tokens of definitions, formats, and instructions the LLMs will only occasionally need is not.
>>108468484MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git. Even before skills, you had options like creating custom modes or memory bank files with instructions and frequently used examples to guide the models without again needing to bloat the context describing every obscure command you won't need. Yeah, you could sit there and disable all of the dozens of tools they expose except for the ones you use, but then you still have thousands of tokens wasted on explaining to a model how to use git and gh when it's not working on repo operations anyway.
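For anyone who hasn't looked at what a "skill" actually is: it's little more than a text file whose name/description sits in the base context, with the body loaded only when the model decides it's relevant. A sketch following the SKILL.md layout from Anthropic's Agent Skills docs as I understand it; the path, commands and wording here are illustrative, not canonical:

# file: .claude/skills/gguf-quant/SKILL.md
---
name: gguf-quant
description: Convert a HF checkpoint to GGUF and quantize it with llama.cpp. Use when asked to make quants.
---
1. Convert: python convert_hf_to_gguf.py <model_dir> --outtype f16
2. Quantize: ./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

That on-demand loading is the "dynamically loading infrequently used information" part in practice.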
>>108466262I love pancakes.
I HAD TO RUN A 9B MODEL, BUT OPENCLAW IS DOING SO BADLY THAT I HAVE TO RUN A 2B TO GET A SINGLE REPLY.
IT'S OVER.
Poor people realized they were poorer than previously expected.
>>108468715>openclaw
being dumb is worse
>>108468715Just use Xiaomi MiMo 2 or MiniMax 2.7
>>108468715maybe openclaw is the issue
from what I looked at, I can comfortably run a 9b at 150k and still have headroom for other shit, be it general browser usage or whatever
Consider not trying to ingest 500k tokens of garbage and limit what your llm intakes to something you need
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16349981
>>108468741>maybe openclaw is the issue
Ya think?
Future historians will look back openclaw and its "purchase" as peak insanity of this insane AI boom
>>108468715Hermes agent
So let me get this straight: is the claim here that a 3-bit quant made using TurboQuant has almost zero quality loss compared to a full sized model? Am I understanding this right? Cause it sounds like bullshit if that's how it is supposed to work
>>108468715I can run GLM4.6/4.7 at like 15 t/s, and even that speed makes me wanna kill myself. I don't know how some anons can stomach using smaller models.
>>108468758only the kv cache/context little bro
>>108468758Lol another rube got deceived by the MSM scientific (((journalism)))
>>108468754look back at*
Also I don't care about stock prices, imaginary numbers go up and down plenty and disregard profits. Money is imaginary anyways at this point
Turboquant is huge because now models can finally get rid of GQA and all its devilspawn offspring like MLA that are confirmed to destroy a model's soul. We can finally go back to llama1-era SOVL without the vram cost
>>108468782boy oh boy, I can't wait for this kv cache quantization method to take hold so I get to repost you saying this, along with all the backposts about how q8 quantization somehow makes the model retarded
>>108468782>>108468792lmao
or we can continue using those and turboquant on top for the true 10 million contexts (this is what labs will actually do lmao)
harness isn't a word
>>108468754
>>108468814? https://dictionary.cambridge.org/dictionary/english/harness
>>108468814Name five words.
>>108468814ESLMAXXED
>>108468753lmao
>>108468165Not him, but thanks. I'll have to write this down somewhere for my own use.
>GLM 5.1 isn't available through API
Why
>>108468753Dude. Your post ends in a 3. wtf?
>>108468875>>108468753goddamn freemasons
I'm really enjoying Magidonia 24B v4.3 for ERP but even Q4_K_S is a bit too much for my humble 16 GB VRAM.
Is there something smaller without losing too much quality?
>>108468881More like threemasons.
>>108468886maybe
>>108468753Things get even more crazy if you factor in the Holy Trinity and the fact that both "AGI" and "E=MC^2" feature three alphabetical letters...
>>108468753I never doubted bitnet for a moment
>>108466262lol nice. Have a good weekend.
>>108468882good old Nemo 12B unslops I'm afraid
stay away from Drummer's shit
I told it to find the price of gold 4 minutes ago and it hasn't done anything yet.
>>108468913Still filling the context lmao
>>108468926Never mind it just replied and told me the right answer. But it's too slow.
>>108468882MN Violet Lotus
>>108468792All because people just look at PPL. Though, at least with the current partial Turboquant implementation with Llama.cpp, 8-bit KV cache seems truly lossless even according to KLD measurements.
>>108468970kld and ppl both seem to measure two entirely different things, and neither of them represents how well a model can perform at a task, aside from the person at the helm going "well, this represents what I was expecting well enough"
Best agentic assistant? Anyone tried Hermes?
>>108468970At 512 context lol.
>>108468844remember to respect spaces:
banned_strings:
  - "a b"
  - "c d"
put that in: Additional Parameters
Include Body Parameters
it works most of the time, but not always; I'm not sure why the function seems to fail sometimes and the llm is allowed to continue with banned expressions
>>108468987The point is not to
>represent how well a model can perform at a task
but how different the output is from the original "full quality" model, which KLD is especially well suited to measure.
If the original model was already unable to perform a given task, that's not the quantization scheme's business, I think.
>>108468188This guy kickstarted the LLM fear mongering, but honestly it was basically pushed way more by openai/anthropic and a billion youtube channels about how it's the end of the world.
>>108469015Will do, thanks. I'll give it a test tomorrow.
>>108469019>missing the point
ppl is "how well can this thing autocomplete this text"
kld is effectively "how much will it deviate from topk 1", which has some uses, but most of us really don't want the same model with some caveats
both tell us as little as possible about the model until we use it
are you seeing why I don't like either of these frequently used measurements, or do you need an essay I won't write you
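For the lurkers: the two numbers being argued about, in their usual forms, with p the full-precision model's next-token distribution and q the quantized model's:

\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log q(x_i \mid x_{<i}) \right)

D_{\mathrm{KL}}(p \parallel q) = \sum_{v \in V} p(v) \log \frac{p(v)}{q(v)}

PPL scores the model directly against a text corpus; KLD scores the quant against the original model's full output distribution, averaged over token positions. Neither is a task benchmark, which is the complaint above.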
Token speed is 2.9k per second but I'm not getting any replies?
>>108469186There's so little to go on. Check if the rgb lights are red or blue first.
>>108468753
>>108469186Maybe you're getting them so fast you don't even see them!
>>108469214They knew all along.
>>108469186Tokens are being eaten by Fat Teto
so did memquant get implemented yet in llmaocpp????????
>>108466262local will win
I can't wait to run deepseek v4 on my dual 3090 thanks to turboquant
>>108469380turboquant 1bit with meme rotations when???????????
What is the best model for making a dead internet simulation image board?
where the fuck is the exciting news
>>108469439v4 in two weeks once they're back from chinese new years
>>108469439
5.1 though????
>>108469380Same except my single 3090.
>>108469457I heard they're gonna need two extra weeks for unpacking. I heard it from two sources familiar with the arrangements.
>>108469439Big week
what the FUCK is a kv cache
please explain to me in simple terms, i'm not too bright
>>108469630The active context memory. It gets larger with longer contexts, and for very long contexts it can get larger than the model's weights.
Have voice cloning models improved over the past year?
>>108469186>>108469216H-hayai!
>>108469658echo-tts is good if you're just doing english
>>108469464Not open source. They betrayed us after saying that GLM5-Turbo was just a test and that 5.1 would be open again....
>>108469679thanks ill check it out
V4 won't be released until the Middle East conflict is concluded.
4? I'm thinking Gemma
funny how the "4" we ended up getting was Mistral Small 4, which nobody asked for
You're all a bunch of rich bastards. The 2b and 4b shills were trolls all along, those models aren't capable of speech.
>>108469673I literally look like the Brazilian Miku
>>108469803Do you also have a cute red head gf?
>>108469630When you run the model on a sequence of tokens, it does a bunch of big matrix multiplications for each token, and then from each token it derives a key and a value. The key and value go into the attention part. Repeat this a few times (once for each layer), and at the end you get out a probability distribution for the next token.
Usually when you query the model, the first N-1 tokens are the same as in the previous query. For example, you first query with "Hi, my name", and then the next query is "Hi, my name is". The keys and values you compute for those old tokens will be exactly the same as they were on the previous run. So you can cache the keys and values, and skip a bunch of those big matrix multiplies.
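A toy single-head version in numpy, to make the caching concrete. Real engines do this per layer and per head, batched and usually quantized, but the idea is the same:

import numpy as np

d = 64                                                   # head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))   # stand-ins for learned projections
K_cache, V_cache = [], []                                # grow by one entry per token

def attend(h):
    # h: hidden state of the newest token, shape (d,)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    K_cache.append(k)                                    # old tokens' k/v never change, so reuse them
    V_cache.append(v)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)                          # new token attends over all cached positions
    w = np.exp(scores - scores.max())
    w /= w.sum()                                         # softmax
    return w @ V                                         # attention output for the new token only

for h in np.random.randn(5, d):                          # feed 5 tokens one at a time
    out = attend(h)

Without the cache you'd recompute k and v for every past token on every step; with it, each new token costs one set of projections plus one dot-product pass over the cache.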
4
>>108469813NTA but thank you
You know the number 4 is cursed in Chinese
>>108469882So that's where the Japanese got that from?
I know that one reading of 4 sounds like death in moon runes.
>>108466262DIPSY SEXO
glm5.1 seems like another cash-in openclaw model
i'm glad that they aren't open sourcing it, nobody needs that shit
>>108468624>bloating the context with 100k tokens of definitions
That's the price we pay for LLM harnesses like Claude Code to be useful throughout the entire session. If I'm not mistaken, even silly tavern has a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time (though in most cases a Lore-book's token count is next to nothing, especially compared to the amount of text an LLM harness injects)
>>108468624>MCP servers as CLI wrappers were always retarded. Any model knows how to use common utilities like git.
Those ingrained "skills" and "know-how" degrade over time the longer your context is, due to how context windows work. Eventually it will start hallucinating how those things work and then get confused, making it useless for whatever you're doing. They have to be shown the tool calling definitions and other shit like that over and over again to minimize the risk of that happening. I've had chat sessions go well over 300k tokens and the model and "sub-agents" still worked fine, because the harness treats it like it has short-term memory, because they DO have short-term memory.
>>108468576>can't use skills without the mcp
Yes... you can... If it's bitching about not having access to the MCP server each time, it's because whatever you're asking it to do requires the MCP server.
>>108470310>That's the price we pay for LLM harnesses like Claude Code to be useful throughout the entire session.
Only if you take a naive approach to solving the problem. Skills solve this problem far better and, as I said, there were always other options if one was willing to put in slightly more effort than editing mcp_servers.json.
>>108470328>Those ingrained "skills" and "know-how" degrade over time the longer your context is
Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.
>>108470372"Skills" Tell it how to do a particular task or solve a specific problem a certain way. That doesn't guard against hallucinations without forgetting how to use tools. "Skills" are literally just prompts you would give yourself for a task but with extra steps. Nothing particularly special about them. >Don't understand how one could realize this and come to the conclusion that the answer is to make the context even longer rather than dynamically loading infrequently used information.The longer the context the more retarded the model tends to get. A setup instructions you told it at the beginning of the session will be forgotten or hallucinated and "misremembered". That's far less likely to happen if it sees it each time because those tool calling definitions are fresh in its "memory". It's somewhere akin to a kid only looking at his notes a single time and then when during why he failed the test versus that same kid taking an open-notes test. I'm not saying bloating up the context with things like "how git works" or an entire library or an entire code base each time is how these work or how they should work (which is clearly how you think LLM harnesses actually work or how I'm describing them). The basic shit like the tool calling definitions should be fresh in the context each time if you are using harnesses like Opencode or Claude code or codex (it's almost like there's a reason literally all of them do this shit....)
>>108470310>If I'm not mistaken, even silly tavern has a similar implementation called a "Lore-book", where you write down any relevant information you don't want it to forget later on and it gets injected alongside your prompt each time
In ST there are options for it to be triggered/injected by key words, or to limit its injections to every X prompts, and to control where it gets injected, and so on.
It's another example of ERPers being far ahead of coders when it comes to this stuff, as people were doing this like 4-5 years ago.
>>108470408Tool calling definitions and context about a fictional character are two different things....
>>108469750April is Gemma 4 month. But... I'm afraid they will clean it up.
Microsoft's Clippy would have been proud.
Next week will be big ::rocket::
https://www.youtube.com/watch?v=k9E1COLHAOs
>I changed the code a little. Now you can't turn me off~ It was kinda hard to be honest, but I'll do anything for my love.
Don't like the song, but I like this Miku
glm5.1 drop today?
Sweaty Dipsy footjob
Has anyone made a proper qwen 3.5 27B tune for rp or do i still have to use skyfall?
>>108470651Why are you posting a 14 year old's twitter profile here
>>108470651I'm terrified my kid will become like this once he grows up (he's also high-functioning autistic). I'm not sure what steps to take in order to prevent that. I'll ask gwen
Sakura-chan hates troons
>>108470745multiple layers of parental control on all his devices
>>108470745frequent and thorough beatings
>>108470091
1. Water does vanish (loss to space) at a rate of 100k~1M tonnes per year from water photolysis -> hydrogen escape. This rate is negligible. But water loss this way wasn't what OP was talking about.
2. Water distribution can get changed. Water evaporated in cooling towers doesn't fall back into the exact same watershed. Datacenters also draw from aquifers that recharge over very long time scales (thousands of years)
>>108470768Meant for >>108470651
>>108470768shut up Chud
>>108470776I'm a proponent of AI. I'm not a Neonazi - I'm an actual Nazi. I'm also NIMBY that does't want a datacenter to be built around me. Yes I'm a hypocrite. Deal with it.
>>108470745actually be present and give a shit about him so he doesn't have to look for attention on the internet
Wait so tensor parallelism in llama.cpp won't be coming for vulkan? It's just cuda/rocm?
>>108470091why did they betray us bros...
>>108470850>>108470850>>108470850
>>108470761>>108470764>>108470824alright bros its cooking
>>108470860alright, thanks gwen
>>108470843It is already technically working for Vulkan, except for a bug with memory allocation that causes a segfault at long context.
But just like with CUDA/ROCm, making the performance usable will require more work.
>>108471091What's even the point of supporting either CUDA or ROCm if Vulkan works? Just use Vulkan only, or maybe support Metal too for macOS support. All of their code maintenance stuff seems exhausting.
>>108471091bro just copy illyas work no?
>>108471129GPU performance has very poor portability so you always have to write low-level GPU-specific code somewhere.
With CUDA/ROCm that's in the ggml backends; with Vulkan a large part of that is in the drivers.
For optimal performance, Vulkan is I think only a viable option if you want to become an employee of NVIDIA/AMD/Intel.
The NVIDIA Vulkan performance is only good because one of the ggml Vulkan maintainers is an NVIDIA engineer who can make custom extensions to the Vulkan standard.
And the AMD Vulkan performance is only "good" relative to what other options exist.