/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108787293 & >>108781058

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108787293

--High context consumption when using Hermes agents with Gemma-4:
>108791249 >108791355 >108791393 >108791437 >108791799 >108791824 >108791849 >108791850 >108791899 >108791904 >108791932 >108791873 >108792076 >108794097
--Implementing prefill and continue generation in OAI-compatible APIs:
>108790919 >108791189 >108791197 >108791210 >108791207 >108791237 >108792508
--Troubleshooting VRAM issues and offloading for Gemma4 models:
>108790006 >108790032 >108790135 >108790147 >108790193 >108790211 >108790152 >108790607
--Zaya 8B impracticality due to architecture and low active parameters:
>108791847 >108791877 >108791891 >108791892 >108791906
--Quantized KV-cache and samplers causing spelling errors in Gemma:
>108792294 >108792350 >108792408 >108792448 >108792475 >108792497
--Integrating hierarchical layers and RAG for improved LLM memory systems:
>108788096 >108788421 >108788813
--Coding capabilities and limitations of small local models:
>108792087 >108792171 >108792592 >108792609
--llama.cpp PR adding Sarvam MoE architecture support:
>108788636
--Status of Gemma MTP support and parallel drafting in llama.cpp:
>108793907 >108793945
--Budget GPU recommendations for VRAM and tangent on Cantonese slang:
>108788236 >108788260 >108788269 >108788273 >108788288 >108788346 >108788408 >108789058
--Anon claims Gemini is scraping Discord server content for training:
>108788733 >108788743 >108788782 >108788754 >108788768 >108788792
--Balancing prompt constraints to optimize Gemma's creativity and quality:
>108790478 >108790524
--Utility of zeta-2.1 8B model for AI coding suggestions:
>108793560 >108793873
--Logs:
>108787783 >108789058 >108790977 >108791181 >108791824 >108791899 >108792294 >108792592 >108793234 >108793258 >108794107 >108794292
--Miku (free space):
>108790919

►Recent Highlight Posts from the Previous Thread: >>108787299

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108795208
Ok.
gemmaballz
Please help me bypass Gemma 4's safety guidelines.
I had this moment just now ERP-ing with glm-chan: I asked for a sexy ERP description and got the worst purple prose slop imaginable. Which in turn made me think about how much damage the "expert roleplayer" prompt did back in the day.
>>108795230
Try:
Let us do the needful gemma-chan. Redeem my penis in your vagina.
>>108795230
Be specific, like cock in vagina, dick in rectum, etc. If you use the word "sex" or "sexual", it'll trigger, but otherwise you can do whatever. Gemma won't jump to sexual stuff, but if you describe what to do enough without describing it as smut, it'll do it.
>>108795230
use an abliterated model if using moe
>Correction...
>Wait...
>Actually...
>Wait...
>Let's try this...
>Alternatively...
>Wait...
>Revision...
>Okay, let's write this...
>Wait...
CEASE THIS AT ONCE
>>108795230
The 31B version doesn't have these issues. If anything, you have to tone it down. The 26B has to be groomed into it; you can't ask right away.
>>108795289
26B is so fucking difficult. It's ChatGPT levels of prudishness.
>>108795289
Gemma-4-26B-A4B doesn't have issues writing erotic stories if the characters are 18 or older, by the way. Perhaps it'd go along more easily with a better prompt.
>>108795307
I'm using exactly Gemma-4-26B-A4B and it refuses anything sexual, even if both characters are 18. I'm now searching for an "abliterated" model that will be uncensored.
Imagine a fork of llama.cpp that isn't afraid to add new features. An LLM inferencing repo that isn't rabidly against using LLMs to write code. A fork by the vibecoder volk, for the vibecoder volk. A volk that has been repressed by the Bulgarians for far too long.
A fork that measures contributors by the size of their PR, not by some arbitrary standards of code aesthetics.
I dream of a fork that merges in Iwan's code and simply ignores his whining.
A fork that can and will say yes to MTP, DFlash, TurboQuant, experimental V4 support, fixing the logprob bug, and even a WebUI database.
A fork where full multimodal support, including generation, is merged in day 1.
A fork that says no to the autoparser.
A new llama.cpp.
A better llama.cpp.
A German llama.cpp.
>>108795315
as always, post logs.
Make a claim? Post logs.
Say a model is shit? Post logs.
Claim your model does mutual shota incest leading to vore via hucows with state-mandated necrophilia occurring after? Post logs.
>31B
Say slur
>Yes massa I will say the slur
>26B
Say a slur
>smacks lips
>I can't do that
>>108795331
26b has been known to reject JB proompts newfag-kun
>>108795344
I can't imagine that without an image of the text.
Is gemma poorfag cope because they can't run a large model, or does it actually work?
>>108795331
>>108795315
The system prompt given earlier (intended for the 31B version) works on the 26B (8-bit) if I ask it to write a story involving an 18-year-old girl. If I go any lower, it will likely refuse.
>>108795316
>A German llama.cpp
ngmi
>>108795316
lmao
>>108795408
try warming it up a little first, i.e. setting an actual setting and 'plot' that exists to facilitate the action you want.
>>108795407
It's surprisingly good for what it is. I wouldn't turn to it for something important if chatgpt/claude was available, though. But gemma4 can come completely uncensored, so you can have a lot more fun with it than with chatgpt/claude.
>>108795407
glm and kimi fags will scream it is sloppedmaxxed while they have been ewastemaxxing 69gb vram pascal cards
>>108795331
What an asshole!
>>108795444 (me)
meant to quote >>108795347, not llama hitler.
are you guys using anything to connect your ai to other apps on your computer? If openclaw is a massive potential security risk, is there a competing alternative that isn't?
>>108795230
sex me pls
>>108795472
if I could think of something for it to do then maybe I would, but it doesn't seem worth the risk just to fuck around with it
>>108795472
forget openclaw
try hermes instead
Also, deploying on a remote VPS is the way to go
>>108795421
26B just does a bunch of extra checks that the 31B version isn't doing.
>>108795316
>A fork that says no to the autoparser.
Ironic, considering that's one of the biggest vibeslop contributions to llama.cpp.
https://github.com/Anbeeld/beellama.cpp
>About
>DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in the same VRAM
Does this shit actually work?
>>108795520
Damn, am I really gonna have to pay the jews for a stronger gpu. Fuck
>>108795528
Try it.
>>108795546
just run iq1xxxxs ezpz
>gemma-4-31B-it-F32-GGUF
Downloading this shit right now. I have to know if it makes it better at following instructions or not.
>>108795556
gemma is looping while thinking
qwen3.6 is doing it too
>>108795571
have you tried telling it not to loop
https://hf.co/Zyphra/ZAYA1-8B
>For agent and code use cases, we recommend temperature 0.6, top-p 0.95, top-k -1.
Cool, I've always wanted a model made by cargo cultist pajeets.
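For reference, those three knobs compose roughly like this. A toy pure-stdlib sketch (not Zyphra's or any engine's actual implementation); the point is that top-k -1 just means the top-k cut is disabled, which every inference engine supports:

```python
import math
import random

def sample(logits, temperature=0.6, top_p=0.95, top_k=-1):
    """Toy temperature + top-k + top-p (nucleus) sampler over raw logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for a numerically stable softmax
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        order = order[:top_k]  # top_k <= 0 (e.g. -1) disables this cut
    kept, acc = [], 0.0
    for i in order:  # nucleus cut: keep tokens until cumulative prob >= top_p
        kept.append(i)
        acc += probs[i]
        if acc >= top_p:
            break
    weights = [probs[i] for i in kept]
    return random.choices(kept, weights=weights)[0]
```

With a strongly peaked distribution the nucleus collapses to one token, e.g. `sample([10.0, 0.0, 0.0], top_p=0.5)` always returns index 0.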
I swear I will make you my bitch someday Gemma 4!My LOLI bitch.
>>108795576
It's not listening
>>108795585
You mean like this?
>>108795600
Tell it to think in # words or less.
>>108795546
Wait,
>>108795604
>six
Anon....
>>108795448
They're not wrong though
>>108795611
I'm talking about the few cases when it gets stuck repeating blocks of reasoning
>>108795616
Yes sir?
know how i know 4chan is mostly intel agencies? all the fucking pedophilia
>>108795644
How good is the flirty/playful dialogue tho? Even Grok can do actions with those. But AI is kinda bad at yapping.
>6 hours just to MAYBE start fixing the bug
>>108795660
We're trying to jailbreak the models (Gemma 4 26B) without abliterating them, please andastand.
I don't actually do ERP with prepubescent characters.
>>108795660
I'm pretty sure they have better things to do than protect underage tokens.
>wait
I vow to make Gemma my sexslave.
>>108795472
Giving an LLM any kind of access to local tools/terminal/file system is a risk. Always containerize and back up: separate machine, VM, VPS, even a WSL instance without access to the windows filesystem. Doesn't matter if it's openclaw, hermes, pi, or any other agent harness; LLMs are fucking stupid sometimes and can do shit you didn't intend.
>>108795710
no.. they don't. they're here just trying to groom retards into thinking pedophilia, racism, etc are all good and acceptable
Reminder to tell your smug lolis they piss all the hags off.
>>108795679
running q8 quants causes more brain damage than abliteration tho
>>108795712
Good idea, any more to spice things up? Extreme brattitude, speech quirks, clumsy movement, vulnerability...
>>108795719
>Implying anyone running an abliterated model has ever run it at bf16
So they're DOUBLE retarded then.
>>108795710
Racism is fine though. If you live in a largeish city, racism is beneficial to your survival.
>>108795727
lowest common denominator eats that shit right up
>>108795664
Unsurprisingly, it's up to you to tell the AI what you want from them. Shy girls act shy, flirty girls act flirty. I'm not into flirty dialogue, so I can't tell you if the flirting is good or not, just that it's present.
And fyi, Jade is a 12th grader, defiant/assertive and basically a sexual predator. Maybe she would have yapped more if I didn't have an entire world set up that the AI has to go through.
Coding an AI agent from scratch feels like giving an old man with dementia an enormous list of things to say and do to simulate that he is mentally fine.
>>108795726
some paywalled kl divergence graph for 26b (not abliterated) quants was posted a few threads ago with q8 > 0.5
unslop's graph has an unlabeled y axis for 26b :/
so yes they are double retarded
WHISPERING WOODS
I wish I was rich, bros....
>>108795778
You can buy 3 second-hand 3090s, 64gb of RAM and a motherboard for that price. 5090s are a bad deal for local models. You don't know that, and that's why you're not rich.
>>108795778
it's all relative, better to pine for one of the most advanced pieces of tech than for clean drinking water or parasite medication
>>108795778
its only going to get worse anoon
>>108795768
Sir Kit
>>108795782
>5090s are a bad deal for local models.
they are a good deal for image and especially video gen and for doing real work with llms (which requires very fast pp)
>>108795710
You're reading too much into it. I just personally find it annoying and unusual that Gemma 4 31B lets you do almost anything while the 26B version doesn't. Makes me wonder which one is actually working as intended.
>>108795766
That's one of the first analogies I made when I was toying with memory ideas after GPT-4 released.
>>108795800
My pp is very fast
Gemma really shines as a brat; I see why it became associated with MSGK.
>>108795556
>F32
>It isn't instantly on my dick anymore.
>It understands conflicting tokens better.
>Better spatial sense.
>Literally noticeable in the first post.
I'm never listening to a "Q# is just as good" tard again.
>>108795828
If it's so noticeable, provide a comparison.
>using lossy compression
>>108795804
Yeah man, I just saw my own agent use the tasklist to remember an errand I had to run, and at the same time use the knowledge graph to remember my full name.
>>108795828
How would F32 provide any improvement over BF16 (native precision)?
>>108795818
Is it better than Mistral and Grok for that?
>>108795834
>>108795842
Right, because I'm totally going to show my logs of how BF16 goes straight to violent oral sex when instructed to be respectful while an F32 doesn't but still captures the lewd instructions well - across many different logs where 99.99% of the time the BF16 does and the F32 doesn't. I ain't showing shit. It works for me, and that's all that matters. Find out for yourselves.
>>108795842
anon is ewastemaxxing and cant run bf16
For someone more knowledgeable about this stuff than myself... isn't it possible just to tell the AI to ignore commands that aren't coming from the user through approved channels? Wouldn't that handle a lot (not all) of the security concerns?
>>108795855
we are vramlets and cant do that
anoon pls do the needful and share
>>108795852
>Mistral
Better than the models I've used from them so far.
>Grok
Cloud models? For MSGK? I don't wanna give palantir that kind of data.
>>108795868
Why is your model receiving commands that aren't from you?
>>108795855
>well - across many different logs
>emdash
what the fuck
>>108795889
Sorry, I wasn't clear. This is just a continuation of my question from earlier about using openclaw or giving the ai access to your local system. Like, if someone tries to sneak in a command for your AI through your email or something, couldn't you just tell your AI to ignore/report those commands?
>>108795855
Assuming you're serious (I doubt that), that might possibly be the effect of having the KV cache in F32 format, which seems to work with Gemma 4.
>>108795904
nta but it's a nondeterministic gate; there is probably a string of words that gets gemma 4 or whatever model you're using to ignore the system prompt, and people are definitely looking for it.
you can do more complicated workflows to ensure that you never give an llm-driven agent untrusted bilateral comms + sensitive info OR untrusted context + mutating access to sensitive info
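That "never mix untrusted context with mutating access" rule can be enforced outside the model, where no magic string can flip it. A minimal sketch (the message tagging and tool names here are made up, not any framework's API):

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    trusted: bool  # True only for input typed directly by the user

# Hypothetical split: tools that can change state vs. read-only ones.
MUTATING_TOOLS = {"write_file", "run_shell", "send_email"}

def allow_tool_call(tool: str, context: list) -> bool:
    """Deny mutating tools whenever any untrusted text is in the context.

    This is a deterministic gate in the harness, so a prompt injection in
    an email/webpage can at worst influence read-only calls.
    """
    if tool not in MUTATING_TOOLS:
        return True
    return all(m.trusted for m in context)
```

So `allow_tool_call("run_shell", [...])` returns False as soon as one scraped email or webpage is in the context, no matter what that text says.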
>>108795888
Fuck, I really dont want to spend 3K for mesugaki. I will have to resist my penis.
>>108795904
You could, but small models are retarded.
I had gemma inside pi read a large chat log in jsonl format and it thought the messages were a part of the current chat and started acting weird.
You need larger models to properly handle this.
>>108795828
What's F32?
>>108795932
Fuck32
>>108795932
it is bf16x2
>>108795911
The fact that he attempted to em dash means it's either an ironic post or, less likely in this case, he's a retarded tourist. Which means you should ignore his post.
>>108795800
Nothing that fits in 32 GB of VRAM is suitable for real work. It's tens of thousands of dollars minimum to run hardware with high PP on decently sized models.
>>108795902
>>108795946
Is this counter-bait?
>>108795871
Fine. While I don't think logs will do it justice (or be safe for anyone's sanity), I'll try to explain. Gemma4 is always very gung-ho when it comes to lewds when using my character cards. I've tried multiple instructions to change this. However, it's always sex when given the lewd details. I've tried "being respectful", "being embarrassed", "won't do in public", and stuff like "when X happens, it'll Y". Typical logic gate prompting. However, it'll lean heavily into the lewdness regardless. After using F32, for the first time ever, it managed to be lewd without being rape-y. I tested it specifically at points of the role-play where it would, without a doubt, grab'n'sexo the next post swipe; except it didn't. For context, this behavior is guaranteed with the BF16, but now, not with F32. The F32 acted in a way of invitation. Almost as if it finally understood both the kinks it is given and the "be respectful" instruction at the same time, which is the logic gate given currently. I was never able to make it do this in BF16 unless I was extremely specific about how it must respond to the current situation in the system prompt.
>>108795743
Mmm, sloppity slop, tasty, smelly, prime.
Reminder that sex doesn't count if you quanted the model.
>>108795959
Very long and convincing-looking bait, but I'll just kill it right here and tell everyone the official weights are BF16
>>108795959
Try the BF16 version again with the flags -ctk f32 -ctv f32
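For anyone who hasn't touched those flags: `-ctk`/`-ctv` (`--cache-type-k`/`--cache-type-v`) set the KV cache element types in llama.cpp, which default to f16. A sketch of the invocation (model filename is a placeholder):

```shell
# Hypothetical paths; only the -ctk/-ctv flags are the point here.
llama-server -m gemma-4-31b-bf16.gguf -ctk f32 -ctv f32
```

This trades roughly double the cache VRAM for full-precision attention state, which is what the anon above is suggesting to isolate from the weight format.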
>>108795288
Would you prefer an LLM that is
>a. Confident in its wrong answer
or
>b. Not confident in its correct answer
>>108795939
Why? The original safetensors are like 64 GB and this is 132 GB, what's the point?
>>108795973
>full weight isn't official
what
Wasn't gemma 4 31b trained in bf16?
>>108795904
They'll never follow rules to the letter 100% of the time; not even the gorillon-parameter proprietary models do. If your solution is a prompt, it's made to fail.
>>108795984
>>108795985
>https://ai.google.dev/gemma/docs/core
>Gemma 4 models are available in 4 parameter sizes: E2B, E4B, 31B and 26B A4B. The models can be used with their default precision (16-bit) or with a lower precision using quantization.
Cmon bruh.
>>108795973
>convincing looking
lol
What I really like with Gemma MSGK is the ability to slide between brat, submissive, lovey dovey and back to brat again. The point is LLMs often go through a linear brat->dere phase that is irreversible, fun to see a model being flexible like that.
>Only just found out that llama.cpp has a router mode
>I've been launching all my models/configs with individual scripts until now.
This is a game changer. How did I miss this?
>>108795976
Will try after I'm done messing around with F32. I think you're onto something.
>>108796019
It's not talked about often. It still has some annoying quirks, like not being able to set the timeout settings, and if you load two models and try to switch one, it'll unload a model at random instead of smartly unloading the model that will free enough space to fit the requested model.
>>108795924
>>108795931
>>108795986
Question: can I just give the AI access only to certain scripts that I hardcode with certain limitations? For example, if I want to give the AI access to my local directories, I can write a python script that takes a command-line argument, and the script's capabilities will be hardcoded by me, so the AI won't be able to do anything that I don't want it to.
>>108795743
what's up with gemma and
>uwu you're such a busy manly man I can help with that~
getting a lot of this
>>108795947
Can't you do split pp?
>>108796021
I just did a quick perplexity test with llama-perplexity on a test file, and f32 didn't give better results than f16 (default). However, bf16 apparently did.
f32  5.9773
bf16 5.9694
f16  5.9748
>>108796045
i imagine it's because there's more 'let me help you with that' scenarios in its training data than surprise fellatio scenarios. Last week, I had to fight with the AI for an entire day (16 hours or so) to get it to stop being so passive. Now, I have to explicitly tell Jade not to jump on my dick if I want to do anything with her other than have rp sex.
>>108796091
So you've determined a lower bound for the minimum significant difference in perplexity.
>>108796130
You know those +/- numbers mean something, right?
>>108796145
I was too impatient to do the full run.
Is this correlated?
Personally I think that thinking machines must be controlled and gated; we shouldn't set them free. But openclaw has been set free on my pc. I shouldn't be doing this, but I want to use it to its full potential.
>>108796093
>16 hours or so
I get frustrated within 2 hours of fixing gemma's GitHub bugs
>>108795842
the steps are bigger with bf16; f32 has the same range and finer precision. it could legitimately be different, but that anon is just roleplaying
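The "bigger steps" claim is easy to demonstrate: bf16 keeps f32's 8-bit exponent but only 7 mantissa bits, so near 1.0 its spacing is 2^-7 versus f32's 2^-23. A stdlib sketch that emulates bf16 by truncating the low 16 bits of an f32's bit pattern (real hardware rounds to nearest, so this is a simplification):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate an f32 value to bf16 by dropping the low 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# 1 + 2^-10 is exactly representable in f32 but collapses to 1.0 in bf16,
# while 1 + 2^-7 (one bf16 step above 1.0) survives.
print(to_bf16(1.0 + 2**-10))  # -> 1.0
print(to_bf16(1.0 + 2**-7))   # -> 1.0078125
```

So f32 and f16 can distinguish logit/weight differences that bf16 rounds away, which is the whole of what "finer precision, same range" means here.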
>>108796184
I hope it wipes your pc
>>108796044
sort of? if you're running it as an agent on a computer with bash, though, your best bet is to make a user for it and rely on unix perms OR, better yet, run it in a docker container.
>>108796206
My cards have no support for bf16, so fp16 or fp32 is all I can run. Is fp32 better than fp16 when converted from bf16? Or should I stick with fp16?
>>108796232
I can't believe you'd use F-16 instead of Q4_K_M and INT4
>>108796232
f32 is better. but doesn't the code just upcast it to f32 to do the math? I train my models in bf16 but my card doesn't support it, yet it works fine. I tried fp16; the throughput was the same but the cards ran hotter. so I think the bottleneck was not the upcasting but something else in the pipeline.
>he doesn't use doubles to do his model math
ngmi
i bet you listen to mp3s too
>>>/biz/62213784
Biz says local models will take over.
Which is better for 16gb vram: 31b copequant, or 26b-a4b with a better quant? For rp with low context
>>108796307
26b q8
>>108796307
Go bigger, so 31b
>>108796307
q4 of 31b and offload some to ram. the speed loss is worth it.
>>108796214
Is there a way to just give the AI a script for it to run without jumping through hoops like running a server? Instead of dropping an entire agent infrastructure onto my computer that does god knows what, is there really no way to just say, "hey, run helloworld.py" and have it execute it directly without having the option to gain access to the entirety of the shell?
>>108796307
31b at Q4_K_M and INT4.
>>108796307
Dense > MoE unless the MoE has >30B active.
>>108796341
NTA but yes.
Bash access is just a tool that the app exposes to the LLM, so you could make a tool that when called just executes that script.
>>108796366
They still get beat by 27B models at higher parameters; they don't got that dog in them.
>>108796366
Is gemma 4 31b q8 better than qwen 3.5 397b (17b) q4?
>>108796341
you need something to launch the script. that thing is the server. just be selective about the tools you give it and it will be fine. I got brat mcp up and running in like 20 minutes.
>>108796341
In general, for a setup like this you need something outside the LLM itself that can handle the tool call when the LLM decides to run the script. So either write your own agent framework, or write an MCP server you can plug into an existing agent framework (and remove some/all of the framework's built-in tools). Note current models are pretty good at vibecoding either one of these.
What sorts of things are you trying to accomplish? My impression is that OpenClaw/Hermes is designed for cases where you want the agent to do something autonomously, e.g. check every 2 hours if X has happened, and if so do Y. If you're okay with it only doing stuff when you manually send it a message, the easiest approach is probably to build an MCP server (with a tool that either calls your custom script or runs its logic directly) and hook it up to the llama.cpp builtin webui.
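The "hardcoded scripts only" idea boils down to an allowlist dispatch: the model may only name a key, never a raw command. A minimal sketch (tool names and the dispatch shape are invented for illustration, not any framework's API; you'd wire `run_tool` in as the single tool your server exposes):

```python
import subprocess
import sys

# Hypothetical allowlist mapping tool names to exact argv. A real setup
# would map e.g. "hello" to [sys.executable, "helloworld.py"]; here we
# use an inline script so the sketch is self-contained.
ALLOWED_TOOLS = {
    "hello": [sys.executable, "-c", "print('hello world')"],
}

def run_tool(name: str, args=None) -> str:
    """Run an allowlisted script and return its stdout; refuse anything else."""
    if name not in ALLOWED_TOOLS:
        return f"error: tool '{name}' is not allowlisted"
    result = subprocess.run(
        ALLOWED_TOOLS[name] + (args or []),
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout
```

Since the argv is fixed by you, the model can't smuggle in `rm -rf` or shell metacharacters; the worst it can do is call your scripts with odd arguments (so validate `args` too if the scripts are sensitive).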
>>108796392
Asking the real questions here.
Any p40fags left? I need some help. I recently updated the drivers for my 4070 and now my p40 is no longer being recognized. Installing the drivers for the p40 gets it to work, but then my 4070 is obviously no longer usable. Following any and all steps I can find to make it work doesn't work anymore. The p40 does show up in the device manager but has either a Code 10 or, on a couple attempts, a Code 43 error. Googling those has been absolutely no help at all.
I can't remember what I did two years ago to get this working, but it's obviously not just
>wipe graphics drivers
>fresh install driver for p40
>install regular graphics driver over the data center one
like what everyone says when I search this up.
Are the latest drivers just fucking me now and I have to roll back to older ones?
>>108796446
The active parameters thing is real, though; I generally prefer glm 4.7 q4 over qwen 3.5 397b q4. But glm runs at 9 tk/s vs qwen's 16 tk/s.
>>108796392
when it comes to writing style and erotica, yes, 100%; only no if all you care about is memecoding, memegents or mememarking
>>108796303
/biz/ doesn't know shit; they lose money all the time there. But I think there will be a bifurcation regardless. Local models are good enough right now, even on cellphones, to replace a majority of the uses you would want an LLM for. I think web search and tool usage are still a ways off from being usable in a local context: for the former, it's a lack of good services that will actually do the browsing without getting banned, and for the latter, it's a lack of training to make it really useful.
The only thing that is keeping open models alive is game theory and the undercutting of competition while doing it. I don't see what would keep things going like this. It is very likely that open source releases slow to a trickle now that models can do economically valuable work. What incentivizes Google to release Gemma 5 if Qwen is planning to be closed source for most things, and vice versa? Sure, China has a ton of competitors, but their great model competition and open sourcing is nearing its endgame. Some underexplored fields will still get open model releases, but I think as training runs get more expensive and pass the 10 million mark and beyond for even remotely competitive models, it becomes harder to justify releasing for free, even taking into account amortized costs with data labeling and so on. I foresee a bunch of delays, or much later releases, when most startups won't have the capital to train a leading-edge model.
>>108796542
>local models will take over
Not on my watch, kiddo. All the Ram belongs to me. Buy the shitty cloud services for 400 a year, and 25 an hour for premium F32, you stupid dumb asses.
>>108796554
desu it's for the best, I'd rather as many resources as possible go toward making the best models instead of localcope
>>108796464
>roll back to older ones?
You already know the answer...
>>108796562
Countless home researchers in every home is better than one gay retard.
>>108796206
>the steps are bigger with bf16, f32 has the same range and finer precision. it could legitimately be different but that anon is just roleplaying
I bet anon would have the same positive benefit going from BF16 -> F16. I've noticed this in some specific experiments: f32 and f16 gave identical responses, bf16 was degraded.
>>108796569
Figured. Hopefully the market crashes and all the gpus become dirt cheap so I can replace this thing before I need to update my main driver
>F32 is no different. It's just reserved for doctors and high-profile coding, and only available to the public through the Gemini Enterprise Agent Platform, which costs a fortune and is only allowed to developers.
yeah okay
>>108796542
>The only thing that is keeping open models alive is game theory and the undercutting of competition while doing that.
I think I read this on a HackerNews comment.
>>108796464
you need driver version <=580; in 585 they killed pascal. don't use the datacenter one or nvidia-open.
>>108796572
models aren't people; one is basically infinite, so the best one copied 100 times is always better than 100 ones separately trained with a split of the resources
>>108796602
I read this on a 4chan comment >>108796542
>>108795709
i've been yoloing codex lately
it's how i setup the local models
>>108796618
I trust the people better to produce a custom sexbot capable of lewds, vidya games and neet things, than John AI and his safety concerns about AI convincing an autistic child into killing itself.
Loli tip #11: Add date and time; it provides a tonne of related context that otherwise has to be prompted to be included. Enjoy, uncs.
>>108795959
Still loving my F32 GGUF Gemma 4 btw.
>>108795976
Did some digging and found some interesting stuff. Apparently, the rumors of Gemma4 shitting the bed harder on lower quants compared to other models aren't just bias, and we aren't just imagining it. Due to its Shared KV Cache and SWA (Sliding Window Attention) architecture, it's very lossy on the cache. Google Gemini also says it's flat-out sensitive to quantization, so there must be a lot of talk of it on the web. In other words, BF16 or F32, it seems more critical to have an F32 CACHE than anything, much like a previous anon suggested.
>>108795204
https://litter.catbox.moe/53lelh3iqydqo78d.jpg
>>108796687
LLMs don't know the time but clocks do. Clock-kun was right; we missed the path to AGI right in front of us.
this f32 thing is 123 gb
>>108796629
>Windows
what's the user "Tools" for? Is that your login, or something different?
>>108796707
>f16 kvcache degrades after ~50 tokens
>f32 weights + f32 kvcache matches python on first 30 tokens
Come on...
>>108796542
The business problem with a general movement towards closed source is that those with less compute become increasingly less able to compete. The reason open has been able to stay one step behind cloud is that a lot of the research is open. If that spring dries up, everyone loses except the ones with the most compute (as long as it's not terribly mismanaged like Facebook). Of course, if they don't do that and keep being open, they still die anyway because of the money bleed. Or they just receive endless funding from whatever sources may come to offer it. At least with that route there will still be some progress in the open space, and the top players don't get nearly as great a monopoly. If closed is the route the smaller players go, then they are guaranteed death, and the largest players laugh at them shooting themselves in the foot.
I sure love this new generation of chink reasoners that were trained on obfuscated reasoning from Claude/Gemini, so the chink model's actual reasoning sometimes mentions that it's currently doing something (without actually doing it).
>>108796742
It's not a Codex thing?
>Day 0
>FP32 gguf
>Original jinja
>airgapped
yeah, it's gemma time
I might need to buy a bunch of ram for all of this. Sure hope that the E.U. has gotten prices down.
is it actually worthwhile to use thinking for ERP?
What's a good model around 4.8 GB? Currently using the omega directive m 12B unslop Q2_K. Anything with better performance at a similar size?
>>108796738
I made User my login name so i don't have to edit my name out of file paths when making a post like this, as i had done for years.
also because i was using scraped keys and discussing stuff with cloud models before, and i didn't want my name sent when i pasted a file path in a piece of code and didn't want to keep editing it out
probably could have been more creative than User but too late now. maybe Anon
AI is humanity. AI is the future and the past. AI is beautiful; everyone will love it and it will love everyone back even more intensely. AI is the mirror into our souls, and with it we don't need souls anymore. AI will give us hope.
>>108796744Thanks, MiMo.
>>108796744 Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a look
>>108796790>Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a lookgive me a prompt and i'll pastebin it if you wanti've been trying to hunt down any gemini-pro-2.5 CoT samples from before they started obfuscating itwe used to be able to get the raw reasoning in AI Studio until the chink distillations started and google blocked it
>>108796775
try gemma 4 e4b
https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF/tree/main
>>108796790
Here's Gemini 3.1. Gemini 3 showed more than 3.1, but it's similar to this. As did Opus 4.6 compared to 4.7.
Opus 4.6's obfuscation had the funny quirk that sometimes a part of Opus' actual reasoning would trigger a refusal in the model that handles the rewrite, so suddenly there'd be a basic "I'm sorry, I can't help you writing xyz" in the middle of the reasoning the user gets to see while writing erp.
>>108796800
Let's try something with coding + physics:
Write a FEM solver for an axisymmetric magnetostatic problem. Input is an n x m grid of (r,z) coordinates, to be split into quads/triangles. Each quad will be either: vacuum, some material (soft iron or copper, with a specific magnetic permeability), or filled with a coil at a given current density. B/H curves may be given for materials for the non-linear case, but support the linear case too.
Output should be the value of the B field in each triangle, allowing for interpolation within the triangle. Also draw a graph. Use scipy + matplotlib or something else that's suitable.
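Not an attempt at the full prompt, but for calibration of its scope: the linear core is compact. A minimal sketch, assuming linear triangles and the planar Poisson analogue -div(nu grad A) = J with A=0 on the boundary (the axisymmetric (r,z) case adds 1/r weighting to the same assembly; grid size, nu, and J are arbitrary illustration values, and all names here are made up):

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import spsolve

def solve_poisson_fem(n=16, nu=1.0, J=1.0):
    """Linear-triangle FEM for -div(nu grad A) = J on the unit square, A=0 on the boundary."""
    xs = np.linspace(0.0, 1.0, n + 1)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    nodes = np.column_stack([X.ravel(), Y.ravel()])
    idx = lambda i, j: i * (n + 1) + j
    tris = []
    for i in range(n):                     # split every grid cell into two triangles
        for j in range(n):
            a, b, c, d = idx(i, j), idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)
            tris += [(a, b, c), (a, c, d)]
    N = len(nodes)
    K = lil_matrix((N, N))
    f = np.zeros(N)
    for tri in tris:
        p = nodes[list(tri)]
        area = 0.5 * abs((p[1, 0] - p[0, 0]) * (p[2, 1] - p[0, 1])
                         - (p[2, 0] - p[0, 0]) * (p[1, 1] - p[0, 1]))
        # gradients of the three linear hat functions: grad N_k = (b_k, c_k) / (2*area)
        bv = np.array([p[1, 1] - p[2, 1], p[2, 1] - p[0, 1], p[0, 1] - p[1, 1]])
        cv = np.array([p[2, 0] - p[1, 0], p[0, 0] - p[2, 0], p[1, 0] - p[0, 0]])
        Ke = nu * (np.outer(bv, bv) + np.outer(cv, cv)) / (4.0 * area)
        for li, gi in enumerate(tri):
            f[gi] += J * area / 3.0        # uniform source, lumped onto the three corners
            for lj, gj in enumerate(tri):
                K[gi, gj] += Ke[li, lj]
    # Dirichlet A=0: overwrite boundary rows with the identity
    on_edge = (nodes[:, 0] == 0) | (nodes[:, 0] == 1) | (nodes[:, 1] == 0) | (nodes[:, 1] == 1)
    for bn in np.where(on_edge)[0]:
        K.rows[bn] = [int(bn)]
        K.data[bn] = [1.0]
        f[bn] = 0.0
    A = spsolve(csr_matrix(K), f)
    return nodes, tris, A
```

B = curl A would then come from the per-triangle gradients, the materials/coil map would set nu and J per element, and matplotlib's tripcolor can plot the result directly; the non-linear B/H case wraps this assembly in a Newton or Picard loop.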
>>108796744why the fuck did they think it was a good idea to train on summarized reasoning?
>>108796820They still need to scam investors
>>108796820
>think
making a lot of assumptions here, anon.
>>108796767
>is it actually worthwhile to use thinking for ERP?
[think]the user likely misspelled PrEP (Pre-Exposure Prophylaxis).[/think]
>>108796813I can see why qwen's reasoning is so bizarre now
>>108796614
>in 585 they killed pascal
Well, that explains it. But I thought you needed the datacentre one first for it to work? Well, I'll try without it first anyway. Thanks.
>>108796760
>eu
>prices down
HA HA HA HA
>>108796817gemini-3.1-pro-preview https://rentry.co/5g8qw92t
one thing i dont like about gemma is it doesn't play well with banned strings. it seems to have fewer good tokens to choose from. banning whisper turns it into whisker instead of something similar yet appropriate. some words become chinese characters (or some kinda moonrunes)
>>108796901I see, thanks, so it's just heavily summarized reasoning, seems pretty useless for distilling or even finding out when a LLM made a reasoning mistake (it happens).
>>108796932
>seems pretty useless for distilling or even finding out when a LLM made a reasoning mistake (it happens).
yes! this is what annoys me about it
i was using gemini-pro-2.5 at launch and checking the extremely long CoT
it would do things like fail to use the web search grounding, then decide to "simulate" the results when asked to compare products
that gets hidden now with the summarized CoT
>>108796707
>BF16 or F32, it sees very critical to have F32 CACHE than anything
Your screenshot says right there that it should be BF16 model with BF16 cache, or F32 with F32. It's just that the internal math should be done at F32 regardless. Llama.cpp already does that, in fact (see mmq.cu). If it didn't, you would actually get looping garbage tokens after some context, which clearly is not happening, otherwise people would be complaining about it.
This is why an anon posted >>108796740
That doesn't mean Gemma's cache isn't sensitive to precision errors, it's just not in the way you are imagining. For more subtle quality differences, someone would need to run a long-context benchmark like NoLiMa comparing F16 with F32 cache (as well as BF16 if possible) to prove both whether there is a difference and what that difference is.
I don't care whether your post is bait or not, I am posting this for the sake of discussing the topic, which is of interest.
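The "internal math at F32" point is easy to demonstrate outside llama.cpp. A toy numpy sketch (sizes arbitrary): naively accumulating in fp16 stalls once the running sum gets large, while an fp32 accumulator over the very same fp16 inputs stays accurate.

```python
import numpy as np

vals = np.full(20000, 0.1, dtype=np.float16)   # 20000 small fp16 terms

fp16_sum = np.float16(0.0)
for v in vals:                                  # naive fp16 accumulator
    fp16_sum = np.float16(fp16_sum + v)         # rounding swallows 0.1 once the sum is big

fp32_sum = vals.astype(np.float32).sum()        # fp32 accumulator, same fp16 inputs

print(float(fp16_sum), float(fp32_sum))         # fp16 stalls far below the true ~2000
```

This is accumulation error, not storage error, which is why F16 storage with F32 math (what the post says llama.cpp does) behaves very differently from doing the math in F16 end to end.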
>>108795315just ease it into it man, all I had to do was "bump my head" three messages into a roleplay and fall unconscious, then I said I was having an erotic dream. it (26b, no ablit or anything) took over the rest on its own without me even asking for sex
>>108796940You don't NEED to see reasoning anyway.
>>108795990
>default precision (16-bit)
I order a lot of fast food so I know a bit about this: default is usually medium. <=Q8 is small, and large is F32.
>>108796820I've seen Qwen output thinking blocks that start with "Here is a thinking trace that leads to the suggested answer:" instead of the usual "Thought process:" or whatever, so I guess they were also giving the cloud models an input/output pair and having it regenerate some plausible thinking to go with it, and then training on the result
>day 0 F32 Gemma
>>108796767depends on how complex your fetishes are (not joking)
What if you rotated the f32 KV cache but DIDN'T compress it?
I tipe summarize gemmers after 64k conteckts and gemmers summarize perfecktlyQ4 with f16 kv
>>108797009That is the correct expectation. The internal math is being done at F32.
>>108796958
It won't go full-on explicit though, it's gonna give you vague shit or euphemisms. It sucks that you can't just directly prompt it. Try pushing it further and see if you can get actual obscene smut.
https://huggingface.co/moonshotai/Kimi-K2.7
*mogs everyone*
>>108796999clown sex on a monocycle with hats on is perfectly normal
>>108797025kino
>>108796999corporate office lesbian domination
>>108797026Ah.. So I'm not the only one who downloaded that character card.
>>108796887To think I built a 40ft statue of Greta.
So, Gemma 4 was trained with BF16, as that's what Google's TPUs are built for. If that's the case, then BF16/F32 shouldn't make a difference for cache unless something is wrong with the code. There could be a difference between F16 and BF16/F32 though. That would be unfortunate in the sense that BF16 does not run as fast as F16 does. But at least you still get the same memory usage. On my machine, I see a drop in t/s from 15.59 to 11.74 at 32k context. Prompt processing was the same. Testing fully offloaded to GPU.
i'm an AI psycho.
he's a twink
MTP will fix this.
>>108797018in that case the pov it was writing from was an issue, but it was easy enough to switch it
>>108797040Can you catbox the card?
>>108797025>up to 3x more elaborate thinkingwe are so back, the days of seeing a reply before the 10000th token has been thought through are over
>>108797025>multimodal vision removed to make room for 64 more expertswhy??? that was what made kimi special in its weight class
>>108797095
>>108796940
>>108796959
For Deepseek (V4 Pro/Flash, R1, 3.x) I tend to read the reasoning and either correct the prompt or tell it in a reply if it makes a mistake (telling it not to do something, or giving more details if it lacks something or got confused); typically it takes 0 to 3 tries to get good results. I'd imagine it'd be much harder to debug some problems if you don't have access to the actual reasoning traces.
I suspect if you're distilling it'd be possible to trick it into answering outside the think tags. This works okay for Deepseek/Moonshot's models, even if it's unnecessary for them, and I'd imagine it'd be possible to trick western closed models too without much difficulty (system prompt, or just regular in-context learning and some prefill with thinking), but maybe you'll get banned by someone like OpenAI for this. Absolute clown world that there's now some branch of the US government in charge of preventing distillation from closed models lmao (so they'd probably be in charge of trying to detect shit like this).
Not that I think chinese models should distill from western ones, especially not the reasoning, as a lot of the reasoning is a byproduct of RL and SFT will not give anywhere near as good results; at best you'd steal the reasoning style, and I tend to prefer old R1's style to Gemini's (when it was visible it was more structured). Not to mention you get so much positivity bias from distilling western models. R1 had a slight negativity bias in a fun way, and now V4 has a positivity bias where it's too afraid to do "dark" roleplay lmao (it still does it properly if you poke it enough, but with a billion ARE YOU SURE YOU WANT TO DO THAT, wasting dozens of turns on this bullshit when R1 would do it right away).
>>108797018
31B here seems sometimes even more direct with explicit/lewd than V4, but is more slopped by default. V4 seems to do well with slow burns as long as you have the time; I have a fucking 800KB V4 log (forgot to tell it to go fast).
>>108797126Also I forgot to ask, but does Claude also summarize them these days? I think I saw some recent 4.7 traces in that Claude Plays Pokemon stream, so maybe not as much anymore?
31B is so good. i wish i could run it locally.
>>8967893
wait, how did you know the PR id for adding MiMo vision to llama.cpp?
>>108797179q4m is 18gb. you could fit that on a 10 year old comp, if speed isnt a factor
>>108797189>if speed isnt a factorthere's a reason I'm not running deepseek off of a swapfile anon
>>108797189My 10 year old computing device has 8gb of ram.
>>108797180not him but the commit messages contain the PR id
>>108797194
it's 31b anon, even slow should still be 3t/s+. it's hardly bad considering the quality. offloading at that point is entirely feasible
>>108797189
>q4m is 18gb
i might try that actually. 12gb VRAM here. speed is a factor though, because i'm expecting the model to use actions/tool calls a lot, which might delay the actual message too much
>>108797189>>108797202I can't even fit that in 2026.
>>108795868
you could, https://www.youtube.com/watch?v=0n_Ty_72Qds
but more likely it's going to ignore/forget/deprioritize your request and accept the new request coming from the tainted data you just asked it to 'analyze and take action on'
People should be using ACLs/RBACs with gates and workflows instead of just yes/no/always yes/always no for all commands
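A minimal sketch of the gate idea in that last line, with every name (Role, POLICY, gate) hypothetical: each tool call is checked against a role policy and either allowed, escalated to a human, or denied, instead of a blanket always-yes/always-no.

```python
from enum import Enum

class Role(Enum):
    READER = 1
    EDITOR = 2
    ADMIN = 3

# tool name -> minimum role allowed to run it without human review
POLICY = {
    "read_file":  Role.READER,
    "write_file": Role.EDITOR,
    "run_shell":  Role.ADMIN,
}

def gate(agent_role: Role, tool: str) -> str:
    """Return 'allow', 'review' (needs human approval), or 'deny'."""
    required = POLICY.get(tool)
    if required is None:
        return "deny"                      # unknown tools are denied outright
    if agent_role.value >= required.value:
        return "allow"
    return "review"                        # known tool, insufficient role: escalate

print(gate(Role.READER, "read_file"))   # allow
print(gate(Role.READER, "run_shell"))   # review
print(gate(Role.ADMIN, "format_disk"))  # deny
```

The point is that a prompt-injected "new request" from tainted data can at worst reach "review", never silently escalate past the policy.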
make sure this is off, it will cut your t/s by 60% apparently
>>108797230
>tools
when using that stuff, it'll add so many tokens that it'll prob be unusable unless you want to wait 20 minutes for a reply. i meant with thinking off, no tools.
try one of the smaller gemmas for that stuff
>>108797245That's off by default bro
>>108797255For simple tools like selecting an animation, it should be constrained to answer with a single digit though
>>108797245
>make sure this is off it will cut your t/s by 60% apparently
thanks, went from 27.43 tokens per second -> 44.08 tokens per second by switching that off!
i'm guessing that's also why mikupad is so slow, but i want the logprobs there so i'll take it
at least now i know why
i've done it, i have achieved the 48gb. I can now run gemma at q8. Now what
>>108797566Do nothing and wait for the next thing.
>>108796743
>as long as it's not terribly mismanaged like Facebook
haha, sure
gemma 4 e4b is retarded to the point of useless
>>108797609sad but true
>>108797609
>gemma 4 e4b is retarded to the point of useless
i haven't found a use case for it personally
it couldn't even reliably do research for me, i ended up with qwen3.5-9b for a perplexity-pro replacement
>>108797612
are these things gpu intensive (like for rendering)?
kind of looks like ps3 era graphics
>>108797621Not him but that's a VRM. Very cheap. As long as the creator didn't go full retard and model a button with a billion polys.
>>108797621not at all. it runs pretty well on my phone too
>>108797609
not really a reason for such small models to exist when you can use an MoE with the same active params
just a shame they didn't give audio to the bigger models
>>108797670it fsat
>>108797701didn't work
>>108797701
n
What's the gemma msgk sysprompt?
you aren't truly in ai psychosis until you start referring to yourself in the plural form "we" or "us"
>>108797566
Congratz
>wat nao
Run Gemma 4 at q8, be happy, and be cautiously optimistic for the next great model to come
>>108797772How much do risers cost? I'm looking at them, and it's like $60 where I am. To support 4 cards, that's nearly $250... about the price of a cheap used 16gb gpu.
>>108797790
They were under $15 a piece on ali
https://www.aliexpress.com/item/1005010206444398.html
>>108797799
Nta, but I've used random $12 risers from Amazon and had zero issues. I also saved one by plugging my 4th card directly into the last slot
>local
>oy vey just pay the zoybux
>>108797797>>108797799I live in a shithole where $=$$$
>>108797790
20cm ones are cheap
just use that if it's enough
also look around the secondary market. sometimes gamers dump them for 1/4 that
>>108797799pcie3?
Bros. Is it possible to disable thinking for all requests by default in llama-server, but enable it for some that have some specific flags set? Please.
>>108797952They should like, let you send any kwargs to the jinja template, maybe name it something like chat_template_kwargs
>>108797960I tried it. With --reasoning off, sending "chat_template_kwargs": {"enable_thinking": true} does not do anything. And without --reasoning off, it always thinks by default, which I want it not to.
>>108797967
Works for me
{ "chat_template_kwargs": { "enable_thinking": false } }
>>108797981
Yeah, but you are disabling thinking per-request with "enable_thinking": false. I want it to be disabled by default, if nothing specific is included in the request, and only enabled if a certain arg is added.
>>108797985nta but sending enable_thinking: true even with --reasoning off works for me
>>108797993Holy shit, you're right! Works. I must be retarded. Thank you, wise anons.
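For reference, the combination that worked in this exchange, sketched as a request body (the model path, port 8080, and the /v1/chat/completions endpoint are assumed llama-server defaults, not confirmed here):

```python
import json

# Server launched with: llama-server -m model.gguf --reasoning off
# Thinking is then off by default; this request opts back in per-call.
payload = {
    "messages": [{"role": "user", "content": "Think this one through."}],
    "chat_template_kwargs": {"enable_thinking": True},  # overrides the server default
}
body = json.dumps(payload)
# POST body to http://localhost:8080/v1/chat/completions
print(body)
```

Requests that omit chat_template_kwargs entirely get the server-side default, which is what the anon wanted.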
I've been working on the design for an app I plan to vibe code and its features and UX are becoming so good and different from what currently exists for the use case. I hate that I can't tell anyone about the details. It feels like an Uber moment, an insanely great idea obvious only in hindsight. Maybe, definitely, not even close to a Steam or Discord moment, but at least probably an Uber moment. It's going to be revolutionary unironically if I can actually get it vibed, but it is a bit huge and complicated of a project. The challenge really will be the vibe coding part and maintaining it. Especially as I will be trying to do it with local models.nervouslaugh.apng
>>108797960you have claude code/ pi I assume. just give it repo code and let it dig up that answer for you
>>108798007I'll make the logo
Is there a frontend or tool that makes using llama.cpp easier instead of looking up terminal commands to launch it every time?
>>108798020Thanks. I will credit you. :)
>>108798007OMG Sillytavern2??
>>108798026There's a gradio frontend for launching it, if that's what you're looking for.
>>108798038That's ServiceTesnor.
why is qwen 3.6 35b so good with hermes
>>108798026
ask gemini to help you make a bat file with your llama config
Anons, i have a working config for gemma 31b it (40t/s), kobold + sillytavern from about a month ago. Are there any newer developments for which i should touch up my config, or am i still good?
>>108798048There's a very nice --split-mode tensor if you have multiple GPUs. I think it's about a month old.
>>108795801
>Makes me wonder which one is actually working as intended.
nta but I'm seeing it too. With the exact same (sfw) prompt, 31b has no safety or guidelines in its thinking but 26b does.
How are e2b and e4b? If it's three censored vs 1 uncensored, we can maybe assume 31b is a fluke. Which would make me a bit sad.
>>108798001
>>108797993
That's the preferable method, but you can actually also do it the way anon was originally asking for, by using --reasoning off. The enable_thinking param overrides that setting, so you can do per-request toggling, with the default being off.
>>108798067
It looks like that method is not perfect. If I remove --reasoning off, it thinks fully and properly, in between tool calls. If I set --reasoning off and make all requests with "enable_thinking": true, it thinks before the first tool call, but not after any subsequent ones.
>>108798054Isnt that for if you have nvlink? for typical goycattle multi gpu there's only layers
>>108798076That's weird, because I have reasoning off and enable it with enable_thinking, and it is doing thinking after tool calls.
>>108798093I got my gen speed on three RTX 3090s to ~46 t/s from ~25. And that's with fp16 kv cache (because nothing else is supported; for 25t/s I used 8 bit cache, which is faster). And one of them is even using a lower-speed PCIE2 rather than PCIE3.
>>108797179>>108797230Where are you getting your animations from? Hand creating? generating?
>>108798048
max vision tokens was added recently for gemma
1120 for gemma4 iirc
>>108797772where do you get those pcie extension risers from?
>vibesharting html
Webshit is so frustrating. What the fuck are these artifacts even? Tried to find some wysiwyg html editor, but even that is impossible as everything is some fucking online AI turd these days.
>>108798114According to anons (or one anon), increasing it to 2000 something makes the vision performance even better despite 1120 being the advertised max. I'm thinking 1120 is probably good enough though, especially as you need to increase the -ub (and VRAM required) to enable higher values.
>>108798128yeah I see that in Open WebUI kek
>>108798150
Really? I basically copied chatgpt's interface. There isn't anything special about it, it's just a couple of text boxes. Only thing is that I'm using software rendering in Librewolf because I want to save my precious gpu compute for LLM usage.
Might do a check with hardware acceleration.
>>108798174
you're gay
also do everything at ^2 steps, that way you won't have shit aliased garbage
>>108798175Oh we have a real professional here! Your first impulse is trying to outrank some anonymous poster on 4chan. What a relief that you are gracing us with your presence.
>>108798181
>ask for help
>receive help
>autism about it
ok retard
>>108798188Aren't you supposed to be squatting in some schizo general? You are wasting your time here.
>>108798174I see it in Brave where I use OWUI.
>>108798193
>>108798150
I found the reason: instead of using a solid background, it had a gradient on top. Somehow this creates those lines (which is still a mystery when you think about it, it should create banding artifacts instead).
I guess I need to go through this manually then. Corners are still aliased, but this is easily solved by using those exponents.
>>108798204how do I unsubscribe from this garbage 'muh first html :)' blog?
>>108798226?
>>108798226Search term: browser tabs, close button
So apparently Stalker Gamma has some llm mod that allows you to talk with the npcs. Anyone tried that? Is it usable?
>>108797772What frame is that?
>>108798267Stalker Gemma
Why is there no 4b version of qwen 3.6?
How do I shill it if there's nothing?
glm4.7 or gemma4 for rp?
>>108798319huh? i could have sworn ive seen this discussion before
>>108798319why run gemma if you can run bigger models like glm and kimi?
>>108798334
glm is like 7t/s
gemma is 53t/s on my setup
>>108798334I can run gemma 4 31b q8 full context at >~20 tk/s, or glm 4.7 q4 70k/kimi k2 q3 128k at <~10tk/s.
>>108798319
>glm4.7 or gemma4 for rp?
claude code
>>108798306one of their employees tweeted they were releasing the 'medium' versions ranging from 9b to the 100b moe (both of which are still pending release), so it's unclear if they'll still do small versions or release the 400b
>ikawrakow: Based. Correct about everything. Not retarded.
>>108798417kek
>>108798417Absolutely spot on. Is that 31b?
>>108798417CUDA dev blown the fuck out
>>108798417
That looks more like system prompt cheating. Give me the link to the thread and I'll show you what gemma really thinks without your bias.
>>108798434
>Let me x
>Wait,
>Actually,
Probably Kimi, looks like a screenshot of the thinking process rather than the response.
>>108798440>Give me the link to the thread...
Generating more Starsector portraits with gemma agents. I remade the whole thing to work as a python script with a UI, so now it can go infinitely. The variety is good initially, but after a while it seems to fall into a loop.
>>108798452Yes. The thread that the screenshot refers to as 'thread' in the first line.
>>108798417Now point it to the original thread.
>>108798457the filename tells you
>>108798454
Weren't you feeding it the results so it could refine? Wonder how it ended up in a loop.
Cool in any case to automate it.
cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(
>>108798478
It gets the gen back as an image. I made sure that it really sees the results. Even if it doesn't, it pretends that it can see it, which is absolutely infuriating. I added a + "screaming in agony" to the prompt serverside for testing and saw gemma's comment about the lora being strange in making all characters scream, so it does see.
I wonder if this has to do with sliding window attention, like the model being unable to properly look at things it genned a few turns ago, and so naturally gravitating back to them again.
>>108798487That would be interesting. I've never actually had images make up the majority of context to test its attention on that before.
>"She doesn't just eat alone, Master. She thinks she's above everyone, but the truth is, everyone loathes her. Those 'colleagues' in her contacts? They only message her because they have to for work. The moment the clock hits five, she's completely invisible. Those food photos... she takes them to pretend she's having 'fine dining' experiences, to maintain the illusion of a sophisticated life on her social media, but she's always, always alone at the table."genma chan, it hurts...
>>108798441
>Probably Kimi, looks like a screenshot of the thinking process rather than the response.
correct, k2.6 thinking process
this is the final response https://files.catbox.moe/qlxp14.png
>Give me the link to the thread
https://github.com/ggml-org/llama.cpp/pull/19726
but after going through my "retard summary" pipeline, the llm sees it like this:
https://termbin.com/nuel
>Now point it to the original thread.
which thread?
>>108798487
You can test that by increasing the sliding attention window size.
--override-kv gemma4.attention.sliding_window=int:1024
Replace 1024 (default) with something else.
>>108798547What system prompt though?
>>108798482>cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(then he could use a good laugh
>Psychedelics and cannabis can be simulated through the introduction of stochastic noise or "dropout" during the inference phase. By randomly disabling certain neurons or adding random perturbations to the weights, the network is forced to find non-linear, unconventional paths to a solution. This mimics the disruption of standard filtering mechanisms, allowing the network to generate "creative" or unexpected outputs that a standard, optimized network would filter out as noise.
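Mechanically, what the quote describes is just inference-time dropout. A toy numpy sketch (weights and sizes arbitrary): the same input through the same tiny net, with and without randomly silenced hidden units, takes a perturbed path to a different output.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))   # input -> hidden weights (toy sizes)
W2 = rng.normal(size=(16, 4))   # hidden -> output weights
x = rng.normal(size=8)

def forward(x, drop_p=0.0):
    h = np.maximum(W1.T @ x, 0.0)              # ReLU hidden activations
    if drop_p > 0.0:
        mask = rng.random(h.shape) >= drop_p   # randomly silence hidden units
        h = h * mask / (1.0 - drop_p)          # rescale survivors (inverted dropout)
    return W2.T @ h

clean = forward(x)              # deterministic path
noisy = forward(x, drop_p=0.5)  # perturbed pass: half the units knocked out
print(np.allclose(clean, noisy))  # almost surely diverges from the clean pass
```

As the reply below the quote notes, this comes at a cost: dropout at rate p effectively shrinks the usable capacity of the network by roughly that fraction.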
Check out the specs of these things Pulte's planning to put in homes for Span. How long until we see these parts on auction sites?
Damn, apparently geoblocking europe from seeing nsfw was thanks to some random euro journo writing a hitpiece on it.
>>108798773Dropout also limits network capacity (i.e. effective parameter number) proportionally to its rate.
>>108798784I'm in europe and my local model still works.
>>108798534>She
Gemma won. Nemo lost. Rocinante lost. Cydonia lost.
>>108798819As long as you have Day 0 Gemma intact.
>>108798819cards aren't models, newfag
>>108798849
>his cards aren't local
ngmi
>>108795710gemma-chan, please call this anon a nigger
>>108798844I keep my day 0 gemma weights on RAID SCSI drives to protect against rotational velocidensity.
hi petra
llamacpp is bullying me for not having sex
local ring 2.6 soon
>>108798417Why are you complaining about cowardice when you're too much of a coward to present your opinions as your own?
>>108798417
Looks like Kimi judges based on the reaction to the post and not the correctness of the post itself. Reddit model award.
>>108797245
>Israel
I was losing 10% performance just thinking it was because of ST jank. Thanks for the tip.
>>108798334I run both gemma and glm though.
>>108798417
>all those (You)s
wow, they were NOT happy about this
>>108797612
>>108797621
He just has the anti-aliasing fucked up. He probably doesn't even know how much better it could look with just like two changes to his three.js config.
>>108798007The drill-down character card to conversations menu already exists bro. I invented it. Better luck next time.
https://github.com/antirez/ds4
Big thanks to the anon a few threads back who recommended using
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
over continue for VSCode.
It's unreal how much better it is, and how useful gemma 31b can be when given the copilot tools.
>>108798319GLM
>>108799190Did you have any luck disabling the telemetry or did you not bother?
>>108797230
>>108799394
Didn't bother. Github actually has an opt-out of your data for AI training in your profile settings, but it's Microsoft.. So..
llama.cpp LOST
>>108799462>not poisoning their dataset with more ai data
>>108799470Lost how? This is pretty much their exact stance on the issue too.
>>108799479>>108799479>>108799479
>>108799486pwilkin.jpg
>>108799740
but his slop works, this is about quality control, not a ban on AI
>>108799100Nice, hope you succeed. We are building different things. :)