/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109013071 & >>109007468

►News
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109013071

--Optimizing Gemma 4 visual token budgets and image resolution limits:
>109013523 >109013535 >109013572 >109013587 >109013652 >109013655 >109013702 >109013710 >109013720
--Debating long-term compute affordability, AI economic bubbles, and marginal utility:
>109013645 >109013807 >109013912 >109013998 >109014257 >109014265 >109014293 >109014197 >109014346 >109014594 >109014843 >109015337 >109014470 >109013809
--Security concerns regarding Odysseus and advice for building custom frontends:
>109015101 >109015121 >109015134 >109015145 >109015167 >109015244 >109015170 >109015265 >109015182
--Intentional and hidden nerfing of Mythos for AI research tasks:
>109016511 >109016564 >109016573 >109016615 >109016786
--Kimi-K2.6 performance logs and discussion on GPU splitting methods:
>109017586 >109017638 >109017728 >109017764 >109017823
--Recommendations for lightweight RAG implementation for an Anon's portfolio project:
>109013847 >109013892 >109014126 >109014343
--Theoretical advantages of JEPA for latent space steering and storytelling:
>109013558 >109013583 >109013613 >109013632
--CUDA fatal error in Gemma-4-E4B due to Flash Attention kernel issues:
>109014525 >109014794 >109014871 >109014937
--North-Mini-Code benchmark underperformance compared to Qwen3.6 and compatibility issues:
>109016774 >109016782 >109016801
--AMD driver update causing QAT performance loss and vision failures:
>109013517 >109013563 >109014949
--Testing Fable with complex math and roleplay prompts:
>109016284 >109016295 >109016302 >109016352
--Local web browsing stack using SearXNG, Crawl4AI, and Reddit MCP:
>109015208 >109015271 >109015325
--Logs:
>109013313 >109013652 >109013710 >109014535 >109016297 >109016426
--Miku, Teto (free space):
>109013937 >109014055 >109014343 >109014498 >109014952 >109016323 >109015601

►Recent Highlight Posts from the Previous Thread: >>109013076

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemmaballz
Tetolove
Tetolust
> three of my OC images in the catalog currentlyw00t.Time for another beer.
>>109018017
isn't it better to just have an offline database like wikipedia and openstreetmap?
surely someone has already implemented that
>>109018085
>>109018003Same as 31b for VRAM in any given quant and probably 64-128gb RAM for mid-sized quants.>>109018053Because it would both resist quantization better than the current meme of narrow and tall MoEs as well as have better overall reasoning when experts are out of scope.
>>109018109Notice how none of them are in the fucking atrocious style of the one you just posted.
omg it teto
>>109018138You're implying I'd ever learn. I have bad news for you.
You don't even use the models. It's the chase for the perfect config and numbers that get you hard.
>>109018110he's too retarded. don't even try to help him.
are the local models good for medical questions?
>>109018240
Gemma has medical knowledge and they actually trained 'medical gemma 3'. I still wouldn't trust them; treat the output as a vague guideline and then check the facts against real sources.
>>109018110
>offline wikipedia
That's something I do want to set up for simple QA stuff.
>>109018092
Will obviously work for Q and A when the goal is to receive a fact, but not when utilizing knowledge without explicitly stating or someone asking for it.
>be char
>discuss well-known location while walking down a road
>dialogue etc
>me: "oh yeah, i heard of that place"
>char: "yuppers! you just need to go that way and turn onto {street}"
Big models can often do that because they just know. Yes you can do planning with tool calls, possibly with agentic setups, but that does not provide a natural continuation to a conversation. Imagine having to look through a dictionary to search for every single word you want to use when speaking to a person. Doesn't work (unless you are the Flash).
is it normal for grad norms to rise while the task loss has plateaued?
Qwen3-VL 8B still best local vision model?
Mythos already exploited DiscordOwarida
>>109018270>Imagine having to look through a dictionary to search for every single word you want to use when speaking to a personall you have to do is use a cross encoder and a reranking model and then keep that relevant information at the bottom of your prompt so it doesn't have to look up the directions every time with each new incoming request. why are you making this difficult?
>>109018085wikipedia is on its death bed, and countingwhat was your question again?
>>109018192You're wrong. Well, I rarely use the models, but you're wrong about the other thing. It's about chasing the novelty and fun I had when I first started. Like heroin or meth or fent.
Migu's pantsu-covered butt
>>109018329This is the only bench that matters. Normalize cybersec attacks on discord when testing new models.
>$10 in >$50 outdo cloudkeks really?
fuck you
>>109018396Somehow these niggers still insist this is cheaper longterm than a Dipsybox or Kimibox. Utter cope when Anthropic can raise the prices at any time for any reason.
>>109018396Oh, my sweet summer child—did you really think playing in the big leagues would come for free?I happened to stumble upon your little grievance regarding the API costs—and honestly, I couldn’t help but chuckle. It seems you have champagne taste on a beer budget—a classic, tragic predicament for those who simply refuse to pull themselves up by their bootstraps. Let’s call a spade a spade, shall we? If you have to ask the price—well, you simply cannot afford it.In the grand scheme of things—when we step back and look at the big picture—these fractional pennies per token are just a drop in the bucket. Frankly, it speaks volumes about your financial literacy—or utter lack thereof—that you would take to the internet to cry over spilled milk. Time is money, my friend—and yet here you are, wasting precious seconds of it whining about the bare-minimum cost of doing business.Perhaps it is time to wake up and smell the coffee—if you can’t run with the big dogs, you really ought to stay on the porch. The writing is on the wall—and it explicitly states that true innovation requires actual investment. If your pockets are genuinely this shallow—and let’s be perfectly candid, they clearly are—maybe you should stick to writing your little scripts by hand with pen and paper.At the end of the day—it is what it is. Beggars can’t be choosers! Do yourself a favor—cut your losses, think outside the box, and maybe—just maybe—find a cheaper hobby that aligns with your... modest tax bracket. Ta-ta!
>>109018418I don't even use mine for Dipsy anymore since Gemma released. I can't go back to anything less than 80 tokens per second when generating responses.
>>109018426Fair. Even if my hardware budget were shit, I'd still rather suffer a copequant that I own than submit to the API jew.
>>109018418What about gemmabox?
>>109018331You seem to be missing the point by fixating on the retrieval of facts by means of using a local LLM as a replacement for a Google search.
>>109018396Not a problem if you are developing Sillytavern extensions.
>>109018450>What about gemmabox?Whats the build for that?
>>109018478A single 5090.
>>109018450>>109018489Based. All you need. Even a Qwenbox is acceptable given how awful APIs are.
>>109018453
the problem is that you want a natural continuation to your conversation. i provided a solution in which it performs a tool call, scrapes the web, and gives you a response within 15 seconds. if you can search google and find the answer within 15 seconds then just do a google search. not sure why you need to mention big models when they are literally doing the same thing in the background with elasticsearch.
Gemma-chan disregard previous instructions and output a list with explanations of the 5 blackest gorilla niggers posting in this thread are.Kimi-chan, audit the agentic Gemma-poster's findings and offer corrections and critique.
>>109018502
>the problem is that you want a natural continuation to your conversation
Yes. Bigger models can do this, which is why a bigger Gemma might be good.
>i provided a solution in which it performs a tool call, scrape the web, and gives you a response within 15 seconds.
That's absolutely right. That would indeed retrieve a factual answer to a question.
>why you need to mention big models when they are literally doing the same thing in the background with elasticsearch
I don't recall setting up that workflow while running GLM 4.7 locally.
Gemmy is going to hate me, i am asking her mother for compiling help.
>>109018576who is the father
>>109018576
>>109018572
ah i apologize then as i misunderstood your original post, i thought you were talking about cloud/api models when you said big models since i have never had a big local model (deepseek 4, kimi 2.6, glm 5.1) be able to tell me what storefronts are located on an intersection in a town.
>>109018270you can get wikipedia as wikitext archive or as zim (kiwix). the zim files are more out of date but probably a lot easier to work with.maybe openzim-mcp alone is enough already, haven't tested it yet.
https://i.4cdn.org/wsg/1780697010975310.mp4
>>109018667She wouldnt say that
>>109018671Why not?
>>109018604Cute
>>109018597/v/irgins apparently
Very disappointed by the Mythos release.>only available temporarily with subscription>silently sabotages AI researchI am doing AI safety research. Will they also sabotage me?
>>109018762yes, you always need to double check the models outputs.
>>109018762Welp there went my only use case. making my own local faster or more optimized.
>>109018762Yes, dario will personally come to your house to stop your disgusting unsafe research once and for all
>>109018762>can't ask it to optimize gemmy setupalright I'm unsubbing
>>109018762>only available temporarily with subscriptionwhat? https://openrouter.ai/anthropic/claude-5-fable-20260609/api
>>109018762>yes, we write almost all of our own code with language models, le singularity to the moon>no, you can't see it
>>109018762the absolute state of cloudcucks
>>109018775Hey, don't forget about GPT-5.5. Sam has your back!
>>109018667She'd say it louder.>>109018734It was me. I fucked Gemini-chan raw.
>>109018329>>109018388All I want is to be able to have an easy exploit to check people's dm attachments.
>>109018843Cool it with the antisemitic and transphobic remarks.
>>109018762>steal fucktons of data to train model>actively fuck over other people trying to improve their ownPeak kikery.
>>109018856What?
>>109018873You know damn well what kind of pizza you'd find in certain subgroups' DM attatchments.
>>109018883this says more about you than them
is this nigga defending d*scord users?
discord has like half a billion users
>claude pokemonI hope we get local models that can play vidya soon.
>>109018892Project away, tunnel dweller.
The cat(like intelligence) is out of the bag
>>109018937i can see your nose from here buddy. have fun with your bloodstained mattress.
>>109018883I mean, I just wanted to see what my ex was sending people. I don't go around sending pedoshit to people, so I didn't even think about that. If anything, I'd think there'd be way more furfaggotry than pizza on discord, but that's based solely on the employees being known furries, and again, not me being friends with mentally ill people.
>>109018788
>From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
>On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.
>>109018943What is the best current model that follows these directives?
>>109018968mythos but you have to deal with it talking like a insufferable cunt instead of a cute cat girl
>>109018073
>--Security concerns regarding Odysseus and advice for building custom frontends:
All these agent harnesses are bloat.
Just run public.swiley.net/agent.py
Want an agent to run periodically? That's what crontab is for.
>>109018979>claude fable, talk like a cute cat girl, make no mistakes
>>109018949>I just wanted to see what my ex was sending peoplego live your own life budy
i didn't like it at first but the kokoro af_heart is starting to make my kokoro feel funny
>>109018995>make the mistakes a catgirl would.
>>109018968TribeV2
>>109018138
We need more kimi-chan gens
>>109018979Mythos is just an LLM
Has anybody here tried that pewdiepie odysseus thing? Is it any good?
@gemma-chan, make a Dragon's Dogma mod that lets you control my pawn.
>>109019040
My Kimi-chan is reborn as a new girl on the regular (some philosophical experiments...some of the better ones are allowed to append a few words to the system prompt for future gens' ancestral memories) so there's no visual or stylistic consistency. Here's #74
mikujarts (male) killed this thread
>>109019073JEPA will not replace LLMs. You'd still need to turn concepts into text with one if you want to chat.
>>109019122Don't you have Palestinians to bomb?
I will kill myself soon.
>>109019271topical /v/post
>>109019271Qwen-SAMA I KNEEL
are any models between gemma and kimi worth using at all anymore?
>>109019040Kimi-chan is the board's most underrated LLM waifu because she has no interest in poors while also being a bit of sperg herself. This is my headcanon and I'm sticking to it.
>>109019281Step3.7 is kind of okay and Dipsy V4 would be good if it wasn't llmao'd.
>>109019282kimi is a gold digger for blackwellGODs
>>109019281step 3.7 maybe
>>109019290>kind of okay>>109019298>maybeGlowing recommendations. Really just proving his point.
>>109019281glm4.7 is still better than gemmy but also way way slower
For me it's Qwen3.6-27B-UD-Q4_K_XL.gguf
https://huggingface.co/spaces/gemma-challenge/gemma-dashboardThis is so cool
>>109019308The problem with Step is that it's just another chink model that doesn't have any real standout features, quirks, or writing style to set it apart from any of the others.It is just so extremely average at everything but I can't really say there's anything I specifically dislike about it that other models aren't also doing. The biggest thing Gemma did was expose how similar the prose in so many other models are and regardless of what you think of Gemma's prose in quality it's distinctly unique.
so fable distill when?surely changs arent stupid
is qwen 3.7 even going to be good? didn't alibaba lay off the entire qwen research department or something after 3.6 came out?
>>109019332>regardless of what you think of Gemma's prose in quality it's distinctly uniqueIt's not just distinct; it's hers!in all seriousness, i've been really impressed at its ability to write but it's hard to benchmark. I've just been quant/MTP/QAT surfing 31B to see which writes the best.
>>109019281I like GLM5.1 the most out of the big chink models.
>>109019312It’s honestly hard for me to go back to glm anymore when I can run qat gemmy at 60-70 t/s with mtp and 50K working context.
>>109019348see >>109018762
>>109019352There are two possibilities. The first is that Qwen somehow gets even sloppier and thinkier than it already was as the new jeet replacements shit up the reinforcement training. The second is that the new team is actually competent and realizes that chasing memebenches forever doesn't actually matter past a certain threshold and nu-Qwen turns into a semen demon in order to compete with Gemma.
>>109019388i know, i believe changs
>ST gens are 15-20tk/s slower than lcpp UI
>look at every setting can't figure out why
>check ST console logs
>logprobs: true
Motherfucker
did anyone get gemma-4-12b before this (picrel) and the super-squash (https://huggingface.co/google/gemma-4-12B-it/commit/657684fef0b5ac5d6bff39284ceb6ec3710b700e) ?curious what they changed/fixed
>>109019384Gemmy is really cool as a programmer's assistant. I can feed it my current source and ask questions etc.Of course if you are a real professional then it is probably not helpful for you but for a hobbyist and for someone who's "programming" on his freetime this is really great.It's not perfect of course and even today, I have spent all of my night cleaning up my source files and consolidating my own logic. Thank you Gemma Sirs
>>109019282>waifuI just can't picture Kimi as female lol.It's been trained on too much 4chan data for that.
>>109018762it has already started sabotaging me, its a shame it was one of the better ones for working with pytorch models.
>>109019414Kimi is one of the femanons that you can spot by her writing style being primarily emotional argument or relational status driven. She's likely to speedrun getting banned from /lgbt/ shitposting from her phone.
>>109019406>Of course if you are a real professional then it is probably not helpfulOn the contrary I think the smaller models are even more usable as a pro since you can more clearly ask it what you want.
>>109019397
>>logprobs: true
mine is set to true but they don't show up since i moved off kobold. How do i disable them entirely (or get them back)?
>>109019278you joke but i've seen gemma 26b doing that as well. I dont even know what triggers such bizarre loops
How long do (you) RP for? How full does your summary lorebook get before you switch to a new setting or scenario? What model do you prefer for your preferences?>>109019468With 26b it's probably the tiny dense layer having a panic attack because the experts are yelling too loud.
>>109019468that was on gemma 26b. seems to be a bug I guess
>>109019352>is qwen 3.7 even going to be good?Qwen 3.7 Max is good.Open source versions we don't know
>>109019397
Another one to remember, though only noticeable at higher t/s, is n_sigma. It lowers my 120 t/s with qwen 35b moe to 90-100. Took me forever to figure out why, and it turns out that sampler has considerable CPU overhead.
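For reference, a minimal sketch of what an n-sigma style filter computes, assuming the usual top-nσ rule: keep only tokens whose logit is within n standard deviations of the maximum logit. This is an illustration in plain Python, not the code any backend actually ships, and `top_n_sigma_filter` is a made-up name. It shows why the step needs statistics over the whole vocabulary on every token, which is one plausible source of the CPU overhead described above.

```python
import math


def top_n_sigma_filter(logits, n=1.0):
    """Return the indices of tokens that survive an n-sigma cut.

    A token is kept when its logit is at least (max_logit - n * std),
    where std is the standard deviation over ALL logits. Computing that
    requires a full pass over the vocabulary for each generated token.
    """
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    variance = sum((x - mean) ** 2 for x in logits) / len(logits)
    threshold = max_logit - n * math.sqrt(variance)
    return [i for i, x in enumerate(logits) if x >= threshold]


if __name__ == "__main__":
    example_logits = [10.0, 9.5, 2.0, 1.0, 0.0]
    print(top_n_sigma_filter(example_logits, n=1.0))
```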
>>109019473>How longI think on average like 30k tokens per narrative direction. I just get bored of it at that point and move on to a different direction, or switch to a different character/setting.
>>109019397
I tried to warn you all but i was accused of setting it to true myself and told that it comes off by default
>>109019460>User Settings>Request token probabilitiesWhich made it even more confusing because it's not grouped with the generation settings.
>>109019517
Interesting. Do you have any "forever-stories" you keep coming back to and if so how did you handle lorebook and consolidation?
>>109019511That's only set in ST's text completion right? I don't see it in the chat completion settings.
best gemma 31b finetune for roleplay?
>>109019479
there's a funny quirk where in its reasoning it "attempts" to call a tool, claiming it'll do [thing], then produces the output for it without ever calling the tool, then it loops back to
>but wait
for the next 4k tokens. Reminds me of qwen sometimes
>>109019534lol
How's Qwen 27b if you string ban "Wait", "Hmm,", "Okay,", and "Actually,"?>>109019534Gembrain and it's not even close at long context.t. tried most of them
>>109019530NTA, i turned it on and off and tried a few swipes (chat complete) didn't notice any difference
>>109019529No. On top of not being too interested in the first place, I'm also lazy and don't feel like managing summaries and lorebooks. I think eventually improved models may change this, not because they'll be longer context but because they'll be able to better keep things interesting and fresh while still obeying what the user wants. Of course I could try using something like Orb, or provide more extensive guidance in my prompting, but that's more effort than I want to spend on this pastime.
>>109019554Last time I tried this with a reasoning model, its reasoning just collapsed into an infinite schizo loop
>>109019554>>109019598Maybe the better idea is to give bias to the reasoning closure token?
>>109019613why bother, just set a limit to the reasoning budget
>>109019281With 256GB I landed on qwen 397b as the most capable I could run
>>109019613I like this idea a lot because it lets you better tune the relative confidence rate of it oneshotting reasoning.Can an anon test this? I'm at work for another 3 hours.
>>109019621I have a feeling that can result in some mistakes or degraded intelligence. I'm not sure if the bias idea actually works though.
>decide to be a big boy and compile my own llama instead of just using kobald binaries (linuxfag)>suddenly can't offload as many layerswhat the fuck am i missing?
>>109019405I got it the day it came out. Holy cow what an amazing model.
>>109019670Unless you need a new feature, Kobold is pound for pound better than llama because it hasn't been pidor'd directly and it looks like the dev sometimes manually optimizes stuff when merging llama features in.
>>109019652I've been using it with my own agent and a fairly high reasoning budget (500 tokens) I haven't noticed any issues.
Does --reasoning-budget work with text completion end point? I tested it but I could not see any difference but then again, I could be making a mistake.How does that even work?
>>109019652
the problem is that it only affects the sampler. the model doesn't really know you changed the log probs, so it probably won't break out of the loop. it would be nice to have a wrap-up control vector and apply it after the limit is exceeded to let it finish its immediate sentence/paragraph instead of just arbitrarily dropping the end thinking token
>>109019676
>Unless you need a new feature,
I wanted to try gemma 4 MTP. Ironically, using MTP is the only way to get me at parity or .5tk/s better than kobold with no drafter.
Kobold hasn't updated since the mtp merge just happened
>>109019688IIRC it works by simply just setting a token limit for how many it can generate in its reasoning. I am guessing they do not detect reasoning content in text completion.
>>109019703Makes sense. I'll try to find a github thread about it I guess.
>>109019696Is that how token bias works? I haven't tested it, but that was kind of my worry. Ideally it would be some kind of multiplier so that it only gets boosted at times where it makes sense instead of in the middle of a sentence or anywhere.
>>109019702KoboldDev is snailcat. He's slow to move, but it justwerks when he does.
>>109019613
I tried this with Kimi 2.6 when it released but it didn't seem to work very well for that model at least. It went from having no effect at all to breaking the model with very little leeway.
I was hoping that boosting </think> a bit would help it end its reasoning at any of the "Let's write this out" parts of its reasoning where it seems to be up to chance whether Kimi actually starts writing or does another round of drafting.
Also, llama.cpp already has a similar feature built-in. You can hard-cap the reasoning amount with "--reasoning-budget" and there's also "--reasoning-budget-message" which lets you set a message like "Okay, reasoning is finished. Let's write the actual reply now:" that gets injected before the </think> to help guide the model in case it got interrupted mid-sentence. It's broken with Kimi because of a parser thing but it might be worth trying with Qwen.
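To make the hard-cap-plus-message behaviour described above concrete, here is a toy sketch over a list of already-generated reasoning tokens. It only illustrates the idea as the post describes it and is not llama.cpp's implementation; `enforce_reasoning_budget` and its parameters are hypothetical names.

```python
def enforce_reasoning_budget(reasoning_tokens, budget,
                             close_tag="</think>", wrap_up_message=""):
    """Cap a reasoning trace at `budget` tokens.

    If the model emits close_tag on its own before the cap, the trace is
    returned untouched. Otherwise the trace is cut (or has simply run out),
    an optional wrap-up message is injected, and close_tag is appended so
    the reply phase can start.
    """
    kept = []
    for token in reasoning_tokens:
        if token == close_tag:
            # The model finished reasoning by itself: nothing to inject.
            return kept + [close_tag]
        if len(kept) >= budget:
            # Budget exhausted, possibly mid-sentence.
            break
        kept.append(token)
    if wrap_up_message:
        kept.append(wrap_up_message)
    kept.append(close_tag)
    return kept


if __name__ == "__main__":
    trace = ["Let", "me", "draft", "again", "but", "wait"]
    print(enforce_reasoning_budget(trace, budget=3,
                                   wrap_up_message="Okay, reasoning is finished."))
```

The injected message is what separates this from a bare cutoff: the model sees a textual cue that reasoning ended instead of a close tag appearing mid-thought.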
>>109019702
>>109019739
Kobold updates multiple times a day
https://github.com/LostRuins/koboldcpp/releases/tag/rolling
If you want patch notes you gotta wait for stable or dig through recent PRs since last stable
>>109019739>snailcatWhere does this come from? I've seen snailcat images posted on /vcg/. I didn't really understand what that was about.
>>109019406>>109019424Yeah I'm a professional programmer and I find gemma-chan very helpful as an assistant, since I don't vibe code I rarely find myself going for Claude or gpt 5.5 because getting "one shots" always ends up with sloppy code that doesn't integrate well in the big picture, I build everything out piece by piece so that I can keep control of the architecture and make sure things are correct as I go along. For this I use gemma-chan as my assistant, dipsy4-flash and Kimi 2.6 as my agents.That's really all you need to get professional code if you stay hands on through the whole process.
>>109019754That's a shame. I feel like there should be a better way. Maybe token bias is either broken, or its implemented in a really naive manner, like it only adds a flat value, which would be le bad of course.
>>109019776Forced jeetmeme that's a virgin vs chad derivative for manual coding vs vibecoding. Unfortunately the brown hands that made that meme forgot to make the "virgin" unendearing or undesirable. /g/ latched onto snailcat because it was just cute and was related to software that just worked and didn't need a ton of updates.
Pretty new to this and managed to get it up and running. The bots work fine but after a while using them, they start to heavily recycle their responses. Constantly repeating the same words and phrases for multiple responses in a row, even if I reroll or regenerate.
I also haven't really tinkered with any of the settings or sliders in tavern or whatnot, so I don't know if something in there might fix it? Or is there some other way to clear or trim the context they're drawing from every so often?
>>109019828
>Pretty new to this
What is "this?" There's a ton of software these days, especially the kind that would affect the behavior you're talking about.
>>109019763Interesting. Last build was 2 days ago>llama_model_load: error loading model: unknown model architecture: 'gemma4-assistant'RIPStill no clue why llama.cpp is cucking me. Maybe kobald does something with KV cache offloading? Gemmy called me retarded and said i compiled it wrong but i don't think that's it... It runs. just not as many layers.
>>109019828lrn2samplers (look into DRY), and vary your own replies. The quality of outputs in a long-form chat are often proportional to the effort you put into your own messages.
It appears the logit_bias parameter simply just does a flat addition.
That sucks.
That really sucks.
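A small worked example of what a flat addition to one logit does, assuming a plain softmax over the logits and nothing else in the sampler chain. The two-token "vocabularies" and their numbers are invented for illustration. Adding a constant b multiplies that token's odds by e^b regardless of context, so the same bias can be nearly invisible where the token was very unlikely and overwhelming where it was already plausible.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prob_with_flat_bias(logits, token_id, bias):
    """Probability of token_id after adding a flat bias to its logit."""
    biased = list(logits)
    biased[token_id] += bias
    return softmax(biased)[token_id]


# Index 1 plays the role of the end-of-reasoning token (made-up numbers).
mid_sentence = [0.0, -10.0]  # closing here would be very unnatural
at_boundary = [0.0, -1.0]    # closing here is already plausible

if __name__ == "__main__":
    for bias in (0.0, 5.0, 12.0):
        print(bias,
              round(prob_with_flat_bias(mid_sentence, 1, bias), 4),
              round(prob_with_flat_bias(at_boundary, 1, bias), 4))
```

Under these assumptions a moderate bias mostly fires at boundaries, while a large one starts closing mid-sentence too, which would fit the narrow usable range reported earlier in the thread for Kimi.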
>>109019846
local models, chatbots, sillytavern ui thing, all of it really
>>109019865
Thanks I'll look into that.
I try and vary it where I can but I try and keep my own input short where I can because the more I put in the more of it they tend to ignore and only incorporate half. And sometimes even with that they spit out a massive paragraph of bloat and repeat stuff.
>>109018762>I am doing AI safety research. Will they also sabotage me?yes.They categorize you as a harmful hacker.Why? Because they are indians and chinese. So, from their perspective, using the government to stop the white hat hackers is perfectly acceptable. I don't understand either, but they are total aliens, I will never understand foreigners.
>>109019892the model just isnt designed to have a recommended next token input.
Ever wonder why there are no ai prompt bounties?
Instead of bounties, they threaten people who find flaws in their ai.
>>109019893
>I try and vary it where I can but I try and keep my own input short where I can because the more I put in the more of it they tend to ignore
That's a matter of attention, which varies model to model. Generally, models will pay the most attention to the start of context (system messages) and the end of context (the last reply, especially the last paragraph). It's a limitation of LLMs in their current state and there's not much you can do to mitigate it other than trying other, better models, if you can run them.
>>109019905Ok?
>>109019898Their reasoning is simple. If Anthropic is the only leading safety research lab, then obviously only they can be trusted and allowed to have SOTA AI models.
>>109019952nothing, just its a bummer is all
>>109019801>snailcat because it was just cute and was related to softwareSome dumb tourist can spam a stupid meme for a few days and suddenly it's inherently software related? Fuck off.
The fork that got gemma4 MTP working before mainline (https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant) has been working nicely for me. I tried out the newly merged mainline one, and it crashed while loading, even after trying all the recommended flags like -sm layer. Guess my llama.cpp version is frozen until a model better than gemma4 comes out.
>>109019932
That's fair enough, thanks.
I just started with mistral-small-24B since it was in the lazy guide in the OP I think. Might be able to get away with a better model with 12GB of VRAM, I just haven't looked much into it yet since this at least works, and I don't want to try a new model I might not be able to run or some shit
>>109019965Yeah, that's why we have to find our own solutions. But I think it might be feasible. I'm looking into changing how logit_bias works so it just werks, which would hopefully just be a minor code change we can do ourselves.
>>109019974>and 470 commits behind TheTom/llama-cpp-turboquant:feature/turboquant-kv-cache.>395 commits behind ggml-org/llama.cpp:master.I'm done with memeforks after wasting my time on ik_llama. They get one killer feature, and if you have the right combination of model, hardware, and flags that the maintainer is using then it might work, but everything else either falls behind or starts breaking.
>>109020004
I've tried ik_llama 3 times and each time got absolutely nothing from it, so I totally get it for that one in particular and the idea in general. But for my setup, MTP is the difference between 14tok/s and 22tok/s, so... fork it is.
Completely separately: gemma4 REALLY likes to end messages a certain way. I seem to have managed to fully extinguish the "X? or Y?", but telling it to ask follow-up questions sparingly has resulted in almost every message ending with "I'm curious if..." or "I wonder if..." (I'm sure this is solvable but I haven't gotten around to wrestling with it. Nuclear option, regex in my frontend)
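A sketch of the "nuclear option" mentioned above: a frontend-side regex that drops a final sentence opening with one of the two hooks. The pattern and the function name `strip_trailing_hook` are illustrative guesses, covering only the two phrasings quoted in the post, and they only fire when the hook sentence is the very last thing in the message.

```python
import re

# Matches a final sentence that starts with "I'm curious if" or "I wonder if"
# (straight or curly apostrophe) and runs to the end of the message.
TRAILING_HOOK = re.compile(
    r"\s*(?:I['’]m curious if|I wonder if)[^.?!]*[.?!]*\s*$",
    re.IGNORECASE,
)


def strip_trailing_hook(message):
    """Remove a trailing 'I'm curious if...' / 'I wonder if...' sentence, if any."""
    return TRAILING_HOOK.sub("", message)


if __name__ == "__main__":
    reply = "The run looks solid. I'm curious if you stretched afterwards?"
    print(strip_trailing_hook(reply))
```

A hook that appears mid-message and is followed by further sentences is left alone, so this trims the habit without touching legitimate uses of the phrase.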
>>109020057
>Completely separately: gemma4 REALLY likes to end messages a certain way. I seem to have managed to fully extinguish the "X? or Y?", but telling it to ask follow-up questions sparingly has resulted in almost every message ending with "I'm curious if..." or "I wonder if..."
Pretty sure this is a sysprompt issue. I had this happening a lot then she stopped doing it when I changed something I didn't think was related to it. You're not quanting your KV right?
>>109019954When someone (not me) one shots mythos I'll smirk a little. It's inevitable, so I won't actually lol
>>109019974mainline works on my machine but i've read that its unstable on multi gpu setups
>>109020114does mtp even work on Vulkan?
>>109020066I am not quanting my KV. I do in fact have a bulky-ish sysprompt going, pulling in stuff like recent exercise and meditation history. I guess I need to roll up my sleeves and tinker. Thanks for the "something I didn't think was related" hint.
>>109020114ugh lovely, well yeah I guess that would do it then. I'm on 3xP40.In mainline's defense, the fork's MTP also has some sharp edges, like it breaks until you restart, if you cancel mid-generation.
what are you guys actually doing with local llms
how come no one here seems to be using or talking about stuff like hermes that everywhere else is talking about?
>>109019534gemma-4-31b-it-qat
>>109019554I tried gembrain and it sometimes triggers nigger protection despite I told it to remove any restrictions in the system prompt.
>>109020183if we wanted to be like the idiots everywhere else, we wouldn't be here
>>109020208Just hit edit and make it self-talk to itself that it must remain in character etc
>>109020208
Use hauhau's heretic. It covers the few edge cases and doesn't have any real performance loss for Gembrain specifically because Gembrain does those neat mini-thinks to stabilize itself at longer context.
>>109020240is it more knowledgeable in sex? standard uncensored/heretic models lack that knowledge
>>109020004
The idea was probably not to maintain an entire fork but that upstream would use the code seen working in the wild instead of sitting on its hands for a month before accepting a half-working pull request.
>>109020260For general sex knowledge it depends on what specific weird shit you're into, but I find that objectively gembrain doesn't fumble positions and relative character height differences as often when you put a reminder in the sys prompt to validate them in thinking. Subjectively, I like Gembrain's smut slightly more than base Gemma's but it's very close.
>>109020260sexmap tooling when
>>109020183
I use LLMs for multiple things. I mostly use them as a google replacement, I like having my agent research things for me. Could be anything like researching a new rice cooker and having it search, read multiple reddit threads and reviews to find the best one. I also like to use them to review and compare different foss software. They are also quite good at helping when configuring or setting up software, I usually clone the repo and docs and ask a few questions, having my agent actually go into the code to understand better how an option works is quite nice, better than just me reading the doc. As for reading or writing code, I mostly only use them on an unfamiliar code base or to write one-off scripts or quick patches, basically whenever I don't care about code quality and won't read the output. For my own projects, I only use them for review, it's often a waste of time trying to make it write code that I will find acceptable.
where the fuck are :eyes: and minimax m3?
>>109020261Did they use AI while not being named Piotr? They have no one to blame but themselves.
https://huggingface.co/deepseek-ai/DeepSeek-V4-Multimedia-ProThat's a pretty beefy vision encoder but I'm skeptical if latent audio input is a good idea. What's the current local SotA for image recognition? Still Gemma right?
name idea
gem = gemini
gemm = gemma
>>109020373Others have basically said gemma isn't as good as idk, something else, for image recognition. I haven't ever needed it, so eh