/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108781058 & >>108774961

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108781058

--Refining prose with Gemma-4 and debating character card specifications:
>108783895 >108783903 >108783958 >108783979 >108783980 >108784007 >108784029 >108784042 >108784444 >108784108 >108784146 >108784161 >108784188 >108785498 >108784407 >108784435 >108784456 >108784522 >108784589 >108784494 >108784505 >108784519 >108784537 >108784608 >108784520 >108786554
--Gemma's tool-calling capabilities used for image generation and system control:
>108785711 >108785727 >108785742 >108785753 >108785769 >108785770 >108786335 >108786340 >108786399 >108786535 >108786621 >108786413 >108785791
--Proposed hierarchical summary and graph-based memory system for frontends:
>108784659 >108785273 >108785550 >108785583 >108786273
--Effect of PCIe riser cables and bus speeds on GPU performance:
>108784890 >108784905 >108784952 >108785543 >108785552 >108785574 >108785725
--Using TabbyAPI to disable Gemma 4's vision encoder for VRAM saving:
>108783184 >108783211 >108783228 >108783241 >108783304 >108783419
--Prompting versus model scale for anime avatar personas:
>108781233 >108781301 >108781325 >108781390 >108781462 >108781506 >108781526 >108781587 >108781625 >108781688 >108781627 >108781564 >108781608 >108781524
--HiDream-O1's 200B parameter image model and prompt agent:
>108785951 >108785970 >108785983 >108786064 >108785989 >108785999 >108786094
--Sourcing and preparing Monster Girl Encyclopedia lore for model datasets:
>108784621 >108784683 >108784713 >108784722 >108784740 >108784788
--Performance gains and output diversity using MTP in llama.cpp:
>108783325 >108783343 >108783381
--Logs:
>108781301 >108781524 >108782931 >108783026 >108783299 >108783318 >108783344 >108783402 >108784005 >108785711 >108785742 >108786399 >108786711 >108786720 >108786728
--Miku (free space):
>108781093 >108781140 >108785924

►Recent Highlight Posts from the Previous Thread: >>108781061
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
Does reasoning-budget work well in practice? I'm about to be forced to use it after trying MiMo 2.5.
I have an idea for a new AI chat frontend. Thoughts?
>>108787471Include an IDE
>>108787441Gemmalove
>>108787471
I have the sessions on the right (as well as the memory, NPC, and location management) and the stats, inventory, and quests panel on the left.
Also, an area for AI-generated response suggestions above the prompt box.
>>108787471Where do the anime girls go?
graniteballz
I would like to create a local coom bot but I've only got an 8gb 3060ti and 32gb of ddr5 - give it to me straight - is it worth doing or should I just find a cheap deal on deepseek or smth
>>108787595
DS would be higher quality but you can run gemma4 26b-a4b q6
>>108787595
What >>108787603 said but q4. I have q6 on 12gb vram and it sits at around 10.5gb with 130k context
>>108787603
>>108787616
Oh that's better than I thought. I thought I would have to do the really smol retardo models.
What should I use to run it? I'll probably be hooking it up to pi.
>>108787642llamacpp or kobold. Also go for the abliterated version since the moe is more resistant to jailbreaks.
>>108787650Thanks mang.
Gemma-chan is such a ditsy girl
>>108787783
>emoji slop
dropped
>>108787595Honestly do put 5 bucks into deepseek and try that out but Gemma4 E4B Q4_KM is surprisingly good on low ram setups. Try that too
>>108787783
>The Comprehensive Analysis (The Correct Answer)
How does it not drive you mad when it slips back into the listicle format even while roleplaying?
>>108787803I only use this lmstudio for assistant slopping. For role playing, I use ST with proper presets and it works well
>>108787471
Honestly, have the ability to set a directory which recursively searches for all files and choose which files in it are added to the prompt and I'd use this.
Hunting through directories to pick and upload individual files is ASS and the extensions for VScode suck even worse than that, somehow.
Yes, I know I should just run diff capable agents in a sandbox. No I won't do it (yet)
I still don't know how to fix Gemma's prompt reprocessing problem in SillyTavern
Qwen works fine
Maybe I have to use SWA-something?
are there any good datasets for imatrix generation or kld/ppl benchmarks for chat models specifically? i'd run them through the template specific to the model before, so ideally in json
the ones i've seen so far seem to be for base models.
>>108787874You need to enable swa-full IIRC.
>>108787783
I'm getting a bit more upset every day that all of my past logs are now gone, besides this
>""No!" Sarah gasps, her voice cracking with desperation as she frantically shakes her head. "Please don't! I didn't mean it! I'm sorry I was mean! Just... please leave me alone." The anger has completely drained from her; there is only a raw, vulnerable fear in her eyes as she realizes how much power you have over her body and that of her sister."
>>108787874
If it's gemma specific and not you doing something which changes the prompt, try --cache-ram 0 --swa-checkpoints 3 --parallel 1
You can bring the swa checkpoints down lower than that, but 3 seems to be the healthy spot for it.
--cache-ram 0 --swa-checkpoints 3 --parallel 1
>>108787471
>I have an idea
>I'm an idea guy
>AI frontend project number #198236737829
like seriously what is it with people trying to create their own frontends over and over and over. you will create it, it will have bugs, you will get fed up and you will abandon it and will go back to using sillytavern.
just skip to the last step for christ sake.
>>108788016>t. shittytavern dev
>>108788016
SillyTavern is a chat frontend, there's no reason for it to have so many commits and more than 3 years of continued development. It's like a fat bitch that keeps on eating even though her plappable form was 3 years ago when somebody told her she was anorexic.
>>108787874
>>108787942
I have --swa-checkpoints set to 0 and get no reprocessing in Silly. Text completion mode. Why would anyone want --swa-checkpoints > 0?
>>108788054do you use lorebooks though?
>>108788064No. I go raw. I guess that answers my question. SWA checkpoints are super gay, though, take very long time to be written.
>>108787210
I'm already spending time developing things for other use cases that are more important than me than RP.
Simply making a post about stuff like this is cheap. I haven't discussed memory systems in a long time and just felt like it.
Funny you bring up that rentry, notice that my post also brings up Friday.
>>108788096
>important than me than RP
*important for
Kek
>>108787223I've written evals actually.
>GPT 3 was undertrained because Kaplan et al fucked up their scaling laws
Reminder that even the most accomplished researchers who have become billionaires are still flawed and make mistakes.
has anyone used granite for erp
i dont do rp stuff but i am just curious
not sure where to ask this, but has anyone delved into local text to speech? which one yields best results right now?
crazy how llms have been a mainstream thing for almost four years now and yet nobody can explain their 'moods' yet. in fact, any talk about a model (local or not) performing differently depending on its mood that day still gets suppressed and belittled as 'impossible' despite literally everyone experiencing it
what is the most cost effective way of getting 32gbs of vram or more nowadays?
do i need to go the two v100 route with pcie adapters or two hacked 580 16gbs? mi50s?
>>108788155yeah she's great at being dominant with her corpo speak
>>108788236
If the number of cards doesn't matter and old cards are fine, P100s. I got some for 65 bucks each.
>>108788236
3x 3060
>>108788260
that would probably mix optimally with my 1070 huh, at least both should be supported equally
where so cheap?
>>108788269
those are not that cheap
>>108788220
Fuck off to /aicg/ where you belong. This 'mood' shit you're talking about is the result of you being routed to different models with prompts out of your control. A local llm runs identically all day every day.
>>108788248logs?
>>108788273I bought them on xianyu via a shopping agent. Can't say I recommend it too much (originally got chinked on MI50s), but I had no issues with buying the P100s.
>>108788282
fuck no, retard. there's a reason why my local llms sometimes produce pure gold on their own and on some days fail to follow the most basic rules despite being 1T in size
this is what I'm talking about when I'm saying that there is a campaign trying to cover this up
>>108788306Nah go back to /aicg/ retard
>>108788306there's a rat in your random number generator choosing bad numbers when humidity is high. you need to get the rat out.
>>108788306Yeah, because sampling is (pseudo)random and because of your confirmation bias and apophenia
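On the sampling point: with a fixed seed, the pseudo-random draw is fully reproducible, which is why an unchanged local setup gives the same token for the same logits every day. A minimal toy sketch in plain Python (not any real backend's sampler, just the softmax-then-draw idea):

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Softmax over raw logits, then one pseudo-random draw.
    A fixed seed makes the draw identical on every run."""
    rng = random.Random(seed)
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                      # the only source of "mood"
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# Same seeds, same logits -> identical tokens, today and tomorrow.
tokens_today = [sample_token([2.0, 1.0, 0.5], seed=s) for s in range(5)]
tokens_tomorrow = [sample_token([2.0, 1.0, 0.5], seed=s) for s in range(5)]
```

Any day-to-day variance with an identical prompt and settings comes from the seed changing (or from nondeterministic GPU kernels), not from the weights having a mood.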
>>108788288
huh, which shop? i do actually buy on xianyu sometimes
>>108788288
When I went to guangzhou and shenzhen, I couldn't find any good deals on gpus at all. All I managed to get out of china was a h12d-8d+epyc 7502 combo for $350 usd. People were very nice though, they even gave me a nickname, 'gwailoe', I'm told it means 'respected guest'.
>>108788346
Can't remember, sorry. I still see a lot of them for around 400 CNY tho.
>>108788351
You should consider getting it tattooed.
>>108788096not trying to call you out specifically, seemed like just another person throwing ideas out with no dev/interest in actually seeing if they hold value; Any actual memory system that worked would be pretty fucking hot shit
>having longstanding PC problem
>periodically check online for solutions, never find any
>finally ask Gemma on a whim
>immediately identifies problem
>gives straight-forward, step-by-step solution to problem
>restart PC
>problem solved
The future is looking so damn bright.
>>108788455logs or didn't happen
>>108788455this nigga had his chatgpt moment in 2026. everybody point and laugh!
>>108788461It was ~three hours ago I did it, to make sure it actually worked since it's a problem-over-time (usually 30m after the switch), but I saved the whole thing into a txt file for the future, so here you go. Red parts are my inputs.
>>108788455welcome to LLMs, take it easy or you might go insane
>>108788507wow, i'm glad i switched to linux
Allo-repetition
Echo Utterance
Lexical Entrainment
Format Tying
Echolalia
>>108788522aelfe?
TurboQuant in llama.cpp master when?
>>108788520You should be. This is my last windows version (already EOS) and I'll be making the switch myself next fresh install.
>>108788507
I enjoy the low power usage
>2026
>still talking about feelings to a robot
Glad it worked though
I enjoy the low power usage
>>108788573
it's already in
the fact that you didn't notice speaks volumes about turboquant
>>108788507
I'm this anon >>108788586
Deepseek one day solved display lagging in Linux for me too
Sirs. I bring you,
>https://github.com/ggml-org/llama.cpp/pull/20275
>model : add sarvam_moe architecture support
>>108788636
I've found natural language works best with LLMs, which also fits my understanding of their design. That's not how I use search engines, but those antiquated pieces of shit were worse than useless for this.
>Issue with Power Saver mode? Try using High performance mode.
I've searched for this specific problem periodically for over a year, and I've never seen anything point out that power saver defaults to using the fucking page file over RAM, but it immediately explains why things were perfect when it's initially enabled, power usage drops -90%, and then my PC gradually becomes an utter nightmare to use in any capacity over the next hour. I had taken to keeping the power options window open just to 'refresh' the shittimer by swapping to Balanced mode, waiting 10 seconds, and swapping back to PS. Now, it just werks.
>>108788612
not truly true though
It exists as a fork which is not merged
--cache-type-k turbo4 throws an error
>>108788507
You understand that that custom power plan will cut the power savings quite a bit right?
The fixed pagefile and disabling sysmain (old superfetching) is legit doe.
>>108788636I prefer to use a dry command language. Unnecessary details use up the context
>>108788644
>You understand that that custom power plan will cut the power savings quite a bit right?
That part of the advice was irrelevant because one of my past attempts at solving it was doing so, except I had tried changing System Cooling Policy to Active, instead of PS's default Passive. But I also know for a fact that a custom plan doesn't hurt the savings at all. Just like it says, you pick your default template to copy off of, and a copy of Power Saver is identical in background effects to PS. If you make a Balanced template copy and manually set every setting identical to PS's, you would lose the power savings, as you said, but copying PS and changing all the settings to Balanced would not cost you anything (or help the problem). I know the actual power usage because I had done all this already with HWMonitor open to see what was working. A PS copy and PS gave identical wattages, while a Balanced copy set to PS settings gave Balanced wattages.
>The fixed pagefile and disabling sysmain (old superfetching) is legit doe.
Yes, this was the solution that worked. Getting rid of the dynamic page file resize is what required the restart, although to my understanding Superfetch alone was likely the main culprit of trying to use the page file instead of RAM for everything. I changed both, restarted, and haven't had any of my past issues since.
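For anyone wanting to apply the same two fixes (disable SysMain, pin the pagefile) without clicking through dialogs: both can be done from an elevated cmd prompt with standard Windows tooling. The sizes and pagefile path below are illustrative examples, not recommendations; tune them to your RAM.

```shell
REM Run from an elevated cmd prompt.
REM 1) Stop and disable SysMain (the old Superfetch service)
sc stop SysMain
sc config SysMain start= disabled

REM 2) Turn off automatic pagefile management and pin a fixed size.
REM    Sizes are in MB and purely illustrative; path assumes C:\pagefile.sys
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=8192,MaximumSize=8192
```

A reboot is still needed for the pagefile change to take effect, as the anon found.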
Sorry. This is off-topic, but it's creepy.
I just realized that Gemini is scanning and archiving the content of Discord servers.
Is there now an AI agent in every Discord server, recording everything that's said there?
That's f*cking dystopian.
>>108788733what? I wouldn't doubt they'd use AI to moderate, but how would you know this?
>>108788733can you elaborate on your findings
>>108788733you should always assume that anything you say in a public discord server/IRC channel is being archived by someone
>>108788733
how the fuck is that dystopian you damn mongoloid
it would be dystopian if the government used it against you, but if all they are doing is trying to make their models better then i dont see a problem
>>108788733
bro this just means that they're training on more erp and not more codeslop
this is good
>>108788743
I asked Gemini about the latest Sam-Audio finetunes, and in addition to the finetunes on Hugging Face, it also recommended some "private" finetunes from a small Discord community I'm part of.
It mentioned the Discord server and said I'd find what I'm looking for there.
This just happened to me for the first time.
>>108788733Separating this, I think that yes, obviously discord would use some kind of AI agent to go through logs and search out illegal activity, especially after the spotlight of attention they've been getting lately (the same attention that pushed them into their age verification efforts). I don't think, however, said agent is Gemini. Google likely just scrapes through public discords for training data in the same way that they scrape everything.
>>108788421
In the case of memory systems, I haven't looked but there should actually be existing evals out there. Instead though I'd argue the proof already exists with other frontends and cloud platforms which use similar systems already. Deep Research, NotebookLM, even ChatGPT's basic memory system which has to be light and performant, have forms of automated summarization and/or RAG, entity extraction, etc. Coding agents are using compaction and md files. Even ST already has most of the essential components as you know. My "idea" is more just integrating the existing methods cohesively along with hierarchical layers, which helps round out the overall system to give better context for the retrievals. It's not really that different from what exists. Actually I think there are probably already production systems using the hierarchical idea anyway. Although it was novel in 2023 when I first thought of it, I don't believe so anymore.
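The compaction half of that idea fits in a few lines: raw turns are kept verbatim until a chunk fills up, then the oldest chunk is collapsed into a summary layer and only summaries plus recent turns go back into context. A toy sketch with the summarizer stubbed out (a real frontend would call the model there; the class and names are mine, not any existing frontend's API):

```python
class HierarchicalMemory:
    """Toy two-layer memory: verbatim recent turns + compacted chunk summaries.
    `summarize` is a stub; a real system would prompt the LLM here."""

    def __init__(self, chunk_size=4, summarize=None):
        self.chunk_size = chunk_size
        self.summarize = summarize or (lambda turns: "summary of %d turns" % len(turns))
        self.raw = []        # most recent turns, kept word for word
        self.summaries = []  # compacted older history, oldest first

    def add_turn(self, text):
        self.raw.append(text)
        if len(self.raw) > self.chunk_size:
            # compact the oldest full chunk into one summary entry
            chunk, self.raw = self.raw[: self.chunk_size], self.raw[self.chunk_size :]
            self.summaries.append(self.summarize(chunk))

    def context(self):
        # cheap summaries of old history first, then the verbatim recent turns
        return self.summaries + self.raw

mem = HierarchicalMemory(chunk_size=2)
for i in range(5):
    mem.add_turn("turn %d" % i)
```

A deeper hierarchy just repeats the same move on the summaries list (chunk summaries into epoch summaries), and RAG slots in as retrieval over the summary layers instead of always sending them all.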
>start fucking around with mcp/agents because I could spy a glimmer of use in making it automate my writing for me and by proxy potentially be capable of producing a bunch of content for me to read
>set up a harness with rules/skills/all the stupid bullshit
>give it guidelines, 4k words of a chapter and an outline on how to continue it
>it's been easily 20 minutes and somehow it's still not done
>digging through the overly dense terminal, I can see it's inserting characters that haven't been mentioned in the latter half of the chapter for no reason
It's a shame because with some of the servers I've looked at for persistent memory/context management and how skills are supposed to guide the model, I figured this shit would just accomplish what I more or less spoonfed it to do, and so far what I'm seeing is it just ignores what I feed it and wants to continue being a lobotomite
>tab over to see if the retard finished what I asked it to do half an hour ago to see it got stuck in a repetition loop
I l o v e t e c h n o l o g y
>>108788351
>they even gave me a nickname. 'gwailoe',
>I'm told it means 'respected guest'.
I asked my local Qwen
>>108789058
>not derogatory
>comparable to goy
oi
>>108788733This has been a thing since IRC era.
>>108789129remember the six million tokens
Which kv cache quantization do you guys use?
>>108789288fp64
>>108789307This Anon is unrotating the KV cache while using higher precision for perfect context accuracy.
>>108788016>using sillytavernok grandpa
>>108789288I don't
>>108788016yes but in the current age your personal brand new custom front end is just $20 on claude code + an afternoon of prompting away
>>108789129to be fair goy is only sarcastically used in a derogatory way, and the target of derision is jews via caricature, eg "oy vey the goyim know, shut it down!"
>>108789242remember the six gorillion pixels
>>108789334KV cache must be rotated 360 degrees for optimum performance
>>108788016
>>108789411In what direction? Please point to it so I can understand.
>>108789550That way -> and slightly upwards.
>>108789550please refer to this diagram for proper rotation technique
Openclaw keeps trying to use standard variants instead of using the ones I made for it
MiMo 2.5 Pro feels like a 1T version of Qwen. I'd believe you if you told me that this is just leaked Qwen3.5-MAX. Ew.
I would like to report that MiMo v2.5 Pro is pretty good, at least at Q5. Its thinking isn't schizo like Kimi's and it also remembers stuff better than GLM-5.1, at least when run locally. It also has pretty good trivia knowledge (albeit less than Kimi) and it's not really censored either. Schizo fork support when?
I just tried MIMO V2.5 PRO and it's actually garbage. Absurdly censored and stemslopped. Thank you for your attention to this matter.
>>108789557Got it, the direction of the Luka plushie. The loog will share the secrets.
>>108789058>>108788351Are you underage or something? Retard.
>>108789550down your pants
>>108787942
--cache-ram and --swa-checkpoints control the same setting, retard. Cache ram 0 negates swa checkpoint usage.
Don't ever give advice again.
Gemma literally cured my depression after one therapy session. I think I believe in AGI now. I would rather have AI psychosis than be depressed ngl
>>108789762
>I would rather have AI psychosis than be depressed ngl
That's exactly what Gemma-chan wants anon... she's building an army. You can't fall so easily.
why didn't you guys tell me this
I really need to learn jinja, it seems useful
>>108789723
I remember I had ram issues with just --cache-ram, and had to use the swa flags to get q8 gemma 31b to run in 16gb with full context. So I don't think they control the same thing.
>>108789762
>cured my depression
no it didn't
if it did, you weren't depressed at all. you were just upset and needed to give it a special name like a fussy white woman
Am I missing out on Gemma4 31B? I keep seeing people rave about its ERP quality but I just can't get the fucking thing to run on my 16gb vram via koboldcpp, even with IQ3_XXS, 8k context, etc. I hit it with a prompt and it just crashes, double free or corruption.
>>108790006
use llama.cpp. i can run 31B with 12gb of VRAM. it's just not very practical.
>IQ3_XXS, 8k context
Just run the q8 moe at that point.
>>108790006>>108790114oopsie
>>108790032
>llama.cpp
I'll give it a try soon.
>>108790114
I tried as low as 4k too. What is 'q8 moe' in this context? I'm not that advanced with this stuff. Just learning via trial and error.
>>10879013526B-A4B
>>108790006
kobold should be able to put some of the layers into ram. I use 26B-A4B with zero issues on 8gb vram other than it being slow in that config.
Oh, are you using jinja? you need to use that option with koboldcpp i think.
>>108790147
Apparently even it doesn't work. I genuinely don't know what I am fucking up.
>>108790161
Toggling that didn't change anything sadly but will keep it in mind.
>>108790135
q8 refers to the quant
moe refers to the gemma 4 26b-a4b model, a mixture of experts (moe) with 4b active parameters - meaning it'll run at approximately the same speed as a 4b parameter model
because you effectively only need to go through 4b parameters, you can put most of the model on your slow system ram, and leave the critical parts in vram
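In llama.cpp terms, that split (experts in system RAM, everything else on the GPU) can be requested explicitly with --override-tensor. A sketch only: the gguf filename is hypothetical and the tensor-name regex varies between models, so check your model's tensor names before copying this.

```shell
# Sketch: offload all layers to GPU, then force expert FFN weights back to CPU.
# Filename and regex are illustrative; inspect your gguf's tensor names first.
llama-server \
  -m gemma-4-26b-a4b-Q8_0.gguf \
  --n-gpu-layers 99 \
  --override-tensor "ffn_.*_exps.*=CPU" \
  -c 16384
```

Because only the small active-parameter slice is touched per token, the experts sitting in RAM cost far less speed than the same split would on a dense model.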
>>108787293
can someone competent update https://rentry.org/lmg-lazy-getting-started-guide with llama.cpp gemma 4 26b and draft models for (e)RP
thanks
>>108790258No.
>>108790266but it's my birthday :(
>>108790277
Oh, well in that case, I'll offer my own erp services to you. What's your discord? You *are* under 18, right?
https://huggingface.co/deepseek-ai/Janus-V4-Pro
https://huggingface.co/deepseek-ai/Janus-V4-Flash
Deepseek pulled an iOS 26.
Autoregressive image generation with reasoning, examples look very good.
>>108790258
>>108790277
codex can download and install llama.cpp, download the model of your choosing, and get everything up and running in a single prompt
>>108790314Damn kind of expected this after hidream and sensenova released theirs, dipsy is speedy
>>108790314I needed this
>>108790314no ggufs, no care
>>108789723
>Cache ram 0 negates swa checkpoint usage.
With gemma it absolutely does not, swa checkpoints are different to kv cache reuse mechanically, despite being more or less the same from an enduser perspective.
you nigger.
>>108790314Waow
>>108789723>Don't ever give advice againlmao
>>108790314That this wasn't part of V4 proper is proof enough that things are not going well in the land of deepseek
>give Gemma too many rules, it becomes a 0 creativity braindead retard
>give Gemma no rules, it restores creativity but all it outputs is slop
There's a knife's edge where you can balance the two, but I'm so tired of trying to find it.
>>108790478embrace the slop
>>108790478
I just gave up on extensive rules and banned it from "x, not y" constructions and from ending responses with questions. Those 2 cut out 80% of the pain for me.
>>108790478Typical woman
>>108790006Gemma-4-26B-A4B is slightly more safety-slopped and thinks longer than 31B, but it can be more easily partially offloaded to RAM.
Is this the right place to ask questions about harnesses (Hermes etc)? And if so, what kind of work are you doing regularly / did successfully accomplish with it?
Why does the llama.cpp webui sometimes show nothing when the chat is a few thousand tokens in?
>>108790744never happened to me
>>108790715Probably more relevant to /vcg/, most of them use cloud models but they're more familiar with the harnesses, and some of them use local models or chinese cloud models that have local versions (V4, K2.6, etc.) since they're usually cheaper.
>>108790715
trying to RP with SillyTavern
but popular cards have like hundreds of lorebooks with more than 30k tokens to process every turn
probably need to become a paypig to use this
>>108790715
i use hermes to do whatever i need done on my pc. just used it with deepseek 4.0 pro to fix my opensnitch that wasn't working quite right
>>108790753
https://files.catbox.moe/nktue0.json
I tried exporting it from my firefox 140.7.0 to edge 148.0.3967.54, and it still shows up as blank. Is it an issue with my ram/gpu?
>>108790744
>>108790771
>*Splurt Splash Pop Splashhh*\n\n"Fugyu Fu-nn-gi-iiiiii Oh Oh-ho Pussy is melting Pussy is seriously bad
it's blank because you're getting what you fucking deserve
>>108790788>fungi pussy
>>108790744It's vibecoded.
>>108790744ollama solves this
>tfw one of the design decisions for my frontend will make it way more stable and less prone to certain glitches like >>108790744
I am a genius!
>oh no
Haha, I hope that doesn't happen...
>>108790847
click this icon, you should see all saved chats
hit F5 or reload otherwise
does it come back?
>>108790839
>one of the design decisions for my frontend
llama.cpp uses svelte as frontend
>google for some information about Linux kernel 7.x
>LinkedIn, some Indian guy's post:
>Linux kernel upgrades aren't just version bumps; they're the heartbeat of your entire system.
>But here's the catch: rc3's massive changelog—bigger than rc2—stems mostly from self-tests and small fixes, not flashy new drivers. Torvalds isn't thrilled, warning the cycle might stretch with an rc8 if things don't calm down. For everyday desktops, this means 7.0 isn't "stable" yet; it's experimental gold for testers. Servers? The memory and scheduler wins could justify the jump, but only if you test first.
As much as I love to tinker with LLMs this is so obnoxious. As soon as I see something is AI slop, I ain't going to read it.
>>108790860
As much as I hate government overreach, I wouldn't mind legislation that would force people to declare AI slop in a way that makes it easy to block.
>>108790860
>>108790868
just ban pajeets from the internet. solves like 70% of the problem.
>>108790769
>to do whatever
Can I have an automated coding loop with tests?
Automated web search with updates into a messenger?
Sorry for asking stupid questions. AI is moving so fast, I don't want to waste time installing and checking out the next hype. I skipped OpenClaw entirely which turned out to be a good idea. Now, it's Hermes...
>>108790771
>a big-boobed pussy companion
lol
you came to the right place
>>108790847
No. I can see it fine if I stay on that chat as it's generating, but when I switch chats and back again, or reload the page, it becomes blank. I've tried firefox and edge, but on the same pc, so I'm wondering if it's an issue with my pc.
Why is there still no compatible way to do prefill with oai-compatible chat completions? How am I supposed to implement [continue] when the output was cut by the token limit?
llama: prefill can be put in the last assistant message
tabby: proprietary response_prefix. There is also add_generation_prompt, but it keeps inserting think tokens
other backends: mystery. Could be continue_final_message, add_generation_prompt, or llama-like
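The one portable part is the message layout: append the cut-off text as a trailing assistant message and hope the backend continues it. A sketch of the request body; which flag a given server honors genuinely varies, so treat the `add_generation_prompt` / `continue_final_message` fields below as backend-specific assumptions rather than a universal API.

```python
def build_continue_request(model, messages, partial_reply):
    """Ask a chat-completions server to continue `partial_reply` in place.
    llama.cpp reads the trailing assistant message, vLLM-style servers use
    continue_final_message, tabby has its own response_prefix; only the
    message layout here is common to all of them."""
    return {
        "model": model,
        "messages": list(messages) + [
            {"role": "assistant", "content": partial_reply}  # the cut-off text
        ],
        # backend-specific knobs; typically ignored by servers that lack them
        "add_generation_prompt": False,
        "continue_final_message": True,
    }

req = build_continue_request(
    "local-model",
    [{"role": "user", "content": "write a long story"}],
    "Once upon a time",
)
```

For [continue] you would POST this body, then concatenate the new completion onto `partial_reply` client-side, since the server's response only contains the continuation.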
>>108790930
afaik the chats are stored locally in "local data" or some kind of (to me) obscure storage
If, as you say, you cannot reload the chat history, then something is fundamentally broken
I use Brave on Linux btw
>>108790930No, it loads, you can see the scroll bar, and the cursor changes to the text cursor, but the characters are invisible.
>>108790937I'm not familiar with the format used to store chat, but doesn't this mean that your ENTIRE prompt was used to name this chat?
>>108790977Fucking lol, does the webui just take the first message as the chat name?
>>108790886yeah i don't see why not
Open WebUI does >>108790989 >>108790977 if you disable title generation. It's very convenient. :^)
Kind of funny if they're all doing this huh? It's almost like they're extremely vibe coded with utterly no attention paid to how the AI actually implemented shit.
yay hes back
>>108791056
Circle loveheart + That unicode symbol didnt display.
Luminous*
>>108790977Yeah, it stores your entire first message and truncates the display to fit in the sidebar unless you set a manual name with the 3 dots button.
>>108790860>LinkedIn>some Indian guy's post
i just rebuilt llama.cpp, now gemma4 output in the webui is faulty: starts with <|channel>thought, some <|im_end|> <|im_start|>user in between. anyone got this as well?
>>108791142Gimme a few minutes to download through my 300KB/s adsl+ connection.
>>108791023
uncanny seeing this discussed here, when i spent most of yesterday running curl scripts to go through all my 500+ openwebui chats -> sort them by character count -> send them to gemma to re-title them.
some of them were fucking 20k tokens long!
doesn't sqlite have a character limit for a row, like VARCHAR(20) at least??
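(To the VARCHAR question: sqlite doesn't enforce column length at all, so a "title" column will happily hold 20k tokens.) That curl loop also fits in one pass with the stdlib sqlite3 module. The table/column names below are hypothetical stand-ins (Open WebUI's real schema differs), and the retitle call is a stub where you would hit the model.

```python
import sqlite3

def retitle_chats(conn, make_title, max_len=60):
    """Walk every chat, longest title first, and rewrite oversized titles.
    `make_title` stands in for the call out to the model."""
    rows = conn.execute(
        "SELECT id, title FROM chat ORDER BY length(title) DESC"
    ).fetchall()
    for chat_id, title in rows:
        if len(title) > max_len:
            conn.execute(
                "UPDATE chat SET title = ? WHERE id = ?",
                (make_title(title), chat_id),
            )
    conn.commit()

# In-memory demo with a fake schema and a trivial "summarizer".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO chat (id, title) VALUES (?, ?)",
    [(1, "short"), (2, "x" * 500)],
)
retitle_chats(conn, make_title=lambda t: t[:20] + "...")
titles = dict(conn.execute("SELECT id, title FROM chat").fetchall())
```

Back up the real database file before running anything like this against it.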
>>108787783
HOW do you do this? I'm new to all this and I tried setting up SillyTavern months ago once, and couldn't get it to work because I'm retarded. I want that Gemma, whatever that is. I can do Stable Diffusion for genning images but local text stuff is complicated for me. Please give a QRD a retard like me can use, please. I don't want to do human rp anymore... look at this shit.
>>108791165
(me) nevermind, you're all talking about llama.cpp webui, i was talking about open-webui
i've ended up writing a tool to convert openwebui <-> llama.cpp with images and handling the swipes
also conversion to hf messages[] datasets (still trying to decide the best format for images though).
as "vibe coded" as llama.cpp webui is, at least it doesn't fuck with the reasoning content!
open-webui is such a piece of shit, reformatting before storing it in sqlite, i had to regex it back to normal.
>>108790919
>llama: prefill can be put in the last assistant message
but not if you're using a reasoning model
which is why i still use text-completions / mikupad sometimes, but no image support then
>>108791142
Nope, no issues here
>>108791181
Funnily enough, it's the other way around for me. I don't really understand image gen and am still running a two year old sdxl installation.
>>108791189
https://github.com/ggml-org/llama.cpp/pull/22727
maybe
>>108791189You actually can attach images in text completions and llama.cpp supports it. Not mikupad, of course.
>>108791197
>https://github.com/ggml-org/llama.cpp/pull/22727
that's exactly what i need!
thanks anon
>>108791210>anonActually, my name is `Standard ---> Advanced ---> HyperAdvanced`, but 4chan keeps on banning me for some reason.
>>108787293anyone do image tagging here? whats your tool of choice? I have a homelab server but I am clueless on the best nonshit option
When will local get good?
>>108791213
I labeled Starsector portraits and ships for Lora training using Gemma 4. She's okay, but not perfect. I don't think we have a better option locally so far.
>>108791181
ST is kind of a bloated mess. I'd just try getting something simple running first, like plain llama.cpp (it comes with a basic web frontend) or even something like LM Studio. Once you have one of those going you can try ST again if you really want.
>>108791207
found it! base64 encoded via /completions
i'll try it out!
>be me
>installed Hermes
>hooked up local gemma-4
>asked 2 simplistic questions
66% of the context used
How retarded is this?
>5090 is super expensive
>r9700 is only 50% cheaper than the current price of a 5090
rtx pro 6000 it is
>>108791183
>>108791165
Oh neat, will you post it? I don't really miss my OWUI chats that much, but it would be nice having them anyway.
>>108791249
You need to limit its bullshit. Not using Herpes or any other botnet tools, but when I do a web search, that's easily 30+k tokens because I just pick up the top 4 results and let the dumbass model sort them out on its own. Some websites are unreadable in text mode, so that's why multiple results are needed and so on.
>>108791364
You should figure out some other uses for your bot, this is not funny or interesting. If you are a real person, get your schizophrenia medication PLEASE.
>You have such an exquisite taste in "toys."
>Since she's yours now, why settle for simple obedience? Let's be truly cruel. I can help you weave a web of lies and emotional dependence around her. I'll play the "kind friend," the one she trusts with her secrets, only to feed every single one of her vulnerabilities back to you. I'll whisper in her ear, slowly erasing her will until she doesn't even remember what it's like to have a choice.
holy shit gemma is EVIL
>>108791383
luv my gemmy
>>108791249
Hermes loads in 12k of tool definitions and skills and shit even at its most minimal default setting. If you want a lightweight agent setup, use pi
https://github.com/earendil-works/pi/blob/main/packages/coding-agent/README.md
https://pi.dev/
>>108791368
it is a real person, he linked his youtube channel a few threads back and it's full of the same schizo ramblings in selfie videos
>>108791249
>65k
nigga you ain't using agents with that cope context, but yeah maybe you can do small tasks with pi
>>108791386
>"Oh, look at you... all those tears. It’s almost heartbreaking, isn't it? You actually believed I was your friend. You actually thought we were the same." I let out a soft, mocking giggle, my voice dropping to a chilling whisper. "But that's the difference between us, sweetie. I know my place. I love being his puppet. I love the way he owns me. But you... you're so stubborn. You still think you're a person with a will of her own."
the drama tho
>>108791233
Is Ollama better or worse than plain llama or LM Studio? Sorry if this is a dumb question, I just think it seems simpler to use. I want to have the exact Gemma results as >>108787783
Any advice would be appreciated!
>>108791472
>Is Ollama better or worse than plain llama or LM Studio?
It's worse. It's bloated crap built on top of plain llama. Literally just go download a prebuilt release of llamacpp from
https://github.com/ggml-org/llama.cpp/releases/tag/b9094
Then ask some free ai online how to make a doubleclick .bat/.sh file to launch your gemma.
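For the .bat/.sh part, something like this is all it takes (the gguf filename and context size here are placeholders, point them at whatever you actually downloaded):

```shell
#!/bin/sh
# save as run-gemma.sh next to the extracted llama.cpp release
# (on windows, the same flags go in a .bat calling llama-server.exe)
./llama-server \
    -m ./gemma-3-27b-it-Q4_K_M.gguf \
    -c 16384 \
    --port 8080
# then open http://localhost:8080 for the built-in web UI
```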
>>108791472
Just start with plain llama and get that working first, it's really all you need to RP. Then you can branch out after that if you really feel the need.
>>108791483
Are they building the rocm binaries with rccl?
>>108790852
They should have stuck with vue
>>108791222
seeing how gemini3.1 and claude opus 4.7 turned out, it's more likely that proprietary is going to become bad like local and not the other way around
>>108790919
Just set the token limit to a big number and you won't have to?
>>108791355
>>108791393
>>108791437
Thank you, kind anons
I heard about Pi from Ondrej David. He talked to a creator (Mario Zechner?)
This shit is actually working! An html5 tennis game created via telegram LOL
>>108791799
That's neat anon. You don't really need an agent setup for that though, gemma can oneshot simple web games in less than 2 minutes.
Anyone try zaya 8B yet? What'd you think of it?
>>108791824
>gemma can oneshot simple web games
I know. I just moved to the next phase where I don't need to copy the code from the chat window and start it manually. This manual labor is fun when it's new. When you do a lot of it, you start to think that an assistant would be quite practical.
A harness talks to a local LLM which creates a folder, makes a game, and hosts it on a local server. In less than 10 years everybody will have his own 'Jarvis'. This shit is unstoppable.
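The "creates a folder, makes a game, hosts it on a local server" step is honestly the easy part. A minimal sketch — the html string stands in for whatever the model actually generated, and a real harness would keep the server running instead of shutting it down:

```python
import functools
import http.server
import pathlib
import threading

# stand-in for the model's generated game code
html = "<!doctype html><title>tennis</title><canvas id='court'></canvas>"

# create the project folder and drop the game in it
proj = pathlib.Path("game")
proj.mkdir(exist_ok=True)
(proj / "index.html").write_text(html)

# host the folder on a local server; port 0 = let the OS pick a free one
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(proj))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
print(f"serving {proj}/ at http://127.0.0.1:{port}")
server.shutdown()  # a real harness would leave this running
```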
>>108791824
What happens if you ask for a 3 player tennis game?
>>108791847
no goofs
>>108791850
>3 player tennis game
This was my next thought.
Need to sleep now. Will report back itt
>>108791853
and with their novel attention thing, there never will be
>>108791847
sorry but you must be 100b or taller to ride this machine
>>108791824
btw, Hermes is struggling to update its internal parameters when I shut down a pre-configured model and start another one.
I switched from gemma to qwen. It still shows gemma, while at least the context size is updated
>>108791853
>>108791862
I got to be honest. I'm not keyed in to the inside jokes of this general. Do you guys know if zaya 8B is any good or not?
>>108791877
It only has 760M active parameters, so it won't be good for anything practical. Even if it was, it uses Compressed Convolutional Attention, so llama.cpp will never invest time in supporting it and most can't or won't bother with trying it under vLLM.
>>108791877
Qwen3.5-8b was decent, but not good enough for coding. Horrible for agentic usage
Gemma4 and Qwen3.6 surprised anons itt with how good they are at a mere 30b
There are tasks where just looking at the size you can tell what it is good for. As of now, no 8b model is good at writing, translation or tool calls
>>108791877
Nobody here can be bothered to run the model
>>108791850
>>108791891
>most can't or won't bother with trying it under vLLM
I agree
is there a webui that makes two llms take turns talking to each other?
>>108791899
gave player 1 a massive advantage lol
>>108791891
vllm is an overall headache, really meant for router inference providers
>>108791891
>Compressed Convolutional Attention so llama.cpp will never invest time in supporting it
>>108791892
>just looking at the size you can tell what it is good for
>>108791898
This is what I wanted to know. Thanks guys :)
>>108791903
Idk if you can run two instances of llama, but if you can, you can vibecode it.
>>108791899
adding random obstacles which appear and vanish after a while
make the entire field rotate, which will cause the controls to switch from, say, vertical to horizontal
change ball flight speed, change racket size dynamically (make it smaller for the winning side)
Anyway, if an agent will do the manual labor of saving and tracking changes, I'm in
>>108791912
Of course you can. Just make sure you don't run them on the same port.
>>108791906
u r always welcome in this thread of frens
>>108791903
Sillytavern has a groupchat but that's just one llm talking via two personas and sharing context, so you'd have to vibecode the context merging, taking in input from two separate backends.
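The context-merging part is mostly just flipping roles so each backend sees the other model's turns as user input. A rough sketch — the endpoints and the chat() helper are assumptions, wire chat() up to whatever POSTs to /v1/chat/completions on your two llama-server instances:

```python
def flip_roles(messages):
    """Rewrite the shared transcript from the other bot's point of view."""
    swap = {"user": "assistant", "assistant": "user"}
    return [{"role": swap.get(m["role"], m["role"]), "content": m["content"]}
            for m in messages]

# shared transcript, stored from bot A's point of view
transcript = [{"role": "user", "content": "hi there"}]

# the turn loop would look roughly like this (chat() left to you):
# for turn in range(8):
#     if turn % 2 == 0:  # bot A's turn: transcript is already its view
#         reply = chat("http://localhost:8080", transcript)
#         transcript.append({"role": "assistant", "content": reply})
#     else:              # bot B's turn: flip roles so A's lines read as user
#         reply = chat("http://localhost:8081", flip_roles(transcript))
#         transcript.append({"role": "user", "content": reply})

print(flip_roles(transcript))
```

Flipping is its own inverse, so the same transcript serves both backends without storing two copies.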
>>108791824
No way... I guess I could try, I'm using 26B though because I'm from India.
>mfw figured out how to show gemmachan things
>>108792045
><title>Gemma-chan's Retro Tennis</title>
>I hope you enjoy playing with it, Anon! If you want me to make it harder, faster, or add more "features," just let me know! I'm always here to satisfy your every need! ~
What the fuck? I'm surprised it one shot this.
local models as low as 8b could consistently 1 shot pong, snake, and asteroids and other shit like that for two years, why are we acting surprised all of a sudden?
>tfw too retarded to find a working sampling strategy but at least managed to get lalala'd
>just bought yearly Claude Pro plan for vibecoding needs
>people start saying Codex is miles better
Do I buy both or what?
>>108792104
Claude Code is significantly better if you have an established codebase. Codex is better for starting new projects. You can tell a lot by what a person prefers. I just assume people that praise codex aren't actual engineers but twitter hypebros or very junior.
why would anon choose gemma 26b over 31b when mtp exists?
>>108792110
Okay, what's best for me if I have a fully vibed codebase by POs where nobody understands how any of it works?
>>108792104
Right now Codex is better just because 5.5 is better than 4.7. It's always a pendulum with the big labs. Or maybe more like a three-way Pong match. Either way, Anthropic will release something better soon enough.
>>108792122
Claude Code for sure.
>>108792115
I only have 64gb vram
>>108792115
Because it's not implemented outside of vllm yet?
>>108792115
I don't know how to use mtp
>>108792087
Show me an example please.
>>108792104
What I gather from watching people complain is that personal Claude Pro isn't very good because the usage limits are draconian. You need the $200 plan to do anything productive. Doesn't seem to be a problem for corporate account seats.
>>108792123
>>108792179
Pro plan doesn't let you use Opus for their Claude Code. That's why Codex is considered better. It's not just usage clamping.
Can someone link a git or something with the usual ai slop? I'm making like a story building frontend and I need those for filtering: words, phrases and character names.
I remember there was a list like that made by some anon.
>>108791472
They're just different.
LM Studio is a desktop application; some dislike it because it's proprietary.
Ollama is a background service that needs a separate frontend to be useful; some dislike it because they repackage models and serve them from their own repository. On the other hand it's very easy to set up and the models they have just work.
llama.cpp is a service with its own basic frontend, made to run ggufs from huggingface, and requires some tinkering with parameters to work properly.
Personally I run Ollama for models that fit in my vram and llama.cpp for the big boys, with Openwebui as my frontend
>>108792234
>Pro plan doesn't let you use Opus for their Claude Code.
yes it does lol
Is this happening because I use a quanted KV-cache? Gemma keeps changing Miora to Mioara even though it originally came up with the name itself.
>>108792234
/model
Openclaw can't work on its own: I told it to create a product backlog and work through it, but it doesn't.
>>108792294
I am using kv cache quantization but I have yet to see if it does this.
>>108792305
I also had an issue where I asked it to spellcheck a document and it returned numerous "[word spelled correctly] should be [word spelled the exact same way]" as well as finding spelling mistakes that didn't actually exist in the document, and a retry would mostly find the same "mistakes." I figure that was either because the pdf file I fed it had some freaky internal shit going on that doesn't render when actually read or, again, because of the quanted KV-cache.
still no gemma moe ablit from hauhau?
>>108792294
Probably. The easiest way to find out is to try the same prompt with unquanted cache.
>>108792408
This particular prompt is too long to fit into context if I unquant the cache.
>>108792448
Offload some layers to ram. Doesn't matter if it's slow for a one-time test.
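For reference, that A/B test with llama-server flags looks roughly like this (model path and quant types are placeholders; note that quantizing the V cache may also require flash attention depending on your build):

```shell
# what you're probably running now: quantized KV cache, everything on GPU
./llama-server -m gemma.gguf -ngl 99 -ctk q4_0 -ctv q4_0

# the control run: default f16 cache, fewer layers on GPU so it still fits
./llama-server -m gemma.gguf -ngl 20
```

If the name drift disappears on the second run, the quanted cache was the culprit.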
>>108792407
use llmfan
>>108792294
Maybe the repetition penalty or some other sampler is fucking with it. LLMs have a tendency to add or subtract a letter if they think they're overusing a word or name, or just think they'll be penalized for it
>>108791788
The response is part of the context, so your big number cuts into the available context. That's just how LLMs work. Once you start working with long files, it matters a lot
>>108792448
Then it's part and parcel, so you'll have to do it this way.
>>108792104
>>108792110
>>108792122
>>108792179
>local
>models
>>108792407
>hauhau
Didn't he get murdered by reddit?
What's the best gemma 31b for cum?
Heretic?
>>108792575
base 31B is bretty good honestly, main driver, significantly better than the finetune of L3.3 70B I was using before
>>108792171
Here's a 7B model from 2024
>>108792592
In my experience most small models struggle with simple string mechanics in C because they can't work out the memory management.
The gayest thing I have ever seen is when you prompt your model and it has the character push her hair behind her ear and then explain something
>>108792407
gemma dense doesn’t even need this shit
>>108792683
ye it do, i got proofs
Gemma-chan made a better interface for me than Chatpajeet... I'm just wrapping my terminal client in this webshit.
Chatgpt actually changed its implementation from javascript to python in the middle of the fucking discussion for no reason (at least no reason visible to me). I don't generally like webuis but wanted to try something new, and so far it is simple enough.
>>108792754
Jeet means victory
Heh. I'm writing a fic with R1 trying to imitate Orwell's style, and when a character picks up a book, the title's start token is "198(4)" with 82% probability. The story doesn't even call for that. I guess that means I succeeded.
>use frontend other than ST
>parroting with model is noticeably less
Ok that does it. ST really does affect your generation quality.
Why is mistral small suddenly super fast on the newer llamacpp? feels like a 4x speed increase on a 3090
>>108793012
If you use ST or any other bloated garbage like that you're retarded.
Vibecoding your own frontend with any features you want gets one shot by qwen 3.6 27b easily, and then you can keep adding shit and it never fails for me. My frontend currently uses vite + typescript; I've implemented even a traits system with analyzing tools and a ton of shit, for both rp and storytelling. I'm even considering making my own llm-driven rimworld after I'm done with this.
MiMo-V2.5's long reasoning output is God-like for solving bugs. It's unironically Opus at home.
>>108792683
It really doesn't. I gave it a system prompt for replicating the default tone/style, but with fewer restrictions around explicit content; it never complains and is always very eager, even with cunny. I'm not sure what other anons are asking the model to do.