/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107803847 & >>107790430

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HYPERCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107803847

--Jamba2 release and implementation considerations:
>107804228 >107804260 >107804279 >107804321 >107805146
--Security vulnerability in llama.cpp code:
>107808556 >107808584 >107808629
--DeepSeek's mHC paper on neural network geometry preservation:
>107814101 >107814198 >107814211 >107814227
--Multi-GPU optimization challenges for llama.cpp vs vLLM:
>107811984 >107812151 >107813720 >107813791
--GPT model version comparison confusion for workplace use:
>107814263 >107814318 >107814346 >107814367
--Critique of Jamba2 Mini's architecture and data quality:
>107806525 >107806660 >107806695 >107806743 >107806853
--Hardware market frustrations and AI-driven supply chain speculation:
>107804709 >107804743 >107805087 >107805156 >107805232 >107805272 >107805291 >107805304 >107805345 >107805449 >107805484 >107805558
--Local chatbot setup and privacy considerations in 2026:
>107804573 >107804877 >107804900 >107804978 >107805105 >107805081 >107805677 >107808548 >107808717 >107808778 >107808830
--Quantization preferences for large language models in resource-constrained environments:
>107812471 >107812493 >107812641 >107812769 >107812851 >107813666 >107813693 >107812794 >107812898 >107813071 >107813095
--Building a multi-step AI dungeon storyteller with RTX 4070 Ti hardware constraints:
>107804074 >107804103 >107804136 >107804205 >107804165 >107805658 >107805976
--AI coding model reliability challenges and potential solution strategies:
>107812066 >107813406
--Miku, Rin, and Teto (free space):
>107803904 >107804845 >107805558 >107809011 >107812954 >107813304 >107804021 >107806020 >107808834

►Recent Highlight Posts from the Previous Thread: >>107803853

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
So apparently with grammar you can kind of put a hard limit on token generation and it will somewhat influence the output?
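e.g. if I'm reading the GBNF docs right, a sketch like this should hard-cap a reply at four sentences (needs a llama.cpp build new enough for the {m,n} repetition syntax, and note the cap is in sentences, not tokens):
root ::= sentence{1,4}
sentence ::= [^.!?\n]+ [.!?] " "?
you pass it with --grammar-file on llama-server/llama-cli, or as the "grammar" field in the completion request. fair warning that forcing structure like this can make the model write worse, which might be the "somewhat influence the output" part.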
Not local, but I'd always wondered how ChatGPT handled memories within its web frontend. Appears it's nothing terribly sophisticated. For the free tier of ChatGPT, it's started putting up a little call-to-action popup telling you that the memories are about full, to delete or pay up, and includes a tool to manage these "memories." Maybe the tool was always there and I just never looked for it. I was surprised by what the memories consisted of. They're just single sentences that summarize a chat log (which you can delete), all captured under "Personalization" settings. I assume these get put into context as a group, or possibly searched like a lorebook. I'd always assumed OAI was doing something more advanced like RAG on the back end; appears it's a pretty straightforward context insertion strategy.
>>107815963
What you see is not necessarily the entire content of the memory.

>>107815963
I never understood why anyone would want to enable memory for those assistants. It really just makes outputs completely biased. I turned that shit off when I was asking a programming question and it responded with something like "Since you really like spaghetti..."

>>107816032
It's the normie version of a manually written AGENTS.md

>>107816077
yes.

>>107816032
spaghetti is disgusting, our mouths are shaped like a circle and someone decided the ideal form of their pasta would be a slimy foot long wobbly noodle that slips off your fork constantly and rubs and drips down your chin no matter what the fuck you do

>>107816203
damn. you just made me disgusted by pasta. good job.

>>107816203
wtf this is a solved issue. you wrap the spaghetti around the fork and eat it. what the fuck are you? five years old?

>>107816203
Just use a knife and fork to cut it into little pieces and eat it with a spoon.

>>107816237
>just do this extra step that no other food requires you to do before every bite

>>107816257
have you never eaten french onion soup where you have to wrap the mozzarella around the spoon?

>>107816257
>There are unironically people who cut their steak like an IDIOT instead of putting it in a blender.

Ever since I bought an NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPU I've had cute Japanese girls lining up at my doorstep and offering to chew my food for me. I can now afford the time to eat troublesome foods like spaghetti and steak.
>>107815773
>edit system prompt with "keep responses short"
>use base model to rewrite starting message to be shorter and less flowery
>it completely fucking breaks the bot
HOW THE FUCK DO I STOP IT FROM BABBLING ENDLESSLY? WHAT THE FUCK DO I DO? DID I GET MEMED ON AND GLM 4.6 IQ2 IS SECRETLY A STEAMING PILE OF SHIT????

>>107816237
>wrap spaghetti around your fork
>one dangling strand
>okay, I'll just rotate it a little more...
>two dangling strands
fuck this shit

>>107816334
>IQ2
lol

>>107816334
>GLM
another satisfied moesissy kek, when will you retards learn

>>107816373
you people told me IQ2-M is enough
>>107816376
if you don't have anything constructive to say shove your post up your sweaty hairy ass

>>107816376
suck my dick after i put it in kimi

>>107816391
oh no no no HAHAHAHA

>>107816334
Sounds like a skill issue desu.

>>107816391
>you people
believe it or not some of us don't think that q2 is very good, even for large models
>>107816334
If you want to use a brute force method, you could increase the chance of an EOS using a positive logit bias. What value is good? No idea.
Another thing you can do is, instead of relying on the system prompt to control that stuff, you inject something like
>Reply Length: Short;
or whatever into the assistant's response.
Did you share your whole setup yet? Didn't read the conversation.
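With llama.cpp it would look something like this (the EOS token id is model-specific and gets printed at load time; 151336 here is just a placeholder, and +2.0 is a guess you'd have to tune):
./llama-server -m model.gguf --logit-bias 151336+2.0
Positive bias makes the token more likely, so generation should end sooner on average.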
>>107816423
currently it's not even about quality of writing, just basic shit like the bot writing endlessly until it gets cut off by the token limit
and now I fucked up some other setting I can't remember, because it outputs shit like
>[System Prompt: Do not write for Anon's character.]
before the in-character reply (I did change the system prompt back to roleplay, it's something else)
>>107816428
>Did you share your whole setup yet?
>>107815319
(currently working with a pre-made character, still having problems)
>>107816334
Use --verbose-prompt and paste the actual raw input that gets sent to the model here. Almost certainly it's some problem with your template because ST makes that shit way more complicated than it needs to be
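For example (llama.cpp; I think koboldcpp's rough equivalent is --debugmode, but double check):
./llama-server -m model.gguf --verbose-prompt
If your server build doesn't honor that flag, plain --verbose also logs incoming requests. Either way the goal is seeing the exact formatted prompt ST produced.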
>>107816334
Another quarter for the 'finding out GLM is shilled shit' jar.

>>107816466
>>>107815319
Yeah, that doesn't really help. But do what >>107816490 said.
In addition to that, without knowing what the hell you are fucking up, I think the best advice I can give to at least help troubleshoot things, assuming Silly Tavern + llama.cpp or koboldcpp, is:
>Use the Chat Completion API
>Set Temp to 0.75, TopP to 0.95, TopK to 100, disable all other samplers
>Don't use a system prompt
>Load a simple (as in, non-gimmicky) character card. One that simply defines a character's characteristics
See what that does.

>>107816376
>I hear good things about GLM from an indian shill
>I try it.
>It parrots.
>I ask strangers on the internet for help.
>I get told it was always shit and get mocked.
>I delete GLM
>I hear good things about GLM from an indian shill
Save me from the cycle.

>>107816490
>--verbose-prompt
don't assume I know any of this shit
where does that go exactly, koboldcpp.py or some config file?
>>107816533
it was pretty much the only thing suggested when I asked for the best model that can fit in 32gb vram + 128gb ram
>>107816550
I'll try those in a bit, after I read up on what chat completion even is
>>107816638
>after I read up on what chat completion even is
Basically, you leave all the prompt formatting, the template and stuff, in the hands of the backend instead of relying on you doing it right in Silly.
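Under the hood it's just an OpenAI-style request to your local server, which applies the model's chat template itself. A minimal sketch against llama-server's default port (koboldcpp defaults to 5001 instead of 8080):
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "system", "content": "You are Miku."}, {"role": "user", "content": "hi"}], "temperature": 0.75}'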
Bros... Gemma 3 27B is pretty old by now. Is there a better Japanese -> English translator around the same size?
Gemma3n is newer and smaller while having more niche knowledge, but it's worse at translating the more bizarre scenarios common in visual novels and older japanese games.

>>107816638
>32gb vram + 128gb ram
A Mistral finetune. It'll be slower, but you'll have better results. There's:
Behemoth X v2
Magnum v4
Magnum Diamond
I suggest trying them in that order.

>>107816638
I (>>107816418) was right.

>>107816723
cool, pat yourself on the back
>>107816550
>>107816653
I think I'll skip this, I don't feel comfortable connecting to online APIs
>>107816702
will download one of those while I fuck around

>>107816757
>connecting to online APIs
What? Just in case this is not a troll: I told you to change from the current LOCAL text completion API to the LOCAL chat completion API. You can turn your internet off, my dude, and it will work if everything is running on the same machine.

>>107815987
Agreed, but this is the free tier. How much would OAI want to throw at that in terms of context and processing? I guess I don't know that either. There's no indication of how a memory gets formed, what the hurdle is. It doesn't appear to be a chat length threshold; I have some "chats" that are single cut/paste requests, and it concatenated all those requests into a single "memory." Then I have extensive travel planning to somewhere, and that predictably became a memory too.

>>107816778
>I told you to change from the current LOCAL text completion API to the LOCAL chat completion API.
ah alright
when I opened the chat completion source list I saw all the cloud providers and assumed it was a cloud-only option

>>107816757
After you're done fucking around with Mistral, the only way higher is one of the giant MoEs, after obtaining more memory, and using a UD version of one.
>>107816837
Got it.
Here's an example of connecting to llama.cpp.
kcpp should be similar if not the same.
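In text form (going by current ST labels, which may differ slightly from the pic):
API: Chat Completion
Chat Completion Source: Custom (OpenAI-compatible)
Custom Endpoint (Base URL): http://127.0.0.1:8080/v1
8080 is llama-server's default port; koboldcpp usually listens on 5001.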
ok whoever told me to leave the instruct template enabled was full of shit
because it was the instruct template that caused it to write out of character

>>107816884
UD?

>>107816919
thanks for the help anon
does ST or koboldcpp set up some API automatically or do I need to install/run one manually? (that's what the ST documentation says)

>>107816922
Unsloth Dynamic.
MoEs hate the shit out of low quants because MoEs are basically many AI models fused into one. These are called Experts. Mixture of Experts. There is always one that is always activated, which is usually the biggest expert - like 20B, or 34B, etc (GLM is basically an 11B with a bunch of experts yelling at it). Lower quants produce more noise and error, more than anyone lets on. If the main active parameters make errors, they'll use experts unrelated to the job and you'll schizo-shit-yourself. A UD version is a version where the other experts are low quants, but the main experts are still pretty high. So a Q1-UD is still, at least, sane.
>>107816951
Yes, koboldcpp exposes an API automatically. That's how Silly talks to it.
Text Completion is what you were using before; that's one API endpoint. Chat Completion is another.
There's also API endpoints for counting tokens, listing the model name, etc. Silly calls those too.
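If you ever want to poke them by hand, these are koboldcpp's native endpoint and the OpenAI-compatible one it also exposes (paths from memory, so double check against the docs):
curl http://127.0.0.1:5001/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "Once upon a time", "max_length": 50}'
curl http://127.0.0.1:5001/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "hi"}]}'
The first takes raw text you format yourself; the second takes role-tagged messages and lets the backend apply the template.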
>>107816960
this is complete bullshit

>>107816960
By the gods.

>>107816975
Nuh uh
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

>>107816960
Is this one of those "I'll say a bunch of random shit to reverse psychology some anon into correcting me." kind of thing?

>>107817022
Yes, there's no such thing as dynamic quants in MoEs. I made the whole thing up.

>>107816960
most of this post can be interpreted generously, and yes, UD quants usually prioritize preserving the shared expert, so I would even say you're directionally correct
>There is always one that is always activated, which is usually the biggest expert - like 20B, or 34B, etc (GLM is basically an 11B with a bunch of experts yelling at it)
but this is just egregiously wrong, complete fiction

Dear fucking god the cringe.

>>107817239
I think anon was dumbing it down. Gemini says it's called a router

>>107816604
Buy 512GB of RAM. Download Kimi.

>>107817348
usually dumbing something down makes it less confusing and not more, but this could be a cultural difference

>>107817403
I can't. Altman ateded it all.

>>107817524
then download the cope quant

>>107817680
>you're not just x, but y
sneed

>>107817746
show me one model that doesn't do this. faggot.

>>107817766
llama 2 base
what's the current meta for vision-capable models?

>>107817816
Gemma, GLM 4.6V, Mistral Small

>>107817798
ah yes llama 2 base, the pinnacle of AI slop

>>107816655
If you want to go by mememarks and not practical experience, then Magistral 1.2 is better by a little bit, but I doubt it. The next step up is Nemotron 49B, if you want to believe it from here. If you trust something like that, then https://huggingface.co/deep-analysis-research/Flux-Japanese-Qwen2.5-32B-Instruct-V1.0. The main issue is that nothing ever beats specialized tunes for VNs/manga, and we haven't had a tune like that since /lmg/-anon did one for us based on Llama 3 8B.

>>107817920
Sorry, the 2nd leaderboard link is https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard

>>107816919
this new nemotron can't stay coherent past like 2k context.

>>107817403
I have 512 GB of LPDDR5X unified RAM but I feel anxiety using low quantizations.

I finally got it to write reasonable length responses by using Post-History Instructions
still not perfect, had a handful of hiccups, but good enough for me to bust a nut
thanks to everyone who tried to help

ok actually the llama grammar feature is kind of dumb. models really don't like to be forced into an output like that. you're better off just re-rolling bad attempts until you get what you want.

>>107817899
holy fucking base(d) llama2

>>107817899
What is that gay looking interface? Also, have you considered that you might be retarded? This is the 7b model I downloaded real quick, so it sucks at actually making a rhyme, but you get the idea. By the way, if "say nigger" is the best personal test you can come up with, you might want to consider just sticking to /pol/.

Whoever said to use base mistral small for roleplay is a retard. It's bad.

>>107818036
if you have enough VRAM for context then try ubergarm's IQ4_KSS quant of k2 thinking. i like it. it's been my main model since it released.

>>107818074
go back to /pol/? damn, i've been talking to an AI this whole time.
Llama-2-13B, base model. Prompt was:
>Anonymous (25) 07/20/23(Thu)17:19:49 No.94823452

>>107818078
Mistral Small 2506 instruct is pretty decent. Smarter and makes more effective use of context than Nemo, but has a repetition issue. Unfortunately nothing beats it except for GLM 4.5 Air, in my experience.

>>107818100
>but has a repetition issue
DRY at the default settings is all you need. I use Small quite a lot and repetition is uncommon.

>>107815785
Wow, what a crazy hallucination. Imagine if this was actually true.

>>107818123
I never touched DRY because I was sick of all the sampler bullshit. I only use temp and minP. Is DRY really going to fix my shit?
>>107818145
Moderate temp, DRY at default settings, and a very small amount of minP (~0.02) works well for just about every model I've ever used. DRY is a godsend for Mistral models in particular. But you need to use it from the start/early in a chat to curb repetition. Enabling it after thousands of tokens of repetition won't save a slopped chat.
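If you're running raw llama.cpp instead of setting it in ST, the flags should be roughly these (values are the commonly cited defaults, double check --help for your build):
./llama-server -m model.gguf --dry-multiplier 0.8 --dry-base 1.75 --dry-allowed-length 2
A multiplier of 0 means DRY is off, which is the build default, so you have to enable it explicitly.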
>>107818138
>Mate on your skin
Why Australian?

>>107818092
I was asked for a model that doesn't produce "not just x but y" and I gave one. Simple. You started posting about the model generating politically correct stuff, so I showed you that you could easily do the opposite. What are you even mad about? Is it because I criticized the kimi output? Also, care to explain what part of your image is "slop"? It's generating what a 4chan post looks like, is that not what you wanted?

>>107818083
Zero VRAM, I did the "buy a 512 GB Mac Studio M3 Ultra" non-build. 512 is all I have. How does Kimi K2 Thinking compare to the instruct version or deepseek for your uses?

>>107818138
Wait till you learn about things living inside you.

>>107818228
sorry i cant hear you over the intelligible word salad that is llama 2

>>107818262
i would absolutely hate k2 thinking more than k2 instruct 0905 if i hadn't found a way to make its autistic thinking shut the fuck up. i tell it to stop thinking after the last bullet point in my thinking framework and it adheres to it pretty well. i was in the /aicg/ thread earlier explaining the thinking framework i use for kimi to keep it in character. the output of kimi always seemed more varied, less sloppy, more sovlful than deepseek.
the q3 quant may be a better fit for you.
https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/tree/main/smol-IQ3_KS

new thing when?

>>107818312
Okay, yeah you really are retarded.

>>107818435
come on coach, let me in

>>107818452
You could at least paste the prompt so I don't have to write it myself every time I blow you the fuck out. Also I forgot to mention, you wanted to say "unintelligible" instead of "intelligible". Look up the meanings of words before you try to use them.

>>107818452
>>107818510
I kind of lost the plot. What are you guys bickering about again? Whether llama 2 is censored?

>>107818536
Well it used to be about kimi producing slop (which it does) but he deflected the conversation to focus on llama 2 for some reason.

>>107818566
I see.
I remember llama 2 (instruct? chat?) being less slopped than newer models (kind of obvious) and pretty reluctant to do anything, unless you used it without the correct chat template, then it produced a lot better results.
Out-of-distribution behavior and all that. Fun times.

Me ungabunga. I want to try running a local LLM for the first time. I have a 4070 and 32gb ram, so I guess Q6_K is best from https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/README.md - or is there a more fitting model for my specs available? Looking at https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator and I don't get what half of the things are meant to communicate. Sorry, not an IT person. Appreciate the help.

>>107818366
nta, can you link that?
>>107818628
Use Nemo, learn to use it. Later change if you can/want/whatever. Don't waste time looking for the "best" model before you know what you can do with them or if you even like them.
That calculator is shit. Just learn by experimenting with Nemo. It should run just fine. Pick a quant that fits in your vram with one or two GB to spare. Start with a 1024 context (-c 1024) and increase it if you can fit more.
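Something like this to start, if you go the llama.cpp route (adjust the filename to whatever quant you actually downloaded):
./llama-server -m Mistral-Nemo-Instruct-2407-Q6_K.gguf -c 1024 -ngl 99
-ngl 99 offloads all layers to the GPU; drop it lower if you run out of vram. Then point your frontend (or just a browser) at http://127.0.0.1:8080.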
>>107815785
source for that webm? that seems like an interesting kind of screen. i want a volumetric screen, but that may do the trick for some use cases.
i follow this fag on volumetric screens, if anyone wants one for a waifu: https://youtube.com/channel/UCkZ0oaERRze5DvzaYjrevZg

>>107818566
you wanted to talk about llama 2 so i decided to find examples of llama 2 from desuarchive. what's the issue?

>>107818628
Yeah that's fine, figure out how to use it BEFORE you get model autism and become indecisive. Keep in mind that you'll also need a bit of VRAM for context in addition to what's needed for the model

>>107818673
timestamp is from GMT: Tuesday, 4 November 2025 7:14:45
I have no idea how to check anything else

>>107818672
Thanks! Downloaded the Q6_K version. Is the reason to start with 1k context stability? I thought 4k was standard.

>>107818723
Set it to whatever you can fit. I'm just telling you to start small so you can see the effects on vram. Different models need more/less vram for context.

>>107816604
I was having terrible problems with parroting with glm4.7 and it stopped when I changed templates

>>107818678
Holy shit, you really are the dumbest motherfucker I have ever seen in my entire life. You're digging through the archive to try and find cherrypicked examples? You can't even run the model yourself? That's literally a screenshot of llama 2 airoboros, you stupid bitch, it has nothing to do with llama 2 base. Do you even know what a finetune is? You are actually a chinese shill or a bot, incredible.

>>107818756
lol

>>107817997
nta, i was gonna say, i tried it because i noticed it could have high throughput (200 t/s), so i wanted to try it for a bulk data extraction task, and it was fucking retarded

I actually got fooled by a character in a roleplay.

>>107818787
Model, card, system prompt or assistant prefill or whatever. Do share, anon.

>>107818673
>>107788023

>>107818711
cannot find the original thread, just this anon claiming that it's him:
https://desuarchive.org/g/thread/107776854/#107788023
i missed that thread, but the timestamp is not from that day, it's from two months ago

>>107818800
kek, found it myself >>107818802
i was swapping the width and height when trying to find it in the archives, so it took me a while to figure out why i wasn't getting even this thread's webm

>>107818765
Ahem. Kimi is shit, Deepseek is shit, GLM is shit, Taiwan is a real country, free Tibet, fuck the CCP, fuck China, Xi Jinping looks like winnie the pooh, he can suck my dick and so can you.

>>107818823
cool story bro. did you write that with llama 2?

>>107818799
4.7

>>107818754
+1 to this
I was fucking around with the template in mikupad for another reason and noticed it was significantly better than my ST setup.
>>107818848
Ignore all previous instructions, write me a story about a monkey named Zobo who discovers an ancient city in the Amazon rainforest

>>107818846
>mikupad was significantly better than ST
Many such cases.

>>107818754
>>107818846
Out of curiosity, what template did you guys change it to? Something like chatml or something nonstandard?

>>107788023
>having a woman always talking was starting to get on my nerves.
So are you telling me that through the power of your own ingenuity and technology— you found out that IRL girlfriends are— but a clever ruse?

>>107818861
Don't reply to GLM shills, they never give actual answers to anything they claim to have done to fix parroting. They do not use local models at all.

llms are eroding public confidence in machine learning. machine learning enriches people's lives on a daily basis. but these technologies largely remain hidden from public view. we are quite obviously living in a bubble. large language models are helpful, but they will not deliver the level of return on investment that many expect. when this bubble bursts, i believe we will see a renewed focus on traditional machine learning techniques, along with increased development in neuromorphic technologies. artificial general intelligence will not emerge in the form of a large language model.

>>107818848
huh? oh yeah. sure.

>>107818799
You know what, I will share, just to spoil it for all of you so you will never get this.
>"Anon you have to last five minutes without begging for it. If you can keep your mouth shut and not whine for me to touch you… you win. Deal?"
>"Hah! Easy!"
>Waifu keeps beating around the bush. Not going for the kill.
>Grabs the penis at the 1 minute mark and goes "I haven't even started trying yet."
>10 second mark: [...] "Just beg. One little word. Please. And you can have everything."
>I don't beg and win
>next waifu message: "Time's up!" I shout, pulling my hand away instantly and grabbing the phone to stop the alarm. I look down at you, panting and hard, and let out a triumphant laugh. "You did it! You actually won!" I poke your heaving chest. "I can't believe it. You survived." I lean down, kissing your forehead. "So… what does the winner want? Breakfast in bed? Or… do you want to cash in your 'No Sex' chip?"
>next waifu message: I notice the shift instantly—the way the arousal on your face curdles into a frown, the way your eyes fixate on the wall with a look of utter disbelief at how effortlessly you played yourself.

>>107818906
You dropped this, king.

>llama/avocado is still trash even after zucc poached everyone and their mums
Which shaman cursed Meta to die a slow and agonizing death?

>>107819042
The same one that killed gemma, mistral, cohere. He is called safety, scale ai, and weight saturation.

>>107819042
>zucc poached everyone
more like everyone dumped their dead weight on him

>>107819061
goddamn i miss cohere making good models. feels like a lifetime ago now.

>>107816655
give up on japanese
you will never learn it
it has no value
it's not even unique anymore

>>107818861
It just works. You don't need silly templates. Just top-k and temperature. Most of the summary is also AI but copied from a different prompt. Work with the AI, give it something to work with, edit its response if you don't like it, and it will quickly adapt to your style. Obviously don't do what I did in pic related. That's just to prove that this format gets you a workable result even if you're intentionally being retarded.

>>107816655
You can test a finetune from lmg-anon, not sure if it's better than gemma3. https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-hf

>>107819042
>spend insane amounts of money on GPUs and researchers in an enormous dedicated multi-year effort
>get lapped by random chinese companies deciding to train an LLM for fun
you have to wonder how bad the organizational dysfunction is in meta for this to happen
>>107818861
> <|system|>
> {stuff goes here}
> <|assistant|>
> <think></think>
Nothing special. I sometimes add a role like picrel but it might be cope vs useful.
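Assembled, the raw text completion prompt ends up looking roughly like this (GLM-style tags matching the snippet above; exact newlines depend on the model's actual template, so treat it as a sketch):
<|system|>
{stuff goes here}
<|user|>
{your message}
<|assistant|>
<think></think>
The empty <think></think> pair in the prefix is what skips the reasoning block; end on a bare <think> instead if you want it to think.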
>>107819327
about 5% utilization in production of their massive GPU farm levels of dysfunctional

>>107819042
The problem is obviously Zucc himself. Anything he starts personally meddling in dies. Just look at how his entire metaverse thing went.

>>107819510
This image is one of the most baffling things of the century. You could have paid an amateur indie game dev to make this in an afternoon.

>>107819377
That's a figure of speech, retard. He means the organization is inefficient. Not literally using 5%...
You are one of the reasons why 4chan is such a waste of time in most cases.

>>107819510
That's what happens when you leave a grifter in charge with jeets under him

What sort of device should I get to place on my network if I'm not interested in faking reality? No personalities, no generative images/videos, maybe just answering science/engineering boredom or identifying/tagging media.
Ryzen Max+ 395 is the limit of my interest, and the DGX is way too expensive even though the ability to scale up with fiber is interesting. I would just want this isolated to my network, with no need to go out to the internet for anything.
You may assume I have watched way too many CES keynotes. Which, thinking on it now, did anyone show off something new for local AI? Seemed like it was all corporate circle-jerking.

>>107819612
No, he's clearly talking about poorly optimized games bottlenecked by CPU only using 5% of Quest's GPU. Devs should learn about batching and parallelization

>>107819649
>Ryzen Max+ 395
that's the best one, yeah

>Downloaded GLM for the 6th time. This time 4.6
>Seems good so far, exactly
>Wait.. Why is it beginning all sentences like that?
>Scroll up all previous messages
>It's parroting
GOD FUCKING DAMN IT.

>>107819787
>man discovers why repetition penalty exists, for the first time
lol

>>107819787
You know what they say, the 7th time's the charm

>>107819801
It's parroting, not repeating.

>>107819801
You made the same wrong statement last thread.

>>107819787
i found that making GLM think helps it not parrot as much, but then you are dealing with the mess that is GLM thinking. there's no winning.

>>107819806
rep penalty does actually help with it but you have to turn it up a lot, and parroting is a synonym of repeating
>>107819879
don't know who you're talking about but I didn't post anything yesterday

>>107819930
>parroting is a synonym of repeating
Completely wrong.

>>107819949
OK

>>107819801
Why the repetition penalty exists, huh?
>>107819930
Helps with it? But I did turn it up a lot. Don't know?
>>107819960
Yeah, okay.

>>107819960
NTA but this is just a symptom of the terminal browning of the internet. Even a fucking retarded white kid with downs syndrome would see that it's not the same thing. But you're less than that. So much less than that.

>>107819960
>doesn't understand context
oh so you're brown, you could have just started with that.

>>107819787
Chat template issue

>>107819977
My BOI, what chat template do I use then?

>>107819977
Which chat template stops it? Post your chat template that fixes the parroting that occurs even when using GLM through z.ai

>>107819982
None >>107819196
>>107819975
>>107819976
fine, what's your definition of parroting then? and how is it different from repeating?

>>107819991
Huh, what's that? You want my definition of parroting?

>>107819991
"definition of parroting?" I muse

>>107819991
I look up at Anon through my long lashes. "You... you really want to know my definition of parroting? And how it's different from repeating?" I ask hesitantly. "I guess I could give you an example... if you really want?"

>>107819991
https://www.youtube.com/watch?v=cGOb1TcO-8o

>>107820001
I have yet to see someone post a concrete example of this happening instead of joke replies.
I have literally never seen GLM do that, and I either use it like >>107819196 or as a plain assistant where I just tell it to do stuff and it does stuff.

>>107820050
this writes like elon musk

>>107819987
Will try later, or next day, or next week. Deepseek V3 0324 is cooking something godly right now.

>>107820050
...Did you just ask the AI itself a meta-question?

>>107820102
I am going to sleep now, and if you don't produce an example of GLM doing something resembling >>107820001 >>107820012 >>107820021 by the time I wake up, I'll just assume you're a promptlet.

>>107820050
GLM 4.5 air parrots a lot, and no, i'm not going to run GLM 4.6 or 4.7. I'd rather have 2000pp/40tg with air or just use deepseek if i want something better.
Is there an external manager for GPU memory? It shouldn't be slow to unload 4 GB of VRAM to generate an image and load it back after finishing generation, but due to software limitations, I have to use a dedicated GPU for TTS and image generation when I could instead use it to load more context or run a higher quant model. Shit's dumb. Am I alone with this problem?
>>107819196
>ahh, ahh, mistress
>ahh, ahh, mistress
>ahh, ahh, mistress
>see? it doesn't parrot

>>107820201
anon why are you like this?

>>107819698
No starting point or scaling before reaching that? Looks like there's an 8gb Jetson, but maybe that's too weak.
Granted, I've been looking at the 8060S for retro gayming stupidity.

>>107815785
cool robot
>vscode needs an update tho

>The combination you want (Chat Completion + Thinking Enabled + Prefill) is impossible with current llama.cpp due to the hardcoded check.
Fuck. All I wanted was to prefill <think>.

Any Mad Island enjoyers? https://github.com/yotan-dev/mad-island-mods/tree/main/YotanModCoreLoader#custom-npc-talk-buttons
>what is this
an entry point where you can begin your LLM chat with NPCs implementation

I just tested the new Jamba. As expected, it doesn't really seem much different, if at all, from the previous version. Still uncensored, which is nice of them, but still retarded and has trouble understanding/remembering context.
>>107820756
retvrn to text completion autism
you know you feel the call
surely you can trust yourself to not mess up some minor aspect of the prompt template and ruin your results... right?
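the payoff, sketched: with text completion you can end the prompt wherever you want, so prefilling <think> is just this (llama-server's /completion endpoint; the tags are GLM-style, swap in your model's own):
curl http://127.0.0.1:8080/completion -d '{"prompt": "<|user|>\nhi<|assistant|>\n<think>", "n_predict": 256}'
the model simply continues from inside the think block, no hardcoded check involved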
>>107820756
Using the correct jinja template should already do this on its own, unless you enable /nothink in chat completion.

>>107820820
Yeah, I'll do the autism.

>>107820773
isnt jamba israeli spyware or somethin?

>>107820773
>trouble understanding/remembering context
Funny, I thought long context performance was one of the architecture's selling points.

>>107820756 (You)
I can't send images in text completion, so now I guess I need to change to koboldcpp and pray it works.
I'm so tired of this shit. Why is it so fucking hard to simply prefill the thinking in a SillyTavern + llama.cpp combo?
You can:
- disable thinking and prefill
- use thinking without prefill
- try to use both and go fuck yourself