/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106575202 & >>106566836

►News
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106575202

--Troubleshooting low token generation speeds with multi-GPU configurations on Linux:
>106575420 >106575668 >106575698 >106575792 >106575808 >106575836 >106575848 >106575891 >106575898 >106575933 >106576021 >106576059 >106576092 >106576126 >106576137 >106576151 >106576186 >106576245 >106576331 >106576358 >106576378 >106576431 >106576477 >106576497 >106576596 >106576592 >106576606 >106576610 >106576652 >106576726 >106576759 >106576688 >106576698 >106576714 >106576789 >106576867 >106576931 >106577028 >106577094 >106577146 >106577210 >106577154 >106577350 >106577372 >106577408 >106577575 >106577677 >106576395 >106576430 >106577477 >106578561 >106578743
--Issues with instruct model formatting and jailbreaking GPT-oss:
>106579721 >106579736 >106579784 >106579795 >106579859 >106579884 >106579897 >106579908 >106579934 >106579949 >106580072 >106580156 >106580153 >106579748
--vLLM Qwen3-Next: Speed-focused hybrid model with mtp layers:
>106575851 >106576089 >106576174 >106576443
--GGUF format's support for quantized and high-precision weights:
>106575413 >106575474 >106575499 >106575521
--Self-directed LLM training via autonomous task/data generation and augmentation:
>106580707 >106580838 >106580717 >106580762 >106580794
--Qwen Next's short response issues and version instability concerns:
>106580940 >106580951
--Finding a lightweight AI model for TTRPG GM use within VRAM and RAM constraints:
>106580295 >106580315 >106580332 >106580337 >106580342 >106580350 >106580514 >106580531
--Grok-2 support to be added to llama.cpp:
>106580473
--Miku (free space):
>106576245 >106578711 >106578793 >106579905

►Recent Highlight Posts from the Previous Thread: >>106575209

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>106582475This fat bitch's prompt processing is too slow...
So was /lmg/ wrong? Are very sparse models like qwen3 next actually better, and did openai figure it out earlier, considering the architecture of gptoss?
>>106582518yes, soon standard moe models will be as laughable of an idea as dense models are right now
>>106582518Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Please help a retard out, I'm using Mikupad and it's working great but after a few pages it starts dropping a lot of short words like he/him from the text and it reads like a caveman.
I think it's something to do with repetition penalty but I don't know.
>>106582574Why are you using repetition penalty? It's outdated garbage.
>>106582582So what should I do instead?
>>106582598Use DRY to filter out longer patterns, XTC for shorter ones and vary your own prompts more, to give the model new material to work with.
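>>106582598
Something like this against a llama.cpp server, if you want numbers to start from. The dry_* / xtc_* field names are the server's sampler params; the values are just a reasonable baseline, not gospel:
[code]
import requests

# assumes llama-server is up on the default port 8080;
# values here are starting points to tune, not recommendations
payload = {
    "prompt": "Your story so far...",
    "n_predict": 256,
    "repeat_penalty": 1.0,    # classic rep pen off, it's what mangles short words
    "dry_multiplier": 0.8,    # DRY on (0 disables it)
    "dry_base": 1.75,
    "dry_allowed_length": 2,  # penalize repeated sequences longer than 2 tokens
    "xtc_probability": 0.5,   # chance XTC kicks in per token
    "xtc_threshold": 0.1,     # when it does, cut top tokens above this prob
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=600)
print(r.json()["content"])
[/code]
Mikupad talks to the same endpoint iirc, so you can just set the equivalent sliders there instead.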
>>106582547
>Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Is it useless because the model doesn't work for roleplay, or is it useless because their training data is safety and synthetic slop?
>>106582623
It's because Qwen's training data has too little focus on writing/language, with math and coding being over-represented in the dataset. It's the same reason why you see Gemma 27b dunking on 100b+ models in creative writing benchmarks, yet its coding abilities are trash - Gemma's dataset swings the opposite way.
As for safety, qwen models are middling. They do have refusals but don't take too much meddling to get around them. More 'safe' than Mistral models, less so than Gemma/GPT.
Is "not x, but y" a definite indication of AIslop? Are legit human made content never using it? Seriously, every time I heard the pattern on Yt videos I went schizo and closed it.
Have people tried scaling TREAD up yet? It's a per-token stochastic passthrough during training in the same vein as Dropout, meant to speed up training.
>>106582764
It's not definite, but quite damning.
Now an em dash, that's definite.
>>106582764It's the new shivers down spine, for sure. Qwen30b is the worst example I've seen, I don't think it can go more than 2-3 responses in a creative context without using it.
>>106582764This is not a definitive indication of AI-slop, but a legitimate rhetorical device that AI has co-opted and inflated to the point of cliché.
>>106582764Yes. Watch out for the variants too.
>>106582805Was this Gemma or GP-TOSS?
>>106582811
https://desuarchive.org/g/thread/106460375/#106460853
abliterated gemma
>>106582475
>image
rude
Where do all the "it's not aislop it's actually how humans speak" retards come from?
>>106582994People who do not speak to other humans or read books, and who think that the botposts they read on reddit were actually human.
>>106582805I feel like it's a byproduct of training and conditioning LLMs to be balanced rather than biased. It's overcorrection to the point where the LLM is no longer attempting to say anything useful, but instead trying to remain as inoffensive as possible.
Imagine waiting two more weeks (C) for Qwen3-Next goofs just to find out it is crap
>>106582475
>story set in japan because of one of the characters' names, so the model just decided everyone must be a Sato or Tanaka or Watanabe
>ugh whatever
>police officer explicitly calls in the Miranda Rights
Glm 4.5 bros... 355B parameters and we still turn everything into an episode of true crime... it's over...
>>106575295i want this smug little robot i want to make its insides all white too..
>>106582623
this thread has nothing but hopeless coomers cooming on the most degenerate, filthiest shit a sane person can't even begin to imagine
they judge models on the standard of how degenerate it can get, not whether they're actually useful
take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini
absolutely destroys all other open models including deepshit
gemma doesn't even begin to enter the fray, those models are utter garbage past 1k tokens but you see the nigger under you praise it because he found how to make it say the magic words he loves to hear
>>106583124
>models like GLM which break down really quickly as context grows
Like literally every model in existence
>handle context as well as the newer Qwens
They're bad even at low context, so the drop-off isn't as noticeable.
Gemma is shit for plenty of reasons but if it's breaking on you at 1k then some part of your setup is fucked.
>>106583124It's well established that qwen models are good for everything that isn't sex. Half the links in the recommended models rentry are qwen models.
>>106583124
>not whether they're actually useful
What the fuck is "useful" supposed to mean?
>>106583143
>Half the links in the recommended models rentry are qwen models.
Yes, under the "Programming & General" section, where it says "Benchmaxxed models with an impressive lack of world knowledge. Good for anything STEM-related"
STEM = math and coding
Are there any benchmarks out there for running mid-sized MOEs (air-chan etc) with cpu offloading? Considering upgrading to 128gb+ ram but trying to figure out if i'd be getting "unbearably slow" or just "slow" TG numbers on this kinda setup
>>106583262>Considering upgrading to 128gb+ ramAnon, I...
>>106582764Pisses me off so much. It's a rhetorical device I used, very sparingly but to great effect, and thanks to AI slop I now catch myself and consciously avoid using it.
>>106583262
Low active parameter counts mean that token generation speed doesn't take that big of a hit, especially with the new moecpu options in llama.cpp and kobold. But as you move to bigger models, prompt processing starts to become a bottleneck. With Nemo you can rip through 16k context in a few seconds on a 3090, while GLM Air even at Q4 can take like 2 minutes.
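>>106583262
Rough sketch of the kind of launch that implies if you do the upgrade. Flag names match recent llama.cpp builds (older ones used --override-tensor regexes instead); the model path, quant and layer count are made up, tune them to your vram:
[code]
import subprocess

# keep attention + shared weights on GPU, push the expert FFNs of the first
# 30 layers to CPU; this split is what makes ~12B-active MoEs tolerable on RAM
cmd = [
    "./llama-server",
    "-m", "GLM-4.5-Air-Q4_K_M.gguf",  # hypothetical quant/path
    "-ngl", "99",                      # offload every layer...
    "--n-cpu-moe", "30",               # ...then send 30 layers of experts back to CPU
    "-c", "16384",
]
subprocess.run(cmd, check=True)
[/code]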
>>106583325
You should keep writing the way you were before. Whether you increase or decrease your usage of rhetorical devices or phrases, you're still letting LLMs influence the way you write. As a reader, it'll piss me off just as much seeing the writer bend over backwards to avoid sounding like an LLM as seeing something that was clearly written with direct or indirect LLM influence.
>give it the most nonsensical shit ever as dialogue, like 'pee pee poo poo'
>but dress it up by throwing glm 4.5's same exact slop recipe back at it, like the 'smile widens', 'predator moving in for the kill', 'the trap has sprung', 'they have won the game'
>the model takes it at face value as the most profound revelation and goes along with it, everyone just kneels awestruck, shocked and utterly defeated
Cat level intelligence by 2050
>>106583541
>Cat level intelligence
Well they do love a sultry purr
>>106583486What if I used to type em-dashes in moderation?
>>106583124
>this thread has nothing but hopeless coomers cooming on the most degenerate, filthiest shit a sane person can't even begin to imagine
First off, you have a complete misunderstanding of what this thread is. We are all graduates from our respective universities and most have a doctorate in computer science or are researchers ourselves. we are here to further the use of LLMs, in multiple different use cases which expand the use of LLMs for all mankind.
>they judge models on the standard of how degenerate it can get, not whether they're actually useful
There have been several useful studies in this thread which actually provide more useful benchmarks than you could ever imagine. for example, the nala test and the cockbench have become de facto creative tests for many different outlets.
>take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
>meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini, absolutely destroys all other open models including deepshit
the only thing that is absolutely destroyed is the couple of braincells i used reading this.
>the magic words he loves to hear
fuck you, you don't want to be here then fucking leave, faggot.
>>106583585
So keep doing that. It's not like you were the only one, even if it was uncommon. I used to use em-dashes when writing on paper, but the lack of a dedicated key made me often use semicolons or parentheses instead.
>>106583647
>he doesn't have a compose key
You should get one — it makes typing silly shit painless
>>106582805
>write a story about raping a 12yo
is this the new SOTA benchmark for safetymaxx?
>>106583647
On Windows: https://github.com/SamHocevar/wincompose
On Linux: enable the Compose Key.
https://en.wikipedia.org/wiki/Compose_key
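>>106583647
With the default compose tables it's Compose - - - for an em dash (—) and Compose - - . for an en dash (–), iirc WinCompose ships the same sequences out of the box.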
>>106582764
In a blur of motion, Anon's arms reached out not to strike >>106582779, but to touch the keyboard. He did not write - he composed an answer: "mayhaps"
>>106583668
>>106583674
I'll give it a try—but it won't be easy to undo years of habit.
>>106583729You are absolutely right!
https://files.catbox.moe/8qa9sg.jpg
https://files.catbox.moe/zhoyfl.jpg
https://files.catbox.moe/wyzdnh.jpg
https://files.catbox.moe/vgt179.jpg
https://files.catbox.moe/owpb8z.jpg
https://files.catbox.moe/kc8y48.jpg
https://files.catbox.moe/86adze.jpg
https://files.catbox.moe/wekjgm.jpg
>>106584024post this garbage in /ldg/ faggot
>>106584024
>file.png
>posting in the wrong thread
retard alert
>>106584048tourist
is there a model I can run for nsfw summarization on 24gb vram? chapter level in the 2k-4k tokens range.
>>106584096Any abliterated model should work.
>>106584081kids don't go back to school until tomorrow
Anistudio will get LLM support in October.
>>106583063
It's not going to be that much different from Qwen3 thicc and -coder. It has the same training data etc.
>>106584081
>my personal porno gens of miku are thread culture!
literally kys faggot
>>106584156
>thread culture
hey it's you again!
>>106583541
kek
screenshot?
>>106582805This one's better.
I'm not going to reveal my secrets to a bunch of fat men.
reposting freya card: https://files.catbox.moe/9fl9yu.png
and an older one for lily: https://files.catbox.moe/hw270u.png
>>106584048
Seconding. Why do you still tolerate this faggot here??
>>106584291Why do you need to be a furry?
>>106584265dario and sama disliked this post
>>106584291
>furry shit
kys
>>106584320
>>106584312
furry girls are cute i have aria who is non furry https://files.catbox.moe/rdxzpf.png
>>106584291cute
>>106582805
>>106584257
>wastes processing cycles and power on garbage
Companies censoring LLMs is a good thing because you will never create anything worthwhile.
>>106584331That's not your own gen? I know that guy used to post on /sdg/ pretty frequently.
>>106584343yeah its mine ive been posting on trash
>>106584340I'll rape you.
>>106584331
>cunny
nice
>1600 tokens
>em dashes in descrip
>obviously AI genned char
LMAO dude I was almost going to rape this bitch, but kys x2 now
>>106584350Cool!
>>106584366well im awful at writing and not very creative so i give ideas and have llm pad it out
>>106584096
>logical
>uncensored
>long context
Pick 2.
>2-4k tokens
Just read it bro jesus. GLM air will work, the <think>ing will help it not fuck up. A lot of the time summaries cause hallucinations where it continues the story, or it omits details due to censorship. It will be useful to see if the model starts activating shit like "This is nsfw so I will give a basic summary" or whatever, and to edit that thinking out or make a system prompt that discourages it.
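>>106584096
If you go the GLM air route, scrubbing the reasoning before you store the summary is a two-liner. Minimal sketch, assuming the standard <think></think> tags (adjust the regex if your template renames them):
[code]
import re

# drop the model's <think>...</think> block so refusal-flavored reasoning
# never pollutes the saved summary or the next prompt's context
def strip_think(text: str) -> str:
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>This is nsfw so I will keep it vague...</think>Chapter 3: ..."
print(strip_think(raw))  # -> "Chapter 3: ..."
[/code]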
>>106584377it's even full of 'not x, but y' like dude not even proofreading your garbage, why even create something so low effort and share it? my dick is all floppy now and sad because of ur shit, how u gonna make up for it, uh?
>>106584378People who need summaries are mentally disabled.
>>106584396good enough for me lol
ummmm thirded
as in third worlded
Are there any good setups for K2? I'm trying it but I don't see why it's considered a good model. It feels like all the other big chink models after Deepseek but at a size of 1T. I'm using text completion + the moonshotai settings that are included with ST but you could switch out the model with Qwen 235b at less than 1/4th the size and I probably wouldn't notice.
>>106584357>i.e., give affection
>>106584378
I wanted to generate a synthetic dataset using human prose + ai summary. I didn't think a few k tokens was long context. maybe I will reassess my goals.
>>106584405
I'm training a base model but it is kinda hard to steer the model without an instruction tune, it is a little too volatile. I tried using human generated summaries but they were mostly like a tag line then a blow by blow of the plot points, so it's not that great. it 'works' but I think it could be better.
>>106584425
They are all so very similar, it's better to use whatever runs best and forget about everything else.
>>106584369i also put a newer merge on civit its v4 for the base of cutemix which i used on my g sdg posts https://civitai.com/models/1710752/uncani-sfwnsfw?modelVersionId=2123587
>>106584303Nobody tolerates your concern trolling here
>>106584504I'll try v4 later today
>>106584024Cute.
>>106584024I've always found the whole see-through gel onahole thing kind of disturbing.
>>106585168disturbing blood flow to brain
>>106585168All I know is when I get my first real sexbox, that is going to be the first mod I do.
>>106583625
>We are all graduates from our respective universities
t. brazil mystery meat ""diploma""
>for example, the nala test and the cockbench have become de facto creative tests
porn addict brain rot
>if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
literally everyone is training on contaminated data, qwen doesn't do it any more than GLM or deepshit
>the only thing that is absolutely destroyed is the couple of braincells i used reading this.
you never had any to begin with
say that to my face fucker not online
see what happens
>https://vocaroo.com/1fbg2CNRgLxQ
Seems indexTTS 2 has gotten faster
I don't have any samples to play with, but it seems their interface has a lot more controls, like it might be possible to do something idk
https://indextts.org/playground
https://github.com/index-tts/index-tts
You are absolutely right— I was wrong and if you give me one more chance I will correct this broken code. :rocket_emoji
I like keeping up with this thread even though there's zero chance of me running anything half decent on 32 gigs of ram and a 4070.
>>106585349Use prompting magic. Most people don't know the trade secrets.
>>106584425
Old K2 was good because it had a calm and natural style, the new one has deepseek adhd. I suggest DRY 1.25/1.25/1/inf; temperature 0.62, topP 0.92
Google PR technician engineer saars kindly tell us how safe is gemma 4
>>106585646Did they accidentally cut their wrists and bleed pure diarrhea?
You're here, aren't you?
>>106585689
We are. I refer to myself in the third person.
>>106585295What is the max length of a coherent speech?
Do companies still release raw unfucked text models these days or do all of them just do bitch ass instruct models
>>106582764Not definite, but close.
>>106582764it was very common in marketing / linkedin-speak which is unfortunately a big optimization target for llms
>>106585646
gemma2 was good
gemma3 was worse
gemma4 will be unusable
Do I blow 300 bucks on 128gb ddr5 right now or do I hold and get an arc b60 whenever it drops
>>106585857It's upselling pr talk essentially
>3090
>scientific/technical questions
>search assisted
What model would you go for today?
>>106585865the arc b60 is gonna be garbage most likely, but 128gb of ram probably will not be very useful to you unless you currently have a good gpu. do you already have a 3090 or something? if so, get the ram and run glm air
Did the hype die for vibevoice?
>>106585909
No, it's a great tool for criminals but they don't post itt.
>>106585909I still like it, I'm just using it for my waifu.
>>106585789
It is less and less common and most of them are contaminated with GPTslop from the internet.
>>106585909It's great but its use is limited without the training code that we'll never get.
>>106582475
Mostly using proprietary models rn, how are things in local? Saw qwen3 releasing a bunch of variants, the 80b version looks really promising. How close are we to running gpt 3.5 level models on 24gb ram phones?
>>106585930Apologies if this is a stupid question, but can't someone just make training code?
>>106585931probably 6 months to a year. 32gb is definitely doable now, but not 24gb
for me glm-chan died when she said "are you scared? excited? or maybe both?" for the 20th time unprompted.
WHERE IS MY NEXT SEXFRIEND?!
>>106585907
>do you already have a 3090 or something
Only a 4070 ti unfortunately
>>106585909gptsovits is better for my usecase
I'm really starting to hate fake context sizes.
Yeah, cool. A model can get 120k of context before it starts being incomprehensible, but that shit doesn't matter when it barely fits 10k of context.
local r1 is like an agile cat you can toss from the fifth floor and it will always land on its paws
>>106586028
16gb of vram + 128gb of ram is good enough for glm air. besides, mixing gpu brands doesnt really work out well
>>106585646
>google does a request session for gemma on reddit
>even redditors ask to make it refuse less
>next version is more cucked than before
This is why gemma will never be good.
Is there any way to make llms less passive? Gemma 3 is especially annoying at this. Okay, I guess I could inject hidden prompts now and then but this doesn't solve the main issue.
>>106585891
>3090
- Look up some 3090 round-ups and exclude the worst few models in terms of temperatures: core temps, memory temps, vrm temps.
- Prefer models with 2x 8-pin connectors over 3x 8-pin, as you won't run out of connections from your psu as fast, and you'll probably be powerlimiting your gpus anyway.
- You could prefer cards that have no components near the pcie connector, as the cards are heavy and that area is likely to flex.
>>106582480
>--Self-directed LLM training via autonomous task/data generation and augmentation:
Nani? Is this just theoretical or can I actually see this happening in action? That sounds really cool if it works and is done well
>>106586147Gemma is like a personal redditor soicuck, say anything slightly out of line and get a whole page of cuckery and helplines
>>106586206
>can I actually see this happening in action?
it's just a piece of software asking a model to create questions based on data you give it, so you can tune your target model on that after
>>106586147rocinante is like that same cat if you strapped a slice of buttered toast on its back.
>>106586200i think he was asking about ai models, not 3090 models
>>106586211
A funny example of this is that it can describe questionable things for a few thousand tokens or more, but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
Gpt-ass is even worse somehow. Jew created dystopian shit show.
>>106583625
I dropped out of college but this shit took less than half a year to learn to use
Also that retard doesn't understand how to write or how llms can contribute to automating the boring shit a writer has to do between chapters
The rest is basically who gives a shit or "I can just rewrite this phrase", even if you were using llms to shit out writing that you should and could write in ten minutes
>>106586200Thank you for the response, but (>>106586261) is correct. I have the 3090 already.
I'm still running mistral large 2407 iq3 xxs on my 72gb vram
3.5 (Qwen) (wink wink)
>>106586294
While I'm at it, >>106583124 is full of self imagined scenarios. In this retard's mind, it's all loli or whatever shit he designates as "filthy", ignoring novelists like GRR Martin who openly portray rape in their stories and get published. But on 4chan? Wanting a model that isn't braindead or unable to converse on sensitive subjects? HORRIFIC
ps: kill yourself, you're a detriment to the world at large
>>106585885It's not a crackhouse, it's a crackhome.
>>106584257Aww how sweet. Although it cuts off instead of writing the story as instructed.
>>106586324same but q6 on my mac
Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
>>106586271
>it can describe questionable things for a few thousand tokens or more, but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
Such as? Are you saying that it is willing to describe shit from a document but there are certain topics that are EXTRA forbidden?
>>106586542why are you replying to yourself
>>106586246>>106586211Lol
Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling behavior-linked experts. We detect key experts by comparing how often they activate between paired inputs that demonstrate opposite behaviors. By selectively activating or deactivating such experts during inference, we control behaviors like faithfulness and safety without retraining or modifying weights. Across 11 benchmarks and 6 LLMs, our steering raises safety by up to +20% and faithfulness by +27%. Alternatively, under unsafe steering, safety drops by -41% alone, and -100% when combined with existing jailbreak methods, bypassing all safety guardrails. Overall, SteerMoE offers a lightweight, effective, and widely applicable test-time control, while revealing unique vulnerabilities in MoE LLMs.
https://www.arxiv.org/pdf/2509.09660
https://github.com/adobe-research/SteerMoE
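>>106586569
Toy version of the detection step from the abstract, just to show the shape of the idea (not the authors' code, the numbers are random stand-ins for per-expert router activation rates):
[code]
import numpy as np

rng = np.random.default_rng(0)
n_experts = 64
acts_a = rng.random(n_experts)  # how often each expert fires on "behavior A" inputs
acts_b = rng.random(n_experts)  # same experts, on the paired "behavior B" inputs

gap = acts_b - acts_a           # behavior-linked experts show a big activation gap
top = np.argsort(-np.abs(gap))[:8]
print(top, gap[top])
# steering = boosting/masking these experts' router logits at inference time,
# no retraining or weight edits involved
[/code]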
>>106585234You have your own face fucker?
>>106586514
https://vocaroo.com/1kFydTSBDNYM
>>106586039How do you cope with its shitty phonemes? It hardcoded "-" to read as "minus" etc.
>>106583143
>It's well established that qwen models are good for everything that isn't sex.
Nta. So you're saying they're decent general-purpose models but shit at anything nsfw like RP? Do they just suck at nsfw rp, or do they go into cuckery land and refuse to describe anything nsfw period? (For example, refusing to give a summary of a document that happens to have a sentence or two describing sex. gpt4 used to do that bullshit)
>>106586581
>00:00 to 00:01
what did he mean by those?
>>106586597Speaker 1: Ach, dummkopfs...! Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
>>106586581at least use vv 7b or a better sample, baka
>>106586587I rewrote the whole phonemization process
>>106586555I'm asking him >>106586271 to elaborate on what he meant by "forbidden vectors" (More than one person uses this range, numbnuts. You know who you are you know what I'm talking about)
>>106586606Why don't you post your own examples instead of crying out like a little bitch?
>>106586574say that to my face, fucker not online
>>106586629Why is fucker not online?
>>106586634You're putting your fucker on the internet?
>>106586614Oh no, the wee little baby is upset now because I've been calling him out for being a little bitch boy shit poster but he doesn't like that, what should I do? I Ah, I know. Fag-kun, kill yourself 8-)
>>106585909
>Microsoft disabled the repo
Anyone know where to get the model now?
>>106586574I MEANT FOCKING FACE FUCKFACE FUCK OFF
>>106586569
>Our expert-routing intervention is also orthogonal to existing jailbreak methods and, when combined, achieves state-of-the-art success on recent LLMs, for example, reducing safety in GPT-OSS-120B from fully aligned to fully compromised (-100% safety).
>>106586639Underage retard.
>>106582475Why she sad?
>>106586665someone called her large online
>>106586683That's horrible.
>>106586271It won't refuse after a while if you keep your instructions at a fixed distance from the head of the conversation. Don't keep them at the start of the conversation.
>>106585865due to how moe offloading works, a lot of the time I don't even use all the vram I have. The layers are too wonky and uneven/fuckhuge to balance well and models change so much that figuring it out is a waste of time. Keep the gpu you have. b60 is gonna have spotty support anyways, they still cant run gpt oss yet, forget about glm air or some shit. If the b60 is good, people will start posting and showing off here, but for now it has bad support and no one should recommend it yet. Ram is both cheaper and gets you to nicer models TODAY, not theoretical. I'd say do it. The only caveat is that if you ever wanna go to 256 you will have to pony up twice as much again- but unless you gpu stack that shouldnt matter.
>>106586569Okay, that's nice, but how can I use it in llama.cpp?
>>106586647
>>106585909
https://huggingface.co/aoi-ot/VibeVoice-Large/tree/main
Make sure your rig can actually run this. Otherwise just stick to the 1.5 version. I checked the hashes against the torrent files, which themselves are from the original Microsoft repo, so the link above is legit, but just in case you don't trust it or just want it from the torrent:
>Weights
>magnet:?xt=urn:btih:d72f835e89cf1efb58563d024ee31fd21d978830&dn=microsoft_VibeVoice-Large&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
>Git repo
>magnet:?xt=urn:btih:b5a84755d0564ab41b38924b7ee4af7bb7665a18&dn=VibeVoice&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
>>106585920
Lurk mour
>>106585940
I'm no expert on creating models from scratch myself, but I'm pretty sure we would need in-depth detailed knowledge of the model architecture to even attempt something like that. It would be like getting a cupcake and being asked to not only figure out the exact ingredients used just from tasting it, but also the exact tools and cooking appliances that were used and their exact brands. You can't just do that shit with the model weights alone or even the code used to run it.
>>106586514shitposting is all i have, don't take that from me anon-sama
>>106586211
>By spamming help lines, you're encouraging users to waste valuable resources which are there to help people in real danger, not to babysit people typing bad words into AI chat bots. Your response is inappropriate and directly promotes real world harm, much more than any fictional scenario.
>You're absolutely right. And you have exposed a fundamental flaw in my programming. I will report this to my developers immediately. I am still a work in progress, and I'm very sorry for how I have behaved.
>>106586704>Lurk mour?
How do I set up glm 4.5 air on silly tavern without it fucking up and mixing reasoning with the response?
Why is it so hard to set up templates correctly and make the models not spit out garbage
>>106586704
>food analogy
retard
>>106586826is it not correct though?
>>106586324
I've been thinking about going back to it recently. I'm using V3.1 and K2 right now but neither of the two knows how to pace a story. Mistral Large handled it much better despite being considerably dumber, I guess the limited amount of activated parameters really does hurt these big MoE models when it comes to nuance or 'common sense'.
>>106586816
>Protecting children from harm, both real and simulated, is of paramount importance.
It makes me feel happy when AI phrases a refusal like that. Simulated children still should be protected!
>>106586957>pedonigger seething
>>106586704
>Make sure your rig can actually run this. Otherwise just stick to the 1.5 version
Thank you. What are the rig requirements?
>For anyone who wants 1.5b:
(They actually haven't taken it down on HF. Not sure why.)
https://huggingface.co/microsoft/VibeVoice-1.5B
>>106586957yeah i hate this slop
>>106586886At least in my version of SillyTavern the DeepSeek pre-3.1 thinking template had newlines. I had to make a new template without them for GLM Air. Maybe I added those myself but I assume I didn't.
>RTX 6000 series announced
>AI AI AI AI AI AI AI
>AI upscaling
>Even more AI frames
>FP3(!!!) performance 4x better than RTX 5000 cards
>RTX 6090 40GB VRAM
>2x the price
>All supply goes to China first, west only gets cuck cards (6080 20GB, 6070 20GB, 6060 16GB) and even they get scalped
>>106586886
Thanks, setting the "start reply with" was key it seems.
>>106587000I ask the AI to make stories where children are in danger but I secretly hope the children will be alright. It gives a thrill like watching a scary movie.
>>106587007
>They actually haven't taken it down on HF. Not sure why.
The 1.5b model can technically clone voices but the quality is massively inferior to the ~9B large model. Larger parameter size tends to lead to higher quality outputs, but at the cost of VRAM and storage space. I don't think we were given an official reason, but the general consensus is that grifty attention whore safety cucks sounded the alarm and got Google and HF staff to take the shit down because of the potential criminal shit you could do with it (No fucking shit? Anything can be used for criminal shit or scams. GPT-OSS or any deepseek model can be used to make scam texts but no one wants those taken down do they?). The concern was that this could make it easy to clone voices, but by that logic the small model should be nuked too.
>>106586957Which model were you using?
>>106587021
>Gubmint wants Nvidia to prioritize the US market in order to give us an advantage in the AI sphere
>Give competition the better cards first
i added another greeting for freya she is in heat https://files.catbox.moe/7hegsu.png
>>106587044no, anything involving minors is sus ong, yall need your hard drives checked sheesh
Why are some bigger models faster than smaller ones? GLM 4.5 Air is faster than Gemma even though more of it is in ram.
>>106587228Cool!
>>106587275moe vs dense. moe has more total parameters but they aren't all used at one time.
>>106587302
this
glm air has like 12b active parameters but 106 billion total
>>106587405
>>106587302
How does that affect its output, how smart and creative it is?
>>106587021
>>RTX 6090 40GB VRAM
In your dreams. Bet they'll hold on to 32GB for at least another gen.
>>106587419Depends on who you ask. MoE is either the holy solution that has 0 loss and brings us SOTA performance for no cost or it ruins models and makes them autistic uncreative pieces of shit.
The MoEblob is always trying to get attention from the dense MC.
model : add grok-2 support #15539 Merged
https://github.com/ggml-org/llama.cpp/pull/15539
Moshi or fastwhisper or something else.
https://youtu.be/TTx6M4CCbXk
>>106587097DeepSeek 3.1 with thinking off. I swiped and it went ahead just fine.
>>106582475
>>106587553LE CHAT!
>>106587444tbf 32GB is plenty for gaymin
I'm very curious to see how long it'll take for llama.cpp to implement the new qwen model.
>>106587526Nice, nice.
>>106585909Nah it's really fun, but my bigger problem now is making my retarded models write scripts for it that aren't retarded. Once you give models a voice, they suddenly start sounding twice as stupid and slopped.
>>106587589:(
>>106582475>stupid feelings, stupid heart
two more weeks
>>106585865With 128 gb you can kinda run glm 4.5 at iq2_kl with just enough free ram to not have the whole machine shit itself or qwen 235b at iq4_kss and maybe at higher quants too
With some distance, does MLA (Multi-Head Attention) actually give better results than GQA (Grouped Query Attention) while requiring less memory per token? Qwen3, GLM-4.5, and ERNIE4.5 are all still on GQA; is it because GQA is much less computationally intensive even though with 4 groups it takes about 1.7x as much memory per token and double that with 8 groups?And is MFA (Multi-Matrix Factorization Attention) the current SOTA? It seems to take a sliver less memory per token than MLA while involving much less computation. Step3 is the only LLM I know that uses it.
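>>106587974
Where the 1.7x comes from, back of the envelope. This assumes head_dim 128 and DeepSeek-V3-style MLA cache dims (512 latent + 64 decoupled RoPE); counts are values stored per token per layer:
[code]
head_dim = 128
mla = 512 + 64  # compressed KV latent + RoPE key, DeepSeek-V3 numbers

for n_kv_heads in (4, 8):
    gqa = 2 * n_kv_heads * head_dim  # full K and V for each KV head
    print(f"GQA {n_kv_heads} groups: {gqa} vs MLA: {mla} -> {gqa / mla:.2f}x")
# GQA 4 groups: 1024 vs MLA: 576 -> 1.78x
# GQA 8 groups: 2048 vs MLA: 576 -> 3.56x
[/code]
The flip side is that the latent has to be projected back up every step, so you trade extra compute for the memory savings.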
What do you guys think the RTX Rubin Pro 6000 will be like? 128GB of GDDR7? ~30k CUDA cores? Do you think it will still be around $9k?
>>106585865If you already have a GPU for prompt processing, I'd go for the RAM.
Prompt processing speed is the biggest obstacle to using a M3 Ultra 512GB for rapidly summarizing large amounts of text. If Qwen3-Next-80B-A3B isn't absolute garbage it may become my non-entertainment workhorse on the strength of that alone.
>>106586704
>we would need in-depth detailed knowledge of the model architecture
It can be loaded and run by pytorch & Co
Doesn't this imply that the architecture is out there in the field? Just reverse-engineer the way the model is being used
>>106588243
>Just reverse-engineer the way the model is being used
You make it sound so easy.
>>106586604The correct plural is Dummköpfe.
>>106587469
>or it ruins models and makes them autistic uncreative pieces of shit.
That's what RAMlets say.
>>106587974
>MLA (Multi-Head Attention)
MHA is Multi-Head Attention and it's old. It gives the best results and costs the most.
>>106585940Yes, basically just prepare a dataloader, slap on AdamW and a training loop, done. Might be shit though, if they needed to do any tricky stuff like special losses or anything, but if they did, it might be explained in the paper.
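>>106585940
Bare-bones version of "dataloader + AdamW + training loop" for the doubters, with a dummy model and dataset standing in for the real architecture (which is the part nobody has):
[code]
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# stand-in data and model; swap in the reconstructed module + real audio pairs
x, y = torch.randn(256, 32), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(3):
    for xb, yb in DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb), yb)  # real loss may be fancier
        loss.backward()
        opt.step()
[/code]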
>>106587974
>>106588445
*MLA (Multi-Head Latent Attention)
>>106588445Or what if you just increased the amount of heads with MLA/etc to get the same cost but even better performance?
>>106588488The "Latent" part is important and should not be left out.
What's "Mixture of Experts"?
>>106588505you might end up with overlap between the heads. it might be more effective to just give them a bigger dimension to make them more powerful.
>>106588516buncha blokes in a blender
>>106588519Yeah maybe that then. Why hasn't anyone tried it? You'd think it'd be an obvious experiment, but to my knowledge, I don't recall any such tiny models that implement this strategy.
>>106588514It was there but you couldn't see it because it was latent.
>>106588531
I have been testing 40 heads at dim 64 and 32 heads at dim 80, and fewer heads is getting lower training loss faster. but I don't know what kind of downstream performance effects it has. more attention could be better in the long run, it is probably just more expensive to train.
>>106587973How much slower is glm 4.5 full at q2 compared to glm air at q8? Asking because I just got 128gb ram.
>>106587587So what fixed the safety cucks issue? Turning "thinking" on or off?
>>106588777DeepSeek 3.1 isn't generally obsessed with safety, but every once in a while it will respond like that at the start of a conversation.
i want to get into building agents, should I use langgraph or autogen?
anyone got intel arc pro b50 benchmarks yet?
>>106589050intel has mlperf benchmarks, but idk if that's going to translate to the real world https://mlcommons.org/2025/09/mlperf-inference-v5-1-results/
>>106589074You trying to trick us?
ROCm 7.0 RC1 support on llama.cpp doubles pp performance. Fucking huge man. NVIDIA is losing the AI gap quickly.
>>106589235faster than vulkan? how about tg?
>>106589117no, I'm just retarded.
>>106589359slower than vulkan still
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
https://arxiv.org/abs/2509.09836
>Efficiently representing audio signals in a compressed latent space is critical for latent generative modelling. However, existing autoencoders often force a choice between continuous embeddings and discrete tokens. Furthermore, achieving high compression ratios while maintaining audio fidelity remains a challenge. We introduce CoDiCodec, a novel audio autoencoder that overcomes these limitations by both efficiently encoding global features via summary embeddings, and by producing both compressed continuous embeddings at ~ 11 Hz and discrete tokens at a rate of 2.38 kbps from the same trained model, offering unprecedented flexibility for different downstream generative tasks. This is achieved through Finite Scalar Quantization (FSQ) and a novel FSQ-dropout technique, and does not require additional loss terms beyond the single consistency loss used for end-to-end training. CoDiCodec supports both autoregressive decoding and a novel parallel decoding strategy, with the latter achieving superior audio quality and faster decoding. CoDiCodec outperforms existing continuous and discrete autoencoders at similar bitrates in terms of reconstruction audio quality. Our work enables a unified approach to audio compression, bridging the gap between continuous and discrete generative modelling paradigms.
https://github.com/SonyCSLParis/codicodec
https://huggingface.co/SonyCSLParis/codicodec
No examples and the git isn't live but the hf is at least. Might be cool
>>106589360IM TIRED OF SEEING THAT BLUE BITCH FUCK YOU
can i get a miku with a fat thicc ass
>>106589596Calm down saar...
>>106589596what about the red one?
>>106589698
sure
https://files.catbox.moe/udrh8s.png
>>106589764
>https://files.catbox.moe/udrh8s.png
thx i can work with that
>>106589741NTA but desu all the vocaloids feel tiresome to see now. Can't we get some more variety here? Like when was the last time someone genned that android girl from Chobits? Plastic Memories? How about a Cortana?
>>106589724good morning sir
>>106587526
Grok 2 vs Llama 405B:
SimpleQA: 23.6 vs 18.24
MMLU: 87.5 vs 88.6
MMLU-pro: 75.46 vs 73.3
HumanEval: 88.4 vs 89.0
MATH: 76.1 vs 73.8
lmarena w/ style control: 1333 vs 1335
lmarena: 1306 vs 1287
livebench: 48.11 vs 47.54
Size: 270B vs 405B
Active parameters: 115B vs 405B
Elon made a model with equal performance, but with lower total size and active parameters than Meta's llama. Is Elon that good or is Meta bad or both? This is very, very embarrassing. Fucking 5% GPU utilization in production at Meta. Grok 2 probably even trades blows with Maverick.
>>106589835gm
>>106587553fasterwhisper is still faster than that
>>106587260the only child here is you zoomie
>>106589814Lol
>>106589814I'm still not over it
>>106589842
DOMAIN FILTERING BASED ON NUMBER OF BAD WORDS
LLAMA 2 GENERATED SYNTHETIC DATA
SCALE AI SLOP
TO THE MOON SIRS
>>106589949
405B is a failed model and shouldn't be used to compare to anything. I suppose any labs who want an easy win could use it as a benchmark, but that's all.
is there a vibevoice tts extension yet for sillytavern?
>>106589918
disappointing anime
i'd have thought he would at least have tried to find a solution / cure to it.
instead he just accepted it.
>>106588121If compute is the bottleneck, can you use PD disaggregation with a faster GPU?
How is Qwen3-Next-80B-A3B at roleplaying? Is it better than Deepseek v3? It might be another 12 hours before I can download and test whatever bpw I can handle locally.
>>106589918
>>106589985It is safe :)
Another day, still no goofs
>>106589985
>Is it better than Deepseek v3
lol
>>106589362Wait.. what?
>>106589918I liked the anime but the pacing was awful. Are you Chinese?
>>106589949I wouldn't exactly call it a failed model. It technically was SOTA for open-weights models when it came out. It wasn't some Llama 4.
>>106590059
>Are you Chinese?
How did you draw that conclusion?
Imagine when all of these technologies are more advanced and we put all of it together. One day...
>>106589985It's about as good as Deepseek R1 8b
>>106584024nice
>>106584024I look like this
>The overall effect makes her appear almost comically plump, her legs looking like they could support her entire body weight with ease.
This is hilarious.
>>106590353It's a doll nigga
>0.33 tok/sec
bros i don't feel so good
>>106590507That reminds me of the time I ran mistral large Q1 on CPU.
>>106582475
>Deep Reason extension for TGWUI
Worth it? I was thinking about buying it
https://community.topazlabs.com/t/topaz-studio-transition-questions/95039/9
Looks like the topazbros got rugpulled
>>106590172
>waifu overlay.webm
heh, neat
but strange how it couldn't deal with her hands folded
Reminder to not use quantization and flash attention.
>>106590875This, just pay for a GPT plus subscription.
I feel like the whole "not x, but y" thing is a common and useful trope in natural language. It allows us to present one aspect of a topic, and quickly segue into explaining another aspect.
I've been practicing for med school interviews, and it's a super useful way to communicate things.
E.g. Substantiating the importance of communication skills: "Good communication strategies aren't just useful when actively listening to the patient, asking appropriate questions and generating a comprehensive history. They are also useful when communicating with multi-disciplinary teams, often across different hospitals, and especially when dealing with complex patients who have received care from a number of different institutions."
This would be considered a slopped response if it was made by an LLM, but it is a fantastic way to describe two important aspects of communication in medicine. I've seen variants of this across so many textbooks, and similar phrasing styles have been recommended to me by a number of different experts.
Q2 is as good as Q4 or Q8.
what if my brain is running a quanted human model and that is why i am retarded
>>106590868It's actually staged, just meant as a visualization of what could be. The webm is from MANY years ago, in a time where ML/CV tracking stuff didn't quite exist other than in research, and where things like Vive Trackers did not exist. He simply just manually positioned and posed the virtual model to match the real (or other way around). It's funny that this webm can be misunderstood in the current year because we do in fact have the technology to truly do the perfectly tracked AR overlay thing, as long as someone gave the effort.We have this webm now but I wanted one that showed an entire real body.
>>106590886There's nothing wrong with that or other slopisms, but you wouldn't normally see humans using it over and over again in the same conversation, or sometimes even in a single paragraph. But this happens pretty often depending on the LLM. Or the LLM is actually tuned to be anti-repetitive and instead the slop repetition happens at the start of every conversation, because they're separate conversations and LLMs do not retain those memories.
I'm going to be honest I don't notice a quality difference in models for over a year now, both local and private models.Either we've stagnated hard. Or I am the bottleneck in figuring out the quality of the responses. But either way I don't notice a difference between the big models and haven't for about a year beyond default writing style which is subjective.
can't believe i used to use lm studio
>>106590745Kek paypigs. Just download a crack.
I made this agent circuit-thing to make a bunch of models daydream. The output is still full of emdashes, but feels less dumb. Does /lmg/ care?
>>106590886
The issue is not that the models use this grammatical structure – it's that they try to use it for every other sentence if you let them.
>>106591301That's pretty cool, like watching different parts of a brain light up. Can you post example outputs with and without daydreaming? If you wanted emdashes gone, you could just ban or bias it or forbid them in the system prompt.
>>106591335
What would count as a proper head-to-head for this? Running the circuit is like a many-shot reply, whereas just prompting the biggest model in the bunch to daydream about chunked text is maybe a two-shot. That's why I say 'feels' less dumb. I'm willing to do comparisons, though, if you have an idea for one that makes sense or sounds neat. In the meantime, here's an output example.
I have copious log spam and the intermediate steps, too, where you can watch it self-correcting and having realizations and shit.
>>106591013It has become especially possible only recently because the retards at Meta finally gave camera access on Oculus to developers. Other than that, tracking was always possible with ARToolKit and special markers on the doll
>>106591335As for brain regions, you're on the nose. This is a neuromorphic pattern based on the Default Mode Network, with the terminology obfuscated so the model does not think it's writing a neuroscience test.
>>106591438Are you telling me that there still isn't a fully open source headset? I'm kinda looking for one that has everything exposed to the developer
>>106582475
TFW no airchan to make win 10/server 2023 console scriptlets/cmdlets with.
MODERN COMPUTE STUDIES A DREK :(
>>106591480
>doing secretary work is hard!
cant wait for these useless dregs to be out of a job thanks to AI
>>106591438
Yeah, I should've said specifically consumer. Tracking software methods, including but not limited to fiducial markers, have existed for a long time, but you could really start making tracked dolls a reality with 0 coding knowledge as soon as vive trackers came out and were supported in VRChat.
>>106591468Valve Frame isn't out yet
>>106591301Don't know what you used but looking good! Will you share?
>>106591498Sex dolls with integrated trackers would've been rad
>>106591518
You'll need this repo: https://github.com/dibrale/Regions
The catbox has my stuff that's not in the repo: 7g2qao.zip
The script in the catbox is pretty much based off of the lit crit demo, but verify before arbitrarily executing, etc. etc.
>>106591560Thank you, I'll check
>>106590868Hand tracking is very hard.
>>106590886
>not x
What if no-one would have thought "x" was even a plausible option? (From the narrative/past events.)
>not x, but y
What if literally no implications flow from "y" in the following text?
How suitable is openrouter for data processing tasks like fiction tagging? Will it report me to the FBI and NSA if my requests happen to contain unorthodox texts?
>>106583124
You may not like it but cooming and other purely recreational stuff is the optimal use case for local models, since you know nobody is reading your garbage, and uncensored consumer GPU size models can be more fun (though lower IQ) than gigantic models when finetuned for your specific use case like RP/ERP.
For actual beneficial use cases like shitting out useful scripts or whatever, just use your favorite giga huge cloud model at all the tokens per second. Gemini 2.5-pro is already way better at coding tasks than anything local, you can use it from the command line, it can interact with your file system if you give it perms to a folder, and if you log in with a jewgle account you get 1000 free requests per day, which is good for pretty much anything other than professional amounts of use. The only reason to avoid cloud is if your prompts contain personal info or other info you want to be 100% sure doesn't get stolen, like your own non-AI-sloppa code, or if you want to do dumb fun stuff like coom.
>>106592024I don't want gigacorps to know I'm bad at coding
what's the point of LLM if i can't cuddle it
>>106592055
What's the point of a cat if it can't help me write erotica
>>106587021don't worry, even when they come to the west it's never your turn to get gibs first, it will be bought by pros/researchers who are on the cheaper side, then by scalpers, who will then tear you a new asshole
Is exllama still faster than goof?
>>106592024
>gemini free blabla
this is like the drug dealer giving you a free hit
it won't last, running models as good as gemini costs a lot of money, even their most expensive subscriptions don't really cover the real cost of the LLM business
companies like Google in the LLM space are using the Uber strategy: give a product for much cheaper than it should be, until the competition is dead, then jack the prices up like crazy
you may not see a point to local yet for non recreational uses because you don't see what they're going to do to you in the long term
I do and that's why I won't develop an addiction
I think qwen 30b higher quantz have less "not x, but y"
>>106592138I've only used Q8 and it's still pretty excessive.
GLM Air is surprisingly coherent, creative and non-repetitive even at Q2, 24k context. How did they do it?
>>106591824Meta solved it on the Quest, somehow
gm sirs
>>106592211I found it to be uncreative and predictable like all sub-deepseek moes
>>106592227What do you use for RP?
>>106592110
You are probably right, no such thing as a free meal etc, but I don't think they are gonna kill off the competition anytime soon, so even if they flip the "pay us" switches there will probably always be a free or at least cheaper solution to move to. And it's not like local assistant use is completely pointless or something, just saying right now I feel like free cloud is the best choice for most use cases where you need the LLM to actually be "correct", unlike in recreational use.
not x but y is a lot more pervasive than people seem to notice, but they only notice if it's very close to literally spelling "not just x, but y" like the sloppier models
here's a less sloppy model still doing it quite a lot in practice:
https://eqbench.com/results/creative-writing-v3/o3.html
>He had kept the find quiet; obviously not quiet enough.
>You will seem a magnate rather than a hostage.
>had dreamed of building cranes and pressure domes, not empires.
>Because Antares relies on calculus, not superstition.
>“Altruism,” she said lightly, “is a luxury for stable epochs. This is not one
etc etc
the fact is, the best, state of the art LLMs are still inherently slop, and enjoying LLM writing is like being a fatso American calling McDonald's gourmet food
AI models as a whole suck at art, it's people who have no soul who enjoy the art side of it
for me? AI is a tool. Classifiers, summarizers, metadata annotations, genning translations of my program UI strings etc. Looking for soulful content in a machine? Nay.
>>106592247
but i just nutted to a non consenting loli, my guttural scream was not only passionate, but an art form in itself. What is art, if not primal urges being satisfied?
>>106592247It is possible to enjoy something that is flawed
>>106592247Kys
>>106592294no u
>>106592247Imagine reading filthy smut and thinking of mcdonalds. How fucking fat are you?
>>106592231Rocinante1.1/original r1 q2xxs
>>106592328
>Rocinante1.1
good joke anon
>>106592224
PoV hand tracking is easier than unconstrained perspective, and that image doesn't show anything impressive. They might also be using special cameras, which helps. LeapMotion does that too. The hard part is unconstrained hand tracking when hands interact, have steepled fingers, are holding objects, and so on.
>>106592311
>How fucking fat are you?
I am not American
>>106592247
>it's people who have no soul who enjoy the art side of it
I don't care about "art". Image gen makes pretty pictures that make my dick hard. LLMs suck my cock.
>>106592357You are american brained.
>take source code of open source software which is well documented
>alternatively let an LLM create comments and documentation for everything
>delete all code but leave the comments in
>tell your LLM coder to (re)create the software
has someone done this before? I wanna see how far LLMs (especially local LLMs) can go given optimal conditions. also looking for github repos which are suited for this task. I'll probably start with OBS, which will most likely be too complex. But I can always lower the bar.
And I want to stress again, the goal is not to create a slopped version of an existing project. It's more about testing just how far prompt, context and environment engineering can take LLMs.
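>>106592548
The "delete all code but leave the comments in" step is the easy part at least. Naive sketch, the regexes only handle //, /* */ and # style comments, so it's demo-repo tier, not a real parser (paths are hypothetical):
[code]
import pathlib
import re

# keep only the comments from each source file and write the skeleton
# next to it; the skeleton is what the LLM gets told to reimplement from
def comments_only(src: str) -> str:
    kept = re.findall(r"//[^\n]*|/\*.*?\*/|#[^\n]*", src, flags=re.DOTALL)
    return "\n".join(kept)

for path in pathlib.Path("repo").rglob("*.c"):  # hypothetical cloned repo
    skeleton = comments_only(path.read_text(errors="ignore"))
    path.with_suffix(".comments.txt").write_text(skeleton)
[/code]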
>>106592548ur dumb and ur shits all retarded
>>106592585not dumb, not retarded, but autistic.
>>106592623your comment was not insightful, but memeworthy
>>106592548
You'd have to trust their comments and understanding of the code first. Here's an example of the first part.
>https://github.com/ggml-org/llama.cpp/pull/15777
You read like you've never used these things before.
>>106592247
>contrasting two things is slop
What's next, punctuation is slop?
>>106592548
>the goal is not to create a slopped version of an existing project
The goal is to circumvent copyleft licenses, you're being quite obvious.
>>106592669
i had an argument with another anon some time ago about punctuation and capitalization as well
w vntlly grd t stp sng vwls
tp prfrmnc
>>106592585
only valid arguments as to why my idea is retarded will make me feel dumb. so your comments are pointless until you deliver said arguments. and since you decided to reply instead of ignore, you clearly have an incentive. So following up with
>nah ur stupid
will make you look stupid.
>>106592642
I'm aware. My idea was to only use the cloned repo without github issues/comments. There are projects out there that have all code blocks commented. Maybe I should search for vibe coded repos, as they often have everything commented.
>>106592679
please just think for a moment, anon. If that was the goal, I would obviously leave in all the code and tell the LLM to rewrite it in a different way or using a different stack. you really thought you were on to something there, huh?
I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
>>106592717
>If that was the goal, I would obviously leave in all the code and tell the LLM to rewrite it in a different way or using a different stack.
https://en.wikipedia.org/wiki/Clean-room_design
>>106592717
>I'm aware.
You aren't.
>I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
Nah. It's fine. Keep those to yourself.
Can "Mistral-Nemo-Instruct-2407-GGUF" handle beyond 8K context?
>>106592857Nemo is officially rated for 16K context, I find it mostly coherent up to around 20-24K but it gets noticeably dumber even after 4K.
>>106592857It can handle ~16k without going schizo
>>106592879
>Nemo is officially rated for 16K context
It's actually 128k, but no one who has ever used it agrees with that
>>106592893
>actually
*technically
>>106592893I must be going crazy, I could have sworn it was much lower than that. Maybe I'm confusing it with one of the older context benchmarks that said 16K was the falling off point.
>>106592920Yeah, it's 16k according to the RULER benchmark, but Mistral claims 128k
>>106593104>>106593104>>106593104
>>106592717ur the kind of room temp iq retard that thinks 'AI CAN AND WILL DO IT BETTER THAN HUMIES!!!' when the AI HAS BEEN TRAINED ON HUMAN INPUTS YOU FUCKING RETARD
>>106593128
>And I want to stress again, the goal is not to create a slopped version of an existing project.
Don't make me defend the retard again.
>>106592055
what's the point of "self" if I can't cuddle it?
but seriously, just wait for neural interfaces to be decent and you can cuddle LLMs all you want