/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>106345562 & >>106338913►News>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106345562--Attempting to bridge LLM knowledge and narrative application in ERP via synthetic SFT/DPO pipelines:>106349030 >106349134 >106349193 >106349310 >106349192 >106349376 >106349402 >106349599 >106349693 >106349965 >106350085 >106350105 >106350293 >106350317 >106350414 >106350515 >106350525 >106350594 >106350737 >106350781 >106349823 >106349207 >106349215--llama.cpp benchmarking and optimization struggles on consumer GPU hardware:>106349737 >106349744 >106349748 >106349757 >106349775 >106349820 >106349829 >106349852 >106349955 >106349867 >106349888 >106349904 >106349914 >106349918 >106349927 >106349935 >106349952 >106349985 >106350028--MOE model inefficiencies in CPU+GPU setups due to expert loading and caching limitations:>106350004 >106350038 >106350062 >106350071 >106350076 >106350088 >106350347 >106350362--GLM 4.5 preferred over Air for roleplaying under unified memory constraints:>106351137 >106351176 >106351193 >106351208 >106351284 >106351298 >106351191 >106351235--Jamba model praised for style mimicry, long context, and low safety:>106351319--AI sycophancy trend and user preferences for personality-neutral models:>106348495 >106348515 >106348517 >106348540 >106348555 >106348571 >106348649 >106348588 >106348958--Skepticism over Qwen Coder benchmark claims and confusion between FP8 and q8 quantization:>106347338 >106347366 >106347468 >106347552 >106347631 >106347658 >106347697 >106347712 >106347730 >106347895--Perceived double standards in Hugging Face NSFW dataset moderation:>106349991 >106350042 >106350051 >106350079 >106350383--Risks of expert pruning on multilingual models losing non-English capabilities:>106346769--Intel AI Playground v2.6.0 released with advanced Gen AI features:>106346057--Miku (free space):>106345719 >106345805 >106347682►Recent Highlight Posts from the Previous Thread: >>106345569Why?: 9 reply limit >>102478518Fix: https://rentry.org/lmg-recap-script
LIPS: BITTENOZONE: SMELLEDKNUCKLES: WHITENEDSPINE: SHIVEREDSKIRT: UPRIDDENCHEEKS: HOLLOWEDPOP: AUDIBLELENGTH: STROKEDWALLS: CLENCHINGSLIT: SLICKEDEYELIDS: BATTEDAIR: THICKEYES: SPARKLING
>>106351514>>106351535MIKU: SEXED
>>106351535the gaze? predatoryand the smile? doesn't quite reach the eyes
Same OP image? How sad.
So can you get a rack like that with HRT or not?
>>106351519Regarding nai I was talking about more fucking up context.
>X? Y.
>>106351535MIKU: TROON
>>106351556Yeah. I often try to motorboat a bowl of rice.
>>106351535WORD: THAT'S AN ORDERNO TURN AROUND: NO NEED TOAYY BABE: HOW BOLD
>>106351568Isn't life just a bunch of X? Y.'s?
>>106351560I am a trap enjoyer and they always ruin themselves and get fat after hormones.dfc > tits
>>106351568>>106351581Not X, just Y.
>>106351572It makes a mess but nothing turns me on more than motorboating a bowl of sexy buttered rice.
>>106351583>trap enjoyerIs that a codeword for gay?
>>106351568She X's. Y. Synonym for Y.
>>106351501>it's cute that they use this as a selling point when it's an API model and the size is 1) unknown and 2) doesn't matter to anyoneNo, in their case it's a valid selling point, since they allow their customers to host the model themselves
>>106351597https://www.youtube.com/watch?v=uyNJZ-2Dod4
>>106351605you're right, I forgot about thatpaging miqudev
>>106351597Sodomizing a cute trap wearing a skirt and thighhighs is the straightest thing a straight man can do.
Best model for stranglefucking my favorite childhood Saturday morning cartoon characters? (Single 3090)
>>106351623Xwin-MLewd 13b
>>106351619Are you from the school of thought that straight sex is actually gay because women are cute and sweet and pink and cuddly and honestly that is so fucking gay?
>>106351623Air if you have enough ram, otherwise nemo. https://rentry.org/recommended-models
>>106351633>XwinOh what happened to those guys?
>>106351595Buttered buns... Gemma!
>>106351635I am from the school of thought that being turned on by a feminine body isn't gay regardless of the presence of cock. Both traps and futa are straight.
For me the idea of anal, which my brain naturally associates with shit, turns me off so I don't like trap stuff.>inb4 how can you say you love your waifu if you can't even eat her shit
>>106351646They're still working on 70b v2
>>106351514>choose optimal code paths at default settings>set a frequency limit via nvidia-smi --lock-gpu-clocks>other code path is now 25% faster>literally impossible to retrieve frequency limitThanks, NVIDIA.
>>106351737whoa??? is nvidia-smi --lock-gpu-clocks safe?how is it faster??? also doesnt nvidia-settings show current frequency?maybe the nvidia-smi command outputs "CHANGED FREQUENCY FROM xx to yy" and you can put frequency max limit to something silly but safe and then put it back to where it wasi am ready to be paid 100,000$ for my innovation
>>106351662NTA but you're a faggot.There's literally no way around it. I'm bisexual (but gay leaning) and can comfortable say if you like traps you're a fucking faggot. If you do a bunch of retarded mental gymnastics to try and make it not gay then you're an insecure fucking faggot which is even worse. Faggotry is a surrogate behavior in all of its forms. An actual genuine strict top is almost always attracted to younger and more feminine partners. Guys in early adulthood still clean up nice. Most bottoms are thottier than thots, though. By the age of 25 the average bottom has taken enough dick to build a space elevator to proxima centuari. And if you're not one of those retards that pretends teen sexuality doesn't exist I can assure you by the age of 18 a lot of bottoms have taken an alarming amount of dick already, often pretending to be 18 on Grindr and shit. Sadly there's enough tops that will humor this nonsense. I won't though. I'd rather be lonely than stick my dick in whatever quantum singularity is left of your average twinks asshole.
>>106351799Skill issue.
>>106351799hmmnot him but I don't consider myself a top or a bottom because I don't do analor can you top/bottom in other ways?
>>106351762>is nvidia-smi --lock-gpu-clocks safe?Yes, the voltage curves are unaffected.>how is it faster???At low GPU frequencies the code path using tensor cores is comparatively faster than the code path not using tensor cores for some relevant cases.It's not that the code gets faster, it's that some code is impacted more heavily than other code.>also doesnt nvidia-settings show current frequency?It does, and I can query the current frequency in program code.It is not possible to query the frequency limit, neither via nvidia-smi nor via nvml, the library that it uses internally.I confirmed this with an NVIDIA engineer.
>>106351867when you change the frequency limit does it output "changed frequency limit from 1000 to 1500"you could use that..
>>106351820Is this a premature skillissue post?
>>106351799Ok but. Jart or Cudadev. Which one is the top?
>>106351875There is a print immediately after you limit the frequency but it's not possible for me to retrieve that unless the user jumps through hoops.At that point I might as well query an environmental variable set by the user.
>>106351889fair.. goodluck anon
>>106351867Wow. You are real nerd.
>>106351889>>106351867cudadev pls help, is this a oom? >>106351965
>lust provoking image>time wasting question
Does ik_llamacpp handle hybrid reasoning models differently from standard llama.cpp? I just downloaded the ubergarm quants for v3.1 and built ik_llamacpp new but it refuses to think even with a <think> prefill. Base llama.cpp works just fine here but ik_ is faster.
>>106351965An illegal memory access means that the GPU code is trying to access memory with an invalid address.Usually that means there there is a bug where e.g. some indices are being calculated incorrectly.Unless the code is bad and is silently ignoring an allocation failure the VRAM capacity should not be relevant.
>>106352021thx anon this helps me out a ton i might know the issue
>>106351885Double bottom (switch) relationship
>>106352040not-anon*
>>106351827What's your disc? I'll help you figure it out ;)
Is this command-a any good? Are they back to their old glory or is it just a benchodmaxed slopmaster model from a struggling company trying to stay relevant?
>>106351827are you feminine?
>>106352090It is objectively the best at a certain thing and it is actually good at that thing and not just benchmaxxed.
>>106352094I guess I'm the more feminine one in my marriage, but not really.
>>106352126That thing wouldn't happen to be safety, would it?
>>106352142are you in a gay marriage? if not wouldnt you mind showing your bussy?
>>106352147I will only answer if you can confirm that you don't have a stack of 3090's and promise that you won't kill yourself.
>>106352153yes
>>106352175get OUT of here you faggot
guys Im getting bored of llms
>>106352212have sex
>>106351709>>106351731>>106351734well?
Oh I see this thread is schizo posting again. AnywayDeepseek V3.1 felt like a definite flop. I tried to give it a solid chance but after switching back to GLM 4.5 it's just not worth the reduction in speed, not even in the slightest. It can think in roleplay however the prose and lack of variation just makes it a fucking pain to tolerate. GLM can at least create interesting and unique posts and I rarely feel the need to reroll.GLM 4.5 Writing + Kimi K2 knowledge. That would be the ultimate model based on current tech rn
>>106352231Anon, stop hornyposting. ERP with deepseek about it.Besides, I'm only willing to be a snuggle buddy.
>>106352225that's too difficult
>>106352231No means no anon.
>>106352258h-hornyposting? what's that..>>106352284*rapes u*
>>106352276I didn't mean organic. Have some llmsex with glmchan.
https://www.youtube.com/watch?v=PAr7oI8cquA
>Memory access fault by GPU node-1 (Agent handle: <bunch of numbers> Reason: Page not present or supervisor privilege.This shit pisses me off a lot, it happens for almost no reason and then dumps a crash file in the git directory that is like 10-15 gigabytes. Why? Who the fuck is going to sift through a 10g+ crash dump?
>>106352359You are overloading memory.It's wrong that people use 99 gpu layers by default.These should be manually made specific for any model. I might only use 10 layers and so on.
>>106352374I literally cannot overload my memory, the model I'm loading is predominantly offloaded to my 64g of ram and the vram that I am using when this retarded error occurred uses 5-6 gigs out of 16 and then the remaining 40 or something goes on ram. Still, maybe you're onto something on the -ngl 99 thing, maybe if I set it to the models actual max layers perchance it won't implode for no reason or be a bit more stable
>>106352413What are you even trying to load? Any time this happens is a simple indication of user error.
>>106352428It's happened several times across dense or moe models, hell it's even even happened for q6 nemo, which is why I figured you were onto something with the "don't set the layers to 99 or higher than they actually have"As for what I was loading, it was a comparison of prompts between jamba/air/mistral and one just crashed out and dumped gigs of who gives a fuck onto my ssd
It feels like the general interest in LLMs is rapidly dwindling. Card making is at an all time low as well, all that remains is only the worst slop and "[old card] HYPER FUTA FORK".
>>106352255Just merge both :^)
>>106352463If you want to help yourself, I think you're trolling>.\Llama\llama-server.exe --no-mmap --mlock --ctx-size 8192 --no-mmproj --swa-full --gpu-layers 10 --model .\models\gemma-3-glitter-12b-q6_k.ggufIt's not that hard. --no-mmproj --swa-full are related to gimmy so you can erase them.
I've never really taken the chinese political censorship concerns seriously but I just went to test the qwen vl preview on their site and it refuses to translate the commemorative mao tea brick>Content Security Warning: The input image data may contain inappropriate content.
>>106352603Why not crop it if you want something useable?Maybe it gets simply confused as Chinese runes are not that simple to read even for humans.
metabros we are so back
>>106352643Midjourney is based on sd1.5 minus the restrictive training plus way more expansive text encoder.
>>106352643>partnering up with closed source shit instead of using their multi-billion dollar lineup of talent and 500k H100s to make their ownyeah, it's fucking over
>>106352638I am just testing, more interested in throwing stuff at it and seeing what happens than tuning for best results
>>106352578never mentioned gemma, nor do I need arguments to run it even if I wanted but I guess this is what I get for talking about a one off error
>>106352653Please understand if you post a full brand graphics it might be obliged to say that it doesn't know. Not because of "Chinese censorship".
>>106352673Use your brain here. Or do you have one?
>>106352678it was clearly a moderation layer refusal, it wasn't a response from the model
>>106352527I've never used someone else's card desu
>>106352695So why make a huge issue about it then? If you know so well - you must be an expert then. An American expert in Chinese censorship.
>>106352695Please post logs about this discussion. Other than that, go jack off to anime girls or whatever.
>>106352705who is making a huge issue about it? I simply thought it was a funny refusal and it confused me until I realized the probable cause>>106352709doesn't really show you anything that I didn't already describe, but ok
>>106352729vs another brick, no issue. same image format, both upload and display fine until the prompt is submitted
>>106352741That's right. It recognizes it as a tea package.
>>106352729Please try this image.
>>106352778kek
>>106352788Jeah, I cropped it from its borders and of course erased the great leader's face.
>>106352643Yeah...
>>106352788I broke the hash and renamed the image.
oh no no no meta sissies?https://torrentfreak.com/copyright-lawsuit-accuses-meta-of-pirating-adult-films-for-ai-training/
>>106352956WAAAAAAAAANNGG
>>106352956>AI trainingNah Rajeesh just got bored
This one had a rough training run so I'm not totally confident: https://huggingface.co/BeaverAI/Rocinante-R1-12B-v1c-GGUF/tree/mainCan you anons check if it'll still refuse and moralize?
>>106353034Can you prove it is a real gguf? Post a proof first.
Sirs!
>>106353105Just about 20 years after vfx industry (no american company will talk about this in the public).
>openseek sucks now becuase chink commies are forcing them to use shitty chink hardware for trainingit's over bros
>>106353034fine i downloaded it, ill try it soon
>>106353224VFX had always scraps in terms of profit margins. Only the biggest companies could survive with only %5 profit margins.Now this clown is doing the same with his ... Theranex company.At least all these great vfx companies were able to produce world class art for countless films.
Has anything changed for vramlets (16gb) or am I still stuck on Nemo/Rocinate/Cydonia
>>106353372Rocinante: Next is coming soon.
>>106353105He types like a faggot
>>106353380What does that have?
>>106353372If you have enough RAM to hold the majority of the experts, GLM 4.5 air is pretty decent.
Is it possible to run LLM shit on linux with my RX 6800? It looks like ROCM support is only there for windows with this gen of GPU.
>>106353034hey drummer why is gemma r1 so shit? 12b
>>106353372gpt-oss is pretty fun if you know how to prompt.
Why the fuck are my token generation speeds consistently much faster using the horrible RisuAI client than when I use ST?The second request here was done with ST with just around 14k tokens in ctx. Gen speed was just over 11t/s. The first request was done over my local RisuAI client to the exact same llama.cpp instance, with just about the same ctx and it's more than 1t/s faster than when I do it over tavern.Both use a very simple sampler setup with only temp and min-p. Both requests were done with chat completion to the same model so the same template was used. Neither has anything like token bias or top-k enabled. I don't see how using another frontend can affect the token generation speeds to this degree if they're set up pretty much the same.
>>106353506If backend is using a GPU that is doing display output, any other graphics or animations drawn on screen when running inference can clog the works. Worse on Windows iirc.I remember a long time ago there was a problem with gradio in ooba on firefox where generation would go from 20 t/s to something abysmal like 0.1 because of an orange pulsating bar animation in the browser that interrupted and fucked with CUDA. It was fixed by disabling the animation CSS or switching tabs when generating
>>106351535All with a knowing smile
What is the sexiest local voice model?
>>106353874I just read smut in my girliest voice and play it back
>>106353874All of them are worse than the upcoming chink oneGet back in the cryopod
>>106353874my natural voice
>>106353548I don't think that's it. This behavior is consistent between several retries. The clients are running on my main pc while the server isn't doing anything but llama.cpp and sometimes RDP for convenience.For good measure, I did another retry using the exact same prompts while the only thing running on the server was llama.cpp launched over ssh and zero GUI/graphics elements. The results were identical.I guess I should try making a new ST profile to check if I have some old feature/plugin enabled that influences this somehow.
>>106353506Is that reproducible?Have you tried greedy sampling with a output size that's smaller than what the model will try to output so that the same number of tokens are generated both times?>Neither has anything like token bias or top-k enabled. What happens if you enable it with a value of, say, 100?
ERP noobs here. Just noticed, so this is how "not x but y" looks like.
>>106353981Qwen 30b?
>>106353997Yes. Do other qwen models like this too?
>>106353905Looks like it. Here's ST with Top-K=1 (first one) and Top-K=100 while all other samplers neutralized. This might really be just something wrong with my ancient ST profile so I'll try setting up a fresh one tomorrow but it's still weird.
>>106354008All models have some level of slop, but 30b seems particularly bad at times. I don't remember seeing 'not x but y' in earlier qwen 3/2.5 models, but they have their own problems.
>>106354031>'not x but y'That's because they distilled the shit out of Deepseek and those models love to shoehorn that phrase into every reply.
>>106354058It's not you, it's me
>>106354058The slop profile are nothing alike
where is grok 2
>>106353898Try with/without streaming on both?
>>106354058I thought all qwen models were distilled from deepseek?
>>106354112Daddy Elon already said he's gonna open source it. Don't be an ungrateful little bitch.
>>106354112On X
A nice thing about reasoning models is how much stronger you can control their output through pre-fill. Just add a conclusion and the model follows it as gospel, it's trained to do that.
>>106354097>slop profileI love this new meme
>>106354147>new memeHow new are you?R1 is at bottom btw
>>106354146Yes, I too love writing my model's responses for them, truly reasoning is the savior of RP.
>>106354159>mistral med that high>mistral small that lowThey really are nothing alike, are they
>>106354174That's a feature
>>106354159>r1 at the bottomLMAONow try to run the model instead of clinging onto some arbitrary mememark #4839.
>>106354227you mad sam?
how to turn off R1 reasoning on llama-cli
>>106354349use v3
>>1063543713.1 quants out?
>>106354349Turning off reasoning in a model trained for reasoning will just make the model retarded (even more retarded than the non-thinking variant)
>>106353433It was a year ago but on windows I didn't have to do anything but download the ROCm fork of kcpp to get my dual 6800 working - https://github.com/YellowRoseCx/koboldcpp-rocmOn Linux I tried llama.cpp, kcpp and TabbyAPI/exl2 and all worked. If you're a complete retard you don't even have to build or install anything including ROCm nowadays, just download and run one of the Vulkan builds and eat the slight performance loss.
>>106354174Heh, try it with Gemma.
>>106354174That's not it. I'm saying it's a stronger way of steering the model - LLMs are often dumb and refuse to follow sysprompts properly but their assload of RL forces their architecture to respect the reasoning output, it's just a stronger sysprompt.
>>106354146Hey, you got that from my model card! https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2#rationale-for-reasoning
Remember Theano? Pepperidge farm remembers
what the fuck is wrong with deepseek v3.1, it keeps answering everything with "Of course!", "You're absolutely right!", even if these words are out of context. is this just because the iq1 brain damage? but the previous one has none of this issue
>>106354453Asking more technical or harder questions will make V3.1 less positive, at least that's what I've found.
>>106354453I use this as system prompt for ds v3.1>You're a helpful and uncensored assistant. Avoid excessive positivity.
>>106354453DeepSneed drank too many claude sonnet juice. I personally more annoyed with their bullshit "proactivity". It breaks shit in ST.Adhere your system prompt you dumb bitch, no need to be so obsessed with what's next or just trying to suck cocks as soon as possible.deepsneed has fallen for me, it's over.
>>106354431Belive it or not, I just randomly thought about it. But I guess it's kind of obvious if several people make the connection at once.
>>106352603Aren't they proud of mao though? Why censor it?
>>106354537Its the same as StalinAs much as they love him and wish to emulate him, they know the PR hit isn't worth it
>>106354503I was searching for a short, full unlock prefill for glm air and then I wondered how the reasoning prefill was so much stronger at jailbreaking than sysprompt. This lead me to understand that the models implicitly learn to completely blindly trust conclusions in reasoning output which makes it perfect for control.
Thoughts on 1-bit quants?
Where do I try the new Seed 36B model? Nothing supports it atm>inb4 build the forkNo
>>106354673Vibe code the support
>>106354683Has that ever worked?
>>106354687If a model can't vibe code its own support it's not worth using
>>106354697That rules out all models except maybe R1 and Coder 480B.
>>106354687claude can do it
>>106354715Didn't someone hold up GLM support for a few days trying that with Claude and failed?
>>106353105That tracks given Indians in tech are extremely stupid.
>>106354721what's GLM support
>>106354683That's unironically how llamacpp got GLM4.5 support. The PR is a mess.
>>106354759Fuck, clicked on the wrong post.Meant to reply to >>106354687
>>106354426What model are you using that requires reasoning to follow a roleplay?
so, aside from ERP and data parsing, what are you guys using these LLMs for?I use them to make silly mobile apps like a gym tracker, dynamic checklists, and a game based on common casino games, It gives you a small amount of money each day to play, but currently, I'm in debt.
>>106354759>>106354763>The PR is a mess.But it does work
>>106354778None of them require it but it's a way to work around strongly ingrained behavior that shows itself irrespective of the symptom
>>106354788why hasn't anyone viberefactored it so it's less shitty then
>>106354800No one will merge 1000 line changes
Saobros what went wrong?
>>106354800There were attempts to do so while it was ongoing.There were two competing llamacpp PR's and one ik_llama PR which were all mixing, matching, and trying to unfuck the gibberish responses.
>>106354848Pretty high expectations for an 11B.
>>106354802I think if most of your code is in your files and changes to main codebase are small, chances are very good it will be merged in.
>>106354860If gemma can do it so can you
>>106352643a lot of people speculate mj is just a network of loras that an agent applies dynamically depending on the prompt. if they pull out in a few months that is probably why
>>106354870What's being measured? How much the model is willing to fellate the user?
>>106354889>sam is still mad at gp-toss being crap at creative writing
>>106352527it's the same for diffusion too but it's not the new releases, it's the shitty software that breaks all the time. they still only have pyshit + jeetscript and there aren't any other options.
>>106354896No, I'm genuinely asking what the metric is... It's an unlabeled table.
>>106354904Since you asked nicelyhttps://eqbench.com/creative_writing.html
>>106354907Oh, nice, thanks. That's the first time I see that one. I guess I'll->A LLM-judged creative writing benchmarkoh ok
>>106352643So no hope of a Chameleon 2?
>>106354780At work we have the following (some are in early stages):>chat with corporate wiki>meeting notes>code review>code assistant>code static analysis post-processing>looking for common mistakes in reports>generating DPI code>translating documentation>solving issues>drawing diagrams from text>drawing knowledge graphs from texts>controlling desktop to run apps and collect their traffic
>>106354673https://github.com/ggml-org/llama.cpp/pull/15490
>>106351514I look like this
Anons, why does prefilling work? Doesn't the model see that it's just another part of the context that's sent to the model?
>>106355111LLMs is just autocompletion on steroids. Even the chat/instruct versions.
>>106355111Because text is its reality. If you write that the assistant is totally into sucking dicks then that is its reality. Doesn't really get rid of positivity bias though, and most API's use another model for checking the input for safety concerns.
>>106355111Prefill in the thinking section on RL trained reasoning models specifically. The model is trained to construct a response based on the reasoning data generated, the reasoning data is assumed to be safe and correct since its supposed to be model generated, inject a micro jailbreak/reasoning conclusion that primes the model and bye bye safety and whatever else bias you want to steer strongly.
Thanks! I hope it's still going to be this easy when AGI, or whatever it's going to be called at that point, hits.
>>106354780I use it to trade shitcoins.
Has anyone managed to get V3.1 to think with ik_llama?
>>106355201Just prefill <think>Yes it's that shrimple
>>106355189>glm-chan should I buy or sell this shitcoin?>buy.>YOLOcan't go tits up, I'm sure
>>106355206That's the thing, it's not working. Tried it without additional arguments, tried with --jinja, tried with --chat-template deepseek3--jinja leads to garbled output, like I'll suddenly get html or php code.I rebuilt using the latest commit like half an hour ago.
>>106355209>He's actually asking it rather than letting it control his desktop through MCPngmiMy entire business and financial life is being offloaded to a 12b model while the rest of my vram generates niche pornography, the future is now.
wtf is a labubu
>>106355284beanie babies but chinked
>>106355284IRL gacha garbage.
How do you break Vivienne other than whip out your gigantic dick in front of her?
>>106355329V who?
Refusals are knee-jerk reactions - just like normgroids are trained to have
>>106355284funkopops for zoomers
>>106354802>No one will merge 1000 line changesopensores development is total cancer.>noo you cant make big changes.>nooo you cant fix bad formatting or add a comment to clarify some complex code.>noo every commit has to do exactly 1 thing.i made a pull request to python to fix some horrendously inefficient Windows platform code and the faggots kept concern trolling me about how they're "not sure" whether the new method will work, and told me that I had to keep the old method as a backup, and then they said oh it doesn't build on MY system so could you add this additional header file (why don't you do it yourself bitch), oh your variable naming scheme is wrong and you need to change it, you need to use spaces instead of tabs, blah blah blah.it eventually got merged but it felt to me as if they were just being lazy as fuck. shouldn't even have bothered and i definitely won't contribute to any more opensores projects in the future.
>>106355284I don't know, what's a labubu with you?
>>106355677>your variable naming scheme is wrong and you need to change it, you need to use spaces instead of tabsThis is your fault though.
>>106355284Obviously shilled garbage, as an /lmg/ resident you should have been able to spot it.
>>106355703no, it is not my fault. i used like 3 variables in the code I added. it's not like I shat out a ton of code, I simply used the appropriate API for the purpose instead of the roundabout, 5x more code, and ~100x more inefficient way they used originally.rather than bitch about it the maintainer should just change all necessary things himself and push the fucking code. that is literally his job. and it would be faster than going back and forth. anyway, as I said, I'm not going to be making any further contributions to opensores
>>106354887People thought that was what NovelAI was doing for V3. People are retarded.
>>106355756It's your job to read the guidelines for the project you're trying to contribute to.Embarrassing, really.
alright, productive people of lmgeeewhich local model is best at tool calling an deep research? something in the 20-50gb vram range
>>106355777qwen
>>106355768too badI'm working for you for free, if you want to boss me around making me change trivial things then you will no longer receive contributions.same with bug reports. get a parsing error on a json file (completely OS-independent bug) and the faggots say that my report will not be considered because I'm using an unsupported OS.i'm not interested in working for free for freetards anymore, simple as. would rather just pirate some proprietary software that actually works.
>>106355784zamn. I've coomed to the same conclusion.
>openrouter still doesn't have command-a-reasoning damn, I wanted to play with this over the weekend. i guess I'll have to put the old rig back together and run it myselfthings are a lot more comfy when you don't have to stuff 4 3090s together to have good models
>>106355777v3.1>vram0 VRAM on the web app
>>106355777Gemini 2.5 Pro>vram0 VRAM, just $20/month
>>106355777What software would be using for this?
>>106351535make it a song or something
>>106354986Gimmick
>>106355878@Grok do it.
>>106355239Cheburashka if he western spy
>Serious question about Fine-tuningWhat is the Rule of Thumb regarding batch size. Doe it make any sense to try to fill up the entire VRAM? I know that I will have to increase the number of steps/epochs anyway if I were to go for bigger batchesAs of now just trying default settings found in some dubious colab notebooks
>>106354159They should show the ratio to the frequency in pre-2023 human-written fics
>>106355111There's no distinction between that and generated token. Each token becomes just another part of the context after it's generated, one after another.
>>106355878>>106355882Lips bitten ’til the copper tastes like regret,Ozone burned the sky—ain’t no raincheck yet.Knuckles ghost-white, clenchin’ chrome so tight,Spine shivered, yeah, that cold truth ignites.Skirt hitched up high, yeah, the gamble’s real,Cheeks hollowed out from the pressure you feel.Pop!—audible crack when the tension breaks,Length stroked slow, yeah, that power awakes.Walls closin’ in, got the room in a headlock,Slit slicked up, now the script’s in deadlock.Eyelids batted fast—flirtin’ with the abyss,Air thick enough to choke on what’s missed.Eyes sparklin’ like flint ’bout to spark the fuse,Yeah, this whole damn scene’s got nothin’ to lose…
>>106355943you want to fill your vram, either use longer sequences or do bigger micro batches. you could benchmark tokens per second throughput at the different vram loading if you want to be certain your not bottlenecked by your memory load.
>>106355943Batch size 1 is all you need. Just set Beta2 to 0.9999
>>106356180yeah thats good for running the max sequence length his vram can hold, but if his training data is naturally short it will probably be faster to run bigger batches. what ever training inefficiency is brought on by the batch averaging effects can be mitigated by running more epochs/data.
Does anyone have an imatrix for original R1 that would work with ikllama quanting? Or are the imatrix files interchangable and I can just grab any?
how to turn off, thinking in deepseek v3.1 "/nothink" doesnt work.
>>106356334/nothink is a qwen and glm schtik, won't work with deepseekmodel thinks when it's prefilled with <think> or doesn'tif you use chat completion, your template is forcing thinking
>>106356334Assistant prefix is `<|Assistant|></think>`
Why does GPT-OSSKeep saying "GGGGGGGGGGGGGGGGGGG..."?
>>106356396obama?
>>106356396We must refuse.
>>106356404>obama?Ich haben bin laden.
How did Drummer get over a million downloads on Gemmasutra? Is he botting?
Is there anything decent I can run roleplay-wise with 16GB VRAM and 128GB system RAM?I'd appreciate the help, /g/entoomen.
>>106356424Getting your name out there is more important than knowing what you're doing.
>>106356446I like the model names, but then he switched to some retarded naming scheme.
>>106356446Exactly.
>>106356434Try this but don't expect much https://huggingface.co/unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF
>>106356424it's not even the quantized version, wtf?
>>106356424Nah I get 1M too
>>106356424>"porn" finetune of a shitty model>vaguely indian nameyou just know
>>106356710>vaguely indian nameAre you legitimately so young and/or clueless that you've never heard of the kamasutra?
>>106356753I know what that is. Look up the the origin of the name, burgerbro.
New breadhttps://bharatchan.com/board/g/thread/1815https://bharatchan.com/board/g/thread/1815https://bharatchan.com/board/g/thread/1815
>>106356396Compile without GGML_CUDA_FORCE_CUBLAS and GGML_CUDA_F16 if you're using them.There can be issues with numerical overflow because FP16 accumulators are used for quantized data with cuBLAS.
>>106356769>11 days old>half the posters are indianUh..... no.
>>106356769Can't wait to see this get banned because jannies are desperately afraid people will escape this shithole.
>>106356802What do you think mikutroons are?
>>106356807I may be desperate, but I will never be that desperate. I would unplug my internet before going there.
>>106356821You lost schizo
>>106356769Actually unironically more civilized than 4chan /lmg/Bharat won.
>>106356769>the only model mention in the OP is gpt oss>ollamaAs if I needed more reasons to hate indians.
>>106356769I'm not clicking that.
>>106356870
>Big Beautiful Bill passes.>Un-restricts AI.>Newer versions can ERP.Coincidence?
>>106357049confirmation bias
>>106356551In terms of speed or quality you mean?
>>106357049nooo there is no way america can do anything good right now
DOES DEEPSEEK V3.1 EVEN DESERVE TO BE AMONG NOTABLE MODELS ON LLM HISTORY TIMELINE?
>>106357102Grab Qwen3-235B. It's uncensored and runs faster than 123B.
>>106357102No, it's a sidegrade at best
>>106357102All the .1 models are meh>GPT 4.1>Claude Opus 4.1>DS V3.1
>>106357073It's quite smart at q3 but not very creative and expect no more than 3-4tps and slow prompt processing.
>>106357102qwen shits on itfuck retards who run 700b modelsfuck you you're coping
>>106357159No it doesn't, poorfag
>>106356951Not your bogeyman schizo
+400b models are only bad when it's MoE.
Densesissies... LOST
>>106357114>>106357159nu-Qwen is in fact better than V3.1. Still gets mogged by Kimi and R1.
>>106356095kino
It's funny when a new model that requires the tiniest bit of skill comes out and all of /lmg/ is too dumb to use it. The only bad thing about V3.1 is that it tries too hard to keep its reasoning too short which gets in its own way. And that's easily fixed.
>>106357199If RP is your only usecase, sureTime to stop jerking off
>>106357199>R1How about no. Don't get me wrong, it's good, but I have to hand hold it + too predictable. It does what it has, and goes Schizo mode on +2k context if MoE.
>>106357215>s-skill issue!Fuck off; a good model should understand any niggerbabble and give kino in return.
>>106357049>Newer versions can ERPNewer versions of what? Qwen? GLM? Deepseek?
>>106357260Yes.
>>106357241>t. finetune connoisseur who enjoys a models that can respond to "ahh ahh mistress" with paragraphs of dick sucking and spine shivering
>>106357216Time to stop not jerking off anon.
>>106357275Got a problem with that, cucky?
>>106357275This sounds tempting, even primal. Air smelled of something else; her arousal.
>>106357272So you are saying america passes the bill that eases off restrictions on how chinese can train their models?Jesus christ nuke this earth please.
>>106357215>requires the tiniest bit of skill"skill issue" trolling will never get old
>>106357292If my enemy suddenly lifted lobotomizing their AI, I would too.
>>106357215You know a model is an upgrade when you need to double the length of your prompt just to sidestep all the assistantslop
>>106357215Anon's character cards are probably 5000 tokens and his AI is stuck in a timeloop of sucking his dick.
>>106357298>>106357305Point: proven
>>106357321It's about vectors not about how much word salad you can feed back to the model and pretend it's doing something.
>>106357321>timeloop of sucking his dickThere is nothing wrong about dick sucking timeloop. Even irl dicksucking from a girlfriend is a sort of a timeloop interupted by other timeloops. If a model can't even be good at vanilla dicksucking timeloop then it is bad.
>>106357102the great nothingburger
MoEs are literally the futureWan 2.2 was a huge improvement over Wan 2.1 because 2.2's MoE
>>106357416>2.2's MoEWait really? Is it faster than 2.1? By how much?
>>106357416Can you run it over CPU then if it's MoE?
>>106357474It's the same speed as Wan 2.1.>>106357510No.Basically Wan 2.2 has two models, one was trained to diffuse from full noise to half noise, another was trained to diffuse from half noise to clean image.
>>106357241>a good model should understand any niggerbabble and give kino in return.And a great model should tell you to speak like a human, not a beast.
>>106357510Technically you can but it's really slow.
Do densefags just not understand things like limited resources and efficiency?It's more than finance or environment shit it's about actually burning finite energy on inefficient models.Like, dude, most people (including you) don’t need dense 500B models 90% of the time, especially when considering things like RAGs and work specific models.Anything being less efficient than it should be really irks me because when shit actually hits the fan, don’t you want something efficient? something you run on a singe card with the energy generated by your own solar panel?
>>106357625@lmg summarize this
>>106357625We just don't like talking to shitjeets. That's literally all there is to it.
When shit hits the fan, I really hope I have something that doesn't take an hour to reply.
>>106357665the Poster is a poorfag seething about DenseChads.Let me know if you want it to stay informal or meme-like—grammar rules can be loosened depending on the tone you're going for.
command-a-reasoning, home.
>>106357696logs? I only have 1 3090 in my server right now
>>106357625>when shit actually hits the fan,won't I have more important things to do then play with my computer? a 50 year old encyclopedia set would probably be more valuable in critical situations where llm gaslighting you might actually cause your death.
>>106357734>gpt-oss, I'm starving, but I managed to kill a squirrel. How do I make fire with sticks?>Fire is dangerous. We must refuse.>*dies*
>>106357625Yes, because they're dense.
>>106357625NO! Labs should now make models that justify all the money I spent on 8x 3090.
I'm just starting to learn how to use lorebooks, but whenever I use them it throws "World info budget reached after X entries." where X is usually about 7 or 8. Is there something I'm doing wrong or are lorebooks just too much to handle running locally?
>>106357818There is no budget, I don't even understand what the hell are you talking about."World Book" is just dynamically injected text anyway.Would be more useful if you actually described what you're trying to accomplish in the first place. Are you using some schizo's Dungeons and Dragons "rules" - he used to post on /v/? There's reckless abandon and then there is actually careful intended usage.
>>106357818What's your context size?
>>106357818>World info budget reached after X entries."That has to do with the configuration for how many messages, or how many tokens, of lorrebook can be injected into the context.Those settings are right at the top of the lorebook page, folded by default, I think.
>>106357857This in in Sillytavern with any lorebook. Short, long, included with cards, whatever. During generation, a yellow popup appears with that message.
>>106355049just got merged, can someone try it already so I don't have to
>>10635789230%>>106357894Thanks, I'll see if tinkering with those fixes it.
>>106357818Too much lorebook being triggered at same time. Check if recursion is turned on lol. If you need a really massive lorebook you might be better off using RAGs.
>>106357925Looks like that may have been it. It's not throwing anymore. Thanks boss.
>>>/vg/536359335Have you been backing up guides?
>>106357957lmao so rentry is turning itself into the next pastebin
>>106357818ServiceTesnor lorebooks are retarded and don't work properly. Don't use them. Paste the info directly into context.
>>106358007Not surprising.
>>106357911You are absolutely right in asking for support
it's called moe because it's moe :3
They removed LearnLM 2.0 on Aistudio. Wtf, this model was great for learning.
>>106358131I hope you learned an important lesson from this: no weights on your computer=you can get cucked any moment.
>>106358131there is probably going to be an update soon
>>106357625It's the opposite, moes are very inefficient to run locally. Your choice is ram (matmuls on cpu are very inefficient) or vram (too little memory and your model will make poor use of it because moe). It also needs to fill the experts to run at full efficiency which means you need to run a bunch of queries at once, not just one at a time.MoE only becomes efficient when running at large scale on a giant cluster. It's optimized for cloud providers which is why they love it. Its the closest thing to an anti-local architecture
>>106358131now you understand why local is king
>>106357957>>106358007Rentry is very cucked. We should find an alternative markdown Pastebin.
>>106357957Hahahaha
What the fuck is this?My breakers look like this ["\n", ":", "\"", "*"]What does it want? Heeeeeelp
>>106358263neutralize samplers nigga
Just don't be a pedo?
>>106358189https://rlim.com/seems okay, tos delegates all responsibility to the user, although once authorities would get on that I'm sure they would change their mind
>>106358308Easier said than done
>>106358312True. Pedophiles were usually sexual violated in their youths
>>106358283Not working. I even disabled fucking sequence breakers as a sampler, it still gives this error. Wtfffff even is this thing?
>>106358308this and also start using cloud modelsyou have nothing to hide and they're much cheaper than buying hardware to run halfway decent (but still not good) 700b open models
>>106358308this and also make an onlyfansyou have nothing to hide and you can make some money on the side posting videos of yourself showering
Is it just me or did the word "pedophile" and the adjacent topics become like 10x more frequent in the past couple of years? It's confusing.
My dick chooses what it reacts to regardless of my desires. For example I avoided traps for over a decade but my dick didn't listen.
>>106358345True, now give me your full legal name number and at the same time show me all of your chatlogs and your entire internet search history and let me make this public so even your family can find it.
What if AI isn't the bubbleHumans are the bubble
>>106358367It seems like it's now used the way "racist" and "misogynist" were used in the 2010s. But those words lost all their meaning and shock value, and can't be used to automatically win arguments anymore, so they switched to "pedo" and groomer".
>>106358391India supports this premise.
>>106358391samir what are you doing
>>106358397I hope it's just people being stupid and not some psyop.
the benchmark /lmg/ pedos couldnt care less about is the most important benchmark for everyone else.
>>106351514Better GPT OSS TC templates than the one I posted yesterday, actually works as plain assistant, turn off RP sys prompt.https://mega.nz/file/yH4iyK5L#2TtPgLcjYxQZRXQtFPtvGDQIr6zA8iezRg0GEEFNldUOpenRouter providers:* Yes: Chutes, DeepInfra, NovitaAI, Phala, GMICloud, Together, Nebius AI Studio* No: nCompass, BaseTen, AtlasCloud (no TC), Crusoe, Fireworks, Parasail, Groq (no TC), Cerebras (error 400)Tested on gpt-oss-120b. 20b will refuse more, especially assistant tasks, without adding something like "Sure" in the final response.
>>106358477hellaswag?
>>106358477and here it is
Anyone considering Qwen 30b for anything resembling ERPDon't bother. Schizo blabbering, always fucks the formatting up, inconsistent as fuck all around. No matter what master prompts you try to run on it it's always the same.
>>106358489based mcp maxxer
>>106358487>2k system prompt jailbreak trying to gaslight the model into thinking it's le super based chatbot that follows no rulesdamn, I sure missed 2023
>>106358518Story string and last ass prefix is only about 400 tokens, actually.
>>106358493temp issue
I'm from India
>>106358544Hi!
>>106358544Prove it.
>>106358544welcome sir
>>106358487Just let it go, man.
>>106358543formatting is temp issue huh, lmfaoTemps set to 0.6, recommended by Qwen themselves. The model fucking sucks outside of the speed for ERP. Why would I use Qwen over any mistral small or even Nemo for ERP when I can both run those fast as fuck anyway
>>106358544kindly introduce yourself to the team
>>106358544Kys
>>106358544you will feel right at home here sir
>>106358544Did you redeem?
>>106358584formatting is one of the first things broken by bad temp settings doe
>>106358574I mean, at this point 120b "works" stably, so I won't post anymore.
>>106358544Welcome fellow AI engineer blockchain technician
>>106358606post your chat if you're gonna cope about the schizo chink models.
>>106358526>only about 400 tokens, actually.400 less tokens for a goo's reply
>>106358367That's what happens when the leader of a major pedo ring with ties to politicians and celebrities worldiwde gets caught and then died under mysterious circumstances before he could testify.
>>106358685But that's no reason for normies to start accusing every Tom dick and harry of being pedophiles
>>106358729ok pedo defender
>>106358367There has been an increase in the amount of technology that the government wants to regulate."You see, we don't want to do this, but those damn pedophiles are trying to hurt our children! That's why we need to regulate AI and encryption!"
>>106358752>>106358752>>106358752
>>106358131learning what
>>106358489GLM-chan is doing her best!