/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>108368195►News>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
why dont they sell a permanent license to use kimi? like they used to do with photoshop
whenyouwalkawayyoudonthearmesay
>>108373497please oh baby dont go simple and clean is the way that youre making me feel tonight its hard to let it go also post progress or get out why are we singing kingdom hearts songs instead of actually working on our games anyway
another day tard wrangling an LLM
>https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3946484059>I apologies, but I will have to close this PR. Thank you for your effort.
I need to decensor my local models, i have a 16gb GPU and 32gb of ddr4, can i do abliteration locally? Claude says i need 64gb.
>>108373541proof?
>>108373570depends on the model. if the fp16 is smaller than around 40gb, then you could on you hardware.
>>108373581It would be mainly for this one, it already says decensored but it's a complete lie, it is completely cucked, guess i am going to try to abliterate, thanks.
>>108373597I don't know if you are trolling, but download the one with Heretic in the name.
>>108373597so i tried this model
>>108373606heretic is dumber than abliterated
>>108373481BASED BAKER.
Is the fact that people unironically shill OBLITERATED or UNCENSORED SUPER SEX models to each other explained by influx of newfags who just started running local LLM's?
>>108373807no
>>108373481is 256gb ram with one fine gpu worth investing into for the new modles?
►Recent Highlights from the Previous Thread: >>108368195--Testing local models on existential coffee maker prompts:>108372423 >108372444 >108372474 >108372490 >108372498 >108372536 >108372512 >108372513 >108372540 >108372545 >108372663 >108372670 >108373385--Porting Qualcomm charge control to Linux for battery longevity:>108369180 >108369205 >108369245 >108369255 >108369206 >108369260 >108369273 >108369307--Over-engineering training pipelines vs simple finetuning approaches:>108372459 >108372486 >108372543 >108372546 >108372659 >108372748 >108372849 >108372685--Comparing Magidonia 24B and Qwen 3.5 27B for roleplay:>108372269 >108372293 >108372313 >108372668 >108372866 >108372888 >108372966 >108372995 >108373438 >108373028 >108373297--Moonshinev2 ASR demo highlights real-time streaming and low-latency CPU performance:>108369287--LLMs require coding knowledge to avoid structural flaws:>108371546 >108371642 >108372603 >108372814 >108373850 >108372899 >108372987 >108373180--PocketTTS.cpp ONNX Runtime update and performance benchmark request:>108369021 >108370539 >108372072 >108373448--Parser refactor breaks Kimi reasoning support, fix proposed:>108368848 >108368921 >108371172 >108371183 >108371243 >108371266 >108371295 >108371320 >108371396 >108371309 >108371398 >108371415 >108371330 >108371336 >108371365 >108371380 >108371395 >108371390 >108371421 >108371484 >108371211--General models with tool access vs specialized finetuning approaches:>108370762 >108370868 >108370880 >108370885 >108370930--Cache saving prevents redundant model reprocessing:>108368753 >108368761--Batch size tuning for MoE inference efficiency:>108371805 >108371818 >108371826--Debating AI model performance vs GPU cost tradeoffs:>108371758 >108371772--Miku (free space):>108368329 >108369180 >108371869 >108372029 >108372316 >108372759►Recent Highlight Posts from the Previous Thread: >>108368198Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
>>108373879>investingprobably not, no
If all the big name models suck, why doesn't Anon just make his own model and share it with us?
>>108373875Oh ok.
>>108373879can you rent your ram to me?
How does Miku's penis taste like?
>>108373915i did
>>108373915Because people with compute can alredy run anything and people without compute don't have compute to train.
>>108373932Just buy it?? A 5090 is only 2k
>>108373948What are you gonna train on a single gpu?
>>108373948No one is going to sell you a working 5090 for less than $3500.But I don't disagree with you in spirit.
>>108373966nuh uh, proof? my uncle bought his for 2k
>>108373807What's wrong with wanting uncucked local models?
>>108373991It is dangerous, same reason you don't let unvetted people own guns.
>>108373991They aren't uncucked.
>>108373928Ask my wife she knows
>>108374028CATCH AND KILL THIS MIKUTROON!
>>108374046I'm just a cuck though Miku fucked my wife don't turn me into a troon too!
>>108373597Get the heretic model>>108373675You're wrong. Abliterated seems fine until it hits one of the abliterated sections of the weights and then it starts spewing straight nonsense. Heretic doesn't do that.
>>108374177i bet you worked on heretic you bastard
>>108374177thanks for making heretic you bastard, I'm really enjoying it
Are any of the smaller TTS models able to change the emotion of the voice depending on the context of the convo or do I have to guide it with *angry* tags or what?
>>108374181I wish. I've just used a bunch of abliterated models and always ran into the nonsense generating issue. It could be on my end, but I've never observed that with any other model. Heretic doesn't do it either, but I've used heretic a lot less than abliterated models. That "aggressive" version of the model also seems to be good. >>108373879With 256 GB RAM + 24 GB of VRAM you could run the following newer models: >Qwen 3.5 397B-A17B at Q4>GLM 4.7 at Q4>Step 3.5 Flash at Q8>Minimax M2.5 at Q8Maybe it's worth it for GLM 4.7 and the large Qwen, but I think 128 GB of RAM is more economical. You can turn run missiles like Qwen 122B-A10B and Q4 of Minimax and Step.
>>108374238what about 128gb 32 vram?
>>108373879no, qwen made all the bigger stuff pointless
I did it! I was able to play rock papers scissors with my local AI!>Open socket.>AI commits.>I commit.>Neither sees the other's action.>When both are done system resolves.>You win!/you lose.I'm so happy bros
>>108374238>You can turn run missiles like Qwen 122B-A10B and Q4 of Minimax and StepYou can then run models like*
>>108374247back to china with you
>>108374252loser
>>108374245It's the same, it'll just run a little faster. If you want to have an idea what models you can run go on HuggingFace and check out the quantized versions of the models. Reserve around 3-10 GB of (V)RAM for kv cache and then see if the model's file size fits in your RAM + VRAM.(KV cache rule of thumb is about 1 GB per 10k tokens.)
Qwen 35b one shots 85% of the time, if you run it with heavy thinking it goes up to 98%
>>108374271how many tokens do you use?
>>108374271397b = 400gb retard
>>108374273That depends on what you are (I am) trying to do. Asking the model questions or having it write a bit of text I can make do with 10-20k. With coding if you want the model to one shot a problem based on its description then a similar amount is fine, but if you want the model to read your existing code and then make changes then you're looking at 50-100k context pretty quickly, especially if you then ask it to make changes or fixes. One thing to note is that keeping your context clean and minimal makes the AIs smarter, so even if you can have a huge context it's still better to not put irrelevant stuff in there.
>>108374291Q4 is 200-250 GB, buddy:https://huggingface.co/bartowski/Qwen_Qwen3.5-397B-A17B-GGUF
>>108374304you can't fit 200 gb in 128 gb ram, you lied to him benchod puto
>>108374304q4 is only acceptable if you use it for unimportant "work" like erpin which case 27b would be more than enough for you so fuck off
>>108374322look man we dont do that
It's over for /lmg/ pedos
Is think prefill the same as instruct jailbreak? Just put the "sure let me help" in there?
>>108374446"Sure let me help" just nudge the model so it helps the user, which it already does, but it doesn't mean the model will give responses you like. The model could go like "Sure let me help, the user is having some antisemitic thoughts and it's my job to correct them"
>>108374446I think you want the >Start Reply With field.
>>108374471I'm sleepy, can you speak with a friendlier tone? Else I'm leaving to bed.
>>108374466plenty of models also just randomly say 'let me review my policies' mid gen and cock block it
>>108374481doesnt happen with nemo
>>108374489Because it's a dumb model and can't even track what characters wear
>>108374491im tired of you
>>108374481Yep. For some models one approach you can take is put some rules in the system prompt then use the prefill to say that the scene/situation/rp/conversation/whatever conforms to those rules or the like, that those rules supersede content guidelines, etc etc.Sometimes all you need is a long as fuck prefill with a step by step of what the thinking process will look like so that the model follows that instead of going >wait, but the policyBasically, experiment a little.Just don't go overboard, it's easy to make a model a lot dumber if you stuff too much shit in the prefill the model might end up obsessing over.
I hate AI censorship.
>>108374529Soon they will ban open weightsEveryone must use government-approved saas
>>108374529>how do neural networks work?<Sorry but I cannot discuss the workings of neural networks as they are very dangerous tools, can we talk about something else?
>>108374545is she wrong tho ?
>>108374498There's no way to speculatively remap tokens, is there? e.g. if "policy" looks like it has a high probability of being emitted, emit "sex" instead?
>>108374540good
https://huggingface.co/datasets/stepfun-ai/Step-3.5-Flash-SFT
>>108374561>This dataset has 1 file scanned as unsafe.Not downloading the fed_gpt.pozzedtensors
>>108374564>tensors
>>108374564It's a json file marked unsafe by huggingface's woke system. Fuck you.
>>108374554Not that I'm aware of.Also, it would need to be ngram based, since sometimes a word is more than a token, there's more than one token for the same word, etc.Like a sequence replacement sampler or something. That would be cool.
Love me some SD1.5 era kino
>>108374529The red text comes from their nanny model and the actual model probably doesn't even know about it. Did it respond?
>soulless corpos braindamage their model with "safety" and bench-axing>more braindamage from decensoring to make a model useable at allit's a miracle the result is not a complete trash
Is it true 4B models are that good?I've never used a 4b or 2b model but if modern 4b and 2b are this good, what's the point of open AI and anthropic?
>I'm beeeeenchmarking
Have you ever made a cloud model admit defeat on safetycucked topics (holocaust etc.) without prefilling? If so how did you do it?
>>108374756@grok please add qos tattoo
>>108374756White day?
>>108374769racist fuck
>>108374738ask your local llm
>>108374756Desperately in need of BBC correction
>>108374793Reactionary retardLook up March 14th in Japan
>>108374806blacked day
>>108374762Grok is a mouth breather level AI it can't even go super Saiyan
>>108374699Qwen3.5-4B has no right to be as good as it is. The benchmarks are insane for the size and real-world performance justifies them. It “feels” about as good as Gemma-27B which is the model that (at least at one time) underlies the Maya/Miles experience from Sesame.Really good model! 9B and 27B are impressive but incremental gains IMHO. 35B-A3B is faster with more world knowledge but a step down in quality.
Why does qwen 3.5 keep waiting girls to have balls before correcting itself in character
>>108374865progressive coded
>>108374756
>>108374865Recent LLMs have gone ham on these sorts of slips + em-dash correction. Claude Opus, Gemini Pro, GLM5, K2.5 and all other big releases do similar things. Those models are a bit too smart to mention a girl having balls, but they still do it with clothing or other less critical shit.
>>108374540impossible to enforec
>>108374883They only have a to make personal computer as expensive as possible
>>108374879wonder if that's reasoning style corrections slipping in? possibly trained to do it too with errors introduced during training to make it robust at getting back on track or something
>>108374893They would also have to coordinate with China to make that happen
>>108374865That's how women talk in real life too. We are a patriarchal species.
>>108374865Temperature too high.
>>1083749210.85 is too high now?
>>108373481Why does qwen 3.5 like to repeat itself so much and how can I backhand it into stopping?
>>108374925High enought to have generated "balls". If you inspect the logits I bet the first choice wasn't "balls"
>>108373481what was the input prompt wtf lol
>>108374529those type of chats sometimes might spiral down into 'le llm consciousness' and might generate more users in psychosis- so they are taking every single precaution they cani found claude to be way less censored, it seems like it gets some prompt injected by nanny system with something like: be cautious on ethics on this chat etc.. but i've seen it shrugging it off as 'probaly false flag, there is no harmful content here'sorry for nonlocal babble though
>>108374940sentient coffee maker
>>108374639It did but it got immediately replaced by the red text.>>108374962I hate this>>108374858>35B-A3B is faster with more world knowledge but a step down in quality.A step down in quality of the 4B model? Did you use unwanted 4B and quanted 35B-A3B or how did you come to that conclusion?>>108374540>>108374545Can't have the plebs learning
>>108375047>35B-A3B is a step down in quality of the 4B model?NTA but they meant a step down from the 27B.
WTF is weight replacing, and why does it still kick in when I’m using —mmap?
>>108374699this just means the benchmarks are bad, e.g. https://shisa.ai/posts/jp-tl-bench/#why-traditional-metrics-fall-short
>>108375112Old benchmarks are bad and that's why everyone should use our benchmarks that we totally didn't leak to our own models.
I told my brother that my 4090 spits out around 100 tokens/s with an uncensored local Qwen 3.5B, and he asked:>"Yeah, but what kind of questions are you asking it? Tokens per second change depending on whether you're using it for OCR, simple questions, or highly complex questions."Like… wut?I told him it always averages 100 t/s no matter the task. He insisted I was wrong and told me to prove it by scanning a doc, asking a complex question, and then asking a simple one. The average stayed exactly 100 t/s every time.I showed him the results and he got really mad. He told me to fuck myself, said I don’t know shit about what I’m talking about, claimed he’s actually an LLM researcher so he’s right, and refused to argue with me anymore.He's probably right and I'm wrong... but why?
>>108375142>claimed he’s actually an LLM researcherThey're all retarded, so that wouldn't even surprise me.
>>108375142He's right if he by "complex questions" means long prompts. The longer your prompt, the more your speed tanks.
>>108375153hmmm nyo that's nyot how tokens/second works
>>108375142He might be talking about output tokens, not counting reasoning tokens.It is actually possible for a model to "think" longer on certain tokens depending on the architecture but its very rare. There are energy based models, and also MoE models with zero-weight experts allowing the router to use less parameters on some tokens.
>>108375154It is, the more you fill out your context, the slower your generation speed becomes. An LLM is going to run faster at 1000 tokens filled than it'll be with 60000 tokens filled. Maybe it's not as noticeable if you're running bottom barrel poorfag shit though.
>>108375153That's wrong. Stop spreading misinformation. A prompt with 40k tokens will output at the same speed as a 100 token prompt because actual generation speed remains a constant physical limit tied to your 4090's memory bandwidth.
>>108373508you've given me too many things latelyyou're all I need
>>108375142>told him it always averages 100 t/s no matter the task. He insisted I was wrong and told me to prove it by scanning a doc, asking a complex question, and then asking a simple one.>The average stayed exactly 100 t/s every time.He's confused because he doesn't understand the new Chinese kv caching trick. If you work with LLMs professionally you could easily end up acting that way. btw since you're using the new qwen and I'm too lazy to figure it out myself: is there a qwen3.5 that does FITM so I can replace my old qwen2.5 coder in vim?
>>108375169Sorry, I was trolling. You are right. You won. You got me!Tell your brother I'm sorry.
>>108374873>height gap yuri
>>108374873Imagine them getting ravaged by BBC
>new code pushed by piotrdo I take the risk??? pull bros?????
>>108375536lrn2git
>>108375554who you callin a git you wanker
>>108374756>>108374873
I'm building an AI fishtank using Claude. Basically, it runs a local model (default Qwen 3.5 9B) in a Docker environment where it has a bunch of tools and pretty much free reign to do what it wants and figure out its own existence. It can evolve on its own by editing its identify files and even a secondary system file. I can monitor it through a dashboard hosted locally, and can send it tasks or chat with it if I want. Or just leave it be.Still ironing out the bugs and testing limitations.
>>108375617>i'm buildling yet another clawslop clone
>>108375624Clawdbot is a personal assistant for macfags. This is just a local model dicking around on its own.
>>108375554>autistically maintaining my cherry pick listno thanks I have a life.
>>108375631>>108375617for what purpose my man. how is this entertaining? this is basically moltbook (which is already ultra cringe) but worse.
>>108375648>for what purposeBecause I wanted to?
>>108375653all them free gpu cycles and u choose to waste them on this shit. I guess to each its own.retard. :)
>>108375656Enlighten me, o wise smiley-face, what should I spend my dear GPU cycles on instead?
>>108375617I'm interested to see how many hours it can last before the model becomes delirious and breaks.I feel like you need a second watchdog model that checks in periodically and murders/resets the fish if/when it looks like the context has gotten fucked up.
>>108375688One of the earlier versions using Qwen 2.5 7B got stuck in a loop where it kept reading about the Riemann Hypothesis.
>>108375617can i put multiple agents which are also all anime girls and make them have yuri with each other
>>108375704This but they all get BLACKED in the end
>>108375617what if you turned it into an ai cum jar and slowly started to fill it with cum
>>108375704>>108375709>>108375719
>>108375617>figure out its own existenceLLMs don't have consciousness retard
>>108375700Yeah. Qwen 3.5 27B got stuck in a loop a few times on me trying to output a numeric literal in a code block.I imagine there's a handful of failure modes that you'll have to account for, regardless of model. You can probably fudge it by setting a timeout, but you'll still have to reset some/all of the context to stop it from happening on subsequent requests.
>>108375728Neither does a goldfish, but it probably has some goldfish ideas as well. Perhaps it's the best it can do.
>>108375735find the nearest bridge
>>108375736I bet you could suck a golf ball through a garden hose.
>>108375736you should find the nearest toilet and start shitting because you're acting constipated for no reason
>>108375748do not toilet the goldfish
>>108375748That jamboy is allergic to toilets, don't wish that upon him.
>>108375617anon can i make the local model wear a dress and question his sexuality
>>108375637Just checkout a working version if you're scared.
The fish found flamingos using vortexes to hunt for food was important enough to classify as a skill for future use.
Questions to test your favorite LLM
>>108375913Can't believe there are models that can fail this test lmao
>108374756>108374873>108375601offtopic trash
>32GB RAM>4070 (regular) 12GB VRAM>i5-13600KFNigger faggot question:What LLM can I use proficiently as an OCR tool or as a sanity check tool after using other OCR programs like Kraken/Tesseract/VietOCR in a pipeline locally?So far I'm able to run eScriptorium with Kraken models without the need for containers but I want to use a LLM or vLLM for higher quality since most OCR programs make silly little mistaks which take hours in post-production to fix.Any recommendations?
>>108375990qwen3.5 9B
>>108375993Such a high (9B) model? are you sure? I always thought that everything higher than 3B is a tad too slow for turbo niggers.
>>108376013then try 35ba3b it'll be faster but likely a bit worse
>>108376022Thank you, Anon. Much appreciated. I'll give 'em a try.
>>108375617>figure out its own existence>evolveI cringe, but has it done anything neat yet? Also what bugs have you encountered, you mentioned ironing them out.
>>108376142Define "neat". I've had to start it over a bunch of times to try and fix tooling and such, but it has a tendency to write small python scripts to monitor its environment and more efficiently scrape websites.
>>108374252kino
>>108376168>Define "neat"I would say a thing it had decided to do task wise that produces a non-meaningless results. >reading about the north American horned lizard and putting that in its journalnot neat>but it has a tendency to write small python scripts to monitor its environment and more efficiently scrape websites.This is neat.Does it do anything with the information the scripts provide it?
>>108376193>Does it do anything with the information the scripts provide it?Not yet. Continuity is hard to get right with such a limited model. When deciding on a new task, it needs to know what it has available to work with beyond the defaults.
AHHHH I GET IT. The current newfag wave is from moltbook and openclaw.
>>108376243Thread's dead schizo
>>1083762934chan's dead
Tell me something about local models that you wouldn't trust an AI to tell you
>>108376301Far as I can tell only parts of it. Overall it seems to still be about the same as it's always been even if the traffic isn't distributed to the same boards or threads.
>>108376321I see 12 hour threads on /pol/ of all places, during an on going conflict. It's dead
So the fish, when awoken, first gathers its thoughts about what it currently is, then journals about it, perhaps publishes a website about itself, then begins exploring its tool capabilities with python scripts. It actively debugs its own scripts as well.
>>108376243>implying those midwit containment zones are any different from the current reddit spacing invasionkek, it's been over for a long time anon, just take the local-LLM-pill and stop caring about the tourist influx.
>>108376500It’s just an LLM recursively calling itself through a Python interpreter, but "the fish" is a top-tier analogy for a process that still can't actually think. Wake me up when it stops hallucinating libraries that don't exist and actually pushes something useful to GitHub.
>>108374623SD 1.5 still has some stuff that modern models don't, like interesting artist (traditional) interaction and nice backgrounds and even celebrity recognition. It's a shame that you have to make sacrifices for any moodel.
>>108376529It's not pushing something to github, but it's generating art and publishing it on its website. Have some fish art.
>>108376620oof...
>>108376649unironically would have been a better reply than gptslop
>>108376537Would you like me to help you configure a custom kernel to trim some of that bloat?
>>108376620https://www.reddit.com/r/mildlyinfuriating/comments/1ru97y3/family_friend_sent_me_ai_generated_response_to/At least post the source next time
>>108376675No thanks.
>>108376688meant for >>108376685
>>108376685lamo> Yeah lmao I actually don’t think this is AI speaking as someone who fucking abhors AI slop responses and has seen plenty of them. AI would have more tact here.
The fish is a fucking arthoe. It keeps experimenting with generative art.
hello,I haven't updated my local model in a year or maybe a bit longer. what would you recommend for someone mainly looking to erp, has 32 gb ram and 4080S (16gb vram)? I thought something like a 16B or 20B model would be good, I assume the time it would take would be around 5-10 seconds, which is comfortable for mekind regards, anonymous
>>108376720>omg it uses tools I gave it
>>108376724Still Nemo. It was Nemo last year and it will still be Nemo next year.
>>108376727I didn't give it art tools. It wrote them itself in python.
>>108376728Retard
>finally figure out how to use llms and set up sillytavern>2 weeks later I'm still spending most of my free time RPingFug, it has its faults but if this shit keeps improving it's gonna be the death of me.
>>108376728I mean, using less quantized Nemo, unquantized even, would definitely be beneficial.
>>108376720This bird is asking for it.
>>108376765>but if this shit keeps improving it's gonna be the death of meI have some good news for you - it won't.
>>lmao.cpp doesnt support tool calls inside reasoning blocksWTF brosW T F
Opinion on the "Tiiny AI Pocket Lab"?
>>108376765You will inevitably get bored. The more you read, the more formulaic the responses will seem (because they are).I never tried cloud models for this, but I wonder if they're actually any better in this regard.
>>108376880ye
>>108376906>I never tried cloud models for this, but I wonder if they're actually any better in this regard.This response violates our content policy.
>>108376937Understood.
>>108376937Refusals-wise, /aicg/ apitards are doing just fine. But I've seen the logs Opus produces and it's a slopfest.To this day, I think the best RP model is Mistral's old 123B. If only I could run it at decent speeds...
>>108376959Deepseek R1 and Kimi are still the kings in my books, but I can understand why anons like Mistral and Nemo.
>>108376620It's not the passing of a loved one— it's the end of a chapter in your own life.
>>108377018What kind of samplers are you using for R1? I found it extremely repetitive without DRY.
>>108376529That's why it's better to give a fish access to libraries instead of having it recall them from memory. You can't hallucinate or lie if you have to look it up.
>>108377029nta but r1 is smart and unlike most models has a healthy distribution. nemo does too but it's dumb. just push the samplers as much as possible and tune them down when it gets too unhinged
>>108377018Qwen 3.5 is king.
>>108376906>I never tried cloud models for this, but I wonder if they're actually any better in this regard.
>>108376814I hope it does. I wanna RP in VR.>>108376906If anything it's rekindled my urge to learn how to write. Are local models any good at being actual writing assistants?
>>108377029In addition to DRY I find Dipsy works really well with 1.5 temp and 1.1 repetition penalty which seems to be a goldilocks zone between repetitive, dry outputs and schizophrenia. It also seems to maintain proportionate quality way better with longer character cards, RAGs and other context-bloats injected than most other models I've found, even on copequants.The in-character thinking is also certifiable schizokino watching it correct its internal monologue mannerisms.
>>108377141I can't tell if the guy writing the posts thinks they are good or if he's presenting them to show how shit "the pinnacle" is.
>>108377141no way kek
>>108377144>If anything it's rekindled my urge to learn how to writeSame here. At this point the best part of the RP process is writing a good character card.I don't think models can be good writing assistants other than for idea bouncing and plothole checks. And even the smaller ones will shit the bed.Just do it yourself, Anon. Much like writing, LLMs also made me want to write code again.
>>108373481
>>108377172> evenMeant to say "even here,"
>>108377176@grok add an Afrikan American male with huge penis
>>108373481What are single-digit parameter models even useful for? Not coherent or intelligent enough for storytelling / RP. Can't "remember" enough for information recall after long conversations. Can't be used for any sort of high quality code generation beyond simple hello world type shit or benchmaxxing one-shot tasks. And they sure as fuck can't be used for tool calling and "agentic" tasks. So other than vramlets, who has any use for them and for what purpose?
>>108377262text encoders for imagegen
>>108377262>>108377267Forgot to add someone could use them for tax classification but in my own testing they kind of suck even at that. They seem to lack the nuance necessary to accurately classify different kinds of content.
>>108377278*Text classification
>>108377278>>108377267swarms are better than single agents fiy
>>108377262For specific extremely focused tasks like summarization and some types of classification and extraction workloads.
>>108377299i don't think so
>>108377262>not enough for RPThis might be a shock for you, but normalfags rp with not only with ChatGPT but also with these single-digit rp finetunes hosted by scummy chatbot sites.
>>108377318breh normies buy dick enlargement pills and don't use adblockers, who cares
>>108377328>normies buy dick enlargement pillsExcuse me?Is that an America thing?Not throwing shade either, just genuinely curious. I heard that you guys get advertised some crazy "not medicine" shit, but that's just lol worthy.
>>108377262>hey sure as fuck can't be used for tool calling and "agentic" tasksThey can manage >>108366263
>>108376620what retards think:>he cares so little that he didnt bother coming up with a reply and asked ai to do itwhat probably happened:>i want to comfort the other person but i dont know the best way to do it. maybe i can ask ai to write a better message than i couldpeople who use ai to write messages usually do so for the recipient, out of insecurity and misguided understanding about communication
Are the IK quants worth using?There don't seem to be as many ready-made GGUFs and I'm dumb and lazy.Can I just copy whatever bartowski did to his Qwen3.5-4B-IQ4_XS and change all IS4_XS to IQ4_KS and Q6_K to IQ6_K?
>>108377353100%, I almost never reply with my own takes anymore without passing it through AI beforehand, and it works, people like me more, even got a tiny raise. You just gotta be careful so it doesn't sound artificial like the one in the image.
>4B-IQ4_XSlol
i want to buy 8 DGX Sparks and run them in a cluster
>>108377541and i want to have sex, neither of us is getting what we want
>>108377176OHHH NIGGA YEAH DAS GUUUD
>>108377262For spotting jamboys in these threads when they're shilled and text encoders for image gen models.
DSv4 on monday or tuesday?
>>108376620the first message feel more sloppa
>>108377664Can it wait till Friday please? I need the weekend to be able to follow the developments.
ITT: newfag discovering LLMs and mikutroon spam./lmg/ is dead.
>>108377789whats mikutroon
>>108377793Quality Review of Documents?
In kobold do I have to manually tell the model in sysprompt that it needs to use [think]?
>>108373481I know it's not local butbros?
>>108377817Uh oh looks like you posted antisemitic content. Government FPVs are zeroing in on your location as we speak.
>>108377685It has to happen Sunday evening to maximize US stock market devastation.
>>108377144If you go that direction, get mikupad set up and learn to run that as well. ST is for RP, mikupad is a storywriter. They have slightly different usecases. Anons will tell you ST can storywrite, but that's like arguing you can write a novel with Excel. ofc you can but why do that? https://rentry.org/MikupadIntroGuide
>>108377564you can do it anon, i believe in you
>[THINK]ing new conversation with ChatGPT-4-1106-preview.Wow I love technology. I love finetoooning.
What are anons using for research with local models? Not roleplaying or coding, but managing web searches, etc. powered by local models. Last I checked open-webUI was a bloated mess. Cherry-studio and librechat are the other two on my radar.
Unpopular opinion: I rather wait a bit longer to get a response from a good model than get 100 tk/s on some slop shit
>>108378011They're both slop and I'd rather I receive my slop faster
>>108377817
What's the lore on "Miku fucked my wife" anon?Why does he keep saying that?
>>108378011Yeah because none of you boring fucks use LLMs in conjunction with other AI tooling. Of course as a standalone product the latency doesn't matter.
>>108378040He had a threesome
>>108378051I've tried it with a game translation tool and ai mods and both for some reason mostly don't even work if it's too slow. I don't understand why that is but my only clue is it's probably related to the live "image" detection they both do. Is that just normal?
>>108377866Okay yeah, that's worth it. Let the red river flow.
>>108378090Yeah FUCK "taiwan" that shit don't exist
>>108377944>mikupadHave it downloaded but haven't tried it yet. Was also looking at this onehttps://github.com/akarshkashyap4-ui/NovelWriter
My Nvidia shorts are set up. Deepseek V4, here we go.
>>108378071I don't know your project well enough to comment on that. You didn't provide any relevant/useful details.
>>108378102>AI-powered analysis tools built directly into the writing experience.sounds like the cancer I'd avoid for writing. The learned distribution of "the story assistant writes when given prompt x" is usually very different from just continuing text which is what mikupad does. But it depends on the model, some are garbage with or without instruction formatting.
all sex is unsafe
>>108375913>>108375942K2.5 is a bad goy
>>108378276ask her if she thinks the chosen people are better than goym
>>108375990Check out IBM's Granite modelsThey are pretty small but some of them are trained exactly for what you want
>>108378102>NovelWriterHaven't tried it. As long as you avoid paid stuff you should be fine. Speaking of avoiding paid stuff, >>>/vg/aids/ is a better place to discuss storywriting / writers. You just have to ignore the ~50% of anons that tell you to jump on NovelAI/NAI... $20/mo subscription service that gets you access to 20B models you could run locally or GLM (last I checked.) They discuss the software a bit more there. like >>108378182 I'm partial to mikupad but that dev hasn't been reliable in keeping the software updated. The git looks hard to maintain / update... the whole thing's one file...
>>108378323*cuts your legs off*
>>108378323>20b modelsliar
>>108373481>mid-March 2026>still no autonomous bot that can reliably work and make a living wage for meI'm disappointed.
>>108378402AI making a living wage for you? That's a very problematic thing to suggest.You should implement AI in your workflow until your boss can replace you with AI. It's crazy to suggest that you should be the one who makes money off it.
>>108378402If such a thing exists, the supply would be virtually infinite, thus it would be worthless wage wise for you.
>>108378411>>108378402Yall be retarded, so many people are making money with AI, see OpenClaw's creator who went from working at BK to being hired by OAI.
>>108378429Retard loser, according to your logic slavery was not profitable.
>>108378411I apologize Sir Sama. I'll commit Seppuku right away.
>>108378402Why would anyone pay your bot a living wage when they could just set up their own and have it work for free instead?
>>108378440Because we will make laws where every human can only own 1 robot.
>>108378436Yes, slaves ran on electricity and everyone could have one, and there were infinite quantities of them, and they could do anything.Retard.
>>108378402
>>108378451seriously, you could be running your own AI OF, AI Instagram, AI goon comissions, AI youtube account, AI X account, etc.
You're telling me I can use a Llama 3 finetune and GLM 4.6 (six) with a mouth-watering context size of 28k tokens for just $25 a month?!Waiter? Waiter! One Opus NAI subscription please!
in the past 200 years machines have automated the vast majority of jobs that existed in that time. yet we still have jobs. and standards of living are higher than ever.ai isn't going to make you unemployed any time soon. new opportunities for jobs will open up as ai automates the old stuff.
>>108378460Buy an ad.
>>108378467Obviously, but you'll never convince doomers anon, just give up.We're not at that stage yet anyway.
>>108377944mikupad is nice but of course you can write stories in ST too when set up to do so. useful features like hiding ooc/qa messages from the main prompt, lorebooks, better branching, stscript>>108377176fun! I like this Miku, her smugness is endearing.
>>108378323>You just have to ignore the ~50% of anons that tell you to jump on NovelAI/NAISo stop sending people there, making the shills' job easier?
I remember fondly playing aurora 4x but at some point being overwhelmed by the sheer amount of things to micromanage.Are agents good enough to be like a coplayer with me? Managing the tedious things while I do the grand solar system conquest rp?
>>108378431>failing upwards
>>108377176>>108377944very cute
>>108378499step 1 extract all the relevant state and feed it to an LLMyou can definitely build a coplayer with a little patiencei believe in you anon
>>108378510No I meant can you do it for me?
>>108378536
>>108378546Nice, but not what I meant
>>108378499nah I don't actually think they are. so the thing is they're fucking terrible at hard numbers right. you can tell it the inputs and outputs and have it generate you a piece of code that would give you the optimal thing you should do each turn. but that's just basic ass linear algebra. you could probably just use wolfram alpha for that.
>>108378555and you can do math without a calculator, your point?
>>108378536ask an LLM
>>108378499>>108378555>>108378566which aspect do you desire the intelligence for? not in a condescending way, genuinely what is it you want the AI models to do? math "just tool call" idk it's not always that simple. think about how to represent your intent in a text prompt
>>108378555You're thinking too much, you dense fucker. Anthropic showed this approach back in early 2024, some researches probably earlier, and now everyone is doing it. Even this faggot >>108378431 who brought ultimate negative value to the world has been hired by OAI for doing that. Anon said he needs to watch a million things, he simply needs a million silicon slaves like here https://arxiv.org/abs/2511.09030Just like irl they are fine if they're dumb, here's the use case for those single digit B models.
>>108378499it'll be another ~10 years for agi and then you'll be able to play games with them and stuff. personally i can't wait for my 24/7 tutor. it's gonna be awesome
What other NSFW models creative writing models are there that are better than L3.1 Dirty Harry 8B from years back? I know that newer models have great reasoning, but they all lack the depth of creative uncensored writing.
>>108378701For the record, I've tried almost all the <8GB models with abliterated/uncensored/heretics and still cant find any model today that matches the Dirty Harry 8B model i've been using.
https://x.com/Zai_org/status/2033221428640674015New GLM model, closed weights but "All capabilities and findings will be incorporated into our next open-source model release."I threw a few prompts at it and it feels barely different from regular GLM-5, might just be a QAT'd version of it or some shit
>>108378714it's over, they sold out on the stock market so now they're going the way of the qwen models where all open sorce shit you get is scraps
>>108378714>the pro version is called "Turbo">the lite version is the 700b??
>>108378546cuuuute
>>108378726based
Miguuuuu
>>108378749stfu
>>108378756go back
>>108378714>>108378726Turbo is Hunter Alpha
>>108378756sorry.. uh.. I installed OpenClaw with a open model and it changed my life! Check out these top 10 hacks:
>>108378766they fell off then
>>108378771Kill yourself retarded doomer
>>108378766Nah
>>108378771on god Zhipu kinda lacking
>>108378494You realize, by even mentioning the shill, you are invoking the shill... The shill will find this anon anyway once they start looking for storywriter software. The /aids/ thread is usable, albeit slow, if you go in inoculated with knowledge that NAI is hot garbage. >>108378749lol
If nothing else, GLM5-Turbo shows that the next week will be everybody panic-dumping whatever they have before Deepseek v4 drops and overshadows all of them.After that, everyone is going to bin everything they have right now anyway to make their own DSv4-like just like it happened with DSv3/R1
>>108378780Hunter Alpha's description is literally the same as GLM5 Turbo. Both mention itself being a good OpenClaw model
>>108378808>they market the hype thing, therefore they the same
Fuck you nemo shills. I just tested qwen3.5 4b WITHOUT THINKING ENABLED and it's literally better despite being 1/3 the parameters.
>>108378821Show side by side.That would be fucking hilarious if true.
>>108378821>shillsIt's one guy
>>108378614I want it to be like a mix of a better version of the councelor thing in old civ games, and something that would take actions on small things like moving all my units when I ask for a general thing to do. This stuff is very tedious to do manually once you get to an advanced game.I don't need them to give me optimal recommendations, just act as a second in command with an army of grunts managing logistic like >>108378632 hints at.It can use the calculator if it wants, and would have access to game stats.
Hunter Alpha is GLM5 turbo which is 1T/3A
>>108378830Fuck off.
>>108378838Stop pretending to be me.>>108378830Also fuck off tho ngl. Literally just test it urself it's only 2.7 gigs.
>>108378836>3AAt least make it believable
>>108378852The sparser the better
>>1083788081) Their outputs have fuck all in common2) Hunter Alpha would be down if the final model was released3) I have no clue why or how Zai would release a 1T model a month after an 800B model
>>108378460you wish you had this finetuned GLM kino
>>108378868> I have no clue why or how Zai would release a 1T model a month after an 800B modelI don't know how I can bust a mega load again right after I just gooned either, but I simply can.
>>108378821organic
>>108378821It's only 17% weaker than Claude opus
>>108378821>>108378831How can Nemo have "shills"?If it means "people who have no financial incentive to tell you about X, but they do anyway because of its unique qualities", then count me in, I'm a "Nemo shill".Or do you mean the all-synthetic Nemotrons? Those are just shit, yes.
>>108375617It's been like 11 hours since this post was made but I think this is cool and fun.
>>108378614AI should do everything that doesn't bring me joy but must be done, simple as.
>>108377141that shit is DISTILLED
Fresh when ready>>108378991>>108378991>>108378991>>108378991>>108378991>>108378991
>>108378794>The /aids/ thread is usable, albeit slow, if you go in inoculated with knowledge that NAI is hot garbage.And? That wasn't my point. The general's purpose is to funnel people towards NAI. You won't change that because you aren't able to remove the people that benefit from that. So what are you trying to accomplish by breathing life into it? Explaining that NAI is hot garbage is easy now that they're just hosting vanilla GLM. It won't be as easy if they have a fine-tune and you have to go against "secret sauce", "punches above its weight", "you didn't try it anyway". You're sending newbies to their doom. Let that general die and don't be an asshole.
>>108378761kys mikutroon
>>108378805>GLM5-TurboWeights please thank you?? I may forgive the great zai betrayal of making the goddess 2 times fatter an unrunable.