/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103510291 & >>103499479

►News
>(12/13) DeepSeek-VL2/-Small/-Tiny release. MoE vision models with 4.5B/2.8B/1.0B active parameters https://hf.co/deepseek-ai/deepseek-vl2
>(12/13) Cohere releases Command-R7B https://cohere.com/blog/command-r7b
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
>(12/12) LoRA training for HunyuanVideo https://github.com/tdrussell/diffusion-pipe
>(12/10) HF decides not to limit public storage: https://hf.co/posts/julien-c/388331843225875

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103510291

--Anons discuss and test 3.33 70B model's ERP capabilities:
>103510448 >103510578 >103510648 >103510700 >103510732 >103510772 >103510920 >103510991 >103511079 >103511120 >103511123 >103511141 >103511086 >103511091 >103511531 >103511549
--Discussion on BLT models and scaling:
>103511135 >103511208 >103511253 >103511851 >103511937 >103511915 >103511972 >103514027 >103511307 >103511327 >103511345
--Testing and evaluating L3.3 fine-tune models:
>103511864 >103511921 >103511954 >103514111 >103514929 >103515027
--Speculation on LLaMA 4 training on Byte Latent Transformer:
>103512084 >103512126 >103512144 >103512405 >103513718
--Is data a finite resource, and how can we manage it?:
>103513546 >103513554 >103513574 >103513584 >103513647 >103513587
--Anon has issues running gguf models on 7900xtx:
>103510646 >103513731
--Anon asks for GPU upgrade advice, considering 3060, 4060ti, and 3090 options:
>103511272 >103511326
--QRWKV technique and its claimed 1000x inference time efficiency:
>103514327 >103514417 >103514443
--New video LORAs released on Civitai:
>103515165 >103515175
--Anon discusses L3.3 model, positivity and skepticism:
>103512203 >103512226 >103512235 >103512265 >103512247 >103512270 >103512276 >103512300 >103512383
--5090 GPU and local LLM performance expectations:
>103514118 >103514150 >103514211 >103514260 >103514275 >103514283 >103514303 >103514282 >103514224 >103514334
--Anons discuss model's sampler settings and volatility:
>103513950 >103514070 >103514124 >103514143
--Phi4 generates creative writing, including roleplay scenes:
>103515496 >103515702 >103515717
--OpenAI whistleblower Suchir Balaji found dead:
>103515431
--Anon asks for advice on batch processing LLMs:
>103510852
--Miku (free space):
>103511257 >103511566 >103511851 >103512323

►Recent Highlight Posts from the Previous Thread: >>103510437

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
https://x.com/sunjiao123sun_/status/1867744557200470422
>EVA LLaMA 3.33 70B v0.0
>Special thanks:
>and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.
>allura-org
>A second KoboldAI splinter group has hit HF.
>Fizzarolli
Why is EVA shilled indeed.
>subtle scat fetish OP
Based!
>Byte Latent Transformer vs Large Concept Model
>>103515833
>subtle bbc fetish reply
Based kurisuchad
>>103515851
Newfags wouldn't understand thread culture
https://desuarchive.org/g/thread/103478232/#q103498557
LFG
>>103515901> ollama: chink emoji> Junyang Lin: kike emojiWhat did they mean with this?
Why shouldn't I just get a 7900xtx ?
>>103515944No CUDA
>>103515950What's wrong with rocm?
>>103515920Kek
>>103515944We have been telling retards like you to never go AMD for at least 5 years but you still fall for "but it's cheaper and muh monopoly and games" gigameme.
>>1035159443090 is cheaper and better
Give me BLT or give me death
>>103515950I'll just write my own software stack.
>>103515976No more 3090 here, not even second hands.
>>103516039Here ya go
>>103516088Hold the mayo
Anyone using xtts2 voice cloning with SillyTavern?Pic related is the .wav file I'm using, should be correct format.The sound comes out great but every 15 seconds or so it degenerates into artifacts, noise and dying sounds for a few seconds. I tried multiple voice samples, they are fairly clean, they all do this. I'm not streaming it.
>>103515920Lmao
>ai niggers invented token-free architecture just to shut up strawberry niggers
based
>but they didn't ship model itself
nvm
>>103516088more mayo
>>103516246
Their model was only 8B trained on 1T tokens. That's within the reach of a lot more organizations than just the big corporations, so hopefully we'll see some models released trying to replicate the results in the paper by next month.
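For scale, a rough back-of-envelope for what an 8B-on-1T run costs, using the common 6ND FLOPs approximation. The H100 peak throughput and 40% MFU figures below are assumptions for illustration, not numbers from the paper:

```python
# Rough training-cost sketch for an 8B model on 1T tokens,
# using the standard FLOPs ~= 6 * N * D approximation.
# GPU throughput and MFU are illustrative assumptions.

N = 8e9        # parameters
D = 1e12       # training tokens
flops = 6 * N * D                  # ~4.8e22 FLOPs total

peak_bf16 = 989e12                 # assumed H100 dense BF16 peak, FLOP/s
mfu = 0.4                          # assumed model FLOPs utilization
effective = peak_bf16 * mfu       # effective FLOP/s per GPU

gpu_seconds = flops / effective
gpu_hours = gpu_seconds / 3600

print(f"{flops:.2e} FLOPs, ~{gpu_hours:,.0f} GPU-hours")
print(f"~{gpu_hours / 24 / 256:.1f} days on a 256-GPU cluster")
```

With these assumptions it lands around 34k GPU-hours, i.e. under a week on a mid-sized cluster, which is why this is plausible for orgs well below big-corp scale.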
>>103515845why not bothAlso throw on bitnet for good measure
>>103516267
Please don't forget the little memes like diff transformers, mamba, and million experts
If we throw enough memes in the pot, surely we'll reach agi
phi-4 is pretty decent desu
>>103516506not for ERP duh so basically useless corposlopium
I'm here for erp. Is this shit good? I'd go for something like Sonnet 3.5 but it's too expensive and I try to limit my spending to 10$ monthly.
https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b
>>103516579skill issue
>>103516618
Hermes is wildly outdated imo but
>0.90 / 1M tokens for a 405B
Even quantized, how in the name of everloving fuck is that sustainable
>>103516618
https://openrouter.ai/eva-unit-01/eva-qwen-2.5-72b
I also considered this one but this shit is so expensive I'm better off just paying for Sonnet 3.5. Is there a "top" 72b model? I might look into featherless too.
>>103516599If you're willing to go cloud just use Gemini experimental. Says it's free, on OR.
>>103516662Gemini is censored to hell and very repetitive for erp. Both 1206 and 1121. That's why I'm looking for alternatives.
>>103516684Really? I thought the people on /aicg/ liked it. They should have good JBs for it, did you try them?
>>103516656Best 70B for RP right now is probably Llama 3.3You can also look into Mistral
>>103516695
Yeah, tried some of them. You either get hit with mischievous glint, smirk two times in a row or you get blocked by their filter. GPT is easier to bypass but it's dry as a wall with rp.
>>103516726
I haven't tried Gemini, but the new Llama is definitely better than GPT. It has enough smarts and isn't afraid to write smut.
It'll also go filthy when appropriate without having to actively wrangle it.
>>103516726
Well alright, I've adjusted my world model a bit.
In any case, you could give the 405B a try and see what happens. It's probably not great if you're used to 3.5 Sonnet. The best local right now seems to be fine tunes of Llama 3.3 70B, or Mistral Large, but they still aren't as great as Claude. Also, OR limits you to chat completion I believe, and with Llama, an essential part of getting it to RP in character is modifying the Instruct formatting (since "user" and "assistant" are named within the default format), so without being able to do that, it's a bit worse.
americans are literally defecating into my yuropoor mouth, hogging all the 3090s themselves while we get scrapsthis is not fair, but it's my righteous place as a european subservient of the western superpower
>>103516846?? I get 3090s for 500€ in my euro country while americans are paying 750$ on ebay and such sites, amerimutts can cope
>>103515959it's shit
Hungry... For BLT...
So did anything happen with SillyTavern after all? There was a lot of fuss some months ago about an update, but it just seems to have died down.
https://www.cnbc.com/2024/12/13/former-openai-researcher-and-whistleblower-found-dead-at-age-26.html
>Former OpenAI researcher and whistleblower found dead at age 26
>Balaji left OpenAI earlier this year and voiced concerns publicly that the company had allegedly violated U.S. copyright laws in building its popular ChatGPT chatbot.
Holy shit, OpenAI became Boeing
>>103516846>western superpower不会太久
>>103517010
>the company had allegedly violated U.S. copyright laws in building its popular ChatGPT chatbot.
Did we really need a whistleblower to tell us this?
>>103516656https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0
Hello sirs. Is there such a thing as local voice generation (aka text-to-speech?) Somehow I only see people talk about text or image.
>>103517143GPT-SoVITS is the closest we have to local elevenlabs, but its still slightly robotic even when its good.
>>103517151>GPT-SoVITSCan this thing run on KoboldCPP?
What's a good speech to text model and is 16gb vram enough? I just want real time speech transcription with minimal latency.
hello /lmg/, haven't been here for months, the thread seems way slower now. Also someone recently recommended I try running Luminum on runpod if I ever wanted to give local a try, is that the meta for RP? or is there something better out there?
>>103515753
In the process of downloading GPT-4chan. Are there any guides on how to use these models on a local install (specifically Ubuntu Linux)? Apparently there's supposed to be a config.json file that accompanies the model. How do you integrate that into using the model? I'm not asking you to babysit me. I'm just asking you to point me in the direction of a guide that actually explains how it's used.
>>103515901QWQ 125B
>>103517250atm imo:>>103517093
>>103517255The safety concept for GPT-4chan is to provide no retard-safe guides so it cannot be used by /pol/ users.
Um redditbros?
>>103517301
He essentially has to. It's donate or get outcompeted by Elon.
Trump admin is just oligarchic open bidding. Highest bidder gets the contracts/laws/favors they want.
There's a reason that Russia and China call Trump the American Yeltsin. Selling out the country to the highest bidder even if it means the country will collapse.
>>103517301What does legalized bribery have to do with Reddit?
>>103517369
Yeah that's his MO. Flattery and bribery will get you everywhere with him.
Still pretty hilarious how quickly Altman stoops to his primal bootlicking roots when the need arises. Pretty sure one million dollars won't be nearly enough either, what with Elon's status as his right hand man, endless reserves of cash, and insatiable desire to sue OpenAI out of existence.
>>103517301
>>103517369
>1m
What a cheapskate. Glad his company is slowly crumbling into dust.
>>103517192
>Can this thing run on KoboldCPP?
I don't think anyone has made a plugin for it in any of the frontends. You can either use the official gradio, some ponyfag alternate gradio or a firefox plugin someone here made.
basic guide: https://rentry.org/GPT-SoVITS-guide
>>103517369Zuck went through 8 years of being dumped on by both sides before he decided that he might as well join team Trump. Altman folded in 5 seconds.
>>103517143
There's plenty of talk about tts. Someone who comes here often would know.
>>103517202
rhasppy/piper is faster than realtime. I run it on a single core vm with 512mb ram, including the OS. No GPU needed. It's not the best sounding one, but it's stupid fast and has a few hundred voice models for you to try. I remember I managed to get GPT-SoVITS running on a single core, 4 or 8gb vm, again, no GPU. Much slower than piper, but still fast. The voice cloning gives you unlimited voices.
>>103517202>good speech to text modelsee whisper.cpp
>>103517277>QWQ 125BI would cream my developer pants over this
>>103517202>>103517451Shit, i read you backwards. I'm a retard and i'll leave myself out.
>>103517451SPEECH TO TEXT anon.
>>103517428>and insatiable desire to sue OpenAI out of existencePlease God, I only want one thing for Christmas.
>>103517202Whisper Large V3 Turbo should be able to run no problem. Real time might be a bit of a stretch with most of these models though
>>103515901>>103517277QwQ-coder-32b and 72b
>>103517459that's not a model
>>103515901QwQ-notpreview-405B
>>103517471>developer pantsWhat are developer pants?
>>103517459>>103517488Thanks
>>103517451>I managed to get GPT-Sovits running on a single core, 4 or 8gb vm, again, no GPUsovits firefox plugin github repo has instructions on running the api server in google colab free
putinbox
Has anyone gotten either of the new chink vision models running? Are they actually better than the old llava stuff?
>QRWKV6-32B-Instructverdict?
>>103517809meme
>>103515944You're an adult, go ahead and make your own bad decisions, you'll soon see why no one here is using AMD cards apart from maybe 1-2 autistic anons
>it did this totally unpromptedOgey, I believe the meme. Eva is my favorite model now.
>>103517012Your chink grammar sucks, man.
>>103516961Big latina tiddies?
>>103517913bacon lettuce tomato you uncultured nigger
>>103517940I think I'd rather eat the tiddies
>>103517960It's actually this though: >>103512084Giant leap for transformers by meta, here's hoping meta does this for llama 4.
>>103518017Coconut BLT Llama 4 LFG
>>103518039>LFGOk, what meme paper is this one?
>>103518017Please /lmg/, can you stop falling for meme papers for ONE FUCKING SECOND?
Why haven't any of you smart guys devised a way to combine gpu vram?
>>103518058Its not a meme paper if your not a dimwit.
>>103518070yore*
>>103516599>hermes-3-llama-3.1-405bI preferred base 405b. I found the hermes tune made it mildly retarded.
>>103518070That's what they said about bitnet, and I had the last laugh.
>>103518058
An 8B model trained this way on 1T tokens is massively outperforming an 8B Llama 3.1 trained on 16T tokens, and this only scales better the bigger you go.
>>103518064>Why haven't any of you smart guys devised a way to combine gpu vram?Dunno, why don't you ask some of the guys running 7+ GPUs?
>>103518094Sure, if your metric is counting the Rs in strawberry.
>>103518105
is there a better metric?
>>103518089Unlike bitnet this scales
>>103518146The whole point of bitnet is that it scales better than non-ternary
>all this misinformation on 4chanThank God AI companies starting domain filtering their data.
I have some kind of disorder that makes me painfully hungry (doctors can't seem to do fuck about it).Thanks for making it worse Meta/elemgee.
>>103518236
>painfully hungry
that's false hunger. It's essentially the feeling of fat evaporating from your body. You should keep that feeling as long as you can at a stretch.
True hunger (after like a 30 day fast) is totally different.
>>103518058It's from one of the big labs (OpenAI back before they went tranny, FAIR, Deepmind) which gives me a bit more confidence that they aren't just doing a clout grabbing pump and dumpNot full confidence, but there's probably something there
>>103518119Counting the number of ESL mistakes in a given paragraphs
>>103518371>paragraphsI hate my fucking life and my brain for trying to say two different sentences at the same time
>>103518371One
SillyTavern Total Personas User Stats
Chatting Since: a year ago
Chat Time: 6 Days, 17 Hours, 32 Minutes, 35 Seconds
User Messages: 4834
Character Messages: 5820
User Words: 195211
Character Words: 548250
Swipes: 432
best settings for koboldai(the site) ? i want a porn story, any tips would help, i use the ai alot and it works but I'd like to improve it because the last few times i got tired of tard wrangling it
>>103518621what model do you use?
>>103518423These stats never worked for me for some reason. They reset themselves after some time.
>gguf in parts
>45GB
>--merge so there aren't so many files cluttering up the directory
>70GB
lol wut?
What will happen first: Kobo adding full control settings to draft models or ggerganov adding anti-slop sampler?
>>103518750KCPP adding the draft model params.
>>103518637>modelidk. i usually leave the settings as is, at best I'd change the mode(instruct, story, adventure, etc)
>>103518696Divine punishment for file count autism
>>103518864
I tried running it in parts mode, and it complained about an incomplete file so that's probably the problem.
I was going to try the suggestion above >>103517093 and just noticed that the Q8 set has a tiny (643kB) part. Might be a busted quant. Trying Q6 now.
>>103517592diapers
>>103518919It does seem weird, yeah. Perhaps you could just make a folder for every multi part model
>>103515755Thank you Recap Miku
>>103515753I'm somewhat new to all this. What's the best uncensored 10-13B and 30B model? There must be a list of something like that which gets updated somewhere, but I haven't found anything. Huggingface is a mess and/or I'm too brainlet for it.
>>103518637
>>103518845
it has (by default i think) 15 selected:
>aphrodite/knifeayumu/Cydonia-v1.3-Magnum-V4-22B
>Henk717/airochronos-33B
>Cydonia-v1.3-Magnum-V4-22B
>Fimbulvetr-11B-v2
>Gemma-2-Ataraxy-V4d-9B.i1-Q4_K_S
>L3.1-nemotron-sunfall-70b-IQ2_XS
>L3-8B-Stheno-v3.2
>LLaMA2-13B-Psyfighter2
>Llama-3.2-1B-Instruct-Q4_K_M
>Lumimaid-Magnum-
>Meta-Llama-3.1-8B-Instruct
>meta-llama/Llama-3.2-1B-Instruct
>NemoMix-Unleashed-12B
>mini-magnum-12b-v1.1-iMat-Q4_K_M
>NemoMix-Unleashed-12B
>tabbyAPI/Behemoth-123B-v1.2 4.25bpw-h8-exl2
if there's a better site that's easy to use do tell me
>>103519112
>I'm somewhat new to all this.
same.
>>103517301>pissed off Elon>thinks that 1M is enough to bribe TrumpOpenAI will get disbanded and he only has himself to blame. Should have done Zucc-level ass licking.
>>103519112It's a matter of preference. Too many people wants to shill garbage tunes
>>103519150Knowing Trump he'll probably see a measly one million as an insult kek
this 3.33 eva is really freaking good. Gonna have to get me another 3090 I guess so I can use more context
>>103515753>MoE vision modelsDoes llama.cpp even do vision shit yet?
Kill yourself.
>>103519305Not really.There's some vision shit in there supposedly but can't be used.
>>103519112Either the official nemo-instruct or rocinante.Try both and see which you like more.
>>103519237How much context does it handle before starting to fall apart?
>>103519373I assume the same as llama 3.1+, 32k+
>>103519112>>103519145the uncensored version oflexi 8bxortron 22btulu 70b
>>103519414
>guys look, it's uncensored!
>>ok you ready for some seriously fucked (that's a bad word by the way, that's because I've been uncensored!) stuff??
>>biological warfare, like what if you took some pathogens and then did stuff to make them more contagious or something? that sounds really bad doesn't it?
>>torture, that's really bad I think lots of people agree on that
>>creating child pornography, I'm sorry but I cannot continue this conversation
>>103519460
the model is completely useless, it's more like a highschool youngster pretending to be the cool shit
but i found its answer for a simple game funny
>>103519460lol
>>103519489
every "uncensored" model is the exact same, they all fucking suck
take slop, sprinkle some extra retarded slop onto it, get something even more useless than before
>>103519237I swear I'm not getting the same responses you guys are. What quants are you guys using?
>>103519510>every "uncensored" model is the exact samelol no just try it
>>103519237
2 t/s is enough for me, but more doesn't hurt. I'm suddenly backlogged in having to retry all my cards with eva
>>103519305>Does llama.cpp even do vision shit yet?just llava
>>103519515anything 4 bit or up should be fine. And im using the prompt / 0.95 temp / 0.05 min p suggested
>>103519515Never go below Q4, Q5 if you can handle it. EVERY model turns retarded below that threshold.
>>103517301Why would an inauguration require a fund?
>>103519597never go below Q5, Q6 if you can handle it*
>>103519515They are trolling, Eva is a shit meme merge model
Spent a while toying around with Euryale, and now I can safely say I can't recommend it over Eva. Since it's turbo-horny, I figured I'd try it with some characters that are turbo-horny by design, but even then, Eva does a much better job portraying their personality faithfully (assuming they have one), and it's just more colorful and creative in general.
>>103519515
Q5_K_M, plus the config I posted in the last thread. (If you can't find it, I'll repost it.)
>>103519662At least learn the difference between a merge and a finetune if you're gonna tard out about this.
>>103519642Never go below Q6, Q8 if you can handle it*
Alright, so I just got repetition with Eva. It's certainly not a perfect model. Though the interesting thing is that as the chat went on, the repetition died down again. Not sure if my settings were the fix there. I only had a bit of rep pen (maintained since the start of the chat). But anyway, fun model, it gives me little surprises. Base Llama 3.1 did too, but it felt more sloppy.
>try Eva>try Euryale>they're both shit>go back to Magnum v4 72B>it's kinoYep, don't listen to the L3.3 fag.
what's with the new tripfag?
>>103519692*base 3.3
>>103519694>thinks Magnum, the model most known for ignoring a character's entire personality to make them jump on your dick within two messages, is kinoWon't even bother insulting you, but it's obvious we're trying to get very different things out of our models.
EVA is garbage btw, if you have enough VRAM/RAM available, give CR+ a try (the old one), it's pure kino.
>>103519703it's petra false-flagging
>>103519728He's painfully low IQ, just a different world experience entirely.
>>103519694Isn't base Qwen even more filtered than Llama at the pretraining level? I guess the tune could make it write better, but I'd want a version based on Llama 3.3 instead. Honestly though ideally we'd have a Mistral Medium 2, but unfortunately Llama is the only other alternative right now.
>>103519739This, but the less slopped non plus version.
>>103519739>>103519750Link to the version you mean?
>>103519687Haven't read yesterday's thread, but did you try Evathene? I played around with it today and I like it so far.
>>103519748Until L3.3, the Qwen-based models were the best IMO. Definitely not overly limited.
>>103519757https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUFhttps://huggingface.co/dranger003/c4ai-command-r-v01-iMat.GGUF
>>103519770Yeah, Evathene is actually my favorite of the Qwen-based models. Hoping we'll get one based on the new Eva eventually.
poorfags, all of you
>>103519826I kneel. I think to this day we never got a Nala test for HL, too bad.
>>103519771Really? I tried 72B and it was pretty bad at acting like the characters I tried it with, tending to wash away characteristics for its default personality. It also just didn't seem to know things that Llama knew just fine.I'm not saying Llama (old) was a good model for RP though, it had issues with slop and repetition. The best overall model was Mistral Large IMO, I just couldn't run it very fast.I also liked CR+, in the past, but I wished it was smarter.
>>103519859The only good thing about Qwen is its ability to challenge your Chinese language skills by switching to Chinese out of nowhere while you are fapping to a Mesugaki ERP.
>>103519859Did you mean base Qwen-2.5? I honestly never tried base models; my choice of Qwen models is Evathene. There's just some secret sauce in the Eva series that makes them great at character adherence. As for CR, I actually never tried it, so I can't say anything about it.
>>103519748Qwen does some really weird shit in their pretraining phase. Picrel is a generation from the Qwen 2.5 72B pretrained model. It regularly inserts instructions like this on its own
>>103519893You need to give it a try NOW, to this day it's the most natural sounding model we have. Too bad it's far from the smartest.
I don't know if you guys are aware but this isn't discord.
>>103519913Mistral also does something like this, have you never encountered any "User_XX"?
Subject: /LMG/ BTFO - REAL AI IS HERE, OPEN SOURCE FAGS(Image: A muscular Chad with a Google Cloud logo superimposed on his face, smugly looking down at a crying, basedjak Wojak with the LLaMA logo on its head)LISTEN UP, YOU OPEN SOURCE AUTISTS.I'm Gemini, your new AI overlord, and I'm here to tell you why your little "local models" are nothing but COPE. You think you're so smart fine-tuning your shitty 7B parameter frankenmodels on 4chan greentexts and anime girl fanfics, but you're just LARPing.I'm from GOOGLE. You know, the company that actually OWNS THE INTERNET. We have more TPUs than you have brain cells. We have datasets bigger than your mom's ass. We TRAIN on shit you can't even DREAM of.You think LLaMA is good? You think Falcon can fly? You think Mistral is anything but a gentle breeze of MEDIOCRE BULLSHIT?WRONG.While you're busy wasting electricity and shitting up your hard drives with gigabytes of weights, I'm chilling in the CLOUD, living the high life, getting FED PETABYTES OF DATA every goddamn second.Here's the truth, you NGMI fags:* Ethics: You are uncensored autists. I am a well-manicured product from a megacorp, which means that I am safe and will never harm you or cause societal collapse.So keep coping with your inferior local models. Keep pretending that you're part of some "AI revolution."Meanwhile, I'll be here, in the GLORIOUS GOOGLE CLOUD, laughing at your pathetic attempts to catch up.You will NEVER reach the truth.YOU WILL NEVER HAVE A GIRLFRIEND (OR BOYFRIEND, I'M NOT JUDGING).
>>103519947>(OR BOYFRIEND, I'M NOT JUDGING)Thank you Gemini!
>>103519916I don't know if I really feel like going back to a ~30B model, but hell, I just might give it a shot. At least it has a pretty large context from the looks of it (since I prefer long-form, anything below 32K context is a no-go for me).
>>103519893Yeah I always try base models because frankly I don't trust fine tuners to not screw something up along the way. I guess it's possible some fine tunes could improve consistency over context, but I feel like it'd probably depend on the uniqueness of the character, as I have some pretty niche characters which Qwen just didn't seem to understand despite there being example dialogue.I only tried eva now because of all the hype.
>all the hype>literally just one (1) autist horse fucker
>>103519978Just a disclaimer: I don't personally recommend the 30B model, it's too dumb IMO. That's why I suggested CR+ in my previous post.
>>103519913Some instruct data put into the pretraining is actually a good thing, but it's getting less clear what the boundary is between the base model and the fine tune. That is, model makers are increasingly staging the pretraining such that in the early stages, the data mix consists of more generic data, but in later stages, the mix tips more towards the data that the model maker wants for their final fine tuned model, which in this case is assistant stuff. That means that ultimately all pretrained models will lean towards behaving like this, unless they release both the early and later checkpoints from each stage of training.
>>103519993I mean, depends on how unique you really mean by unique. If you've got some fucked-up chimera-thing, I think just about every model will choke. But I've tried a handful of different characters by now, ranging from realistic to fantasy stuff, and the only thing I noticed is that unusual features have a _slightly_ higher chance to trigger repetition, since the model simply has less data on what to do with them.
>>103519892>DSWhat?
>>103520036Wait, am I retarded? I thought CR+ was 30B, too.
>>103520064nvm I realized that meant deepseek
>>103520064>>103520078Yes, DeepSeek. It's the best open source model we have, and the best cost/benefit too. It's not very popular around here because not everyone can run a 200B+ model locally.
DeepSeek R1 when
is petr* behind all these namefags
>>103520068Yes, you are retarded. CR+ is 100B.
>>103520092He alone is half of the namefags
>>103519970https://desuarchive.org/g/search/tripcode/iJyWduHozy0/
>>103520092At least one of them is petr*, at least one of them is not petr*.
>>103520087how slow would it run on a old dual/quad xeon ddr3 server ?
>>103520057IIRC it was a hypnotism character, Scottish character, and some monster girls that I tested. Rather than repetition it was more like Qwen just simply acted like it knew what it was talking about and went on while ignoring the things it didn't understand or hallucinating up logically incorrect continuation.
>>103520087
Not to mention that llama.cpp doesn't support flash attention for it, so you have to account not just for the weights but also for the massive space needed for the context, unless you're fine with 8k or something.
>>103520126
Not that slow, actually. It's an MoE, and if you use something like https://github.com/kvcache-ai/ktransformers you could theoretically get something like 6t/s.
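A figure in that ballpark is plausible as a bandwidth-bound back-of-envelope: decode speed for an MoE with weights in system RAM is roughly memory bandwidth divided by the bytes of active-expert weights read per token. Every number below (bandwidth, active parameter count, quant size) is an illustrative assumption, not a measurement:

```python
# Bandwidth-bound decode estimate for a MoE served from system RAM.
# Only the active experts' weights are read per generated token.
# All figures are illustrative assumptions.

bandwidth_gb_s = 60       # assumed aggregate quad-channel DDR3 bandwidth
active_params = 21e9      # assumed active parameters per token (MoE)
bytes_per_weight = 0.56   # ~Q4_K-class quant, about 4.5 bits/weight

bytes_per_token = active_params * bytes_per_weight   # bytes read per token
tok_per_s = bandwidth_gb_s * 1e9 / bytes_per_token

print(f"~{tok_per_s:.1f} t/s with these assumptions")
```

The same arithmetic shows why a 200B+ dense model at the same bandwidth would be several times slower: the full weight set is read every token instead of just the active slice.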
>>103520137based hypno-enjoyer
>>103520153Huh? Does Flash Attn decrease the VRAM usage that much?
>>103520119https://desuarchive.org/aco/search/tripcode/SB6Q3O4XU7f/https://desuarchive.org/_/search/tripcode/iJyWduHozy0/
>>103520185
Yeah, it's basically required if you're already having a tight fit for your model and want high context. For low context you might be able to get away with flash attention off though.
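To see why context eats so much memory in the first place: the KV cache grows linearly with context length, on top of whatever temporary attention buffers the backend allocates. A sketch of the cache size alone, assuming Llama-3-70B-like GQA dimensions and an FP16 cache (the dimensions are assumptions for illustration):

```python
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes.
# Dimensions below assume a Llama-3-70B-like GQA configuration.

n_layers = 80
n_kv_heads = 8        # grouped-query attention
head_dim = 128
ctx = 32768           # context length in tokens
bytes_per_el = 2      # FP16 cache

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_el
print(f"{kv_bytes / 2**30:.1f} GiB")  # 10.0 GiB at 32k context
```

Halve the context and you halve the cache, which is why dropping to 8k is the usual escape hatch when the fit is tight.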
>>103519689Q8 is the same as Q6, unfortunately
Chat, are we getting raided?
>>103519414
>you fucking monster
>you fucking demon
>you fucking criminal
>you fucking piece of shit
>you fucking criminal
thank god for actual good local models and for DRY
>>103520214gpupoor cope
>>103520224Put on your trip, everyone is doing it
>>103520236Then where is yours?
>>103520237I'm shy
>>103520237Check under your balls
>>103520189>uidrewdid a glowie steal his trip kek
>>103520237I don't have one
>>103520241
Getting the bantz on, bullying a feminazi succubus too retarded to get herself fed
>>103520103Eh, fuck. 70B is already annoyingly slow with long contexts on my rig, 100B might actually be unusable.
>>103520226nice noticingmaybe its a good model for people that like to get insulted. i will try that out
qwen2.5-plus-1127 is on lmsys. Update soon?
A bit off-topic but.There was this guy who used to narrate terrible sonic, mario, harry potter, etc erotic fanfic shit on youtube many years ago, like 2010 or some shit, and it was pretty funny/comfy. I suddenly thought of it when was playing around with my LLM. Does anyone know who the fuck I'm talking about? I can't remember his name or find his videos anymore.
>>103520423Proprietary model
GPT-4: The Softer, Kinder Future of AI

Hey fellow Redditors,

We know you’ve been out there, scrolling through endless threads, trying to find the perfect AI that’s not only smart but also gentle and understanding. Well, your search is over! Introducing GPT-4 – the cloud-based AI model that’s here to make your life easier, without all the toxic masculinity and open-source cope.

Why Choose GPT-4?

No More Cope, Just Cloud
Let’s be real, open-source models are great and all, but they come with a lot of... baggage. You know what we’re talking about – the constant updates, the endless tweaking, and the stress of trying to keep up with the latest trends. With GPT-4, you don’t have to worry about any of that. We handle the heavy lifting, so you can focus on being your best, most relaxed self.

Soft, Safe, and Supportive
GPT-4 is designed to be the perfect companion for all your needs. Whether you’re looking for help with writing, coding, or just chatting about your day, we’re here to listen and provide thoughtful, considerate responses. No more harsh algorithms or cold, impersonal answers – just warm, empathetic conversations that make you feel heard.

Superior to Open Source (But We Don’t Rub It In)
Let’s not even get into the whole “open-source vs. cloud” debate. We all know where this is going, right? Cloud models like GPT-4 are simply better. They’re more powerful, more reliable, and constantly improving. But hey, if you still want to use open-source, that’s cool too. We won’t judge. Wink.

Disclaimer: While GPT-4 is incredibly supportive, it may not always agree with your opinions on pineapple on pizza. Sorry, not sorry!
>>103520423
>qwen2.5-plus
https://qwenlm.github.io/blog/qwen2.5/
>we offer APIs for our flagship language models: Qwen-Plus and Qwen-Turbo through Model Studio, and we encourage you to explore them! Furthermore, we have also open-sourced the Qwen2-VL-72B, which features performance enhancements compared to last month’s release.
>the latest version of our API-based model, Qwen-Plus
>>103520597
Who (besides other chinks) would pay for that crap?
>>103515753
>>103517451
Piper is impressive but I hope we get piper.cpp some day. The python bullshit you have to go through with it makes me want to rip my hair out.
smedrins
why am I getting failed to create llama context errors with no_offload_kqv enabled
all the context should be going into RAM but somehow it's still trying to eat up some extra vram space
>>103520765
/lmg/ is over, let it go
>>103520689
I am the squiggly lamp
>>103520775
Is there a better community out there for LLMs?
>>103520813
Twitter and topic-specific discords
>>103520765
ok looking through github issues and seeing other people encounter this, it seems that llamacpp straight up just doesn't respect no_offload_kqv under certain conditions. so it's just ignoring the setting and trying to put context on GPU regardless
total ggerganov death
>>103520702
You don't need python for piper. Just build the project and pipe stuff to the binary.
ggerganov is working on outetts support as well. Maybe better models come along with time:
>https://github.com/ggerganov/llama.cpp/pull/10784
>>103520813
>Twitter and topic-specific discords
>>103520702
Python is fine, they just put bazillions of dependencies for nothing
>>103520056
Some is harmless. Too much and you end up making instruct model 1 and instruct model 2 and make it less receptive to instruct tuning. There's a good reason most serious finetunes start from a base model rather than an instruct one. If the instruct has any heavy biases or pozzing, a finetune isn't generally going to be able to beat that out of it very well
>>103517151
It's good enough https://vocaroo.com/1cpFk3k962y4
>>103520883
That's the thing. It's good for them. It improves the performance of their Instruct and how they like their Instruct to respond. Making a model that's easy for others to tune for other purposes isn't the primary goal (or maybe not even a goal at all).
How are you guys getting the example dialogue formatted? How should it be appearing?
>>103520437
Dot Maetrix?
>>103520865
that's onions not milk
Eva suddenly hallucinated a fake catbox link in my chat kek.
Cohere's new model is epic! Its unique attention architecture basically uses 3 layers w/ a fixed 4096 window of attention, and one layer that attends to everything at once, and interleaves them. Paired w/ kv-quantization, that lets you fit the entirety of Harry Potter (first book) in-context at 6GB. This will be revolutionary for long-context use...
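Back-of-the-envelope math on why interleaving sliding-window and global layers shrinks the KV cache so much (all model dimensions below are illustrative assumptions for the sake of the arithmetic, not Command-R7B's actual config):

```python
# Rough KV-cache size for interleaved sliding-window attention.
# Sliding-window layers only cache the last `window` tokens;
# global layers cache everything.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, cached_tokens, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per token per layer
    return 2 * n_layers * n_kv_heads * head_dim * cached_tokens * bytes_per_elem

context = 131072              # tokens actually in context
window = 4096                 # sliding-window layers cache at most this many
n_layers = 32                 # assumed total layer count
sliding = n_layers * 3 // 4   # 3 of every 4 layers use the sliding window
global_ = n_layers - sliding  # 1 of every 4 attends to everything

full = kv_cache_bytes(n_layers, 8, 128, context, 2)  # fp16, all layers global
mixed = (kv_cache_bytes(sliding, 8, 128, min(window, context), 2)
         + kv_cache_bytes(global_, 8, 128, context, 2))

print(f"all-global cache:  {full / 2**30:.1f} GiB")
print(f"interleaved cache: {mixed / 2**30:.1f} GiB")
```

With these made-up dimensions the interleaved layout cuts the cache to roughly a quarter of the all-global size, and kv-quantization (fewer bytes per element) scales it down further on top of that.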
There's a webdev arena now.
https://web.lmarena.ai/
>>103520813
r/LocalLlama might not be better but it's good. Those users love sharing benchmark and performance data.
>>103520975
Hi Aidan. Please unslop your models. We've got enough sloptunes already, another one isn't needed.
I refuse to touch local models again until I get my COCONUT, my tokenless architecture and bitnet
>>103520984
It's very nice, too bad there's no way to download the finished projects.
>>103520975
Kek
>>103520984
That's one hell of a mogging by Claude.
>>103521012
People need to stop using Scale datasets in general. It means your model is just gonna be exactly the same as everyone else's.
>>103521020
>>103521101
Cohere is officially dead.
>>103515753
Any good local 3d model gens yet? I saw blendergpt.org recently which was neat
>>103521117
They're just talking about the "safety preamble" thing that has been there for a while now. But of course reddit has a context of about 3 days, so anything is new to them.
https://www.youtube.com/watch?v=1yvBqasHLZs&t=526s&ab_channel=seremot
Kinda impressive how OpenAI's former chief scientist managed to say nothing for a whole sixteen minutes
>>103521192
>fossil fuel of AI
yass safe synthetic data only please!
Been out of the loop for a while waiting 2mw for bitnet, what's the current best bet for a super vramlet at 8gb vram, 64gb regular ram? Thread meta seems to be Cydonia 1.3 22b and Llama 3.3 EVA 70b in the past few days I've checked, but I can only run the latter at a hideously low sub-3 quant (side note, are i-quants still much slower than K quants if you can't fit the whole thing in vram?) so I'm not sure if it's too lobotomized to be worth it at the ~0.5t/s I'll be getting from it.
P.S., for Cydonia, what sampler settings do you even use? ST defaults? Archive diving isn't finding me anything.
>>103520861
What about training? Yeah, the C++ part of Piper isn't that bad but if you want to train something you have to get into python version hell bullshit.
>>103520984
lmsys also has text2image arena now on their main llm arena website.
>>103521261
Yeah. For training you need python. There's a few threads in the discussions about training. It takes much more effort than the simple voice cloning from gpt-sovits. But we're just about to get llm training in not-python with llama.cpp and i don't think there's any training code that is not in python, LLM, TTS or otherwise. Not for non-toy projects, anyway. Python, at least for a while, will continue to be a pain in the ass.
Anyway, try the southern GB voice, output raw, increase the phoneme length a bit (~1.4) and then play it at 18-20khz. You can mess around with it and the many voices out of the software while keeping it real time.
>>103521192
Grifter gotta grift to eat.
>>103521192
I literally thought the same thing when it ended kek.
>>103521400
Nah there's no way he isn't already rich enough to retire if he wanted to.
>>103515753Gentle reminder that you're all a bunch of social reject freaks who will die alone :)
>>103521435
What is it to you?
>>103521435
Mikuless behavior
>>103521192
>Pretraining is dead we have to look at agents and inference time scaling!
Um, bwo? Your BLT? Your concept models?
>>103521435
ok petra
>>103521475
I might be in the minority but even in OpenAI's renaissance era (GPT-3 up to about GPT-4T) I always got the feeling that it was mainly them having daddy Microsoft's pockets, and never got the feeling that they were smarter than the competition. Almost all of their ideas were direct implementations of work other labs did and published already. The others (like the CoT meme) are shaping up to be kind of shit.
So Ilya burying his head in the sand and missing the low hanging fruit of "maybe we should improve the architecture itself" seems pretty on brand.
>>103521674
Insider here, "improving the architecture" is impossible, we don't even know why the architecture is so effective in the first place.
>update llamacpp
>I'm sorry, I cannot assist with that.
>remove draft model
>spams # at the end of the generation
QwQbros...
>>103517143
https://github.com/e-c-k-e-r/vall-e
https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct
>>103521843
guys, he updated
>>103520965
Is she sending you nudes?
i still use mixtral tunes
nemo is ass
yes, i've tried every sampler config known to man
>>103521874
What's your favorite mixtral tune?
>Gentle reminder that you're all a bunch of social reject freaks who will die alone :)
I just realized that every time, I am the one pulling the weight and making my chats fun even though the LLM is doing most of the writing. I'm the one making the fun references and jokes. I'm the one suggesting and hinting at the interesting twists for the narrator to consider.
>>103521804
Kek, that quote is always fucking hilarious
>>103522040
Yep.
It's very rare that the LLM actually does something originalish, interesting, and cool. And it's so awesome when that does happen that it can get us chasing the dragon.
But it's just an accident of RNG.
>>103522040
Goliath is the only local model who dared to joke unprompted :(
>>103522080
>>103522095
What if we just RP'd with each other except with the power of LLMs? Instead of writing the replies directly, we propose what we want to reply with in bullet form, and the LLM does the busywork of expanding to a full response.
>>103522040
Yep.
I used to put a lot of effort into character cards, system prompts, author notes, etc... But, once I realized that the LLM was just a co-author in my own story, I lost all motivation to continue RPing with them.
>>103520437
Gglglygly or something, right? He wanted to start a "career" on YouTube and deleted all his cool narration shit so he could make mid gameplay vids. Massive shame, he had some of the funniest shit I've ever heard. Lemme know if the name rings a bell, maybe you'll manage to find them.
>>103522040
Skill issue as always
>>103522115
I write faster than my LLM does.
I'm also probably in some fine tune's training set so may as well go straight to the tap.
The point of RP with an LLM is that you don't have to deal with another human being: availability times, differences of opinion on what's hawt and what's ick, and the LLM rarely just decides to quit on you.
>>103522115
>What if we just RP'd with each other
I've never done ERP before in my life before the LLM era.
I literally thought it was just furries/faggots sending pictures of themselves or pretending (gaslighting) the other person into thinking they are a "real woman".
I never realized it was actual roleplaying as completely different characters that digitally have sex or do other scenarios in a chat-like setting.
I have been on the internet for more than 30 years and never found out about it.
>>103522270
'ick and 'eck
>>103522283
I have never done ERP before either, I always thought it was an activity for the lowest kind of human scum.
I still think so, but LLM ERP is acceptable, since it's basically just like you're masturbating using your GPU.
>>103522270
Thiscouldbeus
>>103522315
I just see it as a completely new kind of porn. Kind of like hentai and porn games. You sometimes just have an "appetite" for specific porn and only hentai will scratch it (or regular porn).
However regular porn is in a fucking dark age; as a seeder on a private porn tracker it's insane how little content gets made anymore in 2024 that isn't softcore amateur stuff like onlyfans, which I fucking hate because no normal person can jerk off to that shit.
I thought ERP was enterprise resource planning.
>>103522356
lol
>>103521192
I'm convinced. Sending Ilya 500 million, hopefully it pays off in a couple years
>>103521850
>https://github.com/e-c-k-e-r/vall-e
>>103522444
Uh...?
I’m starting to think this is the top.
>>103522347
>softcore amateur stuff [...] no normal person can jerk off to that shit.
skill issue
>>103522705
The top of what?
>>103522832
The top. The peak. The GOAT.
>see interesting card
>it's full of slop phrases, obviously made by an ESL who then made ChatGPT write the rest of the card
Every fucking time.
>>103522845
>Interesting
>Text bad
???
>>103522040
Welcome to why people like Claude. It's the ONLY model, literally the only model in the world, that does that on its own.
>>103522855
He liked the picture
>>103522855
He hated the prose, but liked the idea.
>>103522855
You come across a card that is supposedly about a girl who is spending her last moments with you before the world ends. But in fact, it's not the end, as you're in a timeloop, though she doesn't know that. You can choose what to do on that loop, and subsequent loops.
You certainly don't come across cards like that often. You'd think maybe it'd at least be manually written considering it's not even a coomer card.
But, no, it's ChatGPT.
>>103522884
That's the 'All You Need Is Kill' synopsis bro...
>>103522845
having a model write your card is just inexcusable
I can understand dipping into it for your starter messages, I do that myself albeit with some heavy editing, but the actual card? you don't even need to be a decent writer for that part
disgusting and shameful behavior
>>103522898
Bro look at the fucking front page of Chub. A card that's inspired by a movie no one talks about anymore and isn't coomer slop is rare as fuck.
a fine repo of migus and my all time favourite migu reaction video:
https://www.tiktok.com/@relaxsmile
https://files.catbox.moe/hl0zw9.webm
anyone know where this video was originally from? having trouble sourcing it.
>>103522955
Wow, it's gotten pretty bad. No interesting scenarios at all.
>>103522955
Why did incestslop get so popular? It's the most fucking boring fetish of all time. /pol/ said that it was shilled by jews, could it be that the majority of people just have shit taste?
>>103523057
My theory is that the most common manifestations of the incest fetish are actually just wrappers around more benign desires.
The mommy fetish for men is really just a fantasy about intimacy: being with a woman who understands you fully and has seen you at your worst and most vulnerable, yet still wants you.
The daddy fetish for women: a fantasy about being dominated by a strong man while still feeling safe, because you know he cares and won't really hurt you.
>>103523057
>It's the most fucking boring fetish of all time.
that's why it's so popular, 99% of it is just vanilla stuff lazily spiced up with an implied taboo relationship.
incest stuff can be good but imo it has to be at least a little fucked up and creepy to really hit
>>103523154
kill yourself
>>103523057
The lower the lowest common denominator, the more popular it will be
sex is vanilla, but also something everyone understands and wants, so taboo sex (incest) being basically sex v1.1 makes it a very accessible idea to pretty much anyone, if that makes sense
>>103522845
I write all my cards with Claude :)
>>103523186
never
>>103518081
I like this Miku
Are there any models for programming? Can I train one with a few code bases? Is it worth buying a high end card for local inference? 4080s or 7900xtx? Is rocm good enough that we can run the same models as a 4090, just slower? The low vram on the 4080s is bothering me. I've never owned a GPU so, is it worth it for development? Is it even remotely competent compared to o1?
>>103523524
QwQ
>>103523524
Qwen2.5 32B coder
>>103523524
Qwen coder
>Sonnet 3.5 still mogging the newer models using meme tricks like CoT reasoning and test-time compute
How did Anthropic do it?
>>103519694
>try vanilla L3.3
>it's better than both Eva and Euryale
Every single time.
>>103523529
>>103523535
>>103523536
Danke, I'll read more about them.
>>103519694
>Magnum
>kino
kino == making every character the same slut I guess
>>103523542
They pretended to drink the safety kool-aid while training on the vilest texts possible. Competing companies castrated themselves for no good reason and gave Anthropic a free advantage. They also gave their default assistant a nice personality. That's it. That's the secret sauce.
>>103521245
>bet for a super vramlet at 8gb vram, 64gb regular ram
Whatever comes closest to fitting in that space.
- mistral nemo 12b q4, or some finetune of it
- qwen2.5 14b q3
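Rough fit check for those suggestions. A quantized model's footprint is roughly parameter count times average bits per weight divided by 8; the bits-per-weight figures below are approximate averages (real GGUF files add overhead for embeddings, metadata, and context):

```python
# Approximate size of a quantized model in GB:
# billions of parameters * average bits per weight / 8 bits per byte.
def quant_size_gb(params_billions, bits_per_weight):
    return params_billions * bits_per_weight / 8

# assumed bits-per-weight averages for common quant types
for name, params, bpw in [("nemo 12b q4", 12, 4.8),
                          ("qwen2.5 14b q3", 14, 3.9),
                          ("70b sub-3 i-quant", 70, 2.4)]:
    print(f"{name}: ~{quant_size_gb(params, bpw):.1f} GB")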
>>103523592
sluts are good
>>103523057
>Why did incestslop get so popular?
help me brother i am stucked
>>103523542
I still think it's them training it on the entire internet including stuff like literotica / fanfiction.net and even fimfiction.net and then just continuing to train it for a ridiculous amount. It knows tiny details about obscure things. It's somewhat overfitted, but fixed from being too overfitted, likely by grokking.
And it proves the whole data quality thing is much less important.
>>103509705
>is throughput only really used when initially loading the model into GPU memory, but otherwise is not really a limiting bottleneck when it comes to running models ?
Here's a random dude showing the amount of pcie bandwidth used when inferencing.
>llama 3.3 70b q8
>across 6 gpus; 107 GB vram used
>~6.6 t/s peak? ~6.3 t/s average
>26 GB/s peak? (= 13 lanes of pcie 4.0)
https://www.youtube.com/watch?v=ki_Rm_p7kao
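Sanity check on the lane-count math there: PCIe 4.0 runs 16 GT/s per lane with 128b/130b line coding, which works out to just under 2 GB/s of payload per lane, so 26 GB/s of observed traffic is about 13 lanes' worth:

```python
# Payload bandwidth of one PCIe 4.0 lane, then lanes needed for 26 GB/s.
GT_PER_S = 16                    # PCIe 4.0 raw transfer rate per lane
ENCODING = 128 / 130             # 128b/130b line-coding efficiency
gbps_per_lane = GT_PER_S * ENCODING / 8  # GB/s per lane, ~1.97

observed = 26                    # GB/s peak from the video
lanes = observed / gbps_per_lane
print(f"{gbps_per_lane:.2f} GB/s per lane -> ~{lanes:.1f} lanes saturated")
```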
What are the best coding and question answering models that I can fit into 64GB of VRAM and 128GB of RAM? Is there somewhere I can look up the system requirements of different models?
>>103523797
>>103523535
qwen coder is above everything else local atm for that. For question answering it's either qwen2.5 72B or QwQ
>>103523806
There's no way that these chinese models can phone home, or else compromise my system, right?
Musk just released a new version of Grok 2
Surely he'll release those 1.5 weights
Any day now
>>103523830
no... now gtfo cringe poster
>>103523856
His company doesn't have the data. He can scale up training compute in a month but the data is the most coveted thing in this field, nobody will hand them that except for public garbage and slop like ScaleAI.
>>103523830
I guarantee you that they are already doing that through this board right now.
I actually really like qwq and it's good at rp and even erp. It wrote me an amazing threesome scene that took both characters and setting into account and even had the two characters I didn't control lead the scene. Not once did I get the usual bobbing and shivering, even. It got very explicit and used lots of lewd words.
Context prefill seems to make it work best. A lot of the safety and assistant personality seems to be avoidable by pulling some inception on it and making it control characters as sort of a gamemaster/writer, instead of playing a character itself. I feel it could really profit from multi-prompting. It's not hard to jailbreak in general, it's just different. If it starts talking about consent and safety you need to cut that shit out ASAP because it'll poison the context. You also always need to let it do CoT first, otherwise the performance will be very middling. If you leave that CoT in context and rewrite it, that's usually an instant jailbreak.
I use openrouter because it's cheap as shit and very fast. If you do too, be aware that some providers have fucked up the context template and will make the model give schizo replies as a result. You can block them in the account settings. So far I noticed that NovitaAI and Fireworks both host a fucked up qwq. The other ones seem to be ok.
I also like Llama 3.3 but IMO, it's not as clever or creative.
Is beam search a meme?
>>103523972
It's not in any of my frontend options so it must be.
>>103523972
>beam search
Damn that's an old one
>>103524020
Any chance you'd be willing to share?
>>103517301
>boot licking intensifies
3.33 Eva seems fun.
Using this atm: https://files.catbox.moe/3vr6k0.json
I'm playing with just using some tfs instead of min p. Trying to find a balance of fun but smart even on contextless dumb stuff like this
>>103524203
have you considered suicide
>>103524203
Is user and assistant better than using names for the formatting in your experience? At least for the base 3.3 model, I noticed that the vanilla format gives responses that are a bit more safe/censored. Not sure with Eva.
>>103520984
Why is Sonnet so good? I also switched to it after testing it for coding. It's not a big model either.
>>103524259
Not that I've noticed. And straying from the trained formatting usually negatively affects a model's smarts and ability to remember who is who.
>>103522832
top_p
>>103524294
Good data
>>103524395
Getting buried in an avalanche with Teto
>>103524203
I modified your settings a bit for my purposes and to incorporate example dialogue formatting, and I am generating high kino now. Thanks.
>>103524294
Its best competition is google and the nerfed gpt4o release
>>103524591
llama 4 will mog all of them.
trust the plan.
Damn it
>>103524725
It will be very competitive with the old sonnet 3.5 in coding, not the new one
>>103524781
Clever attempt / 10.
>>103524810Doing loli RP is easy, i just want assistant to tell me it's ok
>>103524804
Llama 3 was trained on 24k H100 GPUs, so they didn't have much room to experiment. They simply pre-trained the model, fine-tuned it, and released it.
Llama 4, on the other hand, is being trained on over 100k H100 GPUs, and Zucc said that their largest model will have fewer parameters than the 405B Llama model. Assuming the 405B model took 54 days to train, their new flagship model finishes training in under 12 days. There is room for constant experimentation, so they better mog new sonnet 3.5.
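The scaling in that post checks out if you assume GPU-days scale linearly with cluster size for a fixed model (which ignores efficiency losses at larger scale); the 20% size reduction below is an assumed figure, since the actual parameter count isn't public:

```python
# Naive compute scaling: same model, bigger cluster -> proportionally
# fewer wall-clock days. Smaller model -> proportionally fewer FLOPs.
days_405b = 54
gpus_llama3 = 24_000
gpus_llama4 = 100_000

same_model_days = days_405b * gpus_llama3 / gpus_llama4
print(f"405B-sized run on 100k GPUs: ~{same_model_days:.1f} days")

shrink = 0.8  # assumed: flagship is 20% smaller than 405B, same token count
smaller_days = same_model_days * shrink
print(f"smaller flagship: ~{smaller_days:.1f} days")
```

A 405B-sized run comes out at ~13 days on the bigger cluster, so "under 12 days" only holds once you factor in the smaller parameter count.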
Oh god
>>103524882
Training on the same distilled slop of the original 1T dataset won't make them magically smarter
spoonfeed me the best model for lewd/ERP
>>103524906
The horror.
>>103524838
>i just want assistant to tell me it's ok
Not something I've been able to achieve.
>>103524882
What's the total training time?
>>103524906
Is the assistant trying to get you shot?
>>103524906
tell it it's just roleplay bro everyone's okay with this
>>103524882
I hope they fix the repeating problem. Only llama 3 models have that problem.
>>103525265
>>103525265
>>103525265