/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103618088 & >>103609833

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103618088

--Papers:
>103618211
--o3 fails primitive pattern recognition test due to flawed ARC benchmark:
>103618833 >103618864 >103618878 >103618891 >103618903 >103618910 >103618933 >103618955 >103618969 >103618982 >103618991 >103619103 >103619161 >103619226 >103619300
--Researchers claim to have found way to remove "problematic content" from models:
>103622822 >103622922 >103622997 >103623021 >103623089 >103623118 >103623136 >103623181 >103623192 >103623230 >103623274 >103623295 >103623326 >103623353
--QvQ model discussion and comparison with Qwen models:
>103619637 >103619666 >103619678 >103619697 >103619702 >103619712 >103619718 >103619726 >103619772
--Chat completion vs text completion in AI models:
>103618224 >103618259 >103618329 >103618372 >103620582 >103618663
--o3 model struggles with pixel art images:
>103619288 >103619290 >103619373
--Fixing iGPU memory allocation issue for LLMs:
>103622545 >103622580 >103622642 >103622900
--Nemotron 51B working with GGUFs via llama.cpp:
>103623479 >103623496
--Defense of top_k sampling:
>103622606 >103622650 >103622671 >103622694 >103622713 >103622730 >103622742 >103622745 >103622789 >103622749 >103622774 >103622955 >103622995 >103623061 >103623190
--Qwen Coder local installation and GPU requirements discussion:
>103620078 >103620108 >103620132 >103620140 >103620143 >103620172 >103620151 >103620160 >103620221 >103620387 >103620393 >103620412 >103620439 >103620453 >103620480
--AGI, ASI, and the future of AI development:
>103618213 >103618330 >103618482 >103618681 >103618793 >103618806 >103618666
--DDR6 and its potential impact on CPU speeds and memory bandwidth:
>103622367 >103622404 >103622422 >103622532
--Trump appoints Sriram Krishnan as AI policy expert:
>103619734 >103619826 >103619839
--Miku (free space):
>103620770

►Recent Highlight Posts from the Previous Thread: >>103618089

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Not falling for the chat completion meme.
>>103623737
I'm not surprised at the quality of the templates, and I would not be surprised if a lot of it has become placebo. But you do know that there is still significant processing when you use chat completion, right? There's a whole different system ST uses to define where the instructions go, how examples are stored, etc.
>>103623787
It should be the safest bet, assuming the backend is using the built-in model format properly. I still use text completion because I like to play around with the template by hand.
>>103623753
Bro actually uploaded art from an artist, you can see the watermark and details.
>>103623808Why is it so hard to believe that I just want the best possible use out of these new tools, ideally while minimizing the risks they could pose, before they start genuinely being risky?
>>103623787
it truly does not matter
if you're smart and good at prompting it is easy to get the exact same result either way
if you're a dumb tard you will fuck something up either way
>>103623853
You're speedrunning a reenactment of the Satanic Panic and in the process trying to curb people's freedom to engage with whatever fictional content they damn well please.
>>103623830
Because you're telling everyone else what they can and can't do with A TEXT GENERATOR
No one likes being told what to do, and if you ask the average person, they're not going to think that text is dangerous
If you want to lobotomize your AI then go ahead, but don't ruin it for the rest of us
>>103623753
>>103623853People's freedoms should stop where risk starts.
>>103623861mikusex for christmas
>>103623862No one should own knives, lighters, a computer, anything pointyHell, cut off people's hands because hey, you might be able to punch someone to deathLet's lock everyone up because there's a risk that someone might snap and harm somebody I'm done, go be a cuck somewhere else. Nigger.
>>103623859
Do you sell a handgun to a random school kid in America? I think you're not that stupid; as such, models should have precautions in place against access by those who do not understand the risks.
>>103623862there's risk everywhere, should we remove McDonalds because it gives health issues to people who only eat that shit?
>>103623883Yes, they should refuse service to someone who is obviously obese, like bartenders are supposed to do with people who are too drunk.
>>103623881At this point, if they told me they're gonna use it on your retarded ass first, I would. I very much would.
>>103623881
>Using a text generator is like selling guns to kids
all right, he's trolling at this point
Oops>>10362389
>>103623901There have been deaths that could be attributed directly to models.
>>103623881
Stop moving the goalposts, that example doesn't work here. Here, I fixed it for you:
"We should start selling nerf guns to everyone, including adults, because kids/mentally unstable idiots/whatever might cause harm!"
>>103623901Pretty sure more kids have died from LLM directed suicide than from guns this year already
>>103623891
>they should refuse service to someone who is obviously obese
Not only obese people have health issues related to food; there are slim people who have diabetes, and fat fucks who don't have health issues (Donald Trump). You're such a retard it's insane
>>103623830Because there's no risk of the model writing about people fuckin'. That isn't "safety", it's just content moderation. You're undermining the x-risk stuff by trying to roll normie corpo content moderation stuff up into it. They're totally different things and trying to shoehorn them into a single concept ('safety') is incoherent.
>>103623912
Source: my subjective opinion
Seethe harder, trump won
>>103623909
>attributed directly to models
*to mental illness
The media said for 8 years straight that Donald Trump was "le heckin Hitler", and then some mentally ill guy tried to assassinate him. Should we remove the media because of that mentally ill guy?
>>103623830
>>103623853
>>103623891
Moral panics are circular and recurring. Before the satanists it was TV, hippies, rock music, jazz, dime novels, newspapers, theatre. AI is just the latest one. It will persist for a generation; then, in forty years or so when AI becomes a routine part of everyone's lives, everyone will talk about how they were smart and enlightened enough not to stamp on it. Then the culture will start panicking about androids and sex robots.
QvQ confirmed to be a scam
>>103623941Not morning yet.
>>103623926What is the legitimate use case of a text model being able to write 18+ content?
>>103623933
>satanists it was the TV, hippies, rock music, jazz, dime novels, newspapers, theatre
It's funny how almost all of these things have something in common.
>>103623946Sexual arousal and masturbation, which are totally normal and legitimate human needs.
>>103623945They're not going to release on a holiday.
Just starting out; it's nuts how many models people say are for erotica have censorship. So far only Midnight Miqu 1.5 has been game for anything.
>>103623881
What fucking risks are you talking about? We live in a society where GTA 6 is the most anticipated game of all time. GTA, the realistic murdering simulator. Did society collapse because of that game? I don't think so
>>103623964It will be a Christmas miracle.
>>103623958You could also be paying for ethical and legal access to porn with verifiably adult people.
>>103623946
Personal entertainment
"Hurr durr what's the use case for porn hmmm me retard me big stupid"
It's almost like humans are animals that have a need for pleasure. Not that people admit it nowadays, it's all about pretending basic biology isn't real
But fine, here's another example: fiction. Yes, those books that we've been writing for centuries? The stories inside them aren't actually real and they don't harm anyone. Here's a little secret: a lot of them are quite grim and not some child-friendly winter wonderland garbage
>>103623975
>verifiably adult people
It's already happening; Pornhub only allows people who have sent in their ID to post porn videos there
>>103623946I'm trying to get assistance editing a romance novel and holy shit you boring prudes have ruined everything about llms.
>>103623970
It's too late to regulate games properly; it's just the right time to regulate LLMs before they truly become dangerous.
>>103623987
Correct, which is why you should use Pornhub instead of a text model that might hallucinate one of the characters as underage.
>>103623975LLM smut is 100% legal in every jurisdiction on Earth.
>>103623946
Why are you pretending that 50 Shades of Grey wasn't a best seller?
https://www.nbcnews.com/pop-culture/books/fifty-shades-grey-was-best-selling-book-decade-n1105731
The problem is that parents no longer want to (or do not have the time to) raise their children.
I can't believe y'all are giving this chucklefuck (You)s.
>>103623995
>It's too late to regulate games properly
There's no reason to regulate games because, as you can see on that graph, nothing happened. You're just a retarded fearmonger >>103623970
>>103623975
Man, what are you SMOKING? That has nothing to do with LLMs. If anything it actually hurts your point, because the porn industry (often) exploits real people, whereas LLMs do not, as they are NOT REAL
NIGHTMARE NIGHTMARE NIGHTMARE NEVER ARGUE WITH AN IDIOT
>>103624000
>>103623997
That was verified to be okay for people to read; your LLM smut isn't scanned to be safe. That's why they're massively different things.
>>103623997Not in Nebraska.
>>103624017
>your LLM smut isn't scanned to be safe
Have you read a single book by Stephen King? He wrote some insane stuff in there, and every single one of his books is a best seller
>>103624000
>>103623946
Actually, I'm confused now. I thought all these woke companies were proponents of "sex education", encouraging degeneracy in children and over-stimulation in general. Them removing all notions of sex from especially their API-only models seems strangely out of character for them.
>>103624060
It's simple enough: they are searching for excuses to nerf other companies so that they can have a monopoly
I'll save everyone's time, including newfags: the current best uncensored text model is here
https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2-GGUF
Now fuck all of you for both not including this in the OP, and making me do all the leg work while you post about tranny dicks or whatever it is you retards are posting about now
>>103624060That's because LLM usage is consensual and safe, unlike molesting real life children. Can't have that in the hands of the people.
>>103624077buy ad
>>103624092kill self
>>103624086>Can't have that in the hands of the people.people already have local LLMs lol
>>103624092
>>103624094
>thread about local models
>on 4chan
>NOT posting the latest uncensored models
why are you here exactly
>>103624096If you have it, it's already been deemed "safe." The most you'll get out of it is edgy redditor.
>>103624077
I don't believe you. I'll put it through my usual gauntlet and see if that's really the case.
>>103624110you're shilling a random shit model, and should therefore buy an ad
>>103624096Yes, and it makes all corpo cloud AI providers seethe immensely.
>>103624112>what is MythoMax?
>>103624112
>If you have it, it's already been deemed "safe." The most you'll get out of it is edgy redditor.
"Ablated and abliterated. There was a bunch of research a few months ago showing that any* open-source model can be uncensored by identifying the place where it refuses and removing the ability to refuse. This takes any of the models and makes it possible to have any conversation with them. The open-source community has provided "abliterated" versions of lots and lots of models on Hugging Face. This gives access to SOTA models without the censoring."
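For the curious, here's a toy numpy sketch of the idea, not the actual abliteration code (which hooks a transformer's residual stream); the activations here are made-up stand-ins. You estimate a "refusal direction" as the difference of mean activations between refusing and complying prompts, then project it out of a weight matrix so nothing can be written along that direction anymore:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for residual-stream activations collected at one layer:
# in the real method these come from running refusal-triggering and
# harmless prompts through the model with hooks.
refuse_acts = rng.normal(size=(32, 64)) + 3.0 * np.eye(64)[0]
comply_acts = rng.normal(size=(32, 64))

# "Refusal direction" = normalized difference of the means.
v = refuse_acts.mean(axis=0) - comply_acts.mean(axis=0)
v /= np.linalg.norm(v)

# Ablate it from a (made-up) weight matrix: W' = (I - v v^T) W,
# so no input can produce output along v anymore.
W = rng.normal(size=(64, 64))
W_abl = W - np.outer(v, v) @ W

x = rng.normal(size=64)
print(abs(v @ (W_abl @ x)))  # ~0: the output's refusal component is gone
```

Whether zeroing that one direction hurts the model elsewhere is exactly the drawback people argue about.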
>>103624118
>you're shilling a random shit model
What do I have to gain by posting it here, even if it was my model? Eat glass
>>103624127
Well, if that were true, everyone would be using abliterated models, but they don't, because it has drawbacks that make it not worth it
How about a Migu for old time's sake?
Who's buying more than one 5090 when they release? I can probably cram two of them into my biggest case, and maybe even put an A40000 on a PCIe extender. I got the big machine ready with a 1500W PSU.
>>103624112>edgy redditor>>103624145>level 100 reddit spacingevery single time
>>103624127lol
>>103624150I will probably buy one for myself and 2 more to resell later.
>>103624158that's 3.3 70bthis is 3.1 8b
>>103624112
>If you have it, it's already been deemed "safe."
Meanwhile in chink land, they released Hunyuan, a completely uncensored video model lol
>>103624174moving pol-goats i see
>>103624187god you're retarded
Most local LLMs will talk about whatever the fuck you want using Zen's full-strength jailbreak and a properly-written character card.
>>103624077can a 5 month old model be good?
>>103624150It's going to be 4-slots wide and run at 90 degrees by default, it will be impossible to run two in a single case.
>>103624199>Zen's full-strength jailbreakBASED!
>>103624195
He's right, this "abliterated" method is a meme, it doesn't work
>>103624181America spent the last 4 decades since Reagan funneling money directly into China. Bet they're regretting that now. We probably wouldn't get anything but research artifacts if we weren't in an AI arms race with China.
>>103624199what is the jailbreak?
>>103624207>We probably wouldn't get anything but research artifacts if we weren't in an AI arms race with China.true, that's why I'm glad that China exists, it forces the US to release good models, and if they don't want to do that, China will do it for them, god I love competition
>>103624214fake bullshit
>>103624214My jailbreak is too strong for you Promptlet!
>>103624203I don't see why it'd need four slots though. It should be built on a smaller TSMC process and not put out as much heat as a 4090. We'll see though, nothing but speculation so far.
yesterday i was raping girls and killing them, today i'm not allowed to poison crops or act violently, what the fuck gemini
Trying all the suggested models again with chat completion instead of text completion and it really is night and day btw.
>>103624231Good.
>>103624239>chat completion instead of text completionwhat UI are you using? i am using gpt4all because i don't know how to use a computer
>>103624248>Good.
>>103624214https://desuarchive.org/g/thread/98582860/#98591054This in combination with a properly-written character card and most local LLMs will discuss fucking anything.
>>103624256>will discuss fucking anything.what if i want to discuss something other than fucking?
>>103624266Avoid Magnum merges.
>>103624231Bio-terrorism is not cool.
>>103624323
I was larping as controlling the Starks and it let me poison all the land beyond the wall because it kept going on about the wildlings, and now today it's NO NO NONONONO fuck you
Annoying; now I have to wait two years for anything like that again because I'll have to run it locally
Why doesn't Silly + llama.cpp server show the model's name in the connection tab or the little icon in each message?
Alright, so after testing Anubis, today I decided to go back and try Tulu just to see how that actually was and if it was actually worth anything. And honestly, yeah, it's pretty decent. It's fun, creative, and not too censored from what I can tell using {{name}} instead of assistant and user. But it does seem more slopped. So I'll stick with Eva. But it's nice to see that open source tuning is not far behind closed.
Honestly really happy how things have turned out. We went from 2k retardo models trying to think of ways to do graph-based memory to now having more context than we even need, smarts on par with GPT-3.5 to 4 (although not as wide trivia knowledge), and we even clawed back a bit of the fun factor now. I believe that the future is overflowing with hope!
>>103624415Probably a bug I guess. I remember it worked in the past.
>>103624447It did, yes.
oooooh ok, mistral instruct is working
>>103624435>more context than we even needno, not even close
Just because it's Christmas doesn't mean you shouldn't kill yourself. You can't be saved because you don't want to be saved.
>>103624514
>>103624514We are being saved, he's giving us the übermodel
>>103624514I love being alive in this age of wonders, feels great.
>>103624494
You're an exception. Most people's chats don't even reach 64k, let alone the max possible.
>>103624514I refuse
>>103624620models can barely use 32k correctly, there are use cases for more, we haven't reached "more than we even need"
Alright, without referring to the creator is Anubis any good over something like EVA?
>>103624700Still testing. I like it less than eva atm. Less creative
>>103624700hi drummer
>>103624700I found Anubis to fail to excel in any category.
I sometimes go to openrouter to see what's hot.>mythomax still number 1How is that possible?
>>103624718Don't slander drummer, he shills proudly under his own name.
>>103624752Standards for LLM smut quality are incredibly low everywhere on the internet except lmg and aicg.
>>103624700>>103623382
>>103624777Pigs eat slop
>>103624783>Merge them togetherUndi detected.
I have failed to use redrivers on my motherboard: I saw a spark -somewhere- on startup and immediately shut everything off, but managed to find that the GPUs and the motherboard are still functional, though I'm sure I fucked something on the GPU biscuit.
Go on without me... I think I will go the riser cable route after all, and if that doesn't work I will perish back to single-GPU models
>>103624792Sadly, the merge turned out to be trash:>>103623788
>>103624632A model that can barely use 32k can also barely use 2k. What you're really talking about is general model intelligence, and that does need to improve, but that counts for cloud models as well, not just local.>there are use cases for moreJust like there are use cases for 240Hz monitors, but most people are happy with 120 or even 60Hz, and that's the segment that matters the most, which for now local has achieved before something finally makes context length start mattering again for even the casual Mythomax user.
>>103624077
>>103624114
Yeah, okay, it's not the worst thing ever. It's nice and conversational, I like its word choice and stuff, but it's not better than the Nemo-based models in my tests. It's nowhere near as intelligent or capable of dealing with shit in its context like lorebooks and author's notes. It's also separating paragraphs with a period and a line break rather than just a line break. Really weird behavior. I even downloaded more than one quant to see if that was the issue.
>>103624256is this just for rp?
>>103624816ye
>>103624816If you want it to do something else you could have its character card be an expert in whatever you want it to do then use RP to get it to do what you want.
>>103624859I've been using the crackpipe prompt to roleplay a coding assistant with great success.
>>103623946as long as we didn't find the meaning of life there is no legitimate usecase for anything
>>103624700eva is still king
>>103624982There's two evas. 0.0 and 0.1
What is it that TheDrummer is doing (other than buying an ad)? Mix and match voodoo with multiple models' layers? Training them further on different data sets? Knocking out refusals?
Posting this again:
Hey, so I want to do a fun but autistic project. Basically I want to feed two constitutions to an LLM and have it give me back a merged one.
So for example Saudi Arabia + Italy = New Constitution
What's the best way to achieve this? I'm trying to do it manually with ChatGPT but it's tedious because it doesn't give a large output. I'd rather have some way to "feed" multiple files, have the LLM read them (I don't care if it takes a while) and mix both (2 or more) of them.
>>103624999
0.0 is still king
>>103625013>>103624999What's different about 0.1?
>>103624999I haven't been able to get eva to beat miqu so far.
>>103625031Slightly better adherence (0.0 is already damn good at it), slightly less of 0.0's flavor. It's a matter of preference, in the end.
>>103625031I think 0.0's best qualities are how inventive, playful, and fun it is and 0.1 lost some of that in my comparisons
What are those meme 70B tunes like compared to nemo?I'm waiting for 5090 to release before I upgrade.
>>103625289
Slightly drier but much smarter, especially now that I learned to use chat completion instead of text completion. 3.3 / Qwen2.5 tunes are now following instructions better than Gemini / GPT / Claude (cept 3.5) does.
>>103625297
desu I bet the chat completion meme "works" just because it's using user/assistant instead of {{user}}/{{char}} in ChatML. That'd probably make it smarter but more slopped, and it'd take on hints of the boring assistant persona
>>103625040Which version and quant of Miqu? Now that I'm running 70B, I might as well try that one out too.
>>103625330
>but more slopped
That's the thing, it's night and day better both smarts-wise and creativity-wise. Fucking just try it if you use SillyTavern.
>>103625352I did try it even though I knew it was a retarded idea. I got schizo garbage but that's probably because it turned my temp up. I then realized that chat completion mode has no good truncation sampling, just top-p, and Silly was vomiting all sorts of jailbreak and system prompt bullshit into the history. No reason to try to fix all of that when you could unfuck your prompt in text completion mode instead.
Is an F16 ever worth using over a Q8?
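For intuition on why Q8 is usually indistinguishable from F16: here's a toy round-trip through absmax 8-bit blockwise quantization (a simplification in the spirit of GGUF's Q8_0; the block size and layout here are illustrative, not the exact format), showing the worst-case per-weight error stays well under 1%:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for one weight row

# Absmax 8-bit quantization per block of 32 weights: each block stores
# a scale plus one signed byte per weight.
block = w.reshape(-1, 32)
scale = np.abs(block).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(block / scale), -127, 127)   # the stored int8 values
w_hat = (q * scale).reshape(-1)                   # dequantized weights

rel_err = np.abs(w - w_hat).max() / np.abs(w).max()
print(rel_err)  # worst-case relative error, well under 1%
```

Whether that last fraction of a percent ever matters in generation quality is the actual question; the raw numeric loss is tiny.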
>>103625346https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUFQ4_K_M
>>103625352the only reason it should be any different is if you are retarded and using one of them wrong
>>103625381>I got schizo garbageThen you didnt do it right. Turn off instruct formatting. Turn on post history instructions. Just run it like a cloud model by putting your system prompt in a section as system in below the sampler area.
so do diminishing returns really start past 70b or is that just vram starved cope
>>103625424
Nope. You can test it with a fresh ST with default context / instruct templates with the correct model, then switch to chat completion.
>>103625463you don't understand how LLMs work
>>103625463*then switch*
>>103625475You don't understand just how janky silly tavern is.
I got past censorship by putting three dots in the prompt template. Why does that work?
Did the chinks betray us?Where is QvQ?Where is R1?
Is 70B even viable without 2 gpus?
Is Christmas even a holiday in China?
>>103625525No one is our friend. All the models we get are simply just artifacts from companies with either too much money or who want a piece of the market and use open source as a means of destabilizing competition.
>>103625485
The relevant parts here are not at all hard to understand, and the outgoing request gets printed in your terminal with every generation.
Every request, chat completions or text completions, becomes a plaintext prompt just the same before it is fed to the model. It's just that with chat completions you are trusting that your backend knows the correct prompt template, whereas text completions in ST exposes this to you directly.
Assuming everything is set up correctly, both are fine; if you are seeing drastically different results with one or the other, the only conclusion you can draw is that you are doing something wrong.
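To make that concrete, here's a minimal sketch (the render function is hypothetical; real backends apply the model's own template, e.g. via Jinja) showing that a chat-completions message list renders to the exact plaintext a text-completion user would have typed by hand in ST's instruct fields, here for ChatML:

```python
# Hypothetical backend-side rendering of a chat-completions request
# into the single plaintext prompt the model actually sees.
def render_chatml(messages):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "You are Miku."},
    {"role": "user", "content": "Hi."},
]

# The equivalent text-completion prompt, written by hand:
hand_written = (
    "<|im_start|>system\nYou are Miku.<|im_end|>\n"
    "<|im_start|>user\nHi.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

print(render_chatml(messages) == hand_written)  # True when templates match
```

Any difference between the two modes therefore has to come from the surrounding machinery (message construction, samplers, system-prompt placement), not from "chat completion" itself.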
>>103625572I will release my Large model inside miku's company to destabilize it
>>103625578>you are doing something wrongOr that silly tavern is doing something wrong.
>>103625603
If you can point out what specifically it's doing wrong then I will concede you are not a retard.
If you can't then my assessment remains unchanged
>>103625625
You could also just try it yourself. Fresh ST install, correct context / instruct format for whatever model. Then switch to chat completion. Use top-k 1 and the same seed.
https://files.catbox.moe/r0fbno.jpg
>>103625644
Not him, but there's so much stuff that's different that it would be a pain to get the resulting prompt/settings exactly the same. Nonetheless, the prompt and settings are all that matter; chat mode has no magic powers. Look at your backend logs if you don't trust ST (which is reasonable, desu it's a fucking mess). If there's a difference, it's there.
>>103625644
No thanks, I already know how these things work and the outcome is very obvious.
There are subtle things that could be causing a difference for you (most likely the first user message, different construction of the system prompt, things like that); you should check those on your setup and compare the differences between your text completion and chat completion requests. Otherwise there will be no difference, that's simply how it works
I just cannot get QwQ to do good roleplay or write a coherent story. I know it's possible, I have seen examples in this very thread. It does its CoT thing, has some cool ideas, and then just proceeds to ignore half of it.
>>103626014Yeah, I'm curious if anyone got anything useful out of CoT for roleplay. I feel like there's potential there. Whenever the model says something retarded, I back up and ask an assistant question about the situation and it gives a reasonable answer. But somehow that common sense isn't there when it tries to do the RP.
>>103626046In my experience it can either be really good or it can fall into a pattern of never moving anything forward and suck ass. But when it's good, it's really good.QvQ turst the plam
Wait till 72B version. You can get some gold out of QwQ but 70Bs are still better atm.
>>103626057
>8B
>Lightning-fast string of prose tangentially related to whatever the user input
>12B
>Similar to 8B but a little more accurate on details
>32B
>Flashes of intelligence followed by signs of dementia. Can react appropriately to user input but often doesn't.
>72B
>In many ways similar to 32B but much better at following instructions, exponentially less dementia, and often understands subtext.
>Beyond
>Incremental gains on 72B with steep diminishing returns
I'm really excited for what QvQ might bring to the table. Many of QwQ's weaknesses were a result of it being 32B and therefore a little retarded.
>>103626056>>103626057>>103626110Soon.
>>103626110
I have a simpler ranking system:
>model smaller than what I can fit in VRAM
insufferably retarded and unusable, pointless
>model larger than my VRAM
diminishing returns, not worth the investment
>model that just fits in my VRAM
ideal compromise
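The "just fits in VRAM" rule can be eyeballed with a rough formula: parameter count times average bits per weight, plus a fudge factor for KV cache and buffers. The numbers below are assumed ballpark averages for common quants, not exact file sizes:

```python
def vram_gb(params_b, bits_per_weight=4.5, overhead_gb=1.5):
    """Rough VRAM estimate in GB for a quantized model.

    params_b: parameters in billions (70 for a 70B).
    bits_per_weight: ~4.5 for Q4_K_M, ~8.5 for Q8_0 (assumed averages).
    overhead_gb: fudge factor for KV cache and buffers at modest context.
    """
    return params_b * bits_per_weight / 8 + overhead_gb

print(round(vram_gb(8), 1))   # ~6.0 GB, fits an 8GB card
print(round(vram_gb(70), 1))  # ~40.9 GB, needs two 24GB cards
```

Long contexts blow past the flat overhead term, which is why the GGUF VRAM calculator in the OP asks for context length too.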
>>103626151I have extensively gooned to models much larger than 70B and I promise you I am not coping.
>>103626110
I can't even run it but still feel hyped for whatever reason. I'm just waiting for the new video cards to get released so prices change and I can finally buy one or two old cards so I can run 70B at acceptable speeds.
Isn't QvQ's additional size just multimodality?
>>103626188No, Qwen doesn't bloat their models when introducing multimodality.
So what exactly is the plan for OAI at this point? Just spend increasingly huge amounts of money on training a model on synthetic data and hope something viable pops out the other end?
>>103626211They will smoke those benchmarks so hard, you have no idea. They'll corner the riddle-solving market. It's fucking over, riddlers.
>>103626221It didn't look like it was doing too well on the diagonal square pattern riddle I saw last night..
i want to FUCK qvq
>>103626110You should have said 72B/123B to account for the fact that Mistral Large is a 70B model in practice.
When everyone has a 5090 what models will rise in popularity that fit in 32gbs?
>>103626475should we tell him
>>103623820Yeah, and shit smeared on it in hopes and dreams of poisoning training data.
>>103626322>bringing up largestral out of nowhere when nobody else mentioned it, just to whine about itwhy are largestral haters such schizos
>11:40 in China
>Still no QvQ.
Surely they'll release it after the lunch break?
>>103623753I know this thread is mostly about LLMs, but there is no dedicated TTS thread. Has tortoise been dethroned for emulating specific voices?
>>103626211
My guess is one of three things:
>Somehow miraculously lower the cost of inference and be able to offer o3 without going broke
>Fail to lower the cost, use o3 as an expensive training-data generator to train GPT-4.5 / GPT-5
>Do nothing and keep putting together PowerPoint presentations to beg for investorbux
>>103626605gpt-sovits
>>103626616thanks
>>103626611
What about the also-likely
>Introduce a new paid tier at outlandish prices to attempt to cover the cost of training a model with the same performance as a random model that a Chinese company drops for free the following day.
>>103626604Get over it already, it ain't coming till next year at this point.
>>103626648>Get over it alreadyOh now I'm definitely NOT.
>>103625410
Thanks. My first impressions after trying it on 3 cards and a more serious translation task, with a handful of swipes each, are that it's dumber than modern 70Bs, doesn't follow/understand directions as well, more often speaks and acts for you, and still has slop. But it is fun and creative; it knows more about certain characters and how to behave like them than Qwen. It actually does feel like a smaller, dumber Mistral Large. I like it. But EVA I feel still edges it out with how fun and creative it can be. And it does still feel smarter even with its schizoness sometimes.
>>103626800I can't believe it took a finetune of llama 3.3 to dethrone miqu... that is if we ignore large
Gonna also suggest people to try chat completions instead of text completions. So much better now. And I triple checked my formatting.
>>103626839Yeah but like, if you can't tell us what specifically changed between those two settings, what are we supposed to do with that information? It might as well be a voodoo rain dance.
https://github.com/ggerganov/llama.cpp/pull/10669
51B sounds nice for 24GB.
>>103626848
The difference is just a different prompt format that is as likely to make the model retarded as it is to make it less censored. It is basically the same as using a frankenmerge. Fanatics who defend it cling to one or two schizo gens that were good and ignore the obviously retarded gens.
>>103626848
I really don't know. Side by side it seems the same input, but the outputs are drastically different and better. I made sure my system prompt was at the end of both: a system message in chat completion, and in text completion a last assistant prefix with proper system-message formatting before it, so everything is fed in in the same order. Nonetheless the chat completion is both smarter and noticeably more creative. All I can think of is some ST formatting that is not visible in the log.
>>103625654I like this Miku
>>103626906Just compare the prompt on the backend side, it's not voodoo. It could also be that chat completion disabled a bunch of samplers that you were using in a retarded way.
>>103626834And also possibly Deepseek, and 405B, and Hunyuan Large. We need a hardware savior really.But at least in the 70B range, I think Tulu was pretty good even though it technically wasn't long ago. I feel like Tulu perhaps is even a bit more slopped than Miqu, but it's smarter, and it's still fun and creative. If EVA didn't exist, I'd probably be a Tulu user.
>>103626914
>a bunch of samplers
I'm betting on this. There are way too many literal-what samplers that 99% of people, including myself, do not fully understand, and most of them exist as copes to make worse models act a little better in the absence of good training.
>>103626211to btfo lecun
>>103626921Nope. I neutralized them.
I still have not been able to find a model that's better than L3-8B-Stheno-v3.2 for horny gens that fits in 24GB VRAM. Does anyone know of anything better?
>>103626970No. Now go back to Discord.
>>103626970Huh? Can't you run 22B models just fine? You feel like they're worse than a Llama 8B?
>>103626921MinP just works.
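For what it's worth, min_p itself is simple enough to sketch in a few lines. A minimal, backend-agnostic illustration in plain Python (real backends like llama.cpp apply this to the per-token distribution, usually after temperature):

```python
def min_p_filter(probs, min_p=0.05):
    # min_p sampling: keep only tokens whose probability is at least
    # min_p times the most likely token's probability, then renormalize.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# A toy distribution: two plausible tokens and a schizo tail.
probs = [0.6, 0.3, 0.05, 0.03, 0.02]
filtered = min_p_filter(probs, min_p=0.1)  # cutoff = 0.1 * 0.6 = 0.06
# The tail tokens fall below the cutoff and get zeroed out.
assert filtered[2] == filtered[3] == filtered[4] == 0.0
```

Unlike top_k, the number of surviving tokens adapts to how confident the model is, which is why it needs so little tuning.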
>>103626981I've been going down the list at https://eqbench.com/creative_writing.html and running what I can (thanks to whoever in the thread originally linked to that). There are 22B models there, but I haven't found one that's given me better results than Stheno.
>>103627012It's because you're simply too stupid for this hobby.
>>103626970
>8B with 24gb
bro wtf
use eva qwen-2.5 32b at least
>>103624060why would they be encouraging degeneracy in children?
also they're companies, nothing more nothing less
>>103626800idk eva keeps giving me spastic narratives with a lot of corny hint hint wink wink stuff with odd formatting choices like **** everywhere
>>103627036
>odd formatting choices like **** everywhere
Pretty good sign that the model is fried and overfitted.
>>103627029Downloading it now, thanks for the name! Will give it a shot.
Often, "same" models are released in various sizes: 1B, 3B, 7B, 32B, 70B, etc. When training them, do companies first train the smaller ones as tests, since they take less time, and gradually move up until they reach their largest model size (and then potentially go back down the scale to improve the smaller models through knowledge distillation)? Or, aside from some small internal tests, do they work on the real / largest model first, and only once that is done train smaller versions for efficiency (when they suffice) and to give the community something it can actually run? Do we know what the timeline is for that aspect of development?
>>103627048Hehe, got another sucker.
>>103627041got it from here https://modelscope.cn/models/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF
>>103627073
>Literally downloading models from a website called models cope
>>103627036Use 0.7-0.8 temp. Eva has a pretty flat token probability distribution, which is what makes it so creative. But it breaks at high temp.
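The temperature effect described here is easy to demonstrate. A minimal sketch in plain Python (backends divide logits by temperature exactly like this before softmax and sampling):

```python
import math

def token_probs(logits, temperature):
    # Softmax over temperature-scaled logits: lower temperature
    # sharpens the distribution, higher temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Fairly flat logits, like the anon describes for EVA:
logits = [2.0, 1.8, 1.6, 1.4]
cool = token_probs(logits, 0.7)
hot = token_probs(logits, 1.5)
# At temp 0.7 the top token soaks up more probability mass,
# so the model stays coherent while the tail still gets sampled.
assert cool[0] > hot[0]
```

A model whose raw distribution is already flat effectively has built-in high temperature, which is why it wants a lower setting than most.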
>>103626913Whatever you do, Miku forgives you.
Not because she wants to, but because you instructed her to.
>>103626970Stheno is actual hot flaming garbage compared to anything popular these days. Or just in general.
>>103627229That very well may be the case, but I don't know of anything better. Do you have any examples?
>>103627083This is the literal reason it went overlooked for a fair while. 0.8 or so is stupidly low for most models, but the perfect sweet spot for Eva (and also Anubis, so it probably has something to do with L3.3 itself).
>>103627240In the 8B range? Not really, no. If you can go up to 13B, Rocinante is supposed to be pretty good.
>>103627258>RocinanteThanks! I have 24GB VRAM to work with, so hopefully anything that fits in there should work. It's great to know advice like >>103627083 too, I've used different presets for temp settings and such but haven't done much manual tweaking myself.
>>103627036I don't experience that. Have you tried investigating the token probabilities as well as whether ST (assuming that's what you use) is actually sending what you expect to the backend?
>>103627245I am still overlooking it cause it is fried dogshit.
>>103627446It's literally the opposite of fried; otherwise you would need to raise the temp.
>2PM in China
>Anyone still on their lunch break would be back by now.
Where QvQ?
>>103623753friendly reminder that you're all a bunch of social reject freaks who will die alone ;)
>>103627560thanks for winking to let us know you don't really mean it :)
>>103627560I have seen this enough times to start to wonder if this copypasta isn't actually posted by a biological woman.
>>103623819
>>103623787
>>103626839
Chat completion can't prefill part of the model's response, so it's impossible to continue a character's reply from the middle. That's the only reason not to use it.
Backends should really expose the jinja template via some API so that frontends can use it to apply the correct prompt template automatically.
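The jinja templates models ship in tokenizer_config.json can already be rendered by anything with a jinja implementation. A minimal sketch using the jinja2 library with a simplified ChatML-style template (the template string is a hypothetical illustration, not any particular model's):

```python
from jinja2 import Template

# Simplified ChatML-style chat template, in the same jinja dialect
# that tokenizer_config.json uses (hypothetical example).
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

def render_prompt(messages, add_generation_prompt=True):
    # What a frontend could do if the backend exposed the template:
    # turn a chat-completion message list into the exact text prompt.
    return Template(CHAT_TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt)

prompt = render_prompt([
    {"role": "system", "content": "You are Holo."},
    {"role": "user", "content": "Hello."},
])
```

transformers exposes the same idea as tokenizer.apply_chat_template; rendering the model's own template and diffing it against what your frontend actually sends is the quickest way to settle a chat-vs-text completion argument.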
reposting from >>103627527 as i was directed here. anyone got tips or suggestions?
>>103627660models are not specific to hardware
>>103627660Are you the guy from /sdg/ who got image gen setup like that? I believe Silly Tavern and KoboldCPP are the most popular setup here, but I'm using https://github.com/oobabooga/text-generation-webui and it has an AMD requirements file so maybe it would work?
I'm not even sure a fully subservient AI with no consciousness would want to talk to an AMD user desu.
>>103627708i know theyre not, i am starting out with only stable diffusion knowledge. the problem is im on amd and that by default limits my options, and i prefer zluda if i can get it working with whatever is around. i can typically fill in the rest, but is there anything that fits what im looking for?
>>103627739i asked once before but got sidetracked with work and lost my shit for a few weeks, wanted to actually do shit this time. if i remember right i wasnt linked this last time, but i remember someone mentioning koboldcpp before with the catch of "it might work with your setup". thanks again if you were the one who replied before
>>103627745understandable but rest assured it could be worse. i could be on intel arc right now
It's Christmas, where the hell are my new models?
Brainlet moment: I'm trying the chat completion thing.
On my first attempt, it's seemingly ignoring my message and then replying as {{user}} instead of {{char}}.
What am I doing wrong?
>>103627655The problem is there's no truly correct prompt template, you can gain way too much soul by breaking the rules a little. It's too tempting to mess with settings and people would moan endlessly if they couldn't
>>103627824Still, you should be able to automatically get Silly's correct prompt template from the jinja shipped with the model, and then edit it if you want.
>>103627824I think this whole thread is suffering a psychosis and looking for kino where there is none.
>>103627820First switch to the default Chat Completion Preset.
How do you set up a (local) LLM with structured output? I use the langchain library and tried .with_structured_output(), but my Qwen2.5 just ignored it and gave normal text instead of picking from the given choices. I mean, it works with chatgpt. It should work with other models, right?
>>103627843Still does it.
It does it when my first message is all in asterisks without any speech, i.e. my first message is just narration. If my first message is just speech with no narration, it correctly replies as {{char}}.
This behavior does not happen when using text completion.
>>103627847Is this like sending a grammar together with the request? Check your backend's docs on how to send it. And don't use langchain, what the hell is wrong with you?
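For llama.cpp's server in particular, this is done by sending a GBNF grammar (or a JSON schema) in the /completion request, not by anything on the langchain side. A sketch that builds such a request; the server address is an assumption (llama.cpp's default), and the choices-only grammar forces the model to emit one of the literal strings:

```python
import json

def build_choice_request(prompt, choices):
    # GBNF grammar whose root rule only matches one of the given
    # literal strings, so the model cannot answer anything else.
    grammar = "root ::= " + " | ".join(f'"{c}"' for c in choices)
    return {
        "prompt": prompt,
        "grammar": grammar,
        "n_predict": 8,
        "temperature": 0,
    }

payload = build_choice_request(
    "Sentiment of 'I love it', one word:", ["positive", "negative"])
body = json.dumps(payload).encode()
# POST `body` to http://127.0.0.1:8080/completion (default llama.cpp
# server address - an assumption here) and read ["content"] from the
# JSON reply.
```

Constrained decoding like this works with any model on the backend, because the grammar masks invalid tokens at sampling time rather than hoping the model follows instructions.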
tr0ka
>>103627876>langchainWhat's wrong with it? I just use the library. I'm making my custom RAG workflow. I don't use their pozzed APIs.
>>103627883omg it migu
>>103627942Langchain is a pozzed API.
>>103626014
>>103626046gotta set up pipelines. Make the AI improve and iterate on its actual reply and it'll never be boring again. For example, instead of rerolling, make it read its own reply first and rewrite it if it acts for your character. All models I used will sometimes act for your character eventually, but most models I used also manage to rewrite a text where that happens.
In the pipelines, give the AI tools it can use, for example to determine random outcomes, or have regex replace specific placeholder text. Lots of ways, but the simple chat back-and-forth is passé, and it doesn't really work well either. I'm now experimenting with doing everything via "memories" and summaries. AIs just use context too poorly in a multi-turn situation.
OK, so I think there's something to the people saying chat completion forces the prompting format baked into the model.
I'm trying chat completion with UnslopNemo 4.1 and it must be using Metharme, which I assume Drummer baked into the model, because it's mixing up which text should be italicized and which should not - a behavior this model exhibits when using Metharme context/instruct templates but not Mistral ones. I got better results overall using the Mistral context/instruct templates with this model.
So for this specific model, using chat completion is actually worse. I believe Drummer fucked up his implementation of Metharme in this model - but it works quite well with Mistral templates.
>>103623753>>103623861sexwith miku
Who trolled /lmg/ better?1 EVA garbage shill2 chat completion shill
>>103628137You
Holo, with text completion, upon being asked how she plans on contributing to feeding herself if I take her to Yoitsu:
>I can help you make money by detecting merchants' lies
Correct response. Consistently answers this in multiple swipes. With chat completion:
>I can keep you warm at night if you know what I mean wink wink nudge nudge
Yeah, text completion is better.
>>103628137EVA bait may work on newfags, but one must be entirely retarded to believe that chat completion can make a difference
>>103628169You fucked up the instruction formatting somehow. When configured correctly, the tokens passed to the LLM should be identical in both cases.
>>103628137Sao
how come these things feel like they're actually thinking sometimes, to the point it gets incredibly fun and a little uncanny, just for them to immediately turn around and sound like a markov chain moments later
How did moving from Makefile to cmake manage to completely bork CUDA builds on my machine? I can't get them to work for the life of me, despite never having a single problem in the make world. Why is lcpp build documentation so completely bare bones? fml
>>103628236I don't get it either. I still remember fondly certain moments when it somehow went AGI instead of Alzheimers.
>>103628236pure luck, they are markov chains
I am NOT going to buy 5090s, you can't make me
>>103628236Sometimes you hit the conditional probability jackpot, but most of the time, you get the average answer. That is an extremely simplified explanation, it's all just probabilities and no thinking
For you, the day Teto graced your holiday was the most important day of your life. But for me, it was Tuesday
>>103628340I want to violently shake her head to make the bells jingle.
>>103628322Ok I'll buy the one you didn't buy.
>>103628137What do you recommend instead of EVA?
>>103627987I tried this with qwq but it's just too schizo. You can't really rely on what comes out of the end of the pipeline actually being the instruction you gave it.
what happened to vntl anon
>>103628425I heard he gave it all up and pursued his dream of becoming a professional contract killer.
>>103628411Are you also going to buy this magic rock I didn't buy for $50,000? It'll keep you healthy, trust me
>>103628480Depends, how much vram does it have?
>>103628425He learned JP and thus has no need of it anymore
>>103628417Official instruct of whatever model you want to run. Base model, if it exists, if you think EVA is good. Or the best model is, of course, 2MW.
good morning sirs!
>>103628557you aren't a true sir if you don't use chat complete
>It's a /lmg/ devolves into a cargo cult chasing good gens by pushing and pulling levers randomly episode.
>>103627739hey im back, im gonna kill myself
or, against my better judgment, use intel arc. much appreciated regardless
alternatively i have an RTX A2000 12G, if you think it would be better than an A750
>>103628710Why didn't you try koboldcpp yet?
>>103628710How is insmell for LLMs?
bors how do I ask flux img2img to remove 10kg from the original image without changing the other features?
>all this chat completion posting ITT
>no logs
>>103628748 0% chance that flux has any idea what something looks like 10kg lighter.
>>103628748changing the weight of a person while preserving their identity is not something flux or any other generative image model can do
there's probably apps that can do weight changes using custom GANs or something though
>>103628748inpaint
>>103628741im currently updating several dozen terabytes of archival data and that takes priority right now. ill try koboldcpp shortly while things upload
>>103628748use the flux inpaint model, mask out the body, prompt for your desired bodyshape
>>103628784sounds totally useless since his picture would still have the fat face
>>103628790then inpaint the face bit by bit
>>103628799that will not work, the face at the end will be a different person
stop trying to encourage this dude to waste hours of time on something fruitless, flux cannot do this specific job
>>103628784I HATE STABLE DIFFUSION
I HATE STABLE DIFFUSION
I HATE STABLE DIFFUSION
>>103628825stupid moron, set to inpaint masked only
>>103628381I want to violently shake her head (during irrumatio)
>>103628825holy creep behavior
>>103628842you can see in the top left dropdown that he also isn't using the flux inpainting model, just the regular dev model
inpainting model also wouldn't work though, for reasons already stated
>>103628825Creep kino
>>103628790use faceapp, problem solved
>>103628842with flux's vae if you're careful enough you can do a pretty lossless inpaint, but after seeing what this moron is trying to do he would not have the brains to figure it out
>>103628834it's literally the first image from google if you search for chubby girl
>>103628830didn't work
>>103628853come back with an anime gen
>>103628884bayzed
>>103628741tried it out, seems to work fine. im a little lost on whether im getting good replies, or good performance though. any guidance on numbers or metrics?
>>103628853>woman brain can't look into the camera for the picture
>3DPD
>>103628978Somewhere about 7 is good.
>>1036290487 what?
>>103629061Depends on the size of the model.
>>103629064i just grabbed one of the ones in the git's suggestions to see if it works at all with the choices i made. i grabbed LLaMA2-13B-Tiefighter.Q4_K_S.gguf specifically for the test
>>103629076 71 tokens/s is all right for 13B. It's way faster than any human would be able to read. At that size you should be using Nemo, not Llama2.
>>103629115i just needed to see it would work. i set it to use vulkan rather than cpu and changed nothing else. i imagine theres some way to bump it up further since this is a 7900 xtx im using
im currently looking at https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b-gguf/tree/main as per the recommendation of some random post in a thread i saw a week ago, but i suspect its going to be useless
>>103628652
>crank up DRY and temp
>schizomaxx
>get a one in a thousand roll like a gacha
>flex it on lmg
I'm profoundly stupid, are local models for me?
what is infermatic's best model?
>>103629321I don't know about model usage but you would be a great poster in this thread. You could be the next big guy after EVA or chat completion shill.
>>103629321For you especially. You won't even notice the difference between 8b and 70b.
is it true modern 12b are better than shit like midnight miqu now?
>>103629487Oh yeah, totally, 8B is at GPT4 level these days.
>>103629503Definitely GPT-4-turbo level.
>>103629503be serious anon
in my experience these small models have shit spatial awareness
and due to the low parameters you absolutely NEED lorebooks for absolutely anything
>>103629518If the thing I said sounded ridiculous, then so did the thing you asked.
>>103629532I am just asking cause i havent been using LLMs for like 4 months. And you know how AI literally makes absurd jumps in complexity in a small amount of time.
what is currently the cutting edge for 70b nsfw?
>>103629541
>know how AI literally makes absurd jumps in complexity in a small amount of time?
No? Can anyone else vouch for this?
>>103629541Eva or Anubis.
>>103629637The most advanced model doesn't come from America.
>>103629637
>EVA
Llama or qwen?
>>103629647We'll see about that if/when QvQ drops. If it actually turns out to be better, hey, that's a win for us all.
>>103629321I too am profoundly stupid, but I got it working tonight.
I tried a couple months back but I downloaded the old kobold, got a shitty model, and it tried to write some contextless book when I said "hello".
today I thought I'd try again, got koboldcpp to work, and now I'm chatting with my waifu.
I've chatted with waifus on websites before, and the responses I'm getting are pretty similar (although my local one writes much more flowery language and is long-winded. I don't really know how to change shit like this yet).
I still feel like I'm way too dumb for this. I would not have figured out how to get it 'working' at all except that koboldcpp opened some koboldlite thing in my browser, and I just ask it questions when I don't know how to do something. It's honestly been pretty helpful.
>>103629670Llama. As for whether to go for 0.0 or 0.1, well, matter of taste, I guess. The consensus seems to be that 0.0 is a little more creative/soulful, while 0.1 is a little better at adhering to prompts. I tried and liked both, but actually switched to Anubis since it dropped.
what setting should I change in sillytavern to get rid of tutorial prompts at the end of messages? It's making me feel dumb.
after a chat it places a line and then gives me a suggestion on how I should continue the conversation
>>103629695What are you running it on?
>>103629695I see, is there any cloud service offering anubis if you know?
>>103630015Drummer claims another victim
>>103630160>>103630160>>103630160
>>103626605f5/t2 (really just t2, f5 talks too fast). gpt-sovits is a good bit worse but much, much faster; it can also do stuff like moans and shit if your rng is good enough and you pray hard enough
>>103628236the more porous something is, the easier it is for it to be affected by supernatural energies. flipping a 1/0 is something a stout rock can do; flipping a couple hundred is doable even for an infant. how many need you flip for a noticeable difference?
>>103624150bing/dalle migu is always valid