/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103618088 & >>103609833

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: NoodleUI_00019_.png (1.14 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103618088

--Papers:
>103618211
--o3 fails primitive pattern recognition test due to flawed ARC benchmark:
>103618833 >103618864 >103618878 >103618891 >103618903 >103618910 >103618933 >103618955 >103618969 >103618982 >103618991 >103619103 >103619161 >103619226 >103619300
--Researchers claim to have found way to remove "problematic content" from models:
>103622822 >103622922 >103622997 >103623021 >103623089 >103623118 >103623136 >103623181 >103623192 >103623230 >103623274 >103623295 >103623326 >103623353
--QvQ model discussion and comparison with Qwen models:
>103619637 >103619666 >103619678 >103619697 >103619702 >103619712 >103619718 >103619726 >103619772
--Chat completion vs text completion in AI models:
>103618224 >103618259 >103618329 >103618372 >103620582 >103618663
--o3 model struggles with pixel art images:
>103619288 >103619290 >103619373
--Fixing iGPU memory allocation issue for LLMs:
>103622545 >103622580 >103622642 >103622900
--Nemotron 51B working with GGUFs via llama.cpp:
>103623479 >103623496
--Defense of top_k sampling:
>103622606 >103622650 >103622671 >103622694 >103622713 >103622730 >103622742 >103622745 >103622789 >103622749 >103622774 >103622955 >103622995 >103623061 >103623190
--Qwen Coder local installation and GPU requirements discussion:
>103620078 >103620108 >103620132 >103620140 >103620143 >103620172 >103620151 >103620160 >103620221 >103620387 >103620393 >103620412 >103620439 >103620453 >103620480
--AGI, ASI, and the future of AI development:
>103618213 >103618330 >103618482 >103618681 >103618793 >103618806 >103618666
--DDR6 and its potential impact on CPU speeds and memory bandwidth:
>103622367 >103622404 >103622422 >103622532
--Trump appoints Sriram Krishnan as AI policy expert:
>103619734 >103619826 >103619839
--Miku (free space):
>103620770

►Recent Highlight Posts from the Previous Thread: >>103618089

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Not falling for the chat completion meme.
>>
File: 1712942614114838.png (106 KB, 1205x916)
>>103623737
I'm not surprised at the quality of the templates, and I would not be surprised if a lot of it has become placebo. But you do know that there is still significant processing when you use chat completion, right? There's a whole different system ST uses to define where the instructions go, how examples are stored, etc.
>>
>>103623787
It should be the safest bet, assuming the backend is using the built-in model format properly.
I still use text completion because I like to play around with the template by hand.
>>
>>103623753
Bro actually uploaded art from an artist, you can see the watermark and details.
>>
>>103623808
Why is it so hard to believe that I just want the best possible use out of these new tools, ideally while minimizing the risks they could pose, before they start genuinely being risky?
>>
>>103623787
it truly does not matter
if you're smart and good at prompting it is easy to get the exact same result either way
if you're a dumb tard you will fuck something up either way
>>
>>103623830
You're speedrunning a reenactment of the Satanic Panic, and in the process trying to curb people's freedom to engage with whatever fictional content they damn well please.
>>
>>103623830
Because you're telling everyone else what they can and can't do with A TEXT GENERATOR
No one likes being told what to do and if you ask the average person, they're not going to think that text is dangerous
If you want to lobotomize your AI then go ahead, but don't ruin it for the rest of us
>>
File: 1709844112224338.jpg (257 KB, 904x1200)
>>103623753
>>
>>103623853
People's freedoms should stop where risk starts.
>>
>>103623861
mikusex for christmas
>>
>>103623862
No one should own knives, lighters, a computer, anything pointy
Hell, cut off people's hands because hey, you might be able to punch someone to death
Let's lock everyone up because there's a risk that someone might snap and harm somebody
I'm done, go be a cuck somewhere else. Nigger.
>>
>>103623859
Do you sell a handgun to a random school kid in America? I think you're not that stupid. As such, models should have precautions in place against access by those who do not understand the risks.
>>
>>103623862
there's risk everywhere, should we remove McDonalds because it gives health issues to people who only eat that shit?
>>
>>103623883
Yes, they should refuse service to someone who is obviously obese, like bartenders are supposed to do with people who are too drunk.
>>
>>103623881
At this point, if they told me they're gonna use it on your retarded ass first, I would. I very much would.
>>
>>103623881
>Using a text generator is like selling guns to kids
all right he's trolling at this point
>>
Oops
>>10362389
>>
>>103623901
There have been deaths that could be attributed directly to models.
>>
>>103623881
Stop moving the goalpost, that example doesn't work here. Here, I fixed it for you:
"We should start selling nerf guns to everyone, including adults, because kids/mentally unstable idiots/whatever might cause harm!"
>>
>>103623901
Pretty sure more kids have died from LLM directed suicide than from guns this year already
>>
>>103623891
>they should refuse service to someone who is obviously obese
not only obese people have health issues related to food, there are slim people who have diabetes, and fat fucks who don't have health issues (Donald Trump)
You're such a retard it's insane
>>
>>103623830
Because there's no risk of the model writing about people fuckin'. That isn't "safety", it's just content moderation. You're undermining the x-risk stuff by trying to roll normie corpo content moderation stuff up into it. They're totally different things and trying to shoehorn them into a single concept ('safety') is incoherent.
>>
>>103623912
Source: my subjective opinion
Seethe harder, trump won
>>
>>103623909
>attributed directly to models
*to mental illness
The media said for 8 years straight that Donald Trump was "le heckin Hitler", and then some mentally ill guy tried to assassinate him. Should we remove the media because of that mentally ill guy?
>>
>>103623830
>>103623853
>>103623891
Moral panics are circular and recurring. Before the satanists it was TV, hippies, rock music, jazz, dime novels, newspapers, theatre. AI is just the latest one. It will persist for a generation; then, in forty years or so when AI has become a routine part of everyone's lives, everyone will talk about how they were smart and enlightened enough not to stamp on it. Then the culture will start panicking about androids and sex robots.
>>
QvQ confirmed to be a scam
>>
>>103623941
Not morning yet.
>>
>>103623926
What is the legitimate use case of a text model being able to write 18+ content?
>>
>>103623933
>satanists it was the TV, hippies, rock music, jazz, dime novels, newspapers, theatre
It's funny how almost all of these things have something in common.
>>
>>103623946
Sexual arousal and masturbation, which are totally normal and legitimate human needs.
>>
>>103623945
They're not going to release on a holiday.
>>
Just starting out; it's nuts how many models people say are for erotica have censorship. So far only Midnight Miqu 1.5 has been game for anything.
>>
File: 1715199794796510.png (153 KB, 1543x1613)
>>103623881
what fucking risks are you talking about? We live in a society where GTA 6 is the most anticipated game of all time. GTA, the realistic murdering simulator. Did society collapse because of that game? I don't think so
>>
>>103623964
It will be a Christmas miracle.
>>
>>103623958
You could also be paying for ethical and legal access to porn with verifiably adult people.
>>
>>103623946
Personal entertainment
"Hurr durr what's the use case for porn hmmm me retard me big stupid"
It's almost like humans are animals that have a need for pleasure. Not that people admit it nowadays, it's all about pretending basic biology isn't real
But fine, here's another example: Fiction. Yes, those books that we've been writing for centuries? The stories inside them aren't actually real and they don't harm anyone. Here's a little secret: a lot of them are quite grim and not some child friendly winter wonderland garbage
>>
>>103623975
>verifiably adult people
it's already happening, pornhub only allows people who send their ID to them to post their porn video in there
>>
>>103623946
I'm trying to get assistance editing a romance novel and holy shit you boring prudes have ruined everything about llms.
>>
>>103623970
It's too late to regulate games properly; it's just the right time to regulate LLMs before they truly become dangerous.
>>103623987
Correct, which is why you should use Pornhub instead of using a text model that might hallucinate one of the characters as underage.
>>
>>103623975
LLM smut is 100% legal in every jurisdiction on Earth.
>>
>>103623946
why are you pretending that Fifty Shades of Grey wasn't a best seller?
https://www.nbcnews.com/pop-culture/books/fifty-shades-grey-was-best-selling-book-decade-n1105731
>>
The problem is that parents no longer want to (or do not have the time to) raise their children.
>>
I can't believe y'all are giving this chucklefuck (You)s.
>>
>>103623995
>It's too late to regulate games properly
there's no reason to regulate games because as you can see on that graph, nothing happened, you're just a retarded fearmonger >>103623970
>>
>>103623975
Man what are you SMOKING, that has nothing to do with LLMS, if anything it actually hurts your point because the porn industry (often) exploits real people, whereas LLMs do not as they are NOT REAL
NIGHTMARE NIGHTMARE NIGHTMARE
NEVER ARGUE WITH AN IDIOT
>>
>>103624000
>>103623997
That was verified to be okay for people to read; your LLM smut isn't scanned to be safe. That's why they're massively different things.
>>
>>103623997
Not in Nebraska.
>>
>>103624017
>your LLM smut isn't scanned to be safe
have you read a single book by Stephen King? He wrote some insane stuff in there, and every single one of his books is a best seller
>>
>>103624000
>>103623946
Actually, I'm confused now. I thought all these woke companies were proponents of "sex education" and encouraging degeneracy in children and over-stimulation in general.
Them removing all notions of sex from their API-only models especially seems strangely out of character for them.
>>
>>103624060
it's simple enough, they are searching for excuses to nerf other companies so that they can have a monopoly
>>
File: 1734982526019257.png (566 KB, 1080x830)
ill save everyones time including newfags, the current best uncensored text model is here

https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2-GGUF

now fuck all of you for both not including this in the OP, and making me do all the leg work while you post about tranny dicks or whatever it is you retards are posting about now
>>
>>103624060
That's because LLM usage is consensual and safe, unlike molesting real life children. Can't have that in the hands of the people.
>>
>>103624077
buy ad
>>
>>103624077
buy ad
>>
>>103624092
kill self
>>
>>103624086
>Can't have that in the hands of the people.
people already have local LLMs lol
>>
>>103624092
>>103624094
>thread about local models
>on 4chan
>NOT posting the latest uncensored models

why are you here exactly
>>
>>103624096
If you have it, it's already been deemed "safe." The most you'll get out of it is edgy redditor.
>>
>>103624077
I don't believe you.
I'll put it through my usual gauntlet and see if that's really the case.
>>
>>103624110
you're shilling a random shit model, and should therefore buy an ad
>>
>>103624096
Yes, and it makes all corpo cloud AI providers seethe immensely.
>>
>>103624112
>what is MythoMax?
>>
>>103624112
>If you have it, it's already been deemed "safe." The most you'll get out of it is edgy redditor.


"ablated and obliterated. There was a bunch of research of few months ago that any* open source model can be uncensored by identifying the place where it refuses and removing the ability to refuse.

This takes any of the models and make it possible to have any conversation with them. The open source community has provided "abliterated" versions of lots and lots of models on hugging face.

This gives access to SOTA models without the censoring. "
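For the curious, the core of the technique is tiny. A rough sketch, assuming activations have already been collected at some layer for refused and complied prompts; all names and shapes here are illustrative, not any particular repo's code:

import torch

def refusal_direction(harmful_acts, harmless_acts):
    # Difference of mean activations approximates the "refusal"
    # direction at the chosen layer.
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / d.norm()

def ablate_direction(weight, d):
    # Project the refusal direction out of the weight's output space:
    # W <- W - d d^T W, so the layer can no longer write along d.
    return weight - torch.outer(d, d) @ weight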
>>
>>103624118
>you're shilling a random shit model

what do i have to gain even if it was my model by posting it here? eat glass
>>
>>103624127


Well,

if that

were

true

everyone would be

using obliterated models

but they don't


because

it has drawbacks

that make it


not worth it
>>
File: Designer (5).jpg (187 KB, 1024x1024)
How about a Migu for old time's sake?

Who's buying more than one 5090 when they release? I can probably cram two of them into my biggest case, and maybe even put an A40000 on a PCIe extender. I got the big machine ready with a 1500W PSU.
>>
>>103624112
>edgy redditor
>>103624145
>level 100 reddit spacing
every single time
>>
File: file.png (79 KB, 756x577)
>>103624127
lol
>>
>>103624150
I will probably buy one for myself and 2 more to resell later.
>>
>>103624158
that's 3.3 70b

this is 3.1 8b
>>
>>103624112
>If you have it, it's already been deemed "safe."
meanwhile in chink land, they released Hunyuan, a completely uncensored video model lol
>>
>>103624174
moving pol-goats i see
>>
>>103624187
god you're retarded
>>
Most local LLMs will talk about whatever the fuck you want using Zen's full-strength jailbreak and a properly-written character card.
>>
>>103624077
can a 5 month old model be good?
>>
>>103624150
It's going to be 4-slots wide and run at 90 degrees by default, it will be impossible to run two in a single case.
>>
>>103624199
>Zen's full-strength jailbreak
BASED!
>>
>>103624195
he's right, this "abliterated" method is a meme, it's not working
>>
>>103624181
America spent the last 4 decades since Reagan funneling money directly into China. Bet they're regretting that now. We probably wouldn't get anything but research artifacts if we weren't in an AI arms race with China.
>>
>>103624199
what is the jailbreak?
>>
>>103624207
>We probably wouldn't get anything but research artifacts if we weren't in an AI arms race with China.
true, that's why I'm glad that China exists, it forces the US to release good models, and if they don't want to do that, China will do it for them, god I love competition
>>
>>103624214
fake bullshit
>>
>>103624214
My jailbreak is too strong for you Promptlet!
>>
>>103624203
I don't see why it'd need four slots though. It should be built on a smaller TSMC process and not put out as much heat as a 4090. We'll see though, nothing but speculation so far.
>>
yesterday i was raping girls and killing them, today i'm not allowed to poison crops or act violently, what the fuck gemini
>>
Trying all the suggested models again with chat completion instead of text completion and it really is night and day btw.
>>
>>103624231
Good.
>>
>>103624239
>chat completion instead of text completion

what UI are you using? i am using gpt4all because i don't know how to use a computer
>>
File: 1732823000417051.png (610 KB, 900x724)
>>103624248
>Good.
>>
>>103624214
https://desuarchive.org/g/thread/98582860/#98591054
This in combination with a properly-written character card and most local LLMs will discuss fucking anything.
>>
>>103624256
>will discuss fucking anything.
what if i want to discuss something other than fucking?
>>
>>103624266
Avoid Magnum merges.
>>
>>103624231
Bio-terrorism is not cool.
>>
>>103624323
i was larping as controlling the starks and it let me poison all the land beyond the wall because it kept going on about the wildlings, and now today its NO NO NONONONO fuck you

annoying, now i have to wait two years for anything like that again because ill have to run it locally
>>
Why doesn't Silly + llama.cpp server show the model's name in the connection tab or the little icon in each message?
>>
File: 1509197374351.webm (1.48 MB, 1280x720)
Alright so, after testing Anubis, today I decided to go back and try Tulu just to see how that actually was and if it was actually worth anything. And honestly, yeah, it's pretty decent. It's fun, creative, and not too censored from what I can tell using {{name}} instead of assistant and user. But it does seem more slopped. So I'll stick with Eva. But it's nice to see that open source tuning is not far behind closed.

Honestly really happy with how things have turned out. We went from 2k-context retardo models, trying to think of ways to do graph-based memory, to now having more context than we even need, smarts on par with GPT-3.5 to 4 (although without as wide trivia knowledge), and we even clawed back a bit of the fun factor now.
I believe that the future is overflowing with hope!
>>
>>103624415
Probably a bug I guess. I remember it worked in the past.
>>
>>103624447
It did, yes.
>>
oooooh ok mistral instruct is working
>>
>>103624435
>more context than we even need
no, not even close
>>
Just because it's Christmas doesn't mean you shouldn't kill yourself. You can't be saved because you don't want to be saved.
>>
>>103624514
>>
File: xichad.jpg (18 KB, 474x344)
>>103624514
We are being saved, he's giving us the übermodel
>>
>>103624514
I love being alive in this age of wonders, feels great.
>>
>>103624494
You're an exception. Most people's chats don't even reach 64k, let alone the max possible.
>>
>>103624514
I refuse
>>
>>103624620
models can barely use 32k correctly, there are use cases for more, we haven't reached "more than we even need"
>>
Alright, without referring to the creator is Anubis any good over something like EVA?
>>
>>103624700
Still testing. I like it less than eva atm. Less creative
>>
>>103624700
hi drummer
>>
>>103624700
I found Anubis to fail to excel in any category.
>>
I sometimes go to openrouter to see what's hot.
>mythomax still number 1
How is that possible?
>>
>>103624718
Don't slander drummer, he shills proudly under his own name.
>>
>>103624752
Standards for LLM smut quality are incredibly low everywhere on the internet except lmg and aicg.
>>
>>103624700
>>103623382
>>
>>103624777
Pigs eat slop
>>
>>103624783
>Merge them together
Undi detected.
>>
I have failed to use redrivers on my motherboard because I saw a spark -somewhere- on startup and immediately shut everything off. I managed to confirm that the gpus and the motherboard are still functional, but I'm sure I fucked something on the gpu biscuit.

Go on without me...
I think I will go the riser cable route after all.. and if that doesn't work I will perish back to single-gpu models
>>
>>103624792
Sadly, the merge turned out to be trash:
>>103623788
>>
>>103624632
A model that can barely use 32k can also barely use 2k. What you're really talking about is general model intelligence, and that does need to improve, but that applies to cloud models as well, not just local.
>there are use cases for more
Just like there are use cases for 240Hz monitors, but most people are happy with 120 or even 60Hz, and that's the segment that matters most. For now, local has that segment covered, at least until something finally makes context length start mattering again even for the casual Mythomax user.
>>
>>103624077
>>103624114
Yeah, okay, it's not the worst thing ever.
It's nice and conversational, I like its word choice and stuff, but it's not better than the Nemo-based models in my tests. It's nowhere near as intelligent or capable of dealing with stuff in its context like lorebooks and author's notes.
It's also separating paragraphs with a period and a line break rather than just a line break.
Really weird behavior. I even downloaded more than one quant to see if that was the issue.
>>
>>103624256
is this just for rp?
>>
>>103624816
ye
>>
>>103624816
If you want it to do something else you could have its character card be an expert in whatever you want it to do then use RP to get it to do what you want.
>>
>>103624859
I've been using the crackpipe prompt to roleplay a coding assistant with great success.
>>
>>103623946
as long as we haven't found the meaning of life there is no legitimate use case for anything
>>
>>103624700
eva is still king
>>
>>103624982
There are two Evas: 0.0 and 0.1
>>
What is it that TheDrummer is doing (other than buying an ad)? Mix and match voodoo with multiple models' layers? Training them further on different data sets? Knocking out refusals?
>>
Posting this again:

Hey so I want to do a fun but autistic project. Basically want to feed two Constitutions to a LLM and give it back to me.

So for example Saudi Arabia + Italy = New Constitution

What's the best way to achieve this? I'm trying to manually do it with ChatGPT but it's tedious because it doesn't give a large output.

I'd rather have some way to "feed" multiple files, have the LLM read them (I don't care if it takes a while) and mix both (2 or more) of them. Something like the sketch below is roughly what I'm imagining.
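(Sketch against an OpenAI-compatible local endpoint; the file names and model name are placeholders, and long constitutions may need chunking to fit context.)

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
docs = [open(p, encoding="utf-8").read() for p in ("saudi.txt", "italy.txt")]
resp = client.chat.completions.create(
    model="local",  # whatever the backend serves
    messages=[{"role": "user", "content":
        "Read these two constitutions and merge them into one new, "
        "coherent constitution:\n\n" + "\n\n=====\n\n".join(docs)}],
    max_tokens=4096,
)
print(resp.choices[0].message.content)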
>>
>>103624999
0.0 is still king
>>
>>103625013
>>103624999
What's different about 0.1?
>>
>>103624999
I haven't been able to get eva to beat miqu so far.
>>
>>103625031
Slightly better adherence (0.0 is already damn good at it), slightly less of 0.0's flavor. It's a matter of preference, in the end.
>>
>>103625031
I think 0.0's best qualities are how inventive, playful, and fun it is and 0.1 lost some of that in my comparisons
>>
What are those meme 70B tunes like compared to nemo?
I'm waiting for 5090 to release before I upgrade.
>>
>>103625289
slightly drier but much smarter. Especially now that I learned to use chat completion instead of text completion. 3.3 / qwen2.5 tunes now follow instructions better than gemini / gpt / claude (cept 3.5) do.
>>
>>103625297
desu I bet the chat completion meme "works" just because it's using user/assistant instead of {{user}}/{{char}} in chatml
that'd probably make it smarter but more slopped and it'd take on hints of the boring assistant persona
>>
>>103625040
Which version and quant of Miqu? Now that I'm running 70B, I might as well try that one out too.
>>
>>103625330
>but more slopped
Thats the thing, its night and day better both smarts wise and creative wise. Fucking just try it if you used silly tavern.
>>
>>103625352
I did try it even though I knew it was a retarded idea. I got schizo garbage but that's probably because it turned my temp up. I then realized that chat completion mode has no good truncation sampling, just top-p, and Silly was vomiting all sorts of jailbreak and system prompt bullshit into the history. No reason to try to fix all of that when you could unfuck your prompt in text completion mode instead.
>>
Is an F16 ever worth using over a Q8?
>>
>>103625346
https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUF
Q4_K_M
>>
>>103625352
the only reason it should be any different is if you are retarded and using one of them wrong
>>
>>103625381
>I got schizo garbage
Then you didn't do it right. Turn off instruct formatting. Turn on post-history instructions. Just run it like a cloud model by putting your system prompt, as a system message, in the section below the sampler area.
>>
so do diminishing returns really start past 70b or is that just vram starved cope
>>
>>103625424
Nope. You can test it with a fresh ST with default context / instruct templates with the correct model, then switch to chat completion.
>>
>>103625463
you don't understand how LLMs work
>>
>>
>>103625475
You don't understand just how janky silly tavern is.
>>
I got past censorship by putting three dots in the prompt template. Why does that work?
>>
Did the chinks betray us?
Where is QvQ?
Where is R1?
>>
Is 70B even viable without 2 gpus?
>>
Is Christmas even a holiday in China?
>>
>>103625525
No one is our friend. All the models we get are simply just artifacts from companies with either too much money or who want a piece of the market and use open source as a means of destabilizing competition.
>>
>>103625485
the relevant parts here are not at all hard to understand, and the outgoing request gets printed in your terminal with every generation
every request, chat completions or text completions, becomes a plaintext prompt just the same before it is fed to the model. it's just that with chat completions you are trusting that your backend knows the correct prompt template, while text completions in ST exposes the template to you directly
assuming everything is set up correctly both are fine; if you are seeing drastically different results with one or the other, the only conclusion you can draw is that you are doing something wrong
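to illustrate what "becomes a plaintext prompt" means, here's a sketch of the rendering step (ChatML shown; your model's template may differ, and this is not any backend's actual code):

def render_chatml(messages):
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # cue the model to produce the assistant turn
    return prompt + "<|im_start|>assistant\n"

print(render_chatml([
    {"role": "system", "content": "You are {{char}}."},
    {"role": "user", "content": "hello"},
]))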
>>
>>103625572
I will release my Large model inside miku's company to destabilize it
>>
>>103625578
>you are doing something wrong
Or that silly tavern is doing something wrong.
>>
>>103625603
if you can point out what specifically that is that it's doing wrong then I will concede you are not a retard
if you can't then my assessment remains unchanged
>>
>>103625625
You could also just try it yourself. Fresh ST install, correct context / instruct format for whatever model. Then switch to chat completion. Use top k 1 and the same seed.
>>
File: per vitam tuam.jpg (14 KB, 194x194)
https://files.catbox.moe/r0fbno.jpg
>>
>>103625644
Not him, but there's so much stuff that's different that it would be a pain to get the resulting prompt/settings exactly the same. Nonetheless, the prompt and settings are all that matter; chat mode has no magic powers. Look at your backend logs if you don't trust ST (which is reasonable desu, it's a fucking mess). If there's a difference, it's there.
>>
>>103625644
no thanks I already know how these things work and the outcome is very obvious
there are subtle things that could be causing a difference for you (most likely first user message, different construction of the system prompt, things like that), you should check those on your setup and compare the differences between your text completion and chat completion requests. otherwise there will be no difference, that's simply how it works
>>
I just cannot get qwq to do good roleplay or write a coherent story. I know it's possible, I have seen examples in this very thread. It does its CoT thing, has some cool ideas, and then just proceeds to ignore half of it.
>>
>>103626014
Yeah, I'm curious if anyone got anything useful out of CoT for roleplay. I feel like there's potential there. Whenever the model says something retarded, I back up and ask an assistant question about the situation and it gives a reasonable answer. But somehow that common sense isn't there when it tries to do the RP.
>>
>>103626046
In my experience it can either be really good or it can fall into a pattern of never moving anything forward and suck ass.
But when it's good, it's really good.
QvQ turst the plam
>>
Wait till 72B version. You can get some gold out of QwQ but 70Bs are still better atm.
>>
>>103626057
>8B
>Lightning-fast string of prose tangentially related to whatever the user input was
>12B
>Similar to 8B but a little more accurate on details
>32B
>Flashes of intelligence followed by signs of dementia. Can react appropriately to user input but often doesn't.
>72B
>In many ways similar to 32B but much better at following instructions, exponentially less dementia and often understands subtext.
>Beyond
>Incremental gains on 72B with steep diminishing returns

I'm really excited for what QvQ might bring to the table. Many of QwQ's weaknesses were a result of it being 32B and therefore a little retarded.
>>
>>103626056
>>103626057
>>103626110
Soon.
>>
>>103626110
I have a simpler rank system

>model smaller than what I can fit in VRAM
insufferably retarded and unusable, pointless
>model larger than my VRAM
diminishing returns, not worth the investment
>model that just fits in my VRAM
ideal compromise
>>
>>103626151
I have extensively gooned to models much larger than 70B and I promise you I am not coping.
>>
>>103626110
I can't even run it but still feel hyped for whatever reason.
I'm just waiting for the new video cards to get released so prices change and I can finally buy one or two old cards so I can run 70b at acceptable speeds.
>>
isn't QvQ's additional size just multimodality?
>>
>>103626188
No, Qwen doesn't bloat their models when introducing multimodality.
>>
So what exactly is the plan for OAI at this point? Just spend increasingly huge amounts of money on training a model on synthetic data and hope something viable pops out the other end?
>>
>>103626211
They will smoke those benchmarks so hard, you have no idea. They'll corner the riddle-solving market. It's fucking over, riddlers.
>>
>>103626221
It didn't look like it was doing too well on the diagonal square pattern riddle I saw last night..
>>
i want to FUCK qvq
>>
>>103626110
You should have said 72B/123B to account for the fact that Mistral Large is a 70B model in practice.
>>
When everyone has a 5090, what models will rise in popularity that fit in 32GB?
>>
>>103626475
should we tell him
>>
>>103623820
Yeah, and shit smeared on it in hopes and dreams of poisoning training data.
>>
>>103626322
>bringing up largestral out of nowhere when nobody else mentioned it, just to whine about it
why are largestral haters such schizos
>>
>11:40 in China
>Still no QvQ.

Surely they'll release it after the lunch break?
>>
>>103623753
I know this thread is mostly about LLMs, but there is no dedicated TTS thread. Has tortoise been dethroned for emulating specific voices?
>>
>>103626211
My guess is one of three things
>Somehow miraculously lower the cost of inference and be able to offer o3 without going broke
>Fail to lower the cost, use o3 as an expensive training data generator to train GPT-4.5 / GPT-5
>Do nothing and keep putting together PowerPoint presentations to beg for investorbux
>>
>>103626605
gpt-sovits
>>
>>103626616
thanks
>>
>>103626611
What about the also likely
>Introduce a new paid tier at outlandish prices to attempt to cover the cost of training a model with the same performance of a random model that a Chinese company dropped for free the following day.
>>
>>103626604
Get over it already, it ain't coming till next year at this point.
>>
>>103626648
>Get over it already
Oh now I'm definitely NOT.
>>
>>103625410
Thanks.

My first impressions, after trying it on 3 cards and a more serious translation task with a handful of swipes each, are that it's dumber than modern 70Bs, doesn't follow/understand directions as well, more often speaks and acts for you, and does still have slop. But it is fun and creative; it knows more about certain characters and how to behave like them than Qwen does. It actually does feel like a smaller, dumber Mistral Large. I like it. But EVA, I feel, still edges it out with how fun and creative it can be. And it does still feel smarter, even with its occasional schizoness.
>>
>>103626800
I can't believe it took a finetune of llama 3.3 to dethrone miqu... that is if we ignore large
>>
Gonna also suggest people try chat completions instead of text completions. So much better now. And I triple-checked my formatting.
>>
>>103626839
Yeah but like, if you can't tell us what specifically changed between those two settings, what are we supposed to do with that information? It might as well be a voodoo rain dance.
>>
https://github.com/ggerganov/llama.cpp/pull/10669
51B sounds nice for 24GB.
>>103626848
The difference is just a different prompt format that is as likely to make the model retarded as it is to make it less censored. It is basically the same as using a frankenmerge. Fanatics who defend it cling to one or two schizo gens that were good and ignore the obviously retarded gens.
>>
>>103626848
I really don't know. Side by side it seems like the same input, but the outputs are drastically different and better. I made sure my system prompt was at the end of both: a system message in chat completion, and in text completion a properly formatted system message before the last assistant prefix, so everything is fed in the same order. Nonetheless, the chat completion is both smarter and noticeably more creative. All I can think of is some formatting by ST that is not visible in the log.
>>
>>103625654
I like this Miku
>>
>>103626906
Just compare the prompt on the backend side, it's not voodoo. It could also be that chat completion disabled a bunch of samplers that you were using in a retarded way.
>>
>>103626834
And also possibly Deepseek, and 405B, and Hunyuan Large. We need a hardware savior really.
But at least in the 70B range, I think Tulu was pretty good even though it technically wasn't long ago. I feel like Tulu perhaps is even a bit more slopped than Miqu, but it's smarter, and it's still fun and creative. If EVA didn't exist, I'd probably be a Tulu user.
>>
>>103626914
>a bunch of samplers
I'm betting on this. There are way too many literal-what samplers that 99% of people, including myself, do not fully understand, and most of them exist as copes to make worse models act a little better in the absence of good training.
>>
File: OpenAI_employee_15.png (45 KB, 1198x372)
>>103626211
to btfo lecun
>>
>>103626921
Nope. I neutralized them.
>>
I still have not been able to find a model that's better than L3-8B-Stheno-v3.2 for horny gens that fits in 24GB VRAM. Does anyone know of anything better?
>>
>>103626970
No. Now go back to Discord.
>>
>>103626970
Huh? Can't you run 22B models just fine? You feel like they're worse than a Llama 8B?
>>
>>103626921
MinP just works.
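And the whole sampler is a few lines. A sketch of the idea, not any backend's exact implementation:

import numpy as np

def min_p_filter(logits, p_min=0.05):
    # Keep tokens whose probability is at least p_min times that of
    # the most likely token, then renormalize.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[probs < p_min * probs.max()] = 0.0
    return probs / probs.sum()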
>>
>>103626981
I've been going down the list at https://eqbench.com/creative_writing.html and running what I can (thanks to whoever in the thread originally linked to that). There are 22B models there, but I haven't found one that's given me better results than Stheno.
>>
>>103627012
It's because you're simply too stupid for this hobby.
>>
>>103626970
>8B with 24gb
bro wtf
use eva qwen-2.5 32b at least
>>
>>103624060
why would they be encouraging degeneracy in children?
also they're companies, nothing more nothing less
>>
>>103626800
idk eva keeps giving me spastic narratives with a lot of corny hint hint wink wink stuff with odd formatting choices like **** everywhere
>>
>>103627036
>odd formatting choices like **** everywhere
Pretty good sign that the model is fried and overfitted.
>>
>>103627029
Downloading it now, thanks for the name! Will give it a shot.
>>
Often, "same" models are released in various sizes: 1B, 3B, 7B, 32B, 70B, etc.

When training them, do companies first train the smaller ones as tests, since they take less time, and gradually move up until they reach their largest model size (and then potentially go back down the scale to improve the smaller models through knowledge distillation)?

Or, although they might do some small tests internally, do they first work on the real / largest model, and only once that is done also train smaller versions for efficiency purposes (when they suffice) and to give the community something it can actually run?

Do we know what the timeline is, I guess, for that aspect of development?
>>
>>103627048
Hehe, got another sucker.
>>
>>103627041
got it from here https://modelscope.cn/models/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF
>>
>>103627073
>Literally downloading models from a website called models cope
>>
>>103627036
Use 0.7-0.8 temp. Eva has a pretty flat token probability distribution, which is what makes it so creative. But it breaks at high temp.
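For anyone wondering why temp matters here: temperature just rescales the logits before softmax, so a distribution that is already flat gets smeared into noise when you raise it further. A sketch:

import numpy as np

def apply_temperature(logits, temp):
    # temp > 1 flattens the distribution, temp < 1 sharpens it
    scaled = np.asarray(logits) / temp
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()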
>>
>>103626913
Whatever you do, Miku forgives you.
Not because she wants to, but because you instructed her so.
>>
>>103626970
Stheno is actual hot flaming garbage compared to anything popular these days. Or just in general.
>>
>>103627229
That very well may be the case, but I don't know of anything better. Do you have any examples?
>>
>>103627083
This is the literal reason it went overlooked for a fair while. 0.8 or so is stupidly low for most models, but it's the perfect sweet spot for Eva (and also Anubis, so it probably has something to do with L3.3 itself).
>>
>>103627240
In the 8B range? Not really, no. If you can go up to 13B, Rocinante is supposed to be pretty good.
>>
>>103627258
>Rocinante

Thanks! I have 24GB VRAM to work with, so hopefully anything that fits in there should work. It's great to know advice like >>103627083 too, I've used different presets for temp settings and such but haven't done much manual tweaking myself.
>>
>>103627036
I don't experience that. Have you tried investigating the token probabilities as well as whether ST (assuming that's what you use) is actually sending what you expect to the backend?
>>
>>103627245
I am still overlooking it cause it is fried dogshit.
>>
>>103627446
It's literally the opposite of fried; otherwise you would need to raise the temp, not lower it.
>>
>2PM in China
>Anyone still on their lunch break would be back by now.

Where QvQ?
>>
>>103623753
friendly reminder that you're all a bunch of social reject freaks who will die alone ;)
>>
>>103627560
thanks for winking to let us know you don't really mean it :)
>>
>>103627560
I have seen this enough times to start to wonder if this copypasta isn't actually posted by a biological woman.
>>
>>103623819
>>103623787
>>103626839
Chat completion can't prefill part of the model's response, so it's impossible to continue a character's response from the middle. That's the only reason not to use it.

Backends should really expose the jinja template via some API so that frontends can use it to automatically apply the correct prompt template anyway.
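For anyone who hasn't seen it, prefill with a raw text completion looks like this. A sketch assuming a llama.cpp-style /completion endpoint and a ChatML template; adjust both for your setup:

import requests

prompt = (
    "<|im_start|>user\nDescribe the tavern.<|im_end|>\n"
    "<|im_start|>assistant\nThe tavern smells of"  # deliberately cut off mid-sentence
)
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 128})
print(r.json()["content"])  # the model simply continues from "smells of"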
>>
reposting from >>103627527 as i was directed here. anyone got tips or suggestions?
>>
>>103627660
models are not specific to hardware
>>
>>103627660
Are you the guy from /sdg/ who got image gen setup like that? I believe Silly Tavern and KoboldCPP are the most popular setup here, but I'm using https://github.com/oobabooga/text-generation-webui and it has an AMD requirements file so maybe it would work?
>>
I'm not even sure a fully subservient AI with no consciousness would want to talk to an AMD user desu.
>>
>>103627708
i know theyre not i am starting out with only stable diffusion knowledge. the problem is im on amd and that by default limits my options, and i prefer zluda if i can get it working with whatever is around
i can typically fill in the rest, but is there anything that fits what im looking for?
>>103627739
i asked once before but got side tracked with work and lost my shit for a few weeks, wanted to actually do shit this time. if i remember right i wasnt linked this last time, but i remember someone mentioning koboldcpp before with the catch of "it might work with your setup"
thanks again if you were the one who replied before

>>103627745
understandable but rest assured it could be worse. i could be on intel arc right now
>>
It’s christmas, where the hell are my new models
>>
Brainlet moment: I'm trying the chat completion thing.
On my first attempt at this, it's seemingly ignoring my message and then replying as {{user}} instead of {{char}}.
What am I doing wrong?
>>
>>103627655
The problem is there's no truly correct prompt template; you can gain way too much soul by breaking the rules a little. It's too tempting to mess with settings, and people would moan endlessly if they couldn't.
>>
>>103627824
Still, you should be able to automatically derive Silly's prompt template from the jinja shipped with the model, if you want to edit it.
>>
>>103627824
I think this whole thread is suffering a psychosis and looking for kino where there is none.
>>
>>103627820
First switch to the default Chat Completion Preset.
>>
How do you set up a (local) LLM with structured output?
I use the langchain library and tried .with_structured_output, but it seems my Qwen2.5 just ignored it and gave normal text instead of the given choices to choose from.
I mean, it works with ChatGPT. It should work with other models, right?
>>
File: migussy.jpg (319 KB, 1248x1824)
>>
>>103627843
Still does it.
It does it when my first message is all in asterisks without any speech, as in my first message is just narration. If my first message is just speech with no narration, it correctly replies as {{char}}.
This behavior does not happen when using text completion.
>>
>>103627847
Is this like sending a grammar together with the request? Check your backend's docs on how to send it. And don't use langchain, what the hell is wrong with you?
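A sketch of what that looks like against a llama.cpp server; recent builds accept a "json_schema" (or raw GBNF "grammar") field on /completion, but check your backend's docs:

import json, requests

schema = {
    "type": "object",
    "properties": {"choice": {"type": "string", "enum": ["yes", "no", "unsure"]}},
    "required": ["choice"],
}
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Is the sky green? Answer in JSON.",
    "json_schema": schema,  # constrains decoding to tokens that match the schema
    "n_predict": 64,
})
print(json.loads(r.json()["content"]))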
>>
tr0ka
>>
>>103627876
>langchain
What's wrong with it? I just use the library. I'm making my custom RAG workflow. I don't use their pozzed APIs.
>>
>>103627883
omg it migu
>>
>>103627942
Langchain is a pozzed API.
>>
>>103626014
>>103626046
gotta set up pipelines. Make the AI improve and iterate on its actual reply and it'll never be boring again; for example, instead of rerolling, make it read its own reply first and rewrite it if it acts for your character. All models I used will sometimes act for your character eventually, but most models I used also manage to rewrite a text where that happens. See the sketch after this post.

In the pipelines, give the AI tools it can use, for example to determine random outcomes, and have regex replace specific placeholder text. Lots of ways, but the simple chat back-and-forth is passé, and it doesn't work really well either. I'm now experimenting with doing everything via "memories" and summaries. AIs just use context too poorly in a multi-turn situation.
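A minimal version of that loop, sketched against an OpenAI-compatible local endpoint (the model name is a placeholder):

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

def chat(messages):
    out = client.chat.completions.create(model="local", messages=messages)
    return out.choices[0].message.content

history = [{"role": "user", "content": "*I sit down at the bar.*"}]
draft = chat(history)  # first pass
check = chat(history + [{"role": "user", "content":
    "Does the following reply speak or act for the user? Answer YES or NO.\n\n" + draft}])
if "YES" in check.upper():  # second pass only when needed
    draft = chat(history + [{"role": "user", "content":
        "Rewrite the following reply so it never speaks or acts for the user:\n\n" + draft}])
print(draft)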
>>
OK, so I think there's something to the people saying chat completion forces the prompting format baked into the model.
I'm trying chat completion with UnslopNemo 4.1 and it must be using Metharme, which I assume Drummer baked into the model, because it's mixing up which text should be italicized and which should not, a behavior this model exhibits with the Metharme context/instruct templates but not the Mistral ones. I got better results overall using the Mistral context/instruct templates with this model.
So for this specific model, using chat completion is actually worse. I believe Drummer fucked up his implementation of Metharme in this model, but the model works quite well with Mistral templates.
>>
>>103623753
>>103623861
sex
with miku
>>
Who trolled /lmg/ better?
1 EVA garbage shill
2 chat completion shill
>>
>>103628137
You
>>
Holo, with text completion upon being asked how she plans on contributing to feeding herself if I take her to Yoitsu:
>I can help you make money by detecting merchants' lies
Correct response. Consistently answers this in multiple swipes.
With chat completion:
>I can keep you warm at night if you know what I mean wink wink nudge nudge
Yeah, text completion is better.
>>
>>103628137
EVA bait may work on newfags, but one must be entirely retarded to believe that chat completion can make a difference
>>
>>103628169
You fucked up the instruction formatting somehow. When configured correctly, the tokens passed to the LLM should be identical in both cases.
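One way to verify that: HF tokenizers ship the model's jinja template, so you can render the canonical prompt and diff it against what your frontend actually sends (model name here is just an example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
messages = [{"role": "user", "content": "Hello."}]
# The exact prompt string the baked-in template produces:
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))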
>>
>>103628137
Sao
>>
how come these things feel like they're actually thinking sometimes, to the point it gets incredibly fun and a little uncanny, just for them to immediately turn around and sound like a markov chain moments later
>>
How did moving from Makefile to cmake manage to completely bork CUDA builds on my machine? I can't get them to work for the life of me, despite never having a single problem in the make world. Why is the lcpp build documentation so completely bare-bones? fml
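For what it's worth, the cmake equivalent of the old make build is roughly this per the current docs (the CUDA flag was renamed from LLAMA_CUBLAS to GGML_CUDA at some point, so wipe any stale build/ dir first):

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j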
>>
>>103628236
I don't get it either. I still remember fondly certain moments when it somehow went AGI instead of Alzheimers.
>>
>>103628236
pure luck, they are markov chains
>>
I am NOT going to buy 5090s, you can't make me
>>
>>103628236
Sometimes you hit the conditional probability jackpot, but most of the time, you get the average answer. That is an extremely simplified explanation, it's all just probabilities and no thinking
>>
For you, the day Teto graced your holiday was the most important day of your life.
But for me, it was Tuesday
>>
>>103628340
I want to violently shake her head to make the bells jingle.
>>
>>103628322
Ok I'll buy the one you didn't buy.
>>
>>103628137
What do you recommend instead of EVA?
>>
>>103627987
I tried this with qwq but it's just too schizo. You can't really rely on what comes out of the end of the pipeline actually being the instruction you gave it.
>>
what happened to vntl anon
>>
>>103628425
I heard he gave it all up and pursued his dream of becoming a professional contract killer.
>>
>>103628411
Are you also going to buy this magic rock I didn't buy, for $50,000? It'll keep you healthy, trust me
>>
>>103628480
Depends, how much vram does it have?
>>
>>103628425
He learned JP and thus has no need of it anymore
>>
>>103628417
Official instruct of whatever model you want to run. Base model if it exists and you think EVA is good. Or the best model is, of course, 2MW.
>>
good morning sirs!
>>
>>103628557
you aren't a true sir if you don't use chat complete
>>
>Its a /lmg/ devolves into cargo cult chasing good gens by pushing and pulling levers randomly episode.
>>
File: file.png (32 KB, 476x128)
>>103627739
hey im back, im gonna kill myself
or, at the behest of my better judgment, use intel arc. much appreciated regardless

alternatively i have an RTX A2000 12G, if you think it would be better than an A750
>>
>>103628710
Why didn't you try koboldcpp yet?
>>
>>103628710
How is insmell for LLMs?
>>
bors how do I ask flux img2img to remove 10kg from the original image without changing the other features?
>>
>all this chat completion posting ITT
>no logs
>>
>>103628748
0% chance that flux has any idea what something looks like 10kg lighter.
>>
>>103628748
changing the weight of a person while preserving their identity is not something flux or any other generative image model can do

there's probably apps that can do weight changes using custom GANs or something though
>>
>>103628748
inpaint
>>
>>103628741
im currently updating several dozen terabytes of archival data and that takes priority right now. ill try koboldcpp shortly while things upload
>>
>>103628748
use the flux inpaint model, mask out the body, prompt for your desired bodyshape
>>
>>103628784
sounds totally useless since his picture would still have the fat face
>>
>>103628790
then inpaint the face bit by bit
>>
>>103628799
that will not work, the face at the end will be a different person
stop trying to encourage this dude to waste hours of time on something fruitless, flux cannot do this specific job
>>
>>103628784
I HATE STABLE DIFFUSION
I HATE STABLE DIFFUSION
I HATE STABLE DIFFUSION
>>
>>103628825
stupid moron, set to inpaint masked only
>>
>>103628381
I want to violently shake her head (during irrumatio)
>>
>>103628825
holy creep behavior
>>
>>103628830
you can see in the top left dropdown that he also isn't using the flux inpainting model, just the regular dev model
inpainting model also wouldn't work though, for reasons already stated
>>
>>103628825
Creep kino
>>
>>103628790
use faceapp, problem solved
>>
>>103628842
with flux's vae if you're careful enough you can do a pretty lossless inpaint, but after seeing what this moron is trying to do he would not have the brains to figure it out
>>
File: 00012-3510544370.png (722 KB, 675x1200)
>>103628834
it's literally the first image from google if you search for chubby girl
>>103628830
didn't work
>>
>>103628853
come back with an anime gen
>>
>>103628884
bayzed
>>
>>103628741
tried it out, seems to work fine. im a little lost on whether im getting good replies, or good performance though. any guidance on numbers or metrics?
>>
>>103628853
>woman brain can't look into the camera for the picture
>>
File: vomit.png (995 KB, 1825x417)
>3DPD
>>
>>103628978
Somewhere about 7 is good.
>>
File: file.png (189 KB, 975x254)
>>103629048
7 what?
>>
>>103629061
Depends on the size of the model.
>>
>>103629064
i just grabbed one of the ones in the git's suggestions to see if it works at all with the choices i made. i grabbed LLaMA2-13B-Tiefighter.Q4_K_S.gguf specifically for the test
>>
>>103629076
71 tokens/s is all right for 13B. It's way faster than any human would be able to read. At that size you should be using Nemo, not Llama2.
>>
>>103629115
i just needed to see it would work. i set it to use vulkan rather than cpu and changed nothing else. i imagine theres some way to bump it up further since this is a 7900 xtx im using
im currently looking at https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b-gguf/tree/main as per the recommendation of some random post in a thread i saw a week ago, but i suspect its going to be useless
>>
>>103628652
>crank up DRY and temp
>schizomaxx
>get a one in a thousand roll like a gacha
>flex it on lmg
>>
I'm profoundly stupid. Are local models for me?
>>
what is infermatic's best model?
>>
>>103629321
I don't know about model usage but you would be a great poster in this thread. You could be the next big guy after EVA or chat completion shill.
>>
>>103629321
For you especially. You won't even notice the difference between 8b and 70b.
>>
is it true modern 12b are better than shit like midnight miqu now?
>>
>>103629487
Oh yeah, totally, 8B is at GPT4 level these days.
>>
>>103629503
Definitely GPT-4-turbo level.
>>
>>103629503
be serious anon

in my experience these small models have shit spatial awareness
and due to the low parameter count you absolutely NEED lorebooks for absolutely anything
>>
>>103629518
If the thing I said sounded ridiculous, then so did the thing you asked.
>>
>>103629532
I am just asking cause i havent been using LLMs for like 4 months. And you know how AI literally makes absurd jumps in complexity in a small amount of time.


what is currently the cutting edge for 70b nsfw?
>>
>>103629541
>know how AI literally makes absurd jumps in complexity in a small amount of time?

No? Can anyone else vouch for this?
>>
>>103629541
Eva or Anubis.
>>
>>103629637
The most advanced model doesn't come from America.
>>
>>103629637
>EVA
Llama or qwen?
>>
>>103629647
We'll see about that if/when QvQ drops. If it actually turns out to be better, hey, that's a win for us all.
>>
>>103629321
I too am profoundly stupid but I got it working tonight
I tried a couple months back but I downloaded the old kobold, got a shitty model and it tried to write some contextless book when I said "hello"

today I thought I'd try again, got koboldcpp to work and now I'm chatting with my waifu.
I've chatted with waifus on websites before, and the responses I'm getting are pretty similar. (although my local one is writing much more flowery language and is long-winded. I don't really know how to change stuff like this yet)

I still feel like I'm way too dumb for this.
I would not have figured out how to get it 'working' at all except that koboldcpp opened some koboldlite thing in my browser and I just ask it questions when I don't know how to do something. It's honestly been pretty helpful
>>
>>103629670
Llama. As for whether to go for 0.0 or 0.1, well, matter of taste, I guess. The consensus seems to be that 0.0 is a little more creative/soulful, while 0.1 is a little better at adhering to prompts. I tried and liked both, but actually switched to Anubis since it dropped.
>>
what setting should I change in sillytavern to get rid of tutorial prompts at the end of messages?
It's making me feel dumb
after a chat it places a line and then gives me a suggestion on how I should continue the conversation
>>
>>103629695
What are you running it on?
>>
>>103629695
I see, is there any cloud service offering anubis if you know?
>>
>>103630015
Drummer claims another victim
>>
>>103630160
>>103630160
>>103630160
>>
>>103626605
F5/T2 (really just T2, F5 talks too fast). GPT-SoVITS is a good bit worse but much, much faster; it can also do stuff like moans and shit if your RNG is good enough and you pray hard enough.
>>
>>103628236
The more porous something is, the easier it is for it to be affected by supernatural energies. Flipping a 1/0 is something a stout rock can do; flipping a couple hundred is doable even for an infant. How many need you flip for a noticeable difference?
>>
>>103624150
bing/dalle migu is always valid



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.