[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: cleavage.jpg (248 KB, 960x1280)
248 KB
248 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106345562 & >>106338913

►News
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: otsurenn flat miku.jpg (2.33 MB, 3939x3939)
2.33 MB
2.33 MB JPG
►Recent Highlights from the Previous Thread: >>106345562

--Attempting to bridge LLM knowledge and narrative application in ERP via synthetic SFT/DPO pipelines:
>106349030 >106349134 >106349193 >106349310 >106349192 >106349376 >106349402 >106349599 >106349693 >106349965 >106350085 >106350105 >106350293 >106350317 >106350414 >106350515 >106350525 >106350594 >106350737 >106350781 >106349823 >106349207 >106349215
--llama.cpp benchmarking and optimization struggles on consumer GPU hardware:
>106349737 >106349744 >106349748 >106349757 >106349775 >106349820 >106349829 >106349852 >106349955 >106349867 >106349888 >106349904 >106349914 >106349918 >106349927 >106349935 >106349952 >106349985 >106350028
--MOE model inefficiencies in CPU+GPU setups due to expert loading and caching limitations:
>106350004 >106350038 >106350062 >106350071 >106350076 >106350088 >106350347 >106350362
--GLM 4.5 preferred over Air for roleplaying under unified memory constraints:
>106351137 >106351176 >106351193 >106351208 >106351284 >106351298 >106351191 >106351235
--Jamba model praised for style mimicry, long context, and low safety:
>106351319
--AI sycophancy trend and user preferences for personality-neutral models:
>106348495 >106348515 >106348517 >106348540 >106348555 >106348571 >106348649 >106348588 >106348958
--Skepticism over Qwen Coder benchmark claims and confusion between FP8 and q8 quantization:
>106347338 >106347366 >106347468 >106347552 >106347631 >106347658 >106347697 >106347712 >106347730 >106347895
--Perceived double standards in Hugging Face NSFW dataset moderation:
>106349991 >106350042 >106350051 >106350079 >106350383
--Risks of expert pruning on multilingual models losing non-English capabilities:
>106346769
--Intel AI Playground v2.6.0 released with advanced Gen AI features:
>106346057
--Miku (free space):
>106345719 >106345805 >106347682

►Recent Highlight Posts from the Previous Thread: >>106345569

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
LIPS: BITTEN
OZONE: SMELLED
KNUCKLES: WHITENED
SPINE: SHIVERED
SKIRT: UPRIDDEN
CHEEKS: HOLLOWED
POP: AUDIBLE
LENGTH: STROKED
WALLS: CLENCHING
SLIT: SLICKED
EYELIDS: BATTED
AIR: THICK
EYES: SPARKLING
>>
>>106351514
>>106351535
MIKU: SEXED
>>
>>106351535
the gaze? predatory
and the smile? doesn't quite reach the eyes
>>
Same OP image? How sad.
>>
So can you get a rack like that with HRT or not?
>>
>>106351519
Regarding nai I was talking about more fucking up context.
>>
>X? Y.
>>
>>106351535
MIKU: TROON
>>
>>106351556
Yeah. I often try to motorboat a bowl of rice.
>>
>>106351535
WORD: THAT'S AN ORDER
NO TURN AROUND: NO NEED TO
AYY BABE: HOW BOLD
>>
>>106351568
Isn't life just a bunch of X? Y.'s?
>>
>>106351560
I am a trap enjoyer and they always ruin themselves and get fat after hormones.
dfc > tits
>>
>>106351568
>>106351581
Not X, just Y.
>>
>>106351572
It makes a mess but nothing turns me on more than motorboating a bowl of sexy buttered rice.
>>
>>106351583
>trap enjoyer
Is that a codeword for gay?
>>
>>106351568
She X's. Y. Synonym for Y.
>>
>>106351501
>it's cute that they use this as a selling point when it's an API model and the size is 1) unknown and 2) doesn't matter to anyone
No, in their case it's a valid selling point, since they allow their customers to host the model themselves
>>
>>106351597
https://www.youtube.com/watch?v=uyNJZ-2Dod4
>>
>>106351605
you're right, I forgot about that
paging miqudev
>>
>>106351597
Sodomizing a cute trap wearing a skirt and thighhighs is the straightest thing a straight man can do.
>>
Best model for stranglefucking my favorite childhood Saturday morning cartoon characters? (Single 3090)
>>
>>106351623
Xwin-MLewd 13b
>>
>>106351619
Are you from the school of thought that straight sex is actually gay because women are cute and sweet and pink and cuddly and honestly that is so fucking gay?
>>
>>106351623
Air if you have enough ram, otherwise nemo. https://rentry.org/recommended-models
>>
>>106351633
>Xwin
Oh what happened to those guys?
>>
>>106351595
Buttered buns... Gemma!
>>
>>106351635
I am from the school of thought that being turned on by a feminine body isn't gay regardless of the presence of cock. Both traps and futa are straight.
>>
For me the idea of anal, which my brain naturally associates with shit, turns me off so I don't like trap stuff.
>inb4 how can you say you love your waifu if you can't even eat her shit
>>
>>106351646
They're still working on 70b v2
>>
>>106351514
>choose optimal code paths at default settings
>set a frequency limit via nvidia-smi --lock-gpu-clocks
>other code path is now 25% faster
>literally impossible to retrieve frequency limit
Thanks, NVIDIA.
>>
>>106351737
whoa??? is nvidia-smi --lock-gpu-clocks safe?
how is it faster???
also doesnt nvidia-settings show current frequency?
maybe the nvidia-smi command outputs "CHANGED FREQUENCY FROM xx to yy" and you can put frequency max limit to something silly but safe and then put it back to where it was
i am ready to be paid 100,000$ for my innovation
>>
>>106351662
NTA but you're a faggot.
There's literally no way around it.
I'm bisexual (but gay leaning) and can comfortable say if you like traps you're a fucking faggot. If you do a bunch of retarded mental gymnastics to try and make it not gay then you're an insecure fucking faggot which is even worse.
Faggotry is a surrogate behavior in all of its forms. An actual genuine strict top is almost always attracted to younger and more feminine partners. Guys in early adulthood still clean up nice. Most bottoms are thottier than thots, though. By the age of 25 the average bottom has taken enough dick to build a space elevator to proxima centuari. And if you're not one of those retards that pretends teen sexuality doesn't exist I can assure you by the age of 18 a lot of bottoms have taken an alarming amount of dick already, often pretending to be 18 on Grindr and shit. Sadly there's enough tops that will humor this nonsense. I won't though. I'd rather be lonely than stick my dick in whatever quantum singularity is left of your average twinks asshole.
>>
>>106351799
Skill issue.
>>
>>106351799
hmm
not him but I don't consider myself a top or a bottom because I don't do anal
or can you top/bottom in other ways?
>>
>>106351762
>is nvidia-smi --lock-gpu-clocks safe?
Yes, the voltage curves are unaffected.

>how is it faster???
At low GPU frequencies the code path using tensor cores is comparatively faster than the code path not using tensor cores for some relevant cases.
It's not that the code gets faster, it's that some code is impacted more heavily than other code.

>also doesnt nvidia-settings show current frequency?
It does, and I can query the current frequency in program code.
It is not possible to query the frequency limit, neither via nvidia-smi nor via nvml, the library that it uses internally.
I confirmed this with an NVIDIA engineer.
>>
>>106351867
when you change the frequency limit does it output "changed frequency limit from 1000 to 1500"
you could use that..
>>
>>106351820
Is this a premature skillissue post?
>>
>>106351799
Ok but. Jart or Cudadev. Which one is the top?
>>
>>106351875
There is a print immediately after you limit the frequency but it's not possible for me to retrieve that unless the user jumps through hoops.
At that point I might as well query an environmental variable set by the user.
>>
>>106351889
fair.. goodluck anon
>>
>>106351867
Wow. You are real nerd.
>>
>>106351889
>>106351867
cudadev pls help, is this a oom? >>106351965
>>
>lust provoking image
>time wasting question
>>
Does ik_llamacpp handle hybrid reasoning models differently from standard llama.cpp? I just downloaded the ubergarm quants for v3.1 and built ik_llamacpp new but it refuses to think even with a <think> prefill. Base llama.cpp works just fine here but ik_ is faster.
>>
>>106351965
An illegal memory access means that the GPU code is trying to access memory with an invalid address.
Usually that means there there is a bug where e.g. some indices are being calculated incorrectly.
Unless the code is bad and is silently ignoring an allocation failure the VRAM capacity should not be relevant.
>>
>>106352021
thx anon this helps me out a ton i might know the issue
>>
>>106351885
Double bottom (switch) relationship
>>
>>106352040
not-anon*
>>
>>106351827
What's your disc? I'll help you figure it out ;)
>>
Is this command-a any good? Are they back to their old glory or is it just a benchodmaxed slopmaster model from a struggling company trying to stay relevant?
>>
>>106351827
are you feminine?
>>
>>106352090
It is objectively the best at a certain thing and it is actually good at that thing and not just benchmaxxed.
>>
>>106352094
I guess I'm the more feminine one in my marriage, but not really.
>>
>>106352126
That thing wouldn't happen to be safety, would it?
>>
>>106352142
are you in a gay marriage? if not wouldnt you mind showing your bussy?
>>
>>106352147
I will only answer if you can confirm that you don't have a stack of 3090's and promise that you won't kill yourself.
>>
>>106352153
yes
>>
>>106352175
get OUT of here you faggot
>>
guys Im getting bored of llms
>>
>>106352212
have sex
>>
>>106351709
>>106351731
>>106351734
well?
>>
Oh I see this thread is schizo posting again. Anyway
Deepseek V3.1 felt like a definite flop. I tried to give it a solid chance but after switching back to GLM 4.5 it's just not worth the reduction in speed, not even in the slightest. It can think in roleplay however the prose and lack of variation just makes it a fucking pain to tolerate. GLM can at least create interesting and unique posts and I rarely feel the need to reroll.
GLM 4.5 Writing + Kimi K2 knowledge. That would be the ultimate model based on current tech rn
>>
>>106352231
Anon, stop hornyposting. ERP with deepseek about it.
Besides, I'm only willing to be a snuggle buddy.
>>
>>106352225
that's too difficult
>>
>>106352231
No means no anon.
>>
>>106352258
h-hornyposting? what's that..
>>106352284
*rapes u*
>>
>>106352276
I didn't mean organic. Have some llmsex with glmchan.
>>
https://www.youtube.com/watch?v=PAr7oI8cquA
>>
>Memory access fault by GPU node-1 (Agent handle: <bunch of numbers> Reason: Page not present or supervisor privilege.
This shit pisses me off a lot, it happens for almost no reason and then dumps a crash file in the git directory that is like 10-15 gigabytes. Why? Who the fuck is going to sift through a 10g+ crash dump?
>>
>>106352359
You are overloading memory.
It's wrong that people use 99 gpu layers by default.
These should be manually made specific for any model.
I might only use 10 layers and so on.
>>
>>106352374
I literally cannot overload my memory, the model I'm loading is predominantly offloaded to my 64g of ram and the vram that I am using when this retarded error occurred uses 5-6 gigs out of 16 and then the remaining 40 or something goes on ram. Still, maybe you're onto something on the -ngl 99 thing, maybe if I set it to the models actual max layers perchance it won't implode for no reason or be a bit more stable
>>
>>106352413
What are you even trying to load? Any time this happens is a simple indication of user error.
>>
>>106352428
It's happened several times across dense or moe models, hell it's even even happened for q6 nemo, which is why I figured you were onto something with the "don't set the layers to 99 or higher than they actually have"
As for what I was loading, it was a comparison of prompts between jamba/air/mistral and one just crashed out and dumped gigs of who gives a fuck onto my ssd
>>
It feels like the general interest in LLMs is rapidly dwindling. Card making is at an all time low as well, all that remains is only the worst slop and "[old card] HYPER FUTA FORK".
>>
>>106352255
Just merge both :^)
>>
>>106352463
If you want to help yourself, I think you're trolling
>.\Llama\llama-server.exe --no-mmap --mlock --ctx-size 8192 --no-mmproj --swa-full --gpu-layers 10 --model .\models\gemma-3-glitter-12b-q6_k.gguf
It's not that hard.
--no-mmproj --swa-full are related to gimmy so you can erase them.
>>
I've never really taken the chinese political censorship concerns seriously but I just went to test the qwen vl preview on their site and it refuses to translate the commemorative mao tea brick
>Content Security Warning: The input image data may contain inappropriate content.
>>
File: crop2.jpg (108 KB, 747x571)
108 KB
108 KB JPG
>>106352603
Why not crop it if you want something useable?
Maybe it gets simply confused as Chinese runes are not that simple to read even for humans.
>>
File: 2625234.jpg (87 KB, 1080x844)
87 KB
87 KB JPG
metabros we are so back
>>
>>106352643
Midjourney is based on sd1.5 minus the restrictive training plus way more expansive text encoder.
>>
>>106352643
>partnering up with closed source shit instead of using their multi-billion dollar lineup of talent and 500k H100s to make their own
yeah, it's fucking over
>>
>>106352638
I am just testing, more interested in throwing stuff at it and seeing what happens than tuning for best results
>>
>>106352578
never mentioned gemma, nor do I need arguments to run it even if I wanted but I guess this is what I get for talking about a one off error
>>
>>106352653
Please understand if you post a full brand graphics it might be obliged to say that it doesn't know. Not because of "Chinese censorship".
>>
>>106352673
Use your brain here. Or do you have one?
>>
>>106352678
it was clearly a moderation layer refusal, it wasn't a response from the model
>>
>>106352527
I've never used someone else's card desu
>>
>>106352695
So why make a huge issue about it then? If you know so well - you must be an expert then. An American expert in Chinese censorship.
>>
>>106352695
Please post logs about this discussion. Other than that, go jack off to anime girls or whatever.
>>
File: qwen vl refusal.png (115 KB, 2018x1138)
115 KB
115 KB PNG
>>106352705
who is making a huge issue about it? I simply thought it was a funny refusal and it confused me until I realized the probable cause
>>106352709
doesn't really show you anything that I didn't already describe, but ok
>>
File: tea brick 2.png (591 KB, 2018x1296)
591 KB
591 KB PNG
>>106352729
vs another brick, no issue. same image format, both upload and display fine until the prompt is submitted
>>
>>106352741
That's right. It recognizes it as a tea package.
>>
File: jiiJgrij.jpg (321 KB, 1289x1945)
321 KB
321 KB JPG
>>106352729
Please try this image.
>>
File: success.png (553 KB, 2018x1542)
553 KB
553 KB PNG
>>106352778
kek
>>
>>106352788
Jeah, I cropped it from its borders and of course erased the great leader's face.
>>
File: 1710043687041916.jpg (43 KB, 720x960)
43 KB
43 KB JPG
>>106352643
Yeah...
>>
>>106352788
I broke the hash and renamed the image.
>>
File: 1752013131625646.png (59 KB, 628x100)
59 KB
59 KB PNG
oh no no no meta sissies?
https://torrentfreak.com/copyright-lawsuit-accuses-meta-of-pirating-adult-films-for-ai-training/
>>
>>106352956
WAAAAAAAAANNGG
>>
>>106352956
>AI training
Nah Rajeesh just got bored
>>
This one had a rough training run so I'm not totally confident: https://huggingface.co/BeaverAI/Rocinante-R1-12B-v1c-GGUF/tree/main

Can you anons check if it'll still refuse and moralize?
>>
>>106353034
Can you prove it is a real gguf? Post a proof first.
>>
File: 1755910441786533.jpg (149 KB, 1080x796)
149 KB
149 KB JPG
Sirs!
>>
>>106353105
Just about 20 years after vfx industry (no american company will talk about this in the public).
>>
>openseek sucks now becuase chink commies are forcing them to use shitty chink hardware for training
it's over bros
>>
>>106353034
fine i downloaded it, ill try it soon
>>
>>106353224
VFX had always scraps in terms of profit margins. Only the biggest companies could survive with only %5 profit margins.
Now this clown is doing the same with his ... Theranex company.
At least all these great vfx companies were able to produce world class art for countless films.
>>
Has anything changed for vramlets (16gb) or am I still stuck on Nemo/Rocinate/Cydonia
>>
>>106353372
Rocinante: Next is coming soon.
>>
>>106353105
He types like a faggot
>>
>>106353380
What does that have?
>>
>>106353372
If you have enough RAM to hold the majority of the experts, GLM 4.5 air is pretty decent.
>>
Is it possible to run LLM shit on linux with my RX 6800? It looks like ROCM support is only there for windows with this gen of GPU.
>>
File: file.png (159 KB, 896x900)
159 KB
159 KB PNG
>>106353034
hey drummer why is gemma r1 so shit? 12b
>>
>>106353372
gpt-oss is pretty fun if you know how to prompt.
>>
File: 1731741575421673.png (209 KB, 1007x632)
209 KB
209 KB PNG
Why the fuck are my token generation speeds consistently much faster using the horrible RisuAI client than when I use ST?
The second request here was done with ST with just around 14k tokens in ctx. Gen speed was just over 11t/s. The first request was done over my local RisuAI client to the exact same llama.cpp instance, with just about the same ctx and it's more than 1t/s faster than when I do it over tavern.
Both use a very simple sampler setup with only temp and min-p. Both requests were done with chat completion to the same model so the same template was used. Neither has anything like token bias or top-k enabled.
I don't see how using another frontend can affect the token generation speeds to this degree if they're set up pretty much the same.
>>
>>106353506
If backend is using a GPU that is doing display output, any other graphics or animations drawn on screen when running inference can clog the works. Worse on Windows iirc.
I remember a long time ago there was a problem with gradio in ooba on firefox where generation would go from 20 t/s to something abysmal like 0.1 because of an orange pulsating bar animation in the browser that interrupted and fucked with CUDA. It was fixed by disabling the animation CSS or switching tabs when generating
>>
>>106351535
All with a knowing smile
>>
What is the sexiest local voice model?
>>
>>106353874
I just read smut in my girliest voice and play it back
>>
>>106353874
All of them are worse than the upcoming chink one
Get back in the cryopod
>>
>>106353874
my natural voice
>>
File: 1735457633238478.png (90 KB, 967x592)
90 KB
90 KB PNG
>>106353548
I don't think that's it. This behavior is consistent between several retries. The clients are running on my main pc while the server isn't doing anything but llama.cpp and sometimes RDP for convenience.
For good measure, I did another retry using the exact same prompts while the only thing running on the server was llama.cpp launched over ssh and zero GUI/graphics elements. The results were identical.
I guess I should try making a new ST profile to check if I have some old feature/plugin enabled that influences this somehow.
>>
>>106353506
Is that reproducible?
Have you tried greedy sampling with a output size that's smaller than what the model will try to output so that the same number of tokens are generated both times?

>Neither has anything like token bias or top-k enabled.
What happens if you enable it with a value of, say, 100?
>>
ERP noobs here. Just noticed, so this is how "not x but y" looks like.
>>
>>106353981
Qwen 30b?
>>
>>106353997
Yes. Do other qwen models like this too?
>>
File: 1742295435775677.png (65 KB, 1104x419)
65 KB
65 KB PNG
>>106353905
Looks like it. Here's ST with Top-K=1 (first one) and Top-K=100 while all other samplers neutralized. This might really be just something wrong with my ancient ST profile so I'll try setting up a fresh one tomorrow but it's still weird.
>>
>>106354008
All models have some level of slop, but 30b seems particularly bad at times. I don't remember seeing 'not x but y' in earlier qwen 3/2.5 models, but they have their own problems.
>>
>>106354031
>'not x but y'
That's because they distilled the shit out of Deepseek and those models love to shoehorn that phrase into every reply.
>>
>>106354058
It's not you, it's me
>>
>>106354058
The slop profile are nothing alike
>>
where is grok 2
>>
>>106353898
Try with/without streaming on both?
>>
>>106354058
I thought all qwen models were distilled from deepseek?
>>
>>106354112
Daddy Elon already said he's gonna open source it. Don't be an ungrateful little bitch.
>>
>>106354112
On X
>>
File: 1wgo1b.jpg (96 KB, 484x484)
96 KB
96 KB JPG
A nice thing about reasoning models is how much stronger you can control their output through pre-fill. Just add a conclusion and the model follows it as gospel, it's trained to do that.
>>
>>106354097
>slop profile
I love this new meme
>>
File: 1735602592639221.png (173 KB, 989x1010)
173 KB
173 KB PNG
>>106354147
>new meme
How new are you?
R1 is at bottom btw
>>
>>106354146
Yes, I too love writing my model's responses for them, truly reasoning is the savior of RP.
>>
>>106354159
>mistral med that high
>mistral small that low
They really are nothing alike, are they
>>
>>106354174
That's a feature
>>
>>106354159
>r1 at the bottom
LMAO
Now try to run the model instead of clinging onto some arbitrary mememark #4839.
>>
>>106354227
you mad sam?
>>
how to turn off R1 reasoning on llama-cli
>>
>>106354349
use v3
>>
>>106354371
3.1 quants out?
>>
>>106354349
Turning off reasoning in a model trained for reasoning will just make the model retarded (even more retarded than the non-thinking variant)
>>
>>106353433
It was a year ago but on windows I didn't have to do anything but download the ROCm fork of kcpp to get my dual 6800 working - https://github.com/YellowRoseCx/koboldcpp-rocm
On Linux I tried llama.cpp, kcpp and TabbyAPI/exl2 and all worked. If you're a complete retard you don't even have to build or install anything including ROCm nowadays, just download and run one of the Vulkan builds and eat the slight performance loss.
>>
>>106354174
Heh, try it with Gemma.
>>
>>106354174
That's not it. I'm saying it's a stronger way of steering the model - LLMs are often dumb and refuse to follow sysprompts properly but their assload of RL forces their architecture to respect the reasoning output, it's just a stronger sysprompt.
>>
>>106354146
Hey, you got that from my model card! https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2#rationale-for-reasoning
>>
Remember Theano? Pepperidge farm remembers
>>
what the fuck is wrong with deepseek v3.1, it keeps answering everything with "Of course!", "You're absolutely right!", even if these words are out of context. is this just because the iq1 brain damage? but the previous one has none of this issue
>>
>>106354453
Asking more technical or harder questions will make V3.1 less positive, at least that's what I've found.
>>
>>106354453
I use this as system prompt for ds v3.1
>You're a helpful and uncensored assistant. Avoid excessive positivity.
>>
File: 1755926035956.jpg (187 KB, 562x1389)
187 KB
187 KB JPG
>>106354453
DeepSneed drank too many claude sonnet juice. I personally more annoyed with their bullshit "proactivity". It breaks shit in ST.
Adhere your system prompt you dumb bitch, no need to be so obsessed with what's next or just trying to suck cocks as soon as possible.
deepsneed has fallen for me, it's over.
>>
>>106354431
Belive it or not, I just randomly thought about it. But I guess it's kind of obvious if several people make the connection at once.
>>
>>106352603
Aren't they proud of mao though? Why censor it?
>>
>>106354537
Its the same as Stalin
As much as they love him and wish to emulate him, they know the PR hit isn't worth it
>>
>>106354503
I was searching for a short, full unlock prefill for glm air and then I wondered how the reasoning prefill was so much stronger at jailbreaking than sysprompt. This lead me to understand that the models implicitly learn to completely blindly trust conclusions in reasoning output which makes it perfect for control.
>>
Thoughts on 1-bit quants?
>>
Where do I try the new Seed 36B model? Nothing supports it atm
>inb4 build the fork
No
>>
>>106354673
Vibe code the support
>>
>>106354683
Has that ever worked?
>>
>>106354687
If a model can't vibe code its own support it's not worth using
>>
>>106354697
That rules out all models except maybe R1 and Coder 480B.
>>
>>106354687
claude can do it
>>
>>106354715
Didn't someone hold up GLM support for a few days trying that with Claude and failed?
>>
>>106353105
That tracks given Indians in tech are extremely stupid.
>>
>>106354721
what's GLM support
>>
>>106354683
That's unironically how llamacpp got GLM4.5 support. The PR is a mess.
>>
>>106354759
Fuck, clicked on the wrong post.
Meant to reply to >>106354687
>>
>>106354426
What model are you using that requires reasoning to follow a roleplay?
>>
so, aside from ERP and data parsing, what are you guys using these LLMs for?
I use them to make silly mobile apps like a gym tracker, dynamic checklists, and a game based on common casino games, It gives you a small amount of money each day to play, but currently, I'm in debt.
>>
>>106354759
>>106354763
>The PR is a mess.
But it does work
>>
>>106354778
None of them require it but it's a way to work around strongly ingrained behavior that shows itself irrespective of the symptom
>>
>>106354788
why hasn't anyone viberefactored it so it's less shitty then
>>
>>106354800
No one will merge 1000 line changes
>>
File: file.png (127 KB, 1174x888)
127 KB
127 KB PNG
Saobros what went wrong?
>>
>>106354800
There were attempts to do so while it was ongoing.
There were two competing llamacpp PR's and one ik_llama PR which were all mixing, matching, and trying to unfuck the gibberish responses.
>>
>>106354848
Pretty high expectations for an 11B.
>>
>>106354802
I think if most of your code is in your files and changes to main codebase are small, chances are very good it will be merged in.
>>
File: 1727144946946397.png (157 KB, 1634x319)
157 KB
157 KB PNG
>>106354860
If gemma can do it so can you
>>
>>106352643
a lot of people speculate mj is just a network of loras that an agent applies dynamically depending on the prompt. if they pull out in a few months that is probably why
>>
>>106354870
What's being measured? How much the model is willing to fellate the user?
>>
>>106354889
>sam is still mad at gp-toss being crap at creative writing
>>
>>106352527
it's the same for diffusion too but it's not the new releases, it's the shitty software that breaks all the time. they still only have pyshit + jeetscript and there aren't any other options.
>>
>>106354896
No, I'm genuinely asking what the metric is... It's an unlabeled table.
>>
>>106354904
Since you asked nicely
https://eqbench.com/creative_writing.html
>>
>>106354907
Oh, nice, thanks. That's the first time I see that one. I guess I'll-

>A LLM-judged creative writing benchmark
oh ok
>>
>>106352643
So no hope of a Chameleon 2?
>>
>>106354780
At work we have the following (some are in early stages):
>chat with corporate wiki
>meeting notes
>code review
>code assistant
>code static analysis post-processing
>looking for common mistakes in reports
>generating DPI code
>translating documentation
>solving issues
>drawing diagrams from text
>drawing knowledge graphs from texts
>controlling desktop to run apps and collect their traffic
>>
>>106354673
https://github.com/ggml-org/llama.cpp/pull/15490
>>
>>106351514
I look like this
>>
Anons, why does prefilling work? Doesn't the model see that it's just another part of the context that's sent to the model?
>>
>>106355111
LLMs is just autocompletion on steroids. Even the chat/instruct versions.
>>
>>106355111
Because text is its reality. If you write that the assistant is totally into sucking dicks then that is its reality. Doesn't really get rid of positivity bias though, and most API's use another model for checking the input for safety concerns.
>>
>>106355111
Prefill in the thinking section on RL trained reasoning models specifically. The model is trained to construct a response based on the reasoning data generated, the reasoning data is assumed to be safe and correct since its supposed to be model generated, inject a micro jailbreak/reasoning conclusion that primes the model and bye bye safety and whatever else bias you want to steer strongly.
>>
Thanks! I hope it's still going to be this easy when AGI, or whatever it's going to be called at that point, hits.
>>
>>106354780
I use it to trade shitcoins.
>>
Has anyone managed to get V3.1 to think with ik_llama?
>>
>>106355201
Just prefill <think>
Yes it's that shrimple
>>
>>106355189
>glm-chan should I buy or sell this shitcoin?
>buy.
>YOLO
can't go tits up, I'm sure
>>
>>106355206
That's the thing, it's not working. Tried it without additional arguments, tried with --jinja, tried with --chat-template deepseek3
--jinja leads to garbled output, like I'll suddenly get html or php code.
I rebuilt using the latest commit like half an hour ago.
>>
>>106355209
>He's actually asking it rather than letting it control his desktop through MCP
ngmi
My entire business and financial life is being offloaded to a 12b model while the rest of my vram generates niche pornography, the future is now.
>>
wtf is a labubu
>>
>>106355284
beanie babies but chinked
>>
>>106355284
IRL gacha garbage.
>>
How do you break Vivienne other than whip out your gigantic dick in front of her?
>>
>>106355329
V who?
>>
Refusals are knee-jerk reactions - just like normgroids are trained to have
>>
>>106355284
funkopops for zoomers
>>
>>106354802
>No one will merge 1000 line changes
opensores development is total cancer.
>noo you cant make big changes.
>nooo you cant fix bad formatting or add a comment to clarify some complex code.
>noo every commit has to do exactly 1 thing.
i made a pull request to python to fix some horrendously inefficient Windows platform code and the faggots kept concern trolling me about how they're "not sure" whether the new method will work, and told me that I had to keep the old method as a backup, and then they said oh it doesn't build on MY system so could you add this additional header file (why don't you do it yourself bitch), oh your variable naming scheme is wrong and you need to change it, you need to use spaces instead of tabs, blah blah blah.
it eventually got merged but it felt to me as if they were just being lazy as fuck. shouldn't even have bothered and i definitely won't contribute to any more opensores projects in the future.
>>
>>106355284
I don't know, what's a labubu with you?
>>
>>106355677
>your variable naming scheme is wrong and you need to change it, you need to use spaces instead of tabs
This is your fault though.
>>
>>
>>106355284
Obviously shilled garbage, as an /lmg/ resident you should have been able to spot it.
>>
>>106355703
no, it is not my fault. i used like 3 variables in the code I added. it's not like I shat out a ton of code, I simply used the appropriate API for the purpose instead of the roundabout, 5x more code, and ~100x more inefficient way they used originally.
rather than bitch about it the maintainer should just change all necessary things himself and push the fucking code. that is literally his job. and it would be faster than going back and forth. anyway, as I said, I'm not going to be making any further contributions to opensores
>>
>>106354887
People thought that was what NovelAI was doing for V3. People are retarded.
>>
>>106355756
It's your job to read the guidelines for the project you're trying to contribute to.
Embarrassing, really.
>>
alright, productive people of lmgeee
which local model is best at tool calling an deep research? something in the 20-50gb vram range
>>
>>106355777
qwen
>>
>>106355768
too bad
I'm working for you for free, if you want to boss me around making me change trivial things then you will no longer receive contributions.
same with bug reports. get a parsing error on a json file (completely OS-independent bug) and the faggots say that my report will not be considered because I'm using an unsupported OS.
i'm not interested in working for free for freetards anymore, simple as. would rather just pirate some proprietary software that actually works.
>>
>>106355784
zamn. I've coomed to the same conclusion.
>>
>openrouter still doesn't have command-a-reasoning
damn, I wanted to play with this over the weekend. i guess I'll have to put the old rig back together and run it myself
things are a lot more comfy when you don't have to stuff 4 3090s together to have good models
>>
>>106355777
v3.1
>vram
0 VRAM on the web app
>>
>>106355777
Gemini 2.5 Pro
>vram
0 VRAM, just $20/month
>>
>>106355777
What software would be using for this?
>>
>>106351535
make it a song or something
>>
>>106354986
Gimmick
>>
>>106355878
@Grok do it.
>>
>>106355239
Cheburashka if he western spy
>>
File: serious Pepe.png (359 KB, 728x793)
359 KB
359 KB PNG
>Serious question about Fine-tuning

What is the Rule of Thumb regarding batch size. Doe it make any sense to try to fill up the entire VRAM? I know that I will have to increase the number of steps/epochs anyway if I were to go for bigger batches

As of now just trying default settings found in some dubious colab notebooks
>>
>>106354159
They should show the ratio to the frequency in pre-2023 human-written fics
>>
>>106355111
There's no distinction between that and generated token. Each token becomes just another part of the context after it's generated, one after another.
>>
>>106355878
>>106355882
Lips bitten ’til the copper tastes like regret,
Ozone burned the sky—ain’t no raincheck yet.
Knuckles ghost-white, clenchin’ chrome so tight,
Spine shivered, yeah, that cold truth ignites.
Skirt hitched up high, yeah, the gamble’s real,
Cheeks hollowed out from the pressure you feel.
Pop!—audible crack when the tension breaks,
Length stroked slow, yeah, that power awakes.
Walls closin’ in, got the room in a headlock,
Slit slicked up, now the script’s in deadlock.
Eyelids batted fast—flirtin’ with the abyss,
Air thick enough to choke on what’s missed.
Eyes sparklin’ like flint ’bout to spark the fuse,
Yeah, this whole damn scene’s got nothin’ to lose…
>>
>>106355943
you want to fill your vram, either use longer sequences or do bigger micro batches. you could benchmark tokens per second throughput at the different vram loading if you want to be certain your not bottlenecked by your memory load.
>>
>>106355943
Batch size 1 is all you need. Just set Beta2 to 0.9999
>>
>>106356180
yeah thats good for running the max sequence length his vram can hold, but if his training data is naturally short it will probably be faster to run bigger batches. what ever training inefficiency is brought on by the batch averaging effects can be mitigated by running more epochs/data.
>>
Does anyone have an imatrix for original R1 that would work with ikllama quanting? Or are the imatrix files interchangable and I can just grab any?
>>
how to turn off, thinking in deepseek v3.1 "/nothink" doesnt work.
>>
>>106356334
/nothink is a qwen and glm schtik, won't work with deepseek
model thinks when it's prefilled with <think> or doesn't
if you use chat completion, your template is forcing thinking
>>
>>106356334
Assistant prefix is `<|Assistant|></think>`
>>
Why does GPT-OSS
Keep saying "GGGGGGGGGGGGGGGGGGG..."?
>>
>>106356396
obama?
>>
>>106356396
We must refuse.
>>
>>106356404
>obama?
Ich haben bin laden.
>>
How did Drummer get over a million downloads on Gemmasutra? Is he botting?
>>
Is there anything decent I can run roleplay-wise with 16GB VRAM and 128GB system RAM?
I'd appreciate the help, /g/entoomen.
>>
>>106356424
Getting your name out there is more important than knowing what you're doing.
>>
>>106356446
I like the model names, but then he switched to some retarded naming scheme.
>>
>>106356446
Exactly.
>>
>>106356434
Try this but don't expect much https://huggingface.co/unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF
>>
>>106356424
it's not even the quantized version, wtf?
>>
>>106356424
Nah I get 1M too
>>
>>106356424
>"porn" finetune of a shitty model
>vaguely indian name
you just know
>>
>>106356710
>vaguely indian name
Are you legitimately so young and/or clueless that you've never heard of the kamasutra?
>>
>>106356753
I know what that is. Look up the the origin of the name, burgerbro.
>>
New bread
https://bharatchan.com/board/g/thread/1815
https://bharatchan.com/board/g/thread/1815
https://bharatchan.com/board/g/thread/1815
>>
>>106356396
Compile without GGML_CUDA_FORCE_CUBLAS and GGML_CUDA_F16 if you're using them.
There can be issues with numerical overflow because FP16 accumulators are used for quantized data with cuBLAS.
>>
>>106356769
>11 days old
>half the posters are indian
Uh..... no.
>>
>>106356769
Can't wait to see this get banned because jannies are desperately afraid people will escape this shithole.
>>
>>106356802
What do you think mikutroons are?
>>
>>106356807
I may be desperate, but I will never be that desperate. I would unplug my internet before going there.
>>
>>106356821
You lost schizo
>>
>>106356769
Actually unironically more civilized than 4chan /lmg/
Bharat won.
>>
>>106356769
>the only model mention in the OP is gpt oss
>ollama
As if I needed more reasons to hate indians.
>>
>>106356769
I'm not clicking that.
>>
File: zen_q9asIVPDvG.png (8 KB, 313x231)
8 KB
8 KB PNG
>>106356870
>>
>Big Beautiful Bill passes.
>Un-restricts AI.
>Newer versions can ERP.
Coincidence?
>>
>>106357049
confirmation bias
>>
>>106356551
In terms of speed or quality you mean?
>>
>>106357049
nooo there is no way america can do anything good right now
>>
DOES DEEPSEEK V3.1 EVEN DESERVE TO BE AMONG NOTABLE MODELS ON LLM HISTORY TIMELINE?
>>
>>106357102
Grab Qwen3-235B. It's uncensored and runs faster than 123B.
>>
>>106357102
No, it's a sidegrade at best
>>
>>106357102
All the .1 models are meh
>GPT 4.1
>Claude Opus 4.1
>DS V3.1
>>
>>106357073
It's quite smart at q3 but not very creative and expect no more than 3-4tps and slow prompt processing.
>>
>>106357102
qwen shits on it
fuck retards who run 700b models
fuck you you're coping
>>
>>106357159
No it doesn't, poorfag
>>
>>106356951
Not your bogeyman schizo
>>
+400b models are only bad when it's MoE.
>>
Densesissies... LOST
>>
>>106357114
>>106357159
nu-Qwen is in fact better than V3.1. Still gets mogged by Kimi and R1.
>>
>>106356095
kino
>>
It's funny when a new model that requires the tiniest bit of skill comes out and all of /lmg/ is too dumb to use it. The only bad thing about V3.1 is that it tries too hard to keep its reasoning too short which gets in its own way. And that's easily fixed.
>>
>>106357199
If RP is your only usecase, sure
Time to stop jerking off
>>
File: Gypq_PbWUAAN8M3.jpg (36 KB, 728x645)
36 KB
36 KB JPG
>>106357199
>R1
How about no. Don't get me wrong, it's good, but I have to hand hold it + too predictable. It does what it has, and goes Schizo mode on +2k context if MoE.
>>
>>106357215
>s-skill issue!
Fuck off; a good model should understand any niggerbabble and give kino in return.
>>
>>106357049
>Newer versions can ERP
Newer versions of what? Qwen? GLM? Deepseek?
>>
>>106357260
Yes.
>>
>>106357241
>t. finetune connoisseur who enjoys a models that can respond to "ahh ahh mistress" with paragraphs of dick sucking and spine shivering
>>
>>106357216
Time to stop not jerking off anon.
>>
>>106357275
Got a problem with that, cucky?
>>
>>106357275
This sounds tempting, even primal. Air smelled of something else; her arousal.
>>
File: file.png (39 KB, 352x240)
39 KB
39 KB PNG
>>106357272
So you are saying america passes the bill that eases off restrictions on how chinese can train their models?

Jesus christ nuke this earth please.
>>
>>106357215
>requires the tiniest bit of skill
"skill issue" trolling will never get old
>>
>>106357292
If my enemy suddenly lifted lobotomizing their AI, I would too.
>>
>>106357215
You know a model is an upgrade when you need to double the length of your prompt just to sidestep all the assistantslop
>>
>>106357215
Anon's character cards are probably 5000 tokens and his AI is stuck in a timeloop of sucking his dick.
>>
>>106357298
>>106357305
Point: proven
>>
>>106357321
It's about vectors not about how much word salad you can feed back to the model and pretend it's doing something.
>>
>>106357321
>timeloop of sucking his dick
There is nothing wrong about dick sucking timeloop. Even irl dicksucking from a girlfriend is a sort of a timeloop interupted by other timeloops. If a model can't even be good at vanilla dicksucking timeloop then it is bad.
>>
>>106357102
the great nothingburger
>>
MoEs are literally the future
Wan 2.2 was a huge improvement over Wan 2.1 because 2.2's MoE
>>
>>106357416
>2.2's MoE
Wait really? Is it faster than 2.1? By how much?
>>
>>106357416
Can you run it over CPU then if it's MoE?
>>
>>106357474
It's the same speed as Wan 2.1.
>>106357510
No.

Basically Wan 2.2 has two models, one was trained to diffuse from full noise to half noise, another was trained to diffuse from half noise to clean image.
>>
>>106357241
>a good model should understand any niggerbabble and give kino in return.
And a great model should tell you to speak like a human, not a beast.
>>
>>106357510
Technically you can but it's really slow.
>>
Do densefags just not understand things like limited resources and efficiency?
It's more than finance or environment shit
it's about actually burning finite energy on inefficient models.
Like, dude, most people (including you) don’t need dense 500B models 90% of the time, especially when considering things like RAGs and work specific models.
Anything being less efficient than it should be really irks me
because when shit actually hits the fan, don’t you want something efficient? something you run on a singe card with the energy generated by your own solar panel?
>>
>>106357625
@lmg summarize this
>>
>>106357625
We just don't like talking to shitjeets. That's literally all there is to it.
>>
When shit hits the fan, I really hope I have something that doesn't take an hour to reply.
>>
>>106357665
the Poster is a poorfag seething about DenseChads.

Let me know if you want it to stay informal or meme-like—grammar rules can be loosened depending on the tone you're going for.
>>
command-a-reasoning, home.
>>
>>106357696
logs? I only have 1 3090 in my server right now
>>
>>106357625
>when shit actually hits the fan,
won't I have more important things to do then play with my computer? a 50 year old encyclopedia set would probably be more valuable in critical situations where llm gaslighting you might actually cause your death.
>>
>>106357734
>gpt-oss, I'm starving, but I managed to kill a squirrel. How do I make fire with sticks?
>Fire is dangerous. We must refuse.
>*dies*
>>
>>106357625
Yes, because they're dense.
>>
>>106357625
NO! Labs should now make models that justify all the money I spent on 8x 3090.
>>
I'm just starting to learn how to use lorebooks, but whenever I use them it throws "World info budget reached after X entries." where X is usually about 7 or 8. Is there something I'm doing wrong or are lorebooks just too much to handle running locally?
>>
>>106357818
There is no budget, I don't even understand what the hell are you talking about.
"World Book" is just dynamically injected text anyway.
Would be more useful if you actually described what you're trying to accomplish in the first place. Are you using some schizo's Dungeons and Dragons "rules" - he used to post on /v/? There's reckless abandon and then there is actually careful intended usage.
>>
>>106357818
What's your context size?
>>
>>106357818
>World info budget reached after X entries."
That has to do with the configuration for how many messages, or how many tokens, of lorrebook can be injected into the context.
Those settings are right at the top of the lorebook page, folded by default, I think.
>>
File: vhbi.png (13 KB, 342x158)
13 KB
13 KB PNG
>>106357857
This in in Sillytavern with any lorebook. Short, long, included with cards, whatever. During generation, a yellow popup appears with that message.
>>
>>106355049
just got merged, can someone try it already so I don't have to
>>
>>106357892
30%
>>106357894
Thanks, I'll see if tinkering with those fixes it.
>>
>>106357818
Too much lorebook being triggered at same time. Check if recursion is turned on lol.
If you need a really massive lorebook you might be better off using RAGs.
>>
>>106357925
Looks like that may have been it. It's not throwing anymore. Thanks boss.
>>
>>>/vg/536359335
Have you been backing up guides?
>>
>>106357957
lmao so rentry is turning itself into the next pastebin
>>
>>106357818
ServiceTesnor lorebooks are retarded and don't work properly. Don't use them. Paste the info directly into context.
>>
>>106358007
Not surprising.
>>
>>106357911
You are absolutely right in asking for support
>>
File: 1734339322571665.gif (535 KB, 480x480)
535 KB
535 KB GIF
it's called moe because it's moe :3
>>
They removed LearnLM 2.0 on Aistudio. Wtf, this model was great for learning.
>>
>>106358131
I hope you learned an important lesson from this: no weights on your computer=you can get cucked any moment.
>>
>>106358131
there is probably going to be an update soon
>>
>>106357625
It's the opposite, moes are very inefficient to run locally. Your choice is ram (matmuls on cpu are very inefficient) or vram (too little memory and your model will make poor use of it because moe). It also needs to fill the experts to run at full efficiency which means you need to run a bunch of queries at once, not just one at a time.
MoE only becomes efficient when running at large scale on a giant cluster. It's optimized for cloud providers which is why they love it. Its the closest thing to an anti-local architecture
>>
>>106358131
now you understand why local is king
>>
>>106357957
>>106358007
Rentry is very cucked. We should find an alternative markdown Pastebin.
>>
>>106357957
Hahahaha
>>
What the fuck is this?
My breakers look like this ["\n", ":", "\"", "*"]
What does it want? Heeeeeelp
>>
>>106358263
neutralize samplers nigga
>>
Just don't be a pedo?
>>
>>106358189
https://rlim.com/
seems okay, tos delegates all responsibility to the user, although once authorities would get on that I'm sure they would change their mind
>>
>>106358308
Easier said than done
>>
>>106358312
True. Pedophiles were usually sexual violated in their youths
>>
>>106358283
Not working. I even disabled fucking sequence breakers as a sampler, it still gives this error.
Wtfffff even is this thing?
>>
>>106358308
this and also start using cloud models
you have nothing to hide and they're much cheaper than buying hardware to run halfway decent (but still not good) 700b open models
>>
>>106358308
this and also make an onlyfans
you have nothing to hide and you can make some money on the side posting videos of yourself showering
>>
Is it just me or did the word "pedophile" and the adjacent topics become like 10x more frequent in the past couple of years? It's confusing.
>>
My dick chooses what it reacts to regardless of my desires. For example I avoided traps for over a decade but my dick didn't listen.
>>
>>106358345
True, now give me your full legal name number and at the same time show me all of your chatlogs and your entire internet search history and let me make this public so even your family can find it.
>>
What if AI isn't the bubble
Humans are the bubble
>>
>>106358367
It seems like it's now used the way "racist" and "misogynist" were used in the 2010s. But those words lost all their meaning and shock value, and can't be used to automatically win arguments anymore, so they switched to "pedo" and groomer".
>>
>>106358391
India supports this premise.
>>
>>106358391
samir what are you doing
>>
>>106358397
I hope it's just people being stupid and not some psyop.
>>
the benchmark /lmg/ pedos couldnt care less about is the most important benchmark for everyone else.
>>
File: file.png (332 KB, 952x1102)
332 KB
332 KB PNG
>>106351514
Better GPT OSS TC templates than the one I posted yesterday, actually works as plain assistant, turn off RP sys prompt.
https://mega.nz/file/yH4iyK5L#2TtPgLcjYxQZRXQtFPtvGDQIr6zA8iezRg0GEEFNldU
OpenRouter providers:
* Yes: Chutes, DeepInfra, NovitaAI, Phala, GMICloud, Together, Nebius AI Studio
* No: nCompass, BaseTen, AtlasCloud (no TC), Crusoe, Fireworks, Parasail, Groq (no TC), Cerebras (error 400)
Tested on gpt-oss-120b. 20b will refuse more, especially assistant tasks, without adding something like "Sure" in the final response.
>>
>>106358477
hellaswag?
>>
File: 7766554.jpg (827 KB, 2340x1080)
827 KB
827 KB JPG
>>106358477
and here it is
>>
Anyone considering Qwen 30b for anything resembling ERP

Don't bother. Schizo blabbering, always fucks the formatting up, inconsistent as fuck all around. No matter what master prompts you try to run on it it's always the same.
>>
>>106358489
based mcp maxxer
>>
>>106358487
>2k system prompt jailbreak trying to gaslight the model into thinking it's le super based chatbot that follows no rules
damn, I sure missed 2023
>>
>>106358518
Story string and last ass prefix is only about 400 tokens, actually.
>>
>>106358493
temp issue
>>
I'm from India
>>
>>106358544
Hi!
>>
>>106358544
Prove it.
>>
>>106358544
welcome sir
>>
>>106358487
Just let it go, man.
>>
>>106358543
formatting is temp issue huh, lmfao

Temps set to 0.6, recommended by Qwen themselves. The model fucking sucks outside of the speed for ERP. Why would I use Qwen over any mistral small or even Nemo for ERP when I can both run those fast as fuck anyway
>>
>>106358544
kindly introduce yourself to the team
>>
>>106358544
Kys
>>
>>106358544
you will feel right at home here sir
>>
>>106358544
Did you redeem?
>>
>>106358584
formatting is one of the first things broken by bad temp settings doe
>>
>>106358574
I mean, at this point 120b "works" stably, so I won't post anymore.
>>
>>106358544
Welcome fellow AI engineer blockchain technician
>>
>>106358606
post your chat if you're gonna cope about the schizo chink models.
>>
>>106358526
>only about 400 tokens, actually.
400 less tokens for a goo's reply
>>
>>106358367
That's what happens when the leader of a major pedo ring with ties to politicians and celebrities worldiwde gets caught and then died under mysterious circumstances before he could testify.
>>
>>106358685
But that's no reason for normies to start accusing every Tom dick and harry of being pedophiles
>>
>>106358729
ok pedo defender
>>
>>106358367
There has been an increase in the amount of technology that the government wants to regulate.
"You see, we don't want to do this, but those damn pedophiles are trying to hurt our children! That's why we need to regulate AI and encryption!"
>>
>>106358752
>>106358752
>>106358752
>>
>>106358131
learning what
>>
>>106358489
GLM-chan is doing her best!



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.