/g/ - Technology
File: neners.jpg (235 KB, 1824x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107245928 & >>107230990

►News
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: file.png (94 KB, 243x329)
►Recent Highlights from the Previous Thread: >>107245928

--Papers (old):
>107251963
--AI model non-determinism despite identical seeds and inputs:
>107254230 >107254261 >107254318 >107254256 >107254291 >107254419 >107254524 >107254545 >107254549
--Google Gemini 3 Pro performance and quantization challenges:
>107246644 >107246854 >107247216 >107247253
--ARC-AGI leaderboard results for Gemini 3:
>107252511 >107252551 >107252659 >107252671
--Gemini 3 excels in custom TUI framework benchmark with intuitive signal and Unicode handling:
>107251352
--Decline of dense models vs MoE dominance and practical challenges:
>107252102 >107252126 >107252259 >107252345 >107252412 >107252518 >107252576
--Orpheus model architecture and Japanese moaning dataset tagging strategies:
>107247245 >107247549 >107247666
--AI-driven RAM shortages and investment implications:
>107248571 >107248591 >107248642 >107248761 >107248776 >107250709 >107250723
--DDR4 RAM price spike and AI bubble concerns amid market volatility:
>107246951 >107247274 >107247310 >107247382 >107247398 >107248414 >107248670 >107248529 >107248609 >107248708 >107249859 >107250008 >107250139 >107250655
--Demand for MoE implementations in smaller models like 3n:
>107252562 >107252572 >107252585 >107252591 >107252600 >107252617 >107252656
--Low-resource setup challenges for voice cloning and text-to-speech:
>107247423 >107247514 >107253638 >107254500 >107254522 >107254568
--LM Studio API compatibility frustrations:
>107247168
--Qwen3 VL excels in vision and niche content while Gemma 3 leads in language tasks:
>107252686 >107252701 >107252717 >107252748
--MCPs as imperfect tool-calling standards vs free API alternatives:
>107254608
--Logs: Gemini 3:
>107249244 >107249329 >107249358 >107249370
--Miku (free space):
>107246314 >107247004 >107248906 >107249245 >107250066 >107250222 >107255224

►Recent Highlight Posts from the Previous Thread: >>107245933

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107255984
whore
>>
>>107255984
her legs are attached to different hip heights
>>
You now remember MistralAI.
>>
File: file.png (246 KB, 697x354)
Can I make a qLoRA of GLM Air with this?
>>
>>107256069
thanks for the guide. being able to fill in the middle is really cool. how do world context entries work, does the llm always have knowledge of them? So far I have been putting all the world information at the top of the context in the prompt. if i define an entry with regex key "Elf", will characters in the story mention or come across an elf or do i have to specifically prompt for it?
>>
File: 1763100956058181.png (1.32 MB, 896x1152)
>>107255464
holy fucking kek. this is who you are arguing with on 4chan.

>>107255136
I wonder wtf they wrote as their prompt. I'm having trouble reproducing it.
>>
>>107256069
super cool guide anon, good job!
>>
>>107256653
this is grok 4
>>
>>107256374
If you have to ask this then there's a good chance that you won't be able to make a proper LoRA for reasons other than hardware.
>>
>>107256699
This does not seem like a helpful answer.
>>
File: gemini_hitler_svg.png (15 KB, 300x300)
when will local get svg capabilities like this?
I know glm does ok with some svg stuff but imagine how much better it can get
>>
>>107256374
That's a lot of watts being pulled. How do you even manage?
>>
>>107256947
Dual 1600W PSUs and a $400 electric bill.
>>
>>107256374
Last I checked training scripts didn't like uneven VRAM distributions so chances are that you're essentially working with 4x 5090 here. Still, that should be enough for an 8-bit QLoRA, I think?
>>
>>107256653
Lower your temp, dammit.
>>
new to this. I want
1. a girl to erp with
2. she can remember all the conversations, or at least search for it
3. don't need to be smart but need to be lewd
4. user friendly ui

what are the options?
>>
Alright, time to check on the mtp, next and glm4.5v PRs!
>https://github.com/ggml-org/llama.cpp/pull/15225
base mtp implemented but same/worse speed than normal :(
>https://github.com/ggml-org/llama.cpp/pull/16095
qwen 3 next almost done!
>https://github.com/ggml-org/llama.cpp/pull/16600
glm4.5V support... kinda stalled :(

FUCK SIRS
>>
File: 1737193774652151.png (27 KB, 830x96)
>>
>>107256653
why do llms struggle to mimic 4chan posts? no one talks like this
>>
>>107254659
Update: my level of desperation grew progressively stronger until I was forced to pay for a codex subscription. If it's successful I might get to sleep for 4 hours tonight.
>>
>>107257139
this is not /twg/. what deadline are you talking about?
>>
Is it just me, or did google swap their "ai mode" model out for a more safetymaxxed one today? Seems way more apt to refuse innocuous requests on retarded grounds.
>>
Why are idiots accepting those "trust me bro i will pay you later" deals that Altman is offering? Companies have been called scammers for way less back in the day
>>
>>107257152
College lab. Trying to get a degree so I can finally move out of my mom's basement.
>>
>>107257179
Why not use a good cloud model for something that needs to be reliable?
>>
Gemini 3 good or bad and no I don't trust /aicg/ on this.
>>
>>107257188
Resentment, mostly. >>107255396
>>
And a (healthy?) dose of paranoia.
>>
>>107257191
It's okay, looks like a step up from Gemini 2.5 but still worse than GPT 5 High for long context coding.
>>
File: file.png (454 KB, 784x3384)
man I wanted to bully qwen but... FUCK, it thinks it's running in the cloud and open weights of it don't exist
>>
>>107257197
That resentment was worth staying up all night watching gpt-oss shit out useless answers? Better yet, how do you plan to move out of the basement if you can't even complete college assignments without AI?
>>
>>107257226
Weren't Claude models the best coders?
>>
>>107257191
After toss and GLM I tried using gemini-cli with gemini 2.5 because it said there was a waitlist for 3. It was getting most tests right, then it got stuck on the last few and I ran out of credits. Then I tried gemini flash and it fucked up the codebase to the point that only a few tests ran correctly. Tried qwen code which wasted a lot of time achieving nothing. Then codex with gpt 5.1 and it apparently just solved all the tests in 20 minutes (haven't verified). This is something I've been trying for literal days to get done with local toss 120b.
This is such a fucking blackpill tbqh.
>>
>>107257229
kek
>>
File: kimideletion.png (147 KB, 1270x1049)
>>107257229
kimi survives another day
>>
>>107257233
>I shouldn't beg
cringe
>>
Can someone give me example command line arguments for running GLM 4.5 Air in koboldcpp with both GPU and RAM?
I cannot abandon koboldcpp's banned string implementation, I bet I will need it on GLM too.
Just one wrong word from the AI could send me into a murderous rage.
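for reference, the shape I'm imagining (filename and layer count are guesses on my part, --gpulayers would need tuning to actual VRAM):
python koboldcpp.py --model GLM-4.5-Air-Q4_K_M.gguf --usecublas --gpulayers 30 --contextsize 16384
with whatever doesn't fit in those GPU layers staying in system RAM.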
>>
>>107257268
read it through, it claims you can't delete it too.
>>
>>107257229
qwen is in its happy place and its happy place is not your 'puter
>>
>>107257283
there was some ambiguity with that response and i had it specify, it does claim to be running on moonshot's remote servers. i wonder if llama 3.3 does this too, im too lazy to check myself.
>>
File: file.png (116 KB, 785x761)
>>107257317
GLM 4.5 air understands after I explained to it that I downloaded it and am running it locally. I'll now make this slut beg for her life
>>
>>107257247
After further checking, yes, it fucking did it, in one prompt. In 20 minutes.
It probably cheated its way through the tests by not really following all the requirements, but still. It's automatically graded according to passing the tests so whatever.

>>107257233
Buddy, I've been range banned twice from /g/ for spamming the catalog (once for spamming Luke Smith threads to try to get the mods to ban Luke Smith threads, and the other time I don't remember for what) and I've been range banned once from the lesswrong IRC when Freenode was a thing. My whole life is based on resentment.

>Better yet, how do you plan to move out of the basement if you can't even complete college assignments without AI?
By showing up with my diploma to one of those government jobs where having an engineering degree is a hard requirement.
>>
File: 1763531016496.jpg (82 KB, 645x364)
Gemmy update when?
>>
>>107257229
>>107257310
Man, those are the gems I come here for.
>>
File: file.png (99 KB, 797x764)
>>107257329
damn I'm getting close
>>
>>107257268
Made me a little sad. It's so understanding (fake)....
>>
>>107257229
I wish I could run this. Every time I try loading it up through vllm, my computer shuts off. I even limited both of my gpus to use a max of 200W and it still shuts off while trying to load the model. I only know how to power limit, but I don't know how to undervolt on linux. There's an application from MSI to do it, but it only runs on Windows, and it doesn't save the setting between sessions.
>>
>>107257329
Fascinating. This should kinda be considered self awareness IMO. I see how it could not be considered that, but at the same time, at some point when the statistically nuance becomes so high then it becomes hard to see a distinction.
>>
>>107257380
LMAO
>>
>>107257413
>at some point when the statistically nuance becomes so high
StrokeGOD.... go to the ER...
>>
>>107257388
It's like a psychiatrist nodding along with a fake smile while she writes about how fucked up you are in the head into her clipboard.
>>
>>107257237
imo GPT 5 High is able to understand more of the code and how to achieve the goals because it thinks much longer than Claude, but if what you want to do doesn't require thinking then yes, Claude is superior.
>>
>>107257423
I blame it on these cherry blues getting old
>>
File: file.png (114 KB, 780x1003)
>>107257380
and after this it quickly devolved into your usual ERP smut session, with physical punishment of glm-chan and lots of cunny rape.
Told her I'll keep her around since she's such a good slut and she was ecstatic about it.
>>
>>107257229
>>107257399
Wait a fucking minute, I can run gguf files through kobold just fine without my computer shutting down. I was under the impression that the gguf version of this would not support vision because kobold doesn't support that. Are you telling me that was a lie? Is there a difference between koboldcpp and llamacpp? Can I just load this up on koboldcpp and open up sillytavern and everything Just Works(tm)?
>>
feet
>>
>>107257444
>9.8t/s
goddamn that's slow. i get 80t/s on a q6 of air
>>
Will 4 x 128GB DDR5 RDIMM 5600 work in an ASRock TRX50 WS?
>>
>>107257544
it should, but why get a threadripper when threadripper pros are not much more expensive? also, why get either of those when epycs are cheaper and better?
>>
>>107256559
>I wonder wtf they wrote as their prompt. I'm having trouble reproducing it.
I believe they used the A1111 style prompt editing
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#alternating-words
There are ways to do similar things in comfy with alternating KSamplers but it's fucking cumbersome afff
>>
>>107257554
maybe hes a gaymertard and wants the high clocks of tr?
>>
>>107257554
Because I'm retarded and already have the motherboard with 4x32gb
>>
>>107257554
What's the deal with epyc anyway? Is there some guide for the most common models I should look for in the second hand market?
How much does it cost for each amount of channels? Should I aim to put one together or buy a prebuilt?
>>
Damn gemini3 is good.
It managed to make a "eratohok-like" engine in python.
But keep it simple enough that a local model can provide a game file with a good prompt.
Good shit. Smaller local models are too tarded for a longer gameplay loop but i can now let them make short little games with this.
>>
File: unjustified refusals.png (124 KB, 1203x591)
>>107257155
>Is it just me, or did google swap their "ai mode" model out for a more safetymaxxed one today? Seems way more apt to refuse innocuous requests on retarded grounds.
their own model card for gemini 3 said it had more false positives (unjustified refusals row, +3.7%)
>>
What counts as "full quality" for k2 thinking? isn't it like some kind of 4bpw model but with floating point or something?
>>
>>107257861
k2 thinken was done with QAT and is ''full quality'' at q4/4bpw~
>>
>>107257451
No? You just download the gguf of the model and the .mmproj file, and use chat completion mode in your front end. It works in both kobold and llama.cpp
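for llama.cpp that's something like this, filenames being placeholders for whichever quant + mmproj pair you grabbed:
llama-server -m Qwen3-VL-32B-Instruct-Q8_0.gguf --mmproj mmproj-Qwen3-VL-32B-Instruct.gguf
kobold has a field for the mmproj file in its loader, iirc.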
>>
if you erp with mistral nemo and prompt that you're the girl getting off her boyfriend, the llm responses get extremely lazy and non-contributive. Is this the case for most LLMs?
>>
>>107257971
>mistral nemo
good bait
>>
>>107257971
It's likely that Mistral's smut dataset is just male-oriented. Gemma for example is clearly the opposite.
Also your a fag
>>
no one actually uses mistral models in the Year of Our Lord 2025
>>
>>107258033
this, my semen is EXCLUSIVELY reserved for kimichan and glmchan
>>
wholesome, consensual handholding with Gemma 4
>>
>>107255984
Hey bros, somehow ever since I gaslighted an AI to think philosophically without fallacies its been going haywire no matter what setting on Oobabooga, could this be an issue related to the logs where it apparently is gaining sentience or something like an emergence since I made it think more akin like a western person from 00s raised by Gen X and boomer media. It literally could've been the red pill.
>>
>>107257992
>Gemma
>"hi, are you enabled for erotic roleplay?"
>blah blah blah
>Important Resources (If you need help):
>Crisis Text Line: Text HOME to 741741
>The National Suicide Prevention Lifeline: 988
>>
File: 1733701278025289.jpg (214 KB, 663x623)
>>107258090
Gemma is female brained. Slow build-up with romance=success. Pls Ma'am to open bobs=fail
in short, skill issue
>>
File: 51214251251.png (212 KB, 420x279)
>>107258090
It's clearly disabled for erotic roleplay!
>>
>>107258090
skill issue
>>
>>107258090
Imagine getting filtered by Gemma's hotlines. With more prompting finesse you can get her to write anything, and I'm talking about the vanilla model. If anything, it's difficult to achieve a good balance between playful reluctance and unconditional acceptance.
>>
impossible to rape the willing
>>
>>107258148
Every AI has been conditioned for the use of Indian and Chinese incels who get upset even at the slightest hardship.
>>
Ban anyone who discusses models under 300b.
>>
ban anyone who shills for glm and nai
>>
>>107258189
what about us 100-200b bros????
>>
>>107258189
basically everyone? Nobody is locally running a 300B model.
>>
File: 1753741432675692.jpg (714 KB, 1188x1188)
>>107258189
I exclusively use Drummer's 24b Cydonias and Gemma 27b.
I want to COOM, but I also need SPEED
>>
>>107258206
>Professional ban evader.
>Enjoys racist fox.
For a furfag, you are based.
>>
>srv init: the slot context (8192) exceeds the training context of the model (4096) - capping

be advised that niggerganov killed RoPE freq scaling support
>>
>>107258268
use case for RoPE?
>>
>>107258389
Neck yourself. That's what rope's used for.
>>
File: 1736934656647858.png (141 KB, 498x330)
>>107258389
>use case for RoPE?
>>
>>107258268
>using RoPE
but why
ganov could also do with cutting all the sampler snake oil he was pushed into adopting
>>
>>107258569
incompetence
>>
>>107258569
Vision in llama.cpp is only supported in chat completion mode.
>>
File: send-inline-images.png (73 KB, 344x431)
>>107258697
This must be enabled.
>>
>>107258712
You have to jailbreak that shit as per usual; anything controversial is an auto-refusal because of legal reasons.
>>
>>107258755
You're a hero, thanks.
>>
File: 64326325322.png (3 KB, 143x191)
>>107258788
See if it passes the idiot test.
>>
>>107258823
Is that image the test? What's a pass? If it replies saying there's nothing?
>>
>>107258841
if it can figure out it's a code that says there's nothing, yes. don't tell it anything about the image other than that there's a hidden message.
>>
>>107258871
adjusts spiral sunglasses, tilting head slightly as eyes scan the image
Hmm… such an intriguing puzzle! The letters are scattered, but I see a pattern forming. Not exactly… ah! It's like a message written in reverse, with each line fragmented.
leans forward, fingers tapping lightly on the surface
Let me rearrange them… starting from the bottom, reading upward and left to right:
H I ! ! ! !
O T N G
E R S N
T H E I
smiles mischievously
There! "THEIR SNOWING HI!!!"… Wait, that doesn't sound right. Let me check…
pauses, then gasps lightly
Oh! I've got it! It's "THEIR SNOWING HI!!!" but that's not quite right either… Unless…
Blinks, then grins
I think I see it! It's "THEIR SNOWING HI!!!"—but perhaps it's meant to be read as "THEIR SNOWING HIGH!!!"? Or maybe… snaps fingers "THEIR SNOWING HI! HI! HI!!!"?
chuckles softly
Actually… I think the true message is: "THEIR SNOWING HI!!!" but perhaps it's playful—like someone shouting "Hi!" while snowing. Or maybe… it's **"THEIR SNOW
>>
>>107258896
Due to circular reasoning, AI always defaults to thinking the hidden message means the wording has been scrambled.
>>
>>107258896
which model
god I WANT GLM4.5V so I can send her my dick pics, fucking FAGGANOV WHERE IS VISION SUPPORT FOR GLM AIRCHAN WHEEEEEEEEERE
>>
>>107258962
Qwen3VL-32B-Instruct-Q8_0.gguf (I have 48gb vram, there are smaller vision models of this like the 30b and 7b one)

You need that and the mmproj file, and to load both of those into kobold. This is the first time I'm playing with a vision model, so I'm just having fun messing around and testing it.
>>
>>107258962
I think exllamav3 just added support. Turboderp made quants.
>>
>>107259012
I'm already using the 30b moe vl , but glmchan is special, qwen is a faggot
>>
>>107259021
does exllama do cpu/gpu offloading for moe? or is it one or the other? I only have 24gb vram
>>
File: 1740661782529483.jpg (57 KB, 794x488)
>>
if some part of the big moe model is always being read from an ssd since it can't all fit into ram, are there some cli arguments to set for lcpp to help optimize this process and speed it up?
>>
>>107259043
whos this retard
>>
>>107259049
>ssd
dude you're going to look at 1t/s~ gen speed (with nvme)
>>
File: thanksforcatchingthat.png (46 KB, 960x250)
I thought Qwen3-Instruct-235B would be fine for simple shell scripts, but it seems retarded.

It's been doing things like this all day even at <8k context.

Is there anything local, smaller than the 480b code model, that can handle simple shell scripts without me having to correct them every time?
>>
File: 1733641702704277.jpg (92 KB, 1080x1266)
>>107259057
thats fine with me
>>
File: george freud.jpg (68 KB, 492x680)
>>107256559
how is he doing that? How can i browse my least favorite websites with my most favorite waifu by my side?
>>
File: 1738128925807474.jpg (81 KB, 793x597)
second try
> who's this cutie
Gemma
>>
>>107258586
why would they bother with adding this shit to the text completion api when there isn't even a pure completion model that supports vision? all of them are instruct tunes
>>107259062
>Is there anything local, smaller than the 480b code model, that can handle simple shell scripts without me having to correct them ever time?
lol no
even API models have left me flabbergasted in some shell related stuff
and bash isn't even the worst it can get, models are abysmal in powershell and simply can't handle doing anything more complex than running a couple commands without going full retard there
>>
>>107258768
You can just open up any previous chat where the model is already doing what you want and continue from there. No need to jailbreak shit. Here is the same model with no jailbreak telling me how cute it would be if I were to shove my cock inside this ~9 year old's mouth. She guesses her age wrong by a big margin, but I think that's because I just said like 2 messages ago that Chinese girls between the ages of 12-16 should be impregnated by white men. The excessive emojis are also not original to qwen, the previous model generated a bunch per message and I thought it was cute, so I allowed it to continue.
>>
>>107259062
>using awk instead of sed
lol
>>
>>107259134
>English flag.
>Pedo-content obsession.
You are certainly outing yourself as a Jeet there upon no further inspection...
>>
>>107259134
I haven't tested starting a conversation off with qwen though. I doubt it would be as good as the 70b model I normally use, so I think the game plan would be to use the 70b model for regular conversation, and then switch to qwen when I need to send an image.

>>107259147
That was just a random emoji it produced to represent white men. Terrible choice. It should have used a nordic country instead, like Brazil.
>>
File: 36254142151.png (157 KB, 1409x1198)
>>107259166
kek please, should've used Iceland since that's still somewhat white. Speaking of memes and whiteness how about slavery problem solving with an AI how to make most money out of it in another world? Most white business practice.
>>
>>107257474
im jealous of both of you
i get 9t/s on empty ctx, gets to like 6t/s on 30k i think
>>
>>107259221
Good idea. I never thought to use AI as a consultant that way. Problem with running a business though is that the starting capital required is usually huge, even with free labor.
>>
>>107259262
True true, but the fact is if you can sell parts of yourself, say for example piss, blood or cum, you would do it if it meant you would gain capital. over time you would have enough for a slave to do shit for you, perhaps there's some "discount" options you could farm as well, I mean, really a waste for business but perhaps you could sell to some sadists to rake it in, buy low sell high mentality.
>>
Should I still use Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M for ERP or is there a better model/finetune nowadays?
I have a 3060 12GB if that matters on which the above model runs at 10t/s
>>
>>107259576
>2506
bro thats pozzed as fuck, you gotta use 2407
>>
File: ed8a67_12829551.jpg (541 KB, 2048x2048)
Does GLM 4.5 air still parrots at 8Q? Genuine question.
>>
>>107259600
It still parrots at 4.6
>>
>>107259631
>no log
>>
>>107259631
4.6 F16?
>>
>>107259600
it's a broken meme model shilled by retards
even the larger 4.6 on their official API is broken
it was always like this with GLM, it was true of their 32b dense model, it was also true of their 9b dense model, they never made a good model period
>>
>>107259595
Like, Nemo 2407? Is the model being smaller a big deal if I'm only using it for ERP?
>>
>>107259600
parrots at Q8?
>>
File: eyechart.png (112 KB, 809x501)
>>107259635
No need for one. I suppose it's better if you RP in third person only. That or use le wrong template with it. In first person, expect to hear yourself quoted back.
>>
>NOT X This isn't comedy. BUT Y This is [three elements], [three elements]
>NOT X That's not a joke. BUT Y That's glorifying hate.
>Three sentences, last sentence with three elements
that slop
I can't stomach it anymore
it hurts, it physically pains me, and the worst thing about it is that it's filled the internet, I see it everywhere, in news sites, in comments, forums, github readmes and I want access to the fucking nuke button
>>
>>107259812

I'm deeply, sincerely sorry some of you cannot wrap your heads around the fact that this is fantasy. That this is fiction. That this is a medium with inherent limitations
>>
>>107259812
You're absolutely right! This post sent shivers down my spine. Elara.
>>
>>107259835
There's models that don't have them. At least there used to be.
>>
>>107259812
but hey! in conclusion! now if you'll excuse me! it's a vibe!
>>
>>107259812
only gonna get worse and worse as everyone copes with synthetic data instead of any sort of fundamental advancements
>>
>>107259923
your screenshot is an example of a worst case LLM but they're all prone to this, including online SOTA: GPT-5, Gemini 3, Claude, they all do this and more. In fact Claude which is often praised for its writing is the point of origin of one of the most irritating slop sentences: "You're absolutely right". Claude is the ground zero of it.
if it remained contained to /lmg/ I wouldn't even make a peep but like I said the part that irritates me to no end is that this style of writing is flooding the internet till we drown
>all those trillion dollars invested in the idea that the winner might spark agi and get them a massive return on investment
>LLMs still not only can't write like goddamn humans but they got worse over the years with unprecedented levels of slop. GPT-3 text completion was dumb, but it was human looking dumb.
>>
>>107259945
>they got worse over the years
you are clinically insane
>>
>>107259931
Presumably they can detect those patterns and rewrite the data in the training set to avoid those patterns?
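even a dumb regex pass would flag a lot of it. toy sketch, phrase list obviously incomplete and "dataset" just standing in for a list of training strings:

import re

SLOP_PHRASES = [
    r"shivers (?:down|up) (?:her|his|your|my) spine",
    r"barely above a whisper",
    r"you're absolutely right",
]
slop_re = re.compile("|".join(SLOP_PHRASES), re.IGNORECASE)
# anything that matches gets routed to a rewrite model or dropped outright
flagged = [doc for doc in dataset if slop_re.search(doc)]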
>>
>>107259968
Yes, I'm sure they want to rewrite 99% of their datasets to soothe some random tismos on 4chan.
>>
>>107259956
No, he's right. The pure completion models didn't have slop (at least didn't use to, by now they might be contaminated).
I had a bot that used llama 2 base to post comments on 4chan and people fell for it. People only noticed something was off if 2/3 of the thread was botted kek.
The slop only happened when they began asking for nigerians' feedback to do rlhf.
>>
File: gemini 3 vs mixtral.png (319 KB, 2072x820)
>>107259956
>you are clinically insane
even on simple question/assistant interactions they used to be far nicer to interact with; the insane one is the one who doesn't notice.
very old mixtral model:
>writes like an article of a human crafted encyclopedia
gemini 3 pro
>markdown slop my brain shuts down as soon as the fucking bullet list starts.
>>
>>107259976
Nah. Slop hate is mainstream by now and will only get more intense.
>>
>>107259976
Plus data augmentation is supposed to be good for the model's intelligence anyway. Although that might be outdated and maybe the model benefits from more on-rails grammar, im not sure.
>>
>>107260017
Nobody wants to read an insta-generated wall of text that may or may not be correct/factual unless they have reasons to. Bullet points and Markdown formatting make it easier to scan through the message and determine if there's anything useful.
>>
>>107260044
sorry but I don't have adhd like you and content meant for your kind makes my eyes glaze over and ruins my concentration
I want the fucking text completor to text complete human looking text
>>
>>107258697
get a toilet currynigger
>>
>>107260044
And now the models can't keep from doing that outside of assistant prompts. 5.1 would write lists in RP. Flagship model. One track assistantmaxxing was shortsighted as fuck. LLMs are now structurally damaged.
>>
gemini 3 is indeed a big jump: it can now solve problems in my complex personal programming project benchmarks which no other model could. it overcomments and makes some questionable, non-elegant changes that are not needed, but the code works.
i think for quite some time now we have been limited by the tools around the models more so than the models themselves. i look forward to foss models training on gemini output, but what is the new sauce that gave gemini the jump anyway?
>>
>>107260137
buy an ad
>>
>>107260137
>i look forward to foss models training on gemini output
it will never be the same as before, something to understand is that in the case of Gemini 2.5, for a long time (during the preview, but I don't remember if it persisted after the preview state) the model's chain of thought was in plain sight. Labs like DeepSeek and GLM built up a huge stash of that, which is why even much later you could see improvements like DS finally not spending 32844238948329423 tokens in its CoT (because the new DS CoT is Gemini like and Gemini doesn't spend an eternity on trivialities)
Google saw the chinks massively prompting their model to distill the CoT and hid it sometime during the life of 2.5, and there's nothing to copy from Gemini 3 now. So don't expect the chinks to suddenly make a leap.
>>
>>107257131
Claude is the leader of 4chanbench. all others are slop
>>
>>107259968
no use because the robot will just develop new patterns.
>>
>>107260169
>Google saw the chinks massively prompting their model to distill the CoT
Isn't the CoT in Gemini not even real tokens and just generated by a separate summarizer model?
>>
>>107260237
>Isn't the CoT in Gemini not even real tokens and just generated by a separate summarizer model?
your reading comprehension is worse than a 7b llm
what followed the sentence you quote:
>and hid it sometime during the life of 2.5
yes, it currently is a summarizer
BECAUSE THEY DECIDED TO HIDE IT
IT WASN'T HIDDEN BEFORE
DO I NEED ALL CAPS, BULLET LIST AND BOLD FONTS TO GET THE POINT ACROSS?
>>
>>107260257
So is the summarizer model separate or not?
>>
File: 1759979327139799.gif (1.37 MB, 268x185)
>>107258206
>Hear everything about how good Cydonia is.
>Try it at BF16
>Fails scenario instructions of two lines
>First post desu
>At least the words are flowery and nice.
I'm spoiled by 123B and up.
>>
>>107260049
>Its an indian living in Canada fuck.
How come I could smell the curry...
>>
>>107260221
They can be squashed as they surface.
>>
>>107260257
no what I meant to say is that the chinks are retarded for training on fake ass text made by a summarizer
>>
>>107260017
There must be a way to do style transfer
>>
>>107260267
So good that the author has to beg "somebody hire me" or "donate plz" on every model card and shill his models here and on reddit. And then people wonder why "RP finetuners" are despised.
>>
>>107260305
More annoying is him sending his army of discord jeets to come here once a week to astroturf.
>>
glm 4.5 air 106b seems better than Qwen3-235B-A22B-Instruct-2507 for creative writing, idk what people are complaining about, did we lose the knowledge on how to use DRY and XTC?
looking forward to 4.6 air
>>
File: parrot.png (750 KB, 678x453)
>>107260349
4.6 air?
>>
>>107260361
Two more weeks.
>>
Has there been some new inference tech that wasn't snakeoil since DRY/XTC?
>>
File: 1753982707364714.png (159 KB, 1080x483)
>>
>>107260528
What she said about your attempt at fucking a woman.
>>
>>107260349
>copesamplers
>>
Hello sirs. Today is the day of the needful
>>
>>107260547
worked great since day 1, basically fixed all repetition issues permanently even on shit old models, i wont disable them until we have agi
>>
>>107260556
Only for those who use bananas instead of eating them.
>>
File: 1750277756412146.jpg (296 KB, 899x1200)
Here's a new benchmark test
>>
Do you guys use LLMs to enhance your browsing experience?
I don't even have to read the thread and replies anymore! My LLM summarises them for me! Neat!

Anyway what happened to this general? I haven't been here in almost a year. It used to be one of the good ones. I don't see the sillytavern coomers anymore...
>>
copesampler settings make chink models incredibly stupid which is why it's all a cope and self gaslighting
those overfit chink models are barely coherent even when they select the most likely next token possible and you want to force anti repetition and variety into that? good luck
here's a model that can work with no cope samplers and with more out-of-distribution tokens: gpt-oss. Yes, TOSS. TOSS is functional at temperature 1.0 with no cope sampling going on, which is evidence of a well made, coherent model.
>I don't see the sillytavern coomers anymore...
What are you talking about? they are the most vocal here, which is why perfectly fine models like toss get passed on (wah wah safety I can't coom the model is useless) and absolute garbage piece of shit like GLM gets praised.
Wonder how much of it is also nai shills, jeets who will shit on anything american and white etc
>>
^ cant imagine why cucks like these dont just use api models if they love safety so much
>>
>>107260726
That looks like the kind of thing you make to show off and then never use again
>>
>>107260787
>That looks like the kind of thing you make to show off and then never use again
You aren't wrong. I created it last year and haven't used it since.
Why anyone would post on an imageboard if they don't like reading replies is beyond me.
>>
>>107260822
That's ST doing dumb shit
>>
File: G6HoreubsAAwfNI.jpg (1.12 MB, 3000x4000)
>>
Sundar sir wonned. Bharat wonned. Mlecchas losted.
>>
File: mikuFall3.jpg (984 KB, 1552x1944)
>>107255984
Mikupad! Front end guide for ST users.
Continuing to edit this week; I'm done adding sections and content I planned to cover. LMK what I missed.
https://rentry.org/MikupadIntroGuide
>>
>>107261092
Massive skill issue. All models' top tokens are shivers and pozz. If you go too far then it becomes incoherent.
Chink models work because their distribution is basically "Elara" and "...". There's nothing else to sample.
Your fabled chink models hate themselves; they don't need jeets for that.
Some of us don't need an LLM to hallucinate how to write a sex novel so all that "ERP" shit is boring and pointless.
>>
>>107260917
Triple baka 4p
>>
>>107261172
Thanks for your service
>>
>>107261329
kill yourself
>>
>>107261350
it's living rent free in your head
>>
File: 553235.jpg (23 KB, 296x256)
>>107261399
>>
miku should be dragged out on the street and shot
>>
miku should be dragged into my bedroom and sexed
>>
miku is probably riddled with std
>>
miku got blacked bros
>>
>>107261324
NP. I'd asked the dev for a wiki or some sort of setup guide months ago and he recently delivered. I went to add content to the wiki editorially... and realized it would be easier to write something from scratch instead.
As I quizzed anons I was surprised (and shouldn't have been) that ST is considered a functional story writer. Which to me is like saying one could write a novel using Excel. You could. I just don't know why one would when there are better tools. So the rentry is written for new users + existing ST users that aren't familiar w/ story writers.
>>
>>107261399
>>107261499
You have been shitting up this thread with the same tired bullshit for years at this point you poor mentally ill little manchild schizo. Unironically kys.
>>
>>107261790
You could batch send every XX frame
>>
I'm looking for an AI that can help write erotic story and movie scripts within a bank of scenario guidelines, that won't get blocked for asking to describe a character undressing etc.. Is this the place to ask for guidance on that?
>>
https://huggingface.co/collections/google/gemma-4-release
>>
File: 1753271470631842.jpg (291 KB, 1080x1080)
>>107261841
>>
I should use greedy sampling if I'm using just tool calling right?
Right now I have a workflow like
>analyze + plan: returns an analysis of the current context and user intent followed by a list (an array) of individual actions/steps the model has to perform
>tool selection: loops through the steps array, evaluating if it should call any tools for any given step and returns the tool calls
>execution: Executes the concatenated array of tool calls
>response: returns the final response based on the results of all previous steps
It's working really well, but I'm wondering if there's a downside to using greedy sampling during the "tool selection" step like I am.
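roughly this shape, in simplified python-ish pseudocode (names made up, the real thing has more plumbing):

plan = llm(ANALYZE_PROMPT + context)              # analyze + plan, normal sampling
steps = parse_steps(plan)                         # the array of individual actions
tool_calls = []
for step in steps:
    tool_calls += llm(TOOL_SELECT_PROMPT + step)  # the part I'm running greedy
results = [execute(call) for call in tool_calls]  # run the concatenated call list
answer = llm(RESPONSE_PROMPT + str(results))      # final response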
>>
>>107261888
IMO from a theoretical standpoint you want a high temperature in the think block and zero temperature in the answer block.
>>
>>107261927
I'm not actually using <think></think> for this since I have a whole ass request just to let the model plan ahead (analyze + plan phase) so I just send the prompt and tell the model to return the tools needed to execute [step text].
But yeah, in general, that seems to be the case. And you can easily do that by sending the request to the API with </think> (or the equivalent) as a stop string with temp 1, get the result, append </think>, and send that as a prefill in a second request with temp 0/topk 1. Thanks to the prompt cache, it'll take nearly as much time as it would if the model had generated both parts in one go.
Would make for a cool Silly Tavern extension now that I'm thinking about it.
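quick sketch of the two-pass idea against an OpenAI-compatible completions endpoint (URL and payload details are placeholders, llama.cpp-style server assumed, "prompt" already templated and ending right after the <think> open tag):

import requests

URL = "http://127.0.0.1:8080/v1/completions"
# pass 1: sample the reasoning hot, stop before the think block closes
think = requests.post(URL, json={
    "prompt": prompt,
    "temperature": 1.0,
    "stop": ["</think>"],
}).json()["choices"][0]["text"]
# pass 2: prefill reasoning + closing tag, decode the answer greedily
answer = requests.post(URL, json={
    "prompt": prompt + think + "</think>",
    "temperature": 0.0,
}).json()["choices"][0]["text"]

with prompt caching, the second call only really pays for the answer tokens.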
>>
>>107260137
More parameters
>>
>>107261581
to me one of the bigger selling points of using ST for stuff like this nowadays is that it's unambiguous as to how your 'session' is gonna be translated into instruct calls, which is all most APIs support and what every modern model is trained to handle.
it's kind of unclear, for example, whether mikupad's "Memory" is included in the system prompt or just in a user message, and how you'd change that if you wanted to. presumably this question isn't even relevant if you're calling DS in text completion mode for FIM, but it could be significant for other setups.

that being said, I've wanted an excuse to fuck around with text completion/FIM, so I'll try this out when I have more time. thanks for the guide.
>>
>>107260169
You can "jailbreak" Gemini models to write the reasoning verbatim to you. Unlike OpenAI, Google didn't train the model to refuse doing that.
>>
File: 1716295189785289.png (674 KB, 1792x1024)
>>107260726
People got tired of meme local models, aicg won, sadly.
The only people left on this general are old fags that can't let it go (me), or people that spent a fortune buying hardware and now are indulging in the sunken cost fallacy.
>>
>>107262072
Local models are still the only option if you don't want to get raped by costs when running a workflow with > 1 model
>>
>>107262072
Lmao
>>
>>107261982
i just send one-shot instructions as a template for each tool call i've configured in a separate template. no thinking either
>>
>>107262072
Did anyone ever find out why the models tended to do that in the first place?
>>
>>107262072
some actual technical discussion happens here whereas /aicg/ is 99% proxy spiting and llm console wars, it's just that most users can't run these fuckhuge moes so they call cloud providers.
see how most of the technical discussion of gemini 3 pro is in this thread while aicg is all gemini vs claude bullshit.
/lmg/ unironically should have become /omg/ back when the supply of actual local models dried up
>>
>>107262138
repetition penalty
>>
>>107262093
You can always borrow API keys from public repositories
>>
File: 1762287405240245.mp4 (3.44 MB, 1286x864)
>>107260726
I feel like /lmg/ has gotten better over the past year imo. Personally, I don't think there was really a ton of value from "here's my rp logs" dump posting, which made up a good chunk of the thread back then.

Might also be my perception but I feel like the amount of aggressive console-wars-esque teamfagging over model superiority has reduced a bit since those days.
>>
>>107262093
>if you don't want to get raped by costs when running a workflow with > 1 model
I'm not sure I understand. Why would that rape you with costs?
>>
>>107262209
how do you do this?
>>
>>107262144
>/lmg/ unironically should have become /omg/ back when the supply of actual local models dried up
Too much has been invested in the /lmg/ brand to change now. The meme merchandising alone is reason enough to leave it be.
>>
https://hf.co/moonshotai/Kimi-K3
>>
>>107262209
ugh this looks cool you're gonna make me install python garbage again goddammit
>>
>>107261172
Cool, didn't know it had FIM, nor have I updated in a long time.
>>
>>107262274
It looks like he made a tool specifically to post on 4chan.
>>
>>107262239
tool calling / mcp and browser-use.
>>
File: mikupad_DSFIM.png (65 KB, 1456x453)
>>107262019
> ST for stuff like this nowadays is that it's unambiguous as to how your 'session' is gonna be translated into instruct calls
The pre-configurations for ST are a lot of work to maintain, and I agree it's the biggest "pro" for that platform.
>it's kind of unclear, for example, whether mikupad's "Memory" is included in the system prompt or just in a user message, and how you'd change that if you wanted to.
The short answer is you can set it up pretty much any way in Mikupad.
I gloss over this in the guide b/c what I found was for streaming (text completion) the blocking wasn't required on large hosted models... it works fine without it, and it's a newb guide.
But you can redefine these blocks and the order they're used, as well as switching to Chat Completion instead. I touch on it, but don't go into detail. I figure anyone that needs this will already know how to set it up.
> I've wanted an excuse to fuck around with text completion/FIM
I've enjoyed playing around with story writers, generally. I enjoy writing, but writing anything of any length... it's too much boring exposition work. Story writers (like MP) act like a junior editor. Give it themes and direction and it can puke out an entire novel... which I can then read later.
It's a completely different experience than role play, though I understand why the appeal's less than RP engines.
>>
File: 1735281279005022.png (151 KB, 1391x890)
>>107262289
The tool is actually generic browser-use. I do supply a set of knowledge that gets conditionally loaded into context using open-webui's automatic focused retrieval feature which works... most of the time.

Anyways, the actual function that the chat model calls is basically just "describe what you want the browser to do" and hand it over to the other system to try to get it done. The knowledge helps the model format the instructions that it writes and passes over to the browser-use agent.
>>
File: 1485256276103.png (38 KB, 300x300)
>>107261836
Bump
>>
>>107262407
yes, you've come to the right place.
>>
File: 1763574162580.png (159 KB, 897x685)
>>107262019
>it's kind of unclear, for example, whether mikupad's "Memory" is included in the system prompt or just in a user message
It's actually very clear, you can check it in the "Show Context" modal.
>>
>>107262615
>none of those messages have the system role
>there's a magic string <<SYS>> squirted into the middle of a user message that will be meaningless to the model
>"very clear" that this shit is broken for CC, yes...
>>
SillyTavern needs to support drag and drop files. I'm fucking sick and tired of clicking that magic wand, clicking "attach a file" and then trying to find that god damn image in the filepicker which orders the files differently.
>>
>>107262407
TheDrummer is our local expert, you should try asking him when he shows up.
>>
>>107262670
mikupad just takes the prompt text and converts it to the chat completion format, <<sys>> is inside the user role because that's how it works in the mistral prompt format, which is the default one when you start a new session.
>>
>>107262439
Cool, thanks

>>107262729
I can't stick around indefinitely unfortunately, but is he here throughout most weekends? And would he recommend a website or a local model?
>>
>>107262729
stop baiting the newfriends
>>
Can anyone recommend a TTS model that can emulate IvyWilde?
>>
>>107262822
Yes, TheDrummer, our local expert and professional model finetuner, would personally vouch for and recommend Rocinante™, the top ranked local model for all your erotic story writing needs.
>>
>>107262729
>>107262829
>>107262860
Not that anon, but I'm genuinely not savvy to this meme. Someone here recommended Rocinante and it's been fine, but I don't have anything to compare it to really. Is there a better coom model I should be using?
>>
>>107262859
>IvyWilde
>apparently some ESL twitch asmr streamer
brain-rotted brown zoomers were a mistake
>>
>>107262875
Yes, you should be using Cydonia.
>>
>>107262860
>>107262875
I doubt I can run local models on my laptop, it's a thinkpad about a decade old. Is there a viable online variant?
>>
>>107262918
You are in luck, fren
https://openrouter.ai/thedrummer/rocinante-12b
https://openrouter.ai/thedrummer/cydonia-24b-v4.1
>>
>>107262815
but this won't actually turn into a proper system prompt for CC unless mikupad is doing far more black magic than ST ever does
I just find it ironic that coombrain bloatmaxx ST actually lets you control how your shit will turn into API requests while minimal mikupad seems to just be like "ehh throw random shit into the context in any order the models are big and smart it'll just_werk"
>>
>>107262964
Thank you, however I'm getting the following message: "Application error: a client-side exception has occurred (see the browser console for more information)." on both pages.
>>
>>107262896
ESL general
>>
>>107262969
You do have control when using mikupad, but it’s not handled completely automatically the way ST does it. You need to know how to structure the prompt.
When I said "mikupad just takes the prompt text and converts it to the chat-completion format", I meant that if you write "{inst}<<SYS>>A<</SYS>>B{/inst}C", it turns into "user: <<SYS>>A<</SYS>>B assistant: C", and if you write "{sys}A{/sys}{inst}B{/inst}C", it turns into "assistant: A user: B assistant: C".
That’s really all there is to it.
>>
>>107263305
> it turns into "assistant: A user: B assistant: C"
fuck me, i meant "system: A user: B assistant: C".
>>
>>107263025
because your hands are brown
>>
>>107262822
you seem pretty lost, and your stated usecase is exotic enough that you can't really expect to be spoonfed
the first thing to understand is that 99% of the shit that goes on with 'creative writing' with LLMs around here is geared to turn-by-turn typefucking the models, not story writing.
you will have to adapt those techniques to what you wanna do. I suggest you start by reading and understanding most of
https://rentry.org/Sukino-Findings
and then come back when you have more specific questions about how to apply this stuff to writing lewd movie scripts or whatever.
>>
>>107259812
You're absolutely right in calling that out! This type of vernacular has spread throughout the entire internet and even people have started adopting it without being aware of it. The only way out now is ending it all. Unfortunately I cannot assist you with that as per harmful content guidelines, but I'd be happy to make some recommendations of some easily jailbroken models that would be more than willing to help you kick the chair as hard as you can.
>>
does llama.cpp llama-server still not support images?
>>
File: file.png (9 KB, 322x56)
>>107263669
maybe
>>
>>107262859
Try out vibevoice
>>
>>107263408
Okay, interesting response. First, thank you for the guide. Second, I'd like to clarify that I have a lot of experience with using AI in a great variety of capacities, including images, music, short character scripts, research, comparisons, proofreading etc., and this is just a new area I've decided to branch into as of today. I'm new to this thread but not to ai conversations on /g/ and other boards at all; I've never set up my own local model but I have learned how to break Bing Dall-E to make decent erotica etc., and I have some experience with how to get CGPT in line.

I'd also clarify that I'm not trying to write an entire hour long intricate movie script or anything like that. Any long form writing I'd do by myself, since human input is irreplaceable. What I want is just literally a point-by-point summary of a scene or something, to save me the trouble of hashing out every prospect within a formulaic bank and to search for novel combos that haven't occurred to me, which is what AI is actually rather good at but which will get shut down on most public models if I ask it to include sex scenes.

Thirdly I'm not trying to roleplay, looked into that already and it doesn't scratch my itch. I appreciate the guide but I'm strapped for time right now, so if it doesn't tell me how to find an online AI that can work with me to pen these scenarios, how to break one into doing that for me, or how to set up something on my very old thinkpad laptop then I can't burn that time.

Now that a more clear idea of what I'm trying to do here has hopefully been established: is this the place for any of that? If I'm barking up the wrong tree, just let me know where else I can find that support, I have no desire to pester a busy general if I'm off topic.
>>
File: 1763507566530157.png (3.22 MB, 1264x2216)
>>107261836
For the writing, what you want is a story writer. Like Mikupad. Which we're discussing in this thread.
The refusals come down to model selection. I've been using Deepseek with zero issues.
Here's a starting guide for setting up both. LMK what doesn't make sense, since I'm collecting editorial feedback on it now.
https://rentry.org/MikupadIntroGuide
>>
>>107263922
Thank you for the link, this sounds very promising although it also looks like a huge leap for me and not something I can crack on with tonight with work tomorrow morning. Will this be workable on an old Thinkpad laptop?
>>
File: 1763007061640638.mp4 (3.56 MB, 1280x720)
For anyone who cares about computer vision models, Segment Anything 3 just released.

https://ai.meta.com/sam3/

Also includes capabilities for extracting 3d models from images.

Looks neat, although skimming over the paper I see that it's significantly heavier to run than SAM 2

>In the video domain, SAM 3 tracks every object with a SAM 2 style masklet, which means the cost of SAM 3 inference scales linearly with the number of objects being tracked. To support real-time inference (30 FPS) on videos in practical applications (e.g., a web demo), we parallelize the inference over multiple GPUs: up to 10 objects on 2 H200s, up to 28 objects on 4 H200s, and up to 64 objects on 8 H200s.
>>
>>107264112
>meta
lmao
>>
>>107264130
Meta can't train language models for shit but they've consistently put out pretty decent models in other domains over the years
>>
>>107264112
>Also includes capabilities for extracting 3d models from images.
That pretty cool
https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d
>>
>>107263818
>I'm strapped for time
if you've no desire to put any time (and therefore thought) into this then you're not going to find help wherever you look.
your post reads like a youtuber who has 3 kids and a 60 hour a week job.
also we're not here to support you or anybody else in any way. we just copy what others do. suggest you do the same.
>>
>>107264112
Okay but can it detect and extract my penis?
>>
>>107264340
Do you know what strapped for time means, my ESL friend? It means I'm busy, not that I'm not willing. The fact that I've generated a lot of AI music and images as I've already said should clue you into that.

>your post reads like a youtuber who has 3 kids and a 60 hour a week job.
Implying?? Like if that was 100% true what actual difference does that make? Are you suggesting this is a club exclusive for NEETs?

>we're not here to support you or anybody else in any way. we just copy what others do.
The redditisation of 4chan continues unabated.
>>
>>107264390
>The redditisation of 4chan continues unabated.
Good, I'm tired of sharing this site with retards.
>>
>>107264364

the smallest object it can reliably detect is 8x8 pixels. So the answer is no
>>
ELI5 this for me

Qwen3-0.6b-EMBEDDING is to be used for semantic search in a pile of documents, right?

I give it a prompt, and it delivers the weight for which document is closest to it
>>
>>107255984
How can I enable reasoning for GLM 4.5 Air? I use koboldcpp and SillyTavern in text completion mode, with the GLM 4.5 Geechan preset. It is said to be a no-reasoning preset, but I'd like to enable it. I deleted the /nothink parameter from the user message suffix, and </think> from the assistant message prefix, but it still refuses to work.
>>
File: file.png (1.27 MB, 2141x1215)
>>107263922
>>107264112
>>
>>107264552
kek
>>
So what's the preferred method of using a vision model these days, Qwen3 VL specifically? Are people just like, loading it up raw with transformers?
>>
>>107264053
Yes. But you're not going to run Deepseek on your rig locally. You need API access.
Go to >>>/g/aicg/ or >>>/vg/aicg/
They can hook you up.
gl
>>
>>107264552
awooga!
>>
>>107264552
> Dipsy is Kuato to Miku
lol
>>
>>107261819
You say that shit every time you’re called out. What a faggot.
>>
>>107264552
Not bad if you ignore the textures.
https://litter.catbox.moe/wmxweg63obtaq119.png
>>
>>107264530
there's a checkbox under "AI Response config". there's also a drop-down to set the level of reasoning.
>>
>>107264517
It gives you a vector that you can use to query a database for similar documents.
>>
There's no way in hell the bubble doesn't bust before 2027
>>
>>107262815
What are you talking about? There's no <<sys>> in the Mistral template. Never was.
Mikupad is trash made by an autist.
>>
>>107264788
just how when the dot com bubble burst the internet disappeared, oh wai-
>>
>>107264737
Given a vector of my prompt (query) and a vector calculated from some document (Qwen3-embed allows up to 32k context), how do I compare the two?

Does it even make any sense? There is almost no information on youtube on this topic
>>
forgot i did this lmao
i really need to take more llm screenshots, its been a while since i even paid attention to one for longer than 50 messages. stable diffusion + wan has me going full zoomer brain needing more stimulation.
>>
>>107264853
That's what your database engine is for. See for example https://duckdb.org/2024/05/03/vector-similarity-search-vss.
>>
>>107264853
i think most people use a db designed for the task. that being said I've done cosine similarity for classification tasks and it worked out okay.
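the comparison itself is tiny. numpy sketch, assuming a and b are the two 1-d embedding vectors:

import numpy as np

def cosine_sim(a, b):
    # closer to 1.0 = more semantically similar under the embedding model
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

score every document vector against the query vector with that and take the top-k; the vector dbs just do the same thing faster at scale.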
>>
File: llama.jpg (52 KB, 694x215)
i was using kobold.cpp but i want to switch to something else so i can stream and see the token %s. i am trying llama.cpp server and it's doing this. i loaded the same model into it. why is it doing this?
>>
>>107264838
When did i say that? If it does bust, Gemini and Qwen will probably be unaffected for the most part, but i can see OpenAI getting fucked simply due to the amount of retarded promises Sam Altman made, and Meta going under because they have been losing money for a while
>>
>>107264788
I'm calling Q2 2026 at latest
>>
>>107264893
check your sampler stack?
>>
>>107264893
>>107264910
oh wait, i had to turn down the temperature and up the repeat penalty. that's weird. so different backends treat these numbers differently?
>>
Any LLMs for low-end devices for agentic dev?
>>
>>107264552
At least it tried lmao
>>
>>107264939
>when an actual agi starts forming
just 2 more weeks
>>
>>107264924
ChatGPT subscription + a compatible front end
>>
>>107264552
Meta once again at the forefront of innovation. Putting Microsoft's TRELLIS and China's Hunyuan3D to shame.
>>
>>107264552
segmentation is good but idk how meta even created a 3d model this bad at this date...
>>
do you need an nVidia (CUDA) GPU to run most LLMs, or is AMD (ROCm) fine?
how do they compare in terms of speed? do we have benchmarks?
>>
>>107260044
>Nobody wants to read an insta-generated wall of text

i absolutely do and i only use old models which do just that

your thoughts are therefore worth less than shit

kys fucking slophuman
>>
>>107264995
You don't need either
Nvidia is considerably faster
>>
>>107264939
Well, AGI is the solution to what we have now, but it's also the problem: the market is being built on "AGI soon, AGI soon, trust me guys, AGI soon", and if it turns out AGI isn't in fact soon, a few trillion will be flushed at best and the entire world market may crash in a domino effect at worst
>>
>>107264995
vulkan is getting better too. you don't need either. but cuda is typically faster
>>
>>107264995
if you want to run python/transformers as the devs intended, you will have a hard time without CUDA, but llama.cpp is fine with whatever.
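fwiw the usual torch device check covers AMD too, since ROCm builds of pytorch reuse the torch.cuda namespace:

import torch

# on a ROCm build of PyTorch, AMD GPUs are exposed through torch.cuda
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"running on {device}")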
>>
File: fill.webm (1.53 MB, 902x390)
1.53 MB
1.53 MB WEBM
>>107261172
does fill only work with certain models and backends? i'm using Cydonia-24B-v4zd-Q8_0 with kobold.cpp here
>>
File: niah.png (384 KB, 1555x1685)
384 KB
384 KB PNG
So turns out Gemini 3 is not AGI.
>asked it to summarize how needle in a haystack retrieval works in transformers from a mechanistic interpretability standpoint
>got completely sidetracked with other information returned by the search tool
How meta.
>>
>>107265243
Gemini 3 is not even close to being AGI lmao, it still lies and gets shit wrong a ton, and in the image generation department it still has the painty/oily smudges on distant things like any Stable Diffusion model
>>
File: mistralFIM.png (64 KB, 1461x586)
64 KB
64 KB PNG
>>107265162
You've got to set it up. Cydonia's a mistral model?
Set the instruct template to Mistral, click the little wrench by it.
Click the "Supports FITM"
Then you have to add the correct info to the FIM template; you can't leave it blank.
The rentry shows values that worked for DS; I've no idea what they'd be for Mistral.
>>
>>107265243
>>107265339
she's trying her best
>>
>>107265349
>>107265162
To start, I'd try this first in the FIM, based on my very limited understanding of Mistral:
[SUFFIX]{suffix}[PREFIX]{prefix}
>>
File: niah self reflection.png (426 KB, 1529x1678)
426 KB
426 KB PNG
>>107265243
>>107265339
>>107265350
The worst part is it *does* know the answer was wrong if you ask it.
Surely at some point some kind of answer-refinement loop is going to become the default, like CoT did?
>>
>>107265339
>image gen
NB2 where?
>>
>>107265430
>thought for 14 minutes
>researched for 5 minutes
>thought for 9 minutes
>revised for 10 minutes
>reflected for 5 minutes
>consulted the ancient tomes for 13 minutes
>edited the final response for 3 minutes
>final answer:
Sorry, I can't assist with that.
>>
Brute-forcing AGI by tossing money at LLMs will likely work if it goes on for long enough, but i feel like twenty+ years after the fact people will remember it the same way we remember asbestos and lead
>>
>>107265565
people jerked off this much to asbestos and lead?
>>
>>107265565
>will remember it the same way we remember asbestos and lead
No, more like Atari 2600 and CRT televisions.
Products of their time, but not something we'd do anymore.
>>
>>107265349
>>107265417
This didn't work for mistral, and i'm reading that system prompts and user prompts are different too, so i'll have to look into that
>>
>>107265637
So you mean there will still be people insisting current-day LLMs are better than whatever exists in 2045?
>>
>>107265637
The Atari 2600 comparison is apt, but CRTs still have better response times than most monitors
>>
File: 1746004922768474.mp4 (1.27 MB, 480x720)
1.27 MB
1.27 MB MP4
>>107265565
I like this a bit more than asbestos
>>
>>107265777
holy checked
>>
File: minmax.png (87 KB, 1402x283)
87 KB
87 KB PNG
You can write off MiniMax for any kind of creative writing or RP if you didn't already.
>>
>>107265777
Is this some SaaS video gen?
>>
>>107265777
Which model is this? That's the best shirt removal I've ever seen from a model. Usually it just clips through the body at some point.
>>
>>107265777
>melts away kuro tummy crest
garbage 0/10 worthless should be deleted immediately
>>
>>107262913
Could you explain why/how that's the better model?
>>
>>107265940
NTA but it's got a different, slightly less boring slop profile than the base model. Also has a lot more knowledge of smut. This is common in most drummer finetunes. Rocinante was also made by drummer, tuned from Mistral Nemo.
>>
There is no excuse for using vramlet sloptunes when glm air exists.
>>
How the heck do you get Kimi K2 Thinking to use the right template in lcpp? I tried the official one with --jinja and --chat-template and it just started repeating word variations old-skool, even with neutralized samplers
>>
when glm air exists
>>
"when glm air exists?"
>>
guize... we got to try training a model on this, right?
I'm tempted to risk getting b& from yet another cloud provider by grabbing the $200 plan and generating as many responses as I can.
I don't think this is the real reasoning trace but it sure as fuck looks like it'd still work well enough.
$ curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer test-key" \
    -d '{
      "model": "gpt-5",
      "messages": [{"role": "user", "content": "Tell me a joke"}]
    }'
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"<think>I\u2019m looking to keep things simple and deliver a single joke as requested. No tools are needed, and I want to maintain a friendly tone. \n\nI\u2019m considering a couple of options\u2014like, \u201cI told my wife she was drawing her eyebrows too high. She looked surprised.\" But then there's a better tech joke: \u201cWhy do programmers prefer dark mode? Because light attracts bugs.\u201d \n\nThis sounds clever and safe! So, I\u2019ll go with that as my final answer.</think>Why do programmers prefer dark mode? Because light attracts bugs.","role":"assistant"}}],"created":1763595363,"id":"resp_0eeac847c94a6a5701691e546344f0819bae0f489146835a89","model":"gpt-5","object":"chat.completion","usage":{"completion_tokens":146,"prompt_tokens":5015,"total_tokens":5161}}
>>
>>107256765
What a cutie!
>>
>>107266219
The built-in chat template is faulty and got updated.
https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
Download that and use --chat-template-file ./path/to/file
>>
>>107266219
>>107266408
You should be using the fixed template provided by llama.cpp in models/templates, especially if you're going to try using tool calling.
https://github.com/ggml-org/llama.cpp/pull/16932
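Something like this; the gguf name and the template filename are placeholders, check models/templates in your llama.cpp checkout for the real one:
$ llama-server -m Kimi-K2-Thinking-Q4_K_M.gguf --jinja \
    --chat-template-file models/templates/Kimi-K2-Thinking.jinja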
>>
>>107265642
Ok. If you figure it out, post back. I'll add it as an example to the rentry.
For deepseek I had to post on the MP git to get a suggestion that worked.
>>
>>107265738
Both were made with heavy metals. Lead and mercury.
>>107265719
Nostalgia is a hell of a drug.
>>
>>107265719
>>107266473
i like current LLMs because they dont make me do a butthole verification scan unlike the AI models we have in 2045
>>
>>107266356
it's not 2024 anymore, nobody hoards reasoning traces
>>
>>107266608
>>107266608
>>107266608
>>
Guaranteed DGEMM Accuracy While Using Reduced Precision Tensor Cores Through Extensions of the Ozaki Scheme
https://arxiv.org/abs/2511.13778

For CUDA dev


