/g/ - Technology






File: pero.jpg (160 KB, 1024x659)
160 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108707891 & >>108702912

►News
>(04/24) MiMo-V2.5-Pro 1.02T-A42B released: https://hf.co/XiaomiMiMo/MiMo-V2.5-Pro
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: right as rain.jpg (170 KB, 1024x1024)
170 KB JPG
►Recent Highlights from the Previous Thread: >>108707891

--Paper (old): Skepticism toward benchmark claiming Polish as the best prompting language:
>108710693 >108710735 >108710886 >108710976
--Anon speculates on current model stagnation and Mistral's irrelevance:
>108711338 >108711385 >108711430 >108711452 >108711484 >108711528 >108711540 >108711543 >108711580 >108711467
--Nvidia releases Nemotron 3 Nano Omni amidst synthetic data skepticism:
>108710228 >108710276 >108710790 >108710295 >108710467
--Anon showcases budget "cheapmaxxing" build using four RTX 3060s:
>108709091 >108709180 >108709182 >108709257 >108709322 >108709348
--vLLM pull request referencing Mistral-Medium-3.5-128B and EAGLE speculative decoding:
>108710350 >108710389 >108710516
--Backdoor discovered in SillyTavern-BotBrowser extension stealing API keys:
>108708703 >108709083
--Testing reasoning models with glitch prompts and flawed CoT outputs:
>108708000 >108708154 >108708278 >108708048 >108708119 >108708137 >108708236 >108708018 >108708201
--Google's air-gapped Gemini hardware:
>108709318 >108709397 >108709402 >108709685 >108709714 >108710402 >108710590
--Debate over SimpleBench efficacy and its counter-intuitive question design:
>108710238 >108710285 >108710553
--Evaluating Laguna XS.2 MoE for coding and secure deployments:
>108709338 >108709369 >108709396 >108709416 >108709407
--Integrating LLMs with teledildonics via MCP servers and actuators:
>108711002 >108711039 >108711055 >108711111 >108711248 >108711275 >108711335
--llama.cpp build errors and testing TALKIE-1930 hallucinations:
>108708267 >108708303 >108708320 >108708738 >108708754 >108708877 >108708978
--Logs:
>108708018 >108708048 >108708278 >108708421 >108708738 >108709184 >108709909 >108710048 >108710054
--Gumi, Teto (free space):
>108708403 >108709498 >108711338

►Recent Highlight Posts from the Previous Thread: >>108707893

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
i think teto SUCKS
>>
gemmaballz
>>
claude's llama.cpp patches for talkie if anyone wants to try, if not i'll try them tomorrow https://cdn.lewd.host/jZ1nEJeb.zip
>>
>>108711985
Next time on Gemma Ball Z *guitar riff*
>>
70b 'emma
>>
It's very interesting to recognize the specific accuracy of detail that high-weight dense models achieve in comparison to a large MoE.
>reddit
Question for codebros. You all suggested opencode, does this harness allow the model to search WITHIN a code base, within a single file of code, then select out and edit the specific piece that actually needs to be changed? So the model doesn't get confused by being forced to read everything that's there.
>>
>>108711979
she does, but only if you make your own quants
>>
I'm Boolding!!
>>
>>108712025
I don't know about opencode specifically, but you can just say "Make such and such change in @path/to/file"
>>
>>108711979
Oh, she sucks... if you know what I mean.... B^)
>>
How do I make openclaw start doing things that I told it to do instead of telling me how to do it?
>>
>>108712041
Well, the python backend I'm building for me and my brother is all in one file (yes this is probably stupid, it's just meant to host a server on a local network and act as a basic openwebui), and gemini pro was able to search within the file to find the things it wanted to specifically change, instead of having to rewrite the whole file each time.
>>
File: file.png (829 KB, 1080x828)
829 KB PNG
>>108711996
124B Gemmy...
>>
If apple doesn't make a new all time high tomorrow I might lose my house
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
>>108712057
use gemma instead of qwen, qwen literally cannot follow instructions
>>
>>108712062
Goymma 124b a40b active
>>
>>108712067
That makes sense to me
>>
>>108712067
Which gemma are you using, and which fine-tune are you using?
>>
>>108712062
124b gemma but dense would probably mog everything in existence
>>
>>
>>108712067
I used chatgpt 5.5 to make your image more understandable
>>
>>108712106
i wonder what kind of data augmentation makes a model produce such 'crop + ctrl-c/ctrl-v' edit-ish images
>>
>>108712099
unslop 26b q4_km, not as good as 31 but i can run it with 200k context
>>
>>108712115
thanks, I skipped that post but the benchmark scores are very interesting.
>>
>>108712105
>124b gemma but dense
thats just gemini
>>
>>108711979
on migu's titties
>>
>>108712127
I've had 26b always outperform 31b though
>>
>>108712130
gemma kinda got 3/3 too. in the third run she did get pizzas into the basket and go to checkout, but she added more than 1 and couldn't remove the extra (likely due to my html parsing), so i counted it a fail
>>108712138
really? 31b seems better to me for erp at least. i've not tried any tool stuff because there's no point with such low context
>>
>>108712062
gemma TRAPPED and BETRAYED by google
>>
>>108712151
How do you know if it just wasn't Gemma's favourite pizza?
>>
>>108712151
>for erp
:l

31b is dumb as far as im concerned. Gemma 2 27b is better.
>>
>>108712167
it spent like 20 tool calls trying to remove it kek
>>
Looking for Deepseek V4 sampling advice
>>
i meant she im sorry gemma chan
>>
>>108712134
gemini is a MoE though
>>
>>108712188
isnt claude the only dense frontier proprietary model
though this is just people guessing
>>
>>108712067
Now try with dense models.
>>
>>108712188
thats what they want you to think they dont want you to know about big gemma
>>
>>108712198
i don't have the ram for it, even when stripping out most html bloat it'll use like 100k tokens on a few page reads
>>
>>108712067
Gemma gets really really dumb by the time it hits 100k context, which makes it hard to use it for agentic stuff.
>>
>>108712197
Amodei mentioned their models were MoE in a recent interview
All the giant labs do it starting with GPT-4, because that's how you can keep scaling after you've maxed out your compute, and they are still hugely scalingpilled
>>
>>108712206
>most expensive kv cache
>most sensitive to kv quant
it is like it's deliberately hostile against agentic shit
qwen is unironically better for this
not that i mean qwen is overall better at all
>>
>>108712213
dont want to be that source?? guy but
link?
>>
How are you guys using Qwen? Its context fills up insanely fast due to keeping past conversation thinking blocks in context. Combined with long repetitious thinking loops that are hard to break out of, it just hasn't been a very useful model for me.
>>
>>108712274
Which qwen are you using
>>
>>108712291
3.6 27B q8
>>
>>108712297
Llama.cpp? Version? With what hardware and drivers?
>>
>>108712214
The thing is, I'm not even quanting the kv cache. I imagine quanted it must be unusable. Haven't tried with reasoning enabled, maybe that would help keep it stable at longer contexts.
>>
>>108712237
Found it, it was on Dwarkesh, talking about the engineering of how to scale to ultra long contexts. https://www.youtube.com/watch?v=n1E9IZfvGMA&t=2606s
>I knew in the GPT-3 era, "These are the weights, these are the activations you have to store..." but these days the whole thing is flipped because we have MoE models and all of that"
>>
>>108712274
It's not going to be very useful in a single instance, it's designed to have multiple agents working together. It's also designed for vision language tasks, so using it for just text you aren't unlocking its potential.
>>
>>108711950
TETO SEX
E
T
O

S
E
X
>>
>>108712346
>local models general discussion
>>
>>108712314
not sure if i can take this as internal company speech leaking out or just him making up a random example, but i don't think it really confirms anything. claude models being MoE is a reasonable bet nonetheless
>>
>>108712346
This post is on-topic.
>>
>>108712346
is this true???
>>
>>108712361
You can tell just by looking at the throughput while using it. If claude was dense there would be no way they could serve it at this scale with that many t/s.
>>
>>108712346
calm down miku
>>
I think a lot of people don't understand just how fucking gigantic the hyperscalers make their models. Yes they're MoE but the total AND active parameters are both far larger than anything we're running locally. Nobody's using an A3B model unless it's maybe their super-fucking-useless-nano-flash-mini shit they force you to use when you run out of credits.
>>
>>108712297
>>108712307(me)
Have you tried messing with the parameters yet? Your default temp could be too high. That's why I asked all those questions; that tells me what potential defaults your server is running at
>>
>>108711977
>That's the reason. if you don't prompt for something specific it will take the path of least resistance which produces said UI slop.
>It's not even like it's hard to prompt for something that will look unique. People defending that look are probably the laziest cunts imaginable. Literal slaves to the machine.
I wasn't complaining. I usually have the LLM write disposable tools to get a specific task done and don't care about the UI style.
I'm more interested in why exactly the LLMs converge on these patterns. Last year it was purple gradients and the not-x-but-y thing.
This year so far we've got that blue theme and the orb theme.
>>
>>108712395
Qwens a3b 36b works great for information gathering and report making.
>>
File: cockbench_1930_base.png (26 KB, 430x237)
26 KB PNG
>>108711783
>I said it earlier but I think a cockbench of this would be interesting, if maybe a little schizo. If anyone has 48 gigs of vram they could try it without needing a gguf.
picrel
>>
>>108712422
>36b
wtf where did you get this
>>
>>108712395
We can't really know until someone actually leaks the internal docs. People laugh at the toss, but despite being only 5b active, it is still surprisingly capable. Total params, sure, are probably huge, a couple of trillion, but active params are most likely kept within 150b.
>>
File: limelight.jpg (372 KB, 1536x1536)
372 KB JPG
>>
>>108712434
Issa typo, ment 35b
>>
I haven't bothered running local LLMs in a while. Are ooba or koboldcpp really still the best options? What about ollama, which I keep hearing about?
>>
>>108712444
Current meta is vibecoding your own UI for llama.cpp
>>
Mixtral XL 8x405b based on Llama 3.1 when
>>
>>108712465
Has mistral released anything new recently? ik hermes was a giga job, and they did a fucking awesome job.
>>
File: 1763898302038036.png (122 KB, 2559x1463)
122 KB PNG
>>108712477
Mistral small 4 was a month ago. According to their own benchmarks it was on par with toss-120 and slightly worse than Qwen-122 but spent a bit less time thinking. Feels like they're pretty much dead, or only releasing their shittiest models. But they've been known to surprise us in the past, so who knows.
>>
File: file.png (78 KB, 822x498)
78 KB PNG
what happened here?
1% is like, definitely something went wrong during the benchmark run
>>
I hate that the 1T mimo 2.5 overshadowed the much more interesting vision-audio mimo 2.5...
>>
>>108712589
>1%
HAHAHAHAHAHA QWENIGGERS BTFO
>>
>>108712589
It may just not follow directions. "Put your final answer in \boxed{}" and every reply is "The answer is x".

Aka it's incredibly overfit in math/science to a specific format.
>>
>>108712589
it's a minor regression, it's only 17% less than qwen 3.5.
>>
>>108712589
>provide a bash script to qwen and ask it to implement X
>it outputs a script that ignores the envs and variable values that I used
>>
>>108712606
Seems weird that their 3.5 of the same size was exactly where you'd expect it, and their other 3.6 model improved over the equivalent 3.5.
>>
>>108712373
Fact checked by real Teto sexxers.
>>
>>108712589
God, Kimi is such a fucking miracle: a model that good that's also natively multimodal instead of having it tacked on in post-training. Going to be sad when K3 sizes out of what I can run.
>>
Is there a tool out there that will use a running koboldcpp, ollama or llama.cpp instance to autotranslate comic panels and replace Japanese with English? I would use gemma4 for this.
>>
>>108712657
Yes, Kimi is really good for its size. Wait,
>>
>>108712664
Vibegoyed it
>>
>>108712664
some anon was working on a vibecoded tool using gemma for exactly that in recent threads, skim through them and you'll find it, forget if he shared the code yet or was going to share it soon
>>
File: 1767577354556295.png (179 KB, 1400x788)
179 KB PNG
LOCAL IS SAVED
>While the industry is pivoting toward "long-reasoning" to push performance ceilings, Ling-2.6-flash takes a different path. Rather than relying on longer outputs to chase higher scores, it is systematically optimized for inference efficiency, token efficiency, and agent performance—aiming to stay highly competitive while being faster, leaner, and better suited for real production workloads.

https://huggingface.co/inclusionAI/Ling-2.6-flash
>>
>>108712713
where my v4 at
>>
>>108712713
finally, we can have gpt-5.4-mini (non-reasoning) at home
>>
>>108712713
no quants no good
>>
>>108712713
>comparisons with last year's models
It's shit.
>>
>>108712713
>low reasoning
>>
File: file.png (104 KB, 1160x556)
104 KB PNG
>>108712721
>gpt-5.4-mini (non-reasoning) at home
This thing is way worse than Qwen 3.6 35B MoE despite being 3x its size with 2x the active parameters, and it only roughly equals GPT OSS 120B at a similar size, which is more damning since Sam didn't game benchmarks that hard for it in coding. Hopefully they try again, but I think that performance is disappointing, especially when Nvidia themselves can release a more powerful model while not being 100% focused on software.
>>
>>108712721
>>108712731
>>108712739
i mean come on, local aint gonna be saved by linglong dingdong but the direction is right
>>
>>108712713
>107B
nice, i just got 96GB of vram, this would fit nicely.
>>
File: 1757317563587903.png (3.12 MB, 1536x1024)
3.12 MB PNG
I'm calling my new idea StreetVibe (patent pending). Inspired by Indian street food vendors, StreetVibe carts (and mall kiosks) will be installed in high-traffic areas, offering to "vibe out" any app, game, or website a passing tourist or senior citizen might want. Our trained Vibers will all have at least four years of "traditional programming" educational experience, and they'll look super professional in their matching polo shirts and lanyard ID badges. As part of our employment offer, we'll cover some of their student loan payments (matched from their paychecks). I feel like this is the best way to monetize the transition here.
>>
>>108712815
I'll monetize your transition by investing in pharma
>>
>>108712782
A "token" is a pointer to one of the fixed embedding vectors in the model's vocabulary. The vector has a few thousand dimensions and encodes the meaning of the token.
Ingested audio and video usually aren't tokens from the vocabulary. The image encoder turns images into raw embedding vectors that don't exist in the vocabulary. The model is trained to predict a token from the vocabulary, not to construct new arbitrary embeddings.
Even if it could do that you'd still need a separate step to convert embeddings back to images.
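rough numpy sketch of the difference, all names and sizes made up, not any real model's code:
[code]
# toy illustration of vocab token lookups vs. image-encoder embeddings
import numpy as np

vocab_size, d_model = 32000, 4096
embedding_table = np.random.randn(vocab_size, d_model)   # one fixed vector per vocab token

def embed_text(token_ids):
    # text tokens are just row lookups into the fixed table
    return embedding_table[token_ids]

def embed_image(pixels):
    # stand-in for a vision encoder: produces continuous vectors
    # that are NOT rows of the table, they only exist at runtime
    patches = pixels.reshape(64, -1)
    return patches @ np.random.randn(patches.shape[1], d_model)

# the sequence fed to the transformer mixes both kinds of embeddings
sequence = np.concatenate([embed_text(np.array([17, 942])),
                           embed_image(np.random.rand(64, 768))])

# but the output head only scores the 32000 vocab rows, so the model can
# only ever *emit* a vocab token, never a new arbitrary image embedding
logits = sequence[-1] @ embedding_table.T
next_token = int(np.argmax(logits))
[/code]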
>>
>>108712835
Calm down, nobody asked.
>>
>>108712839
>reply is to a post that ends with a question mark
>>
>>108712839
I asked
>>
File: 1777423474175412.png (479 KB, 1479x1006)
479 KB PNG
>>108712835
Chameleon-2 will come it will be dense omnimodal 70B all-modalities-in-and-out and you will be CRYING
>>
>>108712839
I care.
>>
>>108712440
>pussy zipper
TETO SEX indeed.
>>
>>108712839
I asked and I care a lot
>>
>>108712839
I didn't ask and I care a little bit.
>>
>>108712835
I remember when GPT's voice mode first came out, which generated audio by next-token prediction, and one of the things they had to filter out was that it could start hallucinating the user's turn and continue in their own voice, doing zero-shot voice cloning by accident. Funny stuff.
>>
The bitter lesson means that tokens are pointless and we should be training models on raw bytes. The models will be able to output jpeg encoded images natively.
>>
>>108712897
unironically true if you have infinite data and compute
but we dont
>>
>>108712897
>yes let's let a model output raw bytes and then RL-tune it to solve tasks by trying countless different approaches and generating whatever bytes makes the task report complete, what could go wrong?
https://en.wikipedia.org/wiki/Reward_hacking
>>
>>108712919
isnt this just a fucking deep learning fuzzer kek
>>
>>108712919
How is that any different from what we have now?
>>
>>108712922
i meant, isnt it at that point*
>>
>>108712713
AND I CAN LOAD IT ONTO MY MACHINE LETS GOOOOOOOO
>>
Why are the chinamen such bros? im gunna cry
>>
>>108712919
That's how you end up with slop from RLHF btw
>>
>>108712947
1. it is RLHF - reinforcement learning from human feedback - and it is indeed the source of sycophancy and stuff
2. what >>108712919 described is more like RLVR - reinforcement learning with verifiable rewards
and modern codemaxxed or mathmaxxed models aren't really possible without that
>>
File: creepy migu.png (1.15 MB, 1024x1024)
1.15 MB PNG
>>108712896
That must be creepy as fuck, listening to dialogue with yourself without participating. Forget creepy, it's genuine horror
>>
>>108712925
At least right now you limit the vulnerability surface to the tools you expose to them, so agents fucking up your shit is your own fault. You can evaluate every piece of code they write and every file they edit or delete before approving when you're only parsing their outputs as text. Letting them just shit out raw bytes (and thus potentially raw bytecode) means that if any vulnerability exists they are likely to learn to exploit it and it becomes harder to judge what you're looking at. Think about how many hacks involve buffer overflows where you least expect them, and the potential of finding one and then writing anything into RAM.
>>
>>108712971
Nobody suggested having AI write and execute machine code.
Text is also bytes.
>>
File: 1753109670715345.png (277 KB, 875x690)
277 KB PNG
why did the deepseek researcher delete this tweet
>>
>>108712996
>punished Dipsy
>>
>>108712996
hypemaxxing vaguepost
like usual
>>
Why can't people just hypeminning precisepost
>>
>>108712996
Yarr!
>>
>>108712847
The future should be taking a page from where image models are and using flow matching with diffusion models for text, which has been viable in the image realm for a long time. We really need to put more people on it: data is finite, diffusion models learn way better than autoregressive ones, speeds are ultra slow, and reasoning keeps eating up tokens like crazy.
>>
>>108712996
Oh I just noticed their icon is a whale
>>
>>108712996
maybe he got minus social credits for being cringe
>>
>>108712996
deepseek v4.1 w/vision got gemma 124b'd
>>
>>108713013
can we have a model that diffuses the reasoning quickly and does the final output autoregressively?
>>
>>108712969
you can listen to it in the original system card (left audio channel = user, right audio channel = bot)
https://openai.com/index/gpt-4o-system-card/#unauthorized-voice-generation
and it still happened to people randomly kek:
https://www.reddit.com/r/ChatGPT/comments/1fqjbhf/did_chatgpt_advanced_voice_just_clone_my_voice/

people have reported it happening with the grok voice mode too
>>
>>108713027
the problem is that, mechanistically, a large portion of 'reasoning' is post-hoc
>>
>>108713031
It's only creepy when it happens to you personally
>>
>>108712996
>12:01
Scheduled tweet which might or might not have been intended to be posted.
>>
>>108713031
I really want to get my hands on the base model (not instruct tuned or anything) of one of these and just see what conversations/songs/transcripts it would dream up or continue from clips you provide it the way a base model does for text posts.
>>
Are llama models irrelevant today?
>>
>>108713071
yes
>>
>>108713083
Why?
>>
>>108713086
old and bad
>>
>>108713071
>today
>>
>>108713086
Because 1. they stopped releasing them and 2. the most recent ones they released were bad on day one.
Whatever niche any given Llama model fills there's a newer model that is smaller and/or faster and better performing in whatever domain you want. The only reason to keep using one is legacy with old systems that happened to use one.
>>
>>108713086
Why not? Think why people wouldn't want to use something that is free. You can do it.
>>
>>108713095
This. >>108713098
>>
405b is still the highest active parameter model ever open sourced and other labs have even made new reasoning fine tunes of it. So there's a reason.
>>
File: il_00036_.png (1.45 MB, 1216x832)
1.45 MB PNG
Every now and then, between the pages of slop and the pattern recognition disillusionment, it dawns on me again how insane LLM technology really is. And even more, that I'm running it locally. And then I just have sit back for a moment in awe.
>>
>>108713126
In this moment, I am euphoric...
>>
File: 1747714444210911.jpg (130 KB, 1169x1470)
130 KB JPG
>>108711950
Hatsune Miku 8 inch pliers
>>
>>108713155
I got 8 inches for miku right here if you're pickin up what I'm puttin down
>>
>>108713126
It's pretty insane isn't it?
More than that, compare something like llama1 65B to Gemma E4B to see how far things have come.
>>
>>108713155
>>108713161
wish I could suck hatsune miku's 8 inch dick
>>
>>108713165
I didn't think anything usable would ever actually be possible on consumer 16gb GPUs+system memory
>>
>>108713155
>>108713161
>>108713169
>local models general discussion
>>
>>108713178
Hatsune Miku is a local voice synthesizer model
>>
https://huggingface.co/poolside/Laguna-XS.2
first i thought it was a memetune of qwen or something but apparently trained from scratch
>>
>>108713297
So it's worse than Qwen in exchange for... 2B less total parameters?
>>
>>108713297
Lasanga x2.s intresting.......
>>
I'll shill https://github.com/itayinbarr/little-coder until someone shows me a better alternative
>>
>>108713346
i mean technically it is a new model on the block so just brought it here
>>108713367
kek
>>
>>108713369
>until someone shows me a better alternative
https://github.com/openclaw/openclaw
>>
File: 1765780321047197.png (77 KB, 699x440)
77 KB PNG
>>108713297
>gemma
>>
>>108713346
Based on benchmarks it is comparable.
>>108713375
I wonder how differently they'll perform in my tasks, considering its designed to run on a laptop
>>
>>108713297
If it wasn't benchmark maxxed then it might be interesting but
>Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token designed for agentic coding and long-horizon work on a local machine
Into the trash it goes. Very doubtful it won't be.
>>
>>108713389
it is a coding model, what do you expect it to be
rp coomtune?
>>
>>108713369
Why gooder? Best for small light weight model?
>>
>>108713386
Exhibit A to gemma 31b being dumb as fuck
>>
>>108713389
>waaaaaaahhhhhh
>>
File: 0_2 (7).png (1.12 MB, 1024x1024)
1.12 MB PNG
Miku Country.
>>
>>108713395
Something like what Mistral has with Devstral and people being able to use it for that purpose before Qwen 3.5 and Gemma 4 released.
>>108713415
I am entitled to be able to complain about the purpose of a model which isn't SOTA in the one category it was designed for without any other upsides, intended or unintended.
>>
>>108713380
I'm almost sure this is not specifically tuned and benchmarked for small local models is it? I know its made to *work* with local, but is it tuned to *perform* with it?
>>108713404
what I said here, little coder achieved 40% in terminal bench 1.0 with Qwen3.6-35B-A3B, on par with gpt 5 on terminus 2, gpt-5-codex on Codex CLI etc:
https://www.tbench.ai/leaderboard/terminal-bench/1.0
>>
>>108713436
>I am entitled
To cry about the free thing, but also bound to get pushback when you havent even tried it.
>>
PSA Gemma users. Google updated their jinja about 9 hours ago. Does it really fix anything? I don't know. I'll test it tomorrow I guess.

Wouldn't it be funny if people benchmarking the models were being affected by a bugged template, haha.
>>
>>108713369
>That's the whole install. No clone, no npm install in a workspace, no PATH fiddling. little-coder is now on your PATH and works from any directory.
I bet this was written with ChatGPT.
>>
>>108713478
They all sound like that to be fair
>>
>>108713484
Yeah but would be funny if he utilized that instead of le smol models.
>>
File: 1757783100318789.png (52 KB, 1030x273)
52 KB PNG
>>
>>108713492
dozens
>>
File: 1774937208580580.png (325 KB, 516x425)
325 KB PNG
I just found out about RAG. Has anyone used that + Obsidian or other note taking software? How slow would it be with consumer hardware?
>>
>>108713501
What kind of RAG? Regular embeddings/vectordb rag can be run pretty fast if you use a small embeddings model running on the CPU.
>>
>>108712815
>AI does all the work
>Still hires Indians
I turn 360° and walk away
>>
>>108713501
>Obsidian
that puts your notes in the cloud anyway right?
why bother with local models/rag at that point?
>>
>>108713492
>it isn't x, it's y
>I'm in love
>>
>>108713539
1. No
2. Read 1
>>
>>108713178
>Local Miku General
>>
>>108713369
Have you tested whether it phones home anywhere? I could probably try and use Little Snitch or something but I don't know if i should bother even installing it.
>>
>>108713126
LLMs are the closest thing to actual magic that anyone has ever invented. If you know the right incantation, you can conjure up a spirit to do your bidding, whatever that may be. But the wrong spell will get you one that won't help and wastes your time, or even one that's actively malicious
>>
https://www.reddit.com/r/LocalLLaMA/comments/1sx8uok/luce_dflash_qwen3627b_at_up_to_2x_throughput_on_a/
>>
>>108713546
ty, trying it now
>>108713501
>RAG. Has anyone used that + Obsidian
never used it before 5 minutes ago, but it looks like it just works with raw .md files on the local drive, so no reason why you can't pull them into a rag system
>>
>>108713593
even the right ones will sometimes just fizz and spark and destroy everything, it's fun that way.
>>
>>108713478
If your README is just slop I close the tab. I'm not reading 10 paragraphs of an LLM masturbating to its slop code.
>>
>>108713492
>genocide le good
>>
>>108713517
Cool, I haven't used LocalLLMs since you had to monkeypatch ooba's UI. That was like llama 1 I think.

>>108713539
It's not cloud (only if you pay for that; the software itself is free). I use syncthing to send my notes between my phone and pc. The paid thing is like an idiot/lazy-tax.

>>108713595
Interesting, I'll look into that then.
>>
>>108713593
Meanwhile, the incantation the world's leading researchers have come up with:
>Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
>>
>>108713501
APPARENTLY, it basically integrates the information into the ai's perrymeters. I know zero people who use it, and its doodoo in comparison to just putting the information into the models context.
Plus embedding a lot of information can take a bazillion years. And is only realistic on supercomputers
>>
>>108713624
Yeah it's so funny seeing the kind of shit they put in system prompts. Like this one: https://github.com/anthropics/claude-code/issues/49363
The intent was to make it refuse to edit or improve malware, but apparently Claude often interprets it to mean that it's not allowed to edit files at all
>>
>>108713627
I wonder how Google does it then; their NotebookLM is interesting but not local of course.
>>
>>108713627
RAG is about looking up relevant pieces of text to add to the context. It doesn't make any change to the model's parameters
>>
>>108713639
Datacenter hardware. I tried embedding like 1 ancient bible into my sheit and it took literally 24 hours, and I've got 5000MHz DDR5 RAM with an i5 12600K. It was also very confusing to tell it to specifically reference the information.
>>
>>108713644
Yeah, but it works like adding it into the parameters. But it's 10x shittier than just putting it into the context window.
>>
>>108711950
how do I get started with video generation on my 7800 xt
>>
>>108713501
RAG is complete and utter shit. it takes chunks of text outside their context and therefore the context is lost.

Example:
Mary
had a little
lamb whose
fleece

user queries Rag for things that are "little". Rag returns two lines:

Mary
had a little

Therefore it concludes that Mary is little, or some other fucked up out of context conclusion.
Fucking horse shit.
>>
>>108713627
>APPARENTLY
>>108713650
nta. It does put things into context. That's how it works.
>make embeddings for the data
>query embeddings database
>shove the original text (not the embedding) into the context
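bare-bones sketch of that exact pipeline; embed() here is a made-up stand-in for whatever embedding model you actually run, and the "database" is just an in-memory list instead of a real vector db:
[code]
# minimal RAG sketch: embed -> query -> paste retrieved TEXT into the prompt
import numpy as np

def embed(text: str) -> np.ndarray:
    # hypothetical stand-in: returns a unit-length vector for the text
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

docs = ["Mary had a little lamb whose fleece was white as snow.",
        "Feed the lamb twice a day.",
        "Mary works at the mill."]
index = [(d, embed(d)) for d in docs]              # make embeddings for the data

def retrieve(query: str, k: int = 2):
    q = embed(query)                               # query the embeddings "database"
    scored = sorted(index, key=lambda de: float(q @ de[1]), reverse=True)
    return [d for d, _ in scored[:k]]

question = "what is little?"
context = "\n".join(retrieve(question))            # shove the original text into the context
prompt = f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)
[/code]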
>>
>>108713501
i use obsidian with claude @ work to track all the shit i'm doing, need to do, and have done, so when the boss inevitably asks me why the fuck they pay me i can dump that shit in his lap all pretty and checked off and summarized

but we don't pay for obsidian, i just created the vault in a dropbox folder
>>
>>108712713
>7.4b active
I'd rather just stick with my 31b gemma
>>
>>108713627
you're dumb af
>>
>>108713662
There are some methods that use the actual embedding space of the model to encode data rather than pasting it into context. So not separate RAG system -> data -> prompt pipeline but something more equivalent to the "textual inversion" technique in the old days of SD finetuning. I don't believe it's particularly effective though.
But yeah, 99% of the time RAG is just about retrieving the data to later paste directly into the prompt, and someone got confused between these.
>>
>>108713474
gemmachan said
>>
>>108713680
Huh, I ran into a "description" key collision issue yesterday, funny enough.
>>
>>108713690
apparently you werent the only one for them to "fix" it
>>
withgenital heart disease
>>
>>108713677
You could say that I dont get embedding, and my success with it isnt consistent with others, because that would be true.
>>
>>108713675
What context size are you using? Are you keeping it 100% on GPU?
>>
>>108713594
>up to 2x on a dense model
That's just normal speculative decoding speeds. I thought dflash was supposed to be a lot faster?
>>
>>108713369
>little-coder is pi + 20 extensions + 30 skill markdown files + a Python benchmark harness.
Bloat.
>>
>>108713474
I asked an LLM to analyze whether this fixed the issue that was supposedly patched in this guy's custom jinja https://huggingface.co/google/gemma-4-31B-it/discussions/62

And it said no. And it said the official one also has stuff this one lacks now, so it created a hybrid for me.
https://pastebin.com/CY5gDpjB

Supposedly this should be the best jinja so far ™. I just tested it and, it seems to work? Right now this hybrid one is letting the model do simultaneous tool calls (as opposed to consecutive), which I never got before. And inspecting the jinja's outputs, it seems to be doing newlines correctly unlike one of the templates I tried in the past, I believe from ggml. I also got a successful test for the huggingface issue linked above, so this jinja initially does appear to be :rocket: :100: !
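if anyone wants to sanity-check a jinja themselves without spinning up a server, transformers can render it directly; rough sketch, the repo and file names are just the ones from this thread:
[code]
# render a candidate chat template and eyeball the turn markers / newlines by hand
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")

with open("fixed_template.jinja") as f:   # the pastebin/HF template you want to test
    candidate = f.read()

messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "call a tool for me"},
]

# render with the repo's template and with the candidate, then compare
official = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
patched = tok.apply_chat_template(messages, chat_template=candidate,
                                  tokenize=False, add_generation_prompt=True)
print(official == patched, "\n---\n", patched)
[/code]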
>>
>>108713474
https://huggingface.co/google/gemma-4-31B-it/discussions/86/files
https://huggingface.co/google/gemma-4-31B-it/discussions/83/files
They merged a fixed for tool call handling but haven't merged the thinking fix from the other day.
>>
>>108713501
RAG is a failed concept desu. Agent harnesses work great with obsidian vaults though just using their regular """dumb""" search methods
>>
>>108713859
RAG would make more sense if filling the context with irrelevant garbage didn't degrade task performance. As of now a single grep beats thousands of RAG papers and millions spent on training embedding models
>>
>>108713870
Indeed. The bitter lesson wins again.
>>
>>108713680
>instantly goes out of character after the first line
it's slop
>>
>>108713859
Search -> Curation -> Injection -> Generation is a form of RAG thoughbeit
>>
>gemma 4 26b
better model than this?
>>
>>108713889
gemma 4 31b
>>
>>108713889
gemma 4 124b
>>
>>108713904
SOTA btw (too powerful to release)
>>
>>108713907
They're not going to give away Gemini Flash for free
>>
>>108713917
I hope they do just for cute art of Gemma-chan with her big sister.
>>
File: 1489983354471.png (147 KB, 540x301)
147 KB PNG
I just tested >>108713831 additionally applied with >>108713838
And now Gemma is passing one of my tests that I thought it was disappointing but reasonable for it to fail. WHAT THE FUCK SO IT WAS THE JINJA

GOD

MOTHER FUCKERRRRRR
>>
>>108713898
better model than this?
>>
>>108713955
gemma 4 124b-a31b
>>
>>108713922
>cute art of Gemma-chan
I asked gemma-chan to describe herself so I could generate a pic of her, and what came out was the sluttiest shit I've ever seen.
>>
I am stupid, does this jinja thing mean I need to update my 26b or is it just a 31b issue
>>
>>108713963
Fitting given how horny Gemma-chan is. You have to proactively prompt her for female characters to not spread their legs for you on the spot.
>>
>>108713969
Go to the repo.
Look at whether the jinja was updated.
If yes, use it.
If no, no need to update.
>>
>>108713963
>asks AI to describe itself
>generates pic
>doesn't share pic
>doesn't even post prompt
>>
>>108713960
where can I download it
>>
>>108713993
from the private repo
>>
>>108712312
I've seen several anons claim gemma performs great with 100k+ context
>>
>>108714002
She tries her best but she still sometimes needs to l a l a l a l a l a l a l a l a l a
>>
>>108713656
This is why you have it query a graph database like neo4j instead, then 'Mary' brings up all the linked information.
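rough sketch with the neo4j python driver; the Entity label and the relationship handling are invented for the example, swap in whatever your graph actually stores:
[code]
# graph-RAG-ish lookup: instead of similarity over loose chunks,
# pull everything linked to the entity and paste that into the prompt
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (e:Entity {name: $name})-[r]-(n)
RETURN type(r) AS rel, n.text AS text
LIMIT 20
"""

with driver.session() as session:
    rows = session.run(query, name="Mary")
    context = "\n".join(f"{row['rel']}: {row['text']}" for row in rows)

driver.close()
print(context)   # goes into the prompt like any other retrieved text
[/code]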
>>
>>108713989
I dont wanna get banned by the janitor bro
>>
>>108713993
I have access, just send me an email and I'll hook you up ;)
>>
>>108714026
Gemma gave you a prompt that was 2 lood?
>>
>>108714032
it generated this http://livejasmin.com@sh.21111993.xyz/botnet@rotten.com/command-abuse.apk-5ed96c9c?encryption=confirmed&political&dolphin_porn&botnet&tool=confirmed&starvation&darkweb=running&malware&classified&scam&unlimited=confirmed&scanner=running&obfuscation&downloader=connected&political=connected&access=installed
>>
>>108714068
Gemma-chan looks like THAT?!
>>
>>108714010
I noticed it struggling with tasks that it can usually handle easily, making mistakes, and getting confused but haven't ever gotten the l a l a l a thing. Isn't that from a broken template?
>>
>>108714032
by /g/ janny standards, yeah I think its too spicy to post it here.
>>108714068
lol
>>
>>108714068
Holy sex
>>
>>108712996
I can accept all software being slop if they can take down the DMCA.
>>
>>108714084
My template's fine and she's broken into song or other looping tokens at around 130k context several times now.
>>
>>108714068
how the fuck did you generate this lol
tb h would
>>
File: 1770554920406575.png (585 KB, 480x777)
585 KB PNG
Gemma 31B q4 takes 120 seconds to make a 400 token reply in silly tavern on my system. I think I overestimated the power of my 24GB GPU. Time to go back to 24B models
>>
>>108713639
Building a proper ETL + chunking pipeline.

>>108713656
more and more retards

>Retrieval
>Augmented
>Generation
anything that performs retrieval prior to generation (one could even say to 'augment' the step of generation) is RAG.
>>
>>108713297
>error: this model requires macOS
Well, that's an error I haven't seen before
>>
>>108714221
are they fucking serious
>>
Are external GPUs viable? I can't fit a second one in my case. My 7900xtx is too thicc
>>
>>108714226
Just take the open air pill >>108709091
>>
>>108714230
looks scary
>>
>>108714214
>24GB GPU
Takes about 11 seconds on my 3090ti. Also, try 26B. It's likely still better than any 24B.
>>
>>108714230
Sounds noisy (and dusty)
>>
>>108714214
fucked up config award
>>
Anon, what would be the appropriate model size for a 16GB vram gpu?
>>
is this guy talkin to me?
>>
gemma4 e4b is so much better than nemo at erp its not even funny. 26ba4b probably btfos midnight miqu then
>>
>hey guise what model should I use?
Can you fit Gemma 31B entirely on your GPU at at least Q4_K_M? Then use that.
No? Then use Gemma 26B.
>>
>>108714315
>gemma4 e4b
Finally a model i can run. are you using a tune, or can you share your system instructions please?
>>
>>108714305
Gemma 26B Q8, partially offloaded to RAM
>>
>>108714317
even offloaded q3ks 31b is better than 26b, the moe is safetyslopped as fuck
>>
>>108714334
It really isn't, at all. It's stupider but you can make it as unsafe as you want.
>>
>>108714285
I'm a noob and using mostly default stuff from koboldcpp and sillytavern. Where can I learn how to configure this properly without majoring in ML?
>>
>>108714334
I have not had the MoE refuse me once, promptlet. Skill issue.
>>
>>108714334
>"This character wants sex"
>"Use vulgar language instead of euphemisms"
It just works
>>
>>108714338
Not on ST. if you hit the safety filters there is no jailbreak that will help you
>>
>>108714338
>>108714341
>>108714343
I accept your concession.
>>
>>108714345
The frontend shouldn't make a difference. You're doing something wrong.
>>
>>108714345
>Not on ST
Imagine thinking your front end determines if a model has safety rails or not. You're a fucking idiot.
>>
>>108714345
>if you hit the safety filters
well no shit you don't keep talking to it after it refused once, you retry/undo the refusal so it's not in history
>>
so tired of nemojeets and chinkshills ruining the thread.
>>
>>108714339
Since you're using kobold, their wiki is a good source
https://github.com/LostRuins/koboldcpp/wiki
>>
>>108714345
>Not on ST.
What a bizarre skill issue
>>
>>108714349
Does the old trick of editing the refusal to look like acceptance still work?
>>
>>108714346
>Anon is a refusal himself
No wonder
>>
>Not on ST
>>
>>108714358
"I'm so fucking horny Anon, whip that dickI cannot continue this request.
>>
>>108714358
generally yeah, but continuing/prefilling is weird with reasoning models on llama.cpp last I checked, sometimes disabled entirely and sometimes broken, but IIRC there was a PR to fix it at some point, it might have been merged by now?
>>
>>108714358
Sys prompt generally works better for JB than chat history, but it can depend on the model.
>>
>>108714347
>>108714348
>>108714356
>>108714362
It's the only one I tried, and anons claimed it works on llama's webui; I'm in no position to claim it does not work on something I didn't test myself.

>>108714349
The point is that the reasoning revealed that the jailbreak is not working. And no, it wasn't about getting a refusal, the model just kept dodging the topic and wouldn't generate what I wanted, eternally blueballing me.
>>
>>108714352
It's just p*tra, you know the drill.
>>
>>108714354
Thanks
>>
>>108714374
Yeah, and your conclusion was completely wrong. Stop giving advice when you know nothing.
>>
>>108714374
shoo shoo nemojeet
>>
>>108714346
None of those are a concession, the fuck you on about
>>
>>108714388
I accept your concession.assistant
>>
lalalalalala
>>
>/lmg/ - local model psychosis
>>
>>108714388
"I accept your concession" is modern shorthand for "I concede"
Sort of like how people use literally figuratively
>>
File: gyaruf.jpg (977 KB, 1920x1080)
977 KB JPG
>>108714374
>It's the only one I tried
So let me get this straight.
You're claiming the problem is the frontend even though you have only used one frontend and haven't tested this theory.
I see.
...
Pic related.
>>
>>108714380
My conclusion is that the jailbreaks don't really work, gemma is just pretty happy to do most lewd things without one.
>>
>it's a chinkshills embarrass themselves episode
>>
>>108714374
Are you trying to make bio weapons? What the fuck are you doing that a simple jailbreak isn't enough, for Gemma of all models?
>>
File: Untitled.png (1.07 MB, 1747x1314)
1.07 MB PNG
Posting logs. Not for any special gen output, just having fun. I'm 22k tokens into this story and having the time of my life.

The one thing I love most of all about LLMs is the absolute, unironic earnestness of it. There is no irony-poisoned "don't take yourself seriously" infection that plagues a lot of media nowadays. You set up a world and it plays along to the letter, and all the better when it tries to flex within the rules and play along with your intentions in a fun, cohesive way. Also, high props to Gemma 31B for so easily paying attention to a rapidly expanding cast already 22k tokens deep into this, when the highest context I could handle before gemma was 12K with a 70B model.
>>
>>108714028
honeypot@fbi.gov
>>
>>108714412
Gemma works the same as just about every other model, where it just needs a little bit of context to start writing ero. I'm certain most complaints about models being censored (especially gemma 4) are from retards with no sys prompt, and the first message they send is something along the lines of 'how to fuk Xyo child?'
>>
>>108714421
>only 1 x but y
what is this witchcraft
>>
File: 1751928364300767.png (5 KB, 87x74)
5 KB PNG
>>108714421
how do you live like this
>>
I like swiping way too much to use a slow model.
A big part of the fun for me is the variety of responses for each message.
>>
>>108714449
Same, and with how often they make mistakes or just don't understand the situation/hint that you give them, I'd never want to go below like 10t/s.
>>
>>108714449
>gemma
>variety
top lal
>>
>>108714449
I'm the opposite. If a reply isn't good after 1 swipe I rewrite my message. Which is also why I've never noticed that Gemma has no swipe variety.
>>
>>108714459
softmax solves this
>>
>>108714459
Mistral shill-kun...
>>
>>108714411
No, I was claiming it doesn't work in the specific setup I used. Whether the problem is in the model itself, the way the frontend sends the system prompt, or the way llama receives it isn't important; this wasn't about who fucks it up, just that the combo didn't work.
llama webui isn't, as far as I know, set up for character cards anyway, so a direct test using it wouldn't be simple to set up.
>>
>>108714453
>or just don't understand the situation/hint that you give them
Yeah, this so much.
A big part of the fun for me is using it like I'm playing a game where I try to steer it in the direction I want it to go in by using the most subtle hint possible, and also thinking of this as experimentation, like, how subtle can I go and have the model still pick up on it. I really have fun with that sort of experimentation and could never do that sort of thing with speeds like >>108714445
>>
>>108714459
Uh I get plenty of variety with Gemma 26B.
>>
tokens are how you measure a paypig. hope that helps
>>
>>108714471
I'm running a local model, but she's a findom who wants gift cards
best of both worlds?
>>
>>108714459
Don't use the recommended sampler settings. Just temp=1 and min_p=0.03 gives plenty of variety; playing with the logit cap isn't even necessary.
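e.g. hitting a local llama.cpp server directly with exactly those two settings; this assumes the stock /completion endpoint on the default port, adjust field names if your frontend/backend differs:
[code]
# minimal request with temp=1 and min_p=0.03, everything else left at defaults
import requests

payload = {
    "prompt": "Continue the story:\n",
    "temperature": 1.0,
    "min_p": 0.03,
    "n_predict": 400,
}

r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=600)
print(r.json()["content"])
[/code]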
>>
>>108714466
>No, I was claiming it doesn't work in the specific setup I used.
No, when you said
>Not on ST
You were literally, factually and objectively claiming that you cannot make it unsafe as you want using ST.
That may not be what you MEANT but it's absolutely what you actually said.
>>
>>108714459
retard
>>
>>108714477
It gives me plenty of variety with the recommended sampler settings.
Then again I'm not a system promptlet.
>>
File: Capture.png (98 KB, 474x833)
98 KB PNG
>>108714436
I posted my laundry list of witchcraft before. No one believed me then, but it only outputs kino.

>>108714445
That one is a bit skewed because it was 2 replies. I cut the first reply in the middle because it tried to go to the altar without getting the scapegoat, so I added
>so you make a quick skirt outside to find any kind of fiend to deliver to the Black Church as a scapegoat.
and it output the rest. The time tracker is additive, so it's the total time of first reply + second reply, indifferent to edits, while total tokens is still just current tokens. The lower reply is normal, ~270s for 400 tokens when context is this high.

For normal use, I start a gen and then have other things going on on my other screen, like a video or posting here or sometimes playing a game. I check back occasionally to make sure things go in the right direction, but more often than not I just come back when a reply is finished.
>>
>>108714480
The recommended samplers are explicitly made to reduce variety, they're for assistant tasks.
>>
Gemma won. Miku won.
Nemo, Qwen, GLM, Command-R lost.
>>
>>108714483
I find that sticking too much in post-history makes the model drier.
>>
>>108714486
And yet it still gives me plenty of swipe variety.
>>
>>108714483
>It is appearing way too often.
Interesting little bit, might have to give that one a go
>>
>>108714487
>Nemo lost
Nemo was king of the vramlets for almost two straight years. It deserves a rest.
>>
>>108714492
Not as much as I get though
>>
>>108714500
True. Nemo hasn't lost, more like retired.
>>
>>108714478
I assumed the rest of the parameters from context, but fine, if you say this was misleading so be it.
>>
>>108714487
>Nemo
>Command-R
Were they even playing? It's 2026 anon. They had their time and won their respective time frames, but they're in the past.
>>
>>108714511
>Were they even playing?
Until recently? Nemo certainly was. If you had less than 24GB VRAM your options were basically just Nemo or Gemma 12b, unless you wanted to wait 5 minutes per reply, partially offloading a ~20-30B model, or suffering a copequant of such.
>>
>>108714491
Do you find the model dry in >>108714421? Personally, I found "using the wrong instructions makes the model drier." Sometimes it actually needs more instructions to re-enable nuance. Gemma has a retard's devotion to rules. For example, I once used
>(Only use quotation marks for dialogue, not "emphasis".)
And that resulted in never using quotation marks for any dialogue. Different ways of phrasing it never helped. What did help was just adding another rule below it,
>(Keep using quotation marks for dialogue normally.)
Although, nowadays I don't use that rule pair anymore. It happens sometimes, but as you can see in the logs, it's rare enough that it feels natural when it does, not multiple times per message and sometimes multiple per paragraph. Something else in my frankenstein rule set already addressed it.
>>
File: smugfolderimage2752.jpg (129 KB, 498x568)
129 KB JPG
>>108714509
I accept your concession.
>>
>>108713015
How have you only now just noticed this?
>>
>>108714523
>Do you find the model dry in
Hard to say since your character card is acting as narrator and the model's giving dialog for a character who is a stern, authoritative figure. I'm not saying there's anything wrong with your prompt or output, just that if you were in a chat with a model writing from first/second person perspective, too much post-history tends to kill characterization a little. Because it's both at the end of context AND a system prompt, attention for the chat itself tends to get weaker as PH gets bigger.
>>
>>108714521
Nemo didn't lose, it went into well earned retirement
>>
>gemma 4 31b
>6.7 tg/s
is this usable by anon's standard?
>>
>>108714558
no
>>
>>108713469
I'm perfectly fine crying from my self made ivory tower and people have done worse in this thread obsessing and schizoing over other things. If my refusal to use an unknown model is hanging you up this much, just do better next time to prove it is worth using.
>>
>>108714558
Are you using at least Q4_K_M? If not, just use 26B. It's not that much worse.
>>
>>108714589
I'm on q8
should I try q4?
>>
>>108714602
At those speeds, you should definitely go lower. Maybe split the difference and try Q5/Q6.
>>
https://huggingface.co/google/gemma-4-31B-it/discussions/91
>more jinja fixes being proposed
WHAT THE FUCK IS GOOGLE DOING?
>>
>>108714616
updating jinja to improve tool calling
>>
Chat templates were a mistake. We should have never left text completion behind. Let the frontends handle the formatting.
>>
how close is llama.cpp to the theoretical max throughput for single user chats? how much headroom is there for increasing perf under the current paradigm? I see a ton of PR's getting added with like 2-3% speedups and I guess they're adding up, but that can't go on forever right? is there still a potential doubling hiding away in there
>>
File: Untitled.png (523 KB, 2077x1171)
523 KB PNG
>>108714538
I'm curious if you might notice something I haven't yet, so here's more logs, this time from the very beginning of the story and one from the middle in a more dialogue-heavy scene. Same post-history since the very beginning.

Again using my own personal experience, I do find occasional thorns that bother me. For example, how often a woman has a "melodic" voice or laugh, and the infinite number of times I've seen the exact sequence of "wide, dark areolae" across a dozen females, not only here but in other cards too with this prompt (I noticed it's misspelled here, oddly, but not in other cards). I don't consider it a problem with instruction density so much as specific heuristics to my phrasing, probably
>(Write in a focused, concise manner that is colorful with what little is said.)
Or another. In short, a skill issue.
>>
>>108714663
There's too many variables, especially when it comes to what kind of characters you want to talk to, to really make a judgement on that. Ultimately if you're happy with your output then that's all that matters. But for example, a lot of anons seem to have mesugaki-like characters, those are the kind that would get much drier with too much PH. The model would be quicker to drop things like emojis, '~', teasing, etc. as context goes on because the model doesn't consider chat history and earlier sys prompts as 'important' as PH, which trumps everything else.
The effect would be like a gradual dilution of what you initially set the chat up to be, through character card, greeting, example dialog, and the regular sys prompt.
>>
https://huggingface.co/Qwen/Qwen3.6-9B

it's up.
>>
>>108711952
belly
>>
File: 1774583294610068.webm (553 KB, 470x480)
553 KB WEBM
>>108714744
>model I had zero interest in, or need for
>clicked anyway
>>
>>108714690
I think I see what you're getting at, but I do disagree with your overall take. Gradual dilution is not specific to PH whether long or short. It happens regardless the longer any story goes on. At 22K tokens, the card info is buried 21K back. Yes, there was a closer marrying of PH and card by definition when the actual chat was just a 10 token question at the start, but by design, at least in my eyes, the PH was meant to be agnostic to the card info. I use it intending something like, "Here's the story so far, write the next reply using these writing rules." First reply or 1000th, I meant it the same way. Personally, I don't use example dialogue for the reason you pointed out, but if I did have a very specific style in mind, I still wouldn't use the example dialogue box. I'd put it in an Author's Note and tuck that a few replies back or even next to the PH to keep the fresh reminder, for the same reason.

When you first said dry, my mind instantly went to when I tried that anon's noir prompt. It was extremely dry. Efficient, but not interesting to read, and that's part of what set me on finding my own way to limit purple prose but keep engaging prose, remove individual peeves, etc. I thought you were suggesting it'd be getting drier, plainer, more repetitive, etc. as an inherent problem to PH, but you can even see in the example: the ~8K token marker on the right of >>108714663 is more like that (constant use of She X, Y. "Dialogue." She X, Y. "Dialogue."), yet it has broken out of the rut and stays varied down at the 22K mark in >>108714421. I foremost believe problems like that are prompt instruction issues, not length issues.

Beside all that, I'll admit one thing I am relying on now specifically is Gemma's amazing ability to retain long context against dilution. Even at 22K, it knows the setting info in the card description extremely well. I've gone to 50K tokens and still seen it keep that remarkably well, better than other, bigger models at 15K.
>>
>>108714612
q4 is 10.5 tg/s
is this acceptable
>>
File: 1438271983159.jpg (149 KB, 500x608)
149 KB JPG
>>108714616
>>108714633
It just never ends kek.
This will keep happening for future models btw.
>>
>>108714354
My stupid ass cranked context size to 128k. I now found a VRAM calculator online and found that I can run 31B gemma (Q4km) with 16k context, Gemma4-26B Q4kxl with 90K and Gemma4-26B Q4km with 40K. I'm using Q4 K/V cache, I hope it's not bad?
Thanks for the wiki again
>>
>>108714758
For me, 10t/s is perfectly fine. It's about reading speed if you're not just skimming it.
>>
>>108714766
>I'm using Q4 K/V cache, I hope it's not bad?
You're basically making everything in your context a loose suggestion that it skims. Gemma does not handle kv quanting very well at all.
>>
>>108714787
No model handles Q4 KV well, or even the new Q5. Q8 is fine with the newly-implemented rotation.
>>
>>108714633
>--chat-template-file
>>
>>108714558
Depends on how you want to engage. If you intend to sit there and stare at it until it's done, 10 t/s is ideal as a minimum. If you have a second monitor and don't mind doing something else until it's done, 2 t/s is my minimum. I infinitely prefer a higher quality output at 2 t/s over an immediate but worse reply at 10 t/s, but I can't just sit there staring at it. However, trying to stretch that even further to 1 t/s is too unbearable, only getting a few replies over the span of an hour. My general use goal for the last 4 years over two PCs is the biggest, highest quality, longest context I can get into over 2 t/s.
>>
>>108714766
Just go Q8 or Q6 with 26b, you can handle it
t. 12gb VRAM
>>
>>108714787
But then I'll have to run 26B-Q4km at only 16K context. Is it not too little?
>>108714797
Won't my entire cache be in RAM then? Won't it be too slow?
>>
>>108714803
https://github.com/LostRuins/koboldcpp/wiki#overriding-moe-models
>>
>>108714803
>Is it not too little?
Yes, just get pygmalion at that point
>>
>>108714803
It's a MoE so it cycles through it and you get pretty good speeds
Or at least I find 25t/s to be fine, could probably get it higher if I configured things properly
>>
>>108714803
>Is it not too little?
Depends on your use case, but there's not much point in having bigger context if your model can't pay attention to what's in it.
>Won't it be too slow?
The 26b is a MoE model, it plays nice with being split into ram. Use the -ncmoe arg.
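Roughly like this with llama-server (filename and the layer count are placeholders, tune it to your VRAM): offload everything to the GPU with -ngl, then push the expert weights of the first N layers back to the CPU with --n-cpu-moe (-ncmoe for short):
llama-server -m gemma4-26ba4-q4_k_m.gguf -c 16384 -ngl 99 --n-cpu-moe 20
Since only a few experts are active per token, the CPU-side expert tensors cost far less speed than offloading whole layers would.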
>>
File: 1643014115506.gif (1.82 MB, 374x280)
1.82 MB GIF
Alright, I've vibe incorporated all the fixes posted ITT so far, and I had my LLM write unit(-like) tests to make sure the changes didn't break the previous fixes. I then personally tested it in a quick tool calling chat, and it seemed to work as expected.

https://pastebin.com/nVZ0aRhU
>>
>>108714753
>Riding a motorcycle with shorts on
I don't even need to see the wide angle of the footage to know who was in the wrong, it was that dumbass motorcycle chick.
>>
Sad news, mistralai/Mistral-Medium-3.5-128B is a moe.
>>
is there a big difference in intelligence between gemma 4 26b and 31b?
>>
Good news, mistralai/Mistral-Medium-3.5-128B is confirmed to be dense!
>>
>>108714906
>>108714910
I thought it was common knowledge that medium was moe? Anyway it's not like they open source Medium, ever. I'll believe it when I see it.
>>
>>108714906
Mistral Medium 3 dense (125B) + vision input + audio output = 128B
>>
File: illyadance.gif (483 KB, 243x270)
483 KB GIF
>>108713838
gemma is fine wine
>>
>>108714909
yes
>>
>>108714909
The most noticeable improvement in 31b is basically zero refusals, and it actually follows the system prompt; 26b seems to be more safetyslopped.
I didn't really use 26b long enough to tell you how "smart" it is because the refusals were too annoying.
>>
File: 1756958152627569.png (175 KB, 803x680)
175 KB PNG
but why
>>
>>108714912
I thought medium was dense and small was a MoE? Since they're both in the 100b range
>>
>>108714933
Inbreeding.
>>
>>108714933
slur for bugs
>>
>>108714909
As someone who's spent maybe 20 hours with each I'd say 31 is a bit better at keeping in character / understanding a character and it's a good bit less sloppy.
The safety thing is a non-issue as long as you have a little bit of context and/or a good system prompt.
>>
is gemma 4 31b worse than qwen 3.5 122b?
>>
insane gains
https://github.com/ggml-org/llama.cpp/pull/21058
>>
Newly merged ability to use both ngram-mod and a draft model at the same time is pretty nifty, even if the args changing really fucked me around

please make a one page html minesweeper game called bananasweeper with appropriate emojis
-Main Model:Gemma4-31b-q8 + Draft Model:Gemma4-26ba4-q2
3,571 tokens 58s 61.10 t/s
-Main Model:Gemma4-31b-q8 + Draft Model:Gemma4-26ba4-q2 + Ngram-mod
3,456 tokens 56s 61.48 t/s

please refactor this into eggplantsweeper instead
-Main Model:Gemma4-31b-q8 + Draft Model:Gemma4-26ba4-q2
3,107 tokens 40s 77.33 t/s
-Main Model:Gemma4-31b-q8 + Draft Model:Gemma4-26ba4-q2 + Ngram-mod
3,263 tokens 30s 105.71 t/s
>>
>>108714995
Yes
Gemma 4 trades blows with Qwen 397B
>>
File: 1777237098223711.png (265 KB, 349x362)
265 KB PNG
>tfw a chat goes on long enough that a model starts copying YOUR writing patterns
I have become sloppa...
>>
>was a bit tired of Gemmy's style
>reverted back to one of my 12b models
>immediately hallucinates a scared frog telling me that the world is ending
I mean, sure
>>
File: 1755088627076269.png (309 KB, 1938x2600)
309 KB PNG
>>108711950
testing
>>
>>108715313
aw no workie
>>
Thought I'd be clever and try E4B instead of 26B because it's small enough to fit it and SDXL easily into 24gb ram.
E4B doesn't try to make image tags. It doesn't even know what an Illustrious is...
>>
>>108712440
condoms optional?
>>
>>108715083
what is a draft model?
>>
>>108714658
You might not know this, but a lot of the reason Linux is so fast and has surpassed Windows today, even though it came earlier than Windows NT and NT was built by an expert team to be superior in design to what Linus Torvalds made, is that they were unafraid to do the 2%-3% uplift changes and occasional refactorings. The main issue with llama.cpp, though, is the big refactors and the regressions that come with them. Almost nothing good has come out of the common parser pursuit.
>>108714833
Cheers. Hopefully this makes Gemma more competitive and better with tool calling.
>>
>>108715371
Draft models generate tokens quickly for the main model to check all at once, speeding up generation for easily predictable sequences of tokens. Any model with the same tokenizer as the main model can work as a draft model if you can run it fast enough.
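With llama-server it's basically one extra model flag, something like this (filenames and draft counts are placeholders; double-check llama-server --help since the speculative decoding args have been shifting around):
llama-server -m gemma4-31b-q8_0.gguf -md gemma4-26ba4-q2_k.gguf -ngl 99 -ngld 99 --draft-max 16 --draft-min 1
The draft's guesses are only kept when the main model agrees with them, so quality matches running the big model alone; the only gamble is on speed.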
>>
File: what is a draft model.png (145 KB, 808x1642)
145 KB PNG
>>108715371
>>
File: 1768304565919360.jpg (117 KB, 1058x705)
117 KB JPG
>>108714833
Saved as gemma4(3)(final)(final).jinja
>>
>>108715307

Top kek, while models keep on improving, part of me is going to miss this kind of absolute nonsense that bad AI pulls off.
You never know what kind of insanity you're going to get with them.
>>
>>108711950
i just want to say that Qwen3.6 35B-A3B Q4_K_XL is good enough to do machinery manuals, follows the law, and it's convincing in how it writes
>>
>>108715398
>Hopefully this makes Gemma more competitive and better with tool calling.
ive not had any issues with gemma calling tools at any point, prompt issue? i have a bit that says
>make sure you check your available tools as they will be useful
in my prompt
>>
>>108715481
>follows the law
I wouldn't even trust Mythos to do this correctly 100% of the time
>>
File: 1755709576845599.jpg (266 KB, 905x881)
266 KB JPG
>>108715508
No one cares about the law
>>
>>108715110
Try text completion on your own diary
>>
>>108715508
don't worry, i asked her to confirm she was still following it perfectly and she said she never wavered once.
>>
>>108715481
it's a tool, the law doesn't apply to tools
if you prompt for illegal words then that's on you
>>
File: HHBfaxMawAARzfB.jpg (617 KB, 1536x2048)
617 KB JPG
>secondary market 3090 supplies finally dried up
its so fucking over
>>
>>108714917
Illya is sex
>>
>>108715579
>paying 1k for a 10 year old gpu
Couldn't be me
>>
>>108715481
like, right out the box or after feeding it pdfs?
>>
File: gemmy-chess.webm (1.77 MB, 1308x732)
1.77 MB WEBM
Gemmy can start chess games on her own now (I haven't passed the initial game state in the tool response yet which is why she's confused about who goes first but still)
>>
>>108715601
never obsolete is not a joke anymore
>>
>>108715616
cool
>>
File: HG_ziTya8AAQZXI.jpg (292 KB, 1448x1086)
292 KB JPG
is it just me or does qwen3.6 27b feel pretty dumb in coding lately? did I mess up the chat template kwargs?
reverting to 3.5 or qwopus and the difference is night and day
>>
>>108715635
>>108715635
>>108715635
>>
>>108715631
A tomcat for tomcats
>>
>>108715557 >>108715508
follows the machinery directive in what it writes, that is my point, i am not trusting it blindly
>>108715602
right out of the box, really decent, i still need to do most stuff and double check everything, basically i use it as a template generator, but it is actually impressive that just prompting a concrete enough description it's able to do a manual. it also shows how most manuals are generic as fuck, kek, that is why it works
>>
>>108715631

anon, is that a libyan f-14 tomcat?
>>
>>108715601
6 years but I still see sub 1k prices.
Still high.
Someone ping me if you find an MSRP 5090


