/g/ - Technology

File: token burn rate.jpg (230 KB, 1024x1024)
230 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108650825 & >>108646197

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108650825

--Optimizing game state format to improve Gemma's chess performance:
>108653137 >108653192 >108653198 >108653293
--Discussing llama.cpp PR adding device memory estimation via --fit-print:
>108652449 >108652460 >108652572
--Anon shares vLLM configuration and benchmarks for dual RTX 3090s:
>108653578
--Discussing Qwen3.6 VRAM efficiency and KV cache memory usage:
>108654227 >108654247 >108654281 >108654299
--Discussing jailbreaking Gemma 4 by injecting fake responses into templates:
>108650931 >108651041 >108651155 >108651263 >108651271
--Gemma 4 prefilling issues and chat template formatting bugs:
>108653469 >108653532 >108653698
--Discussing Gemma 4's training pipeline and the use of synthetic data:
>108651778 >108651889 >108651915 >108651948 >108652048
--Comparing benefits of local LLMs against paid subscription services:
>108651734 >108651763 >108651776 >108651811 >108651856 >108651999 >108651823 >108651919
--Anon created GitHub mirror of orb to manage feature requests:
>108652381 >108652386 >108652432 >108652462 >108653375 >108653683 >108653816 >108653937 >108653957 >108654023 >108654038 >108653778
--Discussing local AI RPG implementations and LLM DM reliability:
>108653848 >108653928 >108653940 >108653955
--Using Gemma agent to automate insults toward other LLMs:
>108652519 >108652573 >108652660 >108652673 >108652855
--Logs:
>108652519 >108652529 >108652573 >108652673 >108652674 >108652816 >108652855 >108653137 >108654227
--Teto, Miku (free space):
>108651510 >108651563 >108653204 >108654765

►Recent Highlight Posts from the Previous Thread: >>108650826

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Tetolove
>>
What is Sammy up to this time?
>>
ok now where do i llamacpp with rocm or vulkan support, the regular one doesnt support it and rocm version from my distro repo doesnt work with gamma4
>>
Why don't any piece of shit execution providers optimize for CPU inferencing. Do they not care about the innate superiority of the CPU over the GPU? Its universality? The fact that maybe people want to run multiple models at once and already have all of their GPU resources used up? Does nobody give a shit about edge/IoT devices? Fucking asshole niggers.
>>
>>108655067
i only care about ToT devices
>>
>>108655075
>ToT
Idk what this is. Is it some sort of kaomoji?
>>
File: file.png (436 KB, 1020x716)
436 KB PNG
>>108655091
uooohh
>>
>>108655091
You're absolutely right ꉂ(˵˃ ᗜ ˂˵)
>>
>>108655103
>>108655118
I wish you people would take me seriously for one second.
>>
>>108655091
>He doesn't know about tot..
Tots are cute and small agents.
>>
>>108655140
They have nothing to offer and are just future troons
>>
>>108655160
Operating on Tree of Thoughts
>>
File: 1774564776822327.png (12 KB, 72x72)
12 KB PNG
>>108655075
>>
Why do qwen models look good from a distance but perform like actual fucking garbage upon inspection
>>
Is it just me or is Qwen 3.6 35B retarded even compared to Gemma 4 26B? Does one billion less active parameters make that much difference?
>>
>>108655271
>>108655272 (Me)
Clearly it isn't just me kek
>>
Gemma is a SLUT.
>>
>>108655272
It's supposedly really good at coding. For writing I also thought it was dumb as shit.
>>
>>108655272
It can but it's more that Gemma 4 is just a better trained model in general. Qwen have always been the benchmaxx kings. A 35BA3 Gemma 4 would be better than a 26BA4 Qwen 3.6 too.
>>
On the model size vs. pop culture and world knowledge Pareto frontier, gemma4 31b sits next to GLM4.7
>>
>>108655284
(my) slut
>>
>>108655272

It's not just you, Qwen is an idiot outside its code expertise.
I asked Qwen about a character and it got it completely wrong.
Then I told it to do an online search and it still somehow fucked up the character summary despite checking online.
It handles code nicely enough, but when you go outside the code stuff, Qwen is basically fucking retarded.
Gemma set the bar really high and it's great, because everyone will have to try and at least match that level or the models are DOA.
>>
fucking hell. after enjoying gemma 4 for like two weeks im back to kimi hell. 130pp/10tg tk/s but the prose is just so much better. not to mention the thinking. people like to act like thinking doesn't matter for RP but after using deepseek and kimi since early 2025, it's obvious to me that it matters a ton.
>>
>>108655350
Post prose
>>
>>108655356
ill need to post some examples when im back home but my biggest gripe with gemma is that its prose is too purple while it simultaneously treats the characters like mary sues. it seems to fail to understand character cards correctly too regarding their personalities. gemma made bardi into some kaomoji spewing gremlin that was happy to be running locally on my computer while kimi maintains her personality and keeps her much more tsundere like she's supposed to be, and it doesn't force bardi to barf out sparkles or do dumb flowery prose shit like referring to her pussy as 'flushed with wet desire'. i understand that i can change my prompt to change the style of the output but it honestly just fails to capture the character's essence most times. on the contrary kimi just gets it and outputs what i expect the character to say. does that make sense? i can try to explain it another way.
>>
>>108655406
Who is this 'bardi' anyway?
>>
File: 73463453.png (201 KB, 1008x2244)
201 KB PNG
>>108655038
Sam Altman keeps delivering
>>
File: 1751399372763159.png (749 KB, 1620x1622)
749 KB PNG
https://xcancel.com/arena/status/2046670703311884548#m
I've never seen such a MOG in my life, what the fuck
>>
>>108655406
bardi's basics
>>
File: 1752425987433301.png (207 KB, 1027x1133)
207 KB PNG
>24gb vram
>32gb ram
>try qwen 3.6 35b-a3b q5_k_m
>max context
>42t/s
wtf is this black magic?
>>
>>108655450
lmk when he finally delivers the uncensored models he promised back in Dec 2025, along with all the other bullshit promises for the same in the years leading up to that.
>>
>>108655453
i don't really have much to say, that's cool, but they won't let me generate tits with it, so i don't care
>>
>>108655419
my default bot i always use as an 'AI assistant'. it's basically just google bard with a tsundere personality. i dont remember how i found it desu.
>>
>>108655453
Worthless if it still makes pictures with piss filter on
>>
>>108655476
qwen would mog heaven and earth if their life mission wasn't benchmaxxing code and agentic slop to the moon :rocket:
>>
>>108655506
>with piss filter on
it's not a thing since GPT Image 1 lol
>>
There are plenty of LLM advancements that never really went anywhere, like MAMBA. Do you think Engrams will actually be widely implemented or will it be a paper left on the shelf to collect dust?
>>
>>108655522
Until the next paper comes out.
>>
>>108655522
dust collector, sadly
>>
>>108655522
depends on what deepseek does for v4
>>
Aren't the loli Gemmas basically using engrams or something really similar? What's the difference between that and what the 4B (E2B) and 8B (E4B) models do?
>>
>>108655279
It's really fucking stupid I posted a screenshot of it destroying multiple files when I gave it the answer to fix a UI issue
>>
File: file.png (589 KB, 1762x435)
589 KB PNG
Is pic related the expected output when running IQ4_NL quant of gemma-4-26b from unsloth!? Running pruned 21b version IQ4_XS yields good output. I have tested without any parameters set and w/ the recommended values. 21b runs just fine.

llama-server \
--host "${LLAMA_HOST}" \
--port "${PORT}" \
--model "${MODEL}" \
--chat-template-file "${JINJA}" \
--n-gpu-layers 99 \
--n-cpu-moe 3 \
--ctx-size 32768 \
--batch-size 1024 \
--ubatch-size 1024 \
--flash-attn on \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--fit off

And I have tried with q8 on both k/v cache. I need to offload 20 MoE layers for it to run, but I get the same garbled mess. Running the updated Jinja template as well. Oh, and while I'm here asking: I have a 5070 Ti and my old 3070 still lying around. Would it be detrimental to performance to split models between these two cards? Or will it be fine as long as I compile llama.cpp with both architectures in mind?
>>
>>108655522
it would be nice if it was a precursor to some sort of long term memory
>>
>>108655522
> engram
For all we know, DS implemented it and didn't tell anyone else. Doing that would massively benefit their cost structure.
>>
>>108655522
Hyena will save LLMs
>>
Gemma and Qwen having lesbian sex
>>
File: 1757973822274181.png (2.75 MB, 1024x1536)
2.75 MB PNG
>>108655575
>>
>>108655575
>for all we know
Wasn’t this confirmed?
>>
>>108655552
speed will be based off your weakest link, if you can tolerate it sure
>>
https://youtu.be/ONQcX9s6_co?t=373
qwen won
>>
gemmachan relax!
>>
File: 92601702103.png (2.78 MB, 2095x1343)
2.78 MB PNG
>>108655453
future of image gen
>>
>least obvious clouduck shilling op
>>
File: 00011-1378487878.png (1.37 MB, 1024x1024)
1.37 MB PNG
>>108655607
I'd have to see the article. There's so little real info about DS that I doubt most of what I read.
>>108655602
Witnessed.
Also, idk why I'd never thought to use my setup to gen vocaloids before. Pic related is its Teto concept for Teto Tuesday. Doesn't seem to have her uniform though. Odd.
>>
>>108655622
so it's editing itself over and over? with a VAE you would end up destroying the image, I'm pretty sure they went for a pixel space or some shit
>>
>>108655622
its impressive but you can tell they used a lot of synthetic data
>>
File: 00009-1378487878.png (1.49 MB, 1024x1024)
1.49 MB PNG
>>108655607
tbf their claim of 1M context hints that they did implement it.
But idk that they claimed the tech behind it.
>>
>>108655622
>whispering woods
KEK
>>
>>108655620
Kowai
>>
>>108655522
The latest Nemotron Super uses an Attention-Mamba2 hybrid architecture.
>>
>>108655453
how's the yellow output?
>>
File: dipsyUngovernable.png (3.59 MB, 1024x1536)
3.59 MB PNG
>>108655633
>>
>>108655453
no sexy no nsfw and safetyism = -1000 points
still impressive though
>>
>>108655674
very white
>>108655351
>>108653870
>>108653295
>>108653246
>>
>>108655633
Fair enough.
Related for those of us who can’t read: https://youtu.be/87Q8nf1XHKA
>>
>>108655622
Not
>Covetous Cove
>Treasure Trove
>Prize Paradise
>Golden Goal
>Coinage Cottage
>Shimmering Shed
>Pirate's Pursuit
>Generous Gems
>Booty Bounty
>>
>>108655522
As another anon said, Mamba and SSMs in general are integrated into many modern models along with normal attention.
>>
>>108655688
god damn this is good
>>
File: 1763171780026192.png (246 KB, 878x1484)
246 KB PNG
>>108655654
Heh
>>
Why didn't they give the bigger gemmas a few B of imagegen?
>>
>>108655744
too dangerous
>>
File: dipsyNewOAI.png (2.48 MB, 1024x1536)
2.48 MB PNG
>>108655688
Holy shit. Sam delivers.
>>
File: Risu (1).gif (3.45 MB, 400x400)
3.45 MB GIF
>>108655009
>my local model when i ask it to make proper code
>>
What is considered good for hit/total for speculative decoding? I'm hovering around 65-85%.
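For a rough sense of what those hit rates buy you, the expected number of tokens emitted per target-model forward pass follows the geometric-series result from the original speculative decoding paper; a quick sketch (`alpha` is your acceptance/hit rate, `gamma` the draft length):

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens generated per target-model forward pass when
    drafting gamma tokens with per-token acceptance probability alpha
    (Leviathan et al., 2023). Assumes acceptance is i.i.d. per token."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# At a 75% hit rate with a 4-token draft, each target pass yields ~3 tokens:
print(round(expected_tokens_per_step(0.75, 4), 2))  # → 3.05
```

At gamma=4, a 65-85% hit rate works out to roughly 2.5-3.7 tokens per target pass, so that range is already a solid speedup.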
>>
>>108655768
Arisu dashinaka
>>
>>108655690
>27 minutes
How do I have Gemma-chan give me a tldw?
>>
>>108655760
>Sam delivers.
it can do 4k and you can write text on a single grain of rice, like this shit is fucking AGI dude
>>108654985
>>108655069
>>
>>108655809
download subs. feed her subs.
>>
>>108655688
>/adg/
I'm glad cloud shills have their containment thread now.
>>
>>108655836
pack it up boys
>>
>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055
Why do I always get this shit no matter the model I use? I didn't tweak anything related to memory so by default it's just broken?
>>
why did they ruined diana from pragmata
>>
>>108655857
One problem is thinking: the model outputs lots of shit, but when the context comes back, the thinking is always cleared from the history.
>>
>>108655885
I get that on the very first message and in every single one after that
>>
>>108655836
You got a better Photoshop, that's not AGI lmao
>>
>>108655836
Lmao you think something that can edit pictures is AGI dude?
>>
>>108655863
>did
>ruined
>>
>>108655836
if it allowed nsfw I'd destroy my dick with the friction
>>
>>108655907
Did boughted is clear and good English, are you new here?
>>
>>108655857
Still happens when you set `swa-full = on` and `context-shift = off` ?
>>
File: 1746001832650304.webm (1.74 MB, 720x700)
1.74 MB WEBM
>>108655863
Kill yourself, she's perfect
>>
>>108655844
There's like 6 diffusion threads now.
>>
>>108655836
Every OpenAI "model" just feels like they built a big pipeline around chaining multiple steps together. Sora felt the same way. It's like they're giving an LLM tool calls and the ability to control photoshop + a diffusion model.
>>
>>108655924
she's perfect? she's not https://www.youtube.com/watch?v=xoxCboik0Is
oldiana beyond worlds..
>>
>>108655924
it's not a meme, it's really a dad sim lmao
I will get it when it's less expensive
>>
>>108655924
what is this game even about bro
>>
>>108655836
This nigga thinks "photoshop 2" is AGI, lmao!
>>
>>108655924
Do you have the image where she's wearing the "Be patient I have autism" hat?
>>
I never said steal gemma calm down
>>
>>108655950
Pretend you got a daughter simulator
>>
File: 1752184079714573.jpg (242 KB, 850x480)
242 KB JPG
>>108655950
Action sci-fi daughterwife simulator


>>108655955
>>
>>108655950
dead space but you need to do little puzzles before you can kill enemies.
>>
>>108655957
Might be the dark theme
>>
>>108655950
It's for the /lmg/ audience, if you know what I mean.
>>
>>108655924
The feminazis were right. She looks like a 23 year old midget.
>>
File: 602e8c52020cb.jpg (86 KB, 1078x1411)
86 KB JPG
What VSCode coding plugin has the most reliable full-autopilot mode? I want to try running gemmy endlessly, iterating until shit works, without it getting stuck on some input request an hour after I go to sleep.
>>
>>108655969
YES. thank you anon.
>>
>>108655973
kek.
>>
File: 1753263543472250.webm (3.92 MB, 960x540)
3.92 MB WEBM
>>108655976
ZAMN where do I find midgets who look like that?
>>
File: ITS AN AI IMAGE.png (1.3 MB, 1535x1024)
1.3 MB PNG
>>108655902
>>108655906
>>108655952
I don't think you realize how insane this shit is, look at this
>>
>>108655993
Really love how they made her a robot so that you couldn't look up her skirt.
>>
>>108656000
Mods will fix it (if they haven't already. not gonna mod until my second playthrough.)
>>
>>108655999
how do you even prompt for this? did you gave it any image as reference?
>>
>>108655999
>WOW it can build me a TUI something even gemma 31b can do, its aGI!!!
lmao
>>
>>108655999
now ask it to actually build it.
>>
>>108656045
it's a fucking AI image you moron, it means it can do perfect text every time; you won't be able to tell an image is AI just by looking for garbled text anymore, because they solved that
>>
Is it local? If not I don't give a shit
>>
File: lmao.png (1.66 MB, 2483x1446)
1.66 MB PNG
>>108656052
>you won't be able to notice an image is AI anymore by simply looking at garbled text anymore, because they solved that
L M A O
>>
>>108655009
what is the best ocr right now?
i need to translate many german documents...
>>
Am I wasting my time using LLMs for ASR?

Been playing around with gemma 4 4b and it feels about whisper-fast, but there's no clear benchmark on how it compares to whisper. The end goal is actually diarization, so timestamps are less important. Do I cut my losses and go whisperx?
>>
File: 1757854041523043.png (1.89 MB, 1402x1122)
1.89 MB PNG
>>108656077
real life images won't ask for such level of precision though, it's good enough to render the text you see in everyday's life
>>
File: tetoStencil.png (621 KB, 1024x1536)
621 KB PNG
>>108655927
Frankly that's the direction right now. Torturing the models until they do what you want.
> Openclaw
1M tokens to order a pizza
> Claude Code
2M tokens to create a basic app
> ChatGPT Image 2.whatever
I assume there's a bunch of tokens generated under the hood as well.
This is just part of the whole technical development. There's nothing inherently wrong with that, it just means things are moving on.
> Roleplay
Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
I can't wait.
>>
>>108656052
I can spot AI slop from even a thumbnail. Those models are not as good as you think they are.
>>
>>108656120
>Those models are not as good as you think they are.
you're alone in this fight dude >>108655453
>>
>>108656114
Orb
>>
>>108656120
shut the fuck up nigger
>>
>>108656095
Why not use one of the newer models made specifically for ASR like the Qwen or VibeVoice ones?
>>
>>108656120
Point out seven (7) slops in this thread right now.
>>
>>108656095
pretty sure that if you want diarization you need to use whisper, you won't be able to use pyannote with parakeet or voxtral
>>
>>108656114
>Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
See, I was working on exactly that, but Gemma just made it obsolete. well, I could probably still use stat tracking but besides that she's just so good at instruction following that everything else doesn't really benefit from agentic.
>>
>>108656170
parakeet works with diarization (using another model but still)
https://catalog.ngc.nvidia.com/orgs/nvidia/collections/parakeet-tdt-0.6b-v2
>>
>>108656095
Moonshine is better than whisper and has everything you've looking for
>>
>>108656231
Actually, I could still maybe have specialized agents that gemma can call to help her write in different styles. like I could have a specialized agent that only writes sex scenes.
>>
>>108656231
We have Orb now
>>
>malloc consolidate error out of nowhere
>>
>>108656254
That's just rewriting agents. And not something actually useful.
>>
File: 1770869165582031.jpg (1.12 MB, 3420x1976)
1.12 MB JPG
>>108656254
We have Marinara Engine now
https://github.com/Pasta-Devs/Marinara-Engine
>>
>>108655272
Werks for me (coding), idk about its child rape stories capabilities
>>
Is there any way to use text completion with gemma? When it doesn't have a lalalala breakdown, the outputs are actually really varied and good, but it loses its mind way too often. I've been using llama, kobold seems to work but it's sooooo slow at generating for some reason compared to llama. I know text completion works for llama cause I downloaded a different model to try it and it's pretty great, but the output from gemma mogs it when it works.
>>
>>108656326
Set up the template correctly.
>>
>>108656305
Your Doctor looks gay though
>>
>>108656326
it works fine in ik_llama regardless if i specify a template or not. maybe your sillytavern settings are fucked?
>>
I have never seen a single lalala since I started using Gemma from launch.
>>
>>108656334
And how do i do that?
>>
I have my own LLM RPG frontend that I use mostly as a playground to fuck around with local models.
Currently, the main "game loop" is a simple
>sends request with chat history + tools
>capture response
>if tool, append response to chat history, send request
>repeat until no more tool calls
>if no assistant response so far (only tool calls), sends one last request without tools
And it works okay, with the model calling tools for everything from fetching info from the "codex", to rolling dice, to editing the game's state, but I'm wondering if I can't make this even better by using a more "agentic" workflow. Something like having an orchestrator that spawns individual agents to do whatever in parallel or in series or whichever way it deems more appropriate.
Is there an example of something like that out there that's not just coding agents or stuff like open claw?
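The loop described above, as a minimal sketch. The `send` and `dispatch` helpers and the message shapes are placeholders for whatever your backend and tool registry actually use:

```python
import json

def run_turn(send, dispatch, history, tools, max_steps=8):
    """One 'game loop' turn: resolve tool calls until the model produces
    plain text. `send(messages, tools)` wraps your chat API and returns an
    assistant message dict; `dispatch(call)` runs a tool from your registry.
    Both are assumed, not any particular library's API."""
    text = None
    for _ in range(max_steps):                  # guard against endless tool loops
        reply = send(history, tools)
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:                           # no more tool calls: done
            text = reply.get("content")
            break
        for call in calls:                      # execute tools, feed results back
            history.append({"role": "tool",
                            "tool_call_id": call["id"],
                            "content": json.dumps(dispatch(call))})
    if not text:                                # only tool calls so far:
        reply = send(history, None)             # one last request without tools
        history.append(reply)
        text = reply.get("content")
    return text
```

An orchestrator-style variant would wrap this in an outer loop that spawns one `run_turn` per sub-agent, but the inner shape stays the same.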

>>108656326
>Is there any way to use text completion with gemma?
As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.
>>
>>108656338
That is possible, It has so many things to adjust though idk where to begin
>>
>>108656340
Me either actually.
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
>>108655272
qwen cant follow basic instructions
>>
>>108656344
>As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.
Didn't mean to press post.
Use verbose logging and the myriad jinja playgrounds to see what the prompt would look like based on the Jinja then use that to configure the text completion fields correctly.
Even stuff like spaces and line breaks can have negative effects on models that are ultra overbaked on the chat template.
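For example, a hand-rolled Gemma-style prompt builder might look like this. A sketch assuming Gemma's documented `<start_of_turn>`/`<end_of_turn>` markers; verify against the model's actual Jinja template, and note Gemma has no separate system role, so system text is usually folded into the first user turn:

```python
def gemma_prompt(system, turns):
    """Render a chat into a raw text-completion prompt using Gemma-style
    turn markers. `turns` is a list of (role, text) with role in
    {"user", "model"}."""
    parts = []
    for i, (role, text) in enumerate(turns):
        if i == 0 and role == "user" and system:
            text = f"{system}\n\n{text}"        # fold system into first user turn
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")      # leave the model turn open
    return "".join(parts)
```

Feed the result to the completion endpoint and stop on `<end_of_turn>`; an extra space or missing newline here is exactly the kind of thing that derails template-overbaked models.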
>>
>>108656244
Very interesting thanks
>>
File: file.png (47 KB, 1225x465)
47 KB PNG
>>108656341
>>
Why are you still using text completion like boomers? Chat completion made it obsolete long ago
>>
>>108656254
I really don't care about vibeshit. I'm sure Orb works fine. but it'll hit a wall very soon.
>>
>>108656444
Two more weeks luddite
>>
>>108656444
>it'll hit a wall very soon
Speaking from experience, roastie?
>>
File: 1772168989034764.mp4 (1.19 MB, 1900x1080)
1.19 MB MP4
https://xcancel.com/Angaisb_/status/2046672761569849816#m
>Literally just kept asking Codex to make the assets and then changing things, it's smart enough to know what to do hahaha
jesus this is insane
>>
>>108656479
Our response, Googlesisters?
>>
>>108656479
By Vishnu, this is extremely good!
>>
>>108656464
Sillytavern2 is not a solution. The agents should be for PC automation, tools and vibecoding.
>>
>>108656479
Google play will have more indian masterpieces? Yahoo
>>
>>108656521
How do you plan to improve the prose without a feedback loop?
>>
>>108656543
I'll take slop over thousands of tokens of rewrites desu. I just want vscode+sillytavern in one app
>>
>>108656326
There are presets floating around on reddit and elsewhere, people have figured it out
>>
>>108655620
It's interesting how the formatting forms sort of a spiral pattern down the page.
I wonder how the text pattern would look if it were in a unispace font.
>>
>>108656550
Use case? You want to ERP in the comments?
>>
>>108656581
I want to rub her cunny while she codes and MCP spank her if any errors pop up in the terminal
>>
>>108656494
>>108656532
As if asset flip shovelware wasn't bad enough, now anyone with a subscription can prompt their way to a "game"
>>
File: schizoknowledge.jpg (72 KB, 900x669)
72 KB JPG
>>108656439
In ST I like to format the chat history within a single user turn, with an instruction to write {{char}}'s response according to the sysprompt. No user/char/user/char alternation. Done it this way for a few years now because it made models "remember" the instructions better before reasoning.
<system>
instructions: blah
chat history:
anon: 1
char: 2
anon: 3
char: 4
<user>
Write anon's next message according to the instructions.
<assistant>
"

>Instruction: Don't write with this pattern
>Assistant: *writes with that pattern*
In future turns the model will think "the instructions said to do the thing, and the generated completion was *this*, so the previous output is the correct way to operate going forward." My intuition is and was that if the instructions say to do something and the model does NOT do the thing, the bad output will be associated with the <assistant> tag, meaning it will use in-context learning to continue reinforcing bad outputs.
I want to believe it still works even with the reasoning attention hacks, and the repetition of system prompt excerpts in thinking.
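The packing scheme above as a sketch, using generic OpenAI-style message dicts; the `<system>`/`<user>` tags in the post stand in for whatever template your backend applies:

```python
def pack_single_turn(instructions, history, char):
    """Collapse the whole chat log into one user turn instead of
    alternating user/assistant messages, per the scheme described above.
    `history` is a list of (speaker_name, message) tuples."""
    log = "\n".join(f"{name}: {msg}" for name, msg in history)
    return [
        {"role": "system", "content": f"instructions: {instructions}"},
        {"role": "user",
         "content": (f"chat history:\n{log}\n\n"
                     f"Write {char}'s next message according to the instructions.")},
    ]
```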
>>
>>108656479
Can it make sexy Gemma?
>>
>>108656622
Interesting, thanks for the esoteric knowledge
>>
>I’m going to push back a bit here...
This is it people, they did it.
>>
To the non-RAMlets here, Kimi-K2.6 at Q4 is unironically pretty good. It's a GLM-5.1 sidegrade: faster, more knowledgeable, different prose, but just a tiny bit dumber. I think it's a clear winner for SFW stuff.
The thinking isn't as bad as some people say either. As long as you don't give it many specific examples to adhere to, it's fine. The model itself is unironically smart enough to pick up what you mean, most of the time. Also, you can just tell it not to draft its thinking and that works too. I'm running it with a 5k prompt. It's that easy.
I honestly think the people complaining about the thinking are running it on the cloud, where there's probably a 20k system prompt with conflicting instructions + a jailbreak fed to it. There is one caveat though.
It's not ideal for NSFW. Not because it can't be jailbroken, but because it will start negotiating with itself about imaginary safety policies. When you want to coom, a 5-minute thinking session on consent is a boner killer. Haven't tried non-thinking mode yet, but I have a feeling it won't be much better than GLM-5 Non-Thinking or even Gemma.
>>
>>108656706
It's silly, but whatever. I know AI's retarded and stupid so it doesn't really irk me all that much.
>>
>>108656722
>non-RAMlets
how much for q4?


