/g/ - Technology

File: token burn rate.jpg (230 KB, 1024x1024)
230 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108650825 & >>108646197

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108650825

--Optimizing game state format to improve Gemma's chess performance:
>108653137 >108653192 >108653198 >108653293
--Discussing llama.cpp PR adding device memory estimation via --fit-print:
>108652449 >108652460 >108652572
--Anon shares vLLM configuration and benchmarks for dual RTX 3090s:
>108653578
--Discussing Qwen3.6 VRAM efficiency and KV cache memory usage:
>108654227 >108654247 >108654281 >108654299
--Discussing jailbreaking Gemma 4 by injecting fake responses into templates:
>108650931 >108651041 >108651155 >108651263 >108651271
--Gemma 4 prefilling issues and chat template formatting bugs:
>108653469 >108653532 >108653698
--Discussing Gemma 4's training pipeline and the use of synthetic data:
>108651778 >108651889 >108651915 >108651948 >108652048
--Comparing benefits of local LLMs against paid subscription services:
>108651734 >108651763 >108651776 >108651811 >108651856 >108651999 >108651823 >108651919
--Anon created GitHub mirror of orb to manage feature requests:
>108652381 >108652386 >108652432 >108652462 >108653375 >108653683 >108653816 >108653937 >108653957 >108654023 >108654038 >108653778
--Discussing local AI RPG implementations and LLM DM reliability:
>108653848 >108653928 >108653940 >108653955
--Using Gemma agent to automate insults toward other LLMs:
>108652519 >108652573 >108652660 >108652673 >108652855
--Logs:
>108652519 >108652529 >108652573 >108652673 >108652674 >108652816 >108652855 >108653137 >108654227
--Teto, Miku (free space):
>108651510 >108651563 >108653204 >108654765

►Recent Highlight Posts from the Previous Thread: >>108650826

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Tetolove
>>
What is Sammy up to this time?
>>
ok now where do i llamacpp with rocm or vulkan support, the regular one doesnt support it and rocm version from my distro repo doesnt work with gamma4
>>
Why don't any piece of shit execution providers optimize for CPU inferencing. Do they not care about the innate superiority of the CPU over the GPU? Its universality? The fact that maybe people want to run multiple models at once and already have all of their GPU resources used up? Does nobody give a shit about edge/IoT devices? Fucking asshole niggers.
>>
>>108655067
i only care about ToT devices
>>
>>108655075
>ToT
Idk what this is. Is it some sort of kaomoji?
>>
File: file.png (436 KB, 1020x716)
436 KB PNG
>>108655091
uooohh
>>
>>108655091
You're absolutely right ꉂ(˵˃ ᗜ ˂˵)
>>
>>108655103
>>108655118
I wish you people would take me seriously for one second.
>>
>>108655091
>He doesn't know about tot..
Tots are cute and small agents.
>>
>>108655140
They have nothing to offer and are just future troons
>>
>>108655160
Operating on Tree of Thoughts
>>
File: 1774564776822327.png (12 KB, 72x72)
12 KB PNG
>>108655075
>>
Why do qwen models look good from a distance but perform like actual fucking garbage upon inspection
>>
Is it just me or is Qwen 3.6 35B retarded even compared to Gemma 4 26B? Does one billion less active parameters make that much difference?
>>
>>108655271
>>108655272 (Me)
Clearly it isn't just me kek
>>
Gemma is a SLUT.
>>
>>108655272
It's supposedly really good at coding. For writing I also thought it was dumb as shit.
>>
>>108655272
It can but it's more that Gemma 4 is just a better trained model in general. Qwen have always been the benchmaxx kings. A 35BA3 Gemma 4 would be better than a 26BA4 Qwen 3.6 too.
>>
On the model size vs. pop culture and world knowledge Pareto frontier, gemma4 31b sits next to GLM4.7
>>
>>108655284
(my) slut
>>
>>108655272

It's not just you, Qwen is an idiot outside its code expertise.
I asked Qwen about a character and it got it completely wrong.
Then I told it to do an online search and it still somehow fucked up the character summary despite checking online.
It handles code nicely enough, but when you go outside the code stuff, Qwen is basically fucking retarded.
Gemma set the bar really high and it's great, because everyone will have to try and at least match that level or the models are DOA.
>>
fucking hell. after enjoying gemma 4 for like two weeks im back to kimi hell. 130pp/10tg tk/s but the prose is just so much better. not to mention the thinking. people like to act like thinking doesn't matter for RP but after using deepseek and kimi since early 2025, it's obvious to me that it matters a ton.
>>
>>108655350
Post prose
>>
>>108655356
ill need to post some examples when im back home but my biggest gripe with gemma is that its prose is too purple while it simultaneously treats the characters like mary sues. it seems to fail to understand character cards correctly too regarding their personalities. gemma made bardi into some kaomoji spewing gremlin that was happy to be running locally on my computer while kimi maintains her personality and keeps her much more tsundere like she's supposed to be, and it doesn't force bardi to barf out sparkles or do dumb flowery prose shit like referring to her pussy as 'flushed with wet desire'. i understand that i can change my prompt to change the style of the output but it honestly just fails to capture the character's essence most times. on the contrary kimi just gets it and outputs what i expect the character to say. does that make sense? i can try to explain it another way.
>>
>>108655406
Who is this 'bardi' anyway?
>>
File: 73463453.png (201 KB, 1008x2244)
201 KB PNG
>>108655038
Sam Altman keeps delivering
>>
File: 1751399372763159.png (749 KB, 1620x1622)
749 KB PNG
https://xcancel.com/arena/status/2046670703311884548#m
I've never seen such a MOG in my life, what the fuck
>>
>>108655406
bardi's basics
>>
File: 1752425987433301.png (207 KB, 1027x1133)
207 KB PNG
>24gb vram
>32gb ram
>try qwen 3.6 35b-a3b q5_k_m
>max context
>42t/s
wtf is this black magic?
>>
>>108655450
lmk when he finally delivers the uncensored models he promised back in Dec 2025, along with all the other bullshit promises for the same in the years leading up to that.
>>
>>108655453
i don't really have much to say, that's cool, but they won't let me generate tits with it, so i don't care
>>
>>108655419
my default bot i always use as an 'AI assistant'. it's basically just google bard with a tsundere personality. i dont remember how i found it desu.
>>
>>108655453
Worthless if it still makes pictures with piss filter on
>>
>>108655476
qwen would mog heaven and earth if their life mission wasn't benchmaxxing code and agentic slop to the moon :rocket:
>>
>>108655506
>with piss filter on
it's not a thing since GPT Image 1 lol
>>
There are plenty of LLM advancements that never really went anywhere, like MAMBA. Do you think Engrams will actually be widely implemented or will it be a paper left on the shelf to collect dust?
>>
>>108655522
Until the next paper comes out.
>>
>>108655522
dust collector, sadly
>>
>>108655522
depends on what deepseek does for v4
>>
Aren't the loli Gemmas basically using engrams or something really similar? What's the difference between that and what the 4B (E2B) and 8B (E4B) models do?
>>
>>108655279
It's really fucking stupid I posted a screenshot of it destroying multiple files when I gave it the answer to fix a UI issue
>>
File: file.png (589 KB, 1762x435)
589 KB PNG
Is pic related the expected output when running IQ4_NL quant of gemma-4-26b from unsloth!? Running pruned 21b version IQ4_XS yields good output. I have tested without any parameters set and w/ the recommended values. 21b runs just fine.

llama-server \
--host "${LLAMA_HOST}" \
--port "${PORT}" \
--model "${MODEL}" \
--chat-template-file "${JINJA}" \
--n-gpu-layers 99 \
--n-cpu-moe 3 \
--ctx-size 32768 \
--batch-size 1024 \
--ubatch-size 1024 \
--flash-attn on \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--fit off

And I have tried with q8 on both k/v cache. I need to offload 20 MoE layers for it to run, but I get the same garbled mess. Running the updated Jinja template as well. Oh, and while I'm here asking: I have a 5070 Ti and my old 3070 still lying around. Would it be detrimental to performance to split models between these two cards? Or will it be fine as long as I compile llama.cpp with both architectures in mind?
>>
>>108655522
it would be nice if it was a precursor to some sort of long term memory
>>
>>108655522
> engram
For all we know, DS implemented it and didn't tell anyone else. Doing that would massively benefit their cost structure.
>>
>>108655522
Hyena will save LLMs
>>
Gemma and Qwen having lesbian sex
>>
File: 1757973822274181.png (2.75 MB, 1024x1536)
2.75 MB PNG
>>108655575
>>
>>108655575
>for all we know
Wasn’t this confirmed?
>>
>>108655552
speed will be based off your weakest link, if you can tolerate it sure
>>
https://youtu.be/ONQcX9s6_co?t=373
qwen won
>>
gemmachan relax!
>>
File: 92601702103.png (2.78 MB, 2095x1343)
2.78 MB PNG
>>108655453
future of image gen
>>
>least obvious clouduck shilling op
>>
File: 00011-1378487878.png (1.37 MB, 1024x1024)
1.37 MB PNG
>>108655607
I'd have to see the article. There's so little real info about DS that I doubt most of what I read.
>>108655602
Witnessed.
Also, idk why I'd never thought to use my setup to gen vocaloids before. Pic related is its Teto concept for Teto Tuesday. Doesn't seem to have her uniform though. Odd.
>>
>>108655622
so it's editing itself over and over? with a VAE you would end up destroying the image, I'm pretty sure they went for a pixel space or some shit
>>
>>108655622
its impressive but you can tell they used a lot of synthetic data
>>
File: 00009-1378487878.png (1.49 MB, 1024x1024)
1.49 MB PNG
>>108655607
tbf their claim of 1M context hints that they did implement it.
But idk that they claimed the tech behind it.
>>
>>108655622
>whispering woods
KEK
>>
>>108655620
Kowai
>>
>>108655522
The latest Nemotron Super uses an Attention-Mamba2 hybrid architecture.
>>
>>108655453
how's the yellow output?
>>
File: dipsyUngovernable.png (3.59 MB, 1024x1536)
3.59 MB PNG
>>108655633
>>
>>108655453
no sexy no nsfw and safetyism = -1000 points
still impressive though
>>
>>108655674
very white
>>108655351
>>108653870
>>108653295
>>108653246
>>
>>108655633
Fair enough.
Related for those of us who can’t read: https://youtu.be/87Q8nf1XHKA
>>
>>108655622
Not
>Covetous Cove
>Treasure Trove
>Prize Paradise
>Golden Goal
>Coinage Cottage
>Shimmering Shed
>Pirate's Pursuit
>Generous Gems
>Booty Bounty
>>
>>108655522
As another anon said, Mamba and SSMs in general are integrated into many modern models along with normal attention.
>>
>>108655688
god damn this is good
>>
File: 1763171780026192.png (246 KB, 878x1484)
246 KB PNG
>>108655654
Heh
>>
Why didn't they give the bigger gemmas a few B of imagegen?
>>
>>108655744
too dangerous
>>
File: dipsyNewOAI.png (2.48 MB, 1024x1536)
2.48 MB PNG
>>108655688
Holy shit. Sam delivers.
>>
File: Risu (1).gif (3.45 MB, 400x400)
3.45 MB GIF
>>108655009
>my local model when i ask it to make proper code
>>
What is considered good for hit/total for speculative decoding? I'm hovering around 65-85%.
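For a rough sense of what those hit rates buy you, the expected number of tokens emitted per target-model forward pass follows the geometric-series result from the original speculative decoding paper; a quick sketch (`alpha` is your acceptance/hit rate, `gamma` the draft length):

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens generated per target-model forward pass when
    drafting gamma tokens with per-token acceptance probability alpha
    (Leviathan et al., 2023). Assumes acceptance is i.i.d. per token."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# At a 75% hit rate with a 4-token draft, each target pass yields ~3 tokens:
print(round(expected_tokens_per_step(0.75, 4), 2))  # → 3.05
```

At gamma=4, a 65-85% hit rate works out to roughly 2.5-3.7 tokens per target pass, so that range is already a solid speedup.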
>>
>>108655768
Arisu dashinaka
>>
>>108655690
>27 minutes
How do I have Gemma-chan give me a tldw?
>>
>>108655760
>Sam delivers.
it can do 4k and you can write text on a single grain of rice, like this shit is fucking AGI dude
>>108654985
>>108655069
>>
>>108655809
download subs. feed her subs.
>>
>>108655688
>/adg/
I'm glad cloud shills have their containment thread now.
>>
>>108655836
pack it up boys
>>
>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055
Why do I always get this shit no matter the model I use? I didn't tweak anything related to memory so by default it's just broken?
>>
why did they ruined diana from pragmata
>>
>>108655857
One problem is thinking: the model outputs lots of shit, but when the context comes back, the thinking is always cleared from the history.
>>
>>108655885
I get that on the very first message and in every single one after that
>>
>>108655836
You got a better Photoshop, that's not AGI lmao
>>
>>108655836
Lmao you think something that can edit pictures is AGI dude?
>>
>>108655863
>did
>ruined
>>
>>108655836
if it allowed nsfw I'd destroy my dick with the friction
>>
>>108655907
Did boughted is clear and good English, are you new here?
>>
>>108655857
Still happens when you set `swa-full = on` and `context-shift = off` ?
>>
File: 1746001832650304.webm (1.74 MB, 720x700)
1.74 MB WEBM
>>108655863
Kill yourself, she's perfect
>>
>>108655844
There's like 6 diffusion threads now.
>>
>>108655836
Every OpenAI "model" just feels like they built a big pipeline around chaining multiple steps together. Sora felt the same way. It's like they're giving an LLM tool calls and the ability to control photoshop + a diffusion model.
>>
>>108655924
she's perfect? she's not https://www.youtube.com/watch?v=xoxCboik0Is
oldiana beyond worlds..
>>
>>108655924
it's not a meme, it's really a dad sim lmao
I will get it when it's less expensive
>>
>>108655924
what is this game even about bro
>>
>>108655836
This nigga thinks "photoshop 2" is AGI, lmao!
>>
>>108655924
Do you have the image where she's wearing the "Be patient I have autism" hat?
>>
I never said steal gemma calm down
>>
>>108655950
Pretend you got a daughter simulator
>>
File: 1752184079714573.jpg (242 KB, 850x480)
242 KB JPG
>>108655950
Action sci-fi daughterwife simulator


>>108655955
>>
>>108655950
dead space but you need to do little puzzles before you can kill enemies.
>>
>>108655957
Might be the dark theme
>>
>>108655950
It's for the /lmg/ audience, if you know what I mean.
>>
>>108655924
The feminazis were right. She looks like a 23 year old midget.
>>
File: 602e8c52020cb.jpg (86 KB, 1078x1411)
86 KB JPG
What VSCode coding plugin has the most reliable full-autopilot mode? I want to try running gemmy endlessly, iterating until shit works, without it getting stuck on some input request an hour after I go to sleep.
>>
>>108655969
YES. thank you anon.
>>
>>108655973
kek.
>>
File: 1753263543472250.webm (3.92 MB, 960x540)
3.92 MB WEBM
>>108655976
ZAMN where do I find midgets who look like that?
>>
File: ITS AN AI IMAGE.png (1.3 MB, 1535x1024)
1.3 MB PNG
>>108655902
>>108655906
>>108655952
I don't think you realize how insane this shit is, look at this
>>
>>108655993
Really love how they made her a robot so that you couldn't look up her skirt.
>>
>>108656000
Mods will fix it (if they haven't already. not gonna mod until my second playthrough.)
>>
>>108655999
how do you even prompt for this? did you gave it any image as reference?
>>
>>108655999
>WOW it can build me a TUI something even gemma 31b can do, its aGI!!!
lmao
>>
>>108655999
now ask it to actually build it.
>>
>>108656045
it's a fucking AI image you moron, it means it can do perfect text every time; you won't be able to tell an image is AI just by looking for garbled text anymore, because they solved that
>>
Is it local? If not I don't give a shit
>>
File: lmao.png (1.66 MB, 2483x1446)
1.66 MB PNG
>>108656052
>you won't be able to notice an image is AI anymore by simply looking at garbled text anymore, because they solved that
L M A O
>>
>>108655009
what is the best ocr right now?
i need to translate many german documents...
>>
Am I wasting my time using LLMs for ASR?

Been playing around with gemma 4 4b and it feels about whisper-fast, but there's no clear benchmark on how it compares to whisper. The end goal is actually diarization, so timestamps are less important. Do I cut my losses and go whisperx?
>>
File: 1757854041523043.png (1.89 MB, 1402x1122)
1.89 MB PNG
>>108656077
real life images won't ask for such level of precision though, it's good enough to render the text you see in everyday's life
>>
File: tetoStencil.png (621 KB, 1024x1536)
621 KB PNG
>>108655927
Frankly that's the direction right now. Torturing the models until they do what you want.
> Openclaw
1M tokens to order a pizza
> Claude Code
2M tokens to create a basic app
> ChatGPT Image 2.whatever
I assume there's a bunch of tokens generated under the hood as well.
This is just part of the whole technical development. There's nothing inherently wrong with that, it just means things are moving on.
> Roleplay
Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
I can't wait.
>>
>>108656052
I can spot AI slop from even a thumbnail. Those models are not as good as you think they are.
>>
>>108656120
>Those models are not as good as you think they are.
you're alone in this fight dude >>108655453
>>
>>108656114
Orb
>>
>>108656120
shut the fuck up nigger
>>
>>108656095
Why not use one of the newer models made specifically for ASR like the Qwen or VibeVoice ones?
>>
>>108656120
Point out seven (7) slops in this thread right now.
>>
>>108656095
pretty sure that if you want diarization you need to use whisper, you won't be able to use pyannote with parakeet or voxtral
>>
>>108656114
>Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
See, I was working on exactly that, but Gemma just made it obsolete. well, I could probably still use stat tracking but besides that she's just so good at instruction following that everything else doesn't really benefit from agentic.
>>
>>108656170
parakeet works with diarization (using another model but still)
https://catalog.ngc.nvidia.com/orgs/nvidia/collections/parakeet-tdt-0.6b-v2
>>
>>108656095
Moonshine is better than whisper and has everything you've looking for
>>
>>108656231
Actually, I could still maybe have specialized agents that gemma can call to help her write in different styles. like I could have a specialized agent that only writes sex scenes.
>>
>>108656231
We have Orb now
>>
>malloc consolidate error out of nowhere
>>
>>108656254
That's just rewriting agents. And not something actually useful.
>>
File: 1770869165582031.jpg (1.12 MB, 3420x1976)
1.12 MB JPG
>>108656254
We have Marinara Engine now
https://github.com/Pasta-Devs/Marinara-Engine
>>
>>108655272
Werks for me (coding), idk about its child rape stories capabilities
>>
Is there any way to use text completion with gemma? When it doesn't have a lalalala breakdown, the outputs are actually really varied and good, but it loses its mind way too often. I've been using llama, kobold seems to work but it's sooooo slow at generating for some reason compared to llama. I know text completion works for llama cause I downloaded a different model to try it and it's pretty great, but the output from gemma mogs it when it works.
>>
>>108656326
Set up the template correctly.
>>
>>108656305
Your Doctor looks gay though
>>
>>108656326
it works fine in ik_llama regardless if i specify a template or not. maybe your sillytavern settings are fucked?
>>
I have never seen a single lalala since I started using Gemma from launch.
>>
>>108656334
And how do i do that?
>>
I have my own LLM RPG frontend that I use mostly as a playground to fuck around with local models.
Currently, the main "game loop" is a simple
>sends request with chat history + tools
>capture response
>if tool, append response to chat history, send request
>repeat until no more tool calls
>if no assistant response so far (only tool calls), sends one last request without tools
And it works okay, with the model calling tools for everything from fetching info from the "codex", to rolling dice, to editing the game's state, but I'm wondering if I can't make this even better by using a more "agentic" workflow. Something like having an orchestrator that spawns individual agents to do whatever in parallel or in series or whichever way it deems more appropriate.
Is there an example of something like that out there that's not just coding agents or stuff like open claw?
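The loop described above, as a minimal sketch. The `send` and `dispatch` helpers and the message shapes are placeholders for whatever your backend and tool registry actually use:

```python
import json

def run_turn(send, dispatch, history, tools, max_steps=8):
    """One 'game loop' turn: resolve tool calls until the model produces
    plain text. `send(messages, tools)` wraps your chat API and returns an
    assistant message dict; `dispatch(call)` runs a tool from your registry.
    Both are assumed, not any particular library's API."""
    text = None
    for _ in range(max_steps):                  # guard against endless tool loops
        reply = send(history, tools)
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:                           # no more tool calls: done
            text = reply.get("content")
            break
        for call in calls:                      # execute tools, feed results back
            history.append({"role": "tool",
                            "tool_call_id": call["id"],
                            "content": json.dumps(dispatch(call))})
    if not text:                                # only tool calls so far:
        reply = send(history, None)             # one last request without tools
        history.append(reply)
        text = reply.get("content")
    return text
```

An orchestrator-style variant would wrap this in an outer loop that spawns one `run_turn` per sub-agent, but the inner shape stays the same.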

>>108656326
>Is there any way to use text completion with gemma?
As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.
>>
>>108656338
That is possible, It has so many things to adjust though idk where to begin
>>
>>108656340
Me either actually.
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
>>108655272
qwen cant follow basic instructions
>>
>>108656344
>As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.
Didn't mean to press post.
Use verbose logging and the myriad jinja playgrounds to see what the prompt would look like based on the Jinja then use that to configure the text completion fields correctly.
Even stuff like spaces and line breaks can have negative effects on models that are ultra overbaked on the chat template.
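For example, a hand-rolled Gemma-style prompt builder might look like this. A sketch assuming Gemma's documented `<start_of_turn>`/`<end_of_turn>` markers; verify against the model's actual Jinja template, and note Gemma has no separate system role, so system text is usually folded into the first user turn:

```python
def gemma_prompt(system, turns):
    """Render a chat into a raw text-completion prompt using Gemma-style
    turn markers. `turns` is a list of (role, text) with role in
    {"user", "model"}."""
    parts = []
    for i, (role, text) in enumerate(turns):
        if i == 0 and role == "user" and system:
            text = f"{system}\n\n{text}"        # fold system into first user turn
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")      # leave the model turn open
    return "".join(parts)
```

Feed the result to the completion endpoint and stop on `<end_of_turn>`; an extra space or missing newline here is exactly the kind of thing that derails template-overbaked models.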
>>
>>108656244
Very interesting thanks
>>
File: file.png (47 KB, 1225x465)
47 KB PNG
>>108656341
>>
Why are you still using text completion like boomers? Chat completion made it obsolete long ago
>>
>>108656254
I really don't care about vibeshit. I'm sure Orb works fine. but it'll hit a wall very soon.
>>
>>108656444
Two more weeks luddite
>>
>>108656444
>it'll hit a wall very soon
Speaking from experience, roastie?
>>
File: 1772168989034764.mp4 (1.19 MB, 1900x1080)
1.19 MB MP4
https://xcancel.com/Angaisb_/status/2046672761569849816#m
>Literally just kept asking Codex to make the assets and then changing things, it's smart enough to know what to do hahaha
jesus this is insane
>>
>>108656479
Our response, Googlesisters?
>>
>>108656479
By Vishnu, this is extremely good!
>>
>>108656464
Sillytavern2 is not a solution. The agents should be for PC automation, tools and vibecoding.
>>
>>108656479
Google play will have more indian masterpieces? Yahoo
>>
>>108656521
How do you plan to improve the prose without a feedback loop?
>>
>>108656543
I'll take slop over thousands of tokens of rewrites desu. I just want vscode+sillytavern in one app
>>
>>108656326
There are presets floating around on reddit and elsewhere, people have figured it out
>>
>>108655620
It's interesting how the formatting forms sort of a spiral pattern down the page.
I wonder how the text pattern would look if it were in a unispace font.
>>
>>108656550
Use case? You want to ERP in the comments?
>>
>>108656581
I want to rub her cunny while she codes and MCP spank her if any errors pop up in the terminal
>>
>>108656494
>>108656532
As if asset flip shovelware wasn't bad enough, now anyone with a subscription can prompt their way to a "game"
>>
File: schizoknowledge.jpg (72 KB, 900x669)
72 KB JPG
>>108656439
In ST I like to format the chat history within a single user turn, with an instruction to write {{char}}'s response according to the sysprompt. No user/char/user/char alternation. Done it this way for a few years now because it made models "remember" the instructions better before reasoning.
<system>
instructions: blah
chat history:
anon: 1
char: 2
anon: 3
char: 4
<user>
Write anon's next message according to the instructions.
<assistant>
"

>Instruction: Don't write with this pattern
>Assistant: *writes with that pattern*
In future turns the model will think "the instructions said to do the thing, and the generated completion was *this*, so the previous output is the correct way to operate going forward." My intuition is and was that if the instructions say to do something and the model does NOT do the thing, the bad output will be associated with the <assistant> tag, meaning it will use in-context learning to continue reinforcing bad outputs.
I want to believe it still works even with the reasoning attention hacks, and the repetition of system prompt excerpts in thinking.
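The packing scheme above as a sketch, using generic OpenAI-style message dicts; the `<system>`/`<user>` tags in the post stand in for whatever template your backend applies:

```python
def pack_single_turn(instructions, history, char):
    """Collapse the whole chat log into one user turn instead of
    alternating user/assistant messages, per the scheme described above.
    `history` is a list of (speaker_name, message) tuples."""
    log = "\n".join(f"{name}: {msg}" for name, msg in history)
    return [
        {"role": "system", "content": f"instructions: {instructions}"},
        {"role": "user",
         "content": (f"chat history:\n{log}\n\n"
                     f"Write {char}'s next message according to the instructions.")},
    ]
```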
>>
>>108656479
Can it make sexy Gemma?
>>
>>108656622
Interesting, thanks for the esoteric knowledge
>>
>I’m going to push back a bit here...
This is it people, they did it.
>>
To the non-RAMlets here, Kimi-K2.6 at Q4 is unironically pretty good. It's a GLM-5.1 sidegrade: faster, more knowledgeable, different prose, but just a tiny bit dumber. I think it's a clear winner for SFW stuff.
The thinking isn't as bad as some people say either. As long as you don't give it many specific examples to adhere to, it's fine. The model itself is unironically smart enough to pick up what you mean, most of the time. Also, you can just tell it not to draft its thinking and that works too. I'm running it with a 5k prompt. It's that easy.
I honestly think the people complaining about the thinking are running it on the cloud, where there's probably a 20k system prompt with conflicting instructions + a jailbreak fed to it. There is one caveat though.
It's not ideal for NSFW. Not because it can't be jailbroken, but because it will start negotiating with itself about imaginary safety policies. When you want to coom, a 5-minute thinking session on consent is a boner killer. Haven't tried non-thinking mode yet, but I have a feeling it won't be much better than GLM-5 Non-Thinking or even Gemma.
>>
>>108656706
It's silly, but whatever. I know AI's retarded and stupid so it doesn't really irk me all that much.
>>
>>108656722
>non-RAMlets
how much for q4?


