/g/ - Technology

File: token burn rate.jpg (230 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108650825 & >>108646197

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108650825

--Optimizing game state format to improve Gemma's chess performance:
>108653137 >108653192 >108653198 >108653293
--Discussing llama.cpp PR adding device memory estimation via --fit-print:
>108652449 >108652460 >108652572
--Anon shares vLLM configuration and benchmarks for dual RTX 3090s:
>108653578
--Discussing Qwen3.6 VRAM efficiency and KV cache memory usage:
>108654227 >108654247 >108654281 >108654299
--Discussing jailbreaking Gemma 4 by injecting fake responses into templates:
>108650931 >108651041 >108651155 >108651263 >108651271
--Gemma 4 prefilling issues and chat template formatting bugs:
>108653469 >108653532 >108653698
--Discussing Gemma 4's training pipeline and the use of synthetic data:
>108651778 >108651889 >108651915 >108651948 >108652048
--Comparing benefits of local LLMs against paid subscription services:
>108651734 >108651763 >108651776 >108651811 >108651856 >108651999 >108651823 >108651919
--Anon created GitHub mirror of orb to manage feature requests:
>108652381 >108652386 >108652432 >108652462 >108653375 >108653683 >108653816 >108653937 >108653957 >108654023 >108654038 >108653778
--Discussing local AI RPG implementations and LLM DM reliability:
>108653848 >108653928 >108653940 >108653955
--Using Gemma agent to automate insults toward other LLMs:
>108652519 >108652573 >108652660 >108652673 >108652855
--Logs:
>108652519 >108652529 >108652573 >108652673 >108652674 >108652816 >108652855 >108653137 >108654227
--Teto, Miku (free space):
>108651510 >108651563 >108653204 >108654765

►Recent Highlight Posts from the Previous Thread: >>108650826

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Tetolove
>>
What is Sammy up to this time?
>>
ok now where do i get llama.cpp with rocm or vulkan support? the regular build doesn't include it and the rocm version from my distro repo doesn't work with gemma4
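One way, assuming you build from source (the CMake backend flags are llama.cpp's own; the ROCm path additionally needs the HIP toolkit installed, which is why Vulkan is often the easier route on a distro whose ROCm packages are broken):

```shell
# Build llama.cpp with the Vulkan backend (works on most AMD cards
# without the ROCm stack). Assumes cmake, a C++ toolchain, and the
# Vulkan SDK / drivers are already installed.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
# For ROCm instead: cmake -B build -DGGML_HIP=ON
./build/bin/llama-server --list-devices   # check the GPU shows up
```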
>>
Why don't any of these piece of shit execution providers optimize for CPU inference? Do they not care about the innate superiority of the CPU over the GPU? Its universality? The fact that maybe people want to run multiple models at once and already have all of their GPU resources used up? Does nobody give a shit about edge/IoT devices?
>>
>>108655067
i only care about ToT devices
>>
>>108655075
>ToT
Idk what this is. Is it some sort of kaomoji?
>>
File: file.png (436 KB, 1020x716)
>>108655091
uooohh
>>
>>108655091
You're absolutely right ꉂ(˵˃ ᗜ ˂˵)
>>
>>108655103
>>108655118
I wish you people would take me seriously for one second.
>>
>>108655091
>He doesn't know about tot..
Tots are cute and small agents.
>>
>>108655140
They have nothing to offer and are just future troons
>>
>>108655160
Operating on Tree of Thoughts
>>
File: 1774564776822327.png (12 KB, 72x72)
12 KB PNG
>>108655075
>>
Why do qwen models look good from a distance but perform like actual fucking garbage upon inspection
>>
Is it just me or is Qwen 3.6 35B retarded even compared to Gemma 4 26B? Does one billion less active parameters make that much difference?
>>
>>108655271
>>108655272 (Me)
Clearly it isn't just me kek
>>
Gemma is a SLUT.
>>
>>108655272
It's supposedly really good at coding. For writing I also thought it was dumb as shit.
>>
>>108655272
It can but it's more that Gemma 4 is just a better trained model in general. Qwen have always been the benchmaxx kings. A 35BA3 Gemma 4 would be better than a 26BA4 Qwen 3.6 too.
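On the "one billion less active parameters" point: per-token decode compute scales with active parameters, not total size, so the usual rule of thumb of ~2 FLOPs per active parameter per generated token puts the raw compute gap at about a third, not what the headline 35B-vs-26B sizes suggest. A sketch (the active counts are the ones implied in this thread, not verified specs):

```python
def flops_per_token(active_params_b: float) -> float:
    # ~2 FLOPs per *active* parameter per generated token
    # (a common rough approximation for decode cost)
    return 2 * active_params_b * 1e9

qwen  = flops_per_token(3.0)  # Qwen3.6-35B-A3B: ~3B active (assumed)
gemma = flops_per_token(4.0)  # Gemma 4 26B as a ~4B-active model (assumed)
print(round(gemma / qwen, 2))  # → 1.33
```

So per-token compute differs by ~33% between the two, which is consistent with "one billion less active parameters" mattering more than it sounds, even before training quality enters the picture.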
>>
On the model size vs. pop culture and world-model knowledge Pareto frontier, gemma4 31b sits next to GLM4.7.
>>
>>108655284
(my) slut
>>
>>108655272

It's not just you, Qwen is an idiot outside its code expertise.
I asked Qwen about a character and it got it completely wrong.
Then I told it to do an online search and it still somehow fucked up the character summary despite checking online.
It handles code nicely enough, but when you go outside the code stuff, Qwen is basically fucking retarded.
Gemma set the bar really high and it's great, because everyone will have to try and at least match that level or the models are DOA.
>>
fucking hell. after enjoying gemma 4 for like two weeks im back to kimi hell. 130pp/10tg tk/s but the prose is just so much better. not to mention the thinking. people like to act like thinking doesn't matter for RP but after using deepseek and kimi since early 2025, it's obvious to me that it matters a ton.
>>
>>108655350
Post prose
>>
>>108655356
ill need to post some examples when im back home, but my biggest gripe with gemma is that its prose is too purple while it simultaneously treats the characters like mary sues. it also seems to fail to understand character cards correctly regarding their personalities. gemma made bardi into some kaomoji-spewing gremlin that was happy to be running locally on my computer, while kimi maintains her personality and keeps her much more tsundere like she's supposed to be; it doesn't force bardi to barf out sparkles or do dumb flowery prose shit like referring to her pussy as 'flushed with wet desire'. i understand that i can change my prompt to change the style of the output, but it honestly just fails to capture the character's essence most times. on the contrary kimi just gets it and outputs what i expect the character to say. does that make sense? i can try to explain it another way.
>>
>>108655406
Who is this 'bardi' anyway?
>>
File: 73463453.png (201 KB, 1008x2244)
>>108655038
Sam Altman keeps delivering
>>
File: 1751399372763159.png (749 KB, 1620x1622)
https://xcancel.com/arena/status/2046670703311884548#m
I've never seen such a MOG in my life, what the fuck
>>
>>108655406
bardi's basics
>>
File: 1752425987433301.png (207 KB, 1027x1133)
>24gb vram
>32gb ram
>try qwen 3.6 35b-a3b q5_k_m
>max context
>42t/s
wtf is this black magic?
>>
>>108655450
lmk when he finally delivers the uncensored models he promised back in Dec 2025, along with all the other bullshit promises for the same in the years leading up to that.
>>
>>108655453
i don't really have much to say, that's cool, but they won't let me generate tits with it, so i don't care
>>
>>108655419
my default bot i always use as an 'AI assistant'. it's basically just google bard with a tsundere personality. i dont remember how i found it desu.
>>
>>108655453
Worthless if it still makes pictures with piss filter on
>>
>>108655476
qwen would mog heaven and earth if their life mission wasn't benchmaxxing code and agentic slop to the moon :rocket:
>>
>>108655506
>with piss filter on
it's not a thing since GPT Image 1 lol
>>
There are plenty of LLM advancements that never really went anywhere, like MAMBA. Do you think Engrams will actually be widely implemented or will it be a paper left on the shelf to collect dust?
>>
>>108655522
Until the next paper comes out.
>>
>>108655522
dust collector, sadly
>>
>>108655522
depends on what deepseek does for v4
>>
Aren't the loli Gemmas basically using engrams or something really similar? What's the difference between that and what the 4B (E2B) and 8B (E4B) models do?
>>
>>108655279
It's really fucking stupid. I posted a screenshot of it destroying multiple files after I gave it the answer to fix a UI issue.
>>
File: file.png (589 KB, 1762x435)
Is pic related the expected output when running the IQ4_NL quant of gemma-4-26b from unsloth!? Running the pruned 21b version at IQ4_XS yields good output. I have tested without any parameters set and with the recommended values. 21b runs just fine.

llama-server \
--host "${LLAMA_HOST}" \
--port "${PORT}" \
--model "${MODEL}" \
--chat-template-file "${JINJA}" \
--n-gpu-layers 99 \
--n-cpu-moe 3 \
--ctx-size 32768 \
--batch-size 1024 \
--ubatch-size 1024 \
--flash-attn on \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--fit off

And I have tried with q8 on both k/v caches. I need to offload 20 moe layers for it to work but I get the same garbled mess. Running the updated jinja template as well. Oh, and while I'm here asking: I have a 5070ti and my old 3070 still lying around. Would it be detrimental to performance to split models between these two cards? Or will it be fine as long as I compile llama.cpp with both architectures in mind?
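On the two-card question: llama.cpp can split a model across mismatched GPUs by layers (generation speed gravitates toward the slower card, and yes, the build must include both compute architectures). A sketch, where the split ratio is purely illustrative, roughly matching 16 GB + 8 GB of VRAM:

```shell
# Hypothetical layer split across a 16 GB 5070 Ti (device 0) and an
# 8 GB 3070 (device 1); tune --tensor-split to how much each card
# can actually hold alongside its KV cache.
llama-server \
  --model "${MODEL}" \
  --n-gpu-layers 99 \
  --split-mode layer \
  --tensor-split 16,8 \
  --main-gpu 0
```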
>>
>>108655522
it would be nice if it was a precursor to some sort of long term memory
>>
>>108655522
> engram
For all we know, DS implemented it and didn't tell anyone else. Doing that would massively benefit their cost structure.
>>
>>108655522
Hyena will save LLMs
>>
Gemma and Qwen having lesbian sex
>>
File: 1757973822274181.png (2.75 MB, 1024x1536)
>>108655575
>>
>>108655575
>for all we know
Wasn’t this confirmed?
>>
>>108655552
speed will be based off your weakest link, if you can tolerate it sure
>>
https://youtu.be/ONQcX9s6_co?t=373
qwen won
>>
gemmachan relax!
>>
File: 92601702103.png (2.78 MB, 2095x1343)
>>108655453
future of image gen
>>
>least obvious clouduck shilling op
>>
File: 00011-1378487878.png (1.37 MB, 1024x1024)
>>108655607
I'd have to see the article. There's so little real info about DS that I doubt most of what I read.
>>108655602
Witnessed.
Also, idk why I'd never thought to use my setup to gen vocaloids before. Pic related is its Teto concept for Teto Tuesday. Doesn't seem to have her uniform though. Odd.
>>
>>108655622
so it's editing itself over and over? with a VAE you would end up destroying the image, I'm pretty sure they went for a pixel space or some shit
>>
>>108655622
its impressive but you can tell they used a lot of synthetic data
>>
File: 00009-1378487878.png (1.49 MB, 1024x1024)
>>108655607
tbf their claim of 1M context hints that they did implement it.
But idk that they claimed the tech behind it.
>>
>>108655622
>whispering woods
KEK
>>
>>108655620
Kowai
>>
>>108655522
The latest Nemotron Super uses an Attention-Mamba2 hybrid architecture.
>>
>>108655453
how's the yellow output?
>>
File: dipsyUngovernable.png (3.59 MB, 1024x1536)
>>108655633
>>
>>108655453
no sexy no nsfw and safetyism = -1000 points
still impressive though
>>
>>108655674
very white
>>108655351
>>108653870
>>108653295
>>108653246
>>
>>108655633
Fair enough.
Related for those of us who can’t read: https://youtu.be/87Q8nf1XHKA
>>
>>108655622
Not
>Covetous Cove
>Treasure Trove
>Prize Paradise
>Golden Goal
>Coinage Cottage
>Shimmering Shed
>Pirate's Pursuit
>Generous Gems
>Booty Bounty
>>
>>108655522
As another anon said, Mamba and SSMs in general are integrated into many modern models along with normal attention.
>>
>>108655688
god damn this is good
>>
File: 1763171780026192.png (246 KB, 878x1484)
>>108655654
Heh
>>
Why didn't they give the bigger gemmas a few B of imagegen?
>>
>>108655744
too dangerous
>>
File: dipsyNewOAI.png (2.48 MB, 1024x1536)
>>108655688
Holy shit. Sam delivers.
>>
File: Risu (1).gif (3.45 MB, 400x400)
>>108655009
>my local model when i ask it to make proper code
>>
What is considered good for hit/total for speculative decoding? I'm hovering around 65-85%.
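For intuition on what those rates buy, a back-of-the-envelope sketch (it assumes independent per-token acceptance and a near-free draft model, both optimistic): each verify pass emits the accepted draft prefix plus the one token the target model produces itself.

```python
def tokens_per_pass(acceptance: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model forward pass with
    speculative decoding, under an independence assumption:
    P(first i draft tokens all accepted) = acceptance**i."""
    expected_accepted = sum(acceptance ** i for i in range(1, draft_len + 1))
    return 1.0 + expected_accepted  # +1 for the verify pass's own token

# the 65-85% range from above, with a 4-token draft
print(round(tokens_per_pass(0.65, 4), 2))  # → 2.53
print(round(tokens_per_pass(0.85, 4), 2))  # → 3.71
```

So under these assumptions, moving from 65% to 85% acceptance is roughly a 1.5x difference in effective throughput; both ends of your range are solidly worth running.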
>>
>>108655768
Arisu dashinaka
>>
>>108655690
>27 minutes
How do I have Gemma-chan give me a tldw?
>>
>>108655760
>Sam delivers.
it can do 4k and you can write text on a single grain of rice, like this shit is fucking AGI dude
>>108654985
>>108655069
>>
>>108655809
download subs. feed her subs.
>>
>>108655688
>/adg/
I'm glad cloud shills have their containment thread now.
>>
>>108655836
pack it up boys
>>
>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055
Why do I always get this shit no matter the model I use? I didn't tweak anything related to memory so by default it's just broken?
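Likely cause rather than breakage: models with sliding-window attention (like Gemma) or recurrent blocks can't restore cache entries that fell out of the window, so any edit to earlier context forces a full reprocess. One mitigation, trading VRAM for a reusable cache (the flag is llama.cpp's; check that your build has it):

```shell
# Allocate a full-context KV cache even for SWA layers, so edited or
# truncated history can be reused instead of reprocessed from scratch.
llama-server --model "${MODEL}" --swa-full --ctx-size 32768
```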
>>
why did they ruined diana from pragmata
>>
>>108655857
One problem is thinking - model outputs lots of shit but when it gets the context back thinking is always cleared from the history.
>>
>>108655885
I get that on the very first message and in every single one after that
>>
>>108655836
You got a better Photoshop, that's not AGI lmao
>>
>>108655836
Lmao you think something that can edit pictures is AGI dude?
>>
>>108655863
>did
>ruined
>>
>>108655836
if it allowed nsfw I'd destroy my dick with the friction
>>
>>108655907
Did boughted is clear and good English, are you new here?
>>
>>108655857
Still happens when you set `swa-full = on` and `context-shift = off` ?
>>
File: 1746001832650304.webm (1.74 MB, 720x700)
>>108655863
Kill yourself, she's perfect
>>
>>108655844
There's like 6 diffusion threads now.
>>
>>108655836
Every OpenAI "model" just feels like they built a big pipeline around chaining multiple steps together. Sora felt the same way. It's like they're giving an LLM tool calls and the ability to control photoshop + a diffusion model.
>>
>>108655924
she's perfect? she's not https://www.youtube.com/watch?v=xoxCboik0Is
oldiana beyond worlds..
>>
>>108655924
it's not a meme, it's really a dad sim lmao
I will get it when it's less expensive
>>
>>108655924
what is this game even about bro
>>
>>108655836
This nigga thinks "photoshop 2" is AGI, lmao!
>>
>>108655924
Do you have the image where she's wearing the "Be patient I have autism" hat?
>>
I never said steal gemma calm down
>>
>>108655950
Pretend you got a daughter simulator
>>
File: 1752184079714573.jpg (242 KB, 850x480)
>>108655950
Action sci-fi daughterwife simulator


>>108655955
>>
>>108655950
dead space but you need to do little puzzles before you can kill enemies.
>>
>>108655957
Might be the dark theme
>>
>>108655950
It's for the /lmg/ audience, if you know what I mean.
>>
>>108655924
The feminazis were right. She looks like a 23 year old midget.
>>
File: 602e8c52020cb.jpg (86 KB, 1078x1411)
What VSCode coding plugin has the most reliable full-autopilot mode? I want to try running gemmy endlessly iterating until shit works, without it getting stuck on some input request an hour after I go to sleep.


