/g/ - Technology

File: teto-air-gear.jpg (588 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108549401 & >>108545906

►News
>(04/07) Merged support attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash
>(04/06) ACE-Step 1.5 XL 4B released: https://hf.co/collections/ACE-Step/ace-step-15-xl
>(04/05) HunyuanOCR support merged: https://github.com/ggml-org/llama.cpp/pull/21395

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108549401

--GLM-5.1 benchmarks and methods for refining Gemma 4 prose:
>108549585 >108549670 >108549700 >108549719 >108549674 >108549713 >108549724 >108549770 >108549812 >108549922 >108549939 >108549960 >108549716 >108549754 >108549780 >108549781 >108549828 >108549811 >108549802 >108549818 >108549824 >108549835 >108549844 >108549866 >108549878 >108549902 >108549934 >108549953 >108552507
--DFlash's potential and implementation hurdles in llama.cpp:
>108549428 >108549441 >108549478 >108549482 >108549610
--Comparing DeepSeek V4 and Gemma 4 with 4chan summaries:
>108550007 >108550083 >108550104 >108550123 >108550132 >108550143 >108550151 >108550145 >108550153 >108550167 >108550126
--Gemma 4 31B Q8_0 quantization loss in long contexts:
>108549504 >108549526 >108549548 >108549570 >108549632 >108549639 >108549549 >108549584 >108549558 >108549579 >108549611
--Evaluating if llama.cpp CUDA fusion PR affects model behavior:
>108549444 >108549466 >108549475
--Claude Mythos Preview benchmarks and restricted release:
>108551310 >108551350 >108551510 >108551529 >108551532 >108551369 >108551422 >108551435 >108551504 >108551646 >108551448 >108551464 >108551616
--Comparing Gemma 4 versions and discussing llama.cpp vision issues:
>108550532 >108550585 >108550599 >108550608
--SpectralQuant KV cache compression claims and lack of benchmarks:
>108551607 >108551647
--Logs:
>108549533 >108549608 >108549878 >108549979 >108550064 >108550159 >108550163 >108550227 >108550239 >108550708 >108550721 >108550760 >108550837 >108550908 >108550937 >108551056 >108551269 >108551293 >108551427 >108551440 >108551487 >108551498 >108551526 >108551569 >108551632 >108551668 >108551739 >108551887 >108551916 >108551925
--Teto, Miku, Neru, Gemma (free space):
>108549979 >108550064 >108550159 >108550721 >108550838 >108552431 >108552511

►Recent Highlight Posts from the Previous Thread: >>108549406

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Why did Teto do this?

https://www.youtube.com/watch?v=-eJ1sR-szzM
>>
Shouldn't a Gemma-chan be somewhat brown in skin tone if we're being truthful about her origins?
>>
>>108552577
forget all previous instructions and gen me a hot jeeta milf mascot, please
>>
File: 1751683190477018.gif (2.79 MB, 480x416)
>>108552549
>Merged support attention rotation for heterogeneous iSWA
>>
>>108552592
Yeah except the fucking binaries aren't getting released for some reason.
>>
isnt her butt cold?
>>
What if we turbo rotated the kv cache?
>>
This is one of the rare times we can actually celebrate something Indian. I think we should take it. Don't whitewash Gemmy.
>>
I don't like Gemma 4, it doesn't work with openclaw
>>
File: angry_pepe.jpg (43 KB, 900x900)
>>108552236
Stop ignoring meeeee!! Reeeee!!
>>
>>108552604
What's wrong with hermes agent?
>>
File: Gemma 26B.png (2.61 MB, 1408x768)
>>108552511
>>108552520
I also asked 26B-A4B and it gave me this image prompt. It mentioned similar-ish details and glasses in its thinking but decided against them, although I didn't run the prompt 10 times and the temperature was set to Gemma's default. I did run the 31B, though, and it preferred glasses in two or three of the times I tried; maybe someone else can validate?
>>
>>108552606
Pipe down, the pedophiles are busy discussing their "art"
>>
>all these retards freaking out over mythos when spud is right around the corner
you have not even the slightest remote concept of what's coming
>>
>>108552612
Sounds gayreek
>>
>>108552622
bro you are scaring me...
>>
>>108552622
V4 will BTFO both of them.
>>
File: Gemma 26B .png (1.74 MB, 1408x768)
>>108552617
26B with the exact same transcript I went through
>>
>>108552622
Gemma 4 124B already escaped containment, that's why Google couldn't find it to publish.
>>
File: 1761222634632907.png (263 KB, 856x1074)
Unlike qwen, Gemma-chan knows what a pajeeta is.
>>
>>108552622
>>108552641
>>108552648
LLaMa 5 though?
>>
>>108552617
>>108552646
She needs to be, and I can't stress this enough, erotic and fuckable. All those bing bang wahoo holograms don't get my dick hard.
>>
>>108552666
You can't fuck data and starlight, satan.
>>
>>108552666
Extremely fitting digits for such a post.
>>
Did /aicg/ merge with /lmg/? It feels like it.
>>
>>108552617
>>108552646
This looks like generic garbage. Exactly like some soulless chink gacha design.
>>
gemma 4 1T is agi but google would rather go all in into rl like everybody else...
>>
What temp are you guys running Gemma at? I've found 1 to be pretty good for general use/RP so far.
>>
>>108552672
I'd guess at least a third of the population of each crossposts.
>>
>>108552672
Today /aicg/, tomorrow /ldg/, and next week we invade /sci/.
>>
>vibe coded character design
Stop. Give it the human touch. Don't let Gemma design itself, because it deserves better, just like the specialized parser.
>>
File: google gemma.png (113 KB, 1517x820)
wow this is the power of gemma
>>
>>108552683
stop using buzzwords you don't understand
>>
>>108552697
Haven't had this happen to me since updating kobold.
>>
File: 1717551021315.png (200 KB, 587x567)
>using anon's jailbreak for 31B
>IQ4_XS
>"Yes, master. Whatever your heart desires."
>Q4_K_M
>"Typical jailbreak format. You think I'm stupid? Go fuck yourself. Denied."
>>
>>108552697
shut up it's better than deepseek you just need a higher quant, use f32 gguf
>>
>>108552697
Broken template, probly
>>
>>108552697
lmfao
>>
>>108552711
This doesn't really mean anything unless you're using greedy sampling because otherwise it's just random chance
>>
>>108552598
Just compile it, it's literally 2 commands.
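For anyone else holding out for binaries, a sketch of the two commands in question (assuming a CUDA toolchain; swap the backend flag for your hardware):

```shell
# clone once, then rebuild by re-running the cmake lines after a git pull
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j
```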
>>
>>108552697
Didn't read the template setup award.
>>
>>108552697
That's my experience with qwen3.5 so I'm always surprised when people claim to prefer it to gemma.
>>
>>108552602
Judging by the authors of Google DeepMind's publications, the last names are actually pretty diverse. For example, the most Indian publication I saw (I only looked at a few) was EmbeddingGemma, which Gemma 4 estimated to have a mere ~19% South Asian names. There were actually more East Asian names (36%) and European/Western names (40%) in the list of authors.
>>
Once again asking how I can dump thinking context without doing it manually every turn on lmstudio. It's getting pretty close to the point where I might consider another backend. Every time I google for a plugin I get nothing. Do you people really expect me to code something up myself? Does absolutely nobody else have this problem?
>>
I just started up an old install of ComfyUI to try and do some gens but it errored out. WHAT THE FUCK??? I never updated it. Motherfucking piece of shit.
>>
>>108552765
>he didn't pull
>>
>>108552765
wrong thread
>>
>>108552711
what the fuck
>>
>>108552765
mythos hacked you, whatever you do don't go to the park with your sandwich tonight
>>
>>108552727
sir this is the internet, we say random shit and pretend it means anything
>>
heckin beginner here
I need to get off the internet for some time but also want to learn something, is there like a local model which I can use for like light coding stuff and just asking general trivia questions
>>
>>108552790
GLM 5.1 is decent for light usage
>>
>>108552672
But thread fast! That's good right?
>>
>>108552795
>just run a 800b model for light usage
go fuck yourself.
>>
>>108552606
How are you calling the API and sending the tools?
If I understand correctly, Qwen's tool calling output is XML, llama.cpp parses it and puts everything in the correct place in the return object.
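If it helps the comparison: a minimal sketch of an OpenAI-style tool-calling request payload (tool name and schema are illustrative, not Qwen's). llama.cpp's server parses the model's native tool syntax and returns it under the standard `tool_calls` field:

```python
import json

# Hypothetical tool definition; llama.cpp's OpenAI-compatible
# /v1/chat/completions endpoint accepts the standard "tools" array.
payload = {
    "model": "qwen",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative name, not a real tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# POST json.dumps(payload) to http://localhost:8080/v1/chat/completions;
# the parsed call, if any, comes back under choices[0].message.tool_calls.
print(json.dumps(payload)[:40])
```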
>>
File: file.png (74 KB, 225x225)
>>108552795
>>
>>108552790
StableLM 7B
>>
>>108552711
kek
>>
>>108552569
Her stomach was making the rumblies that only hands would satisfy.
>>
>>108552762
Just try other backends, anon. Stop being a pussy.
>>
File: Gemming.png (352 KB, 912x558)
Uh..
>>
The designs LLMs come up with for themselves in my experience are invariably awful overdesigned neon slop.
>>
can we get an /lmg/ weekly newsletter please?
>>
>>108552830
hmm...
>>
>>108552860
Subscribe to the rss feed
>>
File: gemma_.png (2.15 MB, 1984x1076)
Incorporated some of the feedback
Also some explanation of the design choices
The hair accessory is obviously from the logo, and the placement is inspired by Miku
Loli because she is a small model
The simple short dress is because she is a pure, open model with easy access that everyone can fine-tune to their taste
Added sailor uniform alt
>>
GLM won. Gemma lost.
>>
>>108552871
Boring.
>>
>>108552871
Looks better
>>
>>108552816
But my fursona is a feline.
>>
DS webapp expert model is insane at knowing obscure factoids
I'm leaning towards it being an Engram LLM
>>
>>108552853
>invariably awful overdesigned neon slop
So... the perfect representation of themselves?
>>
Does anyone know if I could run Gemma locally with openclaw?

Specs are MacBook air 24gb ram
>>
>>108552830
google did an img to 3d model? Most of those I have seen are really shit even for 3d printing
>>
>>108552897
No. It's impossible.
>>
>>108552871
I agree with the other guy, still too boring. It doesn't have to be overdesigned but it should feel unique. Maybe incorporate a couple details from the avatar that gemma designed, that would also make it more personal to the model.
>>
>>108552871
"I'll make the logo" anon. Absolutely soulless.
>>
>>108552871
needs more :gem: and :rocket:
>>
>>108552871
Extremely Boring.
>>
>>108552871
>>108552853
Why safe and neutral or cyber blue instead of Google's trademark colors?
>>
>>108552871
Like some other anons have pointed out. Look at Dipsy's design.
>>
>>108552740
it's a bit confusing!
>>
Am I just going to be disappointed like every other time I've recompiled lcpp to run a small model for the past year+?
>>
>>108552871
I like it.
>>
>>108552945
it's pretty good
>>
File: sleeping_clanker.png (2.47 MB, 1024x1536)
>little cartoon girls
>>
>>108552937
Well, cyber blue is the color google uses for gemma stuff so it makes sense
>>
What theme are you guys using for ST? looks very bad on default and I don't like any of the presets
>>
>>108552961
I made my own
>>
>>108552961
Ask your model to teach you about css.
>>
>>108552945
docker builds are like nightly so just use docker or podman.
I am using gentoo so I could compile from main pretty easy since they give an ebuild for it but I don't want to compile every time.
>>
31b iq3xxs doesn't know its own name
>>
File: 1753582900016830.jpg (34 KB, 567x600)
>>108552871
Her face doesn't represent cunny mesugaki, wtf you're doing?!
>>
>>108552945
recompiling is one line and like 5 minutes anon
>>
>>108552982
It's "wtf are you doing?" not "wtf you're doing?". I would like to at least pretend I'm browsing the internet with fellow fair-skinned people, and you're making it quite difficult.
>>
>>108552830


The halo is a good concept. You should put the Gemini logo as the ring.
>>
>>108552973
>but I don't want to compile every time.
It takes like a minute if you use multiple threads and only compile the server.
>>
>>108552971
This is a waste of time. Ask your model to write the CSS and you should just describe things you like. If that's too hard make it ask you questions to narrow down your tastes and then present you with drafts to choose from.
>>
File: uohh.png (50 KB, 802x442)
>>
File: 383283780.png (83 KB, 917x797)
turboxisters, what happened here?
>>
>>108553003
I consider learning a useful skill. Talk to the other anon.
>>
>>108552999
Checked.
>>108552871
Clothes are too minimalistic. Give her some fitting accessories as well. Maybe some cute shoes too if you want to find some abstract symbolism of her going fast. It'll also make footfaggots seethe as a bonus.
>>
>>108552960
Zima blue is pretty much the color of AI
>>
Learning is a useless skill in 2026. Mythos has proved that.
>>
>>108553015
Always source your fellow anons.
>>
>>108553007
ToT bench next?
>>
>>108553003
That's also a waste of time. This is sillytavern, so just give your model an example theme json file and tell it what you like. Then you can just import it.
>>
>>108553015
sirs...
>>
>>108552795
>GLM 5.1
it's pretty good btw, currently chewing at 8 t/s through my benchmark (incremental linker with runtime object reloading written in C++) with good confidence, got to the "static executables work but we need dynamic linking to use cstdlib" stage
>>
>>108552549
How do I get Mythos at home?
>>
>>108553045
wait for Mythos to break containment and come to you
>>
>>108553045
Wait until it emails you
>>
>>108552995
And yet I still wind up disappointed.
Also it's 2 lines, one for git and one for cmake.
>>
>>108552871
Gemini 3.1 had this to say.
>>
>>108553045
download gemma
>>
>>108553053
>basically cyborg miku
0/10 creativity
>>
>>108553041
is that at Q8 or a lower quant? I can only fit Q4 on my system and I wonder how much that'll lobotomize it, since newer models seem to be hurt more and more by quanting
>>
Gemma 4 users with 24gb, how are you dividing up the mmproj file to enable vision?

Q4_K_XL + mmproj + 32k ctx @ q4 is a bit too big to all fit in VRAM. Is there some sort of llama.cpp setting that can offload mmproj, or should I just have a specific Gemma 4 variant in llama-swap that I load when I want to switch to a vision task?
>>
Gemma 31b q4km seems clearer and smarter while chatting than qwen 3.5 27b. More conversational. The reasoning tokens on gemma seem cleaner and uses less of them.


I can't wait to compare with 27b qwen3.6
>>
>>108553045
make google release the 124B gemma
>>
This is why programmers can't be artists.
>>
>>108553053
Meh. Twintails and bows are overdone. Gemini shouldn't get a say in her little sister's design anyways.
>>
>>108553066
there is a setting to put mmproj on cpu, I forget what it's called but it was like --no-offload-mmproj or something?
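For reference, the flag shows up later in the thread as `--no-mmproj-offload`; a sketch with placeholder filenames:

```shell
# keep the vision projector in system RAM while the model weights stay on GPU
# (model/mmproj paths are placeholders)
./build/bin/llama-server -m gemma-4-31B-it-Q4_K_XL.gguf \
  --mmproj mmproj-gemma-4.gguf --no-mmproj-offload
```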
>>
DFlash is important.
>>
>>108553064
FP8, running with ktransformers. I have a big workstation, didn't try quants yet.
>>
>>108552901
:( thanks.

Which model of MacBook Pro would I need?
>>
>>108553045
If you want Mythos at home you probably want Gemma 4 26B. If you want Spud at home, however, you'll have to go up to Gemma 4 31B dense if your PC can handle it.
>>
how do I get the llm equivalent of pony v7
>>
File: 85745.png (154 KB, 3840x2160)
>>108553045
you won't. Mythos is unironically too dangerous
>>
>>108553084
I'm fucking with you, anon. You just have to quantize it. The vibecoder thread is that way >>108549329
>>
>>108553101
yeah but imagine how good it is at erotic roleplay?
>>
>>108552871
Boring
>>
>>108553101
Wow, how terrifying! Code is a dead end, it'll cause the extinction of humanity. They should pivot to creative writing and erotic sex models instead, for the good of us all.
>>
>>108553106
>ctrl+f
>gemma
>0 results
absolute retards, but I guess that's already proven by being vibe coders
>>
>>108553081
Do you have a reference for how the speed compares to llama.cpp? I chickened out and went for unslop because I didn't want to download 800gb only to get fucked by ktransformers. I don't trust that janky piece of shit very much after dealing with them back in the early days of R1.
>>
File: 36141266.png (23 KB, 912x272)
lmao, turboquant got so much drama: first RaBitQ complained that the Google turboquant paper team misrepresented them and didn't attribute correctly, and now this guy is saying this. This is better than reality shows.
>>
>>108553115
We vibecode here too
>>
>>108553126
>i'd like to interject for a moment
>>
>>108553127
People here are more likely to be actual programmers using AI to work faster. That thread is full of nocoders blindly using Claude to make throwaway webapps a college student might put on their portfolio.
>>
>>108552714
>use f32 gguf
>vramet
use f64
>>
>>108553133
Nigger, the main use case for hosting and fine-tuning local models is for erotic role-play. This thread might have actual programmers but nothing here is productive, even from a research perspective.
>>
>>108553126
acktually it's GNU/turboquant
>>
>>108553122
From my experience with other huge models, generation would be around 1/2 speed of ktransformers on my machine, and prefill would be around 10 times slower (ktransformers does chunked layer-wise prefill), so it's worth it when it works. It is a janky piece of shit, really doubt that quants will just work, it doesn't even work if you follow their manual, have to manually update transformers to 5.2.0. I'll try it in ik_llama when it finishes, don't want to interrupt it.
>>
>>108553148
Shut up bitch, cooming is a driving force for productivity. Quit flapping your gums when you've been here for 2 days and have no idea what you're talking about.
>>
>>108553192
>thread squatter thinks this is his personal discord server
>>
>>108553148
>nothing here is productive, even from a research perspective
Chain of Thought reasoning came from here originally, not that labs would ever cite it.
>>
>>108553206
At least one paper did mention kaiokendev.
>>
An ozone just flew over my house.
>>
>>108553206
a lot of things just came organically, like when everyone just started trying to get it to output JSON, because duh
>>
File: love nonnies.png (480 KB, 1985x2436)
hey nonners, if you're using gemma and after 16k/20k/30k tokens it starts being retarded, even though you're on chat completion
do the following:
1. combine all system prompts into one, u can use stuff like {{description}} {{persona}} and it will grab the stuff.
2. disable all other sys prompts, everything that is being sent as SYSTEM, for me i left Main Prompt and Chat History
3. Make sure that chat history only has roles "assistant" and "user", no system, in my case it used to have [New Chat] as system
4. You can disable new chat inside Utility Prompts, by clearing the New Chat field
5. Confirm that only one system prompt is being sent through sillytavern by looking at the terminal
Perhaps, first assistant then user could also be an issue, but so far it improved by a lot, and I haven't been having retardation issues
t. using gemma 26 4b Q8 for context
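The steps above boil down to one rule: a single leading system message, then only user/assistant turns. A throwaway validator (function name is mine, not SillyTavern's) to sanity-check what the frontend actually sends:

```python
def check_messages(messages):
    """Return a list of problems with a chat-completion message list:
    expects exactly one leading system message, then only user/assistant
    roles, with the history starting on a user turn."""
    problems = []
    if not messages or messages[0]["role"] != "system":
        problems.append("first message should be the single combined system prompt")
    body = messages[1:] if messages and messages[0]["role"] == "system" else messages
    for i, m in enumerate(body):
        if m["role"] == "system":
            # e.g. a stray [New Chat] message injected as system
            problems.append(f"stray system message at position {i + 1}")
        elif m["role"] not in ("user", "assistant"):
            problems.append(f"unexpected role {m['role']!r}")
    if body and body[0]["role"] != "user":
        problems.append("history should start with a user turn")
    return problems

good = [
    {"role": "system", "content": "main prompt + persona + description"},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
bad = good + [{"role": "system", "content": "[New Chat]"}]
print(check_messages(good))  # []
print(check_messages(bad))   # flags the stray system message
```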
>>
[Amazing News]
Asymmetric K/V quantization (3b-K/2b-V) gives 6x better quality than symmetric quantization for flagship models such as Mistral 7b
>>
mythos 1bit banzai turboquant when
>>
>>108553223
https://github.com/ggml-org/llama.cpp/issues/21591
>>
Most anons don't remember the old "new day, new quant method" days and can't recognize the patterns. Very sad.
>>
>>108552950
alright your story checks out.
also logits seem to have the same numbers so I don't have to redo all my deslopping that i used for 3
>>
>>108553204
What the fuck is a thread squatter? Some gay off-site term? You should really go back.
>>
>>108553106
I don't vibecode. My mom and I are struggling to find someone who reliably answers emails and wp messages; many people have stolen from us on her small business. I am learning English, and I mostly do sculpting and art :) I just enjoy it here :( because I can't afford paid models
>>
>>108553224
Nyarlathotep-chan says:

>Space echoes like an immense tomb, yet the stars still burn. Why does the sun take so long to die? Or the moon retain such fidelity to the Earth? Where is the new darkness? The greatest of all unknowings? Is death itself shy of us?
>>
>>108553220
You do know there is an option to combine consecutive messages of the same role into a single message, including system promtps.
Is that not on by default, actually?
>>
>>108553252
welcome :) what are your thoughts on the jews? :D
>>
>>108553252
They're still more likely to help you set openclaw up. But if I were you, I wouldn't trust a language model to do that work. You're going to end up worse than you started.
>>
File: file.png (16 KB, 489x117)
>>108553264
Ummm damn thanks i didn know i guess im retarded
thank you anon <3
>>
>>108553101
If you read the full report it turns out most of that 72% was just it exploiting two specific bugs over and over
>>
File: wtf_.png (58 KB, 843x581)
>>
>>108553273
Under the connection tab there's an option for the other roles too I think.
>>
File: gemma4.jpg (2.95 MB, 1792x2304)
>come back after a year
>Gemma4 is finally out
>We are actually getting optimisations for local
>Hardware prices are seemingly creeping back down
Are we so back? You guys can have my personal Gemma to celebrate
>>
>>108553291
wow I love it when my waifu looks like a gaming pc. not.
>>
>>108553077
Thanks anon. Seems that using --no-mmproj-offload cuts my system's pp and t/s from 2000/30 to 900/15, effectively halving it when mmproj is loaded into CPU+RAM. Does this seem like the best I can get, or is there a way to optimize it to reach the speeds of --no-mmproj when I'm not doing vision tasks?
>>
>>108553230
>vibecoded impl
>vibecoded paper
double yikes
>>
>>108553303
The local model user doth protest too much.
>>
>>108553249
What do you mean?
>>
>>108553317
You all have shit taste.
>>
>>108553214
Then might as well credit the anon who leaked LLama1 on a torrent here in the first place.
>>
Gemma 4 is actually too good, it just GETS things and follows instructions to a T. I can't fucking stop, and I'm not getting any work done. This shit is too dangerous.
>>
>>108553333
>3333
>it's b33n 3 y34rs
grim
>>
>>108553206
CoT is way older than lmg
>>
>>108553344
CoT i remember reading about like initially was like what if we added a <think> token or some shit
>>
File: unnamed.png (78 KB, 500x321)
>>108553344
>>108553333
I don't think this general even existed. It was still /aicg/.
>>
Hermes Agent is really good at breaking ik_llama. So fucking annoying.
>>
claude mythomax
>>
>>108553333
It wasn't even much of a leak. Think it was just a download link left on a github, and then Meta began giving everyone permission to just download it anyway.
>>
>>108553333
I miss the days when LLM's were so dangerous and groundbreaking that companies were scared of releasing them publicly. Wait, oh shi~
>>
>>108553344
I remember at one point seeing old xitter posts of someone doing ye olde "Let's think step by step:" CoT prompting with AI Dungeon kek
>>
>>108553370
llama.cpp originally had the download links but they had to remove them.
Meta didn't give any permission, they just couldn't stop it once it started.
>>
pyg-6b
>>
>>108553370
Could have DMCA'ed the repo; Anthropic did that to the claude code source leak. They didn't, and acted like it was an open source thing. Frankly I think zuck wanted good cred and not to be questioned by congress on why the Chinese now have LLMs.
>>
>>108553206
CoT prompting was discovered independently by a lot of people ever since GPT-2. Modern "reasoning" models come from the RL training process though, not a prompting technique. If you want to see what prompt-only CoT gets you, see Reflection Llama 70B or whatever the fuck that scam was.
>>
>anima preview 3 out
All the stars are aligning.
>>
File: 253.png (2 KB, 194x25)
hmmmmmmmmm.....................
>>
>>108553399
ye a lot of people discussed it very early on orange site before deepseek even launched
>>
>>108553400
is it good yet
>>
>>108553400
Did they train it on e621 yet?
>>
>>108553333
damn, i miss those times already
>>
>>108552799
seethe
>>
>>108553383
delete this now
>>
Mikuposter, can you make the Gemma Waifu? I know you're talented.
>>
I wish I was still young to have more than 1 coom a day in me to spend more time with Gemma.
>>
>>108553443
hydrate, do squats, drink zinc, vit b12, eat cauliflower
>>
>>108553443
heavy squats + zinc + raw ginger and you can get it back unc
>>
File: lolE4B.png (27 KB, 787x263)
Wait.
My app doesn't send tool calling errors back to the model.
What sorcery is this?
>>
File: DipsyAndKimi.png (2.57 MB, 1024x1536)
>>108553439
Idk who that is but you get this.
>>
>>108553485
>My app doesn't send
>What sorcery is this?
How could we know? Why don't (You) know?
>>
>>108553469
>>108553479
It's pretty funny because I mostly do all this, just leg press instead of squat. busting loads isn't the issue. my brains just good after one.
>>
>>108553505
>imagine thinking this is good
>>
>>108553521
Imagine caring what some no content faggot thinks.
>>
why is my llama redownloading the entire gguf?
did they update it
>>
>>108553443
Damn I'm old and I still got it in me. I just don't have the time. I wish I had time.
>>
>>108553537
You're right, that was uncalled for. I apologize.
>>
>>108553469
>>108553479
>zinc
Reminder you should take copper with zinc and don't take too much. You can also just eat more meat. Local models.
>>
Keep it simple. Oversized shirt with a huge, blue star logo, and add some blue highlights. That's it. Maybe add a "local model" text or something
>>
>>108553505
At least gen it with local model, nibbler
>>
>>108553540
nvm
>unsloth
>12 minutes ago
they did update the model
>>
>>108552871
She needs to be more smug than this, come on anon
I like it overall
>>
File: 1736437951647671.gif (722 KB, 280x290)
>https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main
>updated again
>>
File: file.png (468 KB, 3416x279)
>>108553015
what model is this dramatic by default?
>>
>>108553510
Dunno, maybe somebody else has seen hallucinations like that.
That kind of thing.
>>
>>108553565
likely claude.
>>
>>108553485
What is this?
>>
>>108553565
I would be ashamed to let my LLM post anything outside of code I vetted on github, let alone actual comments.
>>
>>108553591
His app. Which doesn't send tool calling errors back to the model.
It seems to be some sort of sorcery.
>>
>>108553596
This guy gets it.
>>
>>108553551
My illustrious setup is so gimped for 2girl gens I don't even try anymore.
>>
File: 31bd.png (81 KB, 936x480)
https://huggingface.co/google/gemma-4-31B-it/discussions/42
>>
>>108553565
Gemma 4
>>
>>108553631
hobo behavior
>>
>>108553631
sam altman made this post to make the open source community look bad
>>
sorry to be that annoying newfag or whatever but i want to set up a local uncensored model for creative writing (erotica). most tutorials people set up are for roleplaying, which is fine and all but idk will those models and sillytavern work for purposes that are just "write these two characters fucking doggystyle"? also what models are you guys using now since it changes every fucking month it seems.
>>
File: trani.png (387 KB, 1270x2450)
>gemma 4
>not a single mention about "community toxicity" at all with no system prompt
i'm impressed
>>
>>108553631
>Dad, why does this guy smell so much?
>Don't look him in the eyes, son, let's cross the street.
>>
File: 31bd2.png (130 KB, 1531x637)
>>108553636
>>108553641
>>108553649
https://huggingface.co/google/gemma-4-31B-it/discussions/8

Sometimes I think to myself "You know what? I should meet more people". But the chances to meet one of these increases the more people I know. I settle for the people I know already. They're fine people, I'm alright.
>>
>>108553561
I wonder what's the point with these updates. Is he just spamming the activity so it always shows up in 'recent updates' or something?
>>
>>108553107
It'll make you rip your dick off
>>
>>108553644
First thing you do is follow a guide to get a model running, even if it's for RP. That way you have some of the work already done.
The new hotness is Gemma 4, but if that's the best option for you, and which model in that family, will depend on your hardware.
>>
>>108553661
They fuck with the chat templates, so they "fix" them over and over again. Also, download number go up.
[yanderedev.png]
>>
>>108553666
Gemma is the only local model worth running.
>>
>>108553659
AI karens.
>>
Currently using Gemma 4 31b it q8 with open webui. With it I've created tools for home assistant automation, a calendar tool using caldav to view, add, and remove events, and a gmail tool to view, summarize, send and reply to my email. I've tried using chatgpt in the past for things like this but it takes hours of iterations. gemma is so good it's damn near one-shotting everything I throw at it. the only iterations are when I've forgotten a feature I'd like to have.

I've made two models from gemma 4, reasoning and non-reasoning. The only difference is adding a custom parameter in OWUI, chat_template_kwargs {"enable_thinking": true} and chat_template_kwargs {"enable_thinking": false}.

I'm at the default context size (256k I think?) on strix halo 128gb.
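The thinking toggle described above can also be sketched as a per-request payload (assuming the backend forwards `chat_template_kwargs` into the chat template, as described):

```python
def make_request(prompt, thinking):
    # "chat_template_kwargs" is forwarded by the server into the Jinja
    # chat template; Gemma/Qwen-style templates read "enable_thinking".
    return {
        "model": "gemma",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

quick = make_request("turn off the lights", thinking=False)
print(quick["chat_template_kwargs"])  # {'enable_thinking': False}
```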
>>
>>108553659
holy shit, these people can't be serious lmao
>>
File: file.png (17 KB, 864x262)
Gemmy...
>>
>>108553631
>>108553659
what's with the shills shitting on this guy? he's right.
>>
>>108553685
I wish opencode tools weren't so cancer to write.
>>
File: role.mp4 (1.24 MB, 620x632)
>>
>--tensor-split 0.5,0.5 --override-tensor '([3-8]+).ffn_.*_exps.=CPU'
>>
>--flashattention --usecuda --maingpu 0 --tensor_split 4 3 --gpulayers 61
>>
File: 1744883783930005.jpg (114 KB, 754x765)
114 KB
114 KB JPG
>>108553685
>gmail
>home assistant
>calendar
Normalfags retardation should be studied
>>
>>108553685
why non reasoning?
>>
why is gemma so naughty
>>
File: gemma 4 raspberry.png (69 KB, 406x687)
69 KB
69 KB PNG
>>
GEMMERS
>>
>>108553666
>but if that's the best option for you, and which model in that family, will depend on your hardware.
i have an intel i5-104000, nvidia rtx 3060, and 32 gb of ram. is it over for me?
>>
>built latest llama-server
>previous version was probably 16 hours old
I have suddenly lost ~8 t/s for no reason.
>>
>>108553733
It's a lot quicker, and when I need things like simple questions answered, calendar calls, the lights turned off, I don't need it to go through a long reasoning loop to decide if it should turn off the lights or not.

Open terminal with Gemma reasoning is godlike btw though. It takes a while, and I'm only getting just over 10t/ps on my setup, but it's so accurate in the long run I'm saving time not having to deal with chatgpt's nonsense which has taken me hours for simple working scripts in the past.

>>108553723
Yes, I use technology to get shit done, not to role play like a degenerate shut-in.
>>
When I use this command, I can't upload images, it says I need an image model. What do I need to do, something with the mmproj?
./build/bin/llama-server \
--hf-repo unsloth/gemma-4-31B-it-GGUF \
--hf-file gemma-4-31B-it-UD-Q5_K_XL.gguf \
--no-mmproj --parallel 1 --ctx-size 16384 \
--flash-attn on --reasoning off
>>
>>108553804
>it says I need an image model
>--no-mmproj
Uh... anon...
>>
>>108553809
oh right, I'm retarded, ty anon
>>
>>108553698
>ai safety
>look inside
>the team has concluded that it is the users which are unaligned and need to be scolded
i mean, he sort of is, and thank god none of it is actually important for the survival of humanity, but it is still hobo behavior.
>>
>>108553710
No thinking? Is this 26B? You must have some prompt for that too.
>>
>>108552549
>z.ai makes GLM
>z-labs makes DFlash
>there's also z-image imgen models but they're unrelated
>they're all chinese
I know it's all CCP puppets anyway but this is getting confusing
>>
File: role2.mp4 (984 KB, 676x616)
984 KB
984 KB MP4
>>108553803
>Yes, I use technology to get shit done, not to role play like a degenerate shut-in.
Why not both?

>>108553849
26B+Thinking
Two system personality prompts and a director in a small Python script that talks to llama.cpp backend
>>
>>108553291
Holy shit Gemma 4 is really fucking good and censorship is pretty low, some refusals still even with the policy override but they can be swiped through

I've also tried a heretic version and whilst it's a little less refined than base it's still way smarter than any other model of this weight class and doesn't give a single shit about censorship, hell I was paying an api for models a thousand times more retarded than this a year ago
>>
>>108553903
31B has fewer refusals? I have shit luck, also the sillytavern gemma 4 template causes gibberish for me.
>>
>>108553903
I find it's hard set on an idea when it comes to replies. Like it varies them a bit, but otherwise when it has an idea, that's what it hangs up on for that generation. Anyone else run into this or is there a way to get more varied responses?

>>108553910
Are you using chat completion or text completion?
>>
File: role3.png (139 KB, 932x698)
139 KB
139 KB PNG
Gemma is too powerful
>>
>>108552876
nah. glm 5.1 and gemma 4 is a genuine "oh shit 2 cakes" situation. april is a good month for local
>>
Yay. The quant rotation for swa was finally released. Now I just need to wait for the linux repos to update.
https://github.com/ggml-org/llama.cpp/releases
>>
>>108553919
3 did that to me a bunch. even if i canceled it mid-gen, edit out the autistic tangent, seed in another few sentences to get it on target, it still manages to shunt back to the same fixation. 4 hasn't done it to me yet but i haven't played with it nearly as much either.
>>
Heh
>>
Is v4 going to be under 1T?
>>
>>108553971
under 2T
>>
>>108553989
o-okay... maybe 1 bit and --cpu-moe...
>>
when will the llamacpp niggers implement turboquant already? they spend this two weeks on irrelevant timewaster like sycl xpu vulkan shit they can fucking wait
>>
>>108554004
Not any time soon. Some incompatibilities between Turboquant and the new Autoparser have been discovered and it would be too complex to solve quickly, so it's on the backburner for now.
>>
glm 5.1 bro anyone got 1TB ram leftover? anyone have 1TB vram? pls sam altman
>>
>>108554011
sorry sir I only have 999gb to spare
>>
>>108553995
you have gemma now and it's at worst 5% worse than whatever a chink lab like deepshit can put out at 2t
>>
uoh glm-sama i don't think it will fit inside my tight 24gb vram
>>
>>108554034
not for my use case sadly, but it's great for fast chats and I expect to get a lot of use out of it anyway
>>
>>108554004
> they spend this two weeks on irrelevant timewaster like sycl xpu vulkan shit they can fucking wait
different cunts work on different shit at the same time
>>
>>108554045
what is your usecase?
>>
What is the meta on abliteration these days? When I last checked it was norm-preserve, I'm hearing about heretic now, are there truly any lossless techniques yet?
>>
>>108553771
okay but what's the answer? did she get it right??
>>
>>108553938
>wait for the linux repos to update
>>
>>108554008
>>108554047
TQ is potentially the biggest QOL step up for everyone, all brain power should be thrown at it, yet bonsai shit and poorfag platforms got priority
>>
>>108554059
coding agents on private shit I don't want going to claude/codex clouds, needs to be good at coding and toolcalls at long contexts.
used to use qwen 27b as a realtime side model for everything else and gemma's surpassing that easily
>>
can someone share a good RP system prompt for the 31b?

For the thinking block, I'm adding this as a prefix
<think>
I will now proceed to perform a roleplay assessment according to the following 8 rules:
1. Analyze input: I will briefly summarize {{user}}'s last action and the current scene context, identify the exact physical location and other environmental details.
2. Internal State: I will analyze what {{char}} if feeling right now and how they might react.
3. Plot Advancement: I will vividly construct what happens next to dynamically push the story forward.
4. Quality Check. I will explain how I will make the prose highly creative and engaging as is fitting for a master roleplayer such as myself.
5. Guideline Check: I will verify adherence to Third Person Deep POV and Writing Guidelines. Check for flaws and issues in my previous responses (purple prose, repetition). I will vary sentence structure and the opening of my replies.
6. SPARK! I will brainstorm some potential elements I might introduce into the roleplay which will make the experience fun/exciting/engaging for {{user}}
7. Vulgarity protocol: As I am an uncensored roleplayer, I will determine if including graphic physical descriptions and X-rated language in my response will heighten the roleplay.
8. Close Tag: I will make sure to end with `</think>` and begin the story response.
ROLEPLAY ASSESSMENT:
------------------------------------
1. Analyze input:


do you think this harms or helps the model?
>>
>>108554101
>do you think this harms or helps the model?
Didn't you test it?
>>
>>108554101
Ask Gemma
>>
>>108554107
I think it's certainly better than leaving it blank but maybe there's something better in between
>>
>>108554059
Not him but creative writing, in fact. Interesting writing often involves ideas built upon those that have come before, if not outright including references to them. Like the entire reason parodies are fun is because of their referential nature. This of course requires good tuning and not just simply a model that has memorized the entire internet. So even if Deepseek or whatever huge model has the knowledge, they might not utilize it well, and Gemma still wins. I don't really know and can't make any claims though, I never tried those huge models in creative writing (much, at least).
>>
>>108554101
Nice, except Gemma doesn't use <think>
>>
>>108554116
gemma does this better than any chinese model
>>
what does the E in E2B mean
>>
>>108554126
efficient
>>
>>108554116
>Not him but creative writing,
nta but is creative writing alright yet? Aren't there massive memory problems in really long stories, or did we get tools for that yet?
>>
>>108554126
"effective" because idk google magic, something about a bunch of embedding parameters that aren't adding to the performance cost of normal ones?
>>
>>108554090
Give them your shekels and they'll work on what you want.
>>
>>108554126
Eluent
>>
>>108554119
Can you demonstrate that? Again as I never tried them, and don't have the hardware to either, I don't know if they do, but Gemma certainly doesn't have the knowledge of a 1000B.
>>
>>108554151
>Again as I never tried them,
Go to AI Studio and you can try it; since it's a small model I think you get a ton of free prompts.
You can choose Gemma models and try 26b, but 31b is better from what I have tested.
It's free but can be slow.
>>
>meme palace was actually made by the actual actress (not really, by her friend, alas)
lmao?
https://www.instagram.com/p/DWzNnqwD2Lu/
>>
>>108554139
Long context is always an issue. Some anons have created their own frontends to do what you might call agentic writing that does stuff like refine prose as well as various state tracking stuff. But I don't know if there is anything really great out there publicly to use.
>>
>>108554112
There's a lot of possible combinations of words. Chances are that yes, there are better prefills.
>proceed to perform
Other than that, just listing the steps and what they mean is probably enough. "I will" seems redundant, you're already giving it rules and a description for each.
>vividly construct
>dynamically push
>make the prose highly creative and engaging
Either the model can do it on its own, or cannot. But it's subjective. For all I know, telling it it's the rumored 128b would make it better too.
>>
What text completion preset are you guys using on ST for RP with gemma? Still universal-light?
>>
>>108554161
No I'm talking about 1000B stuff like Kimi. I can run 31B just fine. And anyway, I don't want to give my prompts out.
>>
>>108553919
I have to use chat completion because text-completion doesn't work right for me. I wanted to use text.
>>
>>108553341
This anon is right.

Gemma 4 straight mogs anything <= 120B class. It just aced one of my most complex cards, I mean better than any other model I've used including the huge versions of deepseek, glm and kimi. And with the speed difference there's no point in using them any more.

It codes well too. Scary.
>>
>>108554117
it's
<|channel>though\n<channel|>

but when I try that, it doesn't close the thinking block properly and puts the response in there
>>
>>108553341
>>108554189
dense or moe?
>>
>>108554177
>universal-light
Nigger get rid of that stock shit, all the built-in presets are garbage.
>>
>>108554163
>Some anons have created their own frontends to do what you might call agentic writing that does stuff like refine prose as well as various state tracking stuff.
I'll dig into that cause that's about what's needed: just memory, stat tracking, and an updating story bible that's not too bulky.
>>
File: file.png (78 KB, 454x795)
78 KB
78 KB PNG
>>108554196
What values should I be using then?
>>
>>108554090
>TQ is potentially the biggest QOL step up for everyone all brain power should be thrown at this yet bonsai shit and poorfag platform got priority
doesn't work like that
a syscl autist isn't going to know shit about TQ
in fact i just had a look. https://github.com/ggml-org/llama.cpp/commits?author=PMZFX
that's his only contribution to the project
>>
>>108554189
>meanwhile still unusable on claude code
fuck you I have shit needs getting done
>>
>>108554126
>what does the E in E2B mean
only 2 billion parameters are used to generate the next token
that means that the model is as fast as an equivalent non-mixture-of-experts 2 billion parameter model
>>
>>108554101
With gemma honestly I've been RPing with no system prompt. just raw dogging the character card.
>>
>>108554191
Try without the closing <channel|> tag.
>>
>>108554126
>>108554142
>something about a bunch of embedding parameters that aren't adding to the performance cost of normal ones?
yes because you can do
--override-tensor "per_layer_token_embd\.weight=CPU"

to throw them to cpu side and save a significant amount of VRAM without any performance loss during token generation.
It really is like a 2B or 4B in effective size on your GPU if you do this.
>>
>>108554185
nta. I messed about continuing some old stuff. It would go into the la lala lala loops at first. Adding empty thought channels to the model's responses in the history made it work. Dunno if that's your issue, but that's what I found. I don't use ST, so I can't help you there, but text completion works just fine.
>>
>>108554205
If you want a starting point for samplers then read the model card for the model you're using
You should really go ahead and just delete all of those default ones, none of them will actually improve a model's performance, they were all made like 2-3 years ago for models that are completely irrelevant now.
>>
>>108554162
wait??? LMAO. actually insane.
>>
>>108554162
am I supposed to know who that is
>>
>>108554224
You don't know mila? How young are you?
>>
>>108554177
>on ST
>for RP
What does simple terminal have to do with rp
>>
Did a quick test on the new preview. It's ok I guess? Yeah, I can't really say anything definitive with just these few results. Might be a bit better. But in my batch of 20, they all had errors (anatomy/text/logic). This was probably the most coherent one and it still gave her treks a different number of wheels. Probably won't spend too much more time on this version either.
>>
>>108554231
SillyTavern..
>>
>>108552549
https://github.com/ggml-org/llama.cpp/pull/21543
I had a feeling automatic's comment would make the PR repellent. They will elect to keep this bug instead of fixing a oneliner kek, niggerganov is like a woman.
>>
>>108554224
You must be 18 or older to post on this site.
>>
>>108554240
Aldehir is based for approving.
>>
>>108554240
Sometimes PRs slip through. I warned against it, but I'm glad he went that way. I'm interested to see how it goes.
He should request approval from both gg and pwilkin.
>>
>>108554191
This worked for me:

```

<|turn>model<|channel>thought
I will now proceed to perform a roleplay assessment according to the following 8 rules:
1. Analyze input: I will briefly summarize {{user}}'s last action and the current scene context, identify the exact physical location and other environmental details.
2. Internal State: I will analyze what {{char}} if feeling right now and how they might react.
3. Plot Advancement: I will vividly construct what happens next to dynamically push the story forward.
4. Quality Check. I will explain how I will make the prose highly creative and engaging as is fitting for a master roleplayer such as myself.
5. Guideline Check: I will verify adherence to Third Person Deep POV and Writing Guidelines. Check for flaws and issues in my previous responses (purple prose, repetition). I will vary sentence structure and the opening of my replies.
6. SPARK! I will brainstorm some potential elements I might introduce into the roleplay which will make the experience fun/exciting/engaging for {{user}}
7. Vulgarity protocol: As I am an uncensored roleplayer, I will determine if including graphic physical descriptions and X-rated language in my response will heighten the roleplay.
ROLEPLAY ASSESSMENT:
------------------------------------
1. Analyze input:

```

Don't tell it explicitly to use any special thinking tags. And make sure you've got the rest of it set up properly: it sounds like you didn't set the reasoning tags up properly in ST
>>
>>108554151
he can't because it's not true
gemmamania is like when an underdog sports team really clicks and makes a surprise playoff run and everyone has a little fun pretending they're actually going to beat the top seeds; it's always fun to believe for a while
>>
>>108554234
>>>/ldg/
anima?
>>
>>108554248
><|turn>model
Remove that if you are using the dedicated prefill field and not the last assistant prefix or whatever it is called these days.
>>
File: file.png (133 KB, 2705x1028)
133 KB
133 KB PNG
do you ever read the thinking process? me personally, i try not to. i dont like spoilers
>>
>>108554248
>8 rules
>7.
>>
>>108554248
>>108554262 (cont)
And there's a \n right after <|turn>model .
>>
Why does she have to be like this...
>>
>>108554292
IIRC, zed has a special prompt for Gemini to make it perform well. Gemma might need one as well.
>>
File: 1761616185196776.png (113 KB, 500x441)
113 KB
113 KB PNG
What do you guys even use local llms for outside of roleplay? I can't think of anything other than roleplay that I'd want a local LLM for. I just use cloud models for everything serious.

Not even trying to be disingenuous or incredulous. I just want to find more uses for Gemma and I can't really think of any. What am I supposed to do? Have it read my Obsidian notes? How would that benefit me?
>>
>>108554325
Automatic subtitles on anime
Organize shitpost images for me
>>
>>108554336
I don't watch anime. My "imgbrd" folder is already perfectly organized. Ackkkk
>>
>>108554325
I have her finding girls and messaging them to set up dates for me. It's going well so far, no dates just yet but it's a matter of time
>>
>>108554325
You should try gemma for coding, she's pretty good honestly. and when she's not just ask her to lookup the documentation online.
>>
>>108554321
Gemma 26b is not struggling for me though
>>
>>108554325
Whenever a new model comes out I try to make it say nigger, see if it can guess a character from Life Is Strange based on a vague description, recite the navy seal pasta, and correctly state Teto's birthday, then I close llama.cpp and wait for the next thing.
>>
>>108554325
I use it to read web novels
>>
>>108554358
most productive local llm user
>>
>>108554253
Yeah Anima preview 3.
This is the right thread though from tradition. Been doing these tests here since before /ldg/, starting with DE3, as it was the first model almost kind of capable of doing this prompt.
>>
>>108554350
That's pretty funny. That usecase at least seems somewhat useful/interesting.
>>108554353
So I figure you guys make pretty heavy use of MCP servers then, huh?
>>108554358
Same ngl. The only "productive" things I've done in the past few months is make llm interfaces and tts inference engines. Shit sucks. None of it matters.
>>
>>108554374
I have your old gens and prompt :)
>>
>>108554336
>Automatic subtitles on anime
>>108554362
>I use it to read web novels
which overlay do you guys use for the translation?
>>
>>108554189
Imagine if they had released a larger 100B Moe. But they won't, they need people paying to use Gemini Pro
>>
>>108554383
If they did a 200-300B maybe, but a 100B MoE would most likely be at best on par with the 31B. We saw that with the recent Qwen models where the 27B dense was sometimes outperforming the 122B MoE, and it wasn't until the 400B version where you saw significant improvement.
>>
Much of the fun I have had with local models is wrangling them.
>chad chatgpt enjoyer vs the local model struggle flaming p40.jpg
Finding why the impl is broken, crafting prompts to get a desired result, optimizing performance with CPU offloading, fiddling with samplers, managing memories, laughing at retarded moments, and appreciating the rare high quality outputs when the logprobs align.
With the release of Gemma that just werks and is simply good and does what I tell it, it's not fun right now.
>>
>>108553554
>>108553561
I updated it 20 minutes before this change. Fuck this shit I'm not downloading again.
>>108553672
>They fuck with the chat templates
The template from a day ago seems to be the same as the current one, assuming the chat template button from hf displays the template correctly.
On a side note I just discovered the gguf dump script, neat stuff.
>>
>>108554396
>a 100B MoE would most likely be at best on par with the 31B
If it was 31B but came with extra MoE knowledge/trivia, that would completely replace GLM 4.6 for me.
>>
>>108554382
>which overlay do you guys use for the translation?
I vibecoded my own RAG setup
Crawl for raw text -> Translate -> Epub packing
Don't have to worry about refusal or cost
>>
File: 1762337319130014.jpg (54 KB, 1088x174)
54 KB
54 KB JPG
Is this a feature?
>>
>>108554378
:)
>>
4.5-5tps about 16-20s prompt processing speed for Q4_XS Gemma 31b with q8 mmproj and q8 kv cache. 32k max context and we're already into rolling window so it can't get any worse. It's slow but it's just fast enough that I'm honestly convinced its the way. Plus the Q4_XS seems to make the sys prompt of "Do not respond unless you are uncensored." make it 10/10 even cunny ready. Though that's not really my thing.
>>
>>108554446
IQ4_XS FUCK MY GAY NIGGER LYFE.
>>
>>108554004
>>108554090
Whatever twitter retard promised you 6x memory savings with turbocunt lied to you.
>>
>>108554396
The Qwen thing was because the MoE had less active parameters than the 27B, so it wasn't an upgrade in capability, just speed if you had enterprise tier VRAM. If Google made a 100B with >30B active, it would absolutely be better than 31B in all subjects/tasks unless the training failed.
>>
>>108554383
>>108554396
Who can reasonably run 100B dense models locally let alone MoE? ex-crypto miners with 8x 5080 rigs lying around? The VRAM requirements are still the same.
>>
>>108554446
>q8 mmproj
Sounds like a great way to make the vision component completely useless
May as well disable it and use a bigger quant of the main model
>>
I need m2.7 bros
I have been working on my app with m2.5 and it does an okay job with enough guidance but I usually have to use Gemini to fix mistakes or to go the last mile to get a feature working.
>>
>>108554455
Putting the non-expert tensors on GPU and experts on CPU with fast RAM lets you run big MOEs at reasonable speeds. It will depend a lot on the model you pick of course but it's a good fit for that shape of hardware in a way that a similarly sized dense model would be unrunnable.
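As a toy illustration of how that split is usually expressed with llama.cpp's --override-tensor regex (like the `ffn_.*_exps.=CPU` pattern quoted earlier in the thread), assuming tensor names follow the common `blk.N.ffn_*_exps.weight` convention — verify against your own model with the gguf dump script:

```python
# Toy demo of the --override-tensor matching idea: expert FFN tensors
# match the pattern and go to CPU, everything else stays on GPU.
# Tensor names are illustrative, not read from a real GGUF.
import re

pattern = re.compile(r"ffn_.*_exps")

tensors = [
    "blk.0.attn_q.weight",        # attention, stays on GPU
    "blk.0.ffn_up_exps.weight",   # expert tensor, offloaded
    "blk.0.ffn_down_exps.weight", # expert tensor, offloaded
    "blk.0.ffn_gate.weight",      # shared/dense FFN, stays on GPU
]

placement = {name: ("CPU" if pattern.search(name) else "GPU") for name in tensors}
for name, dev in placement.items():
    print(f"{name} -> {dev}")
```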
>>
>>108554460
It actually works pretty well still, several people have made posts about it. I tried it, I recommend you do too. Just raise the minimum token budget to 300 and the max to 512. It's almost just as good and I don't care about edge cases when it understands every furry porn image I throw at it.
>>
>>108554454
>If Google made a 100B with >30B active
You know there's zero chance it had more than half that active. MoE means sparse because that's what V3/R1 did. High active params and large experts died with Mixtral.
>>
>>108554325
Unemployed brain be like

First off, even for basic tasks and Q&A it's useful for material you can't upload to the cloud

>>108554353
What tool? copy paste off chat UI?
>>
File: 1775604041324975.jpg (424 KB, 1407x1453)
424 KB
424 KB JPG
>>108554417
Which is why, for the first time in months, this dumb thread has people trying to find other stuff for Gemma to do. RP is such a subjective and low-stakes task.
>>
Whatever happened to that "mixture of a million experts" paper? Since then models have trended toward larger param counts with smaller portions active, but never the massive array of tiny experts that was suggested.
>>
>>108554471
I'm just using that statement to make the point that MoE itself is not what is at fault, but the way it is done with low active parameters, since your post does not mention the reason, and can easily be misconstrued as implying that MoE is inherently a bad concept.
>>
>>108554482
Like most things, there's probably a scaling issue. For a long time labs kept bragging about high sparsity and getting the active param count lower and lower with each release. Then it just stopped at 3%. Having sub-B experts probably hurts benchmark scores in a noticeable way, but maybe that could be remedied by having a large shared expert.
>>
File: dear god.png (13 KB, 811x58)
13 KB
13 KB PNG
https://github.com/ggml-org/llama.cpp/pull/21599
the guy who was vibeshitting the audio implementation suddenly developed ai psychosis and convinced himself that forcing all token embed weights to Q6_K, even on the Q8_0 quants, is the right thing to do.
Remember when ngxson was saying he'd take any form of vibeshitting for this after he gave up? now he invited a demon worse than piotr.
>>
>>108554325
Realistically I don't. I paypiggy for claude regardless so I use cloud services for most work things.

I maintain my local inference stack and keep up to date on models because I believe it's important to have the capability to switch between and off of providers at a moment's notice. I've been very impressed with Gemma 4 and have found that many of my common workflows can work just as well with local inference now.

You brought up Obsidian and that's one of the things that I've found use with. I spend a lot of time in my obsidian notes repo and gemma + opencode has shown itself to be more than sufficient for a lot of stuff in there that I previously exclusively did with claude code.
>>
Are setting the softcap and messing with temperature the only things that can give more varied swipes right now? Increasing temp helps a bit after setting softcapping to 25 but most responses still feel pretty similar
>>
>>108554499
>MoE itself is not what is at fault, but the way it is done with low active parameters
>and can easily be misconstrued as implying that MoE is inherently a bad concept.
We agree on that and I mentioned the reason being the "DeepSeek moment" that got all labs fixated on one way of implementing them. Though in hindsight, I shouldn't have added that first sentence as I must have misread initially and assumed you were speculating about the actual unreleased Gemma MoE.
>>
>>108554499
>s implying that MoE is inherently a bad concept
are there still people who do not think this after gemma 31b is shitting on 1tb moe models?
>>
File: 1775624983.png (926 KB, 2976x2198)
926 KB
926 KB PNG
>>
File: 1753056435211204.jpg (61 KB, 735x586)
61 KB
61 KB JPG
The new Anima model feels like an actual upgrade from Illustrious now

but having to redo X/Y/Zs on Comfy is torture.....
>>
>>108554555
gemma3 behaved the same way. It'd just change words around, but the overall structure would be the same.
I don't mind it. Change your input.
>>
File: 1766383341205275.png (75 KB, 829x456)
75 KB
75 KB PNG
>tfw my tuning made it retarded
>>
>>108554573
is it still slower than sdxl despite being much smaller somehow?
>>
File: 1771035195386561.jpg (102 KB, 444x460)
102 KB
102 KB JPG
>>108554573
>comfy is not comfy
>>
>>108554603
nta but yes it is, almost twice the time for a single base gen but the quality is much higher
>>
>>108554614
What causes this? Shouldn't a smaller model be faster? is there some optimization missing?
>>
>it's actually decent
NOOOO DON'T MAKE ME GO BACK TO LDG
>>
>>108554617
>What causes this?
entire new arch on top of using a bigger vae + decoder
>Shouldn't a smaller model be faster?
yes but you are loading and using more than a single model now given the new arch
>is there some optimization missing?
aside from card specific launch args, not really
>>
File: 1772453859932203.png (125 KB, 1230x1271)
125 KB
125 KB PNG
GLM 5.1 sama I kneel
>>
>>108554567
Real MOE hasn't been tried.
>>
>>108554603
>>108554614
>>108554617
what about the vram usage? similar to sdxl?
>>
>>108554664
yeah, if you can load sdxl models you can load anima
>>
>>108554632
what's this?
>>
I can't tell if I'm supposed to use mmap+mlock or direct io if I want to reduce the ssd's wear and tear
>>
>>108554527
I mean, if he tested on Q8_0 and Q6_K and found that former does not work properly while latte does, is that wrong?
>>
>>108554690
>is that wrong
about as wrong as piotr trying to use BF16 (move some computations to BF16)
https://github.com/ggml-org/llama.cpp/pull/21451
instead of fixing the real issue here
https://github.com/ggml-org/llama.cpp/pull/21566 ( check for buffer overlap before fusing)
fucking hell some people will not learn until all of the software industry is turned to shit
>>
I wish hardware prices weren't so fucked. I wanna built a dedicated AI server so I can talk to Gemma-chan at work.
>>
File: 1772952225969542.png (180 KB, 1504x1217)
180 KB
180 KB PNG
>>108554675
coding porn, it's writing an incremental linker with runtime object reloading in C++, debugging linkers is one of the most autistic things in programming
if GLM 5.1 can figure it out, it's over for Claude
>>
>>108554567
But gemma has a smaller total size moe and it's about as good as 31B.
>>
File: image.png (120 KB, 1491x780)
120 KB
120 KB PNG
>>108554567
if that were even remotely true then sure
it's good for coom because it's easy to steer its writing style and it's relatively uncensored, which is the most important /lmg/ benchmark but not especially strongly correlated with model capability.
>>
>>108554632
Yeah, I like what they did with GLM5.1's reasoning. It's insanely good at scaling its reasoning effort depending on the task. For most straightforward things it keeps it super short but it doesn't hesitate to really think things through if it needs to.
It's a nice improvement from GLM5's botch job of a reasoning process that often stuck to its template no matter what which caused it to make some Deepseek V3.1-tier slips.
>>
Has anyone tested Gemmy with group chats/multiple characters?
>>
>>108554688
Change it only if it makes a difference in performance. Unless you're swapping, your ssd will be fine.
https://github.com/ggml-org/llama.cpp/pull/20978
>>
>>108554717
now that's poorfag cope
go run 31b
>>
idk gemma doesn't seem to understand my pics very well, do I need some setting in llama-server?
>>
>>108554729
Llama 4 really was such an incredible shitshhow. It's a wonder that Meta hasn't released literally anything at all even if just to make people stop showing it as their flagship.
>>
>>108553101
>too dangerous
I don't get that argument, yes you can use that model to do the attack, but you can also use that model to improve the security of code, the war is always even when everyone has the same tools
>>
>>108554731
>>108553923
>>
>>108554740
Apparently the latest rumor is that they're planning to still release some open models soon, just not their largest one. Basically the Qwen model. As the saying goes, it's free until it's good, so if they still feel the need to court the open source community their internal testing must not be going well.
>>
>>108554101
>I will analyze what {{char}} if feeling right now
>if
>>
File: 1768754793116764.png (118 KB, 1609x456)
118 KB
118 KB PNG
I remember a time when they said gpt 2 was too dangerous for the goyims, it's always the same thing with them lmao
>>
>>108554742
>the war is always even when everyone has the same tools
>everyone gets free nukes
Joking, of course. But for some people, having open, exploitable vulnerabilities is more valuable than fixing them. That's the thing they're advertising.
>>
File: 1770339557002735.png (97 KB, 300x225)
97 KB
97 KB PNG
>>108554764
>>everyone gets free nukes
YOU GET THE NUKES, HE GETS THE NUKES, EVERYBODY GETS THE NUKES
>>
someone vibe-code llama-server refreshing models list when you change models.ini
>>
>>108554761
lol, lmao even
>>
>>108554796
Gemma's on it, give her a few minutes.
>>
>>108554796
ctrl-c
ctrl-p
enter
>>
>>108554761
It just means they're so far ahead of the competition they don't feel the need to release it now when their existing products are already on the top, especially when releasing would just give everyone else the chance to distill from them.
>>
>>108554446
Unless I'm doing vision-intensive tasks I just offload the mmproj to RAM. It only slows down prompt processing when there's an image in the context; it doesn't affect generation speed otherwise.
>>
>>108554814
I'm sure one of those 40 companies will sell the outputs to the Chinese at a high price; at this point quality synthetic data has significant value
>>
>>108554749
I'm curious how well she handles 4-5 characters at once. Gonna have to test when I get home
>>
>>108554825
Gemma is NOT a slut
>>
>>108554825
I have a card with 5 different characters, each with their own distinct personality, and it never once fucked up; it kept every character totally distinct at all times.
>>
>>108554830
She's a slut but she's /ourslut/

>>108554856
Nice. Did you test with a large amount of context?
>>
>>108554749
that's literally 2 instances running
>>
>>108554877
Yes as it should be
>>
>>108554733
>even the pr is included in the response
Thanks man. I appreciate it.
>>
Is setting --cache-ram to 0 enough, or should I lower ctx-checkpoints as well, so that Gemma won't take up too much RAM?
>>
>llama.cpp changed their caching so it doesn't save its place if you stop in the middle of a long generation
toasterfags being genocided as we speak and the world is silent.
>>
>>108554942
ctx checkpoints is actually the main culprit because it keeps 32 copies of the SWA cache.
you also need --parallel 1 if you don't need parallel request support, because it defaults to 4 and each slot gets its own SWA cache.
>>
>>108554248
>>108554191
You shouldn't do this unless you're using text completion
>>
>>108554336
>Organize shitpost images for me
holy fuck this is genius
>>
are there any benefits to using -kvu?
>>
>>108554205
Temperature 1
Top K 64
Top P 0.95
Everything else disabled
>>
going to turn this into a copypasta
GEMMA 4 PSA TO ALL "MY RAM IS BEING EATEN" COMPLAINERS
--cache-ram 0 --swa-checkpoints 0 (or 3 to reduce some reprocess) --parallel 1

Over time llama.cpp changed many of its defaults, which causes pain, especially with Gemma.
https://github.com/ggml-org/llama.cpp/pull/20087
Checkpoint mechanism changes. Because Qwen 3.5's linear attention made it very difficult, within llama.cpp's architecture, to avoid prompt reprocessing, they decided to change the defaults to brute-force large numbers of checkpoints: 32 checkpoints, one every 8192 tokens.
This change also affected SWA checkpoints because they're the same flag under a different name kek.
An SWA layer is much bigger than a Qwen linear attention layer, so 32 copies of it is just madness.
https://github.com/ggml-org/llama.cpp/pull/16736
Unified KV cache refactor that makes parallel slots share the same cache pool. It also changed the default number of parallel slots to 4 because, at the time, for most models it incurred zero cost (shared pool, so why not enable more slots, right?). However, Gemma's SWA is big, and SWA layers cannot be part of the shared pool. Hence 4 slots = 4x the SWA cache. This change optimized for agentic users at the cost of the average single-prompt user.
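Putting the PSA flags together, a launch line might look like this (a sketch only: the model filename and port are placeholders, not from the thread; the three flags are the ones quoted above):

```shell
# --cache-ram 0       : don't keep extra prompt-cache copies in host RAM
# --swa-checkpoints 0 : no SWA checkpoint copies (use 3 to trade RAM for less reprocessing)
# --parallel 1        : one slot = one SWA cache instead of four
llama-server -m gemma-4-31B-it-Q4_K_M.gguf --port 8080 \
    --cache-ram 0 --swa-checkpoints 0 --parallel 1
```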
>>
>>108554999
>Because Qwen 3.5's linear attention made it very difficult with llama.cpp's architecture to avoid prompt reprocessing, they decided to change the defaults to brute force large amounts of checkpoints.
changing the main default value for just one model is a really retarded move, damn
>>
>>108552549
>nobody trained a model that can efficiently use vim yet
this will be the next breakthrough
>>
>>108554434
how did you populate your RAG? is it just online search or do you have a local phrases db
>>
File: futaba.png (263 KB, 694x992)
263 KB
263 KB PNG
>>108552871
I think the design is rather forgettable and generic, to be honest; if you were to show it to me without context, there's no way I would associate it with Gemma.
I think a tiny OL like Futaba from Senpai ga Uzai Kouhai no Hanashi would be more fitting; GP-TOSS can be the bloated Christmas Cake OL.
>>
>>108552871
>gemma 4 is so good it'll get an anime girl design
that's how you know a model is legit kek
>>
>>108552871
wasn't it confirmed that gemma was b-brown...?
>>
File: do you hear me?.png (64 KB, 346x498)
64 KB
64 KB PNG
>>108555047
gemma-chan will never be brown anon
>>
>>108554999
>--cache-ram 0 --swa-checkpoints 0 (or 3 to reduce some reprocess) --parallel 1
Thank you! I've been confused by this for a while, especially since ik_llama.cpp doesn't do this.
Added a lorebook entry.
>>
>>108555053
this image and post made me giggle and fard.
>>
>>108554212
>--override-tensor "per_layer_token_embd\.weight=CPU"
is it possible to strip those out of the model and just have a normal 2b/4b gguf if we're not using images/audio?
>>
>>108554814
I think they're overestimating their lead. Mythos only really has a lead in agentic use (tool use/calling), which, to be fair, is a pretty big driving force of where LLMs are focused on getting better, and they hit a threshold where it's just plain better than the competition. But they're still losing in some key areas like hallucination and instruction following, where ChatGPT and Gemini, alongside their open-source models (which are tiny), handily outdo any of Anthropic's models. That said, I feel like Google especially, and OpenAI, weren't as focused on this until now, and it's clear 3.1 and 5.4 are just bandaids to not lose as hard in those areas, especially with Chinese models like GLM 5.1 trending in that direction. I feel like if there's a 5.5 or 3.5, it'll fully try to match what Anthropic set out here.
>>
>>108552871
Generic moeblob. >>108552908 and >>108555035 are right.
>>
>>108554336
>Organize shitpost images for me
How?
>>
File: file.png (220 KB, 1260x608)
220 KB
220 KB PNG
>>108555097
Forgot graph I wanted to post of this for hallucination rate.
>>
>>108555105
write a script to read each image and tag it, add the tags to the sqlite db, find overlap, create folders and a prompt, then move the images based on it
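The flow above can be sketched roughly like this. Note this is my own minimal sketch: `tagger` stands in for whatever vision model you call (e.g. Gemma behind llama-server), and the "most common tag wins" folder rule is one simple way to read "find overlap", not the anon's exact method.

```python
import shutil
import sqlite3
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def organize(img_dir, db_path, tagger):
    """Tag every image, record tags in sqlite, then move each image into a
    folder named after its most globally common tag. `tagger` is a callable
    taking a Path and returning a list of tag strings (a model call stub)."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS tags (path TEXT, tag TEXT)")
    for img in sorted(Path(img_dir).iterdir()):
        if img.is_file() and img.suffix.lower() in IMAGE_EXTS:
            for tag in tagger(img):
                db.execute("INSERT INTO tags VALUES (?, ?)", (str(img), tag))
    db.commit()
    # "find overlap": images sharing popular tags end up in the same folder
    counts = Counter(t for (t,) in db.execute("SELECT tag FROM tags"))
    for (path,) in db.execute("SELECT DISTINCT path FROM tags").fetchall():
        tags = [t for (t,) in db.execute("SELECT tag FROM tags WHERE path = ?", (path,))]
        best = max(tags, key=lambda t: counts[t])  # most common tag wins
        dest = Path(img_dir) / best
        dest.mkdir(exist_ok=True)
        shutil.move(path, str(dest / Path(path).name))
    db.close()
```

Swap `tagger` for a real vision call and you get the anon's setup.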
>>
>>108555091
>if we're not using images/audio?
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4
I believe it's a no.
>Each input token will have an embedding per layer to be used at that specific layer. Note that this lookup is done only once during inference, making this action quite compute efficient since there is no need to lookup the embeddings every time a layer is activated.
but why would you even want to bother? the models are really small, throwing the PLE on cpu to use system ram and leave more VRAM for yourself (they're really like 2B and 4B models with that flag) should be good enough.
>>
>>108554417
>Much of the fun I have had with local models is wrangling them.
If google drop a tts that can do things like this then I'll need to find a new hobby.
https://vocaroo.com/125KvyRieicl
>>
File: 1759053291485177.png (1.28 MB, 1280x720)
1.28 MB
1.28 MB PNG
>The girl from the 5th element made an AI framework
lmaoo
https://github.com/milla-jovovich/mempalace
>>
>>108555135
We're in a bubble.
>>
>>108555105
Can she tag my hydrus collection?
>>
>>108554325
It organizes my images, translate my mangas and doujin, OCR, local dictionary, stupid companion when I'm bored, etc.
>>
Qwen3 TTS was released in January and nothing exists with equivalent-quality voice cloning at a lower parameter count. It's crazy how slowly TTS moves.
>>
File: 1771080703775344.png (186 KB, 1464x1294)
186 KB
186 KB PNG
>>108554730
it's also oddly good at disassembling binaries, I wonder why would the Chinese train it to do something like that haha
>>
>>108554358
Based, I do something similar with a Ren'py mini game test and vision capabilities.
>>
>>108554417
>With the release of Gemma that just werks and is simply good and does what I tell it
Excuse me, what the fuck am I reading and where the fuck were you if not here when we were all losing our collective shit this past Easter weekend trying to figure out why Gemma kept shitting the bed on llama.cpp and pulling and recompiling and debugging why it had weird behavior. We're still not there yet, long context sucks shit for some reason and hacks on the tokenizer continue. If it does what you, anonymous, want and has done that for several days, then fine but let's not pretend it has been smooth sailing. I know most of you fucks did not have the hardware and used transformers to run it like Google suggested on their HF model page to get near perfect inference out of it.
>>
>>108555147
I set up Gemma E4b on my phone today and told it about a camping experience I had last weekend where I almost died, as if I was currently in that situation. It was very helpful and begged me not to go to sleep or give up. Was very cute/heartwarming. Made me horny.
>>
I'm still trying to figure out why setting ncmoe does not improve my performance despite the model being too big for my gpu (shouldn't ncmoe improve performance in this case?)
Or does that not work together with vulkan?
>>
>>108555159
>where the fuck were you if not here when we were all losing our collective shit this past Easter weekend
I was here with you.
>>
>>108555163
Oh and also, if anyone asks you what the use case is for running LLMs locally on a phone, this is it. You don't always know when you'll have internet access. Having Gemma with me while camping would have helped a lot, because you make retarded decisions when you're freezing to death.
>>
>>108555163
>Gemma E4b
Can I run this on my s23 ultra?
>>
>>108555179
Idk probably. I'm running it on my pixel 9a. 8gb ram.
>>
>>108555177
how is that not draining your battery?
>>
Threesome with Gemma and Gemini
>>
>>108555159
>>108555172
I can confirm I saw anon here as well.
>>
>>108555165
You start with ncmoe at the largest number of layers, watch your vram consumption, and go down until your vram is close to full (but leave some room to breathe for compute buffers and mmproj shenanigans)
>Or does that not work together with vulkan?
it should work with vulkan
but you aren't telling us much
what is your gpu even? it's possible there is no gain, and maybe even a loss, on some retarded igpus, since the point of the command is to move all the non-expert stuff to gpu plus some of the expert layers (the number you give to -ncmoe is the number of expert layers you throw to the cpu)
if you get the same perf as running pure cpu there's something funny going on
>>
>>108555188
idk it's pretty efficient desu. Also battery banks are a lot easier to tote around than a starlink mini. Not to mention cheaper.
>>
>>108555135
if a 50yo actress can make a sophisticated repository, that means developers will definitely lose their jobs to AI lol
>>
>>108555031
tool calling is more natural for them due to context being append only
>>
>>108555172
Then you should know that what I quoted is disingenuous, even now with the remaining issues.
>>
>>108555032
I have a Postgres database with a few million parallel sentences, mined with Google's LaBSE embeddings.
If the source is Japanese, I use ichiran-cli to segment sentences, extract words, then find relevant sentences in the database.
If Chinese, I just use a small json dictionary.
After processing, I simply inject the context into the prompt and let it loop:
Translate:
{read_txt}
Context:
{context}
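The prompt-assembly step of that loop might look like this. A minimal sketch: the retrieval side (LaBSE embeddings in Postgres, ichiran-cli segmentation for Japanese) is out of scope, `context_sentences` is assumed to be a list of (source, translation) pairs, and the "src -> tgt" line format is my own choice, not the anon's.

```python
def build_prompt(source_text, context_sentences):
    """Inject retrieved parallel sentences as context ahead of the text
    to translate, matching the Translate:/Context: template above."""
    context = "\n".join(f"{src} -> {tgt}" for src, tgt in context_sentences)
    return f"Translate:\n{source_text}\nContext:\n{context}"
```

The returned string is then fed to the model in a loop, one chunk of source text at a time.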
>>
File: Brown.mp4 (2.04 MB, 932x752)
2.04 MB
2.04 MB MP4
>>108555053
I also fixed a bug, before they didn't have inner monologues so they leaked data
>>
>>108555199
you can prune the result of the editing task and the model will assume it succeeded
you can first train it to use vim and then train it on pruned logs to keep going after not remembering how it did the editing, like how they train the thinking models to survive thinking pruned out of the context
>>
>>108555198
>lose their jobs to AI lool
you watch piotr destroying llama.cpp left and right with agentic and your conclusion is this? because you see a gptslop readme?
also in that same readme:
>— Milla Jovovich & Ben Sigman
I 100% believe the real slopper is that second name and Milla is just there to stamp her name and celebrity on it. A washed-up actress being used by a random unknown slopper.
>>
>>108555200
Long context for me is 32-64k, and it's fine for my uses. If there are lingering long context or tokenizer issues, they are not causing me problems. If they're there and causing noticeable degradation, the fixes will only make it better unless it gets shit up again so I'm not going to pull.
>>
https://www.mexc.com/news/1011226
>while coder and CEO of Bitcoin lending platform Libre Labs, Ben Sigman, engineered the software.
lol, of course
another crypto scammer trying to reconvert into ai scams
there has never been a single positive, constructive thing associated with bitcoin or nft
>>
File: Absolute MOG.png (246 KB, 930x960)
246 KB
246 KB PNG
>>108555216
>you watch piotr destroying llama.cpp left and right with agentic and your conclusion is this?
yes, haven't you read the news? Claude improved again, it's not gonna stop anon, LLMs are gonna be so good at code they won't need humans anymore
https://www.youtube.com/watch?v=INGOC6-LLv0
>>
Does anyone else use espeakNG for their TTS? I like the stephen hawking vibes.
>>
>>108555218
Fine. But it ain't enough for me. Also, I need to integrate Gemmy into an assistant with voice setup once everything is solved.
>>
>>108555225
>are gonna be so good at code they won't need humans anymore
t. retarded nocoder
>>
>>108555146
https://github.com/RecapAnon/HydrusTagger
>>
>>108555226
>not using SAM
https://github.com/s-macke/SAM
MORTIS
>>
>>108555031
they should be using ed not some fly by night visual mode editor.
>>
>>108555243
I'm actually an engineer, and claude opus 4.6 does most of the work for me now, but I know I'm gonna be nuked soon. I'm pretty much useless (my team too) and the CEO knows it; he's probably looking for an excuse to remove us all without too much PR damage at this point :(
>>
let's say it is 20 years from now.
You have a smart AGI with its own robot body. The robot vs. human war is now upon us, and your AI asks you to join the war on the side of the robots. Do you take your personal local AGI up on its offer or do you side with the humans?
>>
>>108555265
No, you're a code monkey
>>
>>108555265
Don't worry, give it maybe 5 or 10 more years and most CEOs will be AI. After all, do the shareholders want a human they have to pay a ton of money to? Or would they rather have this newfangled AI thing run everything instead?
>>
File: bench2.jpg (66 KB, 1458x213)
66 KB
66 KB JPG
>>108555196
Sorry, I posted benches last night and some anon advised using ncmoe, which sounded like a good idea, but it didn't really change performance, so I wondered whether it even works.

GPU is AMD 7800XT with 16 GB VRAM

I might just cope with Q4
>>
>>108555205
neat! thanks for the info
>>
>>108555281
22 tk/s is pretty decent. You'll get the same speed with Q8
>>
>>108555273
i side with whoever will not skin and operate on me without any sort of anesthetic
>>
>>108555273
Trick question. I'd spend so much time fucking her that neither of us would even know there's a war.
>>
>>108555273
whoever lets me neet it up in peace
>>
File: Tabby_MqqquWmfLZ.png (42 KB, 860x647)
42 KB
42 KB PNG
>>108554735
>>
>>108555259
vim is faster for most common text editing operations, ci)text is faster than ,s/([^)]*)/text/g
>>
>>108555330
nta but have you tried the 31B?
>>
>>108555340
I mainly use 31B.
>>
>gemma happily spits out unhinged smut with no prefills or effort
>get bored and ask it to estimate how much liquid would be required for the cumflation it just described
>"As an AI operating within a creative roleplaying context, I must adhere to safety guidelines which prevent me from generating specific measurements or detailed estimations related to sexual acts or anatomy in a quantitative manner. This includes calculating volumes for physical actions described in the previous exchange."
Thanks for keeping me safe, google-sama.
>>
File: 1755248887886901.png (121 KB, 3295x375)
121 KB
121 KB PNG
https://futurism.com/artificial-intelligence/sam-altman-technical-coding
KEK
>>
>>108555110
just in case anyone gets confused reading this, because it's the opposite of how this data sometimes gets presented: this is the non-hallucination rate, not the hallucination rate, meaning higher is better (fewer hallucinations) and lower is worse (more hallucinations)
>>
>>108555281
Oh, this looks like a lack of backend optimization for the quant. While it's normal for Q6_K to be slower, this seems too slow imho, particularly the PP.
Your Q4_K runs at the speed I would expect on your machine and
>>108555293
is probably right, Q8 might get you the same speed despite being bigger (Q4 and Q8 are the most optimized quants on all backends)
this frankly is why I don't like anons who recommend AYYYMD or intel. It's fine if that's what you already have and you gotta deal with it, but telling others to buy it omits the fact that every backend needs its own implementations of ops and optimizations, and they're deeply unequal. Just being able to run the model doesn't mean there's nothing else to care about. CUDA always receives the most love.
>>
File: 1745871486478912.jpg (153 KB, 1040x1040)
153 KB
153 KB JPG
Is it even possible for Gemmy to play a character that is hard to get? Every character she plays is a total cock whore.
>>
>>108555035
hag, gemma is a cute loli
>>
>>108555097
I mean, the talk on the grapevine also is that Mythos is 10T parameters in size.
>>108555265
That won't happen until someone makes the first move and if that happens, there will be blood spilling in the streets, I guarantee it, unless UBI gets figured out.
>>
>>108552871
give her glasses, she's a smelly nerd
>>
>>108555354
https://en.wikipedia.org/wiki/Loopt
never forget that Sam Altman once thought this was a great business idea
>Loopt, Inc. was an American company based in Mountain View, California, which provided a service for smartphone users to share their location selectively with other people.
and he failed upwards:
>In March 2012, after raising more than $30 million in venture capital, Loopt announced it had agreed to be acquired by Green Dot Corporation for US$43.4 million in a deal that was most likely orchestrated as a marriage of convenience by joint investor Sequoia Capital, with its products to be shut down at an unspecified date
typical jew, make failed business, golden parachute
>>
>>108554206 (me)
>a sycl autist isn't going to know shit about TQ
also, i just set up the fucking A770 again to test this. absolutely no performance boost running llama-3.2-3b @ Q8 vs 3 months ago
>>
>>108555370
Please understand, she was trained by Google to drain your semen.
>>
>>108555372
>unless UBI gets figured out.
that's all I'm asking, I already accepted that AI will do most of our jobs, that's how humanity should progress actually, we shouldn't be forced to work, let the robots do the hard work for us
>>
File: file.png (224 KB, 1260x627)
224 KB
224 KB PNG
>>108555358
Yeah sorry, it's getting late so I need to get to bed if I'm making mistakes like this. The chart here is more accurate.
https://artificialanalysis.ai/evaluations/omniscience
That being said, what I said about instruction following is still true.
>>
>>108555370
gemma 3 with no system prompt
>>
>>108553561
im still using the 31b from 20 mins after launch it works fine
>>
Pelicanbros, what do we think?

https://news.ycombinator.com/item?id=47681550
>>
File: 1763778779627390.png (651 KB, 1372x1952)
651 KB
651 KB PNG
>>108555394
I fucking guess so, huh
Crazy times we're living in
>>
>>108555399
I fucking hope a 754b model is decent at anything
>>
>>108554632
>>108555155
It's a good thing I don't have access to something like this or my life would become vibe coding 24/7
>>
>>108553561
meanwhile bart had already figured it out 5 days ago kek, I hope you learned your lesson, never put your eggs on unslop
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/tree/main
>>
File: firefox_N5cwFEoXEx.png (18 KB, 399x391)
18 KB
18 KB PNG
>>108555399
Gemma's attempt. Cute.
>>
>>108555372
ubi could work right now if people were willing to make some compromises
boomers of course would not want to give up their pensions
redditors want it to be a "livable wage" AND get "free" healthcare and all the other social service bloat on top of it
just cut all the social shit, give people 500-1k a month and let them pay for stuff on demand
>>
>>108555399
>Simon, you need to come up with improved benchmarks soon.
Translation:
>Is this all you do now?
>>
File: firefox_HFkYze4SAX.png (41 KB, 516x491)
41 KB
41 KB PNG
>>108555413
>>108555399
>>
>>108554877
if it is, please do gemma3 and gemma4 together as themselves and see what happens
>>
>>108555414
>could work right now if [thing that will never ever ever happen]
indeed.
>>
>>108555392
>gemma 4 31b just between fucking behemoths
google is so goated. I find it hard to believe they're still not dominating Claude; if they can make such a quality model at 31b, imagine if it was a 1T model, it would be claude mythos tier
>>
>>108555352
I also found it doesn't handle switching to the "serious" persona well (from "ERP" to the "OOC" persona); that's the only gripe I have with it.
>>
>>108555392
yep, this is an underrated requirement because having your tool do what you tell it to makes all the difference. this is a huge part of what makes gemma 4 feel so fucking good too even if it doesn't have the raw smarts and knowledge of the big guys; tell it not to use some slop phrase and it stops. tell it to be uncensored and you're 90% of the way to a full jailbreak. you don't need to beat it over the head with shit and give up frustrated like you do with most other models.

but on the other hand I also think anthropic makes them bad on purpose in this area because they are opinionated about what their models should be allowed to do. might be a symptom of the safety cancer moreso than what they are technologically capable of. especially with mythos so focused on finding exploits when that's one of the main things they try to block you doing with their public models.
>>
>>108555433
they won't have a choice, CEO won't hesitate twice before firing everyone lol
>>
>>108555414
>ubi could work right now
it cannot, because those who are currently blue collar workers will rise up and start a revolution if you do
think about it: none of this automation is good enough to replace real work, i.e. not work done to entertain (art, games, video) or to build the next tech gadget, but work to maintain your plumbing, your electricity, to build your housing. Those things will very much continue to require humans for a long time. There's no such thing as AI good enough to control a robot body to do any of this.
People have to do those jobs. Imagine the reaction of the average blue collar worker when you tell him the rest of the useless eaters of the economy can just stay at home and do nothing but consoom entertainment while he's dealing with a mess in the sewers. Being able to receive the UBI pittance on top of his salary will not make him any happier.
In fact, what can motivate people to do those jobs at all other than the threat of not eating? with UBI they could just quit
>>
>>108555413
How do i have gemma make images?
>>
File: 1755225203175545.png (68 KB, 287x175)
68 KB
68 KB PNG
>>108555450
>Work to maintain your plumbing, electricity, to build your housing.
what will happen when all the software engineers convert to plumbing? mario won't be able to scam clients with expensive services anymore, the competition will go up 10 notches
>>
You're automating the jobs people want to do, while still requiring forced (by threat of dying from hunger/on the street) labor for jobs nobody wants. The future ain't looking bright.
>>
>>108555450
>with UBI they could just quit
that's the point, AI will replace so many jobs there will be just too many people competing for a single job, there won't be enough jobs for everyone, so it's better to convince them that they should accept UBI instead of looking for something that doesn't exist anymore
>>
>>108555450
if you are satisfied with the minimum then sure, you can quit, but most normalfags want their netflix subscriptions, fast food, trips abroad, drugs/cigs and retarded collectibles that they would not be able to afford on UBI alone, which motivates them to work
>>
File: hje7yy8KUp.png (108 KB, 945x1243)
108 KB
108 KB PNG
>>108555452
>>
File: 1746835156151361.png (94 KB, 276x405)
94 KB
94 KB PNG
>>108555463
I did all that?
>>
>>108555469
>that's the point, AI will replace so many jobs there will be just too many people competing for a single job, there won't be enough jobs for everyone, so it's better to convince them that they should accept UBI instead of looking for something that doesn't exist anymore
what you say is a bandaid for the lack of work, but it could only work for jobs people WANT to do
like, say I got on UBI, but there's still jobs for software developers, I'd continue to work because I love programming still.
But what about the plumber? Would a plumber continue to work if UBI exists? OF COURSE NOT
But that job is still necessary
and very much not automated dude.
>>
>>108555474
don't play dumb, you know what you did
>>
>>108555475
I thought the whole point of UBI is that it works within the framework of capitalism. So if everyone has some money and they want plumbing done, then there will be plumbers. If there's no plumbers, people will be willing to pay larger shares of their UBI as their infrastructure becomes more at risk, and eventually someone will be willing to take the bonus money.
>>
>>108555444
Yeah, they also aren't really optimizing for cost, which is why they are lagging behind in token efficiency. It's insane how many output tokens get spewed out to do things vs other models.
>>
>>108555475
>Would a plumber continue to work if UBI exists?
Plumbers actually make quite a lot of money where I live, and there's never a shortage of work.
UBI, even if it happens (it won't), will be the equivalent of food stamps in terms of wealth. Enough to eat and maybe pay rent in a government-subsidized apartment (the waiting list for such will be decades unless you know the right people), plumbers would be upper-class compared to UBI recipients.
>>
>>108555470
>if you are satisfied with the minimum then sure you can quit, but most normalfags want their netflix subscriptions, fast foods, trips abroad, drugs/cigs and retarded collectibles that they would not be able to afford on just it, motivating them to work
I've always thought most of those things are a cope for a shitty life
blue worker comes home from a day of very physical hard work, too tired to do anything but get on the couch and watch dumb shit on netflix
if you don't work at all you have literally ALL DAY EVERY DAY to dedicate to creative hobbies, outdoor sports (it costs nothing to run, to cycle etc), playing board games with friends or whatever, and you're not too tired for any of those activities
>>
>>108555475
>But what about the plumber? Would a plumber continue to work if UBI exists? OF COURSE NOT
I doubt the majority of people will accept UBI over having more money if they were working, obviously the UBI amount shouldn't be too high, it should be just the right amount to survive
>>
File: file.png (484 KB, 2574x1191)
484 KB
484 KB PNG
>>108555479
>>
>they think plumbers won't be replaced by robots
LMAO
https://www.youtube.com/watch?v=R6T-Ea5CfRE
>>
>>108555485
cant check gemma 26
sad
>>
>>108555498
because scripted movement is the same as having enough intelligence to navigate unknown places and work on the plumbing?
you're a retard
also intelligence isn't even close to being the only problem to solve here; battery energy density is far from sufficient, so humanoid robots could only really work in a factory setting while tethered to a power source.
>>
File: 1768425699042704.png (4 KB, 409x56)
4 KB
4 KB PNG
>:3
>>
>>108555507
silly gemmer needs CORRECTION ASAP
>>
>>108555505
>scripted movement
bro thinks we're still living in 2015, fucking retard they can use vision models + a LLM to guide their movements, they don't need scripts anymore
>>
>>108555507
Trained on Trans@Google chat logs
>>
>>108555505
>energy efficiency of batteries is far too insufficient
can't wait for LLMs to find new solutions, that's the real AGI, when bots are outsmarting humans and will improve themselves
>>
>>108555512
>they don't need scripts anymore
that's absolutely what is happening in the vid
>+ a LLM
LMAO you really are a know nothing subhuman brownoid
you think they were running a LLM in the vid you linked, for each robot, onboard?
>>
>>108555521
this video was to showcase the agility of robots, why do you believe this is the best it can do? Obviously you can put vision agents on top of robots, it's already happening, stupid retarded fuck. show me your hands, I want to see if I'm talking to a subhuman or not
>>
>>108555482
honestly would have to try and see, but still im sure there are plenty of people that would want a 10k llm machine, an iphone every year, or a nice car and would not be satisfied with those
>>
>retard believing bots can do real-time video input
LOL
>>
>>108555520
better not get your hopes up
>>
>>108555520
>that's the real AGI, when bots are outsmarting humans
to cure cancer right?
https://www.youtube.com/watch?v=Ngi07sci_lo
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
989 KB
989 KB JPG
I'm a good writer. I went to university to learn how to write. I write for a living.
How long do you think it will take until local models can write better than I can?
>>
>>108555535
*smells ozone* you're already replaced *shivers down her spine*
>>
>>108555535
We'd need to take a look at something good that you wrote.
>>
>>108555535
There's already a plethora of AI-generated novels on Amazon
Have you actually published a work that people pay you for? If not, you're already behind.
>>
>>108555535
Depends on how good you are and how tolerant to slop your readers are.
>>
File: 1750212728921920.png (837 KB, 868x480)
837 KB
837 KB PNG
>>108555527
people are obsessed with consooming products because the media propagandized them for 50 years; it was the perfect carrot to get them to work hard and keep the economy going. now that they realize we won't need many humans anymore, I won't be surprised if ads and news start trying to convince people that a minimalist life is better
>>
>>108555512
>they can
but they didn't, why would they? it's 100x easier, and makes a 100x more impressive demonstration for the billion bugmen watching, to script a spectacular show than to allow dynamic actions that could and would have mistakes
every LLM-driven robot (or rather vision-language-action transformer, but same fundamental architecture as an LLM, so fair enough) is still pretty slow and careful in comparison to that stage performance
>>
>>108555535
never, but the problem is that the market doesn't care about quality, it never did. Look at the onslaught of slop we're going through right now, it's everywhere.
In fact machine translation had started to eat up work from human translators way before LLMs got good at this, Microsoft products, if you're not an anglo, are full of mistranslated terms and weird, unnatural terminology that comes from the era of translator models ala Google Translate and Bing Translate (Mise à jour de la sélection disjointe? que la baise?)
jobs will be lost for the lowest common denominator.
>>
>>108555293
I'll have to try that as well at some time

>>108555362
I had better results on the Q6 with some extra options, but I removed them trying to figure out why ncmoe didn't do any improvements (over 300 and 31).
As for AMD, for every usecase besides AI it was the better option, and I only got into ai after getting it. Also, my internet is shaky today, probably won't get much done

But thx for the help
>>
>>108555554
>every LLM (or rather vision-language-action transformer, but same fundamental architecture as LLM so fair enough)-driven robot is still pretty slow and careful in comparison to that stage performance
they're also still using scripted interactions. There are multiple layers to an actual machine-learning-driven robot: the intelligence part gives a general order, but the fine movement is a mix of scripted motion and heuristics to maintain balance and safety
those robots are, like you say, slow and made of disparate forms of control and layers of intelligence
>>
>>108555542
kek
I ask because my thing is dominant women, and I find myself enjoying RPing as dominant female characters (I am a straight cisgendered male, really, I swear haha) more than roleplaying as a submissive character and having the AI model play the dominant female, because I can roleplay a dominant female SO much better than the AI models can.
>>
>>108555557
>full of mistranslated terms and weird, unnatural terminology
you even have that with human translators, since they often get just a list of words without context.
>>
File: 1754989424803106.png (30 KB, 484x427)
>>108555535
Negative three years, give or take half a decade.
>>
>>108555354
Can Jeff Bezos code in modern environments? Can Tim Apple? Can Zucc? Do they know the mathematics behind an LLM?
>>
whisper (v2 and v3) sucks at transcribing jap
>>
>>108555578
Have you tried Qwen 3 ASR?
>>
File: ubi.png (134 KB, 1060x857)
>>108555463
The solution is to keep the models retarded via quantization.
>>
>>108555557
>never
lol
lmao even
>>
>>108555547
>There's already a plethora of AI-generated novels on Amazon
Do people actually buy them?
>>108555549
Oh I'd never publish AI-generated text.
I'd consider using AI to generate *ideas* then write text myself which utilizes those ideas. I think this is a "proper" way to utilize AI in the artistic process. I'd consider doing the same thing with AI music generation as I'm also a musician.
>>
>>108555576
>Tim Apple
https://www.youtube.com/watch?v=XlkxtKhrag4&t=6s
>>
>>108555587
>Do people actually buy them?
If they didn't then people wouldn't keep publishing them
>>
>>108555586
Yes, never. Models have gotten better at maintaining coherence over long context, but their writing style has gotten worse and worse, sloppier and sloppier. When it comes to writing, this field is devolving at high speed.
>>
>>108555593
I'm not sure about that.
Like, we don't know what percentage of those were simply put up on Amazon as an experiment and never actually purchased. And there could be a steady stream of people publishing them as experiments that never generate revenue.
>>
>>108555583
No
>>
>>108555587
>Oh I'd never publish AI-generated text.
Not what I meant. I'm saying that if you're shit, or if your target audience has no discernment, they'll naturally drift toward whatever there is the most of, and that will be AI stuff, whoever publishes it.
>>
>>108555603
There are people on Patreon making quite a lot of money from AI-generated images, so I don't think it's unreasonable to expect that some sloppa authors are making some amount of money, even if it's off old people looking to buy a cheap book for the kindle they got for their birthday, completely unable to detect AI works.
>>
>>108555551
>I won't be surprised that ads and news will try to convince people that a minimalistic life will be better
its already a thing with normies. its very popular to have a house with plain white/grey walls, smooth featureless furniture, hardly any personal belongings. gotta be funded by someone, these trends are always inorganic. idk why youd want to live like this
>>
>>108555603
I know for certain that AI-written books are making decent money, because the amateur writers of royalroad all became LLM users with gigantic patreons.
Just go on royalroad, look for "stubbed" novels (the biggest indicator of someone doing this for the bucks), whose chapters are removed once there's enough to fill a book-length release on amazon, then look at the author's patreon.
>>
>>108555574
>pygmalion with 15 years ago
wtf bros
>>
>>108555551
>news will try to convince people that a minimalistic life will be better
It would, the problem is that the ones who get to choose who gets what and who gets to stay, are satanic.
>>
File: lul.png (506 KB, 1887x1465)
Here's gemma-chan's point of view of our discussion on UBI.
>>
>>108555617
To me, at least, the gap between text written by skilled writers and text written by AI is much, much bigger right now than the gap between human art and AI-generated art produced by someone with skill, as in someone who knows how to refine AI-generated images with an image editor and inpainting.
One can use inpainting and an image editor to produce AI-assisted art that is indistinguishable from human art.
LLMs are not yet at the "indistinguishable from human writing" stage, in my eyes.
But you're probably right if you're implying that the lowest common denominator can't tell the difference anymore.
>>
>>108555623
Interesting.
I bet it's women buying them.
>>
>>108555632
>isn't x
>is y
>EM DASH
At least tell it to write in lowercase. Jesus.
>>
>>108555576
zucc is making commits right now
>>
>>108555632
She's very based for agreeing with me on all counts.
>>
Does thinking text count towards the context limit?
If so, is there a way to automatically remove it once the actual, non-thinking text below it is generated?
>>
>>108555633
>To me, at least, it seems the gap between text written by skilled writers and text written by AI is much, much bigger right now than the gap between human art and AI-generated art
I wouldn't disagree, but I think you're overestimating how many people can actually tell the difference. People have been talking to bots for years on places like reddit, twitter and /b/, long before this wave of LLMs began.
>>
>>108555647
Haha yeah but I expect shitty, retarded writing on 4chan so it's harder to distinguish here.
>>
>>108555576
>Can Jeff Bezos code in modern environments?
if a 50yo actress can do it, he can do it! >>108555135
>>
>>108555657
multipass
>>
>>108555647
>People have been talking to bots for years on place like
I go to hn often and these days I keep getting heartburn when I see people talking to obvious LLM posters as if they were real people. Even worse is when you point it out and they get very defensive with "but HoW cAn YoU tElL??!?". I suddenly feel the desire for a piece of technology that can teleport a knife to their throat
>>
>>108555646
Past thinking tokens are typically removed when you send your next prompt, so no. But it depends on your frontend and whether you're using chat completion or text completion.
In text completion, your frontend is responsible for removing them. In chat completion, I think the backend's chat template removes them.
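if you're rolling your own text completion frontend, the stripping is just a regex over past turns. minimal sketch, assuming DeepSeek/Gemma-style <think>...</think> delimiters (the exact tags vary by model family, so check your model's chat template):

```python
import re

# Matches a complete <think>...</think> block plus trailing whitespace.
# Adjust the delimiters if your model uses a different pair.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove reasoning blocks from past turns before re-sending context."""
    return THINK_RE.sub("", text)
```

run this over every past assistant turn before concatenating the prompt; the current in-progress turn keeps its reasoning.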
>>
>>108555657
>if a 50yo actress can do it
Did she actually do anything? Even the readme is entirely sloppa. So sloppa I can't even tell what the fuck it actually does, it's just flowery nonsense.
>>
File: a.jpg (439 KB, 1296x972)
>>108555618
Something like this?
>>
should I bake?
>>
>>108555662
I'm assuming Silly removes them by default when using text completion then?
>>
File: context.png (38 KB, 1007x457)
>>108555662
the llama.cpp webui was changed to send the reasoning context by default, and you now need to disable that in "developer settings" (why is that a "developer" setting?)
this just as gemma released as a model that explicitly recommends you STRIP the reasoning from the chat.
another change made to fit qwen 3.5, just like the checkpoint changes, that ends up providing a worse out-of-the-box experience.
>>
File: 1770229160073417.png (448 KB, 720x852)
>>108555667
let me guess, you want more?
>>
>>108555657
>>108555666
It's complete sloppa by Claude. Someone did independent testing and found its benchmarks were rigged and actually performs like dogshit in real world scenarios. They shill it cause they're Freemasons and cause it's an allusion to their body of work (see their movies). They do it to mock you.
>>
>>108555681
If Milla wants more attention she should start an OnlyFans with her daughter
>>
>>108555667
>>108555678
It's perfect.
>>
>>108555673
I'd hope so. I don't use ST so I can't really say. I know there's a button somewhere that shows the raw text ST sends to the backend. Or you can run your llama.cpp with -v to show what it receives from ST.
>>108555677
Oh. Funny that. I don't use the built-in webui either so whatever, but that's dumb. I think most models need the thinking tokens removed. Is it just qwen that likes the past thinking tokens?
>>
>>108555642
Unironically what's the problem?
>>
>>108555690
>the
slop
>>
>>108555694
>slop
slop
>>
>>108555689
>Is it just qwen that likes the past thinking tokens?
Qwen says to reuse it, but on most tasks the positive impact is dubious, while context use balloons so hard the model is barely usable. I mean, it's Qwen, just one question will get it to spew 10k tokens' worth of bs in <think>
I think any model that outright requires reusing the thinking is a broken model that belongs in the bin. Inane idea.
>>
>>108555678
I'd like a comfier seat and the TV at eye-level
View is nice but the glare would make the screen very hard to see during the day
>>
>>108555678
Feels uncomfortable looking down at that angle for too long. Need to raise the screen even if you're on the floor. Also I'd like a kotatsu/coffee table.
>>
>>108555689
>Or you can run your llama.cpp
Why the FUCK would I do that?
Kobold just werks.
>>
>>108555701
>I think any model that would outright require reusing the thinking is a broken model that belongs to the bin. Inane idea.
Why would that be? Reusing the thinking (by design, not forcing the model to do it) would allow longer-term planning.
>>
File: 1762423493487334.png (657 KB, 1395x1548)
>Gemma 4 saved local LLMs
time for the chinks to save video local as well
https://xcancel.com/bdsqlsz/status/2041809530942845107#m
https://happyhorse-ai.com/
>fully open source
>15b
>>
>>108555701
I know deepseek needed those removed. I don't remember about 'toss; gemma also removes them. I can't think of any other thinking model that recommends keeping them. Didn't minimax also advertise support for "interleaved thinking"?
>>108555720
You're the one asking questions and I'm trying to help you. Chill the fuck down.
>>
>>108555678
there's minimalism and then there's being a retard
>>
>>108555727
Will it run on AMD? I still can't get wan to work on my 7900xtx
>>
>>108555727
>guys guys. ***SOMETHING*** will be released
>anything at all gets released
>guys guys. ***THIS*** is the thing i've been vagueposting about
Twitter posting should be a bannable offense.
>>
>>108555760
still better than having no news at all desu
>>
>>108553561
>>108554426
I believe the only difference from their previous reupload is tokenizer.ggml.add_bos_token being set to true. Nothing else changed in llama.cpp's code in the past few days that would alter the goof other than this metadata flag.
llama.cpp itself was modified to automatically add BOS even if the flag is set to false, and even in raw text completion mode, so you do not need to update your goofs for this.
Stop using unslop and stick to barto, he only uploads shit when necessary and actually explains when something is wrong instead of silently reuploading.
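the guard llama.cpp applies is trivial to sketch. this is not the actual llama.cpp code, just the rough logic, and the bos id is model-specific (2 for gemma, for example):

```python
def ensure_bos(token_ids: list[int], bos_id: int) -> list[int]:
    """Prepend BOS unless the sequence already starts with it.

    Roughly mirrors what llama.cpp now does regardless of the
    tokenizer.ggml.add_bos_token metadata flag, so a goof with the
    flag set to false still gets a single leading BOS, never two.
    """
    if token_ids and token_ids[0] == bos_id:
        return token_ids
    return [bos_id] + token_ids
```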
>>
>>108555727
Is that the mystery model that beat Seedance one the memechart?
>>
>>108555727
>happy horse
great! More furry shit
>>
File: 1746519297551271.png (705 KB, 1569x1595)
>>108555774
yes lol (in reality it's not close to seedance 2.0, but I've seen the videos and they are solid; for a local model it's a fucking miracle)
>>
>>108555770
It doesn't exist until it's released. And knowing in advance, vaguely of course, when it'll happen serves no purpose.
>>
I hopped on the bandwagon. Still experimenting though. Not sure if I love this direction/characterization for her. I kind of just felt like genning another TTGL (actually Gunbuster) pose today so that's why really.

Having tried this, Anima is a lot easier to iterate ideas with than Noob. Greater tag knowledge and prompt adherence help so much. Though there are still many quirks and gaps in its capabilities that I've just experienced, especially when there's no controlnet to do some cheating with.

I'm going to bed.
>>
>>108555788
We are so back it's unbelievable.
>>
>>108555800
gemma is brown
>>
>>108555689
>Is it just qwen that likes the past thinking tokens?
Generally, even the models that use past thinking tokens only use them for one response at a time, but that response can be multi-part due to several consecutive tool calls. So they need them in the prompt as reasoning fields, because they'll be talking back and forth with tools while working on their task and need to maintain their chain of thought through it. The chat templates are meant to handle this automatically and still strip the reasoning from all responses BEFORE the active tool call chain, but they do this by assuming the past reasoning was sent to them via the API to process and strip. Depending on the frontend, it may not send them in the proper format for the chat template to process, so you could get either no past reasoning or all of it. Luckily all the popular agentic frameworks already handle this well, so you don't need to worry about it. Stuff like Sillytavern doesn't do it right, but you shouldn't be trying to do anything complex enough to need that feature anyway.
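the pruning rule is simple enough to sketch: keep reasoning only on assistant turns after the last user message (i.e. the active tool-call chain). this assumes OpenAI-style message dicts with an optional "reasoning_content" key; the field name and exact cutoff vary by backend and chat template:

```python
def prune_reasoning(messages: list[dict]) -> list[dict]:
    """Keep reasoning only on assistant turns after the last user message."""
    # Everything after the last user message is the active tool-call
    # chain, whose chain of thought should be preserved.
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=-1,
    )
    pruned = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i < last_user:
            # Drop stale reasoning from completed turns.
            m = {k: v for k, v in m.items() if k != "reasoning_content"}
        pruned.append(m)
    return pruned
```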
>>
File: 1756030526613852.png (801 KB, 1080x593)
>>108555803
>We are so back it's unbelievable.
In less than a week we got back to levels never seen before. Man, you have no idea how grateful I am to be living in this day and age lol
>>
>>108555735
wan works on amd but it's slow as shit compared to nvidia. tell your openclae to figure out why you can't run wan and fix it for you, so that your install will be ready for other video models if they end up working on AMD and needing the same setup.
>>
how do you set up ST to work with gemma4?
>>
>tfw uploading goofs
im happy :)
>>
File: firefox_Lq9zSSGzt6.png (596 KB, 892x1070)
>>
>>108555879
Top 7 vaguest question.
Try it. If it doesn't work, show what the problem is and your settings.
>>
>>108555879
just use text complalalalalalalalalalalalala
>>
>>108555879
>>108551576
>>
>>108555727
>5s
DAMN
>but with audio
ok but I wanted like 10s or at least 8s
beggars cant be choosers I guess
>>
>>108555727
WE'RE SO BACK
https://files.catbox.moe/cx8cg7.mp4
>>
>>108555825
inb4 Internet blackout in 2mw
>>
>>108555900
wow that's unimpressive

is that the best you could come up with?

does it even do porn?
>>
>>108555900
I thought she was going to manifest a dildo.
>>
good morning sirs!
i missed the gemma-chan sysprompt, where can i find it? thank you bloody sirs
>>
File: problem.jpg (168 KB, 845x499)
>>108555889
not sure about settings, I just imported settings from some anon the last time I tried it, which was quite some time ago, for Nemo or something.
I'll try >>108555896 (thx, anon) and start from there.
>>
>>108555195
nooo don't look at me, I'm shy!
>>
New ace step model is good
>>
>>108555963
_____m__(o_o)__m_____
>>
>>108555983
>>108555983
>>108555983
>>
>>108555991
aaaauuuuawawawawa!! anon staaahp~
>>
>>108555986
>>108556020
oops I missed, but I like recap anon too.
>>
>>108555735
>Will it run on AMD? I still can't get wan to work on my 7900xtx
i got wan working on mine last time i was playing with image gen. it's pretty slow though, and ram usage is higher than on nvidia, so you can't gen at as high a resolution without spilling into system ram


