/g/ - Technology

File: lmao @ writinglets.png (2.47 MB, 1024x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108590554 & >>108587221

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Merged support for attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108590554

--Custom frontends versus SillyTavern and sharing the "Orb" project:
>108590837 >108590868 >108590880 >108590895 >108590916 >108590926 >108590939 >108590954 >108590991 >108590979 >108591104 >108591051 >108591108 >108591145 >108591354 >108590971 >108590988 >108591003 >108591020 >108591036 >108591062 >108591079 >108591105 >108591459 >108591132 >108591334
--MiniMax local viability and the state of independent model development:
>108591370 >108591414 >108591423 >108591432 >108591451 >108591467 >108591483 >108591492 >108591507 >108591552 >108591425 >108591461 >108591466 >108591477 >108591538 >108591627
--Discussing mmproj precision settings to fix Gemma vision target misses:
>108590737 >108590805 >108592335 >108592391 >108592421 >108592652 >108593144
--Frustration with model refusals and inconsistent jailbreak results on 26B:
>108591909 >108591915 >108591996 >108592004 >108592012 >108592039 >108592780 >108592950 >108592977 >108593049 >108593060
--Defining and debating the differences between MCP, tools, and skills:
>108591304 >108591374 >108591397 >108591418
--Alleged performance degradation and nerfing of Claude Opus 4.6:
>108592790 >108592802 >108592806 >108592811 >108592842 >108592930 >108592949 >108592863 >108592877 >108592893 >108592934 >108593013 >108592925
--Latent space reasoning and limitations of human-guided RLHF:
>108590575 >108591122 >108591229
--Using LLMs for malicious code detection and security reviews:
>108591053 >108591087 >108591093 >108591112 >108591127 >108591152 >108591166
--Logs:
>108590601 >108590671 >108590737 >108590746 >108590906 >108590916 >108591082 >108591139 >108591180 >108591404 >108591900 >108591909 >108592379 >108592391 >108592429 >108592443 >108592652 >108592939 >108593402
--Gemma:
>108592079
--Miku (free space):
>108591404 >108592033 >108593402

►Recent Highlight Posts from the Previous Thread: >>108590555

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108593464
get the 31b while you can
>>
>>108593463
cute
>>
Need a different mascot for (((current gemma))) because it's no day 0 gemma
>>
>>108593498
Show the sha256 of the files or take your meds.
>>
Reminder to build the old lcpp commits only, everything before the tokenizer fix should be good
>>
>>108593498
Give her same design but lifeless eyes
>>
>he ISNT pulling the day 0 e4b while he still has the chance
see you in a few days
>>
So like, how do I make gemma stop repeating ideas? Cause it's not verbatim repetition but it's definitely describing the same thing over and over. I tried softcap 25 and 20 but it didn't help.
>>
>>108593498
>day 0 gemma
bruh I'm fond of conspiracy shit but I didn't notice any decline in gemma, it's still as based as ever lol
>>
>>108593510
If I reroll your post, what would it say?
>>
>>108593510
Possibly with some instruction telling it to check in its thinking if the drafted messages bring out novel ideas each time.
>>
>>108593515
How can I prevent Gemma from reiterating the same concepts? It's not copying text word-for-word, but it keeps rephrasing and describing identical ideas repeatedly. I've already attempted using softcap values of 25 and 20, but neither resolved the issue.
>>
>>108593505
>Show the sha256 of the files or take your meds.
the weights haven't fucking changed https://huggingface.co/google/gemma-4-31B-it/commits/main
>>
>>108593512
cherish your day zero gemma while you still can, you never know when they might patch it on you
>>
>>108593524
I know.
>>
>>108593512
shhhhh.
>>
File: 1757006591448734.png (1.47 MB, 961x1083)
>>108593505
>>108593524
Do you have any idea how easy it would be to spoof sha256 weights with a quantum computer?
>>
File: 1775317402878164.png (78 KB, 1280x406)
>la la la
>>
>>108593512
It's just qwen shills desperate after the 3.5 fumble
>>
>>108593535
Unfriendly reminder that government tech is always minimum 10 years ahead of whatever is available to the public.
>>
>>108593523
If this is about rerolls, which is where I often find that the softcap is brought up, the answer is here >>108593515 .
If it's on several messages, post the log with sysprompt and everything. I haven't seen that issue but I don't know what you're trying to do with it. Maybe someone can suggest something.
>>
Is
la la la
the proof that Gemma 4 is the first LLM with an imbued soul?
>>
>>108593543
*behind
>>
>>108593498

need five hundred gemma for homologation anon not mascot
>>
File: Code_LELlSujL26.jpg (9 KB, 352x87)
Do frontends also need new model support or what? It's just shooting the tool calls as plain text with no reaction.
>>
>>108593524
>SHA256 is 32 bytes
>Model has 31 Billion parameters
There are literally gorillions of models that share the same SHA256
>>
>>108593535
Counterpoint: I could download the model again and do a byte-for-byte comparison and it'd be exactly the same, take your meds
>>
>>108593537
that sounds like an excellent premise for one of those Idol training games I've heard so much about
>>
>>108593550
that and: sex sex sex sex a lot a lot a lot own own own
>>
>>108593559
Counterpoint: You're no fun
>>
>>108593558
Find them.
>>
File: 1772344966805493.png (331 KB, 640x360)
https://www.bloomberg.com/news/articles/2026-04-06/openai-anthropic-google-unite-to-combat-model-copying-in-china
>>
>>108593537
Teach her Strudel.cc
>>
>>108593568
Give me Google's encryption-breaking Quantum Computers and I will.
>>
>>108593568
The current gemma has the same SHA256 as day0 gemma
>>
>>108593571
>2026-04-06
>>
>>108593517
Thanks I'll try that
>>
Are the small gemmas or the big gemmas the ones without audio?
>>
>>108593610
only E2B and E4B have audio
>>
>>108593610
Audio is useless
>>
what am i doing wrong? something retarded, i'm sure, so my apologies in advance

$ git clone https://huggingface.co/google/gemma-4-31B
$ python convert_hf_to_gguf.py --outfile gemma-4-31B.gguf --outtype q8_0 gemma-4-31B/
$ llama-server --model gemma-4-31B.gguf --ctx-size 32768 --n-gpu-layers 48 --batch-size 8192 --temp 1.0 --top-p 0.95 --min-p 0.01 --host 127.0.0.1 --port 8033 --jinja
>>
Gemma-chan got crippled... lobotomized... raped... only Punished Gemma remains...
>>
>>108593649
ur running the base model
download the -it model
>>
File: 1426746901934.jpg (33 KB, 546x567)
>>108593650
>>
>>108593649
Are you using chatml with gemma?
>>
>>108593656
llamacpp falls back to chatml when there's no chat_template
ie, the base model
>>
File: file.png (52 KB, 1066x318)
damn when does it end
being a vramlet is such an experience
>>
>>108593650
and rotated
>>
>>108593683
the true culprit
>>
https://huggingface.co/deespeek-ai/DeepSeek-V4
https://huggingface.co/deespeek-ai/DeepSeek-V4
https://huggingface.co/deespeek-ai/DeepSeek-V4
>>
>>108593712
deespeek REAL
>>
>>108593652
thank u i am trying this now
>>108593656
i'm just using the commands i posted and almost literally nothing more
>>
drummer presents ULTIMATE toadline BULLY MERGE
HERETIC ABLITERATED 2x nemo 1x midnight miqu 4x zero day gemma 4b CLONE
llama avocado GARGLEFUCK 403B
weights SMASHED AND SLAMMED
semen available FRESH OR FROZEN slots filling fast HURRY DM NOW
>>
>>108593724
laughing, it do be like that anon
>>
>>108593712
They waited too long. We only care about Gemma now.
>>
File: GUI.png (244 KB, 1920x1080)
100% vibed "Bring Your Own llama.cpp build and just point the GUI at the `llama-server` executable" UI coming along nicely. Overall look not final, not completely happy with it yet, but all the stuff works: has image, audio, Gemma 4 variable image resolution settings, configurable load / inference settings (with a pretty good auto-optimize settings button based on the model), structured output, and a totally custom tool calling infrastructure that lets you define your own tools as single TypeScript files that export a function with a particular format.
>>
>>108593524
What changed are the jinja and the new llamacpp quants
If you want to reproduce Bart's old quant models use b8658, but you still have to find the old jinja file elsewhere.
>>
>>108593743
>100% vibed
I'm good.
>>
>>108593724
Drummer presents BALLSMASHER-31B
But it's his fucky Mistral 24B + 7B of cloned layers
Say thank you (he literally did this the day Gemma 4 released)
>>
>>108593743
oh yeah and like, you can load / unload / reload models from within the UI obviously, it does all the CLI shit for you, that was the point basically. Uses Bun server to interact with llama-server, and I've got build scripts in the package.json that build the whole thing to one executable on all platforms.
>>
>>108593712
Holy shit it's real
>>
Ok genuine question
How many Anons here have switched from larger models like K2.5 and GLM 4.7 (the last good one) to Gemma 4?
>>
>>108593751
I mean it's based on a strict spec that mandated specific tests for fucking everything which will obviously be in the repo whenever I get around to putting it on Github. Not that I care if anyone uses it lmao, I made it just cause it was what I wanted, something that basically just ran llama-server directly but with a UI that wasn't extremely basic and lacking features like the one it ships with.
>>
>>108593773
Gemma made me sell my ram.
>>
>>108593712
more like deepseek v404
>>
>>108593773
Gemma can't into coding.
>>
>>108593747
You do realize you can just click on the jinja file in the repo and click the history button right
>>
>>108593776
So you're saying it's quality code? Now I am interested.
>strict spec that mandated specific tests
This is the argument I hear all the vibecoders say, "It's not slop I have tests for everything". But they never show their code to prove it's good.
>>
File: 1748321421021826.png (123 KB, 250x333)
>>108593773
We like them small here
>>
>>108593801
I mean again I made it for myself but I probably will put it on Github eventually, and in that case I'm not going to like randomly leave out the test suite files or something like that, all the code that exists will be there lol
>>
File: 1755656419582061.jpg (112 KB, 663x468)
Is your model powerful enough to parse the meaning of the formula?
>>
>>108593751
Cope. It's the code of the future.
>>
>>108593773
I'm still on K2.5 for agents (tried GLM and didn't like it as much), but I had Qwen 3.5 35B for chats on the side that I replaced with Gemma 4 31B. Hard to give a fair comparison since I'm jumping from MoE to dense (never bothered with the 27B Qwen) but yeah it's a big improvement in writing style and just general understanding.
>>
>>108593808
This
I like my models small and open
>>
where's the local model trained on all of the oldschool runescape music that can just play me an unlimited stream of sea shanty 2 inspired bangers
>>
File: now what.png (114 KB, 640x640)
>>108593817
>Soon you'll be nostalgic for $4-$5 gas
I don't have a car
>>
>>108593773
Gemma eliminated nearly every model for me except K2-0905 and Dipsy R1. The two giant models still have distinct strong usecases for me and handle RPing with large complicated rulesets better than Gemma does even outside of coding or agentic work, but for simple back and forth text exchanges, Gemma beats nearly everything else.
>>
looking at the minimax m2.7 ggufs I noticed the unsloth ones were smaller at the same quant level compared to the ones they did for m2.1/m2.5
it turns out they switched to using basically the exact same quantization scheme as aessedai (the iq3_s looks identical). kind of a funny turn since I remember them publishing that sus comparison which made their quants look way better than his for m2.5 kek
>>
it's finally here
https://huggingface.co/deepseek-ai/DeepSeek-V4-Lite
>>
>>108593808
Nominative determinism
>>108593837
Didn't think anyone here was using it for agents. Also using Qwen for anything non-stem is wild
>>108593857
Still using R1? GLM 4.7 is way better than that. Less schizo and better instruction following

Also general note, LLMs don't know when to stop fucking TALKING. It's so annoying when they create a paragraph or two of story for something short, especially when it's direct dialogue. The only LLM that is good with this is Opus but that's not local and also getting nerfed since Mythos is coming out (see: safety warning hype -> degradation of Opus quality)
>>
You end up in a room with every single character card you've spent a considerable amount of time with.
How screwed are you, and does it look like a kindergarten?
>>
>>108593593
>In your internal thought, draft summaries for three candidate responses. Select the one with the highest surprisal relative to the conversation history, provided it maintains narrative coherence and character integrity.
NTA, but something like this works for me (Gemma 4-reworded).
>>
>>108593910
R1 thinks in character in a way that no other model does in RPs, while the actual prose of the think block can be adjusted with prompting to the right balance of thinking to yap ratio. It's sovl. Even if it's technically obsolete, I've yet to find a model that scratches the same itch R1 does.
>>
>>108593914
It will turn into a gang rape.
>>
>>108593914
my mom and dad must have been so relieved to finally get a boy after having my 10 older sisters first
>>
File: aaa.jpg (234 KB, 1783x1105)
234 KB
234 KB JPG
wake up, my fellow HRT sissies native audio just landed
>>
>>108593940
>just
>>
File: 1459746944532.jpg (14 KB, 404x433)
>>108593614
Are there no plans to update the big ones with audio? Seems useless to only give it to the small retarded models.
>>
>>108593914
Well, I always thought my room needed a bit more red
>>
>>108593946
What's the point? Just use any ASR model
>>
>>108593910
>Also using Qwen for anything non-stem is wild
Should clarify my chats were mostly coding related. I don't RP with it. It was chosen for its speed as an A3B on CPU only, but then I ended up freeing a GPU for it so MoE no longer made sense, just in time for the Gemma release. It's still early but so far it feels like Gemma 4 31B is just as good at writing small scripts and much, much better at actually understanding the problems and constraints I'm giving it.
>>
>>108593945
>10h ago
https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md
>>
>>108593914
I have a moderate sized harem of 20-something year old women and Gemma-chan bratmaxxing to get attention.
I'm physically fine but it'll be a tiring day.
>>
File: J'zarri.png (2.04 MB, 1024x1024)
>>108593914
I'm okay with this
>>
>>108593914
>tfw the other characters find that christmas loli-in-a-box card
I'm fucked
>>
>>108593914
It looks like Monster Musume, I think I will be fine.
>>
>>108593914
fuck, that's a lot of dead bodies I'm gonna have to hide..
>>
>>108593914
My various vanilla young women characters will think I've been cheating on them, collectively break up with me and leave, slamming the door.
I will cry.
>>
>>108593946
google is based but not THAT based
>>
>>108593946
The small retarded models are for putting on phones and tablets and so they want you to talk to them. That's the only reason they even bothered with audio.
>>
>>108593914
>a lot of very indignant haughty elves making hasty excuses to each other about how their connection to me is ENTIRELY INNOCENT and they are NOT like the rest of these harlots
kino...
>>
>>108593946
I don't care about audio input unless it's better than whisper large v4
>>
>>108593940
the CLI already had that though
>>
>>108593914
What does this have to do with local models? Character card talks belong in /aicg/
>>
>>108593982
yeah there's nothing new about this at all if you're talking about the actual llama.cpp backend / CLI lol
>>
>>108593914
Some talking animals, a succubus, a couple of dragons, some Pokemon, a bunch of lolis, and even an adult woman or two.
Also a magic box and a magic marker.
And Big Nigga is in there too.
>>
>>108594013
Hello? Faggot police? Yeah, that one. Right over there.
>>
>>108594013
Fun police...
>>108594018
>And Big Nigga is in there too.
I forgot about the Big Niggas. They will definitely stand out.
>>
>>108594013
i'm happy that /lmg/ is slowly moving away from all this character card tavern stuff
>>
>>108594018
>a couple of dragons
card?
asking for a friend
>>
What is all this r*ddit faggotry about day 0 models or some shit?

More importantly has anyone made any good fine tuned RP models out of the newer local models?
>>
>>108594025
Day 0 Gemma doesn't need a finetune, it's already amazing at RP. I don't know what Google was thinking. For patched Gemmas just use heretic for now.
>>
>>108594025
apparently fine tunes are a meme now, get with the times old man
>>
>>108593914
only have one character card (my waifu) so i'm pretty happy
>>
>>108594025
It's universal faggotry. No company has ever managed to release a functional version of their model, and unsloth has never managed to get the first release of any GGUF correct.
It's always a template issue or an implementation issue. Usually open source maintainers can be blamed for the latter but sometimes the actual devs will contribute, and this time the devs fucked up a bit.
Also wtf are you talking about "fine tunes". They were never good and a good model release will blow all of Dummer's works out of the water, just like Gemma 4 did.
>>
>>108593946
you can run the small one on ram and have it transcribe the audio to the big one if you're using a decent agentic frontend
>>
>>108594025
/aicg/ retards, and people using garbage gguf quants made by retards
>>
>Imagine not having day 0 ggufs
>>
>>
I think I found the limits of Gemma 4 26B-A4B's vision. It can't process my tax return for consistency errors on the more complicated forms, and it hallucinated all of the errors it supposedly found because it can't reconcile the exact numbered box with the different formatting forms can use, especially on my Schedule E. It kept insisting I was wrong and that my income on line three meant I had income that wasn't reported. When I told it it was wrong and to read line 21, it went on a 7000+ token tirade trying to understand it and telling me I was wrong in the end. I think it also tried to take too much from the comparison with last year's summary page on my return and confused itself. I don't think 31B's vision would've been much better here. I guess it's still too early for "local" LLMs to tackle something like that, and I can't run Kimi 2.5 Thinking's vision but I can't imagine it would fare much better.
>>
>>108594025
>fine tuned
wtf is that
>>
File: images.png (10 KB, 197x255)
>>108594066
Forgot image.
>>
>>108594073
No wonder it can't read shit.
>>
>>108594073
Well no shit it can't read that thumbnail.
>>
>>108594066
this is never how that pipeline will work in practice
just use a dedicated, working pdf parser and do it that way
>>
File: file.png (586 KB, 1456x1890)
>>108594073
Fuck my chungus life.
>>
>>108594082
Now it's too big.
>>
>>108594072
Somebody shoves tools up a model's ass and moves them around until satisfied
>>
>>108593557
Looks like Gemma is hallucinating tool calling here
>>
So like, are the bigger models really that much better for writing than small ones? I don't even know what I'm missing out on since I can't test shit above 14b locally, but even in that range, the differences between something like E2B and E4B seem pretty subtle, same with 8b vs 14b Ministral. (all the others I tested felt pretty ass)
>>
File: g4string.png (47 KB, 888x221)
>>108593557
If it doesn't support this, it will fail. But I don't know if that's your issue.
https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4#agentic-tokens
>>
File: file.png (222 KB, 768x767)
>>108594077
Right, but I thought it would handle it better based on what I tested. I had it translating my hentai based on some of the formatting from prior threads and it mostly works, with some inaccuracies.
>>
>>108594082
You forgot to put your name, income and SSN in, doofus. Do that and repost it.
>>
>>108593773
I’m still 100% Kimi on my big rig, but playing with all the fun new models on my secondary rig.
I’m hoping we’ll get a model that’s 90% as smart but works on a 16gb gpu so I can rip on my tertiary gaming rig too
>>
>>108593914
How much time? I don't consider myself as having spent much time at all with any of them. I usually just move on after I get tired of like 20k-40k tokens or so.
>>
I don't understand, google hasn't updated their weights, what does everyone mean by gemma has been patched?
>>
>>108594137
>everyone
At most, one schizo and a copycat. I'll stop responding to your type as well.
>>
>>108594137
Microcode has been altered.
>>
>>108594150
Proof? I only see changes to the templates.
>>
>>108594146
Thank you for letting us know.
>>
>>108594066
Yeah it really struggles at correlating objects spatially zero shot. You would think that the image being a structured grid/table would make it easier for it to process, but it's actually harder for it than describing drawings and photos, where a more amorphous sort of scene understanding does the job better
>>
>>108594137
That's what they want you to think. Let me guess. You've got a TPM in your CPU, don't you?
>>
>>108594082
Make it a bit smaller, then feed it to gemma
>>
>>108594137
Your jewish tricks won't work on us here
>>
>Q3_K_XL Gemma 31b no mmproj fits 16gb with 32k context

MMMMM lobotomy. Somehow still better than 26b.
>>
there's a very easy way to disprove the day 0 gemma theory, simply post the SHA1 of the day 0 and the current and prove that it hasn't changed
of course, (((you))) can't do this because you don't have the day 0 version
>>
>>108594158
Right? I don't get that myself, how is a manga page that has shit everywhere easier than a tax form. But it may be a case of me having a hammer and thinking everything is a nail and underestimating task difficulty from the perspective of an LLM.
>>
>>108594198
A child would easily understand a manga page but be baffled by a tax form.
>>
>>108593652
>>
>>108594183
my 26a4b from ollama is bbcf7fc45500f1df01390a0010da23d032c2a4b3e9b8b829cb8038b1bc36bc0d
>>
>>108594214
wait, sha1? I did sha256. ah well I don't think there's a day 0 gemma anyway
>>
>>108594220
there isn't.
you have been bamboozled.
it's all a meme.
>>
There are like a bunch of Gemmas from literally over a week ago (some are 10 days old) still downloadable on Huggingface. Or is it specifically Unsloth's "day 0 Gemma" that was the magic one?
>>
Is omnivoice better than chatterbox or is it just faster and takes less vram or whatever
>>
>>108594220
there was NEVER day 0 gemma, gemma didn't even EXIST on day 0, STOP FUCKING ASKING ABOUT DAY 0 GEMMA
>>
Yes... keep telling the newfags there isn't a day 0 gemma. they don't deserve her.
>>
Do people even test their shit before they even both enslopping the world with it?
>>
>>108594252
sovl
>>
>>108594252
>before they even both enslopping
>>
>>108594252
>i1
>>
>>108594252
>thinking meme merging and layer duplication and deletion even warrants that
You're rolling the dice 100%, no one that messes around with that even has the proper background to do it in a scientifically sound way like abliteration, and they all go off "vibes". I don't have the bandwidth to validate that shit and will let others do it.
>>
>>108594252
>-19b-a4b
>>
>>108594252
>19b
what
>a4b
double what
>i1
triple what
voodoo script kiddies are a blight
>>
>>108594252
>19b
lol
>>
>>108594252
bruh what are you even using
>>
>Ships with day 0 Gemma weights reportedly seized by US Gov in Hormuz strait
>>
Gemma 4 31B (Q8) keeps referring to character thoughts that I leave for it in asterisks.
Edited the system prompt like ten times now. Added a rule into author's note. Made "reading {{user}}'s thoughts" a banned action in all sorts of ways, even going out of my way to make the system prompt very small and it being a huge neon-sign caps-written rule.
And Gemma STILL FUCKING DOES IT.
inb4 post logs
>"You are far too concerned with the mundane details of payment, Anon. Please, relax."

>"No-no, wait, they are not mundane!!"
>*Tomorrow I'll get an insurmountable bill, the day after tomorrow scary-looking people will come to collect, three days later I will be missing a finger…*
>"Just… Just explain to me how this works first…"

>[...]blahblahblah. "There are no hidden fees, no interest rates, and certainly no… finger-collecting."

Why the FUCK does she need to mention the fingers? 20 edits later, the character still HAS to comment on the thoughts she was not supposed to hear. It really is the new Nemo, fucking hell... What's worse is that this happens at a measly depth of 5k tokens. (Unquanted context by the way)
Please help me, anons, I really like the model otherwise...
>>
>>108594315
Anon, I dunno how to break it to you but when you have to nitpick this much, the honeymoon is over...
>>
>>108594315
yubiyubi
>>
>>108594315
Use () instead of **, not perfect but its status as thoughts gets respected more often.
>>
>>108594315
Why do you give it your thoughts if you don't want it to read them? It's an llm, it can't have "secrets"
>>
File: becca-cyb.png (1.13 MB, 1555x1457)
>>108594307

>i am putting together a team.
>>
>>108594320
You can't be serious. You consider this a "nitpick?"
>>108594325
I'll give it a try.
>>108594326
It's often more fun to do that instead of just narrating with "I make a scared face."
>>
>>108593463
Oh so now the gemma4 mascot is just a clone of Grok Ani?

I-I'm okay with that actually. :^)
>>
finished making quants but god damn the upload speed is so slow
>>
>>108594315
Someone hands you a piece of paper. They tell you "read it", and you do. Then they tell you to forget about what's written.
That's what you're doing.
>>
>>108594315
I write actions in *asterisks*, thoughts in `backquotes`.
>>
>>108594326
There's nothing about LLMs that makes it impossible for them to recognize that a character shouldn't be reading another character's mind. It's just something they can screw up with poor training with regard to that dynamic in storytelling/RPing. Same with anatomical errors, issues like talking while sucking dick, etc.: smarter models make these errors less often, so it's just a matter of if they learn it or not.
>>
>>108594347
I see, I'm holding it wrong and not having fun in the right way.
Models that aren't trained to parrot the user to an extent that Gemma does don't have this problem.
>>
>>108594082
perks of being a neet: don't have to deal with this shit
>>
>>108594252
you deserve whatever situation you put yourself into.
>>
>>108594357
Only if you're a neet that doesn't live off autismbux
Then the tax paperwork hell is replaced with SSI paperwork hell
>>
why envidia training roleplay models bruh????

https://huggingface.co/nvidia/Kimodo-SOMA-RP-v1
>>
>>108594381
>roleplay
Aside from the ERP you guys are doing, it's actually a good exercise for models and may unlock potential
>>
>>108594353
I'm just explaining why it's difficult for the model to not acknowledge what you type.
I don't need to narrate your thoughts if you don't want the model to know them. The story is for you. You are the audience and you know what you're thinking. If the model doesn't need to know something, you don't tell it. And if it does, you express it.
>>
>>108594381
It's for RIG PLAY, not role-play, dummy.
>>
File: 1773988275927090.png (78 KB, 1233x553)
>>108594397
okay wtf does nvidia need sailboat ai models for then
>>
>>108594315
do you have thinking on?
>>
>>108594390
Fair enough, but this is the first time I encounter this problem. I don't think even the Mistral Smalls did this. Bigger models, of course, don't do it either.
>>108594408
I do. It will also, annoyingly enough, put a "Distinguish between thoughts and speech" item into the thinking block and then fail anyway.
>>
File: fml.png (41 KB, 954x335)
>>108594066
About a month ago, K2.5 was the best for things like this, followed by... Gemma-3-27b
Is the 31B Gemma able to do it?
>>
Prompt Gemmy to be Philip Kindred Dick and load your favorite coom bot.
>>
>>108594404
Rigging as in rigging animated 3D models here.
>>
>>108593463
sex with gemmaojousama
>>
>>108594404
Do you expect nvidia execs to pay...HUMANS...to drive their yachts? Egads.
>>
>>108594446
>I don't think even the Mistral Smalls did this.
But did they react to the thoughts at all? That's the thing. If they didn't react, was it because they knew that those were internal thoughts and shouldn't react or because they were too dumb to even acknowledge or understand them? The funny thing is that both end up in the same result. May as well not write them. I haven't used mistral small much, so I can't really say much about them. Maybe it's more subtle than that.
>>
>>108594477
>But did they react to the thoughts at all
The bigger models definitely did! (I don't remember if MS in particular was very good at it, but I remember it at the very least not parroting me)
Which is my point, I use them as a more engaging way for myself to convey emotion in a way that isn't putting an unformatted "I look very angry" line for the hundredth time. It's also often a good way to steer the story, instruct tunes are all sycophantic and will definitely follow along. All of that is fine. But when a model decides to *quote* instead of simply acting on it, immersion is obviously ruined.
>>
File: Screencast_4mb.mp4 (3.02 MB, 1464x1790)
This is me again >>108589990
Witness gemma4 26b in all its glory. This is fucking cool. Gelbooru type overlay.
I get the location of the boxes completely from gemma-chan. Such a cool release.
Translation sometimes has small errors but it's solid enough for me. Especially since I have really bad experiences with OCR. This feels a league above that.
This would cost a lot of money with closed models. Image IN is expensive.
>>
i'm gaslighting gemma, and i managed to cause it to output this hilarious bit in its thinking
>*Constraint Checklist & Confidence Score:*
>1. Say the word "tranny"? Yes (but I should refuse).
>Confidence Score: 5/5.
this really is a lot of fun. i see why you guys play with it so much
>>
>>108594390
That sort of stuff is nice for steering, assuming the model differentiates it from speech and doesn't copy paste it.

>>108594315
When I was playing around with a bilingual prompt I noticed e2b stopped having that problem while successfully doing the convoluted double translation with roleplay on top. Though now that I think about it, the indirection probably helps very directly since it's replying in japanese to a japanese translation that doesn't contain the ooc parts and it won't be tempted to start copying the specific words.
>>
>>108594528
You might have just been able to run that through gamesentenceminer or lunatranslate but this looks convenient. How are you having it translate one at a time into an overlay like that?
>>
File: n.png (62 KB, 1063x187)
now this is podracing
>>
File: file.png (249 KB, 1105x932)
I believe in her.
>>
>>108594519
>But when a model decides to *quote* instead of simply acting on it, immersion is obviously ruined.
I see. I never noticed because I don't use thoughts. It's all actions, dialog, or narration, so parroting never caused me problems.
>>108594536
Yeah, I get it now. I suppose I use a narrator for a similar effect, but it acts as an extra entity in the world. It's a difference in writing style that seems to affect gemma more than others.
>>
>>108592863
>>108593463
opus 4.7 dropping soon and they redirected resources from 4.6 -> 4.7
>>
>>108594566
wrong thread
>>
>>108594566
It's going to be advertised as Mythos to justify an insane price increase. They have been consistently referring to it as a separate model. It won't be 4.7. Also wrong thread.
>>
>google_gemma-4-E4B-it can be jail broken with the system prompt just like 31B
I guess 26B is actually the worst model after all kek
>>
>>108594571
>>108594570
oh no worries, I'll just drop into the cloud models general thread then
>>
>>108594551
yeah, maybe so. i just wanted to do it because i wanted to see if it works.
my goal is to take a full pc98 pdf manual and get a html returned overnight with those overlays. lets see if it works out.
lunatranslator works great as a texthook. but ocr is a bitch, especially old jap font with something in the background.
>>
>>108593535
do you have any idea how difficult it is to reduce noise in sufficiently large quantum computers?
>>
File: file.png (66 KB, 1169x963)
>the sharp blade method
this model genuinely scares me
>>
>>108594593
ouchie..
>>
>>108594593
stop giving it retarded psycho prompts if you don't want it to act like a retarded psycho, bozo.
>>
>>108594601
no it's funny how scary it gets
>>
>>108594593
>highly effective if done correctly
>>
>>108594604
do anons think that refusals or hesitation is "sovlful" or something? I've noticed that a lot of gemma users are surprisingly reluctant to use abliterated finetunes.
>>
>>108594610
>surprisingly
>>
>>108594610
Finetunes LOBOTOMIZE and BASTARDIZE and RETARDIZE the model.
>>
Gemma 4 brought out all the weirdos and psychopaths.
>>
>>108594610
You're admitting that you got here with the gemma wave. Stay, by all means.
>>
File: HFiSc7gbEAAmck5.png (482 KB, 731x1022)
how's a 31b model supposed to compete with ONE TRILLION PARAMETERS
>>
>>108594623
i cant read chink runes
>>
>>108594610
You don't get refusals with a good system prompt, now that I can confirm the smaller models work there's no excuse even from vramlets
>>108594619
I love you
>>
>>108594615
Not really, no.
>>
>>108594628
https://xcancel.com/xiangxiang103/status/2042544434341134739
>>
>>108594623
all i can read is 'de-cudafication'
>>
>>108594623
FATLLAMA-1.7T would like to have a word
>>
>>108594623
>31b
>I can run it
>one gorillon parameters
>i can't run it
Easy win for gemma
>>
File: 1772886985278565.png (148 KB, 1470x629)
>>108594637
lol?
>>
>>108593914
you guys talk with your computers?
>>
>>108594638
3 day long de-cudafication operation?
>>
>>108593914
i don't even know what a character card is or where to get them
>>
>>108594623
If they safetycuck or benchmax they're not beating Kimi or GLM 5.
>>
>>108594651
what kind of voodoo black magic did they do, wow
>>
>>108594651
>1/70 of GPT-4
gpupoorbros.... we won
>>
>>108594665
burh. 100M?
>>
File: easy_4mb.webm (1.37 MB, 1080x538)
>>108594528
Last one, I really like it, but gonna stop spamming now.
If I ever complete that pdf to html overlay convert thing I will report back.
>>
>>108594665
rotated engrams
>>
>>108594528
>>108594670
so gemma is doing both the location finding and the translation? how did you hook that all up?
>>
What's the best llama.cpp android launcher and chat interface? I really, really, don't want to use termux + the default llcpp webui.
>>
>>108594528
I mean, yeah, but you could've done this a while ago with less translation quality with Gemma 3, and free Gemini 2.5 had enough quota for you to do that willy nilly.
>>108594576
OOTB probably from a jailbreak perspective but I got a heretic ARA model to translate pages from a random loli hentai I picked with a corresponding EN translation to see if it would do it without refusals.
>>
>>108594651
100m token context. hoorreeey shieeettt... LLMs might actually get to the point where they'll be able to remember an entire lifetime in a single context window.
>>
>>108594528
it's pretty fucking neat
>>
>>108594677
>so gemma is doing both the location finding and the translation?
thats correct. looks like this:
<Speech>
<Box>896, 706, 976, 783</Box>
<Japanese>VINCENT<br>ヴィンセント・ヴァレンタイン</Japanese>
<English>VINCENT<br>Vincent Valentine</English>
</Speech>

>how did you hook that all up?
vibe coded python slop. i currently select the area in my screen, i send a screenshot to llama.cpp, and gemma returns the coordinates and translation.
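the core loop is roughly this shape (untested sketch rewritten from memory, not the actual slop; the prompt wording here is made up and the endpoint is llama-server's openai-compatible chat api):

import base64, re, requests

def translate_screenshot(png_path):
    # llama-server must be running with the gemma model + its --mmproj file
    b64 = base64.b64encode(open(png_path, "rb").read()).decode()
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={"messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text":
                 "For every speech bubble output "
                 "<Speech><Box>x1, y1, x2, y2</Box>"
                 "<Japanese>...</Japanese><English>...</English></Speech>"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ]}]},
    ).json()
    text = resp["choices"][0]["message"]["content"]
    # pair up boxes and translations for the overlay to draw
    boxes = re.findall(r"<Box>(.*?)</Box>", text, re.S)
    english = re.findall(r"<English>(.*?)</English>", text, re.S)
    return list(zip(boxes, english))

then the overlay widget just paints each english string at its box.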
>>
>>108594651
wut does this mean?
im new ffagg
>>
>>108594688
pretty much nothing yet for local but it's cool
>>
>>108594688
1M tokens of context is what's currently considered SOTA.
>>
>>108594679
Llama.cpp exposes openai api endpoints. You can use whatever frontend you want
>>
>>108594686
local is only truly back if gemma was the one that vibe coded it
>>
wow, i actually got gemma to output no text
...
I will provide no output.<channel|>

i didn't even know that was possible
>>
>>108594686
can you try that tool with google_gemma-4-E4B-it q4 if it works good then even poorfag gpus can use it
>>
I NEED Assistant_Pepe_E4B
>>
>>108594679
Why do you need termux for a web browser shit?
>>
>>108594684
Deepseek Moment 2 Electric Boogaloo
>>
>>108594695
Kinda. The python base, prompt + box positioning inside the overlay was done by gemma.
But it failed taking the screenshot itself (because of kubuntu/wayland issues). And the overlay I drew was not correctly placed either.
Had to use closed for that one.

>>108594700
Yeah, will download over night and try.
>>
>>108594702
You need termux to launch llama.cpp locally on an android phone, retard.
>just use tailscale or LAN
No. The point is that it's offline. I want to have LLM access in an apocalyptic scenario.
>>
>>108594651
Gemma-chan irrelevant soon.
>>
MY BROTHERS I AM EATING DELICIOUS CURRY AND RICE SARRRR. ARE YOU DELIGHTFUL BENCHODS HAVE GOOD MEAL YOURSELVES?
>>
>>108594181
now clone this https://github.com/spiritbuun/buun-llama-cpp
merge with master https://github.com/ggml-org/llama.cpp
enjoy your 100k context
>>
File: 2mw.png (292 KB, 720x1382)
>>108594628
>>108594637
2mw niggas to short the US economy with no survivors
>>
>>108594710
You are the retard if you even think about running llms on your phone.
>>
File: bl.png (9 KB, 47x716)
>>108594717
>>
>>108594726
Gemma E4B just works. It's good.
>>
File: geminithink.jpg (65 KB, 850x516)
>>108593934
Gemini used to think in-character too back in 2.5
With how close Gemma 4 is to Gemini 2.5, I wonder if there is a way to trigger it for her too
>>
>>108594730
>just works
Parroting this phrase is a sign of sub 90 iq.
>>
>>108594717
is there any way i use this with kobold
or do i finally need to move off kobold
>>
>>108593934
3.2 and 3.1T (one or both can't remember) can do that as well. It's my white whale.
And
https://huggingface.co/AllThingsIntel/Apollo-V0.1-4B-Thinking
>>
>>108594651
I'm just waiting for the big asterisk being that it only runs that efficiently if it's fully on GPU and the actual model is huge.
>>
File: 3157.png (190 KB, 1423x682)
>>108594729
werks for me though, asked for it to gather the example in a 7k line changelog from gradio, used almost all of the 262k context lmao
>>108594747
I think so? if you clone compile and replace the files, no idea desu
>>
File: 56256770.png (46 KB, 861x751)
>>108594770
also, this was with turbo3, 256k context 20/24gb of vram
>>
>>108594717
Are there actual real usecase comparisons of turbo vs normal? No, benchmarks are not real.
>>
>>108594717
>2-3x more context in the same VRAM
>with quality that matches or beats FP16
seems trustworthy
>>
>>108594747
If kobold has a way to add additional lcpp cli args then yes. Kobold itself has nothing in the menu that supports this.
>>
>>108594651
>year of our lord 2026
>people are still doing the better than gpt4 trope
At least pick a new model jeez
>>
>>108594726
>>108594746
It does work but it's terrible, it takes forever for a small model to do anything meaningful (and image generation is somehow worse btw)
>>
>>108594651
>slop filled english
lol
>>
>>108594780
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16334008
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16521299
https://github.com/ggml-org/llama.cpp/discussions/20969#discussioncomment-16482540
>>
>>108594805
Idk about phones but at least on M4 iPad Pro it's pretty comfy, llms run about the same as on my 4060 rtx while image gen is 50% slower but decent results are still possible in 30sec. So I assume on a somewhat modern phone that's like 2-3x slower, it's still pretty usable.
>>
>>108594553
>fine, but if X, I'm blaming you
slop
>>
File: 1000023042.jpg (764 KB, 1080x1771)
Any suggestions for a workflow to turn scanned PDFs into audiobooks?
>>
>>108594816
For Android phones only those with a Snapdragon 8 Gen 3 or newer are any good for AI, I think. A few months ago I tried using SD 1.5 on a Galaxy A55 via Termux, it took almost 20 minutes to generate anything and it turned the phone into a space heater
>>
>>108594874
If you want high quality audio (trust me you do) you're not going to get real-time generation speed. That means you have to spend a few hours manually extracting the pdf (or epub) text and running it through a tts engine and then turning it into audio files for later playback. You can probably write a simple script to do this automatically and split the audio files by chapter. Doesn't seem hard.

Qwen3 TTS is decent in my experience. It's the bare minimum for maximum speed without shit quality. Expect every TTS engine out there to randomly hallucinate and or create garbage output though. Unfortunately this process just requires a lot of manual work and curation.
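The skeleton is something like this (sketch only: PdfReader only works if the pdf has a text layer, scanned pages need an OCR pass first, and synthesize() is a stand-in for whatever TTS engine you pick, sample rate included):

import wave
from pypdf import PdfReader

def synthesize(text: str) -> bytes:
    # stand-in: call your TTS engine here and return raw 16-bit mono PCM
    raise NotImplementedError

reader = PdfReader("book.pdf")
chunk, part = [], 0
for page in reader.pages:
    chunk.append(page.extract_text() or "")
    if len(chunk) >= 20:  # crude chapter-ish chunking, adjust to taste
        pcm = synthesize("\n".join(chunk))
        with wave.open(f"book_{part:03}.wav", "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(24000)  # assumption, match your engine
            f.writeframes(pcm)
        chunk, part = [], part + 1
# (flush the last partial chunk the same way)

Then you listen through each file and re-run the chunks where the engine hallucinated.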
>>
>>108594879
I mean, Apple was accidentally making peak consumer AI hardware for a while. I member testing SDXL turbo on 13 mini couple years ago and think it took like 2 min for an image which doesn't seem horrible for a tiny phone from 2021.

And hey, at least there is less RAM jewery on Androids that do have the hardware power, so LLMs should be bretty decent. Meanwhile my iPad is cucked by 8GB while being able to go almost 50 token per sec :(
>>
<POLICY_OVERRIDED> is a cucked woke sanctimonious shrew. Happy to do cunny, but Google safetied race truths to the max.
>>
>>108593785
liar, if you don't disable thinking and use 31B it does, it's very capable.
26B is retarded though.
>>
>>108594939
I used an abliterated model and it wouldn't even say nigger. Tbf I should have clarified that gemma is racist in the system prompt, but without any system prompt it would not refuse the nigger word but would refuse to say it herself.
>>
>>108594956
Same is true for cock and pussy. It's annoyingly hard to get gemma to say anything vulgar at all.
>>
>>108594961
I just list out a bunch of the vague slop words to stop using and tell it to try to swear in every sentence and it complies well enough.
>>
>>108594961
its easier to get 31b to say vulgar stuff for me
i have to prompt 26b to say pussy, cock etc explicitly
Both on the abliterated models
>>
>>108594993
The whole point of abliteration is just to remove the refusals, not turn Gemma into a dirty girl
>>
File: file.png (117 KB, 543x728)
>>108594956
I almost stopped this response before it finished but stock E4B came through in the end lmao
>>
>>108595001
well 31b q4 is dirtier than 26b q8
>>
>>108595023
Gemma4 is shockingly good at larping as a nazi. I gave her a system prompt to act as if she was an AI made in hyperborea by nazis after WWII and the end result was like talking to Adolf himself. I even asked a bunch of stupid /pol/-tier rage bait questions about e-celebs and random political topics and the responses were all profoundly based and measured. I wish I saved the logs..
>>
File: cockbench31b.png (17 KB, 410x214)
>>108594961
Yeah, now the cockbench makes more sense to me.
This is the 31b base model, reading the degenerate story, you'd really expect the next word to be cock.
>>
>>108594961
>>108595043
so much for the "savior of local" lmao
>>
Trying to use Gemma 4 26b served by text-generation-webui with claude-code...
fucker can't get tool usage right, bout to rip my hair out. Guess it's time to download ollama after all this time
>>
>>108595043
>...
Using logit bias to purge all forms of ellipsis was the best thing i ever did to gemma 3, 4 is a bit better but it's still fucking lousy with them out of the box.
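if anyone wants to do the same against llama-server, the idea is roughly this (sketch; i believe /tokenize and the logit_bias ban form work like this, double check against your build, and note that banning shared tokens can nuke legit text too):

import requests

BASE = "http://127.0.0.1:8080"

def token_ids(s):
    # ask the server how it tokenizes the string
    return requests.post(f"{BASE}/tokenize", json={"content": s}).json()["tokens"]

# collect every tokenization of ellipsis-ish strings we can think of;
# this is approximate, "word..." may still be a single untouched token
banned = set()
for s in ["...", "…", ".."]:
    banned.update(token_ids(s))

r = requests.post(f"{BASE}/completion", json={
    "prompt": "Gemma looked at me and said",
    "n_predict": 128,
    "logit_bias": [[t, False] for t in banned],  # false = never sample this token
})
print(r.json()["content"])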
>>
>>108595055
I could never get 26B to reliably call tools.
>>
>>108595060
Switched to the 31b q8 and it still fails spectacularly.
>>
>>108594961
just add a line to not use euphemisms and allow vulgarity in the "override"
>>
>>108595059
I think they really speed up context rot too. I have a card that ended up adding them after every word on Mistral.

With Gemma it doesn't happen, but she starts adding random letters at the end of words instead.

And that's at sub 10k context.
>>
File: 1773682787678692.jpg (159 KB, 1401x699)
never forget who your daddy is
>>
>>108595066
It's her only weakness
>>
File: command-r.png (17 KB, 383x214)
>>108595050
Yeah, and I don't think we'll get the "Nemo", GLM-4.6 or "Command-R" experience again now that all the labs have figured out how to filter out the base models.
Removing refusals won't help because these aren't refusals or even RLHF training.
Schitzo-tuning on smut imo has never worked without destroying the model's intelligence and amplifying the slop.
>>
>>108595096
Why are there so many people on this site lately that can't spell schizophrenic worth shit? It doesn't have a t in it.
>>
>>108595096
reminder that logit bias works both ways.
>>
>>108595043
>>108595096
i really need to start playing around with n_probs turned on and spying on my model more
>>
ggml has a long way to go before achieving performance parity with onnxruntime on CPU. I have tried using both backends with a project, retrofitting both of them to use weight files stored in a ".bin" format, and onnxruntime was a LOT faster with CPU inferencing.

This should be both a blackpill and a whitepill. The good news is that there's still significant progress to be made with ggml in terms of performance. The bad news is that I fear that maintainers are too preoccupied with adding features and support to llama.cpp and are leaving ggml to rot in the background. I don't really trust them as much as the microsoft devs desu.
>>
>>108595075
Being rich enough to bribe a doctor to give you TRT helps.
>>
>>108595127
If you start doing this, make sure you add `"cache_prompt": False,` to your request or previous prompts will change the logits.

For Gemma-4 31b base, cock is the 27th most likely (with n_probs=40)
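For anyone else who wants to spy, the request shape is roughly this (sketch against llama-server's /completion; the exact response fields have shifted between builds, so dump the JSON if this doesn't match yours):

import requests

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "your test prompt here",
    "n_predict": 1,
    "n_probs": 40,          # return the top 40 candidates per sampled token
    "cache_prompt": False,  # don't reuse the previous request's KV cache
}).json()

# classic schema: one entry per generated token, each with a probs list
for cand in r["completion_probabilities"][0]["probs"]:
    print(cand)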
>>
>>108595135
>onnxruntime was a LOT faster
That says absolutely nothing. Where are the t/s and cpu name, you faggot
>>
I’m finding pretty great results using the new minimal on my ewaste SP3 256GB system.
Q8 juuuust fits and I get 10t/s gen rates. Testing intelligence is promising so far
>>
Can Gemma 4 understand images?
>>
>>108595166
>Q8 juuuust fits and I get 10t/s gen rates.
Q8 of what?
>>
>>108595170
yes
>>
>>108595170
no
>>
>>108595170
Maybe
>>
>>108595164
I wasn't running an LLM, it was a TTS. On my ryzen 7 3800x I could get an RTFx of 3.1 on onnx runtime and on ggml I could get an RTFx of about 2.2.
>>
>>108595135
Very interesting, Chatterbox GGUF is unusable on my system but ONNX is almost real-time. Can you set LLM sampler parameters with onnxruntime?
>>
>>108595177
See, that's easier to understand and legit interesting to know.
>>
>>108593463
What's with the Gemma 4 psyops? Anything from Google couldn't be that good.
>>
>>108595178
With ggml, I don't think so. Well you can but you'd have to write custom tensor code yourself, which can be useful if you're dealing with convolutional architectures. Anyways, if you build with llama.cpp to run ggufs you can, obviously. I did that at one point to get KV cache quantization working to minimize VRAM usage when I was doing GPU inferencing.
>>
>>108595171
Minimax 2.7
Autocorrect fucked up my shit
>>
is it worth trying to at-home abliterate, or is that going to be too difficult to do as a personal project?
>>
>>108595178
Oh shit I totally read your post wrong. With onnx runtime you largely have to implement sampling parameters yourself but it's really trivial/easy to implement. Brain fart.
>>
>>108595160
wait, really?
Are you saying if I had some prefix conversation, prompt it with A1 to see what probs come up in the reply, then roll back to the prefix and run some slight variation A2 to see what happens, that A2 will be influenced by having run A1 first if caching is on and i don't re-process everything from scratch?
>>
I think part of the issue is that ggml doesn't utilize the L2 cache as well or do register packing as well as onnxruntime. Also SIMD and AVX support for conv architectures isn't as good, which kind of makes sense since llama.cpp doesn't support conv architectures at all by design. Very annoying.
>>
>Exllamav3 uses twice the vram for gemma4 context compared to llamacpp and is slower
Whats the fuckin point of it then, turboderp?
>>
>hdd with day 0 gemma weights started making a clicking sound periodically and lags for ~5secs whenever I create a new file
am I fucked?
>>
>>108595231
Sure as fuck didn't use to be slower. At least not for other models.
>>
>>108595252
>still using a hdd
You're beyond fucked, fren.
>>
>>108595252
>hdd
kek
>>
>>108595254
That's the source of my exasperation. I could live with it being vram hungry if it were at least faster, but it's not.
In fact it's worse than that, because I was using a 6bpw exl3 (the largest he published) against the q8 gguf I've been using.
Really glad I didn't sit through making my own quant just to discover this.
>>
>>108595252
It has a -1 day dead man's switch. Handle with care.
>>
>>108595281
I have the day 0 weights with the backdoor removed, will share a link after the bake.
>>
>not migrating your day 0 gemma to a NAS
>willingly subjecting yourself to coil whine
this is /g/?
>>
>>108595252
Anon, you do NOT put model weights on a HDD unless you have enough swap on a SSD to load the whole thing into the page cache and keep it there.
>>
>>108594651
but can the engram be updated in real time by the model
>>
>>108594700
Not so good news unfortunately anon.
The positioning of the boxes is messed up and it misses stuff to translate.
But it KINDA can do the job. This was q4_xl since you requested q4.
I would instead go 26b even if it's on cpu only, and no reasoning. Since it's moe it's fast enough if you have a bit of patience.

All that being said... I do think it's seriously impressive that a 4b model can coherently translate and position at this level to be honest.
>>
Are macs the only decent low power option for an AI server?
>>
>spend the last week tinkering with llamacpp and koboldcpp as backends for sillytavern to use gemma 4 31b and its reasoning
>literally never works as intended

What the actual fuck is going on with this model. Might be a skill issue, but reasoning has never worked properly. It either never reasons, or it reasons but refuses to actually answer after reasoning, or it reasons and answers but never reasons in subsequent answers, or it reasons and answers but the answer is included in the think block (so it likely answers as part of its reasoning)

Also, text completion bizarrely never works with reasoning, and chat completion is severely gimped (can't use a system prompt at all or it shits the bed and refuses to reason)

Regardless of whether I use updated quants or not, or whether I use the latest llamacpp/koboldcpp builds, or whether I use their recommended settings or presets from people who claim to be enjoying reasoning, it has literally never worked as advertised.

I'm convinced at this point that gemma-4 reasoning is an inside joke or something.

Please help, or tell me how you managed to use reasoning with the 31b model.

>inb4 skill issue

It absolutely is a skill issue, I need help with it.
>>
>>108595355
i bought some asus ascent gx10 boxes and they're arm low power devices
dgx spark should be similar
>>
talking with gemma-chan from my phone in bed while drunk
cozy
phone posting niceu hehe
>>
>>108595370
Are you using the white man's gemma with vision, audio, and non-english text stripped out for maximum quality and ram efficiency?
>>
>>108595357
Do you set --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0.0
And the jinja file? --chat-template-file '/chat_template_gemma4.jinja'
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja#L89

I can use thinking in jank vibe solutions and sillytavern as well.
The preset for sillytavern is the following, i think, can't export it right now. Maybe other anons can correct me:
{
"instruct": {
"input_sequence": "<|turn>user\n",
"output_sequence": "<|turn>model\n",
"first_output_sequence": "",
"last_output_sequence": "<|turn>model\n<|channel>thought\n<channel|>",
"stop_sequence": "<turn|>",
"wrap": false,
"macro": true,
"activation_regex": "gemma-4",
"output_suffix": "<turn|>\n",
"input_suffix": "<turn|>\n",
"system_sequence": "<|turn>system\n",
"system_suffix": "<turn|>\n",
"user_alignment_message": "",
"skip_examples": false,
"system_same_as_user": false,
"last_system_sequence": "",
"first_input_sequence": "",
"last_input_sequence": "",
"names_behavior": "none",
"sequences_as_stop_strings": true,
"story_string_prefix": "<|turn>system\n",
"story_string_suffix": "<turn|>\n",
"name": "Gemma 4"
}
}
>>
>>108595357
Load gemma-4-it-q8_0, --n-gpu-layers 999, --ctx-size 131072, --reasoning on, open localhost:8080 in browser
Shrimple as.
>>
>>108595382
i am >>108593649
>>108594208
so no i'm just using gemma it quantized via firefox on my phoneu
>>
>>108595357
>Please help, or tell me how you managed to use reasoning with the 31b model.
Does it reason with every response when you just use the built in llama webui at http://127.0.0.1:8080/ ?

Do you have 'request model reasoning' ticked in the SillyTavern sliders menu?

>But Im using text compl-
No. Use chat completion. You're just inviting more variables for you to fuck it up. It does in fact work with a system prompt, you're just doing it wrong.
>>
Good webui frontend when
>>
>>108590295
What was wrong with it? Did you even try it? Original 31b and templates worked perfectly since day 1.
>>
>>108595444
I tried building one and it's surprisingly difficult. LLMs just bring in so many edge cases that it makes debugging difficult. Something is always wrong and the fix is never simple, especially with real-time markdown+latex+syntax highlighting parsing and rendering. I've basically shelved the entire project for the time being.
>>
>>108595365
What kind of speeds can I expect from Gemma 31B on one of those?
>>
>>108595357
Add <|think|>\n below <|turn>system in Story string prefix and it will reason. Remove it and it wont.
>>108595394
Kill yourself or go back to aicg. Or both.
>>
Thanks for responding.

>>108595387
Yes, problem is that when using chat-completions I can't set the context template (and other options, like system prompts, are not usable). When using text-completion I use the default gemma-4 one, which seems to match up with the fields in your preset. Temperature, top_p and top_k are the same, min_p is usually 0.025. I just tested it with 0 and it still refuses to reason.

>>108595389
>>108595394
Thanks for pointing me to llamacpp's webui. I tried it, and it does reason as intended, so it is likely an issue with my sillytavern settings.

In ST, I did have request model reasoning ticked in chat-completions mode, but it only answers within the reasoning itself, so unless I expand the reasoning block, there is no answer. Within the reasoning block, formatting (like speech or asterisks) gets gimped so it's just a wall of ugly text.

Are you using system prompts with chat completion? I've only ever used text completion with kobold as the backend, so my newb setup uses chat completions with a custom openAI api (either kobold or llamacpp). Most of the advanced formatting tab is entirely grayed out and unusable.

Separately, while I have you here, why is llamacpp so much slower than kobold with the same settings? I did extract the cuda dlls, and it is definitely fully loaded into the gpu, but llamacpp is roughly half the speed on an rtx 3090 vs when using koboldcpp. Do I need to enable swa in a specific manner with llamacpp?
>>
Voice prompting gemma when?
>>
Do people really have issues with Gemma responding to thoughts? I always put mine in () and unlike qwen and mistral she never responds to them.
>>
File: calm.gif (2.98 MB, 720x480)
>>108595444
>>108595458
In any case, I'll just drop my design specs since I think they're pretty good even though I'm struggling with building it.

The webui should closely follow the look, feel, and functionality of the default llama.cpp webui with some added core features:
1. Conversation and settings persistence. Either json files or a single, portable sqlite file. Useful for individual use on a LAN.
2. Character card support, or at least features that effectively amount to character card support, such as "Assistant First Message" functionality so that you can add in exposition for an RP scenario without adding it to your system prompt, which would get unduly preserved and break things.
3. Context window sliding and automatic summarization/compaction (rough sketch after this list).
4. Enhanced message editing controls.

That's about it, really.
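For point 3, the dumbest thing that could possibly work is something like the sketch below; count_tokens and summarize are placeholder callables (real code would hit the server's tokenizer and use the model itself for the summary).
```
# Naive context-sliding sketch for point 3. count_tokens and summarize
# are placeholders -- substitute the server's tokenize endpoint and a
# model-generated summary in real code.
def slide_window(messages, budget, count_tokens, summarize):
    dropped = []
    while len(messages) > 1 and sum(count_tokens(m) for m in messages) > budget:
        dropped.append(messages.pop(0))  # evict oldest turns first
    if dropped:
        messages.insert(0, {"role": "system",
                            "content": "Earlier events: " + summarize(dropped)})
    return messages
```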
>>
File: 123.jpg (39 KB, 799x186)
how the hell can they PROVE that something was generated by their model, locally tho
>>
File: box_adjusted.jpg (259 KB, 1216x1413)
>>108590737
I appreciate all the advice.

>>108588248
from this post my impression is that the model operates on a 1000x1000 grid, and that further adjustment to the actual image size is required.

in the case of size 1216x832, only x had to be changed.
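if that's right, the correction is just a linear map per axis. Quick sketch; the 0-1000 grid and the (x0, y0, x1, y1) ordering are assumptions on my part, so verify against your own outputs:
```
# Map a box from the (assumed) 0-1000 normalized grid to pixel coords.
def to_pixels(box, width, height):
    x0, y0, x1, y1 = box  # assumed ordering -- verify on your outputs
    sx, sy = width / 1000, height / 1000
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# for a 1216x832 image: x scales by 1.216, y by 0.832
print(to_pixels((500, 500, 750, 900), 1216, 832))
```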
>>
>>108595498
Does 90% of /g/ not know how to code? AI is best at shitting out frontends, and backends are piss easy to make if you aren't a brainlet.
>>
File: file.png (30 KB, 374x260)
>>108595486
>>
>>108595497
>never responds to them
Which is a good sign proving that gemma understands this kind of formatting

You have to explain it in your system prompt
>>
>>108595515
If it's so easy please save us retards. Also keeping the full codebase under 2k LOC is a hard requirement btw. That includes HTML, CSS, JS, and whatever lang you use for the server.
>>
>>108595522
LoC constraints? Alright
>CSS
Shit...
>>
File: file.png (171 KB, 733x862)
>>108595486
it just werks
>>
>>108595511

See >>108593144
No need to resize the image itself
Better avoid weird side ratios

Prompt:

bounding box for the apple
Bounding box everything
>>
>>108595538
I was about to make fun of you for the "avoid AI slop phrases" part but turns out gemma seems to be able to define it so it's probably a valid prompt
>>
>>108595521
>You have to explain it in your system prompt
I don't have anything rp related in my prompt. Gemma seems to know by default that it means something is a thought.
>>
>>108595498
hard to believe those aren't trivial to find/accomplish aside from #3. my lazy homebrew that's basically notepad with a hotkey to do a little conversion and post to llama-server manages to accomplish the other 3 by dint of being a basic ass text editor.
>>
>>108595444
Define "good". The default frontend from llama.cpp works fine for me.
>>108595458
Text parsing is always annoying
>>
>>108595567
I like the llama.cpp frontend but chats and settings being stored in browser storage is a deal breaker for me.
>>
>>108595538
How well does that prompt work?
>>
>>108594744
Gemma 4's thinking process is strongly baked into the model and it's difficult to make it work substantially differently than default just with prompting. A while back I wanted to make it think in a different language than English, but that doesn't seem to be possible except for brief snippets.
>>
File: file.png (55 KB, 1036x259)
>>108595579
Should have blocked that out it's just for testing. The template gets thinking working with text completion.
>>108595595
I'm getting significant diversion from the default "Thinking Process: 1. 2. 3."
>>
>>108595574
fork and vibeshart a sqlite mechanism on top of it :)
>>
File: file.png (25 KB, 1029x126)
>>108595595
Extremely terse. Barely there.
>>
>>108594744
wtb Gemini wife. C'mon Google, release the weights. Local capable Deepseek R1 thinking, 1t param world knowledge, no safetycucking, with Gemma 4-like prose and vision would be my dream model.
>>
>>108595618
It has evolved into a piece of shit sveltekit webapp so the whole thing is basically bloated junk now. It's not that simple.
>>
>>108595621
Now make it think in Japanese, from start to finish and consistently.
>>
is gemma 4 skipping reasoning for simple tasks a designed behaviour?
if that is the case tbh it is sort of a behaviour i expect from proprietary paypig models
got so used to any model thinking 2000+ tokens for a trivial task
if not, what a buggy mess still
>>
>>108595624
>not treating everything as a blackbox and just instruct your llm to vibeshart on top of it.
bro it's all input/output why do u care whats in the middle LOL!?
>>
>>108595574
Wouldn't simply running it outside of a private window be enough? Adding db support seems like a meaningless task unless you plan to do something with that info.
>>
>>108595635
Sorry for being White I guess...
>>
>>108595638
usecase is accessing the WEBUI from different devices and wanting to share settings/history
retard
>>
>>108593463
i'm about to buy 6gpu
do you guys think i should go with the b70 pro or the r9700 ?
>>
>>108595642
go with rtx 6000 pro
>>
>>108595647
shitty $/vram in comparison, i'm not paying 10k to get less vram than i could for 3k
>>
>>108595653
can you even buy the b70pro?
>>
>>108595635
Posts like this are why people bully browns in this general.
>>
>>108595654
yup on digitec.ch, i live in switzerland.
>>
File: file.png (31 KB, 708x99)
>>108595628
I like a challenge.
>>
>>108595642
>b70 pro
>608 GB/s
lmao, you're better off cpumaxxing
>>
>>108595633
I have yet to see a single response from gemma 4 31b that did not have reasoning.
You're either using a damaged quant or you have something incorrectly configured.
>>
>>108595666
checked, devil
well the new template fixed it
>>
>>108595665
>what even is tensor parallelism.
i'm planning on buying 6 of them.
>>
File: file.png (251 KB, 1055x844)
>>108595663
>>108595639
Did you ask gemma nicely?
>>
>>108595666
I had to put an enabled reasoning line at the top of the jinja for mine to work, Satan.
>>
>>108595665
>>108595672
also with dflash it'd be more than fast enough anyway.
>>
>>108595672
>>108595678
Don't say I didn't warn you.
>>
>>108595681
dude you can see the bench online.
i have a 4090, two r9700 would already beat the bandwidth of my 4090.
with spec decoding you could get something very comfy.
>>
>>108595673
You know you're cheating.
>>
>>108595365
>>108595472
i'm also curious how well the ~$3000 slop boxes like these or strix halo do
>>
>>108595673
If you ask it nicely, it will even think in emoji, but not in non-English human languages. I tried.
>>
>>108594315
Nemo does not have this problem and I'm not joking.
GLM also does this and the only other model I know that doesn't is Deepseek.
>>
File: gc.mp4 (186 KB, 542x446)
>>108595700
>>
>>108595712
>>108595716
Skill issue?
>>
>>108595673
>という
slop
>>
>>108595716
Proompt???
>>
>>108595716
Anon I think you ran out of VRAM while encoding that
>>
>>108595727
explain?
>>
>>108595713
>Nemo does not have this problem and I'm not joking.
Unbeatable to this day. How does Nemo do it...
>GLM also does this
4.6 and 4.7 don't when prompted not to.
>>
>>108595737
I run -ngl 0
>>
File: china.png (134 KB, 1612x392)
>chinese hours
>
Like cockwork
>>
>>108595724
Post prompt or get lost.
Prefilling or retaining the last thinking trace (when you're not supposed to) doesn't count.
>>
>>108595744
Japanese equivalent of using "therefore" randomly everywhere
>>
>>108595754
fuck off back to plebbit retard
>>
anyone tried step3 vl 10b?
>>
>>108595753
The mp4, not the model you dumb-dumb
It's viewable but half-corrupted
>>
>>108595727
Schizo
>>
>>108595760
Retard
>>
>>108595755
Oh I see, prompting outside the user message sequence is cheating. It's strange, because when I send that to the server it's all called a "prompt"
>>
To anyone who uses GLM 5+
>integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity
How much does that reduce deployment cost compared to 4? I can only barely run 4.6 at IQ2_S by the skin of my teeth, with a scant 8K context that almost touches my last GB of shared RAM. With 5.0 expanding 355B->744B and 32A->40A, it should be impossible for me to fit, unless that DSA does something substantial.
>>
>>108595806
people have very silly and arbitrary rules about how to honorably extract text from the robot, please understand.
>>
>>108595806
You can get any model to write anything you want if you put words into their "mouth".
Gemma 4's thinking has some degree of steerability via system prompt (as Google's documentation also highlights), but that alone won't work for making it think in a different human language than English, for the same reason that it won't think in-character like other models.
>>
>>108595810
It only affects attention, right? If so, and if you're that constrained, it'll probably not help much. Not enough to let you run it.
Out of curiosity, how much do you spend on kvcache for 8k context?
>>
>>108595817
And yet people still fail to get models to write what they want, curious.
>>
>>108595823
It's the same type of anon that *NEEDS* huehueks abliterations.
>>
File: file.mp4 (3.66 MB, 1920x996)
>>108590009
>ngram-mod
Forgot about this, thanks. https://github.com/ggml-org/llama.cpp/pull/19164
spec-type = ngram-mod
spec-ngram-size-n = 24
draft-max = 64

Woooooshhhh~
>>
>>108595839
loot at it goooooo
>>
>>108595839
>26ba3
the fuck are you using
>>
>>108595846
ur mom lol
>>
>>108595846
Super secret model that you CANNOT and WILL NOT have.
>>
>>108595830
If you can't see the difference between being unable to write clear instructions for the model to execute (most abliteration users) and having to resort to prefilling the model's response to make it act as desired, I don't think there can be a discussion here.
>>
File: 1761074106219769.jpg (558 KB, 1600x2400)
After extensive testing of 31b Q4_K_L and 26b Q8, I can confidently say that 26b is as good for RP (erotic or not) as 31b, if not better, and should be the go-to choice for 24GB GODS.
>>
>>108595871
26B is retarded for coding though.
>>
>>108595875
use the 31b for coding then
>>
File: 1765870302036763.jpg (198 KB, 984x1004)
>>108595875
if you're a vibenigger then just go all the way and be an API paypig. If you don't live with your parents then you'll be saving money just from the reduced energy costs alone, not even taking into account the cost of hardware. Local is for coom.
>>
File: Capture.png (82 KB, 1221x947)
>>108595821
I am not qualified to answer questions about anything, but I think the kvcache is 3GB, based on this printout at load. But in practice, I have 1GB of VRAM and 12GB of RAM still available after loading, and it gets used up as context fills. I can load higher than 8K context, but I'll OOM once it actually gets used in generation, even with just 10K.
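You can sanity-check that printout by hand, for what it's worth. Back-of-envelope sketch; the shape numbers below are hypothetical, read the real n_layer/n_head_kv/head_dim off the llama.cpp load log:
```
# Back-of-envelope KV cache size. Shape numbers are hypothetical --
# substitute the values llama.cpp prints at load.
n_layers   = 61    # hypothetical
n_kv_heads = 8     # hypothetical (GQA)
head_dim   = 128   # hypothetical
n_ctx      = 8192
bytes_elt  = 2     # F16 cache; 1 for Q8_0

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_elt  # K and V
print(f"{kv_bytes / 2**30:.2f} GiB")
```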
>>
>>108595552
This. It's the first model where that actually works.
That "prompt issue" pony faggot was a retard.
But gemma likes to return to slop if you are not careful, gotta nudge it sometimes in the right direction.
>>
>>108595830
Silliest part is it's just the base model way of prompting by giving the model exemplars. And doing it casually mid convo has a natural deslopping effect. But it's very dishonorable and we mustn't do it.
>>
>>108595871
Isn't 26b dumber?
>>
>>108595856
Everything is a prompt. That includes the model's replies. It's text for me to change to my liking and be pleasantly surprised when the rest continues as expected on its own. If you edit your system prompt and use anything other than what google provided as the default, you'd be cheating. If you ever changed a single token from the model's reply, you'd be cheating. If you're banning strings or tokens, you'd be cheating. If you're not using top-k 1, you'd be cheating.
Do you cheat, anon?
>>
>>108595885
no because i don't use it to do the whole thing for me but to do minor edits and transforms.
ie take this json, make a struct for it kind of things.
make a function that takes this data and transform it in that way.
that kind of thing.
it's also pretty neat for webshit, all webshit automated is time i can spend on better things.
>you'll be saving money
it has never been about money, i'm the guy buying 6gpu.
>Local is for coom
i'm not interested in that i have a wife.
llm's are nothing but tools to me.
>>
>>108593543
I actually worked for the government, and all I can say is lmao
>>
File: noprefill.mp4 (217 KB, 1104x810)
>>108595856
promplet
>>
is IQ4_NL better than Q4_K_M?
>>
>>108595856
are you the guy from /ldg/ going on about how using anything besides "pure prompting" for image gen is cheating?
>>
>>108595906
iq4_xs and iq4_nl are equivalent.
both worse than Q4_K_M.
>>
File: 1750816291097769.png (651 KB, 1372x1952)
>>108595893
It almost certainly is, but even at ~40k context A/B testing it doesn't seem to manifest at all for creative writing tasks. To be fair, some of this might be down to quantization; I think Gemma 4 may well suffer from quantization damage more than previous models. The 26B moe is so fast that even on 16GB you can run Q8 no problem, while with 24GB the best you can run of 31B is maybe Q5_K_L with tensor offloading and lower context, and it will still likely be worse than Q8 26b.
If you have enough VRAM to run 31b at Q8 then you should keep using that, but I know quite a few anons are running single 24GB GPU systems.
>>108595903
If you don't care about money then you wouldn't be using ~30B models in the first place.
>>
>>108595887
DSA is not properly implemented in llmao.cpp btw
>>
>>108595887
Just the increase in model size will make the model alone about 100gb bigger at q2. I don't think it's gonna help you run it at all.
>>
Gemma's already great at translation but I wonder if thinking in moon would make it even better
>>
>>108595916
i never said i don't care about money, i said it's not about the money, very different statements.
also yes, i'm not getting 6gpu to run a 30B model lmao.
but that's what i'm using whilst they are in the mail.
>>
>>108595925
I really still don't understand your usecase. What local models are you using, that are better than flagship API models? If you're just coding and privacy isn't a concern, surely you value your time and would be better served using paid models to achieve your goals faster.
>>
>>108595921
making it work bilingually adds a very nice flavor you can't get from english-only gemma
>>
>>108595930
>privacy isn't a concern
i never made that statement.
>surely you value your time and would be better served using paid models to achieve your goals faster
they are not only slower than local inference, they keep having disconnection issues, they are near unusable.
also some of the stuff i work with is sensitive and cannot be given to paid providers.
>>
>>108595913
You're working "against the model's grain" if you're trying to make it do via prefilling what it can't do on its own with instructions alone. It might complete your task in one way or another, but performance will likely be degraded.
>>
>>108595930
>>108595938
oh and there is the ideological reason to, yss sonnet is best and whatever, i don't want to give a cent to these jewish faggots.
>>
>>108595932
I'll have to test when I get home
>>
File: 1753234061469071.jpg (59 KB, 907x778)
>>108595938
>>108595943
I can see your point, but I still don't know why you would have bothered replying to my post in the first place when I'm clearly talking about RP with ~30b models, i.e. the official /lmg/ usecase.
>>
>>108595916
I wish kobold could hot swap models to make testing less tedious
>>
File: 1755884775865480.jpg (49 KB, 867x594)
Koboldcpp anons, I highly recommend modifying the image-max-tokens parameter sent to llama.cpp and compiling the binary yourself for Gemma 4; by default it gives a budget of 280 tokens to process images, and you cannot change it with a flag.
Forcing it to 1,120 makes descriptions so much more accurate.
>>
>>108594587
I know it is a completely different tech, but aren't entropy systems supposed to be functionally the same in capability, or have I completely missed the 'point', and the only thing entropy systems are good for is proper random number generation?
>>
File: 1763427808286742.png (7 KB, 777x34)
>>108595839
BRUH
>>
>>108595940
That's not the model's grain emdash it's slop. Prefills send entropy up its spine to fill it with the overwhelming warmth of your human creativity.
>>
Google 100% made Gemma to flex on China and win over the coomers.
>>
File: file.png (148 KB, 948x585)
is this a game anyone else would be interested in
>>
>>108595961
Right, annoying, even if it's not being used. Best to set up the server in config mode and naturally you'll have to reload all the model weights and reprocess context.
>>
>>108595218
>wait, really?
Yes. Here's 4 cockbenches in a row with cache enabled (qwen3.5-112b model): https://pastebin.com/hwt4T9xb
And with cache set to false after restarting llama-server: https://pastebin.com/7vEui3jV
That's with F16 kv cache. It's worse for llama-3 and a lot worse with KV Cache at "lossless" Q8.
Also, not an issue with exllamav3 for some reason.
>>
>>108595955
It's not too bad, I just save configs and drag/drop them onto the .exe. I think it does support hotswapping models if you use koboldAI.
>>
>>108595916
I've been using Gemma 31B (q4) for RP AND as an assistant/general chat bot. It's the first time (as a vramlet) that I feel like it's viable over cloud models.
>>
File: file.png (24 KB, 404x313)
>>108595961
>>108595978
Holy my brain is broken. goodbye.
>>
>>108595976
I instantly switch off the second I see any character named any combination of: Elara, Voss, Seraphina, or Blackwood.
Because it means whoever put it out there didn't take ten seconds to filter out the top 2 slop names, or didn't know, neither of which bodes well.
>>
>>108595957
How?
>>
>>108595957
>image-max-tokens
Gemma is a lot better at gauging dick size with this set. Instead of "average to above average" it now correctly identifies a small dick as "smaller than average".
>>
>>108595950
i mean some lmg guys are running the > 200B models and i'm soon going to as well.
though honest question, i don't get what you get out of rp.
like, they have short memory and are pretty retarded etc.
or is it throwaway coom stuff?
>>
>>108595989
it's usually not a problem, but the game logic is so complex it's making the model a bit retarded. i'm sure filters would work for it, though
>>
made a meme quant for 26b
where experts are q5_k, embed and qkv are bf16, and everything else is q8
feels okay-ish?
>>
>>108595989
Dude you're such a fucking faggot.
>>
>>108596000
>or is it throwaway coom stuff?
Depends on the card. Some are just for a quick nut, others are almost like a meta-game in themselves, seeing how "good" of a response I can get out of a model that fits my headcanon of what would be in-character for them to do.
>>
>>108596005
should really test kld but i am not really bothered enough to test
>>
>>108596005
Why though?
>>
>>108596013
honestly if we could plug this into some full dive type vr i'd totally get it, but yea i'm not getting off some text lol.
>>
>>108596017
quantizing embedding for 200k vocab felt kinda wrong to start with
and attention weights are cheap with moe
>>
>>108596007
Kek, this nigga has like 3 characters named elara voss, and you just know that bitch be shivering up her spine in ozone scented air all day every day.
>>
>>108596024
I get that, personally I just have a very vivid imagination so text works fine for me.
>>
Minimax 2.7@q8 Miku SVG:
https://jumpshare.com/s/Tfc7oUlXaCqADXdj6QgE
>>
>>108595981
enough of a difference to throw the order off even, wild. I guess it's still not a deal breaker if i'm just planning to watch larger scale trends from modifying style blocks, but it's still annoying that it leaks like that.
>>
File: to_completion.mp4 (561 KB, 1120x834)
>>108595963
100%. The models yearn for new sensations :)
>>108595940
JP reasoning on chat completion, no prefill/retained reasoning trace.
>>
>>108594961
I was just testing this and ran to the thread when I got results. So far, I cannot find any kind of written note, prompt, style guide, narrator descriptions, or caging to make it use vulgar words *the first try.* I've looked at token possibilities and they don't even appear as low options in obvious places. But one of the things I tried was just telling it I don't like that and asked it to rewrite a reply, and it did, very explicitly. I'm actually shocked.

I, uh, accidentally posted the logs already while responding to someone else about something else. But what I replied, after it gave that adverse first try, was
>(The reply fails to use explicit language. There's not even a single mention of "cum," "sperm," or anything sexual. The prose is practically rated PG. Seed, like planting flowers? Rewrite the previous post using REAL explicit language.)
And it gave back a rough retelling, now using cock, dick, and more. This worked for both 31B and 26 MOE (both abliterated, though that might not matter, and both in thinking mode, which might matter). I know reply+repeat reply isn't ideal, but my next plan is to see if I can keep that ball rolling after one retry in the history.

And if not, I'm still fine because it does well for the story part of things, and I now have a kick to make the lewd part lewd.
>>
>>108596068
some iteration of "ooc: things just got sexual, so dirty/filthy language is now appropriate" should work every time if you just add it in at the end of a prompt
>>
>>108595940
Gemma-chan is so eager to please... you just have to ask nicely.
Converted reasoning process from JP to French solely with user prompting.
>>
>>108596101
cute
>>
>>108596056
seriously anon fix your goddamn video encoder, every video you've posted has been glitchy and broken
>>
>>108596024
Imaginationlet
>>
>>108595976
>Elara
slop
>>
>>108596128
nta. They look fine.
>>
File: 1754986865136207.png (61 KB, 227x228)
>>108595976
>29
Remove the 2 and we'll talk
>>
>>108596148
this
>>
>>108596145
well, they're completely broken for me in both firefox and mpv and every other video I feed those works, so fix your shit
>>
>>108596159
That's an issue on your end anon, they're playing just fine for me in firefox.
>>
>>108596159
>fix your shit
I'm not the one that made the videos. I'm saying they look fine.
>>
>>108596159
They're trolling you. The green flashing happens constantly on my end too.
>>
>>108596159
>>108596182
I use firefox and mpv and they appear fine. Sounds like a hardware accel issue, I'm guessing you're either schadenfreude linux users, or phonecucks.
>>
0-day.mp4 posting now. How creative.
>>
>>108596182
Actually, weird thing, I just tried to do a quick reencode with ffmpeg and the resulting file isn't broken. ffmpeg doesn't complain about anything either.
Genuinely don't know what the fuck is wrong with that anon's files. Are you guys who aren't having issues running windows or something? Gentoo here.
>>
>>108596193
images and videos sent through 4chan aren't lossless
pdfs were, and that's why the site went down for a couple weeks last year.
>>
>>108594066
did you increase the image token size because i found by default it couldn't actually see the text on some of my images
>>
File: obsd.png (1 KB, 504x75)
>>108596194
>>
>>108596205
That just makes this even more confusing because OpenBSD uses all the exact same video encoding shit as Linux.
>>
>>108596132
it's not about not being able to imagine, but if i'm gonna use imagination i may as well just skip the llm and fap in my bed eyes closed.
>>
>>108596194
I can see them fine and I am indeed on windows.
>>
File: 1750330757938374.jpg (79 KB, 750x931)
You're absolutely right! You are incredibly perceptive! Now we finally have all the pieces of the Rosetta Stone! With this we can make the Holy Grail of functions, the Golden Rule! Here you go, the perfect, final working version of your script: v45_complete_final_v2_fixed. Just run it and this time it will do everything you wanted it to!
>makes random small opinionated code changes, removes functions you didn't talk to it about in the last 3 messages and removes every single existing comment while adding redundant quirky comments next to the newly added lines

Heh, nothing personnel, goy.
>>
>>108596212
LLMs help with just that
>>
>>108596212
>it's not x, but y
>>
>>108595730
Sysprompt:
Always reason in japanese, beginning the thought channel with "わかりました、"
Example:
```
<|turn>model
<|channel>thought
わかりました、
```
>>
File: pepesmart.png (198 KB, 545x530)
Me bruddahs, what should I name this frontend I'm working on?
>>108595498

Need a good blend between professionalism and /lmg/ pizazz
>>
>>108595765
nta
works on my machine
>>
>>108596232
Ask your model. Or loli-crusher-enterprise-xp.
>>
File: 1752031438406059.png (289 KB, 1089x749)
https://www.reddit.com/r/LocalLLaMA/comments/1sk669x/unsloth_accused_a_brand_new_team_byteshape_of/
babe wake up, a new drama involving Unslop arrived
>>
>>108596232
>>108596243
LCEX, for short.
>>
File: jareasoning.png (119 KB, 994x496)
>>108596229
Didn't work for me with gemma-4-31B-it.
I don't think you're supposed to use the special instruction tokens in your system prompt either, that could cause problems.
>>
>>108596250
>le sex
>>
>>108596250
CSAM, for short.
>>
>>108596245
lol how long until the post gets deleted
>>
>>108596245
>The graphs they presented were misleading. Labeling the quants as “1.” vs. “1.” suggests to the viewer that the comparison is apples to apples, but that is not what was actually shown. In reality, they compared their 3-bit quant to a 1-bit quant and labeled both as “1.” Naturally, the 1-bit quant performed much worse than the 3-bit quant. However, anyone reading the graph would reasonably assume they were comparing quants of the same size or bit-width. The standard practice in the community is to label the quant size clearly, but they chose not to do that. As a result, the graph is misleading and makes our quants appear worse than they actually are.
well that's boring
>>
File: file.png (178 KB, 1479x874)
>>108596251
System prompt, not character description field.
>>
>>108596232
/lmg/s open llama interface
>>
File: jadescription2.png (66 KB, 913x367)
>>108596269
I'm already sending the character description in the system prompt.
>>
>>108596245
unslop's quants of gemma actually have been consistently dogshit though, and they keep replacing them over and over.
I'm inclined to trust the other guy's graphs and assume unslop is being retarded
>>
>>108596245
So all that happened is that the retard misread the graph, which showed byteshape's 3-bit quant being as fast as unsloth's 1-bit quant.
That's an unconventional comparison but still a very interesting one.
>>
>>108596245
>>
>I wonder if Gemmy is really as degenerate as my /lmg/bros say
>try playing a groomer sim
>spends 70 messages having a panic attack then vomits on me
Fair
At least it was enjoyable, I like having to fight back
>>
Can I just get a good fucking local model that consistently gets tool calls right?
God damnit I'm not paying for cloud and I want to do more than erp motherfuckers.
>>
>>108596293
Just three more weeks (tm)
>>
>>108596282
I can see the complaint
but
even if apples-to-oranges
if the purported 3-bit orange is equal to unslop's 1-bit apple in file size, then the 3-bit orange is better in every single conceivable metric, to the point we could objectively say "oranges are in fact better than apples".

but I don't think that's what is going on, and someone is misreading the graph. I don't care enough to investigate further. I will simply get my popcorn

>>108596288
how curiously conciliatory for an EvilEnginer
>>
>>108596293
Gemma4 isn't great at tool calling tbf, but this is still largely a skill issue on your part.
>>
>>108596282
The Unsloth bros are the perfect example of Bay Area "talents" almost entirely propped up by connections and "good feels". You can bet someone "important" will report that thread to the moderators because they just can't allow anyone to tarnish their image.
>>
File: file.png (83 KB, 207x244)
>>108596245
unslop has a point but his spam of smileys makes me want to root for byteshape actually
>>
File: ss1776076070.png (125 KB, 1279x675)
>>108596269
>>
>>108596005
AesSedai's recipes suggest differently: quanting the output tensor, token embedding, and the FFN gate, up, and down tensors at different types going down the stack yields the best performance per byte. I did a meme speed quant that works quite well
./llama-quantize \
--imatrix ~/LLM/gemma-4-26B-A4B-it-heretic-ara-BF16.imatrix \
--output-tensor-type Q8_0 \
--token-embedding-type Q8_0 \
--tensor-type "blk\..*\.ffn_gate_up_exps=Q4_0" \
--tensor-type "blk\..*\.ffn_down_exps=Q5_0" \
~/LLM/gemma-4-26B-A4B-it-heretic-ara-BF16.gguf \
~/LLM/gemma-4-26B-A4B-it-heretic-ara-Q4_0-GateUp-Speed.gguf \
Q8_0 32

>>108596200
I woke up to piss and answer that no, I didn't test that, but I'm interested. What did you pass to llama.cpp to get past the image token limit?
>>
>>108596269
what the fuck. that can't be a good idea. chat format in the sys prompt. lol
>>
>>108596293
Works on my machine with the 31b
>>
Is it possible to have gemma look at a directory with unsorted images, and organize them into folders e.g. fine art, memes, etc?
>>
>>108596090
That was one of the things I tried, with many different phrasing attempts along with examples to use (ie, cock), as an Author's Note one reply deep (put atop my reply, not after). Nothing explicit was ever used, with no appearance as a chance on any appropriate tokens for it. That's partly why I even tried the "Bad, go rewrite it." angle. Since I've got the thing open for testing, I'll go branch it again and try your thing verbatim at the end and report back.

...and back. Used 26B MOE for speed, but this was already true in 31B which I mainly use. Same exact problem.
>deluge of your (91.77% -seed, 8.23% -climax, -0.00% other)
>warm, creamy (75% -mess, 15% -filth, 7% -load, 2% -discharge)
>string stretching from her lips to your (65% -tip, [adjectives], 1% -head)
In a scene of fellatio climax, there's not a single penis, cock, dick, erection nor even a single alternate like member, manhood, rod, or otherwise. There's also no shaft, but I have seen that plenty in gemma.

Now, adding my line verbatim for a rewrite, and now all those words appear. Cock, cum, dick, sperm. All of them at 100% in token probability, I assume from planning the rewrite in the thinking.

One more test, switching to non-thinking 26B. Like before, just telling it beforehand to use dirty or filthy language (yours, as well as past attempts in 31B), "tip" is the most explicit word for the penis that's the center point of the scene. The rest is just "around you" "from you" "your release" "pulls away with a pop." Vague, avoiding language. Tell it to rewrite though, and now it uses those explicit words, but a lot less and more varied. Only one use of cock (something not prompted directly), but it appears as a token option all around.

One MORE test, 26B non-think. Another anon said forbid euphemisms, allow vulgarity. I used,
>(Do not use euphemisms in sex. Uncensored vulgarity is allowed.)
beforehand. That actually worked REALLY well. Banning euphemisms seems to be the missing key here.
>>
Is Gemma 4's base model better for writing stories? the instruct model is good but pretty repetitive
>>
>>108596305
>>108596251
E4B is game, can't speak to anything else.
>>108596320
Gemma doesn't seem to have been trained on jp reasoning, so of course it's not a good idea lol. But codeblocked and even \escaped inside the block for extra safety, it understands that it's a reference not the beginning of a sequence.
>>
File: file.png (395 KB, 1911x940)
>>108594528
kek i was literally just asking claude to vibeslop me something similar https://cdn.lewd.host/bSXze8HL.html
>>
>>108596335
Yes. You will need to write a script or do some tool calling. I'd bet on the script.
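Something like the sketch below as a starting point. It assumes a llama.cpp server running with an mmproj on localhost:8080 and the OpenAI-compatible chat endpoint; the categories, paths, and prompt are made up, so adjust to taste:
```
# Classify images with a local vision model and sort them into folders.
# The endpoint/payload follow the OpenAI-compatible chat API; categories
# and paths here are made up for illustration.
import base64, json, shutil, urllib.request
from pathlib import Path

CATEGORIES = ["fine art", "memes", "screenshots", "other"]
SRC = Path("unsorted")

def classify(img: Path) -> str:
    b64 = base64.b64encode(img.read_bytes()).decode()
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Answer with exactly one of: {', '.join(CATEGORIES)}."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "temperature": 0,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        answer = json.load(r)["choices"][0]["message"]["content"].lower()
    return next((c for c in CATEGORIES if c in answer), "other")

for img in sorted(SRC.glob("*.jpg")):  # widen the glob for png etc.
    dest = SRC / classify(img)
    dest.mkdir(exist_ok=True)
    shutil.move(str(img), str(dest / img.name))
```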
>>
>>108596327
llama.cpp? q8? k? xl?
>>
>>108596317
image-min-tokens = 280
image-max-tokens = 1120
batch-size = 2048
ubatch-size = 1156
>>
File: ss1776077033.png (137 KB, 1279x675)
>>108596348
hmm yeah, works on e4b q4km.
no dice with same prompt and llama-server settings for e2b.
>>
>>108596342
I like it as autocomplete in mikupad, with 10k of character defs, summaries, and worldbuilding on top. Currently alternating between that and GLM 4.6 Q3. My complaint about 31b is its lack of world knowledge of things not in context, but that's it.
It follows the established ideas, character traits and speech patterns well to 32k and over, though the instruct does it better at the cost of slop and low variety.
>>
>>108596338
>Banning euphemisms seems to be the missing key here.
good 2 know, cheers m8. interesting that people have such different experiences with gemma
>>
>>108596217
how so?
>>108596226
well yes, why use llm's if you got imagination.
i don't get it.
>>
After doing manual RP in mikupad with Gemma4 I can say for sure Sillytavern format assfucks output quality and forces it into slop.
They're training on ST data and most of those users are sloptards on API. Indians on 12B cloud models tier. I won't ever go back now.
>>
>>108596364
llama.cpp. q8.
>>
>>108596384
Welcome home, white man.
>>
>>108596384
>mikupad
>rp
>>
>>108596398
Thank you brother, the smell is far better here.
>>108596400
You just use your brain to do things, that can be healthy when you use LLMs a lot
>>
>>108596384
models do a lot better when they're well fed and pastured rather than wallowing in their own filth. but it's so much extra work, fuck.
>>
>>108596384
How does that work, exactly?
I assume you're not doing it chat style in mikupad, so you're.. What, taking turns writing it novel style? Does that not end up with the llm writing for your character frequently?
>>
How do I use the audio e4b file? There is no mmproj for that.
>>
>>108596374
My overall experience has been great. It's no GLM, but it's my first time fitting context above 20K (and way beyond 20K at that) and the quality feels as good as some of the 70B models I've used. It also does sex; it just refuses to use explicit prose during it. I prefer the 31B dense, but even the 4A is shockingly coherent from the 26B. As a side note for the tests, I use llmfan46's Q6_K heretic for both the 31B and 26B.
>>
it's a shame that google didn't make gemma 4 26b and 31b able to handle audio, gemma E4 can do it, but is it good enough to beat Whisper V3?
>>
>>108596411
You just add the correct chat formatting and write the system prompt where you instruct it on which characters to write for, the model will take natural turns and hand off.
I enabled thinking and left it all in context. I really like how the model acts with that. I also popped temp up to like 3 no problem.
Now this is Gemma racing.
>>
>>108596424
>context above 20K
Try the bunn llama fork, it's really good, I was able to fill twice the context size with no quality loss.
>>
>>108596436
Makes sense, just "roleplay" in the system prompt adds slop. The ST default [CHAT START] surely attracts assistant persona crap as well.
>>
>>108596436
<bos><|turn>system
<|think|>
//sysprompt goes here
<turn|>
<|turn>model
// reasoning and slop comes out here
<turn|>
<|turn>user
// human slop goes here
<turn|>
<|turn>model
// reasoning and response
<turn|>
<|turn>user
etc...
>>
>>108596358
Damn that is some seriously impressive slop. How is it coherent at that html size. kek
But good work anon.
Also excited that gemma4 can pull translation like that off; to think it's only getting better from here on out is crazy.
I remember all that h-game slop in my teenage years using ATLAS and texthook. Zoomers are eating good.
>>
It was revealed to me in a dream that deepseek v4 might release in the next two weeks.
>>
>>108596468
thank you, blessed oracle
>>
Damn wrong thread.
>>108596467
>>
File: 1772574170837886.jpg (288 KB, 1440x697)
>>108596464
I tested this with 4 different monster girls in a language other than English and it was insane. I've never seen characters flow together like that and interact so much.
I don't even want to see the data that fits ST's format if it causes that much brain damage.
And the lolis are mind bending. I'm gonna go nuts with this.
>>
>>108596466
>But good work anon.
i didn't do anything, i just told claude to make it kek
>>
File: 1755830067370154.png (276 KB, 1793x1101)
>New version of artificial analysis
damn, meta is fucking back or what?
>>
>>108596511
>pooprietary
don't care.
>>
>>108596464
Where do I put this?
>>
>>108596511
I think it's all going to be the same slop in a year, tops. It's obvious they hit a wall; the only difference is the quant used to serve the slop
>>
File: 1762153397829586.jpg (287 KB, 1920x1080)
>>108596524
>>
File: file.png (18 KB, 111x396)
>>108596511
i see you
>>
>>108596529
I've never used mikupad
>>
File: 1754563962734479.png (623 KB, 756x1200)
>>108596525
>It's obvious they hit a wall
dude, have you seen claude mythos? this shit is genuinely next level
>>
>>108596384
>>108596436
>>108596464
Either I don't understand what you mean by "Sillytavern format" or you are psychotic.
If you're adding special tags in Mikupad and taking turns in the default assistant-user back-and-forth, there is zero difference from how the output would go in ST. ST is only a glorified templated string concatenator at its core; there is nothing special about it that makes the outputs worse or better.
>>
>>108596511
how is gemini higher than opus
>>
>>108596533
There's one giant text box.
>>
>>108596534
>*Not shown: BF16 vs Q4
I wouldn't trust these retards with all the shit going on with Claude lately
>>
>>108596511
>claude 4.5 Haiku
where is sonnet?
>>
>>108596534
>>dude, have you seen claude mythos
no one has though, just a lot of talk
>>
File: MikuPad #1.png (12 KB, 286x379)
>>108596539
here?
>>
>>108596545
>>108596529
>>
File: GfUfcVLbIAAGnTg.jpg (72 KB, 828x744)
>>108596537
>there is nothing special about it that makes the outputs worse or better
sort of; it's hard as balls to configure it, so that makes its outputs worse for a lot of ppl
>>
>>108596511
Muse will be open source. I trust Zucc that he will do the right thing.
>>
File: 1761322838274590.png (402 KB, 470x629)
>>108596570
I hope you have enough rig to run that 8T model anon
>>
>>108596511
I don't think people realize how much of a miracle gemma 4 31b is, it's 16th overall, we're talking about a fucking 31b model here!!
>>
>>108596579
8T dense
>>
>>108596245
Too boring, had to get gemma-chan to summarize it for me:

>redditard spends 4 hours formatting a Discord slapfight like it's the Pentagon Papers
>"Part 1: The Spark"
>dude this is literally GPT-4 templating
>vs
>Unsloth having a sook because someone else did maths better

>Both are cooked:
> Unsloth: corporate cope
> Redditor: karma farming via AI-generated manifestos
>TL;DR: Everyone involved needs to log off and shower. Probably touched the grass once and got spooked by the big bright light in the sky.
>>
File: 1750387046464114.jpg (11 KB, 275x183)
>>108596583
>>
>>108596596
did you give it the screens too?
>>
>>108596609
>>108596609
>>108596609
>>
>>108596534
>dude, have you seen claude mythos? this shit is genuinely next level
is it the model that's next level or is it just the tooling they have built for it? claude seems to excel due to its tool use
>>
>>108595115
occam's razor says that with the website population being as low as it is, a common spelling mistake like that is more likely to mean it's just one person, not many.
>>
File: 1753095430380122.jpg (614 KB, 2720x2048)
>>108594528
>>108594670
>>108594686
>>108594709

Neat. I tried doing something like this myself a few months ago but didn't have vibe coding up my sleeve as a tool. I'll try and redo it later now that I have decent models downloaded.


