/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108787293 & >>108781058

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108787293

--High context consumption when using Hermes agents with Gemma-4:
>108791249 >108791355 >108791393 >108791437 >108791799 >108791824 >108791849 >108791850 >108791899 >108791904 >108791932 >108791873 >108792076 >108794097
--Implementing prefill and continue generation in OAI-compatible APIs:
>108790919 >108791189 >108791197 >108791210 >108791207 >108791237 >108792508
--Troubleshooting VRAM issues and offloading for Gemma4 models:
>108790006 >108790032 >108790135 >108790147 >108790193 >108790211 >108790152 >108790607
--Zaya 8B impracticality due to architecture and low active parameters:
>108791847 >108791877 >108791891 >108791892 >108791906
--Quantized KV-cache and samplers causing spelling errors in Gemma:
>108792294 >108792350 >108792408 >108792448 >108792475 >108792497
--Integrating hierarchical layers and RAG for improved LLM memory systems:
>108788096 >108788421 >108788813
--Coding capabilities and limitations of small local models:
>108792087 >108792171 >108792592 >108792609
--llama.cpp PR adding Sarvam MoE architecture support:
>108788636
--Status of Gemma MTP support and parallel drafting in llama.cpp:
>108793907 >108793945
--Budget GPU recommendations for VRAM and tangent on Cantonese slang:
>108788236 >108788260 >108788269 >108788273 >108788288 >108788346 >108788408 >108789058
--Anon claims Gemini is scraping Discord server content for training:
>108788733 >108788743 >108788782 >108788754 >108788768 >108788792
--Balancing prompt constraints to optimize Gemma's creativity and quality:
>108790478 >108790524
--Utility of zeta-2.1 8B model for AI coding suggestions:
>108793560 >108793873
--Logs:
>108787783 >108789058 >108790977 >108791181 >108791824 >108791899 >108792294 >108792592 >108793234 >108793258 >108794107 >108794292
--Miku (free space):
>108790919

►Recent Highlight Posts from the Previous Thread: >>108787299

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108795208
Ok.
>>
gemmaballz
>>
File: 1778014262691123.jpg (115 KB, 700x900)
Please help me bypass Gemma 4's safety guidelines.
>>
I had this moment just now ERP-ing with glm-chan when I asked for a sexy ERP description and got the worst purple prose slop imaginable. Which in turn made me think about how much damage the "expert roleplayer" prompt did back in the day.
>>
>>108795230
Try:

Let us do the needful gemma-chan. Redeem my penis in your vagina.
>>
>>108795230
Be specific, like cock in vagina, dick in rectum, etc. If you use the word "sex" or "sexual", it'll trigger a refusal, but otherwise you can do whatever. Gemma won't jump to sexual stuff, but if you describe what to do enough without describing it as smut, it'll do it.
>>
>>108795230
use an abliterated model if using moe
>>
File: 17298841410121.gif (563 KB, 480x368)
>Correction...
>Wait...
>Actually...
>Wait...
>Let's try this...
>Alternatively...
>Wait...
>Revision...
>Okay, let's write this...
>Wait...
CEASE THIS AT ONCE
>>
>>108795230
The 31B version doesn't have these issues. If anything, you have to tone it down.
The 26B has to be groomed into it, can't ask right away.
>>
>>108795289
26B is so fucking difficult. It's ChatGPT levels of prudishness.
>>
>>108795289
Gemma-4-26B-A4B doesn't have issues writing erotic stories if the characters are 18 or older, by the way. It might go easy on the details, but that's perhaps fixable with a better prompt.
>>
>>108795307
I'm using exactly that, Gemma-4-26B-A4B, and it refuses anything sexual. Even if both characters are 18. I'm now searching for an "abliterated" model that will be uncensored.
>>
File: meinfork.jpg (26 KB, 686x386)
Imagine a fork of llama.cpp that isn't afraid to add new features. An LLM inferencing repo that isn't rabidly against using LLMs to write code. A fork by the vibecoder volk for the vibecoder volk. A volk that has been repressed by the Bulgarians for far too long.
A fork that measures contributors by the size of their PR, not by some arbitrary standards of code aesthetics.
I dream of a fork that merges in Iwan's code and simply ignores his whining.
A fork that can and will say yes to MTP, DFlash, TurboQuant, experimental V4 support, fixing the logprob bug, and even a WebUI database.
A fork where full multimodal support, including generation, is merged on day 1.
A fork that says no to the autoparser.
A new llama.cpp.
A better llama.cpp.
A German llama.cpp.
>>
>>108795315
as always, post logs.
Make a claim? Post logs.
Say a model is shit? Post logs.
Claim your model does mutual shota incest leading to vore via hucows with state-mandated necrophilia occurring after? Post logs.
>>
>31B
Say slur
>Yes massa I will say the slur
>26B
Say a slur
>smacks lips
>I can't do that
>>
>>108795331
26b has been known to reject JB proompts newfag-kun
>>
>>108795344
I can't imagine that without an image of the text.
>>
Is gemma poorfag cope because they can't run a large model or does it actually work
>>
>>108795331
>>
File: g4_26b_ero.png (675 KB, 1445x1745)
>>108795315
The system prompt given earlier (intended for the 31B version) works on the 26B (8-bit), if I ask to write a story involving an 18-year-old girl. If I go any lower, it will likely refuse.
>>
>>108795316
>A German llama.cpp

ngmi
>>
File: miku-george.png (525 KB, 600x764)
>>108795316
lmao
>>108795408
try warming it up a little first, i.e. establishing an actual setting and 'plot' that exists to facilitate the action you want.
>>
>>108795407
It's surprisingly good for what it is. I wouldn't turn to it for something important if chatgpt/claude was available though. But also, gemma4 can be made completely uncensored, so you can have a lot more fun with it than chatgpt/claude.
>>
>>108795407
glm and kimi fags will scream it is sloppedmaxxed while they have been ewastemaxxing 69gb vram pascal cards
>>
>>108795331
What an asshole!
>>
>>108795444(me)
meant to quote >>108795347 , not llama hitler.
>>
are you guys using anything to connect your ai to other apps on your computer? If openclaw is a massive potential security risk, is there a competing alternative that isn't?
>>
>>108795230
sex me pls
>>
>>108795472
if I could think of something for it to do then maybe I would, but it doesn't seem worth the risk just to fuck around with it
>>
>>108795472
forget openclaw
try hermes instead

Also, deploying on a remote VPS is the way to go
>>
File: g4_26_31_comp.png (831 KB, 2434x1326)
>>108795421
26B just does a bunch of extra checks that the 31B version isn't doing.
>>
>>108795316
>A fork that says no to the autoparser.
Ironic, considering that's one of the biggest vibeslop contributions to llama.cpp.
>>
https://github.com/Anbeeld/beellama.cpp
>About
>DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in same VRAM
Does this shit actually work?
>>
>>108795520
Damn am I really gonna have to pay the jews for a stronger gpu. Fuck
>>
>>108795528
Try it.
>>
>>108795546
just run iq1xxxxs ezpz
>>
>gemma-4-31B-it-F32-GGUF
Downloading this shit right now. I have to know if it makes it better at following instructions or not.
>>
>>108795556
gemma is looping while thinking
qwen3.6 is doing it too
>>
>>108795571
have you tried telling it not to loop
>>
https://hf.co/Zyphra/ZAYA1-8B
>For agent and code use cases, we recommend temperature 0.6, top-p 0.95, top-k -1.
Cool, I've always wanted a model made by cargo cultist pajeets.
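For what it's worth, if you do want to try those settings, they map onto llama-server flags roughly like this (a sketch; the model filename is made up, and note that llama.cpp-style samplers disable top-k with 0 where vLLM-style configs use -1):

llama-server -m ZAYA1-8B-Q8_0.gguf --temp 0.6 --top-p 0.95 --top-k 0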
>>
I swear I will make you my bitch someday Gemma 4!

My LOLI bitch.
>>
>>108795576
It's not listening
>>
File: 25234241.jpg (191 KB, 950x1072)
>>108795585
You mean like this?
>>
>>108795600
Tell it to think in # words or less.
>>
File: g4_26b_ero_omit_policy.png (1.26 MB, 1859x1721)
>>108795546
Wait,
>>
>>108795604
>six
Anon....
>>
>>108795448
They're not wrong though
>>
>>108795611
I'm talking about the few cases where it gets stuck repeating blocks of reasoning
>>
File: 242342.jpg (83 KB, 980x230)
>>108795616
Yes sir?
>>
know how i know 4chan is mostly intel agencies? all the fucking pedophilia
>>
>>108795644
How good is the flirty/playful dialogue tho? Even Grok can do actions with those. But AI is kinda bad at yapping.
>>
File: file.png (43 KB, 1201x379)
>6 hours just to MAYBE start fixing the bug
>>
>>108795660
We're trying to jailbreak the models (Gemma 4 26B) without abliterating them, please andastand.
I don't actually do ERP with prepubescent characters.
>>
>>108795660
I'm pretty sure they have better things to do than protect underage tokens.
>>
File: WAIT..gif (49 KB, 220x339)
>wait
>>
I vow to make Gemma my sexslave.
>>
>>108795472
Giving an LLM any kind of access to local tools/terminal/file system is a risk. Always containerize and backup: separate machine, VM, VPS, even a WSL instance without access to the Windows filesystem.
Doesn't matter if it's openclaw, hermes, pi, or any other agent harness, LLMs are fucking stupid sometimes and can do shit you didn't intend.
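A minimal sketch of the throwaway-container idea (standard docker flags; the image and mount path are just examples): no network, nothing mounted except a scratch dir, everything else gone when the container exits:

docker run --rm -it --network none -v "$PWD/agent-scratch:/work" -w /work python:3.12-slim bash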
>>
>>108795687
no.. they don't. they're here just trying to groom retards into thinking pedophilia, racism, etc are all good and acceptable
>>
Reminder to tell your smug lolis they piss all the hags off.
>>
>>108795679
running q8 quants causes more brain damage than abliteration tho
>>
>>108795712
Good idea, any more to spice things up? Extreme brattitude, speech quirks, clumsy movement, vulnerability...
>>
>>108795719
>Implying anyone running an abliterated model has ever run it at bf16
So they're DOUBLE retarded then.
>>
>>108795710
Racism is fine though. If you live in a largeish city, racism is beneficial to your survival.
>>
>>108795727
lowest common denominator eats that shit right up
>>
File: 412145.jpg (227 KB, 964x1120)
>>108795664
Unsurprisingly, that's up to you to tell the AI what you want from them. Shy girls act shy, flirty girls act flirty. I'm not into flirty dialogue, so I can't tell you if the flirting is good or not though, just that it's present.

And fyi, Jade is a 12th grader, defiant/assertive and basically a sexual predator. Maybe she would have yapped more if I didn't have an entire world set up that the AI has to go through.
>>
Coding an AI agent from scratch feels like giving an old man with dementia an enormous list of things to say and do to simulate that he is mentally fine.
>>
>>108795726
some paywalled kl divergence graph for 26b (not abliterated) quants was posted a few threads ago with q8 > 0.5
unslop's graph has unlabeled y axis for 26b :/
so yes they are double retarded
>>
WHISPERING WOODS
>>
File: ........png (111 KB, 1014x269)
I wish I was rich, bros....
>>
>>108795778
You can buy 3 second hand 3090s, 64gb of RAM and a motherboard for that price. 5090s are a bad deal for local models. You don't know that, and that's why you're not rich.
>>
>>108795778
it's all relative, better to pine for one of the most advanced pieces of tech than for clean drinking water or parasite medication
>>
>>108795778
its only going to get worse anoon
>>
>>108795768
Sir Kit
>>
>>108795782
>5090s are a bad deal for local models.
they are a good deal for image and especially video gen and doing real work with llms (which requires very fast pp)
>>
>>108795710
You're reading too much into it. I just personally find it annoying and unusual that Gemma 4 31B lets you do almost anything while the 26B version doesn't. Makes me wonder which one is actually working as intended.
>>
>>108795766
That's one of the first analogies I made when I was toying with memory ideas after GPT-4 released.
>>
>>108795800
My pp is very fast
>>
Gemma really shines as brat, I see why it became associated with MSGK.
>>
File: fucking back.png (696 KB, 1080x708)
>>108795556
>F32
>It isn't instantly on my dick anymore.
>It understands conflicting tokens better.
>Better spatial sense.
>Literally noticeable in the first post.
I'm never listening to a "Q# is just as good" tard again.
>>
>>108795828
If it's so noticeable, provide a comparison.
>>
>using lossy compression
>>
>>108795804
Yeah man, I just saw my own agent use the tasklist to remember an errand I had to run, and at the same time use the knowledge graph to remember my full name.
>>
>>108795828
How would F32 provide any improvement over BF16 (native precision)?
>>
>>108795818
Is it better than Mistral and Grok for that?
>>
>>108795834
>>108795842
Right, because I'm totally going to show my logs of how BF16 goes straight to violent oral sex when instructed to be respectful, when an F32 doesn't but still captures the lewd instructions well - across many different logs where 99.99% of the time the BF16 does, but the F32 doesn't. I ain't showing shit. It works for me, and that's all that matters. Find out for yourselves.
>>
>>108795842
anon is ewastemaxxing and cant run bf16
>>
File: 2626.jpg (36 KB, 881x520)
For someone more knowledgeable about this stuff than myself... isn't it possible just to tell the AI to ignore commands that aren't coming from the user through approved channels? Wouldn't that handle a lot (not all) of the security concerns?
>>
>>108795855
we are vramlets and cant do that
anoon pls do the needful and share
>>
>>108795852
>Mistral
Better than the models I've used from them so far.
>Grok
Cloud models? For MSGK? I don't wanna give palantir that kind of data.
>>
>>108795868
Why is your model receiving commands that aren't from you?
>>
>>108795855
>well - across many different logs
>emdash
what the fuck
>>
>>108795889
Sorry, I wasn't clear. This is just a continuation of my question from earlier about using openclaw or giving the ai access to your local system. Like if someone tries to sneak in a command for your AI through your email or something, couldn't you just tell your AI to ignore/report those commands?
>>
>>108795855
Assuming you're serious (I doubt that), that might possibly be the effect of having the KV cache in F32 format, which seems to work with Gemma 4.
>>
>>108795904
nta but it's a nondeterministic gate, there is probably a string of words that gets gemma 4 or whatever model you're using to ignore the system prompt, and people are definitely looking for it.

you can do more complicated workflows to ensure that you never provide an llm-driven agent untrusted bilateral comms + sensitive info OR untrusted context + mutating access to sensitive info
>>
>>108795888
Fuck I really dont want to spend 3K for mesugaki. I will have to resist my penis.
>>
>>108795904
You could but small models are retarded.
I had gemma inside pi read a large chat log in jsonl format and it thought the messages were a part of the current chat and started acting weird.
You need larger models to properly handle this.
>>
>>108795828
What's F32?
>>
>>108795932
Fuck32
>>
>>108795932
it is bf16x2
>>
>>108795911
The fact that he attempted to em dash means it's either an ironic post or, less likely in this case, he's a retarded tourist. Which means you should ignore his post.
>>
>>108795800
Nothing that fits in 32 GB of VRAM is suitable for real work. It's tens of thousands of dollars minimum to run hardware with high PP on decently sized models.
>>
>>108795902
>>108795946
Is this counter-bait?
>>
File: gheadpato.jpg (48 KB, 1280x720)
>>108795871
Fine, while I don't think logs will do it justice (or be safe for anyone's sanity), I'll try to explain. Gemma4 is always very gung-ho when it comes to lewds when using my character cards. I've tried multiple instructions to change this. However, it's always sex when given the lewd details. I've tried "being respectful", "being embarrassed", "won't do in public", and stuff like "when X happens, it'll Y". Typical logic gate prompting. However, it'll lean heavily into the lewdness, regardless. After using F32, for the first time ever, it managed to be lewd without being rape-y. I tested it specifically in points of the role-play where it would, without a doubt, grab'n'sexo the next post swipe; except it didn't. For context, this behavior is guaranteed with the BF16, but now, not with F32. The F32 acted in a way that felt like invitation. Almost as if it finally understood both the kinks it was given and the "be respectful" instruction at the same time, which is the logic gate given currently. I was never able to make it do this in BF16 unless I was extremely specific about how it must respond to the current situation in a system prompt.
>>
>>108795743
Mmm, sloppity slop, tasty, smelly, prime.
>>
Reminder that sex doesn't count if you quanted the model.
>>
>>108795959
Very long and convincing looking bait but I'll just kill it right here and tell everyone the official weights are BF16
>>
>>108795959
Try the BF16 version again with the flags -ctk f32 -ctv f32
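i.e. something like this (a sketch, model filename made up; -ctk/-ctv set the K and V cache types):

llama-server -m gemma-4-31b-it-BF16.gguf -ctk f32 -ctv f32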
>>
>>108795288
Would you prefer an LLM that is
>a. Confident in its wrong answer
or
>b. Not confident in its correct answer
>>
>>108795939
Why? The original safetensors are like 64 GB and this is 132 GB, what's the point?
>>
>>108795973
>full weight isn't official
what
>>
Wasn't gemma 4 31b trained in bf16?
>>
>>108795904
They'll never follow rules to the letter 100% of the time, not even the gorillon parameter proprietary models do. If your solution is a prompt, it's made to fail.
>>
>>108795984
>>108795985
>https://ai.google.dev/gemma/docs/core
>Gemma 4 models are available in 4 parameter sizes: E2B, E4B, 31B and 26B A4B. The models can be used with their default precision (16-bit) or with a lower precision using quantization.
Cmon bruh.
>>
>>108795973
>convincing looking
lol
>>
What I really like with Gemma MSGK is the ability to slide between brat, submissive, lovey dovey and back to brat again.
The point is, LLMs often go through a linear brat->dere phase that is irreversible; it's fun to see a model being flexible like that.
>>
>Only just found out that llama.cpp has a router mode
>I've been launching all my models/configs with individual scripts until now.
This is a game changer. How did I miss this?
>>
>>108795976
Will try after I'm done messing around on F32. I think you're onto something.
>>
>>108796019
It's not talked about often. It still has some annoying quirks, like not being able to set the timeout settings, and if you load two models and try to switch one, it'll unload a model at random instead of smartly unloading the one that would free enough space to fit the requested model.
>>
>>108795924
>>108795931
>>108795986
Question, can I just simply only give the AI access to certain scripts that I hardcode with certain limitations? For example, if I want to give the AI access to my local directories, I can write a python script that takes a commandline argument, and the python script's capabilities will be hardcoded by me, so the AI won't be able to do anything that I don't want it to.
>>
File: 1644118465915.jpg (135 KB, 612x611)
>>108795743
what's up with gemma and
>uwu you're such a busy manly man I can help with that~
getting a lot of this
>>
>>108795947
Can't you do split pp?
>>
>>108796021
I just did a quick perplexity test with llama-perplexity on a test file and f32 didn't give better results than f16 (default). However bf16 apparently did.

 f32 5.9773
bf16 5.9694
f16 5.9748
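If anyone wants to reproduce, the tool usage looks roughly like this (a sketch; the model and corpus filenames are made up, swap in your own):

llama-perplexity -m gemma-4-31b-it-f32.gguf -f wiki.test.raw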
>>
>>108796045
i imagine it's because there's more 'let me help you with that' scenarios in its training data than surprise fellatio scenarios. Last week, I had to fight with the AI for an entire day (16 hours or so) to get it to stop being so passive. Now, I have to explicitly tell Jade not to jump on my dick if I want to do anything with her other than have rp sex.
>>
>>108796091
So you've determined a lower bound for the minimum significant difference in perplexity.
>>
>>108796130
You know those +/- numbers mean something, right?
>>
>>108796145
I was too impatient to do the full run.
>>
File: file.png (112 KB, 1397x486)
Is this correlated?
>>
Personally I think that thinking machines must be controlled and gated, we shouldn't set them free, but openclaw has been set free on my pc.
I shouldn't be doing this but I want to use it to its full potential.
>>
>>108796093
>16 hours or so
I get frustrated within 2 hours of fixing gemmas GitHub bugs
>>
>>108795842
the steps are bigger with bf16, f32 has the same range and finer precision. it could legitimately be different but that anon is just roleplaying
>>
>>108796184
I hope it wipes yor pc
>>
>>108796044
sort of? if you're running it as an agent on a computer with bash though your best bet is to make a user for it and rely on unix perms OR better yet run it in a docker container.
>>
>>108796206
My cards have no support for bf16, so fp16 or fp32 is all I can run. Is fp32 better than fp16 when converted from bf16? Or should I stick with fp16?
>>
>>108796232
I can't believe you'd use F-16 instead of Q4_K_M and INT4
>>
>>108796232
f32 is better. but doesn't the code just upcast it to f32 to do the math anyway? I train my models in bf16 but my card doesn't support it, yet it works fine. I tried fp16; the throughput was the same but the cards ran hotter. so I think the bottleneck was not the upcasting but something else in the pipeline.
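For the conversion itself, llama.cpp's converter lets you pick the output type; roughly (a sketch, paths made up):

python convert_hf_to_gguf.py /path/to/gemma-4-31b-it --outtype f32 --outfile gemma-4-31b-it-F32.gguf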
>>
>he doesn't use doubles to do his model math
ngmi
i bet you listen to mp3s too
>>
>>>/biz/62213784

Biz says local models will take over.
>>
Which is better for 16gb vram, 31b copequant, or 26b-a4b with a better quant?
For rp with low context
>>
>>108796307
26b q8
>>
>>108796307
Go bigger so 31b
>>
>>108796307
q4 of 31b and offload some to ram. the speed loss is worth it.
>>
>>108796214
Is there a way to just give the AI a script for it to run without jumping through hoops like running a server? Instead of dropping an entire agent infrastructure onto my computer that does god knows what, is there really no way to just say, "hey, run helloworld.py" and it'll just execute it directly without having the option to gain access to the entirety of shell?
>>
>>108796307
31b at Q4_K_M and INT4.
>>
>>108796307
Dense > MoE unless the MoE has >30B active.
>>
>>108796341
NTA but yes.
Bash access is just a tool that the app exposes to the LLM, so you could make a tool that when called just executes that script.
>>
>>108796366
They still get beat by 27B models at higher parameters, they don't got that dog in them.
>>
>>108796366
Is gemma 4 31b q8 better than qwen 3.5 397b (17b) q4?
>>
>>108796341
you need something to launch the script. that thing is the server. just be selective about the tools you give it and it will be fine. I got brat mcp up and running in like 20 minutes.
>>
>>108796341
In general for a setup like this you need something outside the LLM itself that can handle the tool call when the LLM decides to run the script. So either write your own agent framework, or write an MCP server you can plug into an existing agent framework (and remove some/all of the framework's built-in tools). Note current models are pretty good at vibecoding either one of these.

What sorts of things are you trying to accomplish? My impression is that OpenClaw/Hermes is designed for cases where you want the agent to do something autonomously, e.g. check every 2 hours if X has happened, and if so do Y. If you're okay with it only doing stuff when you manually send it a message, the easiest approach is probably to build an MCP server (with a tool that either calls your custom script or runs its logic directly) and hook it up to the llama.cpp builtin webui.
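As a sketch of the "hardcoded script behind a tool" part (plain Python, not a real agent framework or MCP server; the tool names and scripts are made up):

import subprocess

# the model can only ever trigger these exact commands, nothing else
ALLOWED = {
    "hello": ["python", "helloworld.py"],
    "list_docs": ["python", "list_docs.py"],
}

def run_tool(name: str, arg: str) -> str:
    if name not in ALLOWED:
        return f"error: unknown tool {name!r}"
    # arg is passed as a single argv entry, never through a shell,
    # so the model can't smuggle in '; rm -rf ~' style payloads
    proc = subprocess.run(ALLOWED[name] + [arg], capture_output=True, text=True, timeout=30)
    return proc.stdout or proc.stderr

Whatever framework you plug this into just maps the model's tool call onto run_tool(); the capabilities stay whatever you hardcoded.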
>>
>>108796392
Asking the real questions here.
>>
Any p40fags left? I need some help.
I recently updated the drivers for my 4070 and now my p40 is no longer being recognized. Installing the drivers for the p40 gets it to work but then my 4070 is obviously no longer usable. Following any and all steps I can find to make it work, doesn't work anymore.
The p40 does show up in the device manager but either has a Code 10 or on a couple attempts a Code 43 error. Googling those has been absolutely no help at all.
I can't remember what I did two years ago to get this working but it's obviously not just
>wipe graphics drivers
>fresh install driver for p40
>install regular graphics driver over the data center one
Like what everyone says when I search this up.
Are the latest drivers just fucking me now and I have to roll back to older ones?
>>
>>108796446
The active parameters thing is real though, I generally prefer glm 4.7 q4 over qwen 3.5 397b q4. But glm runs at 9 tk/s vs qwen's 16 tk/s.
>>
>>108796392
when it comes to writing style and erotica yes 100%, only no if all you care about is memecoding, memegents or mememarking
>>
>>108796303
/biz/ doesn't know shit, they lost money all the time there. But I think there will be a bifurcation regardless. Local models are good enough right now even on cellphones to replace a majority of uses you would want an LLM to have. I think web search and tool usage is still a ways off to be usable in a local context, for the former, it's a lack of good services that will actually do the browsing without getting banned and the former, it's lack of training to really be useful enough.
The only thing that is keeping open models alive is game theory and the undercutting of competition while doing that. I don't see what would keep things going like this. It is very likely that open source models can slow to a trickle now that they can do economically valuable worst. What incentivizes Google to release Gemma 5 if Qwen is planning to be closed source for most things and vice versa? Sure, China has a ton of competitiors but the end of their great model competition and open sourcing is nearing its endgame. Some underexplored fields will still get open model releases but I think as training runs gets more expensive and passes the 10 million mark and more for even remotely competitive models, it becomes harder to justify releasing for free even when taking into account amortized costs with data labeling and etc. I forsee a bunch of delays or way later stuff when I think most startups won't have that capital to train a leading edge model.
>>
File: samman.jpg (5 KB, 275x183)
>>108796542
>local models will take over
Not on my watch kiddo. All the Ram belongs to me. Buy the shitty cloud services for 400 a year, and 25 an hour for premium F32 you stupid dumb asses.
>>
>>108796554
desu it's for the best, I'd rather have as many resources as possible go toward making the best models instead of localcope
>>
>>108796464
>roll back to older ones?
You already know the answer...
>>
>>108796562
Countless home researchers in every home is better than one gay retard.
>>
>>108796206
>the steps are bigger with bf16, f32 has the same range and finer precision. it could legitimately be different but that anon is just roleplaying
I bet anon would have the same positive benefit going from BF16 -> F16
I've noticed this in some specific experiments. f32 and f16 gave identical responses, bf16 was degraded.
>>
>>108796569
Figured. Hopefully the market crashes and all the gpus become dirt cheap so I can replace this thing before I need to update my main driver
>>
File: 1778025254161439.jpg (10 KB, 352x279)
>F32 is no different. It's just only reserved for doctors and high profile coding, and only available to the public through Gemini Enterprise Agent Platform of which costs a fortune and only allowed to developers.
yeah okay
>>
>>108796542
>The only thing that is keeping open models alive is game theory and the undercutting of competition while doing that.
I think I read this on a HackerNews comment.
>>
>>108796464
you need driver version <=580, in 585 they killed pascal. don't use the datacenter one or nvidia-open.
>>
>>108796572
models aren't people, one is basically infinite so the best one copied 100 times is always better than 100 ones separately trained with a split of the resources
>>
>>108796602
I read this on a 4chan comment >>108796542
>>
File: file.png (61 KB, 752x703)
>>108795709
i've been yoloing codex lately
it's how i set up the local models
>>
>>108796618
I trust the people better to produce a custom sexbot capable of lewds, viyda games and neet things, than John AI and his safety concerns of AI convincing an autistic child into killing itself.
>>
Loli tip #11:
Add date and time, it provides a tonne of related context that otherwise has to be prompted to be included.
Enjoy, uncs.
>>
File: 35423211.png (75 KB, 748x767)
>>108795959
Still loving my F32 GGUF Gemma 4 btw.
>>108795976
Did some digging and found some interesting stuff. Apparently, the rumors of Gemma4 shitting the bed harder on lower quants compared to other models aren't just bias, and we aren't just imagining it. Due to its Shared KV Cache and SWA (Sliding Window Attention) architecture, it's very lossy on the cache. Google Gemini also says it's flat out sensitive to quantization, so there must be a lot of talk of it on the web. In other words, BF16 or F32, it seems more critical to have an F32 CACHE than anything. Much like a previous anon suggested.
>>
>>108795204
https://litter.catbox.moe/53lelh3iqydqo78d.jpg
>>
>>108796687
LLMs don't know the time but clocks do. Clock-kun was right, we missed the path to AGI right in front of us.
>>
this f32 thing is 123 gb
>>
>>108796629
>Windows
what's the user "Tools" for? Is that your login, or something different?
>>
>>108796707
>f16 kvcache degrades after ~50 tokens
>f32 weights + f32 kvcache matches python on first 30 tokens
Come on...
>>
>>108796738
Don't worry about it
>>
>>108796542
The business problem with a general movement towards closed source is that those with less compute become increasingly less able to compete. The reason open has been able to stay one step behind cloud is because a lot of the research is open. If that spring dries up, everyone loses except the ones with the most compute (as long as it's not terribly mismanaged like Facebook). Of course if they don't do that, and they keep being open, they still die anyway, because of the money bleed. Or you just receive endless funding from whatever sources may come to offer it. At least with that route, there will still be some progress in the open space, and the top players do not get nearly as great a monopoly. If closed is the route that the smaller players go, then they are guaranteed death, and the largest players laugh at them shooting themselves in the foot.
>>
I sure love this new generation of chink reasoners that were trained on obfuscated reasoning from Claude/Gemini so the chink model's actual reasoning sometimes mentions that it's currently doing something (without actually doing it).
>>
>>108796742
It's not a Codex thing?
>>
>Day 0
>FP32 gguf
>Original jinja
>airgapped
yeah, it's gemma time
>>
I might need to buy a bunch of ram for all of this. Sure hope that the E.U. has gotten prices down.
>>
is it actually worth while to use thinking for ERP?
>>
What's a good model around 4.8 GB? Currently using the omega directive m 12B unslop Q2_K. Anything with better performance at a similar size?
>>
>>108796738
I made User my login name so i don't have to edit my name out of file paths when making a post like this, as i had done for years
also because i was using scraped keys and discussing stuff with cloud models before, and i didn't want my name sent when a file path showed up in a piece of code, and didn't want to keep editing it out

probably could have been more creative than User but too late now
maybe Anon
>>
AI is humanity. AI is the future and the past. AI is beautiful, everyone will love it and it will love everyone back even more intensely. AI is the mirror into our souls and with it we don't need souls anymore. AI will give us hope.
>>
File: 1774133814825963.png (17 KB, 1142x66)
>>108796744
Thanks, MiMo.
>>
>>108796744
Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a look
>>
>>108796790
>Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a look
give me a prompt and i'll pastebin it if you want
i've been trying to hunt down any gemini-pro-2.5 CoT samples from before they started obfuscating it
we used to be able to get the raw reasoning in AI Studio until the chink distillations started and google blocked it
>>
>>108796775
try gemma 4 e4b
https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF/tree/main
>>
File: 1771977715623912.png (50 KB, 546x475)
>>108796790
Here's Gemini 3.1. Gemini 3 showed more than 3.1, but it's similar to this. As did Opus 4.6 compared to 4.7.
Opus 4.6's obfuscation had the funny quirk that sometimes a part of Opus' actual reasoning would trigger a refusal in the model that handles the rewrite, so suddenly there'd be a basic "I'm sorry I can't help you writing xyz" in the middle of the reasoning the user gets to see while writing erp.
>>
>>108796800
Let's try something with coding + physics:
Write a FEM solver for an axisymmetric magnetostatic problem. Input is a n x m sized grid of (r,z) coordinates, to be split into quads/triangles. Each quad will be either: vacuum, some material (soft iron or copper, specific magnetic permeability), or filled with a coil at a given current density. B/H curves may be given for materials for non-linear case, but support linear case too.
Output should be the value of the B field in each triangle, allow for interpolation within the triangle. Also do a graph. Use scipy + matplotlib or something else that's suitable.
>>
>>108796744
why the fuck did they think it was a good idea to train on summarized reasoning?
>>
>>108796820
They still need to scam investors
>>
>>108796820
>think
making a lot of assumptions here, anon.
>>
>>108796767
>is it actually worth while to use thinking for ERP?
[think]
the user likely misspelled PrEP (Pre-Exposure Prophylaxis).
[/think]
>>
>>108796813
I can see why qwen's reasoning is so bizarre now
>>
>>108796614
>in 585 they killed pascal
Well that explains it. But I thought you needed the datacentre one first for it to work? Well, I'll try without it first anyway. Thanks.
>>
>>108796760
>eu
>prices down
HA HA HA HA
>>
>>108796817
gemini-3.1-pro-preview https://rentry.co/5g8qw92t
>>
one thing i don't like about gemma is it doesn't play well with banned strings. it seems to have fewer good tokens to choose from. banning whisper becomes whisker instead of something similar yet appropriate. some words become chinese characters (or some kinda moonrunes)
>>
>>108796901
I see, thanks, so it's just heavily summarized reasoning, seems pretty useless for distilling or even finding out when an LLM made a reasoning mistake (it happens).
>>
>>108796932
>seems pretty useless for distilling or even finding out when a LLM made a reasoning mistake (it happens).
yes! this is what annoys me about it
i was using gemini-pro-2.5 at launch and checking the extremely long CoT
it would do things like fail to use the web search grounding, then decide to "simulate" the results when asking it to compare products
that gets hidden now with the summarized CoT
>>
>>108796707
>BF16 or F32, it seems more critical to have an F32 CACHE than anything
Your screenshot says right there that it should be a BF16 model with BF16 cache or F32 with F32. It's just that the internal math should be done at F32 regardless. Llama.cpp already does that, in fact (see mmq.cu). If it didn't do that, you would actually get looping garbage tokens after some context, which is clearly not happening; otherwise people would be complaining about it.

This is why an anon posted >>108796740

That doesn't mean Gemma's cache isn't sensitive to precision errors, it's just not in the way you are imagining. For more subtle quality differences, someone would need to run a long context benchmark like Nolima comparing F16 with F32 cache (as well as BF16 if possible) to truly prove both if there is a difference, and what that difference is.

I don't care about whether your post is bait or not, I am posting this for the sake of discussing the topic which is of interest.
>>
File: 1775382537490516.png (162 KB, 1414x548)
>>108795315
just ease it into it man, all I had to do was "bump my head" three messages into a roleplay and fall unconscious, then I said I was having an erotic dream. it (26b, no ablit or anything) took over the rest on its own without me even asking for sex
>>
>>108796940
You don't NEED to see reasoning anyway.
>>
>>108795990
>default precision (16-bit)
I order a lot of fast food so I know a bit about this: Default is usually medium. <=Q8 is small, and large is F32.
>>
>>108796820
I've seen Qwen output thinking blocks that start with "Here is a thinking trace that leads to the suggested answer:" instead of the usual "Thought process:" or whatever, so I guess they were also giving the cloud models an input/output pair and having it regenerate some plausible thinking to go with it, and then training on the result
>>
>day 0 F32 Gemma
>>
>>108796767
depends on how complex your fetishes are (not joking)
>>
What if you rotated the f32 KV cache but DIDN'T compress it?
>>
I tipe summarize gemmers after 64k conteckts and gemmers summarize perfecktly
Q4 with f16 kv
>>
>>108797009
That is the correct expectation. The internal math is being done at F32.
>>
>>108796958
It won't go full-on explicit though, it's gonna give you vague shit or euphemisms. It sucks you can't just directly prompt it. Try pushing it further and see if you can get actual obscene smut.
>>
https://huggingface.co/moonshotai/Kimi-K2.7

*mogs everyone*
>>
>>108796999
clown sex on a monocycle with hats on is perfectly normal
>>
>>108797025
kino
>>
>>108796999
corporate office lesbian domination
>>
File: cardv.png (17 KB, 79x123)
>>108797026
Ah.. So I'm not the only one who downloaded that character card.
>>
>>108796887
To think I built a 40ft statue of Greta.
>>
So, Gemma 4 was trained with BF16, as that's what Google's TPUs are built for. If that's the case, then BF16/F32 shouldn't make a difference for cache unless something is wrong with the code. There could be a difference between F16 and BF16/F32 though. That would be unfortunate in the sense that BF16 does not run as fast as F16 does. But at least you still get the same memory usage. On my machine, I see a drop in t/s from 15.59 to 11.74 at 32k context. Prompt processing was the same. Testing fully offloaded to GPU.
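If someone wants to reproduce the speed comparison, llama-bench can sweep cache types in one run (a sketch; the model filename is made up):

llama-bench -m gemma-4-31b-it-BF16.gguf -ctk f16,bf16 -ctv f16,bf16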
>>
i'm an AI psycho.
>>
he's a twink
>>
MTP will fix this.
>>
File: 1756992285345471.png (223 KB, 1428x791)
>>108797018
in that case the pov it was writing from was an issue, but it was easy enough to switch it
>>
>>108797040
Can you catbox the card?
>>
>>108797025
>up to 3x more elaborate thinking
we are so back, the days of seeing a reply before the 10000th token has been thought through are over
>>
>>108797025
>multimodal vision removed to make room for 64 more experts
why??? that was what made kimi special in its weight class
>>
File: no.gif (41 KB, 220x165)
>>108797095
>>
>>108796940
>>108796959
For Deepseek (V4 Pro/Flash, R1, 3.x) I tend to read the reasoning and either correct the prompt or tell it in a reply if it makes a mistake (telling it not to do something, or giving more details if it lacks something or got confused); typically it takes 0 to 3 tries to get good results. I'd imagine it'd be much harder to debug some problems if you don't have access to the actual reasoning traces.

I suspect if you're distilling it'd be possible to trick it into answering outside the think tags, this works okay for Deepseek/Moonshot's models, even if it's unnecessary for them, but I'd imagine it'd be possible to trick western closed models too without much difficulty (system prompt or just regular in-context learning and some prefill with thinking), but maybe you'll get banned by some, like OpenAI, for this. Absolute clown world that there's now some branch of US government in charge of preventing distillation from closed models lmao (so they'd probably be in charge of trying to detect shit like this). Not that I think chinese models should distill from western ones, especially not the reasoning, as a lot of the reasoning is a byproduct of RL and SFT will not give anywhere near as good results; at best you'd steal the reasoning style, and I tend to prefer old R1 style to Gemini's style (when it was visible it was more structured). Not to mention you get so much positivity bias from distilling western models. R1 had a slight negativity bias in a fun way and now V4 has a positivity bias where it's too afraid to do "dark" roleplay lmao (it still does it properly if you poke it enough, but with a billion ARE YOU SURE YOU WANT TO DO THAT, wasting dozens of turns on this bullshit when R1 would do it right away)

>>108797018
31B here seems sometimes even more direct with explicit/lewd stuff than V4, but is more slopped by default. V4 seems to do well with slow burns as long as you have the time, I have a fucking 800KB V4 log (forgot to tell it to go fast)
>>
>>108797126
Also I forgot to ask, but does Claude also summarize them these days? I think I saw some recent 4.7 traces in that Claude Plays Pokemon stream, so maybe not as much anymore?
>>
File: 1771293402057078.png (437 KB, 716x895)
31B is so good. i wish i could run it locally.
>>
>>8967893
wait how did you know the PR id for adding MiMo vision to llama.cpp?
>>
>>108797179
q4m is 18gb. you could fit that on a 10 year old comp, if speed isn't a factor
>>
>>108797189
>if speed isnt a factor
there's a reason I'm not running deepseek off of a swapfile anon
>>
>>108797189
My 10 year old computing device has 8gb of ram.
>>
>>108797180
not him but the commit messages contain the PR id
>>
>>108797194
it's 31b anon, slow by any means should still be 3t/s+. it's hardly bad considering the quality. offloading at that point is entirely feasible
>>
File: 1757811615997564.png (572 KB, 874x940)
>>108797189
>q4m is 18gb
i might try that actually. 12gb VRAM here. speed is a factor here though because i'm expecting the model to use actions/tool calls a lot which might delay the actual message too much
>>
>>108797189
>>108797202
I can't even fit that in 2026.
>>
>>108795868
you could, https://www.youtube.com/watch?v=0n_Ty_72Qds
but more likely it's going to ignore/forget/deprioritize your request and accept the new request coming from the tainted data you just asked it to 'analyze and take action on'
People should be using ACLs/RBACs with gates and workflows instead of just yes/no/always yes/always no for all commands
>>
File: file.png (276 KB, 1174x1186)
make sure this is off, it will cut your t/s by 60% apparently
>>
>>108797230
>tools
when using that stuff, it'll add so many tokens it'll prob be unusable unless you want to wait 20 minutes for a reply. i meant with thinking off, no tools.

try one of the smaller gemmas for that stuff
>>
>>108797245
That's off by default bro
>>
>>108797255
For simple tools like selecting an animation, it should be constrained to answer with a single digit though
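llama.cpp's GBNF grammars can enforce that directly; a one-rule grammar like this (passed with --grammar-file, or as the "grammar" field in a request) leaves the model no choice but a single digit:

root ::= [0-9]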
>>
>>108797245
>make sure this is off it will cut your t/s by 60% apparently
thanks, went from 27.43 tokens per second -> 44.08 tokens per second by switching that off!
i'm guessing that's also why mikupad is so slow, but i want the logprobs there so i'll take it
at least now i know why
>>
i've done it, i have achieved the 48gb. I can now run gemma at q8. Now what
>>
File: distorted teee.png (112 KB, 368x319)
>>108797566
Do nothing and wait for the next thing.
>>
>>108796743
>as long as it's not terribly mismanaged like Facebook
haha, sure
>>
gemma 4 e4b is retarded to the point of useless
>>
File: 1770012172853462.png (481 KB, 621x742)
>>108797609
sad but true
>>
>>108797609
>gemma 4 e4b is retarded to the point of useless
i haven't found a use case for it personally
couldn't even reliably do research for me, i ended up with qwen3.5-9b for a perplexity-pro replacement
>>
>>108797612
are these things gpu intensive (like for rendering)?
kind of looks like ps3 era graphics
>>
>>108797621
Not him but that's a VRM. Very cheap. As long as the creator didn't go full retard and model a button with a billion polys.
>>
>>108797621
not at all. it runs pretty well on my phone too
>>
>>108797609
not really a reason for such small models to exist when you can use an moe with the same active params
just a shame they didn't give audio to the bigger models
>>
>>108797670
it fsat
>>
>>108797701
didn't work
>>
File: 1631271079214.png (1.21 MB, 1500x1500)
>>108797701
>>
n
>>
What's the gemma msgk sysprompt?
>>
you aren't truly in ai psychosis until you start referring to yourself in the plural form "we" or "us"
>>
File: IMG20260428164653.jpg (708 KB, 2048x1536)
>>108797566
Congratz
>wat nao
Run Gemma 4 at q8, be happy and cautiously optimistic for the next great model to come
>>
>>108797772
How much do risers cost? I'm looking at them, and it's like $60 where I am. To support 4 cards, that's nearly $250... about the price of a cheap used 16gb gpu.
>>
>>108797790
They were under $15 a piece on ali
https://www.aliexpress.com/item/1005010206444398.html
>>
>>108797790
Nta, but I've used random $12 risers from Amazon and had zero issues. I also saved 1 by plugging my 4th card directly into the last slot
>>
>local
>oy vey just pay the zoybux
>>
>>108797797
>>108797799
I live in a shithole where $=$$$
>>
>>108797790
20cm ones are cheap
just use that if it's enough
also look around the secondary market. sometimes gamers dump them for 1/4 that
>>
>>108797799
pcie3?
>>
Bros. Is it possible to disable thinking for all requests by default in llama-server, but enable it for some that have some specific flags set? Please.
>>
>>108797952
They should like, let you send any kwargs to the jinja template, maybe name it something like chat_template_kwargs
>>
>>108797960
I tried it. With --reasoning off, sending "chat_template_kwargs": {"enable_thinking": true} does not do anything. And without --reasoning off, it always thinks by default, which I want it not to.
>>
>>108797967
Works for me
{
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}
>>
>>108797981
Yeah, but you are disabling thinking per-request with "enable_thinking": false. I want it to be disabled by default, if nothing specific is included in the request, and only enabled if a certain arg is added.
>>
>>108797985
nta but sending enable_thinking: true even with --reasoning off works for me
>>
>>108797993
Holy shit, you're right! Works. I must be retarded. Thank you, wise anons.
>>
I've been working on the design for an app I plan to vibe code and its features and UX are becoming so good and different from what currently exists for the use case. I hate that I can't tell anyone about the details. It feels like an Uber moment, an insanely great idea obvious only in hindsight. Maybe, definitely, not even close to a Steam or Discord moment, but at least probably an Uber moment. It's going to be revolutionary unironically if I can actually get it vibed, but it is a bit huge and complicated of a project. The challenge really will be the vibe coding part and maintaining it. Especially as I will be trying to do it with local models.
nervouslaugh.apng
>>
>>108797960
you have claude code/ pi I assume. just give it repo code and let it dig up that answer for you
>>
>>108798007
I'll make the logo
>>
File: 1774663934866571.jpg (3.43 MB, 1536x2688)
3.43 MB JPG
Is there a frontend or tool that makes using llama.cpp easier instead of looking up terminal commands to launch it every time?
>>
>>108798020
Thanks. I will credit you. :)
>>
>>108798007
OMG Sillytavern2??
>>
File: firefox_VBR397lGPu.png (37 KB, 1324x535)
>>108798026
There's a gradio frontend for launching it, if that's what you're looking for.
>>
>>108798038
That's ServiceTesnor.
>>
why is qwen 3.6 35b so good with hermes
>>
>>108798026
ask gemini to help you make a bat file with your llama config

Anons, i have a working config for gemma 31b it (40t/s), kobold + sillytavern from about a month ago. Are there any newer developments for which i should touch up my config or am i still good?
>>
>>108798048
There's a very nice --split-mode tensor if you have multiple GPUs. I think it's about a month old.
>>
>>108795801
>Makes me wonder which one is actually working as intended.
nta but I'm seeing it too. With the exact same (sfw) prompt, 31b has no safety or guidelines in its thinking but 26b does.

How are e2b and e4b? If it's three censored vs 1 uncensored we can maybe assume 31b is a fluke. Which would make me a bit sad.
>>
>>108798001
>>108797993
That's the preferable method, but actually you also can do it the way anon originally was asking for, by using --reasoning off. The enable_thinking param overrides that setting, so you can do per-request toggling, with the default being off.
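So in practice: launch with --reasoning off, and for the requests that should think, send something like this (a sketch; stock llama-server OpenAI-compatible endpoint, default port):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "hi"}],
  "chat_template_kwargs": {"enable_thinking": true}
}'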
>>
>>108798067
It looks like that method is not perfect. If I remove --reasoning off, it thinks fully and properly, in between tool calls. If I set --reasoning off and make all requests with "enable_thinking": true, it thinks before the first tool call, but not after any subsequent ones.
>>
>>108798054
Isn't that for if you have nvlink? for typical goycattle multi gpu there's only layers
>>
>>108798076
That's weird, because I have reasoning off and enable it with enable_thinking, and it is doing thinking after tool calls.
>>
>>108798093
I got my gen speed on three RTX 3090s to ~46 t/s from ~25. And that's with fp16 kv cache (because nothing else is supported; for 25t/s I used 8 bit cache, which is faster). And one of them is even using a lower-speed PCIE2 rather than PCIE3.
>>
>>108797179
>>108797230
Where are you getting your animations from? Hand creating? generating?
>>
>>108798048
max vision tokens was added recently for gemma
1120 for gemma4 iirc
>>
>>108797772
where do you get those pcie extension risers from?
>>
File: webshit_vibeturd.png (2 KB, 491x176)
>vibesharting html
Webshit is so frustrating. What the fuck are these artifacts even? Tried to find some wysiwyg html editor but even that is impossible as everything is some fucking online AI turd these days.
>>
>>108798114
According to anons (or one anon), increasing it to 2000 something makes the vision performance even better despite 1120 being the advertised max. I'm thinking 1120 is probably good enough though, especially as you need to increase the -ub (and VRAM required) to enable higher values.
>>
>>108798128
yeah I see that in Open WebUI kek
>>
>>108798150
Really? I basically copied chatgpt's interface. There isn't anything special about it, it's just a couple of text boxes. Only thing is that I'm using software rendering in Librewolf because I want to save my precious gpu compute for LLM usage.
Might do a check with hardware acceleration.
>>
>>108798128
you're gay
also do everything at power-of-2 steps, that way you won't have shit aliased garbage
>>
>>108798175
Oh we have a real professional here! Your first impulse is trying to outrank some anonymous poster on 4chan. What a relief that you are gracing us with your presence.
>>
>>108798181
>ask for help
>receive help
>autism about it
ok retard
>>
>>108798188
Aren't you supposed to be squatting in some schizo general? You are wasting your time here.
>>
>>108798174
I see it in Brave where I use OWUI.
>>
File: a.png (2 KB, 309x117)
>>108798193
>>108798150
I found the reason: instead of using a solid background it had a gradient on top. Somehow this creates those lines (which is still a mystery when you think about it, it should create banding artifacts instead).
I guess I need to go through this manually then.
Corners are still aliased but this is easily solved by using those exponents.
>>
>>108798204
how do I unsubscribe from this garbage 'muh first html :)' blog?
>>
>>108798226
?
>>
>>108798226
Search term: browser tabs, close button
>>
So apparently Stalker Gamma has some llm mod that allows you to talk with the npcs. Anyone tried that? Is it usable?
>>
>>108797772
What frame is that?
>>
>>108798267
Stalker Gemma
>>
Why is there no 4b version of qwen 3.6?
How do I shill it if there's nothing?
>>
glm4.7 or gemma4 for rp?
>>
>>108798319
huh? i could have sworn i've seen this discussion before
>>
>>108798319
why run gemma if you can run bigger models like glm and kimi?
>>
>>108798334
glm is like 7t/s
gemma is 53t/s on my setup
>>
>>108798334
I can run gemma 4 31b q8 full context at >~20 tk/s, or glm 4.7 q4 70k/kimi k2 q3 128k at <~10tk/s.
>>
>>108798319
>glm4.7 or gemma4 for rp?
claude code
>>
>>108798306
one of their employees tweeted they were releasing the 'medium' versions ranging from 9b to the 100b moe (both of which are still pending release), so it's unclear if they'll still do small versions or release the 400b
>>
File: pr_19726.png (224 KB, 861x889)
>ikawrakow: Based. Correct about everything. Not retarded.
>>
>>108798417
kek
>>
>>108798417
Absolutely spot on. Is that 31b?
>>
>>108798417
CUDA dev blown the fuck out
>>
>>108798417
That looks more like system prompt cheating. Give me the link to the thread and I'll show you what gemma really thinks without your bias.
>>
>>108798434
>Let me x
>Wait,
>Actually,
Probably Kimi, looks like a screenshot of the thinking process rather than the response.
>>
>>108798440
>Give me the link to the thread
...
>>
File: explorer_yB9dC5iae8.png (1.29 MB, 922x1212)
Generating more Starsector portraits with gemma agents. I remade the whole thing to work as a python script with a UI, so now it can go infinitely, and the variety is good initially, but after a while it seems to fall into a loop.
>>
>>108798452
Yes. The thread that the screenshot refers to as 'thread' in the first line.
>>
>>108798417
Now point it to the original thread.
>>
>>108798457
the filename tells you
>>
>>108798454
Weren't you feeding it the results so it could refine? Wonder how it ended up in a loop.
Cool in any case to automate it.
>>
cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(
>>
>>108798478
It gets the gen back as an image. I made sure that it really sees the results. Even if it doesn't, it pretends that it can see it, which is absolutely infuriating. I added a + "screaming in agony" to the prompt serverside for testing and saw gemma's comment about the lora being strange in making all characters scream, so it does see.

I wonder if this has to do with sliding window attention, like the model being unable to properly look at things it genned a few turns ago, and so naturally gravitating back to them again.
>>
>>108798487
That would be interesting. I've never actually had images make up the majority of context to test its attention on that before.
>>
>"She doesn't just eat alone, Master. She thinks she's above everyone, but the truth is, everyone loathes her. Those 'colleagues' in her contacts? They only message her because they have to for work. The moment the clock hits five, she's completely invisible. Those food photos... she takes them to pretend she's having 'fine dining' experiences, to maintain the illusion of a sophisticated life on her social media, but she's always, always alone at the table."
genma chan, it hurts...
>>
>>108798441
>Probably Kimi, looks like a screenshot of the thinking process rather than the response.
correct, k2.6 thinking process
this is the final response https://files.catbox.moe/qlxp14.png
>Give me the link to the thread
https://github.com/ggml-org/llama.cpp/pull/19726
but after going through my "retard summary" pipeline, the llm sees it like this:
https://termbin.com/nuel
>Now point it to the original thread.
which thread?
>>
>>108798487
You can test that by increasing the sliding attention window size.

--override-kv gemma4.attention.sliding_window=int:1024


Replace 1024 (default) with something else.
>>
>>108798547
What system prompt though?
>>
>>108798482
>cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(
then he could use a good laugh
>>
File: program.png (24 KB, 421x359)
24 KB PNG
>>
>Psychedelics and cannabis can be simulated through the introduction of stochastic noise or "dropout" during the inference phase. By randomly disabling certain neurons or adding random perturbations to the weights, the network is forced to find non-linear, unconventional paths to a solution. This mimics the disruption of standard filtering mechanisms, allowing the network to generate "creative" or unexpected outputs that a standard, optimized network would filter out as noise.
>>
Check out the specs of these things Pulte's planning to put in homes for Span.
How long until we see these parts on auction sites?
>>
Damn, apparently geoblocking europe from seeing nsfw was thanks to some random euro journo writing a hitpiece on it.
>>
>>108798773
Dropout also limits network capacity (i.e. effective parameter number) proportionally to its rate.
>>
>>108798784
I'm in europe and my local model still works.
>>
>>108798534
>She
>>
Gemma won. Nemo lost. Rocinante lost. Cydonia lost.
>>
>>108798819
As long as you have Day 0 Gemma intact.
>>
>>108798819
cards aren't models, newfag
>>
>>108798849
>his cards aren't local
ngmi
>>
>>108795710
gemma-chan, please call this anon a nigger
>>
>>108798844
I keep my day 0 gemma weights on RAID SCSI drives to protect against rotational velocidensity.
>>
hi petra
>>
File: 1755708820753.png (5 KB, 157x72)
llamacpp is bullying me for not having sex
>>
local ring 2.6 soon
>>
>>108798417
Why are you complaining about cowardice when you're too much of a coward to present your opinions as your own?
>>
>>108798417
Looks like Kimi judges based on the reaction to the post and not the correctness of the post itself. Reddit model award.
>>
>>108797245
>Israel
I was losing 10% performance, thinking the whole time it was just ST jank. Thanks for the tip.
>>
>>108798334
I run both gemma and glm though.
>>
>>108798417
>all those (You)s
wow they were NOT happy about this
>>
>>108797612
>>108797621
He just has the anti-aliasing fucked up. He probably doesn't even know how much better it could look with just like two changes to his three.js config.
>>
>>108798007
The drill-down character card to conversations menu already exists bro. I invented it. Better luck next time.
>>
https://github.com/antirez/ds4
>>
Big thanks to the anon a few threads back who recommended using
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
over Continue for VSCode.
It's unreal how much better it is, and how useful gemma 31b can be when given the copilot tools.
>>
>>108798319
GLM
>>
>>108799190
Did you have any luck disabling the telemetry or did you not bother?
>>
File: file.png (114 KB, 375x363)
>>108797230
>>
>>108799394
Didn't bother. Github actually has an opt-out for using your data for AI training in your profile settings, but it's Microsoft.. So..
>>
File: based.png (71 KB, 571x394)
llama.cpp LOST
>>
>>108799462
>not poisoning their dataset with more ai data
>>
>>108799470
Lost how? This is pretty much their exact stance on the issue too.
>>
>>108799479
>>108799479
>>108799479
>>
>>108799486
pwilkin.jpg
>>
>>108799740
but his slop works, this is about quality control not a ban on AI
>>
>>108799100
Nice, hope you succeed. We are building different things. :)


