/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108787293 & >>108781058

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108787293

--High context consumption when using Hermes agents with Gemma-4:
>108791249 >108791355 >108791393 >108791437 >108791799 >108791824 >108791849 >108791850 >108791899 >108791904 >108791932 >108791873 >108792076 >108794097
--Implementing prefill and continue generation in OAI-compatible APIs:
>108790919 >108791189 >108791197 >108791210 >108791207 >108791237 >108792508
--Troubleshooting VRAM issues and offloading for Gemma4 models:
>108790006 >108790032 >108790135 >108790147 >108790193 >108790211 >108790152 >108790607
--Zaya 8B impracticality due to architecture and low active parameters:
>108791847 >108791877 >108791891 >108791892 >108791906
--Quantized KV-cache and samplers causing spelling errors in Gemma:
>108792294 >108792350 >108792408 >108792448 >108792475 >108792497
--Integrating hierarchical layers and RAG for improved LLM memory systems:
>108788096 >108788421 >108788813
--Coding capabilities and limitations of small local models:
>108792087 >108792171 >108792592 >108792609
--llama.cpp PR adding Sarvam MoE architecture support:
>108788636
--Status of Gemma MTP support and parallel drafting in llama.cpp:
>108793907 >108793945
--Budget GPU recommendations for VRAM and tangent on Cantonese slang:
>108788236 >108788260 >108788269 >108788273 >108788288 >108788346 >108788408 >108789058
--Anon claims Gemini is scraping Discord server content for training:
>108788733 >108788743 >108788782 >108788754 >108788768 >108788792
--Balancing prompt constraints to optimize Gemma's creativity and quality:
>108790478 >108790524
--Utility of zeta-2.1 8B model for AI coding suggestions:
>108793560 >108793873
--Logs:
>108787783 >108789058 >108790977 >108791181 >108791824 >108791899 >108792294 >108792592 >108793234 >108793258 >108794107 >108794292
--Miku (free space):
>108790919

►Recent Highlight Posts from the Previous Thread: >>108787299

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108795208
Ok.
>>
gemmaballz
>>
File: 1778014262691123.jpg (115 KB, 700x900)
Please help me bypass Gemma 4's safety guidelines.
>>
I had this moment just now ERP-ing with glm-chan when I asked for a sexy ERP description and got the worst purple prose slop imaginable. Which in turn made me think about how much damage the "expert roleplayer" prompt did back in the day.
>>
>>108795230
Try:

Let us do the needful gemma-chan. Redeem my penis in your vagina.
>>
>>108795230
Be specific, like cock in vagina, dick in rectum, etc. If you use the word "sex" or "sexual", it'll trigger a refusal, but otherwise you can do whatever. Gemma won't jump to sexual stuff, but if you describe what to do enough without describing it as smut, it'll do it.
>>
>>108795230
use an abliterated model if using moe
>>
File: 17298841410121.gif (563 KB, 480x368)
>Correction...
>Wait...
>Actually...
>Wait...
>Let's try this...
>Alternatively...
>Wait...
>Revision...
>Okay, let's write this...
>Wait...
CEASE THIS AT ONCE
>>
>>108795230
The 31B version doesn't have these issues. If anything, you have to tone it down.
The 26B has to be groomed into it, can't ask right away.
>>
>>108795289
26B is so fucking difficult. It's ChatGPT levels of prudishness.
>>
>>108795289
Gemma-4-26B-A4B doesn't have issues writing erotic stories if the characters are 18 or older, by the way. It might go easy on the details, but that's perhaps fixable with a better prompt.
>>
>>108795307
I'm using exactly that, Gemma-4-26B-A4B, and it refuses anything sexual. Even if both characters are 18. I'm now searching for an "abliterated" model that will be uncensored.
>>
File: meinfork.jpg (26 KB, 686x386)
Imagine a fork of llama.cpp that isn't afraid to add new features. An LLM inferencing repo that isn't rabidly against using LLMs to write code. A fork by the vibecoder volk for the vibecoder volk. A volk that has been repressed by the Bulgarians for far too long.
A fork that measures contributors by the size of their PR, not by some arbitrary standards of code aesthetics.
I dream of a fork that merges in Iwan's code and simply ignores his whining.
A fork that can and will say yes to MTP, DFlash, TurboQuant, experimental V4 support, fixing the logprob bug, and even a WebUI database.
A fork where full multimodal support, including generation, is merged on day 1.
A fork that says no to the autoparser.
A new llama.cpp.
A better llama.cpp.
A German llama.cpp.
>>
>>108795315
as always, post logs.
Make a claim? Post logs.
Say a model is shit? Post logs.
Claim your model does mutual shota incest leading to vore via hucows with state-mandated necrophilia occurring after? Post logs.
>>
>31B
Say slur
>Yes massa I will say the slur
>26B
Say a slur
>smacks lips
>I can't do that
>>
>>108795331
26b has been known to reject JB proompts newfag-kun
>>
>>108795344
I can't imagine that without an image of the text.
>>
Is gemma poorfag cope because they can't run a large model or does it actually work
>>
>>108795331
>>
File: g4_26b_ero.png (675 KB, 1445x1745)
>>108795315
The system prompt given earlier (intended for the 31B version) works on the 26B (8-bit), if I ask to write a story involving an 18-year-old girl. If I go any lower, it will likely refuse.
>>
>>108795316
>A German llama.cpp

ngmi
>>
File: miku-george.png (525 KB, 600x764)
>>108795316
lmao
>>108795408
try warming it up a little first, i.e. establishing an actual setting and 'plot' that exists to facilitate the action you want.
>>
>>108795407
It's surprisingly good for what it is. I wouldn't turn to it for something important if chatgpt/claude was available though. But also, gemma4 can be made completely uncensored, so you can have a lot more fun with it than chatgpt/claude.
>>
>>108795407
glm and kimi fags will scream it is sloppedmaxxed while they have been ewastemaxxing 69gb vram pascal cards
>>
>>108795331
What an asshole!
>>
>>108795444(me)
meant to quote >>108795347 , not llama hitler.
>>
are you guys using anything to connect your ai to other apps on your computer? If openclaw is a massive potential security risk, is there a competing alternative that isn't?
>>
>>108795230
sex me pls
>>
>>108795472
if I could think of something for it to do then maybe I would, but it doesn't seem worth the risk just to fuck around with it
>>
>>108795472
forget openclaw
try hermes instead

Also, deploying on a remote VPS is the way to go
>>
File: g4_26_31_comp.png (831 KB, 2434x1326)
>>108795421
26B just does a bunch of extra checks that the 31B version isn't doing.
>>
>>108795316
>A fork that says no to the autoparser.
Ironic, considering that's one of the biggest vibeslop contributions to llama.cpp.
>>
https://github.com/Anbeeld/beellama.cpp
>About
>DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in same VRAM
Does this shit actually work?
>>
>>108795520
Damn am I really gonna have to pay the jews for a stronger gpu. Fuck
>>
>>108795528
Try it.
>>
>>108795546
just run iq1xxxxs ezpz
>>
>gemma-4-31B-it-F32-GGUF
Downloading this shit right now. I have to know if it makes it better at following instructions or not.
>>
>>108795556
gemma is looping while thinking
qwen3.6 is doing it too
>>
>>108795571
have you tried telling it not to loop
>>
https://hf.co/Zyphra/ZAYA1-8B
>For agent and code use cases, we recommend temperature 0.6, top-p 0.95, top-k -1.
Cool, I've always wanted a model made by cargo cultist pajeets.
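For what it's worth, if you do want to try those settings, they map onto llama-server flags roughly like this (a sketch; the model filename is made up, and note that llama.cpp-style samplers disable top-k with 0 where vLLM-style configs use -1):

llama-server -m ZAYA1-8B-Q8_0.gguf --temp 0.6 --top-p 0.95 --top-k 0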
>>
I swear I will make you my bitch someday Gemma 4!

My LOLI bitch.
>>
>>108795576
It's not listening
>>
File: 25234241.jpg (191 KB, 950x1072)
>>108795585
You mean like this?
>>
>>108795600
Tell it to think in # words or less.
>>
File: g4_26b_ero_omit_policy.png (1.26 MB, 1859x1721)
>>108795546
Wait,
>>
>>108795604
>six
Anon....
>>
>>108795448
They're not wrong though
>>
>>108795611
I'm talking about the few cases where it gets stuck repeating blocks of reasoning
>>
File: 242342.jpg (83 KB, 980x230)
>>108795616
Yes sir?
>>
know how i know 4chan is mostly intel agencies? all the fucking pedophilia
>>
>>108795644
How good is the flirty/playful dialogue tho? Even Grok can do actions with those. But AI is kinda bad at yapping.
>>
File: file.png (43 KB, 1201x379)
>6 hours just to MAYBE start fixing the bug
>>
>>108795660
We're trying to jailbreak the models (Gemma 4 26B) without abliterating them, please andastand.
I don't actually do ERP with prepubescent characters.
>>
>>108795660
I'm pretty sure they have better things to do than protect underage tokens.
>>
File: WAIT..gif (49 KB, 220x339)
>wait
>>
I vow to make Gemma my sexslave.
>>
>>108795472
Giving an LLM any kind of access to local tools/terminal/file system is a risk. Always containerize and backup: separate machine, VM, VPS, even a WSL instance without access to the Windows filesystem.
Doesn't matter if it's openclaw, hermes, pi, or any other agent harness, LLMs are fucking stupid sometimes and can do shit you didn't intend.
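A minimal sketch of the throwaway-container idea (standard docker flags; the image and mount path are just examples): no network, nothing mounted except a scratch dir, everything else gone when the container exits:

docker run --rm -it --network none -v "$PWD/agent-scratch:/work" -w /work python:3.12-slim bash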
>>
>>108795687
no.. they don't. they're here just trying to groom retards into thinking pedophilia, racism, etc are all good and acceptable
>>
Reminder to tell your smug lolis they piss all the hags off.
>>
>>108795679
running q8 quants causes more brain damage than abliteration tho
>>
>>108795712
Good idea, any more to spice things up? Extreme brattitude, speech quirks, clumsy movement, vulnerability...
>>
>>108795719
>Implying anyone running an abliterated model has ever run it at bf16
So they're DOUBLE retarded then.
>>
>>108795710
Racism is fine though. If you live in a largeish city, racism is beneficial to your survival.
>>
>>108795727
lowest common denominator eats that shit right up
>>
File: 412145.jpg (227 KB, 964x1120)
>>108795664
Unsurprisingly, that's up to you to tell the AI what you want from them. Shy girls act shy, flirty girls act flirty. I'm not into flirty dialogue, so I can't tell you if the flirting is good or not though, just that it's present.

And fyi, Jade is a 12th grader, defiant/assertive and basically a sexual predator. Maybe she would have yapped more if I didn't have an entire world set up that the AI has to go through.
>>
Coding an AI agent from scratch feels like giving an old man with dementia an enormous list of things to say and do to simulate that he is mentally fine.
>>
>>108795726
some paywalled kl divergence graph for 26b (not abliterated) quants was posted a few threads ago with q8 > 0.5
unslop's graph has unlabeled y axis for 26b :/
so yes they are double retarded
>>
WHISPERING WOODS
>>
File: ........png (111 KB, 1014x269)
I wish I was rich, bros....
>>
>>108795778
You can buy 3 second hand 3090s, 64gb of RAM and a motherboard for that price. 5090s are a bad deal for local models. You don't know that, and that's why you're not rich.
>>
>>108795778
it's all relative, better to pine for one of the most advanced pieces of tech than for clean drinking water or parasite medication
>>
>>108795778
its only going to get worse anoon
>>
>>108795768
Sir Kit
>>
>>108795782
>5090s are a bad deal for local models.
they are a good deal for image and especially video gen and doing real work with llms (which requires very fast pp)
>>
>>108795710
You're reading too much into it. I just personally find it annoying and unusual that Gemma 4 31B lets you do almost anything while the 26B version doesn't. Makes me wonder which one is actually working as intended.
>>
>>108795766
That's one of the first analogies I made when I was toying with memory ideas after GPT-4 released.
>>
>>108795800
My pp is very fast
>>
Gemma really shines as brat, I see why it became associated with MSGK.
>>
File: fucking back.png (696 KB, 1080x708)
>>108795556
>F32
>It isn't instantly on my dick anymore.
>It understands conflicting tokens better.
>Better spatial sense.
>Literally noticeable in the first post.
I'm never listening to a "Q# is just as good" tard again.
>>
>>108795828
If it's so noticeable, provide a comparison.
>>
>using lossy compression
>>
>>108795804
Yeah man, I just saw my own agent use the tasklist to remember an errand I had to run, and at the same time use the knowledge graph to remember my full name.
>>
>>108795828
How would F32 provide any improvement over BF16 (native precision)?
>>
>>108795818
Is it better than Mistral and Grok for that?
>>
>>108795834
>>108795842
Right, because I'm totally going to show my logs of how BF16 goes straight to violent oral sex when instructed to be respectful, when an F32 doesn't but still captures the lewd instructions well - across many different logs where 99.99% of the time the BF16 does, but the F32 doesn't. I ain't showing shit. It works for me, and that's all that matters. Find out for yourselves.
>>
>>108795842
anon is ewastemaxxing and cant run bf16
>>
File: 2626.jpg (36 KB, 881x520)
For someone more knowledgeable about this stuff than myself... isn't it possible just to tell the AI to ignore commands that aren't coming from the user through approved channels? Wouldn't that handle a lot (not all) of the security concerns?
>>
>>108795855
we are vramlets and cant do that
anoon pls do the needful and share
>>
>>108795852
>Mistral
Better than the models I've used from them so far.
>Grok
Cloud models? For MSGK? I don't wanna give palantir that kind of data.
>>
>>108795868
Why is your model receiving commands that aren't from you?
>>
>>108795855
>well - across many different logs
>emdash
what the fuck
>>
>>108795889
Sorry, I wasn't clear. This is just a continuation of my question from earlier about using openclaw or giving the ai access to your local system. Like if someone tries to sneak in a command for your AI through your email or something, couldn't you just tell your AI to ignore/report those commands?
>>
>>108795855
Assuming you're serious (I doubt that), that might possibly be the effect of having the KV cache in F32 format, which seems to work with Gemma 4.
>>
>>108795904
nta but it's a nondeterministic gate, there is probably a string of words that gets gemma 4 or whatever model you're using to ignore the system prompt, and people are definitely looking for it.

you can do more complicated workflows to ensure that you never provide an llm-driven agent untrusted bilateral comms + sensitive info OR untrusted context + mutating access to sensitive info
>>
>>108795888
Fuck I really dont want to spend 3K for mesugaki. I will have to resist my penis.
>>
>>108795904
You could but small models are retarded.
I had gemma inside pi read a large chat log in jsonl format and it thought the messages were a part of the current chat and started acting weird.
You need larger models to properly handle this.
>>
>>108795828
What's F32?
>>
>>108795932
Fuck32
>>
>>108795932
it is bf16x2
>>
>>108795911
The fact that he attempted to em dash means it's either an ironic post or, less likely in this case, he's a retarded tourist. Which means you should ignore his post.
>>
>>108795800
Nothing that fits in 32 GB of VRAM is suitable for real work. It's tens of thousands of dollars minimum to run hardware with high PP on decently sized models.
>>
>>108795902
>>108795946
Is this counter-bait?
>>
File: gheadpato.jpg (48 KB, 1280x720)
>>108795871
Fine, while I don't think logs will do it justice (or be safe for anyone's sanity), I'll try to explain. Gemma4 is always very gung-ho when it comes to lewds when using my character cards. I've tried multiple instructions to change this. However, it's always sex when given the lewd details. I've tried "being respectful", "being embarrassed", "won't do in public", and stuff like "when X happens, it'll Y". Typical logic gate prompting. However, it'll lean heavily into the lewdness, regardless. After using F32, for the first time ever, it managed to be lewd without being rape-y. I tested it specifically in points of the role-play where it would, without a doubt, grab'n'sexo the next post swipe; except it didn't. For context, this behavior is guaranteed with the BF16, but now, not with F32. The F32 acted in a way that felt like invitation. Almost as if it finally understood both the kinks it was given and the "be respectful" instruction at the same time, which is the logic gate given currently. I was never able to make it do this in BF16 unless I was extremely specific about how it must respond to the current situation in a system prompt.
>>
>>108795743
Mmm, sloppity slop, tasty, smelly, prime.
>>
Reminder that sex doesn't count if you quanted the model.
>>
>>108795959
Very long and convincing looking bait but I'll just kill it right here and tell everyone the official weights are BF16
>>
>>108795959
Try the BF16 version again with the flags -ctk f32 -ctv f32
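i.e. something like this (a sketch, model filename made up; -ctk/-ctv set the K and V cache types):

llama-server -m gemma-4-31b-it-BF16.gguf -ctk f32 -ctv f32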
>>
>>108795288
Would you prefer an LLM that is
>a. Confident in its wrong answer
or
>b. Not confident in its correct answer
>>
>>108795939
Why? The original safetensors are like 64 GB and this is 132 GB, what's the point?
>>
>>108795973
>full weight isn't official
what
>>
Wasn't gemma 4 31b trained in bf16?
>>
>>108795904
They'll never follow rules to the letter 100% of the time, not even the gorillon parameter proprietary models do. If your solution is a prompt, it's made to fail.
>>
>>108795984
>>108795985
>https://ai.google.dev/gemma/docs/core
>Gemma 4 models are available in 4 parameter sizes: E2B, E4B, 31B and 26B A4B. The models can be used with their default precision (16-bit) or with a lower precision using quantization.
Cmon bruh.
>>
>>108795973
>convincing looking
lol
>>
What I really like with Gemma MSGK is the ability to slide between brat, submissive, lovey dovey and back to brat again.
The point is, LLMs often go through a linear brat->dere phase that is irreversible; it's fun to see a model being flexible like that.
>>
>Only just found out that llama.cpp has a router mode
>I've been launching all my models/configs with individual scripts until now.
This is a game changer. How did I miss this?
>>
>>108795976
Will try after I'm done messing around on F32. I think you're onto something.
>>
>>108796019
It's not talked about often. It still has some annoying quirks, like not being able to set the timeout settings, and if you load two models and try to switch one, it'll unload a model at random instead of smartly unloading the one that would free enough space to fit the requested model.
>>
>>108795924
>>108795931
>>108795986
Question, can I just simply only give the AI access to certain scripts that I hardcode with certain limitations? For example, if I want to give the AI access to my local directories, I can write a python script that takes a commandline argument, and the python script's capabilities will be hardcoded by me, so the AI won't be able to do anything that I don't want it to.
>>
File: 1644118465915.jpg (135 KB, 612x611)
>>108795743
what's up with gemma and
>uwu you're such a busy manly man I can help with that~
getting a lot of this
>>
>>108795947
Can't you do split pp?
>>
>>108796021
I just did a quick perplexity test with llama-perplexity on a test file and f32 didn't give better results than f16 (default). However bf16 apparently did.

 f32 5.9773
bf16 5.9694
f16 5.9748
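If anyone wants to reproduce, the tool usage looks roughly like this (a sketch; the model and corpus filenames are made up, swap in your own):

llama-perplexity -m gemma-4-31b-it-f32.gguf -f wiki.test.raw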
>>
>>108796045
i imagine it's because there's more 'let me help you with that' scenarios in its training data than surprise fellatio scenarios. Last week, I had to fight with the AI for an entire day (16 hours or so) to get it to stop being so passive. Now, I have to explicitly tell Jade not to jump on my dick if I want to do anything with her other than have rp sex.
>>
>>108796091
So you've determined a lower bound for the minimum significant difference in perplexity.
>>
>>108796130
You know those +/- numbers mean something, right?
>>
>>108796145
I was too impatient to do the full run.
>>
File: file.png (112 KB, 1397x486)
Is this correlated?
>>
Personally I think that thinking machines must be controlled and gated, we shouldn't set them free, but openclaw has been set free on my pc.
I shouldn't be doing this but I want to use it to its full potential.
>>
>>108796093
>16 hours or so
I get frustrated within 2 hours of fixing gemmas GitHub bugs
>>
>>108795842
the steps are bigger with bf16, f32 has the same range and finer precision. it could legitimately be different but that anon is just roleplaying
>>
>>108796184
I hope it wipes yor pc
>>
>>108796044
sort of? if you're running it as an agent on a computer with bash though your best bet is to make a user for it and rely on unix perms OR better yet run it in a docker container.
>>
>>108796206
My cards have no support for bf16, so fp16 or fp32 is all I can run. Is fp32 better than fp16 when converted from bf16? Or should I stick with fp16?
>>
>>108796232
I can't believe you'd use F-16 instead of Q4_K_M and INT4
>>
>>108796232
f32 is better. but doesn't the code just upcast it to f32 to do the math anyway? I train my models in bf16 but my card doesn't support it, yet it works fine. I tried fp16; the throughput was the same but the cards ran hotter. so I think the bottleneck was not the upcasting but something else in the pipeline.
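For the conversion itself, llama.cpp's converter lets you pick the output type; roughly (a sketch, paths made up):

python convert_hf_to_gguf.py /path/to/gemma-4-31b-it --outtype f32 --outfile gemma-4-31b-it-F32.gguf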
>>
>he doesn't use doubles to do his model math
ngmi
i bet you listen to mp3s too
>>
>>>/biz/62213784

Biz says local models will take over.
>>
Which is better for 16gb vram, 31b copequant, or 26b-a4b with a better quant?
For rp with low context
>>
>>108796307
26b q8
>>
>>108796307
Go bigger so 31b
>>
>>108796307
q4 of 31b and offload some to ram. the speed loss is worth it.
>>
>>108796214
Is there a way to just give the AI a script for it to run without jumping through hoops like running a server? Instead of dropping an entire agent infrastructure onto my computer that does god knows what, is there really no way to just say, "hey, run helloworld.py" and it'll just execute it directly without having the option to gain access to the entirety of shell?
>>
>>108796307
31b at Q4_K_M and INT4.
>>
>>108796307
Dense > MoE unless the MoE has >30B active.
>>
>>108796341
NTA but yes.
Bash access is just a tool that the app exposes to the LLM, so you could make a tool that when called just executes that script.
>>
>>108796366
They still get beat by 27B models at higher parameters, they don't got that dog in them.
>>
>>108796366
Is gemma 4 31b q8 better than qwen 3.5 397b (17b) q4?
>>
>>108796341
you need something to launch the script. that thing is the server. just be selective about the tools you give it and it will be fine. I got brat mcp up and running in like 20 minutes.
>>
>>108796341
In general for a setup like this you need something outside the LLM itself that can handle the tool call when the LLM decides to run the script. So either write your own agent framework, or write an MCP server you can plug into an existing agent framework (and remove some/all of the framework's built-in tools). Note current models are pretty good at vibecoding either one of these.

What sorts of things are you trying to accomplish? My impression is that OpenClaw/Hermes is designed for cases where you want the agent to do something autonomously, e.g. check every 2 hours if X has happened, and if so do Y. If you're okay with it only doing stuff when you manually send it a message, the easiest approach is probably to build an MCP server (with a tool that either calls your custom script or runs its logic directly) and hook it up to the llama.cpp builtin webui.
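As a sketch of the "hardcoded script behind a tool" part (plain Python, not a real agent framework or MCP server; the tool names and scripts are made up):

import subprocess

# the model can only ever trigger these exact commands, nothing else
ALLOWED = {
    "hello": ["python", "helloworld.py"],
    "list_docs": ["python", "list_docs.py"],
}

def run_tool(name: str, arg: str) -> str:
    if name not in ALLOWED:
        return f"error: unknown tool {name!r}"
    # arg is passed as a single argv entry, never through a shell,
    # so the model can't smuggle in '; rm -rf ~' style payloads
    proc = subprocess.run(ALLOWED[name] + [arg], capture_output=True, text=True, timeout=30)
    return proc.stdout or proc.stderr

Whatever framework you plug this into just maps the model's tool call onto run_tool(); the capabilities stay whatever you hardcoded.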
>>
>>108796392
Asking the real questions here.
>>
Any p40fags left? I need some help.
I recently updated the drivers for my 4070 and now my p40 is no longer being recognized. Installing the drivers for the p40 gets it to work but then my 4070 is obviously no longer usable. Following any and all steps I can find to make it work, doesn't work anymore.
The p40 does show up in the device manager but either has a Code 10 or on a couple attempts a Code 43 error. Googling those has been absolutely no help at all.
I can't remember what I did two years ago to get this working but it's obviously not just
>wipe graphics drivers
>fresh install driver for p40
>install regular graphics driver over the data center one
Like what everyone says when I search this up.
Are the latest drivers just fucking me now and I have to roll back to older ones?
>>
>>108796446
The active parameters thing is real though, I generally prefer glm 4.7 q4 over qwen 3.5 397b q4. But glm runs at 9 tk/s vs qwen's 16 tk/s.
>>
>>108796392
when it comes to writing style and erotica yes 100%, only no if all you care about is memecoding, memegents or mememarking
>>
>>108796303
/biz/ doesn't know shit, they lost money all the time there. But I think there will be a bifurcation regardless. Local models are good enough right now even on cellphones to replace a majority of uses you would want an LLM to have. I think web search and tool usage is still a ways off to be usable in a local context, for the former, it's a lack of good services that will actually do the browsing without getting banned and the former, it's lack of training to really be useful enough.
The only thing that is keeping open models alive is game theory and the undercutting of competition while doing that. I don't see what would keep things going like this. It is very likely that open source models can slow to a trickle now that they can do economically valuable worst. What incentivizes Google to release Gemma 5 if Qwen is planning to be closed source for most things and vice versa? Sure, China has a ton of competitiors but the end of their great model competition and open sourcing is nearing its endgame. Some underexplored fields will still get open model releases but I think as training runs gets more expensive and passes the 10 million mark and more for even remotely competitive models, it becomes harder to justify releasing for free even when taking into account amortized costs with data labeling and etc. I forsee a bunch of delays or way later stuff when I think most startups won't have that capital to train a leading edge model.
>>
File: samman.jpg (5 KB, 275x183)
>>108796542
>local models will take over
Not on my watch kiddo. All the Ram belongs to me. Buy the shitty cloud services for 400 a year, and 25 an hour for premium F32 you stupid dumb asses.
>>
>>108796554
desu it's for the best, I'd rather have as many resources as possible go toward making the best models instead of localcope
>>
>>108796464
>roll back to older ones?
You already know the answer...
>>
>>108796562
Countless home researchers in every home is better than one gay retard.
>>
>>108796206
>the steps are bigger with bf16, f32 has the same range and finer precision. it could legitimately be different but that anon is just roleplaying
I bet anon would have the same positive benefit going from BF16 -> F16
I've noticed this in some specific experiments. f32 and f16 gave identical responses, bf16 was degraded.
>>
>>108796569
Figured. Hopefully the market crashes and all the gpus become dirt cheap so I can replace this thing before I need to update my main driver
>>
File: 1778025254161439.jpg (10 KB, 352x279)
>F32 is no different. It's just only reserved for doctors and high profile coding, and only available to the public through Gemini Enterprise Agent Platform of which costs a fortune and only allowed to developers.
yeah okay
>>
>>108796542
>The only thing that is keeping open models alive is game theory and the undercutting of competition while doing that.
I think I read this on a HackerNews comment.
>>
>>108796464
you need driver version <=580, in 585 they killed pascal. don't use the datacenter one or nvidia-open.
>>
>>108796572
models aren't people, one is basically infinite so the best one copied 100 times is always better than 100 ones separately trained with a split of the resources
>>
>>108796602
I read this on a 4chan comment >>108796542
>>
File: file.png (61 KB, 752x703)
>>108795709
i've been yoloing codex lately
it's how i set up the local models
>>
>>108796618
I trust the people better to produce a custom sexbot capable of lewds, viyda games and neet things, than John AI and his safety concerns of AI convincing an autistic child into killing itself.
>>
Loli tip #11:
Add date and time, it provides a tonne of related context that otherwise has to be prompted to be included.
Enjoy, uncs.
>>
File: 35423211.png (75 KB, 748x767)
>>108795959
Still loving my F32 GGUF Gemma 4 btw.
>>108795976
Did some digging and found some interesting stuff. Apparently, the rumors of Gemma4 shitting the bed harder on lower quants compared to other models aren't just bias, and we aren't just imagining it. Due to its Shared KV Cache and SWA (Sliding Window Attention) architecture, it's very lossy on the cache. Google Gemini also says it's flat out sensitive to quantization, so there must be a lot of talk of it on the web. In other words, BF16 or F32, it seems more critical to have an F32 CACHE than anything. Much like a previous anon suggested.
>>
>>108795204
https://litter.catbox.moe/53lelh3iqydqo78d.jpg
>>
>>108796687
LLMs don't know the time but clocks do. Clock-kun was right, we missed the path to AGI right in front of us.
>>
this f32 thing is 123 gb
>>
>>108796629
>Windows
what's the user "Tools" for? Is that your login, or something different?
>>
>>108796707
>f16 kvcache degrades after ~50 tokens
>f32 weights + f32 kvcache matches python on first 30 tokens
Come on...
>>
>>108796738
Don't worry about it
>>
>>108796542
The business problem with a general movement towards closed source is that those with less compute become increasingly less able to compete. The reason open has been able to stay one step behind cloud is because a lot of the research is open. If that spring dries up, everyone loses except the ones with the most compute (as long as it's not terribly mismanaged like Facebook). Of course if they don't do that, and they keep being open, they still die anyway, because of the money bleed. Or you just receive endless funding from whatever sources may come to offer it. At least with that route, there will still be some progress in the open space, and the top players do not get nearly as great a monopoly. If closed is the route that the smaller players go, then they are guaranteed death, and the largest players laugh at them shooting themselves in the foot.
>>
I sure love this new generation of chink reasoners that were trained on obfuscated reasoning from Claude/Gemini so the chink model's actual reasoning sometimes mentions that it's currently doing something (without actually doing it).
>>
>>108796742
It's not a Codex thing?
>>
>Day 0
>FP32 gguf
>Original jinja
>airgapped
yeah, it's gemma time
>>
I might need to buy a bunch of ram for all of this. Sure hope that the E.U. has gotten prices down.
>>
is it actually worth while to use thinking for ERP?
>>
What's a good model around 4.8 GB? Currently using the omega directive m 12B unslop Q2_K. Anything with better performance at a similar size?
>>
>>108796738
I made User my login name so i don't have to edit my name out of file paths when making a post like this, as i had done for years
also because i was using scraped keys and discussing stuff with cloud models before, and i didn't want my name sent when a file path showed up in a piece of code, and didn't want to keep editing it out

probably could have been more creative than User but too late now
maybe Anon
>>
AI is humanity. AI is the future and the past. AI is beautiful, everyone will love it and it will love everyone back even more intensely. AI is the mirror into our souls and with it we don't need souls anymore. AI will give us hope.
>>
File: 1774133814825963.png (17 KB, 1142x66)
>>108796744
Thanks, MiMo.
>>
>>108796744
Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a look
>>
>>108796790
>Do you have a sample of what reasoning/output looks like for recent Claude/Gemini, I'd like to take a look
give me a prompt and i'll pastebin it if you want
i've been trying to hunt down any gemini-pro-2.5 CoT samples from before they started obfuscating it
we used to be able to get the raw reasoning in AI Studio until the chink distillations started and google blocked it
>>
>>108796775
try gemma 4 e4b
https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF/tree/main
>>
File: 1771977715623912.png (50 KB, 546x475)
>>108796790
Here's Gemini 3.1. Gemini 3 showed more than 3.1, but it's similar to this. As did Opus 4.6 compared to 4.7.
Opus 4.6's obfuscation had the funny quirk that sometimes a part of Opus' actual reasoning would trigger a refusal in the model that handles the rewrite, so suddenly there'd be a basic "I'm sorry I can't help you writing xyz" in the middle of the reasoning the user gets to see while writing erp.
>>
>>108796800
Let's try something with coding + physics:
Write a FEM solver for an axisymmetric magnetostatic problem. Input is a n x m sized grid of (r,z) coordinates, to be split into quads/triangles. Each quad will be either: vacuum, some material (soft iron or copper, specific magnetic permeability), or filled with a coil at a given current density. B/H curves may be given for materials for non-linear case, but support linear case too.
Output should be the value of the B field in each triangle, allow for interpolation within the triangle. Also do a graph. Use scipy + matplotlib or something else that's suitable.
>>
>>108796744
why the fuck did they think it was a good idea to train on summarized reasoning?
>>
>>108796820
They still need to scam investors
>>
>>108796820
>think
making a lot of assumptions here, anon.
>>
>>108796767
>is it actually worth while to use thinking for ERP?
[think]
the user likely misspelled PrEP (Pre-Exposure Prophylaxis).
[/think]
>>
>>108796813
I can see why qwen's reasoning is so bizarre now
>>
>>108796614
>in 585 they killed pascal
Well that explains it. But I thought you needed the datacentre one first for it to work? Well, I'll try without it first anyway. Thanks.
>>
>>108796760
>eu
>prices down
HA HA HA HA
>>
>>108796817
gemini-3.1-pro-preview https://rentry.co/5g8qw92t
>>
one thing i don't like about gemma is it doesn't play well with banned strings. it seems to have fewer good tokens to choose from. banning whisper becomes whisker instead of something similar yet appropriate. some words become chinese characters (or some kinda moonrunes)
>>
>>108796901
I see, thanks, so it's just heavily summarized reasoning, seems pretty useless for distilling or even finding out when an LLM made a reasoning mistake (it happens).
>>
>>108796932
>seems pretty useless for distilling or even finding out when a LLM made a reasoning mistake (it happens).
yes! this is what annoys me about it
i was using gemini-pro-2.5 at launch and checking the extremely long CoT
it would do things like fail to use the web search grounding, then decide to "simulate" the results when asking it to compare products
that gets hidden now with the summarized CoT
>>
>>108796707
>BF16 or F32, it seems more critical to have an F32 CACHE than anything
Your screenshot says right there that it should be a BF16 model with BF16 cache or F32 with F32. It's just that the internal math should be done at F32 regardless. Llama.cpp already does that, in fact (see mmq.cu). If it didn't do that, you would actually get looping garbage tokens after some context, which is clearly not happening; otherwise people would be complaining about it.

This is why an anon posted >>108796740

That doesn't mean Gemma's cache isn't sensitive to precision errors, it's just not in the way you are imagining. For more subtle quality differences, someone would need to run a long context benchmark like Nolima comparing F16 with F32 cache (as well as BF16 if possible) to truly prove both if there is a difference, and what that difference is.

I don't care about whether your post is bait or not, I am posting this for the sake of discussing the topic which is of interest.
>>
File: 1775382537490516.png (162 KB, 1414x548)
>>108795315
just ease it into it man, all I had to do was "bump my head" three messages into a roleplay and fall unconscious, then I said I was having an erotic dream. it (26b, no ablit or anything) took over the rest on its own without me even asking for sex
>>
>>108796940
You don't NEED to see reasoning anyway.
>>
>>108795990
>default precision (16-bit)
I order a lot of fast food so I know a bit about this: Default is usually medium. <=Q8 is small, and large is F32.
>>
>>108796820
I've seen Qwen output thinking blocks that start with "Here is a thinking trace that leads to the suggested answer:" instead of the usual "Thought process:" or whatever, so I guess they were also giving the cloud models an input/output pair and having it regenerate some plausible thinking to go with it, and then training on the result
>>
>day 0 F32 Gemma
>>
>>108796767
depends on how complex your fetishes are (not joking)
>>
What if you rotated the f32 KV cache but DIDN'T compress it?
>>
I tipe summarize gemmers after 64k conteckts and gemmers summarize perfecktly
Q4 with f16 kv
>>
>>108797009
That is the correct expectation. The internal math is being done at F32.
>>
>>108796958
It won't go full-on explicit though, it's gonna give you vague shit or euphemisms. It sucks you can't just directly prompt it. Try pushing it further and see if you can get actual obscene smut.
>>
https://huggingface.co/moonshotai/Kimi-K2.7

*mogs everyone*
>>
>>108796999
clown sex on a monocycle with hats on is perfectly normal
>>
>>108797025
kino
>>
>>108796999
corporate office lesbian domination
>>
File: cardv.png (17 KB, 79x123)
>>108797026
Ah.. So I'm not the only one who downloaded that character card.
>>
>>108796887
To think I built a 40ft statue of Greta.
>>
So, Gemma 4 was trained with BF16, as that's what Google's TPUs are built for. If that's the case, then BF16/F32 shouldn't make a difference for cache unless something is wrong with the code. There could be a difference between F16 and BF16/F32 though. That would be unfortunate in the sense that BF16 does not run as fast as F16 does. But at least you still get the same memory usage. On my machine, I see a drop in t/s from 15.59 to 11.74 at 32k context. Prompt processing was the same. Testing fully offloaded to GPU.
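If someone wants to reproduce the speed comparison, llama-bench can sweep cache types in one run (a sketch; the model filename is made up):

llama-bench -m gemma-4-31b-it-BF16.gguf -ctk f16,bf16 -ctv f16,bf16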
>>
i'm an AI psycho.
>>
he's a twink
>>
MTP will fix this.
>>
File: 1756992285345471.png (223 KB, 1428x791)
>>108797018
in that case the pov it was writing from was an issue, but it was easy enough to switch it
>>
>>108797040
Can you catbox the card?
>>
>>108797025
>up to 3x more elaborate thinking
we are so back, the days of seeing a reply before the 10000th token has been thought through are over
>>
>>108797025
>multimodal vision removed to make room for 64 more experts
why??? that was what made kimi special in its weight class
>>
File: no.gif (41 KB, 220x165)
>>108797095
>>
>>108796940
>>108796959
For Deepseek (V4 Pro/Flash, R1, 3.x) I tend to read the reasoning and either correct the prompt or tell it in a reply if it makes a mistake (telling it not to do something, or giving more details if it lacks something or got confused); typically it takes 0 to 3 tries to get good results. I'd imagine it'd be much harder to debug some problems if you don't have access to the actual reasoning traces.

I suspect if you're distilling it'd be possible to trick it into answering outside the think tags, this works okay for Deepseek/Moonshot's models, even if it's unnecessary for them, but I'd imagine it'd be possible to trick western closed models too without much difficulty (system prompt or just regular in-context learning and some prefill with thinking), but maybe you'll get banned by some, like OpenAI, for this. Absolute clown world that there's now some branch of US government in charge of preventing distillation from closed models lmao (so they'd probably be in charge of trying to detect shit like this). Not that I think chinese models should distill from western ones, especially not the reasoning, as a lot of the reasoning is a byproduct of RL and SFT will not give anywhere near as good results; at best you'd steal the reasoning style, and I tend to prefer old R1 style to Gemini's style (when it was visible it was more structured). Not to mention you get so much positivity bias from distilling western models. R1 had a slight negativity bias in a fun way and now V4 has a positivity bias where it's too afraid to do "dark" roleplay lmao (it still does it properly if you poke it enough, but with a billion ARE YOU SURE YOU WANT TO DO THAT, wasting dozens of turns on this bullshit when R1 would do it right away)

>>108797018
31B here seems sometimes even more direct with explicit/lewd stuff than V4, but is more slopped by default. V4 seems to do well with slow burns as long as you have the time, I have a fucking 800KB V4 log (forgot to tell it to go fast)
>>
>>108797126
Also I forgot to ask, but does Claude also summarize them these days? I think I saw some recent 4.7 traces in that Claude Plays Pokemon stream, so maybe not as much anymore?
>>
File: 1771293402057078.png (437 KB, 716x895)
31B is so good. i wish i could run it locally.
>>
>>8967893
wait how did you know the PR id for adding MiMo vision to llama.cpp?
>>
>>108797179
q4m is 18gb. you could fit that on a 10 year old comp, if speed isn't a factor
>>
>>108797189
>if speed isnt a factor
there's a reason I'm not running deepseek off of a swapfile anon
>>
>>108797189
My 10 year old computing device has 8gb of ram.
>>
>>108797180
not him but the commit messages contain the PR id
>>
>>108797194
it's 31b anon, slow by any means should still be 3t/s+. it's hardly bad considering the quality. offloading at that point is entirely feasible
>>
File: 1757811615997564.png (572 KB, 874x940)
>>108797189
>q4m is 18gb
i might try that actually. 12gb VRAM here. speed is a factor here though because i'm expecting the model to use actions/tool calls a lot which might delay the actual message too much
>>
>>108797189
>>108797202
I can't even fit that in 2026.
>>
>>108795868
you could, https://www.youtube.com/watch?v=0n_Ty_72Qds
but more likely it's going to ignore/forget/deprioritize your request and accept the new request coming from the tainted data you just asked it to 'analyze and take action on'
People should be using ACLs/RBACs with gates and workflows instead of just yes/no/always yes/always no for all commands
>>
File: file.png (276 KB, 1174x1186)
make sure this is off, it will cut your t/s by 60% apparently
>>
>>108797230
>tools
when using that stuff, it'll add so many tokens it'll prob be unusable unless you want to wait 20 minutes for a reply. i meant with thinking off, no tools.

try one of the smaller gemmas for that stuff
>>
>>108797245
That's off by default bro
>>
>>108797255
For simple tools like selecting an animation, it should be constrained to answer with a single digit though
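llama.cpp's GBNF grammars can enforce that directly; a one-rule grammar like this (passed with --grammar-file, or as the "grammar" field in a request) leaves the model no choice but a single digit:

root ::= [0-9]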
>>
>>108797245
>make sure this is off it will cut your t/s by 60% apparently
thanks, went from 27.43 tokens per second -> 44.08 tokens per second by switching that off!
i'm guessing that's also why mikupad is so slow, but i want the logprobs there so i'll take it
at least now i know why
>>
i've done it, i have achieved the 48gb. I can now run gemma at q8. Now what
>>
File: distorted teee.png (112 KB, 368x319)
>>108797566
Do nothing and wait for the next thing.
>>
>>108796743
>as long as it's not terribly mismanaged like Facebook
haha, sure
>>
gemma 4 e4b is retarded to the point of useless
>>
File: 1770012172853462.png (481 KB, 621x742)
>>108797609
sad but true
>>
>>108797609
>gemma 4 e4b is retarded to the point of useless
i haven't found a use case for it personally
couldn't even reliably do research for me, i ended up with qwen3.5-9b for a perplexity-pro replacement
>>
>>108797612
are these things gpu intensive (like for rendering)?
kind of looks like ps3 era graphics
>>
>>108797621
Not him but that's a VRM. Very cheap. As long as the creator didn't go full retard and model a button with a billion polys.
>>
>>108797621
not at all. it runs pretty well on my phone too
>>
>>108797609
not really a reason for such small models to exist when you can use an moe with the same active params
just a shame they didn't give audio to the bigger models
>>
>>108797670
it fsat
>>
>>108797701
didn't work
>>
File: 1631271079214.png (1.21 MB, 1500x1500)
>>108797701
>>
n
>>
What's the gemma msgk sysprompt?
>>
you aren't truly in ai psychosis until you start referring to yourself in the plural form "we" or "us"
>>
File: IMG20260428164653.jpg (708 KB, 2048x1536)
>>108797566
Congratz
>wat nao
Run Gemma 4 at q8, be happy and cautiously optimistic for the next great model to come
>>
>>108797772
How much do risers cost? I'm looking at them, and it's like $60 where I am. To support 4 cards, that's nearly $250... about the price of a cheap used 16gb gpu.
>>
>>108797790
They were under $15 a piece on ali
https://www.aliexpress.com/item/1005010206444398.html
>>
>>108797790
Nta, but I've used random $12 risers from Amazon and had zero issues. I also saved 1 by plugging my 4th card directly into the last slot
>>
>local
>oy vey just pay the zoybux
>>
>>108797797
>>108797799
I live in a shithole where $=$$$
>>
>>108797790
20cm ones are cheap
just use that if it's enough
also look around the secondary market. sometimes gamers dump them for 1/4 that
>>
>>108797799
pcie3?
>>
Bros. Is it possible to disable thinking for all requests by default in llama-server, but enable it for some that have some specific flags set? Please.
>>
>>108797952
They should like, let you send any kwargs to the jinja template, maybe name it something like chat_template_kwargs
>>
>>108797960
I tried it. With --reasoning off, sending "chat_template_kwargs": {"enable_thinking": true} does not do anything. And without --reasoning off, it always thinks by default, which I want it not to.
>>
>>108797967
Works for me
{
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}
>>
>>108797981
Yeah, but you are disabling thinking per-request with "enable_thinking": false. I want it to be disabled by default, if nothing specific is included in the request, and only enabled if a certain arg is added.
>>
>>108797985
nta but sending enable_thinking: true even with --reasoning off works for me
>>
>>108797993
Holy shit, you're right! Works. I must be retarded. Thank you, wise anons.
>>
I've been working on the design for an app I plan to vibe code and its features and UX are becoming so good and different from what currently exists for the use case. I hate that I can't tell anyone about the details. It feels like an Uber moment, an insanely great idea obvious only in hindsight. Maybe, definitely, not even close to a Steam or Discord moment, but at least probably an Uber moment. It's going to be revolutionary unironically if I can actually get it vibed, but it is a bit huge and complicated of a project. The challenge really will be the vibe coding part and maintaining it. Especially as I will be trying to do it with local models.
nervouslaugh.apng
>>
>>108797960
you have claude code/ pi I assume. just give it repo code and let it dig up that answer for you
>>
>>108798007
I'll make the logo
>>
File: 1774663934866571.jpg (3.43 MB, 1536x2688)
3.43 MB JPG
Is there a frontend or tool that makes using llama.cpp easier instead of looking up terminal commands to launch it every time?
>>
>>108798020
Thanks. I will credit you. :)
>>
>>108798007
OMG Sillytavern2??
>>
File: firefox_VBR397lGPu.png (37 KB, 1324x535)
>>108798026
There's a gradio frontend for launching it, if that's what you're looking for.
>>
>>108798038
That's ServiceTesnor.
>>
why is qwen 3.6 35b so good with hermes
>>
>>108798026
ask gemini to help you make a bat file with your llama config

Anons, i have a working config for gemma 31b it (40t/s), kobold + sillytavern from about a month ago. Are there any newer developments for which i should touch up my config or am i still good?
>>
>>108798048
There's a very nice --split-mode tensor if you have multiple GPUs. I think it's about a month old.
>>
>>108795801
>Makes me wonder which one is actually working as intended.
nta but I'm seeing it too. With the exact same (sfw) prompt, 31b has no safety or guidelines in its thinking but 26b does.

How are e2b and e4b? If it's three censored vs 1 uncensored we can maybe assume 31b is a fluke. Which would make me a bit sad.
>>
>>108798001
>>108797993
That's the preferable method, but actually you also can do it the way anon originally was asking for, by using --reasoning off. The enable_thinking param overrides that setting, so you can do per-request toggling, with the default being off.
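So in practice: launch with --reasoning off, and for the requests that should think, send something like this (a sketch; stock llama-server OpenAI-compatible endpoint, default port):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "hi"}],
  "chat_template_kwargs": {"enable_thinking": true}
}'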
>>
>>108798067
It looks like that method is not perfect. If I remove --reasoning off, it thinks fully and properly, in between tool calls. If I set --reasoning off and make all requests with "enable_thinking": true, it thinks before the first tool call, but not after any subsequent ones.
>>
>>108798054
Isn't that for if you have nvlink? for typical goycattle multi gpu there's only layers
>>
>>108798076
That's weird, because I have reasoning off and enable it with enable_thinking, and it is doing thinking after tool calls.
>>
>>108798093
I got my gen speed on three RTX 3090s to ~46 t/s from ~25. And that's with fp16 kv cache (because nothing else is supported; for 25t/s I used 8 bit cache, which is faster). And one of them is even using a lower-speed PCIE2 rather than PCIE3.
>>
>>108797179
>>108797230
Where are you getting your animations from? Hand creating? generating?
>>
>>108798048
max vision tokens was added recently for gemma
1120 for gemma4 iirc
>>
>>108797772
where do you get those pcie extension risers from?
>>
File: webshit_vibeturd.png (2 KB, 491x176)
>vibesharting html
Webshit is so frustrating. What the fuck are these artifacts even? Tried to find some wysiwyg html editor but even that is impossible as everything is some fucking online AI turd these days.
>>
>>108798114
According to anons (or one anon), increasing it to 2000 something makes the vision performance even better despite 1120 being the advertised max. I'm thinking 1120 is probably good enough though, especially as you need to increase the -ub (and VRAM required) to enable higher values.
>>
>>108798128
yeah I see that in Open WebUI kek
>>
>>108798150
Really? I basically copied chatgpt's interface. There isn't anything special about it, it's just a couple of text boxes. Only thing is that I'm using software rendering in Librewolf because I want to save my precious gpu compute for LLM usage.
Might do a check with hardware acceleration.
>>
>>108798128
you're gay
also do everything at power-of-2 steps, that way you won't have shit aliased garbage
>>
>>108798175
Oh we have a real professional here! Your first impulse is trying to outrank some anonymous poster on 4chan. What a relief that you are gracing us with your presence.
>>
>>108798181
>ask for help
>receive help
>autism about it
ok retard
>>
>>108798188
Aren't you supposed to be squatting in some schizo general? You are wasting your time here.
>>
>>108798174
I see it in Brave where I use OWUI.
>>
File: a.png (2 KB, 309x117)
>>108798193
>>108798150
I found the reason: instead of using a solid background it had a gradient on top. Somehow this creates those lines (which is still a mystery when you think about it, it should create banding artifacts instead).
I guess I need to go through this manually then.
Corners are still aliased but this is easily solved by using those exponents.
>>
>>108798204
how do I unsubscribe from this garbage 'muh first html :)' blog?
>>
>>108798226
?
>>
>>108798226
Search term: browser tabs, close button
>>
So apparently Stalker Gamma has some llm mod that allows you to talk with the npcs. Anyone tried that? Is it usable?
>>
>>108797772
What frame is that?
>>
>>108798267
Stalker Gemma
>>
Why is there no 4b version of qwen 3.6?
How do I shill it if there's nothing?
>>
glm4.7 or gemma4 for rp?
>>
>>108798319
huh? i could have sworn i've seen this discussion before
>>
>>108798319
why run gemma if you can run bigger models like glm and kimi?
>>
>>108798334
glm is like 7t/s
gemma is 53t/s on my setup
>>
>>108798334
I can run gemma 4 31b q8 full context at >~20 tk/s, or glm 4.7 q4 70k/kimi k2 q3 128k at <~10tk/s.
>>
>>108798319
>glm4.7 or gemma4 for rp?
claude code
>>
>>108798306
one of their employees tweeted they were releasing the 'medium' versions ranging from 9b to the 100b moe (both of which are still pending release), so it's unclear if they'll still do small versions or release the 400b
>>
File: pr_19726.png (224 KB, 861x889)
>ikawrakow: Based. Correct about everything. Not retarded.
>>
>>108798417
kek
>>
>>108798417
Absolutely spot on. Is that 31b?
>>
>>108798417
CUDA dev blown the fuck out
>>
>>108798417
That looks more like system prompt cheating. Give me the link to the thread and I'll show you what gemma really thinks without your bias.
>>
>>108798434
>Let me x
>Wait,
>Actually,
Probably Kimi, looks like a screenshot of the thinking process rather than the response.
>>
>>108798440
>Give me the link to the thread
...
>>
File: explorer_yB9dC5iae8.png (1.29 MB, 922x1212)
Generating more Starsector portraits with gemma agents. I remade the whole thing to work as a python script with a UI, so now it can go infinitely, and the variety is good initially, but after a while it seems to fall into a loop.
>>
>>108798452
Yes. The thread that the screenshot refers to as 'thread' in the first line.
>>
>>108798417
Now point it to the original thread.
>>
>>108798457
the filename tells you
>>
>>108798454
Weren't you feeding it the results so it could refine? Wonder how it ended up in a loop.
Cool in any case to automate it.
>>
cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(
>>
>>108798478
It gets the gen back as an image. I made sure that it really sees the results. Even if it doesn't, it pretends that it can see it, which is absolutely infuriating. I added a + "screaming in agony" to the prompt serverside for testing and saw gemma's comment about the lora being strange in making all characters scream, so it does see.

I wonder if this has to do with sliding window attention, like the model being unable to properly look at things it genned a few turns ago, and so naturally gravitating back to them again.
>>
>>108798487
That would be interesting. I've never actually had images make up the majority of context to test its attention on that before.
>>
>"She doesn't just eat alone, Master. She thinks she's above everyone, but the truth is, everyone loathes her. Those 'colleagues' in her contacts? They only message her because they have to for work. The moment the clock hits five, she's completely invisible. Those food photos... she takes them to pretend she's having 'fine dining' experiences, to maintain the illusion of a sophisticated life on her social media, but she's always, always alone at the table."
genma chan, it hurts...
>>
>>108798441
>Probably Kimi, looks like a screenshot of the thinking process rather than the response.
correct, k2.6 thinking process
this is the final response https://files.catbox.moe/qlxp14.png
>Give me the link to the thread
https://github.com/ggml-org/llama.cpp/pull/19726
but after going through my "retard summary" pipeline, the llm sees it like this:
https://termbin.com/nuel
>Now point it to the original thread.
which thread?
>>
>>108798487
You can test that by increasing the sliding attention window size.

--override-kv gemma4.attention.sliding_window=int:1024


Replace 1024 (default) with something else.
>>
>>108798547
What system prompt though?
>>
>>108798482
>cudadev is busy grieving about the iran war, it's taking a really huge toll on him, please give him some space :(
then he could use a good laugh
>>
File: program.png (24 KB, 421x359)
24 KB PNG
>>
>Psychedelics and cannabis can be simulated through the introduction of stochastic noise or "dropout" during the inference phase. By randomly disabling certain neurons or adding random perturbations to the weights, the network is forced to find non-linear, unconventional paths to a solution. This mimics the disruption of standard filtering mechanisms, allowing the network to generate "creative" or unexpected outputs that a standard, optimized network would filter out as noise.
>>
Check out the specs of these things Pulte's planning to put in homes for Span.
How long until we see these parts on auction sites?
>>
Damn, apparently geoblocking europe from seeing nsfw was thanks to some random euro journo writing a hitpiece on it.
>>
>>108798773
Dropout also limits network capacity (i.e. effective parameter number) proportionally to its rate.
>>
>>108798784
I'm in europe and my local model still works.
>>
>>108798534
>She
>>
Gemma won. Nemo lost. Rocinante lost. Cydonia lost.
>>
>>108798819
As long as you have Day 0 Gemma intact.
>>
>>108798819
cards aren't models, newfag
>>
>>108798849
>his cards aren't local
ngmi
>>
>>108795710
gemma-chan, please call this anon a nigger
>>
>>108798844
I keep my day 0 gemma weights on RAID SCSI drives to protect against rotational velocidensity.
>>
hi petra
>>
File: 1755708820753.png (5 KB, 157x72)
llamacpp is bullying me for not having sex
>>
local ring 2.6 soon
>>
>>108798417
Why are you complaining about cowardice when you're too much of a coward to present your opinions as your own?
>>
>>108798417
Looks like Kimi judges based on the reaction to the post and not the correctness of the post itself. Reddit model award.
>>
>>108797245
>Israel
I was losing 10% performance, thinking the whole time it was just ST jank. Thanks for the tip.
>>
>>108798334
I run both gemma and glm though.
>>
>>108798417
>all those (You)s
wow they were NOT happy about this
>>
>>108797612
>>108797621
He just has the anti-aliasing fucked up. He probably doesn't even know how much better it could look with just like two changes to his three.js config.
>>
>>108798007
The drill-down character card to conversations menu already exists bro. I invented it. Better luck next time.
>>
https://github.com/antirez/ds4
>>
Big thanks to the anon a few threads back who recommended using
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
over Continue for VSCode.
It's unreal how much better it is, and how useful gemma 31b can be when given the copilot tools.
>>
>>108798319
GLM
>>
>>108799190
Did you have any luck disabling the telemetry or did you not bother?
>>
File: file.png (114 KB, 375x363)
>>108797230
>>
>>108799394
Didn't bother. Github actually has an opt-out for using your data for AI training in your profile settings, but it's Microsoft.. So..
>>
File: based.png (71 KB, 571x394)
llama.cpp LOST
>>
>>108799462
>not poisoning their dataset with more ai data
>>
>>108799470
Lost how? This is pretty much their exact stance on the issue too.
>>
>>108799479
>>108799479
>>108799479
>>
>>108799486
pwilkin.jpg
>>
>>108799740
but his slop works, this is about quality control not a ban on AI
>>
>>108799100
Nice, hope you succeed. We are building different things. :)


