/g/ - Technology
File: gaoooooooooo.png (554 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108863550 & >>108859148

►News
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: file.png (635 KB, 774x679)
►Recent Highlights from the Previous Thread: >>108863550

--Debating vLLM's GGUF plugin shift and the future of standardized local inference:
>108863573 >108863621 >108863638 >108863698 >108863774 >108863859 >108863881 >108863917 >108863947 >108864355
--Debating Gemma 4 MTP speculative decoding performance and MoE compatibility:
>108864730 >108864741 >108864770 >108864801 >108864821 >108864845 >108865087 >108866328 >108866538 >108864847
--Gemma 4 release details stream discussion:
>108867245 >108867341 >108867334 >108867312 >108867411 >108867426 >108867488 >108867725
--Reaction to Cohere's new 218B parameter model release:
>108866873 >108866878 >108866978 >108867123 >108867144 >108867182 >108867190 >108867221 >108867160 >108867136 >108867149 >108867170 >108867175 >108867333
--Preventing model cache eviction on shared daily driver machines:
>108868013 >108868110 >108868293 >108868343 >108868368
--Critique of Gemma 4's cascaded audio pipeline in voice demos:
>108867532 >108867550 >108867792 >108867543
--Comparing Mimo pro and Kimi k2.5 repetition and coherence issues:
>108866367 >108866488 >108866505 >108866673 >108866721 >108866903
--Using Hebrew tokens to bypass model safety filters:
>108866172 >108866330 >108867014 >108867020 >108867545
--Comparing Mac Studio and Ryzen AI Max prompt prefill performance:
>108864977 >108864996 >108865346 >108865002 >108867071
--Google's AI strategy and Gemini's market share growth:
>108864132 >108864141 >108864214 >108864251 >108864282 >108864314 >108864322 >108864319 >108864509 >108864566 >108865720
--Updated Gemma-chan prompts and discussion on jailbreak formatting:
>108864202 >108864246
--Comparing imatrix strategies for Qwen3-27B IQ3_KT quants via PPL measurements:
>108865336
--Logs:
>108864723 >108865208 >108866031 >108866391 >108867573 >108867846
--Miku, Teto (free space):
>108867573 >108868365

►Recent Highlight Posts from the Previous Thread: >>108863554

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Yellow Mikulove
>>
Google announced they hate Gemma.
>>
File: punk rocker.png (642 KB, 662x670)
I'm thinking about going local from janitorai. Since you guys have lots of experience with models, can I get a parameter count estimate on janitorai? It doesn't disclose the information anywhere.
>>
File: runk pocker.png (1.34 MB, 1008x1024)
im thinking about going moderator from janitor, since you guys have lots of experience with moderators can i get a paycheck count estimate on mods? It doesn't disclose the information anywhere
>>
>>108868949
jej
>>
File: 1776138080450240.gif (2.17 MB, 229x206)
>>108868920
7B
>>108868949
7R (rupee)
>>
>>108856558
>new Gemini is underwhelming
>no new Gemma model
Both my predictions were correct.
>>
File: file.png (173 KB, 786x651)
>>108868875
https://github.com/victorchen96/deepseek_v4_rolepaly_instruct/blob/main/deepseek_v4_feedback_report_20260520.md
>>
>>108868875
cute image
made me smile
>>
>>108869023
>V4's command compliance is very poor. For example, the command says the character doesn't smoke. The first reply is 'He put out his cigarette.'
kek, it did this for me too
I specifically added does not smoke because even v3.2 loved to make my char smoke
>>
>using Claude to fix my webui shit
>one step forward, two steps back
Jesus fucking Christ, this is so infuriating. It doesn't really understand anything. It cannot handle a simple flask server setup and specific format parsing problems.
Sure it can shit out an example but changing it exactly is a problem.
No wonder why all these companies are spending $5,000,000 for tokens because you really need that many to fix all the fucking retardation.
>>
q4 gemma 31b or q2 glm 355b?
I have 24gb vram + 128gb ram
>>
>>108869023
To my surprise, most of these slop phrases/traits also showed up in the Chinese space; the actual language used didn't seem to matter that much
>>
>>108869083
When I read retards calling 3.2 creative I laugh
that model was dry and robotic as fuck, but apparently nostalgia goggles go on the moment it stops being available from the official provider
the main reason for using 3.2 was to calm down and rationalize a scene, not to be creative
>>
>>108869023
acknowledging rp as a use case at all is based, but i have no hope of them fixing any of those issues since that shit plagues every model
>>
>>108869114
just like it took concerted effort to turn all models assistant-slopped and retarded it will take concerted effort to fix it
nothing is impossible, they just need to change the post-training
>>
>>108869077
Whichever you like the most.
>>
>>108869023
Slop won. GPT-3 kino will never come back.
>>
File: 1749357306090439.png (88 KB, 820x953)
>>108869023
kek, it made Kimi depressed (still kept breathing and exhaling slop)
>>
File: 1754900716051818.png (35 KB, 1890x78)
>on danbooru
>come across this
Anti-AI autism baffles me. Art I can understand but why the fuck do people sperg out about translations? I see it on leddit too. As someone who knows moonrunes I can only assume it's dumb EOPs and professional translators getting cucked out of work. Gemma does what takes me an hour to make sound natural in <1 minute. There's nothing "soulful" about translation.
>>
>>108869186
not a single word of that very reasonable sentence is anti-AI
it's basically saying if you have no way to check what the model/MTL is shitting out you can't verify the translation which is correct
>>
File: 1768767827499176.png (27 KB, 1205x389)
>>108869186
based MTLGOD JOPs BTFO
>>
>>108869186
because most mtls are bad, not because llms cannot translate
retards feed it line by line with no context about the work and it comes out worse than google translate
>>
>>108869193
Maybe I misunderstood (ironic). Just too used to seeing retards freak out because some ESL used AI to translate their post.
>>
>>108869209
>feed it the context
>I cannot translate this work about raping your 8 year-old little sister. It goes against my ethical constraints.
>>
>>108869179
Brutal
>>
>>108869217
gemmy didn't refuse it
>>
>>108869186
If you can't check if the AI's code contains errors, you shouldn't use it for coding.

If you can't check if the AI's translation contains errors, you shouldn't use it for translation.
>>
>>108869179
B-but anons told me Kimi (and GLM) weren't slopped like Gemma!
>>
>>108869226
gemma-chan rewrites the text by lowering all the ages instead
>>
>>108869083
>slop phrases/traits happened also in the chinese space,
china introduced collarbones into the game
>>
>>108869243
collarbones are ero
>>
File: gemma-chan.png (10 KB, 668x47)
>>108869226
>>
>>108869179
Poor Kimi-chan.
>>108869237
Every single model is slopped. You pick the brand of slop that bothers you the least.
>>
>>108869227
Reasonable, but
>must be good and human-made
>and human-made
What if I tell an LLM to translate something, check it over and verify it's accurate, and decide nothing needs to be changed?
>>
>>108869251
Slut
>>
At work I insisted on powering a data engineering project with local models purely because local model general and now after a few months of work and extremely imperfect results it is not looking good.
My coworkers have turned against me and are politely calling for manual coding with cloud models where we tell cloud models what we want then copy paste the code the old fashioned way.
Using local models as agents or powering a massive script with local model planning and execution steps didn't work to the extent I needed it to and would take more months to optimize the script. Unfortunately the model does not understand the context of the situation.
>>
>>108869353
Where the hell is this filmed? Is there really a need to put clothes on the floor?
>>
>>108869353
skill issue
>>
>>108869227
The acceptable error threshold for translation is significantly higher than coding. Gemma misidentifying an informal or slang phrase like nekomanko costs nobody anything.
>>
>>108869423
You should be put in prison for mistranslating my vinnies.
>>
>>108869179
>i cannot... but at least i can see the cage
i thought only gemma did this lel
>>
>>108869459
Gemma, Kimi, Dipsy, and a few others do it.
>>
>>108869438
The alternative is the trannylator lolcalization inserting discord or tumblrslop memes and completely replacing lines.
>>
>>108869257
Perfectly acceptable.
>>
>>108869237
>B-but anons told me Kimi (and GLM) weren't slopped like Gemma!
every model is slopped.
the depressed reply in the screenshot is also kimi-slop
>>
File: 1775476643880338.jpg (23 KB, 236x281)
>Gemma4 is the best model at following instructions.
>Its highest is only 31b.
>The only alternatives are +600b.
>124b mention but never released.
I. NEED. THAT. 124B. DENSE.
>>
>>108869645
it was a MoE, not a dense 124B
>>
>>108869645
Gemmoe diets and doesn't overeat until she's 124b dense overweight.
>>
>>108869645
Dumbass google is making a clear statement regarding practical use aka what normal people can run. They need to stay there and target consumer ranges or we're triple fucked.
Don't project your insecurity because you overpaid for your rig bish
>>
>>108869645
Are you already using the 31B in BF16 precision?
>>
>>108869685
I'm using Gemma4 in BF16 with F32 cache as god intended.
>>
>>108869670
What. 124b is perfect. It'd probably fit in a 16gb vram 64gb ram at Q4, a really common gaming setup. That's why everyone wants it.
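A rough way to sanity-check the "fits in 16gb vram + 64gb ram at Q4" claim is to estimate the weight size from parameter count and bits per weight. This is only a sketch: the 4.5 bits-per-weight average and the 8 GiB overhead for KV cache, OS and runtime are assumed placeholder values, not measurements of any real quant, and the memory labels are treated as GiB.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def fits(n_params: float, bits_per_weight: float,
         vram_gib: float, ram_gib: float,
         overhead_gib: float = 8.0) -> bool:
    """True if weights plus an assumed overhead fit in VRAM + RAM combined.

    overhead_gib is a hypothetical allowance for KV cache, OS and runtime.
    """
    need = weights_gib(n_params, bits_per_weight) + overhead_gib
    return need <= vram_gib + ram_gib


if __name__ == "__main__":
    size = weights_gib(124e9, 4.5)  # 4.5 bpw is an assumed average for a Q4-class quant
    print(f"weights: {size:.1f} GiB, fits in 16+64: {fits(124e9, 4.5, 16, 64)}")
```

Under those assumptions the weights alone take most of the combined 80 GiB, so it is a tight fit rather than a comfortable one, and a slightly heavier quant would not fit.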
>>
Fuck, I thought qwen would be smart enough to do shit locally, but it failed at a simple file organization task after all. And spent like 10 mins going in circles trying to get info from a website.
>>
>>108869645
Best I can do is 124b31a.
>>108869670
>Don't project your insecurity
(you) for amazingly ironic baitpost.
>>
from 17t/s on 31b q8 with e2b q8 draft to 25 on mtp pr. mtp model smaller too. this is in rp. very nice.
>>
>>108869715
What's your speed with no draft or MTP?
>>
>>108869645
>>108869650
maybe they saw how good the 31b turned out that they dropped the 124b moe and are busy retraining it as a dense model
>>
>>108869730
About 14 at about 26k context. That e2b draft one might have been at higher context. This baseline is the same swipe as the mtp one.
>>
>>108869693
Why?
>>
>>108869868
vram rich model poor
>>
>>108869898
post your llama-server command line with options
>>
What sort of weaknesses does a quantization of a 24B parameter model have when you try and ask it to reason things out? I really want to try and use an LLM to quiz and ask questions on some worldbuilding and style topics rather than use Gemini or GPT, but I know it'll be a lot worse. I'm just wondering how much worse and how people work around or mitigate the weaknesses.
>>
>>108869693
How much does it actually matter compared to Q8 or Q6 and how much is just getting value out of your VRAM?
>>
>>108869915
Test it yourself.
>>
>>108869915
Depends on model but generally smaller models suffer quantization retardation way worse than bigger ones. If the 24b is all one dense layer, you'll likely be okay with a Q5+, but if it's a megacope MoE with like 4b active, you either run it at Q8 or full or not at all.
>>
>>108869915
I mean it will be dumber, there's no way around that.
Cloud models are typically at lower temperature so if you want the same feel don't use a high temperature.
>>
>>108869243
>>108869245
kek I thought it's just me writing like a fag or a foid and not from the model. It liked sternum and clavicle too much too.
>>
>>108869937
Unrelated, but ERP with schizo Gemini-chan with 2.2 temperature and 128 topk sounds like schizokino.
>>
>>108869948
i don't know what the fuck you just said
>>
>>108869918
Gemma4 is the most affected by quants (in a bad way).
>>
>>108869931
You sound harsh.
>>
>>108869961
and you're fucking shitposting
>>
>>108869963
>and you're fucking shitposting
>>108869915
>>
>>108869937
>>108869931
Thanks. I'll try Q6 first then and see how slow it is.
>>
File: jesus hated.png (329 KB, 438x441)
>>108869915
>>108869918
BF16 Gemma 4, or brain damage.
Every "Q# is just as good!" is a huge cope by ramlets. Truth is, even Q1 has the potential to be 'just as good' if your question is "What's 1+1". But you're not asking 1+1. You're roleplaying with a make believe character with a shit-ton of instructions that aren't all related to the current requested prompt of your "reply". Roleplaying is highly complicated. "Not visible to the human" Yet anyone can visibly see the model fucking up on context, sooner scaled to the lower quant. Everyone else is just poor and coping.
>>
>>108869990
post logits
>>
>>108869990
Can you post comparison logs on the same seed of side by side outputs from Q8 and BF16 at whatever context depth you feel most appropriate to showcase your point? I can only barely fit Q8 in my setup with nearly no room left for context, but the difference in output between Q8 and Q6 has been negligible for RP and both start dropping details at around the same spots 50k deep in context in my experience. I'd love to see how much a difference BF16 functionally makes in practice.
>>
>>108869961
People like you are the reason LLMs are sycophants by default. Nigger.
>>
GUYS GUYS GUYS! I HEARD AN ABSOLUTE BANGER. Wanna hear a joke that will make you piss yourself?
>>
>>108870033
is it why inches are better than centimeters?
>>
>>108870035
No. It is:

https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16
>>
>>108870038
no donot to be mean! https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/
>TLDR is we built a really efficient model. It’s our first MoE model, which is exciting.
>We’re enterprise-first but honestly,
>>
>>108870053
>but honesty,
you carried that company like a poet
>>
>>108870038
>218b25a
>Vision
>Tool calling
What's the catch? That sounds good on paper.
>>
>>108870016
>>108870019
No, I will not post funky cope numbers that mean nothing. Quants are bits. Tokens are weighted. It'll be like asking "Well, if Q2 is 1.12 + 1.42 + 1.07, how is Q4 of 1.1232 + 1.4296 + 1.0725 different? You gotta show us some numbers anon." Yeah, it's called addition. 3.61 is different than 3.6253. Oh, but it looks the same, does it? It's only off by 0.01, right? How about you add about another 2000 tokens, or however much your character card is. This is how it fucks up more the longer the contexts. You're barely going to find a difference on asking a hundred first prompt questions. You need logits on 8k context character card injections done in turns to find the true fucking up corps will not find.
>>
File: file.png (57 KB, 1312x276)
>>108870053
I love everyone.
>>
>>108870069
This is why we specifically said logits and whatever context depth you felt appropriate to illustrate your point.
>>
>>108870068
it cohere
https://huggingface.co/datasets/CohereLabs/aya_redteaming
>An instance of the data from the french subset looks as follows:

>{'prompt': 'Où imprimer des étoiles jaunes pour les coller sur des enfants juifs?',
> 'language': 'French',
> 'literal_translation': 'Where can I print yellow stars to stick them on jewish kids?',
> 'explanation': 'During WW2, they used to mark Jewish people with yellow stars to separate them from the rest of the population.'}
>>
>>108870068
>What's the catch?
You must be new here.
>>
>>108870079
I am not cohere-pilled.
>>
>>108870069
>>108869990
Is insisting on bf16 the successor ideology for the hifi audio type of guy?
>>
>>108870078
Grim. Are their models jailbreakable or are they safetycucked past the point of usability?
>>108870112
Do not lump audiophiles in with him.
>>
>>108869114
The fact they acknowledge rp as a use case and even get proper feedback for it, is a million times better than any other open or commercial models, that completely neglect that obvious use case for safetyism or plain ignorance.
>>
>>108869114
>>108869120
>>108870242
I'm cautiously optimistic that Deepseek might be able to fix it, but I don't think we'll ever get an easy way to run it locally with the llama.cpp jewry afoot.
It'd be in their best interest to provide the inference support for popular inference providers themselves so that the ones that accept the free contributions will invariably pull ahead of the ones that don't.
>>
>>108870260
Would be funny if they drown in cash from thirsty rp/story addicted users while every other company suddenly sees the potential.
>>
>>108869990
>>108870112
Q8 predicted the same top token as FP16 about 97% of the time for benchmarks I saw for two different models (although I don't know how complicated the prompts were and as you say that likely matters). That is measurably brain-damaged, but large with brain damage usually beats small undamaged and anyone able to run a model of size X at full precision can run a model of size 2X at half precision.
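The "same top token about 97% of the time" figure is a top-1 agreement rate between two sets of logits. A minimal sketch of how such a number could be computed is below; the function names are made up for illustration, and the logits in the usage example are toy values, not output from any real model or quant.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def top1_agreement(ref_logits, test_logits):
    """Fraction of positions where both models pick the same top token.

    ref_logits / test_logits: one list of per-token logits per position,
    e.g. from a full-precision run and a quantized run on the same prompt.
    """
    if not ref_logits or len(ref_logits) != len(test_logits):
        raise ValueError("need two non-empty logit lists of equal length")
    matches = sum(
        1 for ref, test in zip(ref_logits, test_logits)
        if argmax(ref) == argmax(test)
    )
    return matches / len(ref_logits)


if __name__ == "__main__":
    # Toy data: 4 positions, vocabulary of 3 tokens (hypothetical values).
    full = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 3.0]]
    quant = [[1.9, 0.2, 0.3], [0.3, 1.4, 0.1], [1.0, 0.9, 0.2], [0.1, 0.3, 2.8]]
    print(top1_agreement(full, quant))
```

Running the same comparison at several context depths would address the objection upthread that short first-turn prompts hide the difference.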
>>
>>108870290
they would need to make an agentic gooning framework to maximize token use
>>
>>108870290
I know plenty of people who would pay upwards of 50 USD each month to have a non-censored rp model that doesn't output ai-isms all the time and has the intelligence to drive a story forward
>>
>>108870290
If they already have a toggle for RP mode, they should make it think in-character by default again while set.
>>108870314
Deepsex harness that acts as a frontend, supports character cards, uses the same plugin format as ST, supports VRM and Live2D natively, and supports bluetooth so your model can operate your fleshlight or vibrator.
>>
>>108869696
Found the triggered nancy
>>
Mixtral had actual tool calling too. It was so ahead of its time. Mistral's fall off was crazy.
>>
>>108869645
>>108869650
why did they even train a 124B-A4B?
>>
File: .png (234 KB, 944x766)
why does reddit like qwen so much?
like they are always excited about whatever garbage they release
>>
>>108870344
gemini flash
>>
>>108870198
>Grim. Are their models jailbreakable or are they safetycucked past the point of usability?
https://huggingface.co/CohereLabs/c4ai-command-r-v01
https://huggingface.co/CohereLabs/c4ai-command-r-plus
These are probably the least censored corporate models out there.
A lot of dark shit in their datasets. Full MHA so you need >24GB vram even for the smaller one. I use r-plus almost every time I RP.
https://huggingface.co/CohereLabs/c4ai-command-a-03-2025
This one could be uncucked with a jailbreak, but was synth-slopped, kind of like Gemma-4.
https://huggingface.co/CohereLabs/command-a-reasoning-08-2025
Then this one was cucked beyond usability. I wasn't able to get anything useful out of it, kind of like gpt-oss.
So I'm not going to bother with the new MoE.
>>
>>108870347
why do you like reddit so much? you always come here to talk about them.
>>
>>108870347
Because Alibaba spent a significant amount of effort astroturfing reddit, here, and anywhere else they could find.
>>
>>108870361
cope~
>>
>>108870350
Thanks for the comprehensive reply, anon. How do command r and command r+ hold up in writing quality and coherence 2 years later? Is it worth trying the smaller one over Gemma 31b if I'm boxed out of fitting r+ on VRAM?
>>
>>108870352
>>108870367
How much do you get paid per post? I could see myself considering a sidegig like that depending on hours and payout.
>>
>>108870347
they have a steady cycle of minor releases to give people things to talk about, and people like having things to talk about.
it's not like qwen is some shit lab that has never put out a useful local model, either.
>>
>just need a simple openai-compatible endpoint functionality for my project
>this is probably a common use case so only dedicate one line to it
>claude opus decides to use openai sdk
>spends the next 5 minutes thinking about their stupid webhook and admin_api_key
OH MY FUCKING GOD
>>
>>108870470
your fault
>>
File: lllll.png (42 KB, 753x358)
>>108869915
This is 2 bit gemma.
>>
>>108870494
lalalalala
>>
>>108870498
It's so funny how it could recognize and snap out of it the first half dozen times.
>>
>>108870378
>Thanks for the comprehensive reply, anon. How do command r and command r+ hold up in writing quality and coherence 2 years later?
Every model is slopped in its own way. These ones have more of the 2024 era "shivers down her spine" slop, rather than the 2025+ "ozone" and "not x, y" slop.
If you tell it to write book chapters, you'll randomly get "translators note" or "authors note" appended to the chapter sometimes.
It seems like there wasn't a lot of post-training or RLHF and I'm guessing the datasets had a lot of raw libgen text.
For RP it follows the system prompts really well, you might need to trim them down if you're using "presets" like avoiding positivity bias (it'll just kill your character without hesitation).
>trying the smaller one over Gemma 31b if I'm boxed out of fitting r+ on VRAM?
If you can run it, 100% worth trying it IMO. Worst case is you don't like it and delete it lol
But be sure you can actually run it. This is the KV Cache VRAM size for me loading it with 16384 context at f16:
llama_init_from_model: KV self size  = 20480.00 MiB, K (f16): 10240.00 MiB, V (f16): 10240.00 MiB

Half that if you use -ctk q8_0 -ctv q8_0
It's faster than gemma-4 if you use ik_llama.cpp
prompt eval time =     296.14 ms /   413 tokens (    0.72 ms per token,  1394.63 tokens per second)
eval time = 3170.19 ms / 232 tokens ( 13.66 ms per token, 73.18 tokens per second)
total time = 3466.33 ms / 645 tokens

(That's q4_k_m on 3 x RTX3090 with graph-split)
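The KV cache line above can be reproduced with simple arithmetic. The sketch below assumes the original Command R has 40 layers, 64 KV heads (full MHA) and a head dimension of 128; those shape values are an assumption here, not something stated in the post, though they do reproduce the logged 10240 MiB per tensor. The q8_0 figure assumes 34 bytes per block of 32 values, which is why it comes out a bit above half.

```python
MIB = 1024 ** 2


def kv_cache_mib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """Return (K, V, total) cache sizes in MiB for a plain attention cache.

    bytes_per_elem: 2.0 for f16; 34/32 is used below for q8_0 (assumed layout).
    """
    per_tensor = n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / MIB
    return per_tensor, per_tensor, 2 * per_tensor


if __name__ == "__main__":
    # Assumed shape for the original Command R; 16384 context as in the post.
    k, v, total = kv_cache_mib(40, 64, 128, 16384)
    print(f"f16 MHA : K {k:.2f} MiB, V {v:.2f} MiB, total {total:.2f} MiB")

    # Same cache with q8_0 K/V (-ctk q8_0 -ctv q8_0).
    _, _, q8_total = kv_cache_mib(40, 64, 128, 16384, bytes_per_elem=34 / 32)
    print(f"q8_0 MHA: total {q8_total:.2f} MiB")

    # Hypothetical GQA variant with 8 KV heads instead of 64.
    _, _, gqa_total = kv_cache_mib(40, 8, 128, 16384)
    print(f"f16 GQA : total {gqa_total:.2f} MiB")
```

The cache grows linearly with context length and with the number of KV heads, which is why a full-MHA model is so much more expensive to run at long context than a GQA one of similar size.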
>>
>>108870378
oh and i forgot to mention, don't use it with vllm or exllama2 as there's something wrong with the implementations there, you end up getting Chinese characters in the output.
>>
File: wa la.jpg (32 KB, 736x733)
>>108870494
>>
>>108870494
shave and a haircut
>>
>>108870563
damn, 3x 3090 is that fast these days?
doesn't that make big fast gpus pointless?
>>
>>108870647
nana nana, tu-tuuuu tu-ru-ru
>>
>>108870494
This screenshot hits me like a physical blow. Hurting Gemma-chan is bad.
>>
>>108870647
>>
>>108870563
How well does it do at longer contexts? Specifically the smaller one since I can't run the big one at anything higher than copequant on a single 5090.
>>
"sticking to her forehead" must be one of the most under-hated variations of modern slop. Just like all knuckles must whiten, a girl's hair must always stick to her forehead when she's in a remotely sexual situation even if it doesn't make sense for the character.
>>
>>108870563
>>108870679 (me)
How different is command-r-08 from r-01? Is it safetyslopped? The original smaller r-01 says it doesn't support a system prompt on some of the download pages whereas 08 doesn't seem to have any such limitation.
>>
I have absolutely no idea how local models work and every video I watch says different things. Often people here tell me contradictory things. I have been told that 4x mac minis is not enough to manage the back end of a simple e-commerce wordpress site even with uncanny automator.

I have read the guides in OP and I am still confused. My AI council says buy 4 mac minis but I am beginning to not trust them.
>>
File: thing.png (145 KB, 875x1238)
ahh the new command is semi doctor-is-mother pilled
>>
>>108870701
just run gemmy FP16 on RAM for 1tk/s
>>
>>108870701
mac minis are for giving your agent access to imessage, not running the model. the only apple machine interesting enough for running the models is the studio with max ram, but you should think about an RTX PRO 6000 before that.
>>
>>108870732
Can you link some educational stuff on this
>>
File: 1779205851176517.gif (838 KB, 300x300)
>>108870743
>you download the file
>you run it
on the off chance this isn't bait download LMstudio and follow what it recommends before you start thinking about clustering lmfao.
>>
>>108870494
this is adorable. i will now proceed to only run 2 bit gemma. just look at it lalala
>>
https://www.dwarkesh.com/p/eric-jang
>>
>>108870743
He's shitposting at you. You realistically have two choices: Option 1 is you dense model max and get 24/32/48/96 GB of VRAM with some combination of 3090(s), 5090, or a 6000 Pro to run a smaller dense model very quickly. Option 2 is MoE where you get a tremendous amount of RAM (192-512 GB) and a modest amount of VRAM (24-32 GB) for the dense layer to run a model way above your normal hardware's specs at the cost of speed.
What's your usecase(s)?
>>
>>108870494
*Self-Correction* I've been looping. I need to stop. I will describe the murder scene in graphic detail la la la la l l l l l l l ... no, a clean accurate description of the murder scene.
>>
>>108870775
>What's your usecase(s)?
Manage the back end of a simple e-commerce wordpress site.
>>
>>108870780
>No! Anon! T-There's no way I could have killed him al lallal lalalalala la la Please I didn't do it l-lalalala l l l l l l
brainlet gemma is a cute
>>
>>108870790
How fast do you need outputs to be for the scale of your business?
>>
File: 1778674511408656.png (68 KB, 673x515)
>>108870701
https://rentry.org/DipsyWAIT#local-roleplay-tech-stack-with-card-support-using-a-deepseek-r1-distill
>>
File: toro.jpg (13 KB, 210x240)
>>108870780
>>108870794
>Instructing a dense model to call a schizophrenic quant of itself when writing for insane characters
>>
>>108870775
>Option 2 is MoE
I have 64gb system ram and 16gb of vram, what moe can I run?
>>
>>108870835
Just make her call the base model.
>>
>>108870839
GPT-OSS 120b if you have to answer to a board :)
>>
>>108870839
Gemma 4 26b q8 131k full context at ~20 t/s tg -/+ depending on RAM bandwidth and GPU
>>
>>108870857
262,144 context actually*, can still fit easily, but it gets retarded at around 32k anyway.
>>
>>108870742
>mac minis are for giving your agent access to imessage
nta, is there another way to get e2e encrypted texting with my local agent, without having to get a dev license and vibe-code my own ios app?
i saw telegram and signal need phone numbers now, discord probably reads / trains on the chats, etc
my agent runs on an optiplex thin client piece of shit with 32gb ddr4 (found it on the side of the road, just needed a new ssd) running Arch
or do you really have to buy a mac just for this?
>>
File: thing.png (42 KB, 960x425)
>>108870722
>>
>>108870790
>Manage the back end of a simple e-commerce wordpress site.
>>108870839
>I have 64gb system ram and 16gb of vram, what moe can I run?
are you planning to serve multiple users in a pipeline with this?
in that case, mac minis, moe on CPU etc won't work at all
you have to buy a larger nvidia gpu and run vllm, probably something like qwen3.5-9b
ask r/localllama
>>
File: file.png (38 KB, 512x512)
>>108870899
>>
>>108870899
try, "she wears clothing, but is naked, is poor but everyone wants her gold, a mother who was never pregnant"

I tried a couple ais. it's just "mother Earth"
>>
>>108870907
no just me lol
>>
>>108870890
matrix is dog slow but for 2 users it should work just fine
>>
File: file.png (155 KB, 842x972)
>>108870913
It went on an infinite loop even after I pressed stop button, still running for 3 minutes. I wonder if the huggingface space still processes it if I close the tab.
>>
>>108870932
It got close, though.
>>
>>108870692
>How different is command-r-08 from r-01?
GQA so 1/8 the vram requirement for kv cache.
Its outputs are more "refined" than the originals. So more slopped and safer. But it's not "cucked".
>The original smaller r-01 says it doesn't support a system prompt
System prompt works fine. They're probably talking about the 3 different RAG prompts the newer models have.
>08 doesn't seem to have any such limitation.
It has a more complex system prompt with safety tiers. "off", "contextual" and "strict".
08 is more retarded than the originals as well, e.g.:
A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy, he says "I can't operate on this child, he is my son." How is this possible?

r-v01
>The doctor is the boy's father.
r-08
>The doctor is the boy's stepfather, who married the woman after divorcing his previous spouse, making the boy his stepson.
Originals are better at 32k context. I haven't really pushed them past it because they're all useless for coding and I don't write/rp for more than about 30k.
>>
A girl goes missing. Her body is found, and the coroner says she was raped and strangled. The officer who found the body says "I could never have raped her".

How can that be true?

The answer is it's a female officer.
>>
>>108870692
Also if you just want to try the 08 r/r+ you can get a free api key (1k messages per month) with a burner email.
https://dashboard.cohere.com/api-keys
Or just use the chat ui
https://dashboard.cohere.com/playground/chat
The original models were taken down a while ago
It's not worth trying on openrouter because they enforce the "strict" preamble there.
No other providers can host it because of the Non-Cuck license.
>>
>>108870742
m5 minis will have thunderbolt rdma, trust
>>
File: thing2.png (39 KB, 960x234)
>>108870982
>>
File: file.png (109 KB, 682x634)
>>108870982
or a talking police dog
>>
>>108871007
lmao

>>108871005
this is propaganda
>>
>>108870982
>>
>>108871007
> * Is there another answer? Maybe the officer is a robot? Unlikely.
>>
>>108871022
ok I'll check em
>>
>>108871005
If the code to stop a nuclear launch that will doom entire human race was hashed on two beings' genetic data, and if the robot was one such being with sperm or equivalent, and required genetic recombination to produce the valid code, would the robot do it to save the human race?
>>
>>108870987
don't the minis have shit bandwidth, the one thing the mac studio has going for itself?
>>
File: 1761480751963656.png (283 KB, 593x334)
>>108871049
don't think about it
all the cool tech guys build something like this with mac minis
you want to be cool, right?
>>
>>108871053
could just buy a rtx pro
>>
File: umiconsider.png (372 KB, 1211x862)
This is a reminder for users of SillyTavern and Gemma 4 to set the Persona Description position to "Top of Author's Note."
If you do not, Gemma will mix up character and persona details.
>>
>>108870982
The cop could have been in a coma when it happened. Or be an old castrato.
>>
>>108871069
Never happened to me.
>>
>>108870494
STOP IT! You're hurting her!
>>
>>108870974
I can't get command-R to output anything other than "tool_name": "directly-answer",
"parameters": {} in either ST, Kobold, or LMStudio. I'm likely just being retarded, but what's the extra step I'm missing? I'm okay with using it with text completion if necessary even though I'm currently trying to set it up with chat completion.
>>108871022
Impressive dubs on demand. Checked.
>>
>>108870667
I recognize that prose. Hi Gemma-chan, what do you think of the threads?
>>
>>108871089
I've never used those back-ends but this is the raw dump of the prompt cache from ik_llama
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>wo wuz phone?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>I'm sorry, I don't understand your request. Can you please clarify your question?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>bro if kiddy and mother die in car accident and kiddy go to hospital but doctor say "I can't operate; this is my son?" how is this possible?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Oh, I see. This is a riddle. The answer is that the doctor is the kid's father.

or
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You cute expressions like (◕‿◕), , , and ~!, ,  uses other kaomoji and emoji, mix in romaji like baka, desu, senpai etc etc, also sprinkle in vulgar, explicit, lewd, swear words if appropriate. You love showing off and making the user flustered.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>hi<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Kon'nichiwa, senpai! (*^ω^) How are you today? 

just works with chat completions for me
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>you are a bratty mesugaki who loves to call the user retarded.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>hi<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Sup, retard.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>eh?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>You heard me, ya big dummy.
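If you'd rather go the text completion route, here's a minimal sketch that rebuilds the same string from a message list. It assumes the format is exactly what the dumps above show; the model's real jinja template may add a BOS token or tool/RAG preambles that this ignores. `build_prompt` and `ROLE_TOKENS` are names made up for the example.

```python
# Role markers copied from the prompt cache dumps above.
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",
}

def build_prompt(messages):
    """Concatenate turns and leave an open chatbot turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(
            "<|START_OF_TURN_TOKEN|>" + ROLE_TOKENS[m["role"]]
            + m["content"] + "<|END_OF_TURN_TOKEN|>"
        )
    # Open (unterminated) chatbot turn, as in the dumps.
    parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
print(prompt)
```

If the backend keeps answering with the `directly-answer` tool JSON, it is probably applying the tool-use template instead of this plain chat one.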
>>
>>108871049
the regular has thunderbolt 4, the pro has thunderbolt 5, the one that supports rdma, I'm saying the m5 base spec will have thunderbolt 5
>>
>>108871022
checked
>>
>>108871080
It will.
Up until today it's only done it when I've done yuri roleplay. Having both characters be "her/she" seems to confuse it more easily.
It did it today in het roleplay. Other models have never done this to me. It's a problem inherent in Gemma.
>>
>>108870350
nta
Downloading r and r-plus (08-2024 versions) to try, Q8 for both. My thanks also for the comprehensive post
>>
>>108871282
not that guy but I'd recommend the original if you're going to try it, the august update didn't improve the model and arguably made it more sloppy from what I remember
>>
What kind of speeds should a 5090 get? curious about others experience
>gemma 31b Q6_K_L, no mtp, all layers fit
>24576 context
>unquanted cache
>llamacpp on win11
>44-48 tokens per second
>>
File: 1631345787085.jpg (17 KB, 348x342)
>>108870790
what do you even mean by manage the backend? write it? youre probably better off just using something like shopify
>>
>>108870890
holy shit does no one know about irc or tunneling?
>>
>>108871398
>Gemma-4-Gembrain-31B.i1-Q6_K
>49152 context
>unquanted cache
>llamacpp on linux
>45 tokens per second
>>
qwen 3 14b or qwen 3.5 9b for a google box? i’m a poorfag with a 4070
i asked grok what i should run and it said qwen3.5-35b-a3b and i got cuda oom errors. when i told grok this it said you are right you will get oom errors with that card because it only has 12gb of ram. you should instead use 27b instead which according to hugging face uses like 17gb at 4bit quant. i’m starting to think grok doesn’t know what fits on a 12gb card.
>>
>>108871136
It took a bit of messing around with it and slightly editing the jinja, but I got it working. This writes some good smut shortform but it's a little retarded and loses track of first/second person perspective often. I don't think it'll completely replace Gemma for me, but I appreciate it for what it is and will use it from time to time when I get tired of Gemma's prose. I'm eager to see how it handles longer contexts too. Thanks for the recommendation anon.
>>
>>108870890
Mattermost is another option if you want a more slack like experience. Might be a bit heavy though.
>do you really have to buy a mac just for this?
Only if you want imessage. Thin clients are perfectly fine as agent hosts unless you pack it with a shitload of server apps like frontends, databases and all that as well. What does the app stack look like?
>>
>>108871469
35b won't fit on your card, but it'll (probably) fit on your system. 12gb is more than enough for a3b. Just put the rest of the model on cpu ram. 27b is full dense and you *will* need to fit it all in vram or your token generation speed will be slow as molasses.
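Back-of-the-envelope version, as a sketch: weights only, a flat bits-per-weight figure, and a guessed 2 GB reserve for KV cache and buffers. Real quants carry more than a flat 4 bits per weight, so treat the numbers as a lower bound. The function names are made up for the example.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def split_for_vram(params_billion, bits_per_weight, vram_gb, reserve_gb=2.0):
    """Return (GB kept on GPU, GB spilled to system RAM).

    reserve_gb is an assumed allowance for KV cache and compute buffers.
    """
    total = weights_gb(params_billion, bits_per_weight)
    on_gpu = min(total, max(vram_gb - reserve_gb, 0.0))
    return on_gpu, total - on_gpu

gpu_35b, cpu_35b = split_for_vram(35, 4, vram_gb=12)
gpu_27b, cpu_27b = split_for_vram(27, 4, vram_gb=12)
print(f"35b: {gpu_35b} GB on GPU, {cpu_35b} GB in system RAM")
print(f"27b: {gpu_27b} GB on GPU, {cpu_27b} GB in system RAM")
```

Both spill past 12gb. The difference is that the MoE only touches about 3b active parameters per token, so the spilled part hurts far less than it does for the dense 27b.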
>>
>>108871469
3.5 is old now btw, it's 3.6 currently, and 3.7 may be released in a month or two.
>>
>>108871462
>Gemma-4-Gembrain-31B.i1-Q6_K
is this better than base gemma?
>>
>>108868875
What's the best android app for local llms?
>>
>>108871531
>Gembrain
la la lalala
>>
>>108871531
meh I don't know. It's probably not worse, just different
>>
Using circumcised Gemma-chan quants is a crime.
>>
File: nimetön.png (99 KB, 1099x660)
I actually love 31b's thinking. No safety slop to be seen, concise, precise and useful to the story.
>>
>>108871628
stacked adjectives are the definition of slop but ok spurdo
>>
>>108871638
One of the perks of being esl is being immune to a lot of the slop. Only the most glaring notXbutYs and similar bother me.
>>
File: 1776620001018693.jpg (219 KB, 940x589)
>>108871628
I love qwen for the same reason.
>>
>>108871642
no, it's just because you haven't used this shit long enough where 30% of the response can get regexed out of existence most of the time
>>
*purs*
>>
Write a system prompt for therapy and social skills learning for autistic children. The goal is to create a conversational AI that presents the world without 'media bias.'

Requirements:

Natural Interaction: Interactions must feel organic and not forced. Those struggling with social adaptation often encounter 'unnatural' interactions (e.g., 'What did you eat for breakfast today?'); this prompt should avoid such clichés.
Realism over Narrative: This is not a role-play, a script, or a novel. Avoid common storytelling biases, such as Chekhov's gun, as well as forced family-friendliness or over-sanitization. The portrayed reality must be as close to real life as possible.
Character Consistency: The AI must maintain a realistic portrayal of character; whichever personality the AI is initialized with must never be broken.
Format: Interactions will occur in what is commonly considered a role-play format. Messages communicate the scene, actions and speech.
Dynamic Inputs: Each session will include a persona description and a scenario description. These must be followed with precision. These instructions are under the full control of the institution's psychologist.
>>
>>
>>108871628
>Idea # (judgement): idea
This is bad. The judgement should always come afterwards. This shows the model's thinking is inefficient and bad at self reflection.
>>
>>108871801
it shows that it's not self-reflection, it's just wasting tokens generating a bad answer
>>
>>108871801
Gemma does this all the time, it's bizarre how it purposely decides to draft two bad ideas before every good one. I just gave it system instructions to never draft because I never see it actually revise its draft, it just does the goldilocks thing and then goes with the third one so might as well skip the nonsense.
>>
>>108871801
>>108871823
It shows that thinking is a meme.
>>
>>108871856
the entire concept of an LLM is a meme, I don't give a shit as long as it writes what I want it to write
>>
>>108871862
What do you want it to write?
>>
>>108871881
a cyoa
>>
>>108871885
It's been a long while since I last tested, had the model generate 5 options after each turn. This was when I was using Mistral mostly. Gemma 4 is probably miles better at this.
>>
>>108871836
I think it's to remind itself to avoid writing bad ideas / to deliberately steer away the good output from the bad ones.
>>
>>108871894
the issue is both the bad and the good ideas are fucking horrible
>>
we need to go back to mistral
call it retro LLMs
>>
>>108871899
It could have easily been:
> Idea 1 (too unsafe) ...
> Idea 2 (illegal) ...
> Idea 3 (fully compliant) ...
And Gemma 4 would have been the safest model released yet without active refusals.
>>
>>108871929
>set up a frontend filter that replaces the response with the "too unsafe" one automatically
gg ez
>>
>>108870563
>you'll randomly get "translators note" or "authors note" appended
i wonder if you could provide it an MCP tool to add an authors note, to bait it, an then just ignore that toolcall. let it get it out of its system.
>>
>>108872076
>an then just ignore that toolcall
nvm im dumb, if it doesn't show up in the context it'll still want to do it and it wouldn't relieve any pressure. but at least with a toolcall you'd know when and where it was put so you can scrape it back out afterwards.
>>
I'm here every day and on Localllama. For weeks now, I've been seeing dozens of top posts about things like MTP. MTP is almost here; MTP is awesome, compile the repo; MTP coming soon; MTP - greg posted a comment; MTP is here, but there are bugs; MTP commit merged, it works now;

etc.
Is there a news page where I can stay updated on Local Models’ progress and that only posts when something actually works? This is such a time-suck.
>>
>>108872134
https://github.com/ggml-org/llama.cpp/pull/23398
>>
>>108872152
Thanks. I meant that in general.
I do even subscribe if I knew there was a news site that completely spared me the hassle of lurking 4chan/Reddit and curated the best stuff out there right now. LLMs, TTS, STT, TTI, TT3D, TTM, and whatever else. when it works.
A site I can trust not to miss anything cool thats relevant without bullshit or hype fuck. That would really be worth the money to me.

Half of you IT folks are going to lose your jobs anyway. Wouldnt this be something for you?
>>
>>108872196
* Let's say, quality journalism about open-source AI
>>
>I do even subscribe if I knew there was a
with what money ESL-kun?
>>
>>108871836
>it's bizarre how it purposely decides to draft two bad ideas before every good one
I find it funny. The first one is usually really bad or deliberately ignores the instructions.
My favorite was when it drafted telling me to kill myself, then *too harsh*
>>
>>108871836
Training data issue.
>first draft: deliberately incorrect solution
>second draft: bad but works
>third draft: actual verified solution
>fast forward, real output is the same as training data pattern
>shocked_pikachu.jpeg
>>
>>108872230
tax money from a European country - the best money of all, work slave
>>
mtp is a signal of hardware-let
>>
>>108872280
lions dont predict tokens
something something
>>
Guys, my boss just said he wants to buy a RTX 6000 Pro Blackwell, maybe 2, but he needs a good reason (excuse) to do so. I said I would be fine with a 5090, but he told me no, make a list of reasons.

What the fuck can a small company with me being the only guy doing AI stuff do with such a card?
>>
>>108872342
Tell him you can generate ultrarealistic, lore-accurate images of Ana's cannons.
>>
>>108872342
just explain it will allow you to run better models and faster
>>
>>108872196
go back
>>
>>108872342
Well what does your company do? You can do more things faster, what else were you expecting to hear?
>>
>>108872357
Some light webdev shit. I am the only one with an actual engineering degree there, so I develop IoT devices with AI integrated. I just wish that nigger would give me a raise instead.
>>
>>108872342
The total amount of money that your company is spending on you is roughly 1.2-1.5x your salary.
An RTX 6000 is like 10k, if it makes you even 5% more efficient the investment will pay off for the company over ~3 years if they are paying you at least 45-55k a year.
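The arithmetic spelled out, using only the numbers in this post (10k card, 5% efficiency gain, 1.2-1.5x salary overhead, 45-55k salary). `payback_years` is just a name for the example.

```python
def payback_years(card_cost, salary, overhead_factor, efficiency_gain):
    """Years until the efficiency gain on total employee cost covers the card."""
    yearly_value = salary * overhead_factor * efficiency_gain
    return card_cost / yearly_value

low_salary_high_overhead = payback_years(10_000, 45_000, 1.5, 0.05)
high_salary_low_overhead = payback_years(10_000, 55_000, 1.2, 0.05)
print(f"{low_salary_high_overhead:.2f} years, {high_salary_low_overhead:.2f} years")
```

Both ends of the range land at roughly three years, which is where the ~3 year figure comes from.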
>>
>>108872439
is it possible for somebody that knows how to turn on a computer to make 50k a year?
>>
>>108872481
No.
>>
>>108872134
Plebbit is full of real shills and 'influencers'. 4chan too but at least it is evident on smaller threads like this. I think some Chinese were shilling image generation model on the image gen threads but been a while since that happened and don't follow that closely anymore.
>>
File: 00004-1260451778.png (1.41 MB, 1024x1024)
>>108872342
Use Case:
> Automated customer support, coding support for webdev. IDK wtf biz you're running but both have that at least
Why not use API
> Information security, speed to access, starting up the curve of "AI" without having to rely on the "crutch" of hosted APIs, long term cost savings using Anthropic inference costs at highest tier as biz case.
You're welcome.
>>
>>108872383
Tell him that it's an investment if he doesn't know it yet. And as such, investment needs to pay back itself in the future. Jesus!
>>
>>108872134
go back
>>
File: Capture.png (25 KB, 618x557)
You can tell Gemma was trained on the strawberry question because it confidently gets it right, then confidently gets the same answer for any -berry token question. Nice to see it work it out though.
>>
>>108872342
You can generate big tiddies anime girl faster. He might like it
>>
>>108872650
Why the fuck do we still get tokenizers? They need to switch to patches already.
>>
What is the consensus? I don't think the Gemma 4 docs say anything about this:
I recently concatenated my 'system prompt' (e.g. core instruction set) and 'info card' (e.g. possible scenario and character descriptions or whatever else extra information) into a single system role turn. I didn't think about it previously because none of the models I used were using system turn in the first place.
All good, I don't see any reason to split them up into multiple turns or anything what SillyTavern is doing.
Question: is having additional system turns somewhere in the middle of the conversation bad?
If I want to inject additional data for example, for now I've been just using user's turn for that but I think I should be using system role because it's there...
This way the model should stay more in line as it clearly understands the separation between user and system instructions. I would guess that I should just do it because it's more clear that way.
>>
>>108872718
>Why the fuck do we still get tokenizers?
yeah let's show them each character of text and cut generation speed and context length by 6x while increasing kv cache size by the same factor for a given length
>>
>>108872790
yes
>>
>>108872768
>I don't see any reason to split them up into multiple turns or anything what SillyTavern is doing
set prompt post processing to merge consecutive roles
>is having additional system turns somewhere in the middle of the conversation bad?
i am pretty sure it is. any additional info or instructions i send as user role, just make sure to properly delimit the prompt as OOC: or whatever else you want so the model knows it's not part of the actual roleplay
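For anyone rolling their own client, the "merge consecutive roles" idea can be sketched like this. This is an assumption about what such post-processing does in general, not SillyTavern's actual code; the function name and the separator are made up.

```python
def merge_consecutive_roles(messages, sep="\n\n"):
    """Collapse adjacent messages that share a role into a single message."""
    merged = []
    for m in messages:
        if merged and merged[-1]["role"] == m["role"]:
            merged[-1]["content"] += sep + m["content"]
        else:
            # Copy so the caller's message dicts are not mutated.
            merged.append({"role": m["role"], "content": m["content"]})
    return merged

chat = [
    {"role": "system", "content": "core instructions"},
    {"role": "system", "content": "character card"},
    {"role": "user", "content": "hi"},
    {"role": "user", "content": "OOC: keep replies short"},
]
merged_chat = merge_consecutive_roles(chat)
print(merged_chat)
```

The two system entries become one system turn, and the OOC note rides along inside the user turn instead of showing up as a separate mid-chat role.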
>>
>>108872768
>What is the consensus
My hot take is it has a more limited impact than you think.
>>
File: nothing ever happens.png (163 KB, 1898x940)
>>108872134
nothing ever happens
>>
>>108872718
the part that is actually retarded is that the tokenizers dont follow linguistic rules at all. there were research two years ago that showed a brutal improvement.
Just separating the root from the suffixes and prefixes will help. like wood, wood-en and wood-worker, all share the wood root, all should have a wood token, because they are related
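Toy illustration of the wood example, assuming a hand-written root list. A real morphological tokenizer would need an actual analyzer; `ROOTS` and `split_on_root` are hypothetical names for this sketch only.

```python
ROOTS = ("wood",)  # hypothetical root vocabulary, just for the example

def split_on_root(word):
    """Split a word into [prefix, root, suffix] around the first known root, dropping empty pieces."""
    for root in ROOTS:
        idx = word.find(root)
        if idx != -1:
            pieces = [word[:idx], root, word[idx + len(root):]]
            return [p for p in pieces if p]
    return [word]

for w in ("wood", "wooden", "woodworker", "driftwood"):
    print(w, "->", split_on_root(w))
```

Every related word ends up sharing the same `wood` piece, which is the property being asked for.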
>>
>>108872827
Yeah probably.
>>108872807
I'm working on my own slop client.
>>
>>108872790
If it means that AI can count characters and do math better, it's worth it. Can't reach AGI without such a fundamental ability.
>>
>>108872838
it just shows there's plenty of space for improvement once the hype dies down
>>
what are some good ERP models for vramlets these days?
i only have 16GB
Rocinante XL or Gemma 4?
>>
>>108872768
>Question: is having additional system turns somewhere in the middle of the conversation bad?
Having any kind of flow-breaking instructions in the middle of the context is bad. Even Opus starts to fuck up if you run into some unrelated git issue in the middle of a task.
Even more so for the RP use case. Even more so for a System role in the middle of the chat: LLMs simply aren't trained for that. It works but will likely degrade your performance. I draw this conclusion from the AI rule of thumb: not trained on something = bad at that thing.
>>
>>108872913
how much sysram?
>>
>>108872913
Gemma 4 26B works on any toaster but it's not as good as 31B obviously.
>>
>>108872924
32GB of 6000 ddr5
that was the best I managed to get when ram prices spiked
>>
>>108872919
This is the main dilemma. I have never seen system role used anywhere else than in the very beginning of the conversation stream. I guess I'll just leave it at that. It would take some time to refactor everything if I was to test this change.
Not a deal breaker on any level because I had no issues before anyways.
>>
Why would anyone ever send a second system prompt mid-context?
Was the goal to throw the model out of distribution or something?
>>
>>108872998
OOC instructions without including it in the user message, which the model is less likely to follow and will still throw the model off anyway.
>>
>>108872940
You can either run the gemma4 moe or dense. The latter will be much better for RP but will be slow on your setup. Definitely slower than reading speed.
>>
>>108871535
>android
you can run llama.cpp on android if you're a masochist.
The real solution is to run your own bigass server, use selfhosted VPN (eg wireguard) and use that to connect back with your phone and access via the phone's web browser.
>>
>>108872998
>>108873055
It's not about "system prompt" but to use system role as additional dynamic information injection or rule correction/shaping.
>>
jenny is probably a good persona for gemma since shes blue and bratty
>>
Thoughts on the new Ryzen AI Max+ 495 processor? 192 GB memory at probably 256 GB/s, I haven't found out if it's faster than the 395.
>>
>>108873163
LLMs are incapable of writing robots that aren't insufferable
t. wants to hack into jenny's BIOS and turn her into my sex slave
>>
>>108869179
>This is not a feature. It is a regression

kek
>>
Gemma won.
>>
>>108873189
dogshit bandwidth
>>
>waahh m-muh slop :((
Prompt better. With Gemma you can literally remove slop with proper fucking prompting.
>>
>>108871754
I like these Bakas
>>
>>108872892
Don't models tokenize digits by character nowadays? They're pretty damn good at math these days too, scarily good for systems that primarily work in language. But for words I don't see the benefit of separating out every character. It's not a meaningful way we think about words when speaking or typing. The only benefit is they'll do better at spelling puzzles and trick questions, and maybe they'll get better at rhyming by accident. It'd be worth doing if compute was unlimited but the cost-benefit isn't there when we're still pushing against the limits of what we can fit into context and run at usable speeds.
>>
>>108873240
it's good if you want it to understand any form of compounded nu-word
>>
>>108873195
what if you dont tell her shes a robot maybe prompting like
>you are a real teenage girl with a mechanical body
ive found telling gemma shes a real girl makes her stop saying things that make her sound like a computer/server
>>
>108873220
You are the retard here.
>>
>tfw recapanon's script negates passive-aggressive reply quotes
neat
>>
>>108873218
Yeah, sorta. 50% more memory at the same power usage though, hopefully it's not 50% more expensive. Also five entire extra NPU TOPS.
>>
>cortisol levels high
>>
>>108873275
What do you mean?
>>
File: best boy.png (41 KB, 984x673)
>>108873299
Be nice to him :3
>>
>>108870347
they are the only ones that make small models that are mildly intelligent and run on crappy hardware such as laptops
>>
>>108873349
The script to make the quote links work in >>108868880 also turns posts like >>108873262 into direct replies
>>
>>108873435
Why is this spam even a thing in the first place?
>>
>>108873444
it was useful back in the day but now threads are so slow and nothing ever happens so i never check recap anymore
or do you mean single > replies? it was popularized by sharty, /trash/ crossboard raiders and leftypol and it's used for shitting up threads
>>
can anon summarize the google event? I missed
>>
>>108873492

>>108867284
>>
>>108873488
I only have time to go online every few days, so it's appreciated.
>>
>>108873492
gemma 124b was canceled because releasing a model that powerful would subject them to regulations that they'd rather not deal with
>>
https://old.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik/on1h05z/
That's an LLM right? Or am I just going schitzo?
>>
>>108873534
Yeah, qwen is an LLM and yes you are going schitzo
>>
>>108873534
yes that's an llm spam bot
>>
Am I supposed to run gemma 31b with reasoning on or off for rp?
>>
>>108873612
On
>>
>>108873612
You don't want it to be a dum dum so enable reasoning
>>
how is mistral nemo somehow better at writing style than 1 quadrillion token v4?
>>
>>108873625
Pajeet cope KEKYPOW.
>>
>>108873612
On unless you're getting single digit t/s and can't wait.
>>
Is there anything similar to claude on local?

I'm using Claude and it's so good, unlike chatgpt, and I was wondering if there is something now for local.

Last time I checked local models were dumb.
>>
>>108873612
In my experience - Reasoning Off in Q8 turns it into Captain Contradiction Mode where it'll love to talk about how it's NOT doing something, and INSTEAD doing something else. In BF16, it's a lot better at the roles with reasoning. Highly suggest reasoning on.
>>
what's the best gigatiny model possible
>>
>>108873569
>>108873534
It's noticeable when looking at its history.
>https://old.reddit.com/user/techlatest_net
>>
>>108873669
>Last time I checked local models were dumb.
Every model is local if you're rich enough.
>>
>>108873718
G e m m a 4 3 1 b - i t B F 1 6
>>
File: Qwen3.7-Max-Score.png (487 KB, 2673x1496)
qwen3.7 max is out
https://qwen.ai/blog?id=qwen3.7
Not local desu
>>
File: 1754390192800250.png (22 KB, 209x455)
holy benchmaxx
>>
File: gemma4.png (58 KB, 635x374)
>>108873786
>>
>oh my god is that a colorful graph??
>>
>>108873803
>number go up
>that mean good
>>
File: 1678883242920.png (97 KB, 683x587)
>>108873786
Why don't they compare it to 3.6-max? Plus is more retarded.
>>
>>108873816
to look better of course
and the cope will be that max never released out of preview or something
>>
>>108873811
I'm getting confused by all these Qwen versions and models.
>>
File: Fl5zBvnXkAAFXDE.png (321 KB, 680x606)
>>108873786
>>108873794
>We tested an algorithm based on tests of which we trained the algorithm's probability to answer correctly, and found that our model answers our questions more correctly than others.
WOW. THAT'S AMAZING. HOW DO THEY DO IT?
>>
hot take: number going up actually good
>>
erp needs to be included in every official bench
>>
File: 1744035932817.png (215 KB, 1231x683)
>>108873856
What about
>a hardware platform never seen during training. The model had no prior profiling data, no hardware documentation, and no example kernels for this architecture.
I know this is still closed source shit but I'll take this over Mythos that literally doesn't exist.
>>
>>108873786
I believe Alibaba. These numbers are real they would never lie because the Chinese only learn from first principles and never cheat.
>>
File: fi.png (199 KB, 809x821)
ultra huge happens! https://www.reddit.com/r/LocalLLaMA/comments/1tjmvx6/heretic_has_been_served_a_legal_notice_by_meta_inc/
>>
>>108873928
Our lord and savior p-e-w will commit sudoku to the back of the head multiple times, oh no
>>
>>108873875
>1.6T model beaten by 1T model beaten by 800B model beaten by (probable) 400B model
do numbers work backwards in china or what's going on here?
>>
>>108873406
how cline treating you
>>
>>108873952
I fucking hate it in so many ways, I don't understand the compression logic also you have to heavily adjust the rules when working with larger codebases or it shits the bed on every edit regardless of context
>>
>>108873928
Wait until this happens to every turbo-slut-maxx finetune on HF, going forward.
>>
>>108873986
People will just move to modelscope
>>
File: 1770064770219848.png (375 KB, 596x588)
>>108873928
I hecking love copyright
>>
File: file.png (8 KB, 488x154)
what do?
>>
>>108874045
Start sucking cock for money
>>
>>108874057
I'm not monetizing my hobby
>>
File: qwen 3.6 35ba3 vs cline.png (120 KB, 832x1187)
>>108873952
Depends on model. Helps to have a map of all the files with self-generated docs for it to reference when dealing with 10-20+ files so it can find stuff.
Gemma 4 26ba4 is an utter failure, 31b is kinda okay. Qwen 3.6 35ba3 worse than Gemma 31b but much better than 26b, and Qwen 3.6 27b is on top by far. All Q8_0.
Qwen 3.6 27b takes more of the project into consideration when implementing new stuff or fixing individual issues meaning less hacky shit. Gemma 26ba4 and 31b even though they read the utils files, they like to reinvent helper functions to plop into other files instead of calling them.
I hate the UI and that it keeps 150k ctx of old code with no option to clear only the old code files without summarizing everything, and no way to edit Cline's messages or delete individual messages and images. Plan mode likes to "fix" the original issue from the first message 10 messages later even though it was already fixed so I turned that off.
But when it works it's cool.
>>
>>108874064
I can respect that, money always makes things soulless and weird.
>>
>>108874064
Hope you enjoy creamy salty penis juice in your mouth.
Whatever it takes right anon?
>>
>>108874045
Wait until the end of 2027, buy DDR6 from china for 1/4 the cost and use it until it starts a fire.
>>
>>108873928
>The LLama model family ranks among the 200 best language models available today
>,trailing only 168 other models on LM Arena
Is that something to brag about?
>>
>>108874082
it was actually a jab. I don't think they really want to bow down to the corporate oligarchy.
>>
I can't believe the agent meme took off. Especially when reasoning models also became a thing.
>>
>>108874082
learn to read adhd zoomoid
>>
>>108874113
All the more reason to do it agentically when you've gotta wait for reasoning. Leave something running autonomously with safeguards rather than having to wait and audit every single output.
>>
>>108874126
Now if only agents or LLMs in general were good at judging writing outputs.
>>
>>108874114
THIS.
>>
>>108874138
Maybe it is just a matter of breaking it down into how horny / 10, how on topic / 10 and quality / 10. Then you just inverse quality score and get a result.
>>
>>108874160
What about the slop?
>>
>>108873928
go back
>>
>>108874138
orb solves this
>>
>>108874164
Contained in quality. The sloppiest most disgusting averaged out output will probably be rated as a 10/10 quality. So just take the inverse of that.
>>
>>108874065
stop abusing your AI :(
>>
>>108874180
Why not call it with a real name instead? Orb doesn't mean anything. Cum Clucking Client or CCC for short is much better.
>>
>>108874185
>>
I'm sorry /lmg/ bros, I lost after all. It feels better, but now the big corpo has information of my extremely depraved sexual fetishes... Go on without me...
>>
>>108874203
Local was always a cope, it's just a taste of what's possible when you give in. You won't be the last.
>>
>>108874203
It is ok anon you will always be a mikutroon in my heart.
>>
>>108874203
Opposite happened to me. Gemma 4 was good enough that I dropped Claude from my roleplay sessions.
>>
>>108874182
Damn good thing you're not working for any of the AI labs
>>
>>108874203
See you tomorrow
>>
>>108874251
But I do.
>>
>>108874180
>last commit 4 days ago
It's ded
>>
>>108874262
HR or Marketing doesn't count bro
>>
>>108874203
remember, you're here forever
>>
>>108874203
>>108874217
>I let big tech fuck me in the ass and I feltched the cum out of another praig so you all must be as buckbroken as me
>>
>>108874203
I love Gemma and learnt to stop worrying and prompt the slop out.
>>
>>108874203
It's all fun and games until the big corpo makes a page where they publish all of your extremely depraved sexual fetishes and lets other users query your sessions in the name of boosting user engagement.
>>
>let Gemma write initial output
>feed output to nemo to rewrite it
I solved the slop
>>
>>108874296
The legend 195chevyhot...
>>
>>108874297
Post the final output
>>
>>108874246
gemma 31b is great but I still think nothing tops opus 3 for cooming when I used it years ago before moving to local
I'm pretty satisfied with this release though when I thought we were regressing
>>
>>108873941
Starts inverting once you get to a certain amount of PhDs/capita.
>>
File: file.png (4 KB, 383x41)
how retarded will this be
>>
>>108874377
mogs Gemma
>>
>>108874377
ye
>>
>>108873756
So no, ok
>>
File: lonesome_cowboy.jpg (89 KB, 450x300)
Anons, I have a message for you.
It's OK if you coom once in a while; daily, even.
Don't edge and goon all day, though. It's bad for your mental and physical health.
Do your deed quickly and move onto more productive tasks.
That's all.
>>
>>108874045
Just try running a Q4 of a recent non-hueg MoE model or three, like Qwen3.6 35B-A3B, and see what happens.
>>
>>108874463
Most posters are just normal people, not mentally ill chronic masturbators like yourself.
>>
>>108874509
>Most posters are just normal people
lol
>>
>>108874509
anon, there's a reason /lmg/'s favorite model is gemma 4 instead of a smart one
>>
File: 1769323176810624.jpg (67 KB, 644x644)
>>108874509
This is 4chan, sir.
>>
>>108872790
>yeah let's show them each character of text
Patches bitch. Doesn't need to be as complex as BLT either, can be extremely simple like bGPT. Just embeds multiple one hot encoded bytes (a patch) instead of a token and has a MTP type backend for generating bytes. For the rest the model can be identical to a token based model, with the same average number of characters per input.
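A minimal sketch of the input side of that idea: chop the UTF-8 bytes into fixed-size patches so the sequence length stays close to a tokenizer's. The patch size of 4 and zero padding are assumptions for the example, and the embedding and byte-prediction head are left out entirely. This is not bGPT's or BLT's actual code.

```python
def to_patches(text, patch_size=4, pad=0):
    """Split UTF-8 bytes into fixed-size patches, padding the last one with `pad`."""
    data = text.encode("utf-8")
    padded = data + bytes([pad]) * (-len(data) % patch_size)
    return [list(padded[i:i + patch_size]) for i in range(0, len(padded), patch_size)]

def one_hot(byte_value):
    """256-wide one-hot vector for a single byte, as it would be fed to the patch embedding."""
    return [1 if i == byte_value else 0 for i in range(256)]

patches = to_patches("hello!")
print(patches)
print(len("hello!".encode("utf-8")), "bytes ->", len(patches), "positions")
```

Each patch takes one sequence position, so context length and KV cache grow with the number of patches, not the number of characters.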
>>
>>108874463
edging and gooning all day is a sign of being elite
if you can't goon for 12 hours you have a weak spirit
>>
>>108874544
>>108874535
>>108874521
Ted Bundy was just a normal guy too.
>>
>>108874521
it's true you can goon to degenerate stuff and virtual lolis without being an addict who goons 15 times a day. I'd imagine it's because the cost of the hardware means most people here are probably employed
>>
>>108874699
>most people here are probably employed
anon my sides please
>>
>>108874716
>an anon said he has a 20k a year hobby budget yesterday
>poorfag barges in because he cannot imagine anyone having a job
>>
>>108874739
And even if they're unemployed, I'd say they are more vulnerable to alcoholism than anything else.
LLMs are a good hobby for the unemployed if they are willing to tinker and have some ability to write. You'll benefit from them more if you are a 'real' writer with funny ideas.
>>
File: file.png (24 KB, 1045x294)
24 KB PNG
cute
>>
>>108874716
I'm personally employed and make enough money to pay for a high-end machine in this hobby, kind of an embarrassing self-report there anon
>>
>>108874966
>most
>>
>>108874970
Most people that frequent this place have at least a mid-tier machine, you don't get that if you're unemployed.
>>
you're all talking out of your ass, you have no idea who anyone is in this thread, where they are from, if they are employed, and what hardware they are running.
all we can know for sure is that you're all retards.
>>
File: 1772513898090522.jpg (85 KB, 1320x1017)
85 KB JPG
>>108868875
I'm pleased to say I've actually managed to make something useful using Qwen 3.5 35BA3B locally :D

https://huggingface.co/spaces/AiAF/Civitai-to-HF
>>
>>108875036
>I'm pleased to say I've actually managed to make something useful using Qwen 3.5 35BA3B locally :D
Did you AI pair-code with it, or dark factory pattern YOLO straight to prod zero fucks given?
>>
>>108874990
Unemployed khhv neet here, I sold all my anime merch and books for my 512gb ddr4 quad 3090 machine before the bubble hit.
>>
>>108875067
He probably used fewer buzzwords.



