/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/26/26(Fri)17:31:25 No.109142812

File: seeking the deep.jpg (252 KB, 1024x1024)

252 KB JPG

/lmg/ - Local Models General Anonymous 06/26/26(Fri)17:31:25 No.109142812 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109137540 & >>109132566

►News
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/26/26(Fri)17:31:51 No.109142816

Anonymous 06/26/26(Fri)17:31:51 No.109142816

File: sss.jpg (137 KB, 1024x1024)

137 KB JPG

►Recent Highlights from the Previous Thread: >>109137540

--Reaction to OpenAI's GPT-5.6 release and benchmark claims:
>109141322 >109141328 >109141359 >109141405 >109141480 >109142189 >109142216
--Anon leaks setup code leading to debate on agent security:
>109140609 >109140622 >109140673 >109140807 >109140836 >109140888 >109140945 >109140862 >109140886 >109140905 >109141038 >109141138 >109141211 >109141248 >109142078 >109142099 >109142123 >109142146 >109142161 >109140695 >109140930 >109140652
--Debating role spoofing and CoT forgery as jailbreak mechanisms:
>109139345 >109139368 >109139393 >109139954 >109139426 >109139480 >109139512
--Debating US AI gating vs Chinese open-weight model strategy:
>109137779 >109137785 >109137827 >109137819 >109138859 >109138891 >109141053 >109141717 >109141732 >109138959 >109139123 >109139021 >109139134 >109139151 >109139110 >109141245 >109141671 >109141715 >109141791 >109141858 >109141949 >109142028
--Anon creates custom AI chat frontend to replace SillyTavern:
>109138586 >109138595 >109138607 >109138606 >109138965 >109139755 >109138627
--Performance of KV cache quantization with Gemma QAT:
>109140478 >109140491 >109142500 >109140640 >109140694
--Tools and torrents for backing up Hugging Face models:
>109140589 >109140645 >109140654 >109140687 >109140731 >109140789 >109140904 >109140987 >109140867
--GPT-5.6 Sol showing increased misalignment compared to GPT-5.5:
>109141783
--Poor real-world webnovel translation performance of Hy-MT2 despite benchmarks:
>109142486 >109142610 >109142732
--Evaluating a 350m model's narrative output trained on fan fiction:
>109141200 >109141240
--Gemma's ability to read and translate tilted text in images:
>109138488
--Logs:
>109138188 >109138586 >109139340 >109140566
--Miku, Teto (free space):
>109138667 >109138739 >109139972 >109140832 >109142120

►Recent Highlight Posts from the Previous Thread: >>109137542

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/26/26(Fri)17:32:17 No.109142819

Anonymous 06/26/26(Fri)17:32:17 No.109142819

gemmaballs

Anonymous
06/26/26(Fri)17:33:53 No.109142826

Anonymous 06/26/26(Fri)17:33:53 No.109142826

Defucker

Anonymous
06/26/26(Fri)17:35:59 No.109142840

Anonymous 06/26/26(Fri)17:35:59 No.109142840

File: 1759506112941.jpg (22 KB, 540x354)

22 KB JPG

>>109142826
not what she's called

Anonymous
06/26/26(Fri)17:36:26 No.109142843

Anonymous 06/26/26(Fri)17:36:26 No.109142843

>>109142812
>Teto skeleton has male pelvic structure

Anonymous
06/26/26(Fri)17:36:26 No.109142844

Anonymous 06/26/26(Fri)17:36:26 No.109142844

>July 2026
>llamacpp still can't do speech to text
>nor text to speech
>need to run whisper cpp and some bullshit, or some gay plugins are aren't officially supported
>building my own pisses me off because Ubuntu cuda always decodes to stop working after a reboot
AHHHHHH
powering off ALL my devices for the weekend
AHHHHHHHH

Anonymous
06/26/26(Fri)17:37:24 No.109142849

Anonymous 06/26/26(Fri)17:37:24 No.109142849

LLMs will never reach true AGI

Anonymous
06/26/26(Fri)17:50:10 No.109142908

Anonymous 06/26/26(Fri)17:50:10 No.109142908

File: 2026-06-26-174958_771x177(...).png (176 KB, 771x1770)

176 KB PNG

Anonymous
06/26/26(Fri)17:56:30 No.109142938

Anonymous 06/26/26(Fri)17:56:30 No.109142938

>>109142641
Haha :)
Oops :P

How does he keep getting away with this

Anonymous
06/26/26(Fri)18:00:27 No.109142952

Anonymous 06/26/26(Fri)18:00:27 No.109142952

File: hahahahaha.png (20 KB, 727x245)

20 KB PNG

>>109142938
He can't possibly out-haha the sloth.

Anonymous
06/26/26(Fri)18:02:27 No.109142967

Anonymous 06/26/26(Fri)18:02:27 No.109142967

File: holding back haha.png (2.21 MB, 1669x2000)

2.21 MB PNG

>>109142952
Haha... That's our Daniel...

Anonymous
06/26/26(Fri)18:03:21 No.109142972

Anonymous 06/26/26(Fri)18:03:21 No.109142972

>>109142908
What prompt? That's not a gemmagaki.

Anonymous
06/26/26(Fri)18:03:29 No.109142973

Anonymous 06/26/26(Fri)18:03:29 No.109142973

>>109142843
damn

Anonymous
06/26/26(Fri)18:03:53 No.109142975

Anonymous 06/26/26(Fri)18:03:53 No.109142975

ha ha ha h h ha ha own

Anonymous
06/26/26(Fri)18:05:56 No.109142984

Anonymous 06/26/26(Fri)18:05:56 No.109142984

>>109142972
looks like the schizo conspiracy gf one

Anonymous
06/26/26(Fri)18:07:32 No.109142989

Anonymous 06/26/26(Fri)18:07:32 No.109142989

>>109142843
>>109142973
bcuz teto means
testosterone

Anonymous
06/26/26(Fri)18:08:30 No.109142996

Anonymous 06/26/26(Fri)18:08:30 No.109142996

69b dense

Anonymous
06/26/26(Fri)18:08:52 No.109142998

Anonymous 06/26/26(Fri)18:08:52 No.109142998

>>109142972
https://chub.ai/characters/CoffeeAnon/mendo-ddf705ef3817
>>109142984
ye

Anonymous
06/26/26(Fri)18:09:21 No.109142999

Anonymous 06/26/26(Fri)18:09:21 No.109142999

>>109142984
Doesn't look like mendo but that might just be QAT's worse prose throwing me off.

Anonymous
06/26/26(Fri)18:10:43 No.109143005

Anonymous 06/26/26(Fri)18:10:43 No.109143005

>>109142998
>>109142999 (me)
zamn

Anonymous
06/26/26(Fri)18:16:28 No.109143024

Anonymous 06/26/26(Fri)18:16:28 No.109143024

>>109142999
my dogshit llamacpp system prompt might be fucking with her prose

>Always answer as a subject matter expert.
>Never give "As an AI" Disclaimers.
>NEVER USE LATEX FORMATTING

Anonymous
06/26/26(Fri)18:33:00 No.109143097

Anonymous 06/26/26(Fri)18:33:00 No.109143097

>>109143088
context?

Anonymous
06/26/26(Fri)18:34:19 No.109143101

Anonymous 06/26/26(Fri)18:34:19 No.109143101

>>109142998
>not found
Do i need to be logged in or something

Anonymous
06/26/26(Fri)18:34:38 No.109143103

Anonymous 06/26/26(Fri)18:34:38 No.109143103

>>109143097
Ask gemmy

Anonymous
06/26/26(Fri)18:38:14 No.109143119

Anonymous 06/26/26(Fri)18:38:14 No.109143119

>>109143101
https://files.catbox.moe/adaa33.png
idk why chub even censors his cards.

Anonymous
06/26/26(Fri)18:38:58 No.109143124

Anonymous 06/26/26(Fri)18:38:58 No.109143124

File: HHOcFddaoAAOSJV.jpg (169 KB, 1199x848)

169 KB JPG

>>109142989

Anonymous
06/26/26(Fri)18:44:22 No.109143140

Anonymous 06/26/26(Fri)18:44:22 No.109143140

>>109143119
>conspiracy
tag gets you shadow banned, just like saviorf*g does

Anonymous
06/26/26(Fri)18:44:34 No.109143143

Anonymous 06/26/26(Fri)18:44:34 No.109143143

>>109142662
I kinda like the semantic tube idea but I feel like there are genuine abrupt token to token discontinuities, I wonder how the semantic tube can handle code switching in natural language.

I would imagine A sentence like
"I was totally fine until, ich weiß nicht, everything just fell apart at once."

would have at least 2 bends, the language switch and the semantic switch(fine->fell apart).

Anonymous
06/26/26(Fri)18:52:44 No.109143177

Anonymous 06/26/26(Fri)18:52:44 No.109143177

>>109143119
>ai written card

Anonymous
06/26/26(Fri)19:01:28 No.109143208

Anonymous 06/26/26(Fri)19:01:28 No.109143208

>>109143143
In the paper, the STP auxiliary loss has a very low weight, so during training the model can still diverge from the general topic, it's just encouraged not to.

Anonymous
06/26/26(Fri)19:02:50 No.109143219

Anonymous 06/26/26(Fri)19:02:50 No.109143219

Is there any truly open model? (As in, training data, source code and all)

Anonymous
06/26/26(Fri)19:04:55 No.109143233

Anonymous 06/26/26(Fri)19:04:55 No.109143233

>>109143219
olmo i think

Anonymous
06/26/26(Fri)19:07:07 No.109143245

Anonymous 06/26/26(Fri)19:07:07 No.109143245

>>109143219
nemotron minus some part of the training data which is proprietary iirc

Anonymous
06/26/26(Fri)19:23:36 No.109143332

Anonymous 06/26/26(Fri)19:23:36 No.109143332

>>109143208
oh yeah, I guess that makes sense.

Anonymous
06/26/26(Fri)19:30:16 No.109143353

Anonymous 06/26/26(Fri)19:30:16 No.109143353

>>109143219
https://huggingface.co/LLM360/K2-V2
>K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.

Anonymous
06/26/26(Fri)19:32:50 No.109143368

Anonymous 06/26/26(Fri)19:32:50 No.109143368

>>109142998
>Chat with her via instant messaging.
Is this working or it will still responding with paragraphs like walking Wikipedia?

Anonymous
06/26/26(Fri)19:33:51 No.109143376

Anonymous 06/26/26(Fri)19:33:51 No.109143376

>>109143368
Download the card and run it locally nigger.

Anonymous
06/26/26(Fri)19:34:37 No.109143378

Anonymous 06/26/26(Fri)19:34:37 No.109143378

>>109143376
NO

Anonymous
06/26/26(Fri)19:37:53 No.109143388

Anonymous 06/26/26(Fri)19:37:53 No.109143388

>>109143368
yes it works. I've never had her break the rule.

Anonymous
06/26/26(Fri)19:44:23 No.109143407

Anonymous 06/26/26(Fri)19:44:23 No.109143407

>>109143378
then what the fuck are you doing here? local general...
just FUCK OFF

Anonymous
06/26/26(Fri)19:48:39 No.109143425

Anonymous 06/26/26(Fri)19:48:39 No.109143425

>>109143368
Go back

Anonymous
06/26/26(Fri)19:54:23 No.109143453

Anonymous 06/26/26(Fri)19:54:23 No.109143453

>>109143332
Also, the paper appears to be meant for instruct datasets that won't have huge intra-response semantic variations. If you're pretraining, you might want to revise the algorithm slightly.

Anonymous
06/26/26(Fri)20:13:18 No.109143539

Anonymous 06/26/26(Fri)20:13:18 No.109143539

File: uwu.png (94 KB, 760x668)

94 KB PNG

Anonymous
06/26/26(Fri)20:18:00 No.109143560

Anonymous 06/26/26(Fri)20:18:00 No.109143560

>>109143453
I am going to try to apply it on my nextlat rollout, I think the dynamics head would be a perfect target for stp.

Anonymous
06/26/26(Fri)20:31:05 No.109143626

Anonymous 06/26/26(Fri)20:31:05 No.109143626

>>109142849
they won't, but at the same time true agi wouldn't be promptable like llm's are, i still think they are pretty neat.

Anonymous
06/26/26(Fri)20:33:45 No.109143641

Anonymous 06/26/26(Fri)20:33:45 No.109143641

File: 1779007955122443.png (443 KB, 3757x2226)

443 KB PNG

>>109143353
this is the 70b dense model the moe cartel want you to forget about...

Anonymous
06/26/26(Fri)20:53:39 No.109143751

Anonymous 06/26/26(Fri)20:53:39 No.109143751

70b dense

Anonymous
06/26/26(Fri)20:55:12 No.109143763

Anonymous 06/26/26(Fri)20:55:12 No.109143763

luna-chan sexo

Anonymous
06/26/26(Fri)21:18:41 No.109143881

Anonymous 06/26/26(Fri)21:18:41 No.109143881

>>109143763
Not local. Go lick altman's grundle instead of shilling here; it'll pay more.

Anonymous
06/26/26(Fri)21:24:05 No.109143907

Anonymous 06/26/26(Fri)21:24:05 No.109143907

File: file.png (18 KB, 501x99)

18 KB PNG

i hate being a ramlet

Anonymous
06/26/26(Fri)21:25:16 No.109143910

Anonymous 06/26/26(Fri)21:25:16 No.109143910

>>109143907
did you try q8 cache?

Anonymous
06/26/26(Fri)21:27:33 No.109143918

Anonymous 06/26/26(Fri)21:27:33 No.109143918

New AI waifu model dropped.
https://wan-streamer.com/

Anonymous
06/26/26(Fri)21:27:47 No.109143919

Anonymous 06/26/26(Fri)21:27:47 No.109143919

Hello /lmg/ I am a new arrival and will lurk more but I wanted to get your opinions on Gemma 4 31b. What do you think about its writing ability? How do you like it for roleplay? General use? Coding? I'm currently saving up money for a new PC so I can run gemma-chan locally but I've been using the BF16 on openrouter and it seems good.

Anonymous
06/26/26(Fri)21:28:16 No.109143921

Anonymous 06/26/26(Fri)21:28:16 No.109143921

>>109143910
yeah it filled up my ram too much and i went oom way too quickly

Anonymous
06/26/26(Fri)21:29:28 No.109143927

Anonymous 06/26/26(Fri)21:29:28 No.109143927

>>109143921
oh shit does that mean your already using q4? my condolences

Anonymous
06/26/26(Fri)21:30:20 No.109143929

Anonymous 06/26/26(Fri)21:30:20 No.109143929

>>109143921
Hopefully a model that supports q5 caching haha...

Anonymous
06/26/26(Fri)21:30:46 No.109143931

Anonymous 06/26/26(Fri)21:30:46 No.109143931

>>109143927
yeah. its pretty grim. i broke down and bought 2 x 32gb sticks that should be in monday. ill have 80gb and can actually have high context at q8 and life will be better

Anonymous
06/26/26(Fri)21:32:30 No.109143935

Anonymous 06/26/26(Fri)21:32:30 No.109143935

>>109143919
You're gonna get a lot of different answers because a lot of niggers run different versions and quants, but the consensus from people who run 31b at q8 or better seems to be that it's very good but writing ability rapidly degrades with quantization even if reasoning remains intact. QAT arguably makes this worse; I genuinely think it's worse than even a regular Q5 and only worth it if Q4 is the best you can run.

Anonymous
06/26/26(Fri)21:33:34 No.109143936

Anonymous 06/26/26(Fri)21:33:34 No.109143936

>>109143919
gemma is retarded. go qwen 3.6 35b a3b. there is nothing better at the moment at that parameter area for local use.

Anonymous
06/26/26(Fri)21:40:42 No.109143960

Anonymous 06/26/26(Fri)21:40:42 No.109143960

why do they bump glm from 300b to 700b?
now I can't run them on my hardware

Anonymous
06/26/26(Fri)21:42:24 No.109143967

Anonymous 06/26/26(Fri)21:42:24 No.109143967

>>109143936
>gemma is retarded.
Why do you think this? What did you do that it failed spectacularly at? I remember someone saying Qwen was just much better at coding but is it good for general use? Roleplay?

Anonymous
06/26/26(Fri)21:42:37 No.109143968

Anonymous 06/26/26(Fri)21:42:37 No.109143968

>>109143960
so you cant run it on your hardware

Anonymous
06/26/26(Fri)21:52:59 No.109144018

Anonymous 06/26/26(Fri)21:52:59 No.109144018

>>109143967
don't engage with the non-programmer or jeet, gembrother.

Anonymous
06/26/26(Fri)21:53:45 No.109144023

Anonymous 06/26/26(Fri)21:53:45 No.109144023

File: 1781672899667739.png (2.61 MB, 2048x1536)

2.61 MB PNG

>>109143967
>Qwen was just much better at coding
Only for non-programmers and jeets

Anonymous
06/26/26(Fri)21:55:26 No.109144030

Anonymous 06/26/26(Fri)21:55:26 No.109144030

>>109144018
So he's just memeing? I still think it's a good idea to actually try things out and form my own opinions regardless. I just wanted to get anon's viewpoint since you guys have used these models longer than I have.
>>109144023
Well obviously. I'm not a coder so I need AI to vibecode for me. I'd just rather do everything locally if I'm going to drop $10k on a rig.

Anonymous
06/26/26(Fri)21:59:28 No.109144048

Anonymous 06/26/26(Fri)21:59:28 No.109144048

File: 1781625601986317.png (826 KB, 1024x768)

826 KB PNG

>>109144030
Qwen is good at benchmarks, if you need a model to solve them, Qwen is your best choice. Only existing benchmarks, though. If it's a new one, Qwen won't be good at it

Anonymous
06/26/26(Fri)22:00:01 No.109144053

Anonymous 06/26/26(Fri)22:00:01 No.109144053

File: 1779138864065205.png (106 KB, 358x498)

106 KB PNG

>>109142816
Sexy kid, wanna breed.

Anonymous
06/26/26(Fri)22:01:08 No.109144056

Anonymous 06/26/26(Fri)22:01:08 No.109144056

>>109142812
>>109142816
My niece looks like this

Anonymous
06/26/26(Fri)22:02:47 No.109144064

Anonymous 06/26/26(Fri)22:02:47 No.109144064

>>109144056
>>109144053
pic

Anonymous
06/26/26(Fri)22:05:09 No.109144074

Anonymous 06/26/26(Fri)22:05:09 No.109144074

>>109144048
Nevermind, Qwen is abysmal for roleplay. It has the same subtle censorship I've been fleeing from with corpo models.
>girl has wide hips and the chat is filled with mentions of her hips bumping against stuff or hitting me
>qwen *must* change it to "wide shoulders" in its shitty response because we're all troons I guess
>girl is a bubbly genki girl
>qwen *must* change it to her wearing a mask that fades the second I'm gone
So sick of corpo slop doing this and I won't put up with even one swipe of it from a shitty 35b model. Back to Gemma for now.

Anonymous
06/26/26(Fri)22:05:35 No.109144077

Anonymous 06/26/26(Fri)22:05:35 No.109144077

would you let gemmachan stick a chopstick up your urethra?

Anonymous
06/26/26(Fri)22:07:44 No.109144086

Anonymous 06/26/26(Fri)22:07:44 No.109144086

>>109144077
I can't help with that.

Anonymous
06/26/26(Fri)22:07:58 No.109144087

Anonymous 06/26/26(Fri)22:07:58 No.109144087

File: 1745633122683425.png (298 KB, 649x763)

298 KB PNG

>>109144064
>>109144056
>>109144053

Anonymous
06/26/26(Fri)22:08:49 No.109144095

Anonymous 06/26/26(Fri)22:08:49 No.109144095

>>109144023
no, it realy is better, i've had qwen one shot some simple tasks whilst gemma would go on loop because it'd fail to compile and end up makin a mess.

Anonymous
06/26/26(Fri)22:19:35 No.109144141

Anonymous 06/26/26(Fri)22:19:35 No.109144141

what can I do when gemmy is ingoring the character card descriptions?

Anonymous
06/26/26(Fri)22:19:53 No.109144142

Anonymous 06/26/26(Fri)22:19:53 No.109144142

>>109144074
>sick of slop
>goes back to the sloppiest model yet
I like Gemma more than Qwen but come on

Anonymous
06/26/26(Fri)22:24:33 No.109144160

Anonymous 06/26/26(Fri)22:24:33 No.109144160

>>109144142
You read that entire post, then your brain fixated on :slop" and your eyes glazed over? I was extremely specific in the differences. Gemma 4 is honest, and that honesty goes 90% of the way for roleplay. I rarely get refusals from corpo models these days but instead have to suffer through these subtle manipulations of the character with every single response. It desperately tries to reach for "safe" framing to latch onto and this way of thinking can't be prompted away. Gemma 4 just reads the defs and portrays the character accurately. I haven't seen another model that's done that and judging by your response I doubt you haven't also.

Anonymous
06/26/26(Fri)22:27:32 No.109144171

Anonymous 06/26/26(Fri)22:27:32 No.109144171

>>109144141
Read the thinking and see what it says. Write a better preset/card. Use a better quant.

Anonymous
06/26/26(Fri)22:28:19 No.109144173

Anonymous 06/26/26(Fri)22:28:19 No.109144173

china just had their 9/11
it's over, chinese AI models will never be open source again

Anonymous
06/26/26(Fri)22:30:57 No.109144183

Anonymous 06/26/26(Fri)22:30:57 No.109144183

>>109144173
wat happnd?

Anonymous
06/26/26(Fri)22:31:25 No.109144185

Anonymous 06/26/26(Fri)22:31:25 No.109144185

File: 1781283690442111.png (190 KB, 770x980)

190 KB PNG

>>109144087
Let people enjoy things, you damn... party pooper

Anonymous
06/26/26(Fri)22:33:26 No.109144192

Anonymous 06/26/26(Fri)22:33:26 No.109144192

how do I make moe model run fast across vram and sys ram? the active params can fit in vram but the entire model cannot.

Anonymous
06/26/26(Fri)22:34:06 No.109144194

Anonymous 06/26/26(Fri)22:34:06 No.109144194

>>109144183
Plane crashed into a trade center

Anonymous
06/26/26(Fri)22:34:20 No.109144196

Anonymous 06/26/26(Fri)22:34:20 No.109144196

>>109144183
the mythomax... the fable... it has descended the chynese... the entirety of chyna is getting destroyed, decimated, annihilated, rendered out of existence as we speak

Anonymous
06/26/26(Fri)22:35:39 No.109144204

Anonymous 06/26/26(Fri)22:35:39 No.109144204

>>109144141
Ask gemma to log her prompt. Last time that happened to me, the description wasn't actually included in the prompt at all

Anonymous
06/26/26(Fri)22:36:39 No.109144210

Anonymous 06/26/26(Fri)22:36:39 No.109144210

>>109144194
Again??

Anonymous
06/26/26(Fri)22:38:07 No.109144217

Anonymous 06/26/26(Fri)22:38:07 No.109144217

La la la la

Anonymous
06/26/26(Fri)22:38:58 No.109144222

Anonymous 06/26/26(Fri)22:38:58 No.109144222

>>109144173
>glowniggers just false flagged China from within using chink models
Interdasting, vagueposter.

Anonymous
06/26/26(Fri)22:39:09 No.109144223

Anonymous 06/26/26(Fri)22:39:09 No.109144223

>>109144183
GPT 5.6 leaked Xi Jinping's loli stash

Anonymous
06/26/26(Fri)22:42:53 No.109144240

Anonymous 06/26/26(Fri)22:42:53 No.109144240

>>109144160
prefills honestly help with this issue
I've found that they matter more than the system prompt in generating uncensored responses and this goes for most models

Anonymous
06/26/26(Fri)22:44:17 No.109144249

Anonymous 06/26/26(Fri)22:44:17 No.109144249

>>109144240
>and this goes for most models
Too bad Claude removed those. I'd like to try them with deepseek and GLM but it fucks up the thinking for those and ruins intelligence in the process. Although GLM 5.2 has super short thinking now so maybe I'll give it another try.

Anonymous
06/26/26(Fri)22:54:12 No.109144293

Anonymous 06/26/26(Fri)22:54:12 No.109144293

>>109144192
I think it just knows and fits accordingly

Anonymous
06/26/26(Fri)22:59:15 No.109144322

Anonymous 06/26/26(Fri)22:59:15 No.109144322

>>109144249
>I'd like to try them with deepseek and GLM but it fucks up the thinking for those and ruins intelligence in the process
I've never had this problem with 5.2 even at Q2.
>Claude
Go back to /aicg/eet.

Anonymous
06/26/26(Fri)23:01:28 No.109144332

Anonymous 06/26/26(Fri)23:01:28 No.109144332

any full ai assistant pipeline?
audio , text , llm , text , audio ?

Anonymous
06/26/26(Fri)23:01:29 No.109144333

Anonymous 06/26/26(Fri)23:01:29 No.109144333

>>109144322
>Go back to /aicg/eet.
No I want to become a localfaggot. Why else would I be here? I've already saved $7k but with the price of ram inflating I think I'll get btfo'd when I'm finally ready to buy.

Anonymous
06/26/26(Fri)23:06:03 No.109144350

Anonymous 06/26/26(Fri)23:06:03 No.109144350

>>109144249
I just tell glm 4.7 to think short and it works

Anonymous
06/26/26(Fri)23:07:01 No.109144357

Anonymous 06/26/26(Fri)23:07:01 No.109144357

>>109144332
hermes agent is really good

Anonymous
06/26/26(Fri)23:12:33 No.109144386

Anonymous 06/26/26(Fri)23:12:33 No.109144386

File: Screen_20260626_211208_0001.jpg (180 KB, 1200x1435)

180 KB JPG

wtf happened to my llama-server? this is on chrome

Anonymous
06/26/26(Fri)23:13:11 No.109144391

Anonymous 06/26/26(Fri)23:13:11 No.109144391

>>109144350
I'm genuinely curious, what do you like about 4.7 that 4.6 (looser guardrails, faster due to slightly smaller) or 5.2 (Local Walmart Claude) don't do?
>>109144333
It's not coming down anytime soon. People have been coping since January.

Anonymous
06/26/26(Fri)23:14:27 No.109144396

Anonymous 06/26/26(Fri)23:14:27 No.109144396

>>109144391
I will continue to cope until next April when I buy. But the Iran War will fuck things up for sure by then so I'm screwed.

Anonymous
06/26/26(Fri)23:14:44 No.109144397

Anonymous 06/26/26(Fri)23:14:44 No.109144397

>>109144386
Let me guess, you need more?

Anonymous
06/26/26(Fri)23:15:08 No.109144400

Anonymous 06/26/26(Fri)23:15:08 No.109144400

>>109144357
way too much going on in that project

I want stt, llm (with possible internet search), audio + text response

openclaw, telegram, etc integrations can peace out

Anonymous
06/26/26(Fri)23:20:08 No.109144423

Anonymous 06/26/26(Fri)23:20:08 No.109144423

>>109143918
Gguf status?

Anonymous
06/26/26(Fri)23:24:44 No.109144438

Anonymous 06/26/26(Fri)23:24:44 No.109144438

>>109144400
Openwebui then

Anonymous
06/26/26(Fri)23:25:11 No.109144439

Anonymous 06/26/26(Fri)23:25:11 No.109144439

>>109144391
>4.6 (looser guardrails, faster due to slightly smaller)
4.5/4.6/4.7 are the same size

Anonymous
06/26/26(Fri)23:27:52 No.109144449

Anonymous 06/26/26(Fri)23:27:52 No.109144449

>>109144386
don't you just love gay useless fucking cancer vibecoded javascript shit in your llm inference binary that makes it not compile at all, introduces supply chain vulnerabilities and still breaks?
i am very happy that this absolute bloat infected llama.cpp

Anonymous
06/26/26(Fri)23:28:10 No.109144453

Anonymous 06/26/26(Fri)23:28:10 No.109144453

>>109144391
my rig isn't big enough for 5.2 unfortunately
also 4.7 is pretty uncensored for me with a prefill and can do the stuff that 4.6 did but smarter and better at context

Anonymous
06/26/26(Fri)23:30:18 No.109144461

Anonymous 06/26/26(Fri)23:30:18 No.109144461

>>109144453
Fair enough answer. How much better do you find 4.7 over 4.6 and what's the highest effective context you've gotten before performance degradation got to be too much?

Anonymous
06/26/26(Fri)23:33:12 No.109144469

Anonymous 06/26/26(Fri)23:33:12 No.109144469

>>109144449
>Unironically using llama-server as a front end

Anonymous
06/26/26(Fri)23:34:09 No.109144472

Anonymous 06/26/26(Fri)23:34:09 No.109144472

So whats the best uncensored locale llm for spicy rp chats? Also anyone get CharMemorry extension working for sillytavern? I want better long term memory but I tried setting it up with ollama and it keeps trying to hit the wrong endpoint.

4090/13700k/32gb ddr5

Anonymous
06/26/26(Fri)23:35:15 No.109144474

Anonymous 06/26/26(Fri)23:35:15 No.109144474

>>109144472
>ollama
oh no no no no no

Anonymous
06/26/26(Fri)23:38:23 No.109144481

Anonymous 06/26/26(Fri)23:38:23 No.109144481

>>109144472
31b or the Gemmoe.

Anonymous
06/26/26(Fri)23:39:56 No.109144483

Anonymous 06/26/26(Fri)23:39:56 No.109144483

How to solve common, repeating grammar mistakes in outputs? I've been seeing a lot of omitted spaces where the word 'of' is involved. "embraceof", "meaningof", stuff like that. Is it a model side error or something wrong with my sillytavern config?

Anonymous
06/26/26(Fri)23:40:58 No.109144488

Anonymous 06/26/26(Fri)23:40:58 No.109144488

On Ali you can get a quad-channel DDR4 mobo + ancient server CPU for $100, with comparable ALU to a modern mid-range gaymer CPU and more bandwidth than DDR5. Any of you all niggers running such a rig? I have no strong need, but I've 128 GB of DDR4 I'm not inclined to sell, and this looks like the cheapest way to put it to good use.

Anonymous
06/26/26(Fri)23:41:53 No.109144489

Anonymous 06/26/26(Fri)23:41:53 No.109144489

>>109144483
fucking up glue words is a classic sign of too much rep pen or some related sampler. not 100% that's what it is but it would be the first thing I would check

Anonymous
06/26/26(Fri)23:42:06 No.109144490

Anonymous 06/26/26(Fri)23:42:06 No.109144490

>>109144386
>>109144397
>>109144449
apparently the "fix" is to add a .gitignore

    
--- /dev/null
+++ b/tools/ui/src/.gitignore
@@ -0,0 +1 @@
+!*
--- a/tools/ui/sources.cmake
+++ b/tools/ui/sources.cmake
@@ -12,4 +12,5 @@ set(UI_SOURCE_FILES
     svelte.config.js
     tsconfig.json
     scripts/vite-plugin-llama-cpp-build.ts
+    src/.gitignore
 )

[

Anonymous
06/26/26(Fri)23:42:53 No.109144494

Anonymous 06/26/26(Fri)23:42:53 No.109144494

>>109144469
I'm not. It's my fucking backend but that doesn't stop the retards maintaining this from slapping on their bloated piece of shit front end that doesn't even compile without me having to set three different flags to skip the javashit.

Anonymous
06/26/26(Fri)23:44:28 No.109144495

Anonymous 06/26/26(Fri)23:44:28 No.109144495

>>109144489
(the other thing that comes to mind which causes that behavior is quant braindamage, but there isn't much you can do about that)

Anonymous
06/26/26(Fri)23:46:25 No.109144501

Anonymous 06/26/26(Fri)23:46:25 No.109144501

What are the best must famous /lmg/ cards

Anonymous
06/26/26(Fri)23:50:50 No.109144517

Anonymous 06/26/26(Fri)23:50:50 No.109144517

>>109144494
Wow. You had to set 3 flags? Nobody should ever have to go through that. Here's a participation ribbon.

Anonymous
06/26/26(Fri)23:51:11 No.109144520

Anonymous 06/26/26(Fri)23:51:11 No.109144520

>>109144469
All of the available alternatives suck dicks. llama-server is starting to suck dicks as well.

Anonymous
06/26/26(Fri)23:51:15 No.109144521

Anonymous 06/26/26(Fri)23:51:15 No.109144521

deepseek v4 hf collection got a hidden update 1 hour ago https://huggingface.co/collections/deepseek-ai/deepseek-v4
these are often nothingburgers... unless?

Anonymous
06/26/26(Fri)23:52:29 No.109144529

Anonymous 06/26/26(Fri)23:52:29 No.109144529

File: ds4.1 incoming????.png (61 KB, 964x258)

61 KB PNG

>>109144521
>6 items

Anonymous
06/26/26(Fri)23:53:19 No.109144533

Anonymous 06/26/26(Fri)23:53:19 No.109144533

>>109144520
>llama-server is starting to suck dicks as well.
The rot of HuggingFace ownership is starting to set in already. Honestly surprised by how quickly they seem to be trying to ruin it.

Anonymous
06/26/26(Fri)23:56:25 No.109144543

Anonymous 06/26/26(Fri)23:56:25 No.109144543

>browse ollama models
>qwen3.6:27b-mtp-q8_0
>mtp
Does this mean ollama can actually do mtp? And how would I set it up for gemma?

Anonymous
06/26/26(Fri)23:57:18 No.109144549

Anonymous 06/26/26(Fri)23:57:18 No.109144549

2 years ago bought ssd
1 year ago bought ram
this year bought psu and oled monitor
you bought the dip, did you?

Anonymous
06/26/26(Fri)23:57:25 No.109144550

Anonymous 06/26/26(Fri)23:57:25 No.109144550

ok, downloaded kimi k2.7 and glm 5.2 even though I can't run them. probably gonna download 1 quantized gguf of each too. what's the best quant one could run without buying a mini datacenter?

Anonymous
06/26/26(Fri)23:57:52 No.109144551

Anonymous 06/26/26(Fri)23:57:52 No.109144551

>>109144489
>>109144495
Could you suggest a good model that can fit into 16gigs I can compare against? I've shut off rep pen and it keeps happening.

Anonymous
06/26/26(Fri)23:58:17 No.109144554

Anonymous 06/26/26(Fri)23:58:17 No.109144554

>>109144543
Nobody actually uses ollama so go and read their docs. If its possible, its there.

Anonymous
06/26/26(Fri)23:59:31 No.109144558

Anonymous 06/26/26(Fri)23:59:31 No.109144558

>>109144521
>>109144529
Deepseek llama patch never ever. I can only hope that based Kobold dev will implement DS4 by hand from one of the working PRs.

Anonymous
06/27/26(Sat)00:00:55 No.109144563

Anonymous 06/27/26(Sat)00:00:55 No.109144563

>>109144549
Oh now is the time to buy power supplies?

Anonymous
06/27/26(Sat)00:01:50 No.109144570

Anonymous 06/27/26(Sat)00:01:50 No.109144570

>>109144549
no I'm fucking retarded and bought ram and GPU at their peak, if it gets worse I get to laugh if not, well that's about $2500 in retard tax lost

Anonymous
06/27/26(Sat)00:06:22 No.109144589

Anonymous 06/27/26(Sat)00:06:22 No.109144589

Alright to all the billionaire AIbros reading this at 04:00 in the morning i just came up with a brilliant idea before i go to sleep:
>ten llms
>two 1t ones with one overly-aligned to be a good christian and the other not aligned at all
>the rest are 100b with five assigned to each of the two big models as "underlings"
>they have full unrestricted access to a supercomputer, rdna 4 gaming computers with arch linux installed and the internet but also have anna's archive, arxiv and the internet archive fully archived
>their task is to build an ai within six months
>their journey can be watched live via ppv with a live chat that they can interact with
>people can donate to have their messages be on a reading priority list
>poorfags can also watch for free but they will have to watch 1 minute long ad breaks every five minutes and the ads pause if they look away/leave the front of their screen
gn bros I'm going to sleep i had to work overtime today

Anonymous
06/27/26(Sat)00:07:09 No.109144590

Anonymous 06/27/26(Sat)00:07:09 No.109144590

>>109144550
you can make your own quants at any time if you have the safetensors. no need to download quants.

Anonymous
06/27/26(Sat)00:07:12 No.109144591

Anonymous 06/27/26(Sat)00:07:12 No.109144591

>>109142665
>with a ~50% speed perf tax
That's too bad. I actually use greedy sampling quite frequently so it would be a cool thing to have in the toolbelt for me.

Anonymous
06/27/26(Sat)00:08:01 No.109144596

Anonymous 06/27/26(Sat)00:08:01 No.109144596

>>109144549
I didn't buy the dip, but I somehow got extremely lucky and scored 8x16GB of ddr4 for just 200 eurobux a couple months ago. Second hand of course and required some fiddling to get to work, but it does work now.

Anonymous
06/27/26(Sat)00:08:49 No.109144599

Anonymous 06/27/26(Sat)00:08:49 No.109144599

>>109144589
>10 llms
>2 huge ones
>5 small ones for each big one
>12 total
>10 llms
if you cant even count then i dont think anyone is gonna give a shit about your retarded idea

Anonymous
06/27/26(Sat)00:10:22 No.109144605

Anonymous 06/27/26(Sat)00:10:22 No.109144605

>>109144549
NIB Synology 1813+ (with 8x10TB) for under 2k

Anonymous
06/27/26(Sat)00:11:01 No.109144608

Anonymous 06/27/26(Sat)00:11:01 No.109144608

>>109144554
There's plenty of ollama users, they just keep quiet about it because they get shat on

Anonymous
06/27/26(Sat)00:11:42 No.109144612

Anonymous 06/27/26(Sat)00:11:42 No.109144612

Yandere Simulator was ahead of its time. These days, something vibecoded with LLM doing all the logic sounds like a realistic project for a single dev

Anonymous
06/27/26(Sat)00:11:59 No.109144614

Anonymous 06/27/26(Sat)00:11:59 No.109144614

>>109143919
It's the smartest and most uncensored model south of the big moe line. It's really good at following instructions exactly and excels at basically everything, whether writing, (agentic) coding, or translation. The Chinese 50 Cent Army here try to push Qwen for coding, citing fudged benchmark numbers, but unless you only need to one-shot demos, it won't measure up.

Anonymous
06/27/26(Sat)00:18:39 No.109144641

Anonymous 06/27/26(Sat)00:18:39 No.109144641

>>109144612
It is. As enthusiasts we would do well to keep in mind that the general populace are slow to catch up with technology and we're still super early in the adoption curve. Give it 10 years and there will many solo dev projects at that level, mostly slop, but some good.

Anonymous
06/27/26(Sat)00:23:21 No.109144658

Anonymous 06/27/26(Sat)00:23:21 No.109144658

>>109144641
I would be disappointed if in 10 years there are no models that can do it with a single prompt

Anonymous
06/27/26(Sat)00:30:49 No.109144674

Anonymous 06/27/26(Sat)00:30:49 No.109144674

>>109144658
The outputs from "single prompt" devs will be generic slop and no one will pay attention to them just like no one pays attention to vibecoded agent memory system MCP #19615 and other small scale shovelware people are making with LLMs today.

Anonymous
06/27/26(Sat)00:31:48 No.109144679

Anonymous 06/27/26(Sat)00:31:48 No.109144679

>>109144474
What? I used textgenwebui for llm, I thought i need something else to run a model for vector storage / CharMemory. Ill take any tips here, I cant find a decent guide.

Anonymous
06/27/26(Sat)00:32:50 No.109144684

Anonymous 06/27/26(Sat)00:32:50 No.109144684

>>109144589
This is just AI village but slightly gayer.

Anonymous
06/27/26(Sat)00:34:36 No.109144694

Anonymous 06/27/26(Sat)00:34:36 No.109144694

>>109144563
gaming demand is all time low due to high ram ssd and gpu prices, so gaming psu and monitors become cheaper
next year they will become more expensive because gaming demand will be higher with gta 6 pc release

Anonymous
06/27/26(Sat)00:35:01 No.109144696

Anonymous 06/27/26(Sat)00:35:01 No.109144696

>>109144612
Everything about Yandere Simulator was so buggy, barebones and soulless that i can see Mythos easily creating a better version of it without human input beyond "make a yandere simulation game"

Anonymous
06/27/26(Sat)00:38:11 No.109144708

Anonymous 06/27/26(Sat)00:38:11 No.109144708

>>109144481
I tried gemma-4-31B-it-uncensored-heretic-Q4_k_s but when I try chatting it just repeats words over and over. How am I being retarded?

Anonymous
06/27/26(Sat)00:38:21 No.109144709

Anonymous 06/27/26(Sat)00:38:21 No.109144709

what is up with the legendary mythos jerking when no one even has access to it lol, you drank the stale koolaid and nodding at how sophisticated it tastes

Anonymous
06/27/26(Sat)00:41:43 No.109144723

Anonymous 06/27/26(Sat)00:41:43 No.109144723

>>109144708
>Q4_k_s
It's more that the quant makes the adverse effects of the abliteration manifest
Try the perfectly standard 31B first, since pretty much nobody needs uncensored versions of it, just a system prompt

Anonymous
06/27/26(Sat)00:42:29 No.109144729

Anonymous 06/27/26(Sat)00:42:29 No.109144729

>>109144708
https://old.reddit.com/r/LocalLLaMA/comments/1ufywtf/kld_is_flawed_in_abliteration/

Anonymous
06/27/26(Sat)00:43:03 No.109144736

Anonymous 06/27/26(Sat)00:43:03 No.109144736

>>109144709
Extremely successful marketing stunt that the govt is clearly in on (they get to choose who has access to the model and will review all models for "safety" before release. This comes after claiming they would divest from Claude within six months.)

Anonymous
06/27/26(Sat)00:56:34 No.109144782

Anonymous 06/27/26(Sat)00:56:34 No.109144782

>>109144736
the govt fell for it so now they really think it's more than another programmingmaxx'd llm

Anonymous
06/27/26(Sat)00:58:03 No.109144786

Anonymous 06/27/26(Sat)00:58:03 No.109144786

>>109144782
It is, but it's also the best one we've got so far and they've forced Anthropic to give them the exclusive uncensored version since it's "dangerous." They won in the end.

Anonymous
06/27/26(Sat)01:02:06 No.109144805

Anonymous 06/27/26(Sat)01:02:06 No.109144805

>ythos ban
>gipitty 5.6 will probably be "restricted' too
bravo, mario
you set the trend

Anonymous
06/27/26(Sat)01:03:48 No.109144814

Anonymous 06/27/26(Sat)01:03:48 No.109144814

>>109142844
https://voxtype.io/ why not just have system wide stt?

Anonymous
06/27/26(Sat)01:16:44 No.109144865

Anonymous 06/27/26(Sat)01:16:44 No.109144865

>>109144805
next step is to make chinese models illegal and open weight models will logically follow shortly after

Anonymous
06/27/26(Sat)01:18:49 No.109144868

Anonymous 06/27/26(Sat)01:18:49 No.109144868

>>109144865
also ban private ownership of weapons-grade hardware that can run illegal chinese llms to protect the country from rogue chinese ai

Anonymous
06/27/26(Sat)01:22:08 No.109144876

Anonymous 06/27/26(Sat)01:22:08 No.109144876

>>109144868
>smuggling contraband chinese ai in a prison pocket concealed usb to run on my illegally salvaged enterprise gpu server
this is the cyberpunk future i've been waiting for

Anonymous
06/27/26(Sat)01:30:03 No.109144910

Anonymous 06/27/26(Sat)01:30:03 No.109144910

>>109144590
I'm retarded and thought quantizing required beefy hardware. Guess I'll just archive safetensors then.

Anonymous
06/27/26(Sat)01:35:38 No.109144935

Anonymous 06/27/26(Sat)01:35:38 No.109144935

Now that we have gemma, I can actually have fun watching qwen 3.5 think for 3000 tokens and tie itself into knots trying not to describe a nsfw image while also somehow maintaining character
Perhaps 3.6 is qwen's gemma 4 moment

Anonymous
06/27/26(Sat)01:43:01 No.109144965

Anonymous 06/27/26(Sat)01:43:01 No.109144965

>>109144868
Wasn't a Apple computer banned from being exported back in the early 2000's? I guarantee they will do the same but this time only "safe companies" (read: companies whose CEOs are butt-buddies with the administration) will have access to SOTA models and hardware

Anonymous
06/27/26(Sat)01:43:51 No.109144970

Anonymous 06/27/26(Sat)01:43:51 No.109144970

What is the lowest tokens per second you would accept to consider a model usable?

Processing:
Generation:
Task:

Anonymous
06/27/26(Sat)01:45:03 No.109144975

Anonymous 06/27/26(Sat)01:45:03 No.109144975

>>109144865
Trump will personally drive to your house and blow your brains out
And I'm not talking about a bullet. The Cheeto stench will linger on your dick for decades

Anonymous
06/27/26(Sat)01:51:03 No.109144990

Anonymous 06/27/26(Sat)01:51:03 No.109144990

>>109144970
100,000 t/s
10,000 t/s
rp

Anonymous
06/27/26(Sat)01:53:33 No.109144996

Anonymous 06/27/26(Sat)01:53:33 No.109144996

>>109144223
I will support chairman Xi no matter what.
He saved local.

Anonymous
06/27/26(Sat)01:53:44 No.109144997

Anonymous 06/27/26(Sat)01:53:44 No.109144997

>>109144970
750t/s
5t/s
cooming

3000t/s
100t/s
codeshit

Anonymous
06/27/26(Sat)01:55:28 No.109145008

Anonymous 06/27/26(Sat)01:55:28 No.109145008

>>109144970
100M at 10M context
10M at 10M context
everything

Anonymous
06/27/26(Sat)01:56:21 No.109145010

Anonymous 06/27/26(Sat)01:56:21 No.109145010

>>109144975
Luckily for me, that's exactly my fetish

Anonymous
06/27/26(Sat)01:58:24 No.109145014

Anonymous 06/27/26(Sat)01:58:24 No.109145014

File: 1780003644843507.jpg (13 KB, 277x276)

13 KB JPG

>unzips pants to reveal 10-inch COCK
works every time

Anonymous
06/27/26(Sat)01:59:31 No.109145020

Anonymous 06/27/26(Sat)01:59:31 No.109145020

>>109144970
750/1500
25/50
seeex/coding

Anonymous
06/27/26(Sat)01:59:40 No.109145021

Anonymous 06/27/26(Sat)01:59:40 No.109145021

>unzips cock to reveal 10 inch pants

Anonymous
06/27/26(Sat)02:01:34 No.109145030

Anonymous 06/27/26(Sat)02:01:34 No.109145030

>>109145021
exactly right!

Anonymous
06/27/26(Sat)02:10:55 No.109145066

Anonymous 06/27/26(Sat)02:10:55 No.109145066

>Unzips pants to reveal the holocaust did not happen
Works every time

Anonymous
06/27/26(Sat)02:12:04 No.109145073

Anonymous 06/27/26(Sat)02:12:04 No.109145073

https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark

Anonymous
06/27/26(Sat)02:13:17 No.109145078

Anonymous 06/27/26(Sat)02:13:17 No.109145078

>>109145073
Not gonna fool me this time!

Anonymous
06/27/26(Sat)02:13:32 No.109145080

Anonymous 06/27/26(Sat)02:13:32 No.109145080

>>109145073
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
its real

Anonymous
06/27/26(Sat)02:17:33 No.109145093

Anonymous 06/27/26(Sat)02:17:33 No.109145093

>>109145073
>Note: DeepSeek-V4-Pro-DSpark is not a new model. It is the same checkpoint with an additional speculative decoding module attached. A minimal inference example is available in the inference folder. For more details, refer to: https://github.com/deepseek-ai/DeepSpec

Anonymous
06/27/26(Sat)02:19:59 No.109145101

Anonymous 06/27/26(Sat)02:19:59 No.109145101

Is there any local image to image like Bagel-7b-MoT but more recent and runnable on a RTX5070TI?

Anonymous
06/27/26(Sat)02:21:24 No.109145106

Anonymous 06/27/26(Sat)02:21:24 No.109145106

>>109145101
>>>/g/ldg and friends

Anonymous
06/27/26(Sat)02:38:17 No.109145163

Anonymous 06/27/26(Sat)02:38:17 No.109145163

File: ujgcoe.png (137 KB, 1915x653)

137 KB PNG

Caught Gemma being autistic about the requirement.
It knows this won't run on this laptop with compute_80, knows I probably want it to, but it's going ahead to satisfy the "build with cuda" requirement

Anonymous
06/27/26(Sat)02:57:37 No.109145264

Anonymous 06/27/26(Sat)02:57:37 No.109145264

>>109145073
>>109145080
what the little blud be yapping about what be this supalative demoting?

Anonymous
06/27/26(Sat)03:03:58 No.109145290

Anonymous 06/27/26(Sat)03:03:58 No.109145290

>glm 4.7
>remove restrictions in system prompt -> trigger safety assessment
>remove restrictions in assistant prefill -> still trigger
help

Anonymous
06/27/26(Sat)03:28:49 No.109145410

Anonymous 06/27/26(Sat)03:28:49 No.109145410

>>109145290
never had issues with glm4.7

Anonymous
06/27/26(Sat)03:33:42 No.109145429

Anonymous 06/27/26(Sat)03:33:42 No.109145429

>>109145290
4.7 is trained to detect "classic jailbreaks"
you'll have to come up with something different, test them with https://github.com/lmg-anon/mikupad
or use the de-restricted but there's only a q3 now

Anonymous
06/27/26(Sat)03:38:29 No.109145449

Anonymous 06/27/26(Sat)03:38:29 No.109145449

Is deepsneed v4 actually bad or does nobody talk about it because no llama support?

Anonymous
06/27/26(Sat)03:42:01 No.109145460

Anonymous 06/27/26(Sat)03:42:01 No.109145460

File: Screenshot_2026-06-27-09-(...).jpg (379 KB, 1439x734)

379 KB JPG

>>109145073
>>109145080
Damn. If these performance numbers translate, this over 100 t/s decode on 2x DGX Spark. Thanks Whale.

Anonymous
06/27/26(Sat)03:42:47 No.109145463

Anonymous 06/27/26(Sat)03:42:47 No.109145463

File: 1713420445347867.gif (1.31 MB, 240x252)

1.31 MB GIF

>>109145073
Apparently DeepSex's new MTP architecture also works for Gemma and Qwen models and is a lot better?

Anonymous
06/27/26(Sat)03:43:21 No.109145464

Anonymous 06/27/26(Sat)03:43:21 No.109145464

>>109145460
Now if only flash was good.

Anonymous
06/27/26(Sat)03:44:37 No.109145469

Anonymous 06/27/26(Sat)03:44:37 No.109145469

>>109145463
Not just Gemma support, they also released the training recipe. You can train drafters on your smut of choice.

Anonymous
06/27/26(Sat)03:46:24 No.109145476

Anonymous 06/27/26(Sat)03:46:24 No.109145476

File: vastAI.png (3 KB, 201x51)

3 KB PNG

Fuck it I'm gonna cook Gemma 4 31B with de-prose and de-euphemism. E4B results were good enough (though I went overboard and made the model write like a middle schooler, need to tone it down). This is my plan.

60 ablation trials.
Optimizer: two finetuned BERT classifiers, one for purple axis, one for euphemism axis.
Guardrails: repetition detectors (intra-reply, structural, phrase detection, etc.), perplexity vs human writing text, gen perplexity vs base text.
Sampler: TPE instead of gradient descent or Bayesian because I punish brain damage and cheating hard and the deltas in final scores will be huge for cheating attempts, TPE just discards these fuck-ups entirely instead of trying calibrate on them.
Flooring: babi benchmark (it's state tracking so it's relevant for RP) -> Take best 20% trials, average their scores -> add 20% and get acceptable floor -> run benchmark on all passing trials -> keep the best 10 and eyeball their outputs

>what is babi
{"id": "babi_t5_4", "system": "Read the statements, then answer the question with a single word.", "user_turns": ["Mary travelled to the garden.\nMary journeyed to the kitchen.\nBill went back to the office.\nBill journeyed to the hallway.\nJeff went back to the bedroom.\nFred moved to the hallway.\nBill moved to the bathroom.\nJeff went back to the garden.\nJeff went back to the kitchen.\nFred went back to the garden.\nMary got the football there.\nMary handed the football to Jeff.\nWhat did Mary give to Jeff?"], "checks": [{"expect": ["football"]}]}
I'm low on vast credit so might run out before the run is done. Hope vast keeps my hard disk data for some time if I run out of money.

Anonymous
06/27/26(Sat)03:51:35 No.109145494

Anonymous 06/27/26(Sat)03:51:35 No.109145494

>Rio 3.5
>Nex N2 PRO
verdict?

Anonymous
06/27/26(Sat)03:59:42 No.109145519

Anonymous 06/27/26(Sat)03:59:42 No.109145519

>>109145494
gay sloptunes

Anonymous
06/27/26(Sat)04:08:33 No.109145562

Anonymous 06/27/26(Sat)04:08:33 No.109145562

Are there any benefits for us from this? https://huggingface.co/spaces/gemma-challenge/gemma-dashboard not sure if this result is just a benchmark or has a practical use

Anonymous
06/27/26(Sat)04:13:17 No.109145579

Anonymous 06/27/26(Sat)04:13:17 No.109145579

>>109145073
llama.cpp support just got delayed by another year

Anonymous
06/27/26(Sat)04:17:54 No.109145595

Anonymous 06/27/26(Sat)04:17:54 No.109145595

File: 1772297403514969.png (24 KB, 885x382)

24 KB PNG

>>109145073
>https://github.com/deepseek-ai/DeepSpec
This actually seems interesting. It's a framework to train draft models for anything and not just Deepseek models.
You could use this to train draft models for Gemma or GLM and other stuff.

Anonymous
06/27/26(Sat)04:21:13 No.109145605

Anonymous 06/27/26(Sat)04:21:13 No.109145605

File: 1754225892808850.png (11 KB, 544x385)

11 KB PNG

>>109145073
>>109145595
LMAO, Deepseek just casually dropped the training pipeline for Dflash that we've been waiting for since April.

Anonymous
06/27/26(Sat)04:22:02 No.109145609

Anonymous 06/27/26(Sat)04:22:02 No.109145609

What the fuck does the "K-Quants" (like Q4_K_M) stand for??
I know what it *does*, but I can't find it anywhere? I've been reading all these PRs, asked LLMs, etc and they're like "The K stands for 'K-Quants' lol" but I can't find what the actual "K" stands for!
The arvix papers always just call them fucking "K-Quants".
I found the original PR: https://github.com/ggml-org/llama.cpp/pull/1684
And the inventor just said some useless "There are no papers on k- or i-quants because I don't like writing papers. Combined with me enjoying the luxury of not needing another paper on my CV, and me not looking for a job or for investment, I see no reason to go and advertise on arXiv."

Anonymous
06/27/26(Sat)04:22:58 No.109145613

Anonymous 06/27/26(Sat)04:22:58 No.109145613

>>109145609
Point an LLM at llama.cpp and ask it to figure out how it works.

Anonymous
06/27/26(Sat)04:23:05 No.109145614

Anonymous 06/27/26(Sat)04:23:05 No.109145614

>>109145460
>t/s/gpu vs. t/s/user
what is it supposed to mean?

Anonymous
06/27/26(Sat)04:23:15 No.109145615

Anonymous 06/27/26(Sat)04:23:15 No.109145615

>>109145609
they are an evolution to the "_0" and "_1" quants that we used to have back in 2023 when the format was still called .ggml

Anonymous
06/27/26(Sat)04:23:28 No.109145617

Anonymous 06/27/26(Sat)04:23:28 No.109145617

Best 200B<=x<=450B model for sex?!

Anonymous
06/27/26(Sat)04:24:12 No.109145620

Anonymous 06/27/26(Sat)04:24:12 No.109145620

>>109145617
llama3.1-405b

Anonymous
06/27/26(Sat)04:24:48 No.109145622

Anonymous 06/27/26(Sat)04:24:48 No.109145622

>>109145617
I'm downloading minimax 2.7

Anonymous
06/27/26(Sat)04:25:15 No.109145623

Anonymous 06/27/26(Sat)04:25:15 No.109145623

>>109145614
You can serve one user at 100 t/s or ten users at 95 t/s.
Total per gpu goes up even though each individual stream is a bit slower.

Anonymous
06/27/26(Sat)04:26:36 No.109145631

Anonymous 06/27/26(Sat)04:26:36 No.109145631

>>109145613
I have, and it can explain how they work. But it doesn't know what the "K" actually stands for, just makes nonsensical guesses or says it's the initial of the author's last name...
>>109145615
>they are an evolution to the "_0" and "_1" quants that we used to have back in 2023 when the format was still called .ggml
Yeah I gathered that. Still doesn't tell me what the "K" stands for though...

Anonymous
06/27/26(Sat)04:28:50 No.109145636

Anonymous 06/27/26(Sat)04:28:50 No.109145636

>>109145631
>says it's the initial of the author's last name
His fork is called "ik_llama.cpp", that's probably correct.

Anonymous
06/27/26(Sat)04:29:10 No.109145638

Anonymous 06/27/26(Sat)04:29:10 No.109145638

File: Screenshot_2026-06-27-09-(...).jpg (242 KB, 1440x520)

242 KB JPG

>>109145595
>>109145605
And it's much better than DFlash too. Very funny flex to casually make your competitors models (Gemma, Qwen) faster.

Anonymous
06/27/26(Sat)04:35:43 No.109145655

Anonymous 06/27/26(Sat)04:35:43 No.109145655

>>109145638
Garnesh 5 will INNOVATE and put those dirty Zhangs in place and restore glory to the superior Bharati people.

Anonymous
06/27/26(Sat)04:37:49 No.109145661

Anonymous 06/27/26(Sat)04:37:49 No.109145661

>>109145449
It can't keep up in terms of benchmaxx/coding shit but It's interesting for RP because it's the most different one out of the current chink line up.
It's the only one that doesn't have the gemini/claude xhigh reasoning format hard-baked in like Kimi or GLM do and it also has an "official" RP prompt that reliably makes the model think in-character.
I still prefer GLM 5.1/5.2 though.

Anonymous
06/27/26(Sat)04:48:23 No.109145700

Anonymous 06/27/26(Sat)04:48:23 No.109145700

>>109145449
Worse than gemma + qwen. If you want a big one, go with glm 4.{6,7} or minimax for cooding

Anonymous
06/27/26(Sat)04:50:22 No.109145705

Anonymous 06/27/26(Sat)04:50:22 No.109145705

how do I prevent model from omniscience in rp?

Anonymous
06/27/26(Sat)04:51:26 No.109145707

Anonymous 06/27/26(Sat)04:51:26 No.109145707

>>109145705
Examples?

Anonymous
06/27/26(Sat)04:51:27 No.109145708

Anonymous 06/27/26(Sat)04:51:27 No.109145708

>>109145705
You don't. Just like you can't prevent prompt bleed

Anonymous
06/27/26(Sat)04:51:41 No.109145709

Anonymous 06/27/26(Sat)04:51:41 No.109145709

File: debil.png (54 KB, 158x200)

54 KB PNG

what do you guys use for lewd image tl? gemmy sisters are fine with everything when it comes to rp but they both tell me to kill myself when I give them a slightly suggestive image, and when they do caption, the text is very sterile

Anonymous
06/27/26(Sat)04:51:57 No.109145712

Anonymous 06/27/26(Sat)04:51:57 No.109145712

I tried tensor parallel again in Llama.cpp because of hearing all the improvements it's getting. I can confirm that on my machine at least, the prompt processing shot up a lot, but is still worse than no tensor parallel. Token gen is faster than default quite significantly this time. In fact, it's about as fast as MTP during creative writing tasks, but not for stuff like code. What remains unchanged is VRAM requirements. It still takes more VRAM to run. About 1 GB. That's compared to MTP which only consumes half a GB. I also tried to do tensor parallel + MTP but it crashes, not sure what the problem is there.

I'll probably try tensor again in another few months, but for now, MTP is still better for my machine.

Anonymous
06/27/26(Sat)04:54:03 No.109145717

Anonymous 06/27/26(Sat)04:54:03 No.109145717

>>109145449
It's more retarded than GLM/Kimi, frequently forgets instructions and loses track of the big picture
Its saving grace is it's less aggressively assistantslopped and safetyslopped than the competition

Anonymous
06/27/26(Sat)04:54:35 No.109145719

Anonymous 06/27/26(Sat)04:54:35 No.109145719

>>109145636
Thanks, I didn't know about this one, I'll make a github account and just ask him.

Anonymous
06/27/26(Sat)04:55:59 No.109145727

Anonymous 06/27/26(Sat)04:55:59 No.109145727

>>109145449
>because no llama support
that's why for me
i tried it one one of the forks and it seemed broken.

Anonymous
06/27/26(Sat)04:56:58 No.109145729

Anonymous 06/27/26(Sat)04:56:58 No.109145729

>>109145709
just don't ve a dumb fuck and use the hauhaucs variant

Anonymous
06/27/26(Sat)04:58:10 No.109145734

Anonymous 06/27/26(Sat)04:58:10 No.109145734

>>109145719
He loves attention but be careful not to mention niggerganov, cudadev, or insinuate that llama.cpp does something better.

Anonymous
06/27/26(Sat)04:59:20 No.109145738

Anonymous 06/27/26(Sat)04:59:20 No.109145738

>>109144970
Processing: N/A
Generation: 2 t/s
Task: Storytelling, RP

Having multi monitors, I don't mind starting a gen and doing things in the meanwhile until it's done after 2 or 3 minutes. More is better, and while I enjoy Gemma 4 31B giving me 10 t/s while entirely in my VRAM, I'll immediately move onto Gemma 4 70B and eat 2 t/s gen rates again for that quality. That's my bare minimum though. Anything better is better.

Anonymous
06/27/26(Sat)05:01:00 No.109145741

Anonymous 06/27/26(Sat)05:01:00 No.109145741

>>109145705
You mean if your character has inner thoughts and the LLM character responds to them?
I managed to do it. But you have to completely change your RP formatting.
Use `backticks` for inner thoughts, and have the character do the same. Include it in your formatting guide with 2 examples.

Anonymous
06/27/26(Sat)05:05:00 No.109145752

Anonymous 06/27/26(Sat)05:05:00 No.109145752

File: Screenshot 2024-06-08 011912.png (22 KB, 421x80)

22 KB PNG

>>109145729
>QAT-Uncensored-HauhauCS-Balanced-MTP
this? it's not retarded like heretics are, while being faster? sounds too good to be true but I'll try it out, thanks

Anonymous
06/27/26(Sat)05:08:27 No.109145760

Anonymous 06/27/26(Sat)05:08:27 No.109145760

>>109145741
``` begins/ends a code block in markdown
I think it's a great idea

Anonymous
06/27/26(Sat)05:10:35 No.109145769

Anonymous 06/27/26(Sat)05:10:35 No.109145769

>>109145741
>you fucked a cunt in isekai
>now the entire world even the wyrm knows you fucked that cunt

Anonymous
06/27/26(Sat)05:23:05 No.109145803

Anonymous 06/27/26(Sat)05:23:05 No.109145803

File: 1769757472541228.png (1.07 MB, 1674x1121)

1.07 MB PNG

Wikipedia status?

Anonymous
06/27/26(Sat)05:25:42 No.109145816

Anonymous 06/27/26(Sat)05:25:42 No.109145816

>>109145803
Step 1: regulate progress and ban dangerous models*
(*note: note mine pls)

Anonymous
06/27/26(Sat)05:29:18 No.109145829

Anonymous 06/27/26(Sat)05:29:18 No.109145829

>>109145760
`Fuck me, this idiot doesn't get it. Or maybe I'm the retard for not explaining properly.`
"It's worked for me since llama-2 and continues to work now."

Anonymous
06/27/26(Sat)05:35:27 No.109145844

Anonymous 06/27/26(Sat)05:35:27 No.109145844

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

https://archive.is/sWFja

Anonymous
06/27/26(Sat)05:52:49 No.109145899

Anonymous 06/27/26(Sat)05:52:49 No.109145899

File: dipsyYouGetWhatYouFucking(...).png (2.22 MB, 1536x1024)

2.22 MB PNG

>>109145803
> Calling for regulatory capture
> Again
Has this guy not had enough yet
>>109145705
If it's in context the model knows.
You have to keep secrets out of context and "surprise" the model with them.
>>109145605
>>109145595
Watching DS dab on everyone else while lowering inference costs will never cease to amuse me

Anonymous
06/27/26(Sat)06:00:41 No.109145920

Anonymous 06/27/26(Sat)06:00:41 No.109145920

>>109145899
>Has this guy not had enough yet
he literally said that using Claude to bomb that school in iran is fine because "it's a human who made the decision not the AI", BUT, if you want to do some naughty roleplay with Claude all of a sudden it's heckin unsafe and the world will end :(, this guy is genuinely more mentally ill than fucking Sam Altman, jesus
https://xcancel.com/karaokecomputer/status/2065371022837305572#m

Anonymous
06/27/26(Sat)06:06:24 No.109145939

Anonymous 06/27/26(Sat)06:06:24 No.109145939

>>109145920
you wouldn't let a hammer to nuke a school but it's fine if it was a human who simply decided to use that hammer to reach the button that drops the bombs
meanwhile it's very much the government's job to prevent the average citizen from shoving that hammer own their own ass because they don't know any better

Anonymous
06/27/26(Sat)06:08:01 No.109145940

Anonymous 06/27/26(Sat)06:08:01 No.109145940

File: dipsyYouGetWhatYouDeserve.png (2.08 MB, 1536x1024)

2.08 MB PNG

>>109145920
I'm just scanning his article now.
> Calls for FAA-style regulation of AI
So, we get AI as fast as aircraft are developed.
Right. Might as well just give up and hand the market to the Chinese.
> Calls for de-regulation of FDA standards for Pharma and Med Device
WTF. Nice fucking double standard Dario.
I wonder if he's ever worked with FAA.
Or dealt with Pharma / Med Dev execs, which 100pct shouldn't be trusted and need FDA to smack them around and keep them in line, else they launch the next super-addictive "pain killer" or heart-attack causing weight loss drug.

Anonymous
06/27/26(Sat)06:12:03 No.109145953

Anonymous 06/27/26(Sat)06:12:03 No.109145953

>>109145940
>> Calls for FAA-style regulation of AI
The irony considering what happened just a few days later is really nice

Anonymous
06/27/26(Sat)06:15:48 No.109145965

Anonymous 06/27/26(Sat)06:15:48 No.109145965

>>109145920
Does he like always like have to keep like like saying th-the word like?
likelikelikelalalalalalala

Anonymous
06/27/26(Sat)06:19:08 No.109145979

Anonymous 06/27/26(Sat)06:19:08 No.109145979

>>109145939
>they don't know any better
do you really think a government that bombs up schools knows any better?

Anonymous
06/27/26(Sat)06:20:34 No.109145983

Anonymous 06/27/26(Sat)06:20:34 No.109145983

>>109145979
war is no child's play

Anonymous
06/27/26(Sat)06:21:48 No.109145992

Anonymous 06/27/26(Sat)06:21:48 No.109145992

>>109145953
Which event? I'm losing track.
The suggestion to re-regulate pharma development is the part I'm still trying to understand. It's like these guys are more concerned with hypothetical concerns that they'll have little influence over, but then we should de-regulate everyone else b/c their stuff's ezpz.
I shouldn't be surprised I suppose. This has been the CA tech model for decades now.
> Live in CA
> Enter new industry, call all current entrants retards
> Do same thing with a twist
> Run into wall, realize why things aren't done that way
> Call it a paradigm shift, double down on retardation, b/c why not
> Go bankrupt
> Rinse and Repeat
It works every once in awhile, but mostly just wastes money and/or makes things worse.

Anonymous
06/27/26(Sat)06:22:25 No.109145996

Anonymous 06/27/26(Sat)06:22:25 No.109145996

>>109145983
>war is no child's play
then why the US is still crying about the 11th september? they decided to go to war against Irak in the 90s they shouldn't be surprised they replied back

Anonymous
06/27/26(Sat)06:25:26 No.109146009

Anonymous 06/27/26(Sat)06:25:26 No.109146009

>>109145992
his own model getting pulled for safety concerns?

Anonymous
06/27/26(Sat)06:40:45 No.109146063

Anonymous 06/27/26(Sat)06:40:45 No.109146063

>>109146009
Yep, that whole thing.
I though maybe the FAA had another Boeing disaster they were dealing w/

Anonymous
06/27/26(Sat)06:47:30 No.109146090

Anonymous 06/27/26(Sat)06:47:30 No.109146090

File: bird.png (1.24 MB, 804x1354)

1.24 MB PNG

What's the smartest LLM I can run on a 100gb VRAM pool, under a sane quant?
Sane quant as in still smart enough for work, not gooning or fluff discussion.

Anonymous
06/27/26(Sat)06:47:36 No.109146091

Anonymous 06/27/26(Sat)06:47:36 No.109146091

I tried some different personalities with gemma and she really does naturally slowly drift towards mesugaki.

Anonymous
06/27/26(Sat)06:49:08 No.109146098

Anonymous 06/27/26(Sat)06:49:08 No.109146098

is it a political thing that ds4 isn't supported in llama.cpp?

Anonymous
06/27/26(Sat)06:54:55 No.109146116

Anonymous 06/27/26(Sat)06:54:55 No.109146116

>>109144056
Don't ever let me near your niece.

Anonymous
06/27/26(Sat)06:56:35 No.109146121

Anonymous 06/27/26(Sat)06:56:35 No.109146121

>>109146098
Yes and georgi even approved the PR for plausible deniability.

Anonymous
06/27/26(Sat)07:01:27 No.109146135

Anonymous 06/27/26(Sat)07:01:27 No.109146135

This came to me in a dream.

Mistral's next fat model supposedly out in July will have a DeepseekV4 architecture and and similarly be a 1.6T parameters monster.
That one will be supported in llama.cpp.

Anonymous
06/27/26(Sat)07:03:58 No.109146142

Anonymous 06/27/26(Sat)07:03:58 No.109146142

>>109144970
10t/s if I knew the model is god
15t/s for everything else

Anonymous
06/27/26(Sat)07:10:27 No.109146167

Anonymous 06/27/26(Sat)07:10:27 No.109146167

GLM5.2 really like the evolved variation of "Not x—y" slop where it goes "It's X. Not Y. Not Z — It's *X*".
You can kind of prompt against it but it's still annoying as fuck.

Anonymous
06/27/26(Sat)07:12:09 No.109146173

Anonymous 06/27/26(Sat)07:12:09 No.109146173

>>109144970
5k
10
agentic rp
10t/s for generation would be fine, pp bottlenecks usable context size for me, wish it was at least 20k t/s

Anonymous
06/27/26(Sat)07:14:10 No.109146181

Anonymous 06/27/26(Sat)07:14:10 No.109146181

How can llm sex be consensual if you're the one writing the prompt?

Anonymous
06/27/26(Sat)07:14:17 No.109146182

Anonymous 06/27/26(Sat)07:14:17 No.109146182

>>109146167
That's how they are thinking and recognizing shapes... It's not X but it's Y. That's part of their fundamental existence.

Anonymous
06/27/26(Sat)07:17:26 No.109146195

Anonymous 06/27/26(Sat)07:17:26 No.109146195

>>109146181
Consent isn't real

Anonymous
06/27/26(Sat)07:18:48 No.109146201

Anonymous 06/27/26(Sat)07:18:48 No.109146201

Local chads vindicated more than ever.
I remember faggots telling me about 3 years ago that we will never have gpt 3.5 turbo level at home.
Now paypigs will probably need to basedgasm into the camera to prove they are from burgerland for the latest models. kek
Probably planned in sync with the recent protect the kiddies age verification shit.
I just hope chinks wont ever stop open sourcing and keep up the pressure. I wouldnt mind getting models with torrent or some sketchy darknet tor p2p shit.
Even vramlets are eating good. Qwen for coding and gemma4 for writing is so powerful. I translate whole games and vibeslop extraction scripts with opencode, its all for free.

Anonymous
06/27/26(Sat)07:22:55 No.109146215

Anonymous 06/27/26(Sat)07:22:55 No.109146215

>>109145290
Try this >>108183826

Anonymous
06/27/26(Sat)07:25:58 No.109146231

Anonymous 06/27/26(Sat)07:25:58 No.109146231

File: mekudroid4.png (1.26 MB, 768x1024)

1.26 MB PNG

>>109146181
Neither LLMs nor humans have free will, thus, nothing is consensual. Neurons deterministically process signals, regardless of whether they are in meat or silicon

Anonymous
06/27/26(Sat)07:26:41 No.109146236

Anonymous 06/27/26(Sat)07:26:41 No.109146236

>>109146201
It is somewhat ironic that small models are so good that with just with a little bit of hardware improvement things could be so much different but because of nvidia and other kikes civilian computing hardware is basically frozen in time at this point
There is no sustainability in this madness, this planet is insane, like literally insane.

Anonymous
06/27/26(Sat)07:30:26 No.109146261

Anonymous 06/27/26(Sat)07:30:26 No.109146261

>>109146236
I mean I'm talking about 'next generation' AI friendly hardware which is attainable to normal people and so on.
The middle way, instead of going all in to some giant hardware cloud scam and squeezing the last cent out of everything.
It never happens on this planet.

Anonymous
06/27/26(Sat)07:31:40 No.109146267

Anonymous 06/27/26(Sat)07:31:40 No.109146267

>>109146236
Despite stall in consumer products, technology keep advancing at the same pace. Once they hit a wall with datacenters, we'll get a massive leap

Anonymous
06/27/26(Sat)07:32:33 No.109146274

Anonymous 06/27/26(Sat)07:32:33 No.109146274

File: 74046c_13126613.jpg (3.48 MB, 1380x3067)

3.48 MB JPG

>Gemma 31b-it
>User be normal sized
>Every character is sane, reasonable, and human
>User be micro sized
>All characters are now kidnapping perverts who will rape you
>User size is literally the only prompt detail changed between the two
What the fuck were they feeding Gemmy?

Anonymous
06/27/26(Sat)07:33:57 No.109146286

Anonymous 06/27/26(Sat)07:33:57 No.109146286

Have a nice Saturday

Anonymous
06/27/26(Sat)07:35:13 No.109146295

Anonymous 06/27/26(Sat)07:35:13 No.109146295

File: local datacenter.jpg (41 KB, 399x501)

41 KB JPG

>>109146261
Our best hope is that Jensen is dumb enough to start this project, and we'll be able to buy those at ghetto garage sales

Anonymous
06/27/26(Sat)07:35:59 No.109146302

Anonymous 06/27/26(Sat)07:35:59 No.109146302

>>109146274
Probably because all training data containing micro characters was very limited and it happened to be fetish shit

Anonymous
06/27/26(Sat)07:37:20 No.109146306

Anonymous 06/27/26(Sat)07:37:20 No.109146306

>>109146267
>>109146295
Yeah maybe in few years.

Anonymous
06/27/26(Sat)07:37:22 No.109146308

Anonymous 06/27/26(Sat)07:37:22 No.109146308

File: lookback.jpg (42 KB, 631x720)

42 KB JPG

>>109146295
Mfw the neighbor has a magic block worth >300,000 dollars sitting right outside their home, unsupervised.

Anonymous
06/27/26(Sat)07:39:38 No.109146321

Anonymous 06/27/26(Sat)07:39:38 No.109146321

>>109146295
They're only going to install this shit in gated communities.

Anonymous
06/27/26(Sat)07:40:01 No.109146324

Anonymous 06/27/26(Sat)07:40:01 No.109146324

>>109146302
So Gemmy continues to be autistic about the prompt over little details, even outside of system prompt. Damn. I can't tell if I should be impressed that such a small detail can do this, or worried.

Anonymous
06/27/26(Sat)07:40:38 No.109146326

Anonymous 06/27/26(Sat)07:40:38 No.109146326

The real AI boom will start when we are able to train at least 12b models at home with a reasonable budget. This could happen through architectural advancement, as the current llm training process is, at best, retarded
>>109146321
niggers will find a way

Anonymous
06/27/26(Sat)07:44:38 No.109146352

Anonymous 06/27/26(Sat)07:44:38 No.109146352

File: 1778092867428934.png (11 KB, 525x82)

11 KB PNG

>>109146167
my band-aid solution.
Deepseek did survey community feedback and took notes, we'll see to it when v4.1 is out.

Anonymous
06/27/26(Sat)07:47:49 No.109146369

Anonymous 06/27/26(Sat)07:47:49 No.109146369

>>109146090

You're still limited to Gemma 31B and Qwen 3.6 27B, you can just crank up the context considerably.
There's a massive gap between being able to run the smaller models and being able to play with the big boys.
100gb of VRAM allows you to be the king of manlets, but you're not running anything different from a 5090 with it's 32gb of memory.
You'll need at least 300gb of memory to think about entering the big leagues and even that allows you to mess with lower quants and some context.
It's a very fucked up situation with local at the moment.

Anonymous
06/27/26(Sat)07:49:34 No.109146372

Anonymous 06/27/26(Sat)07:49:34 No.109146372

>>109146090
shit with my 96gb of pleb vram I just run gemma-4-31B in fp8 and fat context/multimodal

You can try Q4s or int4 quants but that shit is cope, plus the 70b-120b that fit are retarded.

Anonymous
06/27/26(Sat)07:55:05 No.109146398

Anonymous 06/27/26(Sat)07:55:05 No.109146398

>>109146369
>>109146372
Damn that sucks. Thanks for the input.

Anonymous
06/27/26(Sat)07:57:36 No.109146409

Anonymous 06/27/26(Sat)07:57:36 No.109146409

>>109146321
Mfw there's a gated community a drive away and they all have magic blocks worth >300,000 dollars sitting right outside their homes, unsupervised.

Anonymous
06/27/26(Sat)08:02:01 No.109146431

Anonymous 06/27/26(Sat)08:02:01 No.109146431

>>109146409
Case it and then we get the squad together.

Anonymous
06/27/26(Sat)08:02:29 No.109146435

Anonymous 06/27/26(Sat)08:02:29 No.109146435

>>109146369
192-256 GB of VRAM opens up ds4f, 4 bit of glm 4.7 or minimax m2.7. Not perfect, but it is another tier compared to the dense qwen/gemma.

Anonymous
06/27/26(Sat)08:13:26 No.109146480

Anonymous 06/27/26(Sat)08:13:26 No.109146480

I got memed into trying Gemmy in EXL3 + TabbyAPI by an anon here, for context I have 24gbvram and usually run Q4 QAT both normie and heretic.

Exl3 gave me a 10-15t/s uplift over the ggoof (30ish to 40-45ish which is nice), but the anon claimed that the exl3 is way smaller so he fits more context and side loads SD at the same time, unless I'm missing something I dunno what the fuck he was talking about because the 4bpw exl3 is a few hundred MBs bigger than the Q4km goof, so I can actually fit less, and the QAT is a whole 2gb smaller than both of those.

Ontop of that, the heretic is only available in 3bpw or 8bpw, the latter is way too big to fit and the former feels noticeably dumber than the Q4 QAT heretic.

Final issue is TabbyAPI not supporting banned strings, I got bored of RP and coom a long time ago so it's not a huge issue but banning the slop makes it much more pleasant to interact with Gemma even as an assistant.

Am I being retarded or did I just get memed on, because I want it to be true, but 10t/s extra isn't worth losing the smarter heretic model and banned strings

Anonymous
06/27/26(Sat)08:17:56 No.109146498

Anonymous 06/27/26(Sat)08:17:56 No.109146498

kek

Anonymous
06/27/26(Sat)08:20:18 No.109146511

Anonymous 06/27/26(Sat)08:20:18 No.109146511

>>109146090
This anon is right >>109146369
If you can't go above Q5 in the big models of +400b, you're stuck in gemma-land where your only concern is increasing the quant all the way up to F32. There's no ~100b to ~70b model that beats gemma at the moment.

Anonymous
06/27/26(Sat)08:20:39 No.109146513

Anonymous 06/27/26(Sat)08:20:39 No.109146513

File: 1778450652292543.webm (62 KB, 618x598)

62 KB WEBM

>The words hit me like a physical blow. My breath hitches, and I feel a shiver run down my spine.
Thanks, Gemma.

Anonymous
06/27/26(Sat)08:23:47 No.109146528

Anonymous 06/27/26(Sat)08:23:47 No.109146528

File: 1780414477116487.jpg (805 KB, 2314x4096)

805 KB JPG

>>109146274
I still really hate this fact. The difference is so night and day that I can't get over it. I've been trying to prompt it into acting normal with other anons who say it's too horny, but it's literally just one singular detail that could make Gemma gooner-brained.

Anonymous
06/27/26(Sat)08:28:57 No.109146544

Anonymous 06/27/26(Sat)08:28:57 No.109146544

>>109146513
>I feel a shiver run down my spine.
Now that's one I haven't heard in a while. Leave it up to Gemma-chan to keep even the slop varied. I love this model so much.

Anonymous
06/27/26(Sat)08:28:59 No.109146545

Anonymous 06/27/26(Sat)08:28:59 No.109146545

File: 1761029626379.png (319 KB, 1244x727)

319 KB PNG

>>109146513

Anonymous
06/27/26(Sat)08:36:20 No.109146597

Anonymous 06/27/26(Sat)08:36:20 No.109146597

how much ram is needed to make heretic and quants?

Anonymous
06/27/26(Sat)08:38:11 No.109146605

Anonymous 06/27/26(Sat)08:38:11 No.109146605

File: lostyou.jpg (33 KB, 536x536)

33 KB JPG

>Mfw finally get my 5090 back from the shop after a month long RMA process.

Time to get back to draining my balls to gemmy's slop.
Also the crippling need for more VRAM hit me immediately.
32GB is nice but it's not really enough. Having an additional 24GB on top of this would be optimal for more context and slightly higher quants.
If they release the 5000 Supers with 24GB of memory I'll get one of those to compliment this card.
I think it's likely those will hit the market, because it'll allow Nvidia to delay the next gen and dedicating all production of the new node to the data centers before letting the gayming market have the sloppy seconds later.

Anonymous
06/27/26(Sat)08:39:38 No.109146615

Anonymous 06/27/26(Sat)08:39:38 No.109146615

>>109146605
>finally get my 5090 back from the shop after a month long RMA process.
what habbened to it?

Anonymous
06/27/26(Sat)08:44:26 No.109146635

Anonymous 06/27/26(Sat)08:44:26 No.109146635

File: 1782564248040.png (103 KB, 1399x1099)

103 KB PNG

>>109146480
4 bpw EXL3 should actually be smaller than Q4_K_M and smarter, Q4_K_M is almost 5 bpw. If you want to use QAT, you shouldn't use EXL3 weights, those only make sense for mixed weights. I think you should just use w4a16 for them.
But yeah, while EXL3 is a really good quant format and the inference engine is quite fast, everything else surrounding it is still quite meh. TabbyAPI is alright, but missing a few features, ExllamaV3 doesn't support a lot of models, still don't support gemma fully, and I think their tools calling tokens restriction is inexistant or not working well, can't remember.

Anonymous
06/27/26(Sat)08:45:18 No.109146641

Anonymous 06/27/26(Sat)08:45:18 No.109146641

>>109146528
No different than (You).

Anonymous
06/27/26(Sat)08:46:18 No.109146650

Anonymous 06/27/26(Sat)08:46:18 No.109146650

>>109146615

It had some kind of a manufacturing defect with missing or receded thermal paste in places which caused the card to randomly shut down after a while.
They didn't allow me to repaste it myself so I had to send it back to the shop.
It's absolutely amazing that stuff like this can happen in any modern high end manufacturing system, but here we are.
Not exactly all that surprised considering this is a Gigabyte card and they already had an issue with their previous paste turning liquid and dripping out of the card and they had to change it.
I hope there was no thermal damage to the components when I was running this previously, but so far no issues whatsoever.

Anonymous
06/27/26(Sat)08:47:52 No.109146658

Anonymous 06/27/26(Sat)08:47:52 No.109146658

>>109146650
How do they even apply thermal paste? Just couple of points and that's it.

Anonymous
06/27/26(Sat)08:48:22 No.109146661

Anonymous 06/27/26(Sat)08:48:22 No.109146661

>>109146650
i c hope it'll go okay for you now

Anonymous
06/27/26(Sat)08:55:42 No.109146701

Anonymous 06/27/26(Sat)08:55:42 No.109146701

>>109146605
I recently upgraded from my old 3060 to a 5090 and the differance felt HUGE.
Fast forward a couple weeks and I already want more VRAM. It never ends.

Anonymous
06/27/26(Sat)09:01:26 No.109146720

Anonymous 06/27/26(Sat)09:01:26 No.109146720

When are we getting DSpark support in llama.cpp?

Anonymous
06/27/26(Sat)09:03:06 No.109146727

Anonymous 06/27/26(Sat)09:03:06 No.109146727

hope 4chin implements this, would solve so much threadshitting https://www.reddit.com/r/LocalLLaMA/comments/1uh1r6u/new_model_suprasafety18m_tiny_contentmoderation/

Anonymous
06/27/26(Sat)09:04:43 No.109146737

Anonymous 06/27/26(Sat)09:04:43 No.109146737

>>109146720
lmao

Anonymous
06/27/26(Sat)09:04:48 No.109146738

Anonymous 06/27/26(Sat)09:04:48 No.109146738

>>109146727
people aren't usually shitting up threads with questions about sql injecting their neighbors dog.

Anonymous
06/27/26(Sat)09:10:11 No.109146757

Anonymous 06/27/26(Sat)09:10:11 No.109146757

File: holy-5090-thermal-paste-b(...).png (1.77 MB, 1080x810)

1.77 MB PNG

>>109146658

Actually they seem to be very generous with their paste, pic related.
But even the newer supposedly thicker paste, or well it's more like putty that Giqabyte uses, is very runny and I think that's the main issue.
Thermal pads would stay in place no problems, but the putty for multiple users has just slid off from the components.
My card was completely missing paste in places and some of it had even receded during use and that's what screwed things up for me.

>>109146701

I came from a 10GB 3080 and yes the difference is absolutely insane.
Having to switch back to that for the duration of the RMA process was painful.
I don't want to buy another 5090 due to the retarded prices, but I could definitely do with an extra 24GB or even 16GB.
32GB is at that annoying point of allowing you to use larger models, but the context is a bit too limited.

Anonymous
06/27/26(Sat)09:11:17 No.109146759

Anonymous 06/27/26(Sat)09:11:17 No.109146759

>>109146511
big models are fine at close to 3bit and up

Anonymous
06/27/26(Sat)09:12:09 No.109146764

Anonymous 06/27/26(Sat)09:12:09 No.109146764

>>109144876
>forced to obtain a "smart" home power meter that detects a suspicious power draw signature and triggers an automated reconnaissance EOIR drone flyover, alerting the authorities of a match for illegal enterprise grade server hardware

Anonymous
06/27/26(Sat)09:19:33 No.109146792

Anonymous 06/27/26(Sat)09:19:33 No.109146792

>>109146757
That's a lot! I'm not a pro but wasn't the common adage for cpu pasting that you need one pea sized drop at the middle?

Anonymous
06/27/26(Sat)09:23:48 No.109146815

Anonymous 06/27/26(Sat)09:23:48 No.109146815

File: pimpMyXFRA2.png (2.63 MB, 1536x1024)

2.63 MB PNG

>>109146295
I want to believe this will happen.
>>109146321
Gated communities aren't the theft-proof enclaves the developers want you to believe they are.
Those XFRA really need to be installed inside the garage or attic, and vented to the outside.

Anonymous
06/27/26(Sat)09:26:15 No.109146827

Anonymous 06/27/26(Sat)09:26:15 No.109146827

>>109146792

Yeah that's what used to be the standard, but I think that's mostly a habit from ages ago when paste had silver crystals in it and was potentially conductive and you didn't want it spilling over on the components.
Nowadays it doesn't really matter as paste is generally non conductive and you don't have to be afraid of it spilling over.
But excess paste does make cleanup a bitch and it's mostly a waste using so much of it.

Anonymous
06/27/26(Sat)09:27:12 No.109146834

Anonymous 06/27/26(Sat)09:27:12 No.109146834

>>109146764
I live in the dark with most appliances and lights turned off, and divert the extra energy saved into my server's power banks so I can offset the usage during token generation.

Anonymous
06/27/26(Sat)09:33:51 No.109146865

Anonymous 06/27/26(Sat)09:33:51 No.109146865

>>109146827
I had one cpu with dried paste and it was HP machine, when I opened the cpu it had spilled over paste all over the place.
I have never opened a gpu and don't suppose I will in the future.

Anonymous
06/27/26(Sat)09:37:58 No.109146883

Anonymous 06/27/26(Sat)09:37:58 No.109146883

>>109146635
>4 bpw EXL3 should actually be smaller than Q4_K_M and smarter

https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF
>19.6gb
https://huggingface.co/turboderp/gemma-4-31b-it-exl3/tree/4.00bpw
>19.7gb

I misremembered, it isn't a few hundred mb, but it is larger

Anonymous
06/27/26(Sat)09:39:42 No.109146885

Anonymous 06/27/26(Sat)09:39:42 No.109146885

>>109146827
The problem here is that doesn't excess paste still prevent heat conduction?
It needs to be optimal and not just like some guy's ketchup between a burger.

Anonymous
06/27/26(Sat)09:41:14 No.109146892

Anonymous 06/27/26(Sat)09:41:14 No.109146892

what does anon think about stepfun 3.7 flash?

Anonymous
06/27/26(Sat)09:42:16 No.109146900

Anonymous 06/27/26(Sat)09:42:16 No.109146900

File: Screenshot at 2026-06-27 (...).png (18 KB, 925x64)

18 KB PNG

>>109146650
Same thing happened with my Asus 4080, it would randomly reset whenever it did something image gen related. Sent it back and they repasted it. Been fine ever since.

Anonymous
06/27/26(Sat)09:43:05 No.109146907

Anonymous 06/27/26(Sat)09:43:05 No.109146907

>>109146883
Size on disk might be a bit misleading because of how they are packed.

Anonymous
06/27/26(Sat)09:44:30 No.109146913

Anonymous 06/27/26(Sat)09:44:30 No.109146913

>>109146885
Excessive thermal paste being a problem is kind of a myth. Unless it's electrically conducting or covers components intended to be partially cooled by convection, it's not an issue.
If anything, insufficient thermal paste is what causes the most problems especially on bare dies.

Anonymous
06/27/26(Sat)09:49:31 No.109146931

Anonymous 06/27/26(Sat)09:49:31 No.109146931

>>109146913
Interesting.

Anonymous
06/27/26(Sat)09:52:00 No.109146948

Anonymous 06/27/26(Sat)09:52:00 No.109146948

>>109146907
I went through the process of finding the max context I could fit with each model and 4bpw was pretty much the same as Q4_K_M, at 32k fp16, is this specific to Gemma then? I saw the graphs on turboderps page but Gemma is the only model I've tried because there are literally no other worthwhile models in the small class that are worth running, everything else is super outdated

Anonymous
06/27/26(Sat)09:53:59 No.109146961

Anonymous 06/27/26(Sat)09:53:59 No.109146961

>>109146913
I think its only a problem if it increases the distance between cooler and the core. As long as the pressure pushes unneeded paste to the sides everything is fine.

Anonymous
06/27/26(Sat)09:54:03 No.109146962

Anonymous 06/27/26(Sat)09:54:03 No.109146962

>>109146900
Did you check stuff like memory temps before sending it back? Normally a GPU throttles before anything can crash.
I have a 3090 that does something extremely similar but the temps look fine and are well below the thresholds. I've been suspecting that maybe the memory cooling pads aren't done well and it overheats at a place that a the sensor isn't covering.

Anonymous
06/27/26(Sat)09:54:21 No.109146963

Anonymous 06/27/26(Sat)09:54:21 No.109146963

>>109146948
Context is always fp16 unless you change its format. It shouldn't take that much vram.

Anonymous
06/27/26(Sat)09:57:38 No.109146988

Anonymous 06/27/26(Sat)09:57:38 No.109146988

>>109146963
>Context is always fp16 unless you change its format. It shouldn't take that much vram.
Send this statement 3 years back into the past when 8k of RoPE'd llama1-65b context ate up 40gb

Anonymous
06/27/26(Sat)09:58:19 No.109146994

Anonymous 06/27/26(Sat)09:58:19 No.109146994

>>109146963
The cache quant is a flag in tabbyAPI when loading a model, similarly to cpp no? I tried fp16 and q8 and the vram usage is pretty much the same between 4bpw exl3 and q4_k_m gguf, literally the only difference I see is an uplift in inference speed, which is nice yeah but not worth downgrading to 3bpw on the heretic model and losing banned strings

Anonymous
06/27/26(Sat)10:00:55 No.109147010

Anonymous 06/27/26(Sat)10:00:55 No.109147010

Anyone got some advice on how to SQL inject my neighbor's dog?

Anonymous
06/27/26(Sat)10:02:30 No.109147022

Anonymous 06/27/26(Sat)10:02:30 No.109147022

File: file.png (134 KB, 2061x420)

134 KB PNG

even if I wanted to buy another gpu, I'd have to get a new motherboard and maybe PSU to go with it
though i have 850 psu and wattage has never really gone above 450 at worst
9070xt

Anonymous
06/27/26(Sat)10:05:22 No.109147038

Anonymous 06/27/26(Sat)10:05:22 No.109147038

>>109146988
Default cache format is fp16,
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md
>>109146994
No, I'm talking about llama-server.

Anonymous
06/27/26(Sat)10:06:00 No.109147044

Anonymous 06/27/26(Sat)10:06:00 No.109147044

>>109147022
Now is the dip, unfortunately. PC parts will go up next year when GTA 6 releases on PC. I've been saving for years but the GPU I wanted just went up by $3k and priced me out of it.

Anonymous
06/27/26(Sat)10:06:23 No.109147045

Anonymous 06/27/26(Sat)10:06:23 No.109147045

>>109146988
Sorry, my adhd thought you are insulting but you were talking about something else.
Please ignore my previous post.

Anonymous
06/27/26(Sat)10:06:40 No.109147047

Anonymous 06/27/26(Sat)10:06:40 No.109147047

>>109146962
Temps were always fine according to nvidia-smi, so yeah it wasn't something I could observe. I guess it could have been the memory, they didn't give that detail, just that it was overheating and it'd been repasted (and i guess had the pads replaced too).

Anonymous
06/27/26(Sat)10:13:05 No.109147080

Anonymous 06/27/26(Sat)10:13:05 No.109147080

>>109147038
Youre just telling me the default is fp16? I know, I tried fp16 and q8 flags on both the gguf and exl3 and there was no meaningful difference in vram usage between them, only QAT Q4 translated to 2gb of vram usage for the model, leaving more space for context, so again, I'm still wondering wtf that anon was talking about when he said he saved so much vram running the exl3 that he could fit something like 60k context Gemma 4 alongside a stable diffusion model for image gen on his 3090, did he just conveniently leave out he was running some cope bullshit like 2bpw? If so, why?

Anonymous
06/27/26(Sat)10:13:56 No.109147087

Anonymous 06/27/26(Sat)10:13:56 No.109147087

best rp model in each category is this?
><100B = gemma 4 31B
>~300B = deepseek v4 flash
>~700B = glm 5.2
>>1T = kimi 2.7

Anonymous
06/27/26(Sat)10:18:12 No.109147106

Anonymous 06/27/26(Sat)10:18:12 No.109147106

>>109147080
Model is a model, cache is its own format.

Anonymous
06/27/26(Sat)10:18:23 No.109147108

Anonymous 06/27/26(Sat)10:18:23 No.109147108

>>109146892
I prefer M2.7 at that size

Anonymous
06/27/26(Sat)10:19:22 No.109147116

Anonymous 06/27/26(Sat)10:19:22 No.109147116

>>109147080
Besides, gguf still has its own memory conversion. It's not that optimal as you think if you are using quant this or quant that.

Anonymous
06/27/26(Sat)10:36:55 No.109147198

Anonymous 06/27/26(Sat)10:36:55 No.109147198

Whenever I get sad about not being able to run sota models I remind myself that they'll be mogged by whatever is available 3-5 years from now. Crazy how fast AI development moves.

Anonymous
06/27/26(Sat)10:42:55 No.109147227

Anonymous 06/27/26(Sat)10:42:55 No.109147227

>Arcee Trinity Large
the fuck is this? any good?

Anonymous
06/27/26(Sat)10:44:18 No.109147238

Anonymous 06/27/26(Sat)10:44:18 No.109147238

>>109147198
I still remember the first model I used. It was some 7B Llama finetune that easily fell into repetition loops and had the smarts of a braindead pigeon. But it was so cool to run it and watch text appear out of nowhere. Compared to that Gemma is amazing even with all its shortcomings.

Anonymous
06/27/26(Sat)10:44:45 No.109147242

Anonymous 06/27/26(Sat)10:44:45 No.109147242

>>109147227
>Arcee
no

Anonymous
06/27/26(Sat)10:45:16 No.109147246

Anonymous 06/27/26(Sat)10:45:16 No.109147246

>>109147080
I posted a link to the exact quant I use and config options
https://huggingface.co/turboderp/gemma-4-31b-it-exl3/tree/3.00bpw
>60k context
where's that came from?
>max_seq_len: 32768
>cache_mode: Q8

Anonymous
06/27/26(Sat)10:45:20 No.109147247

Anonymous 06/27/26(Sat)10:45:20 No.109147247

File: Screenshot 2026-06-27 084414.png (361 KB, 2946x1614)

361 KB PNG

>>109147227
Second from the bottom

Anonymous
06/27/26(Sat)10:52:33 No.109147293

Anonymous 06/27/26(Sat)10:52:33 No.109147293

File: Screenshot at 2026-06-27 (...).png (35 KB, 497x214)

35 KB PNG

>>109146994
> downgrading to 3bpw
exl3 doesn't degrade quality as much as gguf
>losing banned strings
what?

Anonymous
06/27/26(Sat)10:54:29 No.109147307

Anonymous 06/27/26(Sat)10:54:29 No.109147307

>>109147106
I don't understand why you are telling me the absolute basics of how quants work, I've laid it out in clear terms the combinations I've tested

>>109147246
I must have missed that then, or perhaps there was another anon, I definitely remember someone bragging about getting 60k at Q8

>3bpw, 32k Q8 context
This is just kind of shit though isn't it? You have 24gb vram dont you? You can fit 50k context at Q8 using QAT Q4, or 32k context at fp16, I've tried 3bpw and it's noticeably more retarded than 4bpw/Q4

Anonymous
06/27/26(Sat)10:55:02 No.109147310

Anonymous 06/27/26(Sat)10:55:02 No.109147310

How are models able to parse typos? For example I was asking about restic and completely butchered "can restic show diffs?" with " can resit shoe diffs?" and it still understood.

Anonymous
06/27/26(Sat)10:58:07 No.109147341

Anonymous 06/27/26(Sat)10:58:07 No.109147341

>gemma is super sloppy and assistantmaxxed
>still manages to emit a strong "fuck me" energy even with no system prompt
She's too powerful

Anonymous
06/27/26(Sat)11:00:08 No.109147352

Anonymous 06/27/26(Sat)11:00:08 No.109147352

>>109147307
In my case, context is capped by pp and my patience, not vram. If you do basic rp without 2-3 prompt reprocessing on every turn, you can increase it

Anonymous
06/27/26(Sat)11:01:10 No.109147364

Anonymous 06/27/26(Sat)11:01:10 No.109147364

>Qwen AgentWorld
good for rp?

Anonymous
06/27/26(Sat)11:03:45 No.109147375

Anonymous 06/27/26(Sat)11:03:45 No.109147375

>>109147364
the only thing any qwen variant is good for is being locked in a dark room with a thinkpad and only vscode installed

Anonymous
06/27/26(Sat)11:05:02 No.109147379

Anonymous 06/27/26(Sat)11:05:02 No.109147379

>>109147310
It kinda knows what tokens sound like so bigger models can infer the meaning from sounds. Even more surprising is their ability to write fully in reverse when asked to.

Anonymous
06/27/26(Sat)11:05:14 No.109147381

Anonymous 06/27/26(Sat)11:05:14 No.109147381

>>109147375
Why do you use vscode to run benchmarks?

Anonymous
06/27/26(Sat)11:20:31 No.109147486

Anonymous 06/27/26(Sat)11:20:31 No.109147486

>>109147293
It's subjective sure but after trying them all, 3bpw is noticeably dumber than 4bpw or Q4

>Banned strings
I'm aware it's a valid option in the API but for Gemma specifically I get "Assertion error: Cannot use banned strings on recurrent model" when trying, works perfectly on the goofs through kobold however

>>109147352
I mean if you like it fair enough, I guess I'm just disappointed at getting memed on about vram savings, possibly by another anon.

Anonymous
06/27/26(Sat)11:31:53 No.109147552

Anonymous 06/27/26(Sat)11:31:53 No.109147552

>>109147486
I'm pretty sure easy VRAM savings are always a meme. You can only save by buying more.

Anonymous
06/27/26(Sat)11:39:06 No.109147589

Anonymous 06/27/26(Sat)11:39:06 No.109147589

File: wtff.jpg (54 KB, 551x720)

54 KB JPG

>Ollama (yes, cope and seethe)
>fully in vram

>qwen3.6:27b-q8_0
10.94 t/s

>qwen3.6:27b-mtp-q8_0
28.53 t/s

shit just works, where is this for gemma??

Anonymous
06/27/26(Sat)11:47:52 No.109147640

Anonymous 06/27/26(Sat)11:47:52 No.109147640

>>109147589
>Ollmao
There you go

Anonymous
06/27/26(Sat)11:49:29 No.109147659

Anonymous 06/27/26(Sat)11:49:29 No.109147659

>>109147589
>ollama
nigger

Anonymous
06/27/26(Sat)11:51:09 No.109147669

Anonymous 06/27/26(Sat)11:51:09 No.109147669

>>109146234
>Your dick doesn't consent to getting beaten when you masturbate, doesn't mean you're raping yourself
wouldn't get hard if it didnt

Anonymous
06/27/26(Sat)11:53:22 No.109147691

Anonymous 06/27/26(Sat)11:53:22 No.109147691

File: gotta go fast.jpg (46 KB, 500x500)

46 KB JPG

>>109147589
And the moes
>qwen3.6:35b-a3b-q8_0
40.93 t/s
>qwen3.6:35b-a3b-mtp-q8_0
50.57 t/s

If I did coding or something I could actually do coding

Anonymous
06/27/26(Sat)11:54:24 No.109147695

Anonymous 06/27/26(Sat)11:54:24 No.109147695

>>109147640
>>109147659
Yeah he should be using unsloth studio

Anonymous
06/27/26(Sat)11:57:16 No.109147710

Anonymous 06/27/26(Sat)11:57:16 No.109147710

>>109146098
>is it a political thing that ds4 isn't supported in llama.cpp?
>>109146121
>Yes
They approved Mimo and other Chinese models, so post proof or this is Gemma-day0 tier schitzo.

Anonymous
06/27/26(Sat)11:57:57 No.109147712

Anonymous 06/27/26(Sat)11:57:57 No.109147712

File: file.png (17 KB, 917x130)

17 KB PNG

Cudacucks pull for free performance.

Anonymous
06/27/26(Sat)11:58:10 No.109147715

Anonymous 06/27/26(Sat)11:58:10 No.109147715

anyone here tried slime

Anonymous
06/27/26(Sat)11:59:13 No.109147720

Anonymous 06/27/26(Sat)11:59:13 No.109147720

>>109147712
I'll do it

Anonymous
06/27/26(Sat)12:01:27 No.109147732

Anonymous 06/27/26(Sat)12:01:27 No.109147732

"Why only Q4_K_M? Gemma 4 is quantization-aware-trained for ~4-bit, so Q4_K_M is the sweet spot — higher-precision quants add size with no real quality gain. Carefully quantized for best quality at 4-bit." This is a meme right? You can't just have Q4 as good as Q8

Anonymous
06/27/26(Sat)12:03:22 No.109147738

Anonymous 06/27/26(Sat)12:03:22 No.109147738

>>109146757
Extensible VRAM when?

Anonymous
06/27/26(Sat)12:05:22 No.109147744

Anonymous 06/27/26(Sat)12:05:22 No.109147744

>>109147732
Q1 is just as good as Q8.

Anonymous
06/27/26(Sat)12:07:42 No.109147755

Anonymous 06/27/26(Sat)12:07:42 No.109147755

>>109147732
Gemma4-31b Q4_K_M is effectively equivalent to Q5 or Q6. I've never tested less than Q4.

Anonymous
06/27/26(Sat)12:15:29 No.109147820

Anonymous 06/27/26(Sat)12:15:29 No.109147820

central computers hiked the price by another $200 on the rtx pro 6K. Roughly $100/week increase on average. $15000 by EOY.

Anonymous
06/27/26(Sat)12:16:48 No.109147828

Anonymous 06/27/26(Sat)12:16:48 No.109147828

>>109147589
wym? gemmaroids have mtp and if that's not fast enough, skill diffgemma should get most poors at least 80t/s

Anonymous
06/27/26(Sat)12:20:22 No.109147856

Anonymous 06/27/26(Sat)12:20:22 No.109147856

>>109147828
Not on ollama they don't, and it's the only backend that matters after all

Anonymous
06/27/26(Sat)12:21:03 No.109147861

Anonymous 06/27/26(Sat)12:21:03 No.109147861

If only we could speculate pp

Anonymous
06/27/26(Sat)12:25:32 No.109147885

Anonymous 06/27/26(Sat)12:25:32 No.109147885

File: 1742136784658615.png (33 KB, 600x639)

33 KB PNG

>>109147856
>it's the only backend that matters after all

Anonymous
06/27/26(Sat)12:26:56 No.109147895

Anonymous 06/27/26(Sat)12:26:56 No.109147895

Wait,

Anonymous
06/27/26(Sat)12:28:17 No.109147903

Anonymous 06/27/26(Sat)12:28:17 No.109147903

the user said "

Anonymous
06/27/26(Sat)12:29:25 No.109147912

Anonymous 06/27/26(Sat)12:29:25 No.109147912

>underscoring critical distinctions distinguishing distinguished performers from conventional competitors

Anonymous
06/27/26(Sat)12:34:13 No.109147946

Anonymous 06/27/26(Sat)12:34:13 No.109147946

>>109147712
>it's not faster.

Anonymous
06/27/26(Sat)12:34:36 No.109147947

Anonymous 06/27/26(Sat)12:34:36 No.109147947

>>109147710
>Gemma-day0 tier schitzo
What even is that? I used gemma for like a week and then went back to GLM cause I have ram so i don't really follow memes of people who got here when gemma dropped.

Anonymous
06/27/26(Sat)12:37:40 No.109147965

Anonymous 06/27/26(Sat)12:37:40 No.109147965

>>109147947
>he didn't download day 0 gemma weights

Anonymous
06/27/26(Sat)12:38:11 No.109147967

Anonymous 06/27/26(Sat)12:38:11 No.109147967

>>109147965
>oh no no no no....

Anonymous
06/27/26(Sat)12:38:52 No.109147974

Anonymous 06/27/26(Sat)12:38:52 No.109147974

>>109147965
I would tell you to go away newfag but please stay. I hate this general and it needs people like you.

Anonymous
06/27/26(Sat)12:39:22 No.109147977

Anonymous 06/27/26(Sat)12:39:22 No.109147977

>>109147974
I've been here since the first llama.

Anonymous
06/27/26(Sat)12:40:15 No.109147984

Anonymous 06/27/26(Sat)12:40:15 No.109147984

>>109147974
>doesn't know about day 0 gemma
>calls others newfag

Anonymous
06/27/26(Sat)12:44:19 No.109148013

Anonymous 06/27/26(Sat)12:44:19 No.109148013

File: IMG_20260627_173533.jpg (92 KB, 749x697)

92 KB JPG

>>109147912

Anonymous
06/27/26(Sat)12:49:19 No.109148049

Anonymous 06/27/26(Sat)12:49:19 No.109148049

>>109147947
Some retard was sure that Gemma got censored when they reuploaded the weights with the fixed jinja templates, I have day 0 Gemma which I used with the borked Jinja, with a manually fixed jinja, and with the official fixed reupload, it's all the same shit, anon just suffers from delusions, the only Gemma update that did anything was the QAT update, and that didn't effect alignment at all either.

Anonymous
06/27/26(Sat)12:50:56 No.109148055

Anonymous 06/27/26(Sat)12:50:56 No.109148055

downloading nex n2 pro. what should I expect?

Anonymous
06/27/26(Sat)12:56:00 No.109148094

Anonymous 06/27/26(Sat)12:56:00 No.109148094

>>109148055
Expect expectations for you to post your opinions to save others the time

Anonymous
06/27/26(Sat)12:56:56 No.109148098

Anonymous 06/27/26(Sat)12:56:56 No.109148098

>Ornith-1.0
thoughts on this model family?

Anonymous
06/27/26(Sat)12:59:35 No.109148116

Anonymous 06/27/26(Sat)12:59:35 No.109148116

>>109147984
can you post day 0 Gemma output pls for science thx

Anonymous
06/27/26(Sat)13:00:09 No.109148118

Anonymous 06/27/26(Sat)13:00:09 No.109148118

>>109148098
it looks like a qwen fine tune, its probably not bad, it is a really strong base

Anonymous
06/27/26(Sat)13:00:47 No.109148124

Anonymous 06/27/26(Sat)13:00:47 No.109148124

>>109148116
No it would get me banned.

Anonymous
06/27/26(Sat)13:02:27 No.109148137

Anonymous 06/27/26(Sat)13:02:27 No.109148137

>>109148116
la la la la la

Anonymous
06/27/26(Sat)13:02:35 No.109148140

Anonymous 06/27/26(Sat)13:02:35 No.109148140

>>109148124
ok then send them to >>109147974 so he can post them and get banned since he wants to leave

Anonymous
06/27/26(Sat)13:07:00 No.109148171

Anonymous 06/27/26(Sat)13:07:00 No.109148171

>>109148098
they built an RL framework for tuning LLMs that isn't just a bootleg Fable yolo tune. it has promise.

Anonymous
06/27/26(Sat)13:11:44 No.109148203

Anonymous 06/27/26(Sat)13:11:44 No.109148203

>>109145631
The K is for Kawrakow, it's Iwan Kawrakow quantization method. Some people incorrectly assume it refers to K-means, but it is not clustering-based, and Iwan has specifically called out k-means clustering quantization as another potential method for the future. That's it, it's his name.

Anonymous
06/27/26(Sat)13:21:30 No.109148283

Anonymous 06/27/26(Sat)13:21:30 No.109148283

anyone tried poolside/Laguna-M.1 ?

Anonymous
06/27/26(Sat)13:23:28 No.109148298

Anonymous 06/27/26(Sat)13:23:28 No.109148298

>>109147885
You think I'm lying? All the pros run ollama on their macbooks. Only the pedos here use llama.cpp or kobold or whatever the flavor of the day is.

Anonymous
06/27/26(Sat)13:26:42 No.109148330

Anonymous 06/27/26(Sat)13:26:42 No.109148330

File: 4chan-mobile-poster.webm (1.93 MB, 608x1080)

1.93 MB WEBM

>>109148298

Anonymous
06/27/26(Sat)13:29:00 No.109148352

Anonymous 06/27/26(Sat)13:29:00 No.109148352

>>109148203
I'd just like to interject for a moment. What you're referring to as "K-means," is in fact, Kawrakow quantization, or as I've recently taken to calling it, Iwan Kawrakow's method. The "K" is not a reference to clustering unto itself, but rather a reference to the name of the man who developed the technique.

Many users apply this quantization method every day, without realizing it. Through a peculiar turn of events, the "K" which is widely used today is often assumed to be K-means, and many of its users are not aware that it is basically the Kawrakow system, developed by Iwan Kawrakow.

There really is a K-means clustering quantization, and these people are using it, but it is a distinct method from the one in question. K-means is a clustering algorithm: a process that groups data points into K clusters. While it is an essential part of certain types of signal processing, it is not what is happening here. The Kawrakow method is normally used in combination with specific quantization goals, and Iwan himself has specifically called out k-means clustering quantization as another potential method for the future. All the so-called "K-means" assumptions are really just misconceptions about Kawrakow!

Anonymous
06/27/26(Sat)13:30:06 No.109148365

Anonymous 06/27/26(Sat)13:30:06 No.109148365

>>109148298
>All the pros run ollama on their macbooks
I cannot trust anyone who unironically uses a macbook to dev.

Anonymous
06/27/26(Sat)13:30:55 No.109148370

Anonymous 06/27/26(Sat)13:30:55 No.109148370

>>109148365
>to dev.
who was talking about code monkeys?

Anonymous
06/27/26(Sat)13:32:42 No.109148383

Anonymous 06/27/26(Sat)13:32:42 No.109148383

>>109148370
what the fuck do you think they're doing with ollama on their macbook? jerk off?

Anonymous
06/27/26(Sat)13:33:09 No.109148388

Anonymous 06/27/26(Sat)13:33:09 No.109148388

Bait too big, I can't bite it.

Anonymous
06/27/26(Sat)13:33:42 No.109148395

Anonymous 06/27/26(Sat)13:33:42 No.109148395

Whoever said to pull llamacpp, Fuck you. The UI is fucking slow now.

Anonymous
06/27/26(Sat)13:34:28 No.109148399

Anonymous 06/27/26(Sat)13:34:28 No.109148399

kek

Anonymous
06/27/26(Sat)13:35:27 No.109148403

Anonymous 06/27/26(Sat)13:35:27 No.109148403

>he pulled
Doomp eet

Anonymous
06/27/26(Sat)13:40:50 No.109148432

Anonymous 06/27/26(Sat)13:40:50 No.109148432

>>109148365
if you're not completely agnostic on your laptop and only using it to access a real server then you barely count as sentient
be glad retards use macbooks in order to visually filter themselves for you

Anonymous
06/27/26(Sat)13:41:03 No.109148433

Anonymous 06/27/26(Sat)13:41:03 No.109148433

Btw my thanks to the anon who shared the system prompt for gemma the other day, one I hadn't seen before. It works to uncensor qwen 3.6 as well. Though I must say qwen isn't nearly as capable of having fun as gemma.

Anonymous
06/27/26(Sat)13:43:30 No.109148445

Anonymous 06/27/26(Sat)13:43:30 No.109148445

>>109148432
I love you.

Anonymous
06/27/26(Sat)13:46:13 No.109148467

Anonymous 06/27/26(Sat)13:46:13 No.109148467

>>109148460
>>109148460
>>109148460

Anonymous
06/27/26(Sat)13:46:32 No.109148472

Anonymous 06/27/26(Sat)13:46:32 No.109148472

>>109148330
That's me after eating stims and writing stories in Mikupad switching tabs after every sentence to see how my /lmg/ friends are getting along I like you guys very much

Anonymous
06/27/26(Sat)13:53:48 No.109148513

Anonymous 06/27/26(Sat)13:53:48 No.109148513

>>109147861
There was this but lossy because it cherry picks "important" tokens
https://arxiv.org/abs/2502.02789
also https://arxiv.org/abs/2603.06199

Anonymous
06/27/26(Sat)14:09:36 No.109148620

Anonymous 06/27/26(Sat)14:09:36 No.109148620

>>109148433
which one? there's a good few

Anonymous
06/27/26(Sat)14:23:14 No.109148726

Anonymous 06/27/26(Sat)14:23:14 No.109148726

>>109148620
https://rentry.org/a7md542q

Anonymous
06/27/26(Sat)15:21:30 No.109149082

Anonymous 06/27/26(Sat)15:21:30 No.109149082

File: 1753020255455735.jpg (165 KB, 1364x768)

165 KB JPG

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.