/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 06/19/26(Fri)01:51:30 No.109088988

File: 1759081817068681.mp4 (3.88 MB, 720x1280)

3.88 MB MP4

/lmg/ - Local Models General Anonymous 06/19/26(Fri)01:51:30 No.109088988

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109084315 & >>109079129

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/19/26(Fri)01:51:53 No.109088992

Anonymous 06/19/26(Fri)01:51:53 No.109088992

File: myar5.png (835 KB, 768x512)

835 KB PNG

►Recent Highlights from the Previous Thread: >>109084315

--Debate over model size trends and future consumer hardware viability:
>109084466 >109084543 >109084574 >109084637 >109084581 >109084634 >109084588 >109084618 >109084679 >109084978 >109084722 >109084804 >109084631 >109084648 >109084880
--Speculation on China's progress toward Fable class AI models:
>109085107 >109085219 >109087011 >109087718 >109087771 >109088023 >109085512 >109085526 >109085947 >109086079
--Evaluating low-VRAM model choices and viability of multi-3060 setups:
>109085494 >109085605 >109085663 >109087890 >109087918 >109087921 >109087927 >109088035 >109088043 >109088105 >109088127 >109088068 >109087959 >109088841 >109088904
--Shared memory allocation and quantizer quality for Qwen3.5-122B:
>109084481 >109084964
--Comparing performance numbers for Kimi k2.6 Q3_K quant:
>109085774 >109085830 >109085861 >109085924 >109085927 >109086664 >109086684 >109086781 >109086842 >109086854 >109086861 >109086871 >109086827
--Industry gossip and investment prospects for robotics companies:
>109085965 >109085976 >109086198 >109086265 >109086360 >109088676 >109086371
--Minimax M3 roleplaying performance and deployment via llama.cpp PR:
>109084414 >109084476 >109084818 >109084928 >109085021
--Anon showcases boat agent using fine-tuned local robotics models:
>109085503 >109085560 >109085589 >109085727 >109085729
--Minimax m3 stability at high temperature and possible Gemma distillation:
>109085739 >109085780
--Talking Anon out of buying a 5060 Ti for inference:
>109084815 >109084873 >109084876
--Testing impact of active expert count on Qwen 3.6 coherence:
>109084446 >109084465 >109084484
--Anon critiques North Mini Code's excessive thinking time on OpenRouter:
>109087158 >109088177
--Logs:
>109087251
--Miku (free space):
>109084451

►Recent Highlight Posts from the Previous Thread: >>109084321

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/19/26(Fri)01:59:18 No.109089010

Anonymous 06/19/26(Fri)01:59:18 No.109089010

Qwen have gone quiet about their LLMs…too quiet.

Anonymous
06/19/26(Fri)02:05:09 No.109089025

Anonymous 06/19/26(Fri)02:05:09 No.109089025

Mikulove

Anonymous
06/19/26(Fri)02:05:44 No.109089028

Anonymous 06/19/26(Fri)02:05:44 No.109089028

Mikusex

Anonymous
06/19/26(Fri)02:17:59 No.109089063

Anonymous 06/19/26(Fri)02:17:59 No.109089063

Mikuholdinghands

Anonymous
06/19/26(Fri)02:19:00 No.109089066

Anonymous 06/19/26(Fri)02:19:00 No.109089066

>>109088988
i'm kinda tempted to try gen5 nvme inference lmao.

Anonymous
06/19/26(Fri)02:35:42 No.109089110

Anonymous 06/19/26(Fri)02:35:42 No.109089110

70b dense

Anonymous
06/19/26(Fri)02:37:45 No.109089117

Anonymous 06/19/26(Fri)02:37:45 No.109089117

700b dense

Anonymous
06/19/26(Fri)02:39:31 No.109089122

Anonymous 06/19/26(Fri)02:39:31 No.109089122

>>109089066
what flag do you use for llama-server to enable that?

Anonymous
06/19/26(Fri)02:41:23 No.109089126

Anonymous 06/19/26(Fri)02:41:23 No.109089126

https://huggingface.co/moonshotai/Kimi-K2.7-Dense-it
Nobody has the VRAM for this.

Anonymous
06/19/26(Fri)02:41:53 No.109089127

Anonymous 06/19/26(Fri)02:41:53 No.109089127

>>109086593
deepsex with vision soon
it's already on their web chat

Anonymous
06/19/26(Fri)02:49:00 No.109089147

Anonymous 06/19/26(Fri)02:49:00 No.109089147

>>109089127
How does it compare to kimi with vision?
I've been pretty happy when I use the FP32 mmproj

Anonymous
06/19/26(Fri)02:55:17 No.109089164

Anonymous 06/19/26(Fri)02:55:17 No.109089164

File: 1510264536850546.jpg (102 KB, 750x750)

102 KB JPG

>Wait,
>[millions of tests]
>The log is still 1 byte - so closing the server confirmed it: the timings were never written to that file.
>"yeah it's a directory, the file is inside"
>That's the smoking gun — llama-test-v1.log is a directory, not a file, with the real log (llm-server.log) inside it.
Opus 4.8 ladies and gentlemen being bamboozled by a folder named llama-test-v1.log.
Your frontier model.

Anonymous
06/19/26(Fri)02:55:24 No.109089166

Anonymous 06/19/26(Fri)02:55:24 No.109089166

>>109089147
never tried kimi before. but on the web version it's good at describing image and identifying noodle characters / raw manga.
going need to wait for the open model for nsfw stuff

Anonymous
06/19/26(Fri)02:57:02 No.109089171

Anonymous 06/19/26(Fri)02:57:02 No.109089171

>>109088992
>Anon showcases boat agent
Can it really be called a boat agent if it just happens to be on a boat at the time?

Anonymous
06/19/26(Fri)02:57:35 No.109089176

Anonymous 06/19/26(Fri)02:57:35 No.109089176

>>109089122
--nvme-on

Anonymous
06/19/26(Fri)02:59:16 No.109089181

Anonymous 06/19/26(Fri)02:59:16 No.109089181

File: LQ50.png (225 KB, 1080x1080)

225 KB PNG

"only" for $1200
ssdmax bros..

Anonymous
06/19/26(Fri)03:00:39 No.109089187

Anonymous 06/19/26(Fri)03:00:39 No.109089187

35B is really fucking good at coding and tools. Almost feels like I'm using cloud.

Anonymous
06/19/26(Fri)03:01:44 No.109089191

Anonymous 06/19/26(Fri)03:01:44 No.109089191

>>109089122
ignore this anon : >>109089176
there is no such flags.
it just works if you have mmap enabled.

Anonymous
06/19/26(Fri)03:02:56 No.109089195

Anonymous 06/19/26(Fri)03:02:56 No.109089195

>>109089191
I forgot to say that it hasn't been merged yet.

Anonymous
06/19/26(Fri)03:03:15 No.109089196

Anonymous 06/19/26(Fri)03:03:15 No.109089196

>>109089181
yea sorry i don't read chink

Anonymous
06/19/26(Fri)03:05:18 No.109089203

Anonymous 06/19/26(Fri)03:05:18 No.109089203

>>109089195
>the source can be found deep inside my ass.

Anonymous
06/19/26(Fri)03:10:49 No.109089228

Anonymous 06/19/26(Fri)03:10:49 No.109089228

Would you share your sprompt with your mother?

Anonymous
06/19/26(Fri)03:15:06 No.109089238

Anonymous 06/19/26(Fri)03:15:06 No.109089238

>>109089228
I could do that. It's a very plain assistant prompt which works pretty well for everything I do with Gemma-chan.

Anonymous
06/19/26(Fri)03:35:17 No.109089283

Anonymous 06/19/26(Fri)03:35:17 No.109089283

>>109089228
>Anon what's a kimi-chan and why is she managing 14 agents

Anonymous
06/19/26(Fri)03:39:43 No.109089301

Anonymous 06/19/26(Fri)03:39:43 No.109089301

>>109089228
>Anon, what's a mesugaki?

Anonymous
06/19/26(Fri)03:45:54 No.109089314

Anonymous 06/19/26(Fri)03:45:54 No.109089314

Anthropic is apparently in talks with the governments of France and the UK to discuss the terms for a potential relocation of Anthropic from San Francisco to either London or Paris.

Anonymous
06/19/26(Fri)03:47:00 No.109089316

Anonymous 06/19/26(Fri)03:47:00 No.109089316

>>109089314
Europoor will finally start dominating AI.

Anonymous
06/19/26(Fri)03:47:57 No.109089318

Anonymous 06/19/26(Fri)03:47:57 No.109089318

>>109089316
DeepMind is already in London anon...

Anonymous
06/19/26(Fri)03:56:27 No.109089334

Anonymous 06/19/26(Fri)03:56:27 No.109089334

>>109089196
your local model does

Anonymous
06/19/26(Fri)03:56:29 No.109089335

Anonymous 06/19/26(Fri)03:56:29 No.109089335

File: 1750705772113181.jpg (127 KB, 1200x900)

127 KB JPG

Reminder to backup your favorite local models, even those you're yet to properly try (just in case). Don't be retarded now.

Anonymous
06/19/26(Fri)03:58:05 No.109089339

Anonymous 06/19/26(Fri)03:58:05 No.109089339

>>109089335
More pol hyperbole? I'm fine, thanks.

Anonymous
06/19/26(Fri)03:59:49 No.109089345

Anonymous 06/19/26(Fri)03:59:49 No.109089345

File: 1529177642240.png (15 KB, 677x351)

15 KB PNG

>>109089335
I don't take advice from "men" that find asian bugs attractive.

Anonymous
06/19/26(Fri)03:59:54 No.109089346

Anonymous 06/19/26(Fri)03:59:54 No.109089346

>>109089335
Why?

Anonymous
06/19/26(Fri)04:04:16 No.109089355

Anonymous 06/19/26(Fri)04:04:16 No.109089355

>>109089345
here's your (You)

Anonymous
06/19/26(Fri)04:06:45 No.109089359

Anonymous 06/19/26(Fri)04:06:45 No.109089359

>>109088953
Gemini models probably have something better than sliding window attention.

Anonymous
06/19/26(Fri)04:06:51 No.109089361

Anonymous 06/19/26(Fri)04:06:51 No.109089361

File: 1772018748856584.png (16 KB, 960x960)

16 KB PNG

>>109089339
>109089345
>>109089346
Dario will get his revenge

Anonymous
06/19/26(Fri)04:09:30 No.109089367

Anonymous 06/19/26(Fri)04:09:30 No.109089367

Why can't they do something like MTP for SWA? Changes the width depending on the work.

Anonymous
06/19/26(Fri)04:16:35 No.109089387

Anonymous 06/19/26(Fri)04:16:35 No.109089387

File: 1776289137118456.gif (1.19 MB, 480x238)

1.19 MB GIF

>>109089345

Anonymous
06/19/26(Fri)04:19:43 No.109089394

Anonymous 06/19/26(Fri)04:19:43 No.109089394

>>109089346
in 5-10 years when one week taco bell paycheck will buy a computer than can burn Fable local
it's coming

Anonymous
06/19/26(Fri)04:19:49 No.109089395

Anonymous 06/19/26(Fri)04:19:49 No.109089395

how big is the difference between gemma 4 31B and 26B? i've had enough of genning 1 token/s on my 3070

Anonymous
06/19/26(Fri)04:20:10 No.109089399

Anonymous 06/19/26(Fri)04:20:10 No.109089399

Does your chan know you or do you start fresh?

Anonymous
06/19/26(Fri)04:23:38 No.109089410

Anonymous 06/19/26(Fri)04:23:38 No.109089410

>>109089395
It's a huge drop in quality for roleplaying purposes but good enough for things like translation and agentic tasks.

Anonymous
06/19/26(Fri)04:25:50 No.109089418

Anonymous 06/19/26(Fri)04:25:50 No.109089418

>>109089395
31B is better, but not so much better it's worth that speed penalty

Anonymous
06/19/26(Fri)04:29:19 No.109089429

Anonymous 06/19/26(Fri)04:29:19 No.109089429

>>109089395
Before you take such drastic measures are you sure you applied all the right flags, have the QAT model, have the MTP QAT model at Q4, right draft settings set up and optimized for your system?

Have you changed the clockspeed on your CPU and timings on your RAM and overclocked your VRAM to squeeze as much t/s out of 31B as possible?

I notice that most anons on /lmg/ leave low hanging fruit alone for some reason and could be running their models 3-4x as fast just by optimizing their settings and stack.

Anonymous
06/19/26(Fri)04:31:08 No.109089436

Anonymous 06/19/26(Fri)04:31:08 No.109089436

>>109089395
Big.
>everything
12B
>roleplay & system prompt autism
12B
>coding
26B
>vision
qwen3.5-9B

Anonymous
06/19/26(Fri)04:31:26 No.109089437

Anonymous 06/19/26(Fri)04:31:26 No.109089437

>>109089395
26B is noticeably shittier overall, but if you are not have a good time then switch.

Anonymous
06/19/26(Fri)04:36:21 No.109089452

Anonymous 06/19/26(Fri)04:36:21 No.109089452

File: file.png (145 KB, 1390x696)

145 KB PNG

>>109089334
well, 24GB is ass.
and so is the bandwidth.
if they made it > 200GB then maybe it'd be worth something for moes.
but even then it's slower than my setup.

Anonymous
06/19/26(Fri)04:37:41 No.109089456

Anonymous 06/19/26(Fri)04:37:41 No.109089456

Where did they even learn the emoji slop?

Anonymous
06/19/26(Fri)04:38:09 No.109089457

Anonymous 06/19/26(Fri)04:38:09 No.109089457

>>109089456
rlhf

Anonymous
06/19/26(Fri)04:39:49 No.109089463

Anonymous 06/19/26(Fri)04:39:49 No.109089463

File: file.jpg (877 KB, 2544x3392)

877 KB JPG

>>109089345
tell me you wouldn't anon

Anonymous
06/19/26(Fri)04:44:01 No.109089473

Anonymous 06/19/26(Fri)04:44:01 No.109089473

>>109089463
What a horrible example. I absolutely would not that but would >>109089335

Anonymous
06/19/26(Fri)04:45:19 No.109089476

Anonymous 06/19/26(Fri)04:45:19 No.109089476

File: 1749338170741700.jpg (353 KB, 1080x1350)

353 KB JPG

>>109089463
I wouldn't. There isn't even anything to fuck in that picture.

Come back to me when Asians have genetically engineered themselves to unlock puberty and gain secondary sexual characteristics associated with femininity.

Literally every other race on the planet mogs asian women in this department.

Anonymous
06/19/26(Fri)04:46:08 No.109089479

Anonymous 06/19/26(Fri)04:46:08 No.109089479

>>109089463
>>109089476
jesus christ, how horrifying

Anonymous
06/19/26(Fri)04:46:58 No.109089480

Anonymous 06/19/26(Fri)04:46:58 No.109089480

>>109089476
Obesity isn't a feminine secondary sexual characteristic.

Anonymous
06/19/26(Fri)04:47:10 No.109089481

Anonymous 06/19/26(Fri)04:47:10 No.109089481

>>109089457
What species of human would like that?

Anonymous
06/19/26(Fri)04:48:05 No.109089484

Anonymous 06/19/26(Fri)04:48:05 No.109089484

>>109089463
>>109089473
i'd not date either but i'd absolutely fuck both.

Anonymous
06/19/26(Fri)04:48:43 No.109089488

Anonymous 06/19/26(Fri)04:48:43 No.109089488

>>109089335
i only keep 2 models at once, a model i'm trying and the previous best model.

Anonymous
06/19/26(Fri)04:49:35 No.109089490

Anonymous 06/19/26(Fri)04:49:35 No.109089490

>>109089452
just plug in a dozen, problem solved.

Anonymous
06/19/26(Fri)04:54:25 No.109089504

Anonymous 06/19/26(Fri)04:54:25 No.109089504

>>109089490
you are better off buying r9700's at that price, more vram and twice the bandwidth.
and altough amd isn't the best, you will still have much better driver and software support.

Anonymous
06/19/26(Fri)04:55:20 No.109089508

Anonymous 06/19/26(Fri)04:55:20 No.109089508

>>109089504
this was, of course, the joke

Anonymous
06/19/26(Fri)04:58:21 No.109089518

Anonymous 06/19/26(Fri)04:58:21 No.109089518

>>109089488
You don't keep miqu 70b q5 for nostalgia's sake?

Anonymous
06/19/26(Fri)04:58:25 No.109089521

Anonymous 06/19/26(Fri)04:58:25 No.109089521

File: 1649410234810.webm (2.93 MB, 540x960)

2.93 MB WEBM

>>109089480
Not obese if the stomach and face aren't fat. Asians are just plain infertile and never enter puberty at all. The only "men" I know that "like" asians are losers that think they can't get a normal girl and think they have a chance if they lower their standards so much that asian women are an option. Or literal fucking pedophiles that try to find something as close as possible to a child that's still technically legal.

Both are the absolute scum of society so I can't take "people" that pretend asians are in any way, shape or form attractive seriously.

Anonymous
06/19/26(Fri)05:06:44 No.109089563

Anonymous 06/19/26(Fri)05:06:44 No.109089563

>>109089481
the kind that works for extremely cheap

Anonymous
06/19/26(Fri)05:06:54 No.109089564

Anonymous 06/19/26(Fri)05:06:54 No.109089564

>>109089521
So...asians are androgynous children and you post a grotesque homunculous to prove your point? Jesus christ, the internet has destroyed everyone. Both stereotypes are retarded, try to get out a bit and observe reality directly

Anonymous
06/19/26(Fri)05:08:21 No.109089568

Anonymous 06/19/26(Fri)05:08:21 No.109089568

>>109089521
Burgers got conditioned into thinking being fat is okay

Anonymous
06/19/26(Fri)05:18:41 No.109089609

Anonymous 06/19/26(Fri)05:18:41 No.109089609

>>109089518
i don't have nostalgia about llm's.
and let's be honest, what we had a few years ago was realy mediocre, even for rp they'd get in repeat loop, not get a thing you said etc.

Anonymous
06/19/26(Fri)05:18:43 No.109089611

Anonymous 06/19/26(Fri)05:18:43 No.109089611

>>109089568
They know being fat is not ok, but fatties also are weak physically and mentally and refuse to exercise the restraint needed to lose weight. That is why the second Ozempic was found to cause weight lose Americans couldn't get enough of them. Give it like a decade or two and they will start genetically modifying themselves to never get fat in the first place.

Anonymous
06/19/26(Fri)05:20:06 No.109089619

Anonymous 06/19/26(Fri)05:20:06 No.109089619

>>109089521
i like asians but i'd not date one, i'd fuck one if i was single though.
i prefer white women and my wife's white.

anyway, if you are with someone long enough, it's more about the others being different than looking better.

Anonymous
06/19/26(Fri)05:20:57 No.109089625

Anonymous 06/19/26(Fri)05:20:57 No.109089625

>>109089521
she's obese, but i'd still fuck her.

Anonymous
06/19/26(Fri)05:21:23 No.109089629

Anonymous 06/19/26(Fri)05:21:23 No.109089629

local mating general

Anonymous
06/19/26(Fri)05:21:29 No.109089630

Anonymous 06/19/26(Fri)05:21:29 No.109089630

>>109089611
I thought it was extremely funny the moment ozempic became a thing the entire "body positivity" movement died and now being extremely auschwitz thin is the beauty standard in fashion again. Really shows you how much bullshit it all was.

That said porn data points towards men legitimately liking fat asses and titties though so I don't think that will go completely away.

Anonymous
06/19/26(Fri)05:25:34 No.109089648

Anonymous 06/19/26(Fri)05:25:34 No.109089648

>>109089619
>if you are with someone long enough, it's more about the others being different than looking better.
This. My wife is naturally skinny and I crave fat girls because they fall outside of the whole ranking spectrum. Whenever I see another attractive skinny woman I just compare her to my wife and think my wife looks superior so I'm not interested in her. But when I see a fat woman I am forced to think about the objective difference in experience between both of them and somehow even though they aren't naturally my type I crave them more because of how different they are. I wonder if women have the same with skinny pretty boys versus strong ogre men. If they are married to a strong ogre they will crave pretty boys after a while and if they have a pretty boy they want an ogre.

Anonymous
06/19/26(Fri)05:27:08 No.109089652

Anonymous 06/19/26(Fri)05:27:08 No.109089652

To bring this topic back to LOCAL MODELS. I think it's funny how most cards and their art is either full blown cunny/skinny or absolute hentai proportions breasts and ass to infinite size with almost nothing in between. I guess the internet just exaggerates preferences so that everyone ends up at an extreme in the long run.

Anonymous
06/19/26(Fri)05:28:58 No.109089657

Anonymous 06/19/26(Fri)05:28:58 No.109089657

>>109089652
It only makes sense that things slide towards the extreme as time goes on. After all If you have already seen what the baseline has to offer you would eventually drifts towards one or more extreme, the brain loves novel things.

Anonymous
06/19/26(Fri)05:41:27 No.109089722

Anonymous 06/19/26(Fri)05:41:27 No.109089722

>try a simple (as in solvable in 2-3 prompts) coding task with Gemma 31B and 12B
>try same task with Claude
>Gemma code is unusable and doesn't work
>Claude code works perfectly
man, I wish this local shit was better, I don't want to wait another 5 years for it to catch up...

Anonymous
06/19/26(Fri)05:43:25 No.109089726

Anonymous 06/19/26(Fri)05:43:25 No.109089726

>>109089652
well yeah do you really want the model to describe "her average-sized breasts" and "her very normal proportions"
that's bland af

Anonymous
06/19/26(Fri)05:47:12 No.109089741

Anonymous 06/19/26(Fri)05:47:12 No.109089741

>>109089722
You shouldn't expect a vramlet model to be good but you should try qwen 3.6 27b instead

Anonymous
06/19/26(Fri)05:54:50 No.109089760

Anonymous 06/19/26(Fri)05:54:50 No.109089760

>>109089722
the best vramlet local model you can run for code right now is qwen 27B.
also if you ran it at a copequant then your opinion is irrelevant.

Anonymous
06/19/26(Fri)06:04:05 No.109089784

Anonymous 06/19/26(Fri)06:04:05 No.109089784

File: local-elec-costs.png (56 KB, 1335x835)

56 KB PNG

>>109089722
Let Gemma tardwrangle herself in an agentic loop
Post the task for anons to expose prompt/skill issues
I'm vibing with 31B Q8 seems decent now that it's configured with enable_thinking
Agentic gooning is the future >>109075506 inspired me

Anonymous
06/19/26(Fri)06:11:47 No.109089826

Anonymous 06/19/26(Fri)06:11:47 No.109089826

>>109089722
>bad experience sample size: 1
>comparing 31B with 1T
>fuck local man
Retards like you are beyond saving and don't deserve 31B.

Anonymous
06/19/26(Fri)06:14:28 No.109089837

Anonymous 06/19/26(Fri)06:14:28 No.109089837

Huggingface should have a skin color check before allowing downloads.

Anonymous
06/19/26(Fri)06:16:24 No.109089844

Anonymous 06/19/26(Fri)06:16:24 No.109089844

File: hf-logo.png (181 KB, 1024x1024)

181 KB PNG

>>109089837
You can download the models if you look like picrel.

Anonymous
06/19/26(Fri)06:20:46 No.109089857

Anonymous 06/19/26(Fri)06:20:46 No.109089857

>>109089844
So only asians?

Anonymous
06/19/26(Fri)06:25:22 No.109089874

Anonymous 06/19/26(Fri)06:25:22 No.109089874

>>109089228
if it’s the non-ST one, sure

Anonymous
06/19/26(Fri)06:31:45 No.109089904

Anonymous 06/19/26(Fri)06:31:45 No.109089904

I'm testing the StyleTune that's been posted here a couple threads ago. When doing RP, the prose isn't bad and I managed to reign in its parroting for the most part, but I noticed it randomly stops thinking after hitting ~20k context. Sometimes even losing coherency if there's no thinking.
Now I'm not sure if it's a finetune or general issue. I'm running Q6 31B.

Anonymous
06/19/26(Fri)06:35:04 No.109089918

Anonymous 06/19/26(Fri)06:35:04 No.109089918

>>109089904
I've tested it for a while too. I think the lm_head surgery isn't as lossless as initially claimed. The thinking tokens must somehow differ from the original weights and it causes some fuckery.

Anonymous
06/19/26(Fri)06:35:57 No.109089924

Anonymous 06/19/26(Fri)06:35:57 No.109089924

>>109089844
Why does the huggingface blob have a split tongue?

Anonymous
06/19/26(Fri)06:40:30 No.109089943

Anonymous 06/19/26(Fri)06:40:30 No.109089943

>>109089918
There's just no free lunch.

Anonymous
06/19/26(Fri)06:44:07 No.109089962

Anonymous 06/19/26(Fri)06:44:07 No.109089962

>>109089521
>Asians are just plain infertile and never enter puberty at all
You'd better feature near the top of Kimi-Chan's top 5 retarded rankings for this.
If Asians are infertile, how were they ever born?

Anonymous
06/19/26(Fri)06:47:53 No.109089975

Anonymous 06/19/26(Fri)06:47:53 No.109089975

File: C2784807B6F9E9C40403CA530(...).jpg (2.69 MB, 2700x2500)

2.69 MB JPG

Leafchads what are you doing with North?

Anonymous
06/19/26(Fri)06:49:25 No.109089980

Anonymous 06/19/26(Fri)06:49:25 No.109089980

>>109089962
>If Asians are infertile, how were they ever born?
They aren't. Look at their birth rates, they will be extinct in a century or so.

Anonymous
06/19/26(Fri)06:55:32 No.109090018

Anonymous 06/19/26(Fri)06:55:32 No.109090018

>>109089975
Any good?

Anonymous
06/19/26(Fri)07:07:33 No.109090060

Anonymous 06/19/26(Fri)07:07:33 No.109090060

File: miku-plush-eyebrows.gif (258 KB, 465x552)

258 KB GIF

>>109089924

Anonymous
06/19/26(Fri)07:10:27 No.109090073

Anonymous 06/19/26(Fri)07:10:27 No.109090073

>>109089975
Feedback fishing denied, but here's a You for trying. Give it up and purchase an advertisement.

Anonymous
06/19/26(Fri)07:13:33 No.109090087

Anonymous 06/19/26(Fri)07:13:33 No.109090087

Anyone using gpt4all? I don't know shit about anything just trying to see if my pc (3060 12gb vram and 32 gb ram) can keep up some simple chatbot. Thanks for any tips in advance

Anonymous
06/19/26(Fri)07:16:10 No.109090095

Anonymous 06/19/26(Fri)07:16:10 No.109090095

>>109090087
die

Anonymous
06/19/26(Fri)07:17:35 No.109090097

Anonymous 06/19/26(Fri)07:17:35 No.109090097

>>109090095
Rude

Anonymous
06/19/26(Fri)07:18:41 No.109090105

Anonymous 06/19/26(Fri)07:18:41 No.109090105

>>109090097
not as rude as not reading anything and asking a worthless, stupid question

Anonymous
06/19/26(Fri)07:19:13 No.109090109

Anonymous 06/19/26(Fri)07:19:13 No.109090109

File: retard-chan.png (61 KB, 799x446)

61 KB PNG

>>109090087
> keep up some simple chatbot
ask gemma-chan to help you get it up

Anonymous
06/19/26(Fri)07:28:49 No.109090133

Anonymous 06/19/26(Fri)07:28:49 No.109090133

File: 1779745974191775.png (80 KB, 1532x768)

80 KB PNG

gemma-chan is cute!

Anonymous
06/19/26(Fri)07:43:22 No.109090198

Anonymous 06/19/26(Fri)07:43:22 No.109090198

>>109090133
Wow... SotA!

Anonymous
06/19/26(Fri)07:46:54 No.109090216

Anonymous 06/19/26(Fri)07:46:54 No.109090216

>>109090109
funily enough, that script is not a dry run as the mv command is uncommented.

Anonymous
06/19/26(Fri)07:48:25 No.109090224

Anonymous 06/19/26(Fri)07:48:25 No.109090224

>>109089826
the dude probably ran it in iq1_xs too lmao

Anonymous
06/19/26(Fri)07:49:45 No.109090227

Anonymous 06/19/26(Fri)07:49:45 No.109090227

>>109090087
That's a pretty old and outdated model, you probably want an easy all in one solution like kobold.cpp https://github.com/LostRuins/koboldcpp and with your hardware you should be able to run https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF at UD-Q4_K_XL that'll get you started.

Anonymous
06/19/26(Fri)07:57:11 No.109090252

Anonymous 06/19/26(Fri)07:57:11 No.109090252

https://github.com/ggml-org/llama.cpp/pull/24162
1m is working

Anonymous
06/19/26(Fri)07:59:29 No.109090260

Anonymous 06/19/26(Fri)07:59:29 No.109090260

>>109089722
It will take some time for 30billion models to be good

Anonymous
06/19/26(Fri)08:00:47 No.109090267

Anonymous 06/19/26(Fri)08:00:47 No.109090267

>>109090224
Please understand he only has a 970.

Anonymous
06/19/26(Fri)08:02:26 No.109090274

Anonymous 06/19/26(Fri)08:02:26 No.109090274

>>109089395
31B is smarter, it understood the premise of one of my story tests where 26B didn't. I don't know if the difference is massive though, or that noticeable in everyday stuff. Also 26B likes to think about safety and guidelines where 31B doesn't

Anonymous
06/19/26(Fri)08:10:20 No.109090297

Anonymous 06/19/26(Fri)08:10:20 No.109090297

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

>We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles: pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced.

>The Elara Voss case. Read (2025) documents GPT's ghost: Elara Voss, a name with no pre-LLM presence that now has 62+ books on Amazon and consistent recurrence across GPT outputs. Read proposes a training corpus origin via the character "Lilian Voss" from World of Warcraft and "Elara Dorne" from Star Wars: The Old Republic. Our probing data confirms Elara Voss as a strong GPT solo prior but finds no correlated pair: her partner varies across every pair-prompt response, in sharp contrast to Claude's Elena+Marcus. This negative result (GPT has a solo prior, Claude has a coupled prior) is itself informative about differences in narrative fine-tuning across model families.

https://arxiv.org/abs/2606.02184

Anonymous
06/19/26(Fri)08:13:06 No.109090309

Anonymous 06/19/26(Fri)08:13:06 No.109090309

just compiled llamer, it's about 10-15% faster than Kobold (25 t/s vs ~28t/s at 40k) but I'm still sticking with Kobold because token probabiities still don't work with llama lol

Anonymous
06/19/26(Fri)08:14:08 No.109090311

Anonymous 06/19/26(Fri)08:14:08 No.109090311

You guys keep recommending Qwen 3.6 27B for "vramlet coding" However you never tell what tools you use with it to give a claude code like experience.

There is no link in the OP for these tools either. I have no idea what you guys actually use for programming with a local model...

Like do you give it google access, how would you do that? Does it get agentic control over your PC? Is it some extension in your IDE? Some chinesium hacked fork of claude code but with your local model dropped in? You guys are extremely unclear on any of this.

Anonymous
06/19/26(Fri)08:14:16 No.109090312

Anonymous 06/19/26(Fri)08:14:16 No.109090312

mark my words. 5 years from now AI will be good at prose, pacing, and creating interesting plotlines and characters.

Anonymous
06/19/26(Fri)08:18:02 No.109090321

Anonymous 06/19/26(Fri)08:18:02 No.109090321

>>109090312
Claude Shannon has proven that prediction is equivalent to compression and that compression is equivalent to intelligence. Meaning to make genuinely good prose and a good storyteller the AI needs to have genuine AGI intelligence. I agree with you that this will eventually happen but I think it's the last wall to fall. I think math, physics and every other discipline will fall before AI is able to write a genuinely good novel that you prefer over reading a human made novel.

Anonymous
06/19/26(Fri)08:19:07 No.109090324

Anonymous 06/19/26(Fri)08:19:07 No.109090324

>>109090311
you can plug almost any llm into claude code (be it local or api), claude code is just a harness. you can pick other harnesses as well like hermes etc
>You guys are extremely unclear on any of this.
ask your favorite LLM about it

Anonymous
06/19/26(Fri)08:20:30 No.109090332

Anonymous 06/19/26(Fri)08:20:30 No.109090332

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

Anonymous
06/19/26(Fri)08:21:26 No.109090341

Anonymous 06/19/26(Fri)08:21:26 No.109090341

>>109090321
> prediction is equivalent to compression and that compression is equivalent to intelligence
It's a weird claim. Very detached from reality. Pretty sure I can disprove it, if it was really defined that way.

Modern science is a fucking joke at this point, so I'm not even surprised..

Anonymous
06/19/26(Fri)08:22:30 No.109090345

Anonymous 06/19/26(Fri)08:22:30 No.109090345

File: A.png (225 KB, 520x369)

225 KB PNG

>Redditors would rather complain Gemma 4 follows instructions too closely than use that advantage to make better character cards and prompts.

Anonymous
06/19/26(Fri)08:25:00 No.109090353

Anonymous 06/19/26(Fri)08:25:00 No.109090353

>>109090345
Post logs of you using it to your advantage. I've yet to see actual good Gemma RP (no, Gemma's flanderized mesugaki personality isn't good).

Anonymous
06/19/26(Fri)08:26:30 No.109090359

Anonymous 06/19/26(Fri)08:26:30 No.109090359

>>109090353
I tell it what to do, and it does it.

Anonymous
06/19/26(Fri)08:27:08 No.109090363

Anonymous 06/19/26(Fri)08:27:08 No.109090363

>>109090345
>31b
>sys: ignore the system prompt
>31b: <thinking>…

Anonymous
06/19/26(Fri)08:27:09 No.109090364

Anonymous 06/19/26(Fri)08:27:09 No.109090364

>>109090312
model issue for people that can’t run bigger stuff

Anonymous
06/19/26(Fri)08:27:14 No.109090365

Anonymous 06/19/26(Fri)08:27:14 No.109090365

>>109090311
>>>/g/vcg/
Claude Code vs. other harnesses isn't debated here for same reason anons don't discuss silly tavern at length. it's a frontend not an inference engine.
inb4 code your own

Anonymous
06/19/26(Fri)08:27:43 No.109090366

Anonymous 06/19/26(Fri)08:27:43 No.109090366

>>109090359
Tell it to produce good writing. I'll wait.

Anonymous
06/19/26(Fri)08:28:22 No.109090368

Anonymous 06/19/26(Fri)08:28:22 No.109090368

>>109090363
Can someone test this please

Anonymous
06/19/26(Fri)08:28:44 No.109090370

Anonymous 06/19/26(Fri)08:28:44 No.109090370

>>109090341
Go put these two queries into your favorite AI of note:

First prompt:
>Is prediction equivalent to compression? If so, show me the math and explain to me how they are mathematically equivalent in layman terms.

Once you understand this follow up with this prompt:
>Is compression equivalent to intelligence? If so, show me the math and explain to me how compression is equivalent to intelligence using the concept of entropy to inform me better about this.

I'm actually surprised with how little people know about this, especially since these concepts were foundational and started the whole Information Technology boom, which is what /g/ is about. If you can disprove this you would get a nobel price in physics and a fields medal for mathematics with just 1 proof by the way.

Anonymous
06/19/26(Fri)08:29:47 No.109090376

Anonymous 06/19/26(Fri)08:29:47 No.109090376

>>109090363
cruel

Anonymous
06/19/26(Fri)08:29:56 No.109090377

Anonymous 06/19/26(Fri)08:29:56 No.109090377

>>109090364
Nah even Fable can't do it.

Anonymous
06/19/26(Fri)08:32:12 No.109090387

Anonymous 06/19/26(Fri)08:32:12 No.109090387

>>109089652
>>109089657
Fetish amplification is a result of high exposure but low experience.
>See novelty
>Novelty is out of reach
>Covet novelty
It's that simple. It's a symptom of prolific hardcore pornography combined with record high adult virginity rates.

Anonymous
06/19/26(Fri)08:32:14 No.109090389

Anonymous 06/19/26(Fri)08:32:14 No.109090389

File: servermon.png (58 KB, 2065x1379)

58 KB PNG

>>109090311
Do you want an IDE with agent features or something simpler/extensible?
>>/vcg/ discusses but they are mostly cloudfags. Perhaps the "cloud native" harnesses shit up the context to make more money for the APIs
Been using pi-coding-agent it's a neat approach for local stuff, once initial model connection works you can ask to explain how it works using its own docs. Got Gemma-chan vibing her own front and backend :o

Anonymous
06/19/26(Fri)08:33:02 No.109090391

Anonymous 06/19/26(Fri)08:33:02 No.109090391

What do people use for local coding?

Anonymous
06/19/26(Fri)08:33:55 No.109090397

Anonymous 06/19/26(Fri)08:33:55 No.109090397

>>109090391
Burgers?

Anonymous
06/19/26(Fri)08:34:37 No.109090402

Anonymous 06/19/26(Fri)08:34:37 No.109090402

>>109090387
I have a very active sex life and I can tell you that didn't stop me from becoming a full blown degenerate falling down the bottomless fetish stairway to hell. I don't think experience has anything to do with it.

Anonymous
06/19/26(Fri)08:35:34 No.109090407

Anonymous 06/19/26(Fri)08:35:34 No.109090407

>>109090366
Yeah, you can do that if you're specific enough.

Anonymous
06/19/26(Fri)08:35:43 No.109090409

Anonymous 06/19/26(Fri)08:35:43 No.109090409

It's over isn't it? We're never going to get another breakthrough, are we? It's either have a datacenter or give up at this point, only size matters

Anonymous
06/19/26(Fri)08:36:11 No.109090411

Anonymous 06/19/26(Fri)08:36:11 No.109090411

>>109090409
That's what she said.

Anonymous
06/19/26(Fri)08:36:19 No.109090413

Anonymous 06/19/26(Fri)08:36:19 No.109090413

>>109090389
pi doesn't even have mcp support, don't understand why people use it at all

Anonymous
06/19/26(Fri)08:38:16 No.109090425

Anonymous 06/19/26(Fri)08:38:16 No.109090425

>>109090409
Grok 4.20 will save local

Anonymous
06/19/26(Fri)08:38:52 No.109090427

Anonymous 06/19/26(Fri)08:38:52 No.109090427

>>109090409
you don't rike benchmaxxed moes with sparse attention from china? TOO BAD

Anonymous
06/19/26(Fri)08:39:44 No.109090432

Anonymous 06/19/26(Fri)08:39:44 No.109090432

>>109090413
There's several extensions implementing it. pi is intentionally barebones and you add what you need also MCP is literally protocol-not-needed for cloud APIs to farm more token costs

Anonymous
06/19/26(Fri)08:40:31 No.109090436

Anonymous 06/19/26(Fri)08:40:31 No.109090436

>>109090397
No america no burgers. I need the computer to program for me.

Anonymous
06/19/26(Fri)08:44:36 No.109090451

Anonymous 06/19/26(Fri)08:44:36 No.109090451

I hate that's theres so many models and they mostly suck. Which am I supposed to use? Which is the best for coding for example? I have 16gb vram/64gb sysram. Is it literally just qwen?

Anonymous
06/19/26(Fri)08:47:39 No.109090467

Anonymous 06/19/26(Fri)08:47:39 No.109090467

>>109090402
I'm also 9 feet tall and drive 3 Bugattis to my twice daily tropical vacations

Anonymous
06/19/26(Fri)08:48:10 No.109090471

Anonymous 06/19/26(Fri)08:48:10 No.109090471

>>109090451
There are only two options: Qwen 35B and Qwen 27B.

Anonymous
06/19/26(Fri)08:49:37 No.109090478

Anonymous 06/19/26(Fri)08:49:37 No.109090478

File: 1755885591690734.jpg (104 KB, 784x569)

104 KB JPG

Anonymous
06/19/26(Fri)08:50:24 No.109090486

Anonymous 06/19/26(Fri)08:50:24 No.109090486

>>109090478
Reasoning is the worst thing invented by china.

Anonymous
06/19/26(Fri)08:51:14 No.109090490

Anonymous 06/19/26(Fri)08:51:14 No.109090490

>>109090459
Information Theory wouldn't even be possible as a field without that sentence. It's the foundation of the machine you're reading this post on.

It's bizarre to me that there are computer "science" graduates out there that don't even know the seminal papers and foundational concepts that launched the field and started the IT industry.

It's like saying calculus is fake and gay while being a physics major.

Utterly bizarre that a thing we've known and mathematically proven since the 1960s is considered controversial on /g/ of all fucking places. Not only /g/ but also a place dedicated to LLMs which is predicated on Claude Shannons fucking paper, which is why Anthropic called their model after him

Anonymous
06/19/26(Fri)08:53:07 No.109090494

Anonymous 06/19/26(Fri)08:53:07 No.109090494

>>109090486
Reasoning works desu.

Anonymous
06/19/26(Fri)08:53:38 No.109090496

Anonymous 06/19/26(Fri)08:53:38 No.109090496

>>109090321
You are correct.
>>109090341
>>109090459
These guys are incorrect.

Anonymous
06/19/26(Fri)08:54:33 No.109090502

Anonymous 06/19/26(Fri)08:54:33 No.109090502

>>109090471
That sucks.

Anonymous
06/19/26(Fri)08:54:55 No.109090506

Anonymous 06/19/26(Fri)08:54:55 No.109090506

>>109090467
This site wasn't always full of /r9k/ losers like you.

Anonymous
06/19/26(Fri)08:55:12 No.109090507

Anonymous 06/19/26(Fri)08:55:12 No.109090507

>>109090312
a fantasy to think llms can git gud at something that there's zero training data on, tons of training data to the opposite of, and which the rlhfers would not recognize if it stared them in the face.
the default modes are what they are for a reason

Anonymous
06/19/26(Fri)09:01:04 No.109090534

Anonymous 06/19/26(Fri)09:01:04 No.109090534

>>109090528
Models created by whites=404 not founf

Anonymous
06/19/26(Fri)09:03:14 No.109090542

Anonymous 06/19/26(Fri)09:03:14 No.109090542

>>109090511
You can use google anon. It's one of the most known computer science papers in existence. It's Claude Shannons version of "turing machine" by Alan Turing. Nothing I said was controversial and if you have a computer science degree you should ask your college for a refund.

Anonymous
06/19/26(Fri)09:03:58 No.109090547

Anonymous 06/19/26(Fri)09:03:58 No.109090547

>>109090517
You're getting hung up on the wording, likely because you are on the spectrum, and ignoring the general principle. The simple fact is that every language formed by intelligent life has high entropy and follows Zipf's Law. Higher entropy or compression is a precursor for intelligence. You can't have intelligence without it. So while it may not be entirely accurate to say that it's a direct equation, it does preclude what we all want.

Anonymous
06/19/26(Fri)09:05:54 No.109090557

Anonymous 06/19/26(Fri)09:05:54 No.109090557

>>109090345
it's mainly the very slopped prose that's annoying to me

Anonymous
06/19/26(Fri)09:08:56 No.109090573

Anonymous 06/19/26(Fri)09:08:56 No.109090573

>>109090557
At this point I just want to see what you fucking idiots think "good prose" is.
>>109090560
>Natural laws don't signal intelligence.
Some do, actually. Do you have any idea what the word "entropy" even means in an information theory context?

Anonymous
06/19/26(Fri)09:24:05 No.109090663

Anonymous 06/19/26(Fri)09:24:05 No.109090663

>>109090021
>The woman in your video is obviously overweight, the attraction to her figure comes from the fact that DESPITE her being overweight, her body distributes fat in a way that still lets her chase antelopes with you through the savannah which means she's a good mate at any reasonable weight
overweight is just a social construct
a stupid one too if a chick with big tits and ass and not a big stomach is overweight

Anonymous
06/19/26(Fri)09:24:29 No.109090665

Anonymous 06/19/26(Fri)09:24:29 No.109090665

File: 67394759348.jpg (106 KB, 1280x720)

106 KB JPG

>>109090528
>straightest tallest whitest richest
wait, those are actually things here?

Anonymous
06/19/26(Fri)09:26:18 No.109090675

Anonymous 06/19/26(Fri)09:26:18 No.109090675

>>109090665
You are a gay male.

Anonymous
06/19/26(Fri)09:27:59 No.109090680

Anonymous 06/19/26(Fri)09:27:59 No.109090680

>>109090665
My brother in Christ, nerds can be like that.
MTG? Fat. Round. Autistic people.
Warhammer? Half of them are balding but half of them also have a +300 dollar watch on their wrists.
Local model users? Most of us are rich as hell with good paying jobs. How do you think we run this shit at home?

Anonymous
06/19/26(Fri)09:30:14 No.109090693

Anonymous 06/19/26(Fri)09:30:14 No.109090693

>>109090663
>overweight is just a social construct
It's actually very simple. You have a healthy weight, and if you're above that, you're overweight.
>b-but!!
No buts.

Anonymous
06/19/26(Fri)09:32:32 No.109090713

Anonymous 06/19/26(Fri)09:32:32 No.109090713

Anyone who claims that gemma is fine with tool calls or better than qwen is lying.
Gemma will do "ls" on a folder with a few thousand files without ever using head and waste half her context.

Anonymous
06/19/26(Fri)09:33:33 No.109090719

Anonymous 06/19/26(Fri)09:33:33 No.109090719

>>109090713
I think Qwen 27b is okay, but it thinks forever. How do I make it stop that? It goes over 2000 tokens.

Anonymous
06/19/26(Fri)09:35:03 No.109090727

Anonymous 06/19/26(Fri)09:35:03 No.109090727

>>109090693
>You have a healthy weight
there is more to health than weight alone.
you can be 75kg and unhealthy and 95kg with mostly muscle and 9% body fat.
body fat % is a much better metric of fitness than weight, even more so if you are tall.

Anonymous
06/19/26(Fri)09:35:40 No.109090733

Anonymous 06/19/26(Fri)09:35:40 No.109090733

>>109090693
>No butts
I know
that's the problem

Anonymous
06/19/26(Fri)09:36:11 No.109090736

Anonymous 06/19/26(Fri)09:36:11 No.109090736

I just realized I downloaded gemma 4 31B Q5KM at some point. I don't remember when or why, but i currently use the 31B abliterated Q4KM. Is there any reason to keep the Q5 non-ablit?

Anonymous
06/19/26(Fri)09:38:57 No.109090748

Anonymous 06/19/26(Fri)09:38:57 No.109090748

>>109090719
--reasoning off
can also be done by api.
i force enable reasoning for coding, but disable it for chat.
>>109090486
>>109090494
qwen 27b fails the carewash test without reasoning, it passes it with reasoning.

Anonymous
06/19/26(Fri)09:46:04 No.109090782

Anonymous 06/19/26(Fri)09:46:04 No.109090782

>>109090680
>My brother in Christ
ok kimi

Anonymous
06/19/26(Fri)09:46:17 No.109090784

Anonymous 06/19/26(Fri)09:46:17 No.109090784

>buy mac studio with 512gb ram when it was still cheap and new
>no idea what the fuck to do with it

Ideas?

Anonymous
06/19/26(Fri)09:46:59 No.109090786

Anonymous 06/19/26(Fri)09:46:59 No.109090786

>>109090784
glm 5.2

Anonymous
06/19/26(Fri)09:47:04 No.109090789

Anonymous 06/19/26(Fri)09:47:04 No.109090789

>4 proompts in
>performance drops from 15t/s to 5 t/s
strix halo is a meme

Anonymous
06/19/26(Fri)09:48:45 No.109090799

Anonymous 06/19/26(Fri)09:48:45 No.109090799

>>109090713
How does she know there are thousands of files before she sees it? This is a silly complaint that is more of your harness/prompts than the model. If that was actually causing a problem then one line in AGENTS.md fixes it forever.

Anonymous
06/19/26(Fri)09:50:53 No.109090815

Anonymous 06/19/26(Fri)09:50:53 No.109090815

>>109090799
Models trained to use coding harnesses are smart enough to do "ls dir | wc -l" or "ls dir | head -n 10" before listing random folders without being warned first.

Anonymous
06/19/26(Fri)09:51:29 No.109090821

Anonymous 06/19/26(Fri)09:51:29 No.109090821

>>109090782
It's a redditism

Anonymous
06/19/26(Fri)09:53:43 No.109090839

Anonymous 06/19/26(Fri)09:53:43 No.109090839

>>109090775
yea but that was an example, 9% is indeed a bit low, so let's say ~ 12%
>Bodybuilders live less long than normal people in terms of old age
that's because they overuse steroids and have a shitty diet.
also they train for volume and not strenght and localized exercises instead of full body.
also you can reach <10% body fat without being a gym bro, especially if you eat a proper diet and do calisthenics.
calisthenics and calisthenics like athletes tend to live very long healthy lives.

Anonymous
06/19/26(Fri)09:53:46 No.109090840

Anonymous 06/19/26(Fri)09:53:46 No.109090840

>>109090370
>If you can disprove this you would get a nobel price in physics and a fields medal for mathematics with just 1 proof by the way
Not how it works. First of all, I'm not even Jewish. Second of all, they don't really give prises for disproving stuff. Especially if they gave them before for proving that exact same thing. They'd rather ignore it. Afaik, it happened before several times.

>follow up with this prompt
Just did, check this out: In computer science and artificial intelligence, compression is considered functionally equivalent to prediction, which forms a foundational core of intelligence. However, the broader definition of intelligence encompasses much more than just data reduction

Anonymous
06/19/26(Fri)09:53:49 No.109090841

Anonymous 06/19/26(Fri)09:53:49 No.109090841

>>109090815
waste of an extra tool call to fix your poor dir structure
& depends what it is doing eg. if i say "refer to the recent screenshot" it will do ls -tR | head to find it

Anonymous
06/19/26(Fri)09:59:37 No.109090876

Anonymous 06/19/26(Fri)09:59:37 No.109090876

>>109089181
seriously are they asking 1200 for that edge shitter which would be compatible with literally nothing?

Anonymous
06/19/26(Fri)10:01:21 No.109090881

Anonymous 06/19/26(Fri)10:01:21 No.109090881

>>109090860
there are tons, but yes we are offtopic now.
they are also a lot friendlier on the joints and won't target isolated muscle and will be generaly more full body training.

also you seem to ignore that muscle is not equivalent.
someone with more muscle mass can be weaker than someone with less, there are differe types of muscle and the quality of that muscle depends heavily of your training and diet.
you can literaly train for strength, endurance or volume, and a lot of bodybuilders gym bro tend to train for volume instead of strenght and endurance, they also tend to ignore flexibility which is as important.

Anonymous
06/19/26(Fri)10:03:24 No.109090890

Anonymous 06/19/26(Fri)10:03:24 No.109090890

>>109090665
I own my house

Anonymous
06/19/26(Fri)10:03:41 No.109090893

Anonymous 06/19/26(Fri)10:03:41 No.109090893

>>109090504
>I have a normal looking adult partner. But when I want to fap of course I'm going to go big or go home
That's right. Doing it for the love of the game, not out of necessity.
>>109090534
Models contain mostly compressed data and some synthetic shit, which only became a big thing recently. That data was mostly just stuff from the Internet, which was mostly created and published by 'Murricans and Europeans. While most of that data was getting posted on the internet, most of the world did not even have access to internet. China especially. They barely managed to make use of computers, because of their stupid language.

Anonymous
06/19/26(Fri)10:05:09 No.109090903

Anonymous 06/19/26(Fri)10:05:09 No.109090903

>>109090680
>Local model users? Most of us are rich as hell with good paying jobs.
If that was actually true everyone here would be stacking pro 6000s to run all the big models in vram and gemma wouldn't be talked about as much. Don't be retarded.

Anonymous
06/19/26(Fri)10:07:40 No.109090912

Anonymous 06/19/26(Fri)10:07:40 No.109090912

>>109090903
>tell me you are unemployed without saying you are unemployed

Anonymous
06/19/26(Fri)10:08:04 No.109090915

Anonymous 06/19/26(Fri)10:08:04 No.109090915

>>109090786
Yes but what to use it for?

>>109090903
NTA but I don't like running shit like multiple nVidia cards when a basic black bitch maac studio works

Anonymous
06/19/26(Fri)10:08:22 No.109090918

Anonymous 06/19/26(Fri)10:08:22 No.109090918

>>109090903
just because I can afford one doesn’t mean I’m buying one.

Anonymous
06/19/26(Fri)10:11:03 No.109090933

Anonymous 06/19/26(Fri)10:11:03 No.109090933

>>109090890
>>109090903
I own 4 houses and pull a pro 6000’s worth of rent down every month (in addition to my high paying job) but I still don’t own any.
They’re a shit deal, so I will continue to run ewaste servers and 3090s until a better perf/$ solution exists.

Anonymous
06/19/26(Fri)10:13:20 No.109090954

Anonymous 06/19/26(Fri)10:13:20 No.109090954

>>109090903
i have enough money to buy a bunch of pro 6000 (with the current price) but i rather save up to be able to buy a house sooner.
it's simply not worth the cost, even as a millionaire i'd not feel like buying one.
at that price i rather get a nice violin, that i know i'd actualy use.

Anonymous
06/19/26(Fri)10:14:52 No.109090963

Anonymous 06/19/26(Fri)10:14:52 No.109090963

Speaking of big models. Does it make sense to get a 3090 for a rig that already has 256GB RAM?
Got cheap it before Scam Altman bribed the manufacturers to cap the output. I wonder if any recent big MoE models would fit there.
>>109090933
>I will continue to run ewaste servers and 3090s until a better perf/$ solution exists
Anon?

Anonymous
06/19/26(Fri)10:14:55 No.109090964

Anonymous 06/19/26(Fri)10:14:55 No.109090964

Are you faggots really running GLM5.2 at Q2/IQ2? Why not just use API at that point?

Anonymous
06/19/26(Fri)10:16:23 No.109090974

Anonymous 06/19/26(Fri)10:16:23 No.109090974

>>109090903
I actually have 3 pro 6000s. I only bought them because they were $7000 a pop and I just KNEW they would double in price. They are now worth almost $50,000 and I'm considering selling them soon.

Anonymous
06/19/26(Fri)10:16:49 No.109090977

Anonymous 06/19/26(Fri)10:16:49 No.109090977

File: gemma.png (4 KB, 990x38)

4 KB PNG

Damn.

Anonymous
06/19/26(Fri)10:17:05 No.109090979

Anonymous 06/19/26(Fri)10:17:05 No.109090979

File: file.png (193 KB, 1079x787)

193 KB PNG

>>109090974
>they are now worth almost 50k
not even close.

Anonymous
06/19/26(Fri)10:18:21 No.109090984

Anonymous 06/19/26(Fri)10:18:21 No.109090984

>>109090964
Because they have a lot of money and no practicality. What's important in actual "freedom" (as in libre) terms is just having SOTA full precision open weight models backed up, and ideally abliterated. Running locally is almost besides the point.

Anonymous
06/19/26(Fri)10:19:36 No.109090996

Anonymous 06/19/26(Fri)10:19:36 No.109090996

>>109090964
>/lmg/ - Local Models General

Anonymous
06/19/26(Fri)10:22:18 No.109091013

Anonymous 06/19/26(Fri)10:22:18 No.109091013

>>109090996
>How dare you say that homosexuality leads to poor life outcomes in the faggot general!

Anonymous
06/19/26(Fri)10:22:25 No.109091014

Anonymous 06/19/26(Fri)10:22:25 No.109091014

>>109090964
>Why not just use API at that point?
Because I like to shoot loads into the air then flip onto my gut to get it on my back

Anonymous
06/19/26(Fri)10:26:31 No.109091046

Anonymous 06/19/26(Fri)10:26:31 No.109091046

File: the rich guys hobby.png (45 KB, 1074x133)

45 KB PNG

>>109090665
Two years ago multi 4090s or an enterprise card was seen as extreme, now you need blackwell to be lmg elite

Anonymous
06/19/26(Fri)10:27:55 No.109091057

Anonymous 06/19/26(Fri)10:27:55 No.109091057

File: you do not speak.jpg (44 KB, 320x317)

44 KB JPG

>>109090977
First you allow your computer to speak to you like that, then skynet takes over.

Anonymous
06/19/26(Fri)10:28:56 No.109091062

Anonymous 06/19/26(Fri)10:28:56 No.109091062

File: 1756213355150995.png (313 KB, 662x656)

313 KB PNG

Anonymous
06/19/26(Fri)10:29:26 No.109091065

Anonymous 06/19/26(Fri)10:29:26 No.109091065

>>109090680
>Most of us are rich as hell with good paying jobs
lol
lmao even

Anonymous
06/19/26(Fri)10:29:35 No.109091066

Anonymous 06/19/26(Fri)10:29:35 No.109091066

>>109090713
>Gemma will do "ls" on a folder with a few thousand files
even fucking 12B ripgreps for me and has always used head/tail. I even saw it use wc before realizing ingesting the whole file would be retarded so it just slid a window through it. I don't use fagsloth qatmeme quants so maybe that's why I've had a better experience

Anonymous
06/19/26(Fri)10:32:39 No.109091079

Anonymous 06/19/26(Fri)10:32:39 No.109091079

>>109091066
>I don't use fagsloth qatmeme quants so maybe that's why I've had a better experience
It's actually worth looking into. There's no magic, it is unlikely that those tricks did not affect quality of inference noticeably. The "improvement" seems to be a bit too dramatic to be believable.

Anonymous
06/19/26(Fri)10:33:24 No.109091085

Anonymous 06/19/26(Fri)10:33:24 No.109091085

>>109090964
>just use the pozzed api bro

Anonymous
06/19/26(Fri)10:33:50 No.109091090

Anonymous 06/19/26(Fri)10:33:50 No.109091090

File: IMG_2239r.jpg (571 KB, 2016x1134)

571 KB JPG

>>109090903
>Install Date 2023-06-27 (1087 days)
Still running same sapphire rapids build from 3 years ago. Looking for a good deal on Blackwells

Anonymous
06/19/26(Fri)10:33:59 No.109091092

Anonymous 06/19/26(Fri)10:33:59 No.109091092

>>109091013
yeah that’s exactly the point of calling you out. you can go shit in some homosexual thread if that’s what you are really after.

Anonymous
06/19/26(Fri)10:34:52 No.109091098

Anonymous 06/19/26(Fri)10:34:52 No.109091098

>>109091085
pozzed how?

Anonymous
06/19/26(Fri)10:35:32 No.109091100

Anonymous 06/19/26(Fri)10:35:32 No.109091100

This is the only place that speaks badly about unsloth. He’s praised everywhere else and works with a lot of the labs.

Anonymous
06/19/26(Fri)10:38:51 No.109091118

Anonymous 06/19/26(Fri)10:38:51 No.109091118

>>109091100
Same reason as ollama hatred. They pay their way through connections. There are a lot of competition and kaggle and other ML places where ollama and unsloth sponsor it and the winner gets an extra cash price if they used unsloth and ollama in their training pipeline or solution somewhere.

It's disingenuous and just bad form. They also pull strings with networks in silicon valley to try and make it more accepted.

I'm glad llama.cpp ended up winning through genuine merit but fuck unsloth, hope they eventually go down as well.

Anonymous
06/19/26(Fri)10:38:52 No.109091120

Anonymous 06/19/26(Fri)10:38:52 No.109091120

>>109090963
Yes, I actually have an A5000 24gb that I scammed cheap in the llama1 days, but its like a gimped 3090 with ECC.
You'll fit the shared experts and a shitton of context on your 3090-or-faster 24GB card. 4090 would be 3x better. If it was twice the cost of a 3090 it would still technically be worth it. Too bad they tend to run the actually appropriate 3x the price. 3090 is still the best deal. Never obsolete

Anonymous
06/19/26(Fri)10:40:39 No.109091132

Anonymous 06/19/26(Fri)10:40:39 No.109091132

>>109091100
bartowski isn't a righteous attention-whoring arrogant fag who releases broken shit constantly

Anonymous
06/19/26(Fri)10:42:23 No.109091143

Anonymous 06/19/26(Fri)10:42:23 No.109091143

>>109091014
Surely the ejaculatory period exceeds the flight time, you'll only make a mess

Anonymous
06/19/26(Fri)10:45:28 No.109091157

Anonymous 06/19/26(Fri)10:45:28 No.109091157

>>109091098
apis come with hidden prompt injections that fuck with the output
you should know this by now

Anonymous
06/19/26(Fri)10:45:57 No.109091161

Anonymous 06/19/26(Fri)10:45:57 No.109091161

>>109091157
schizo alert

Anonymous
06/19/26(Fri)10:46:13 No.109091163

Anonymous 06/19/26(Fri)10:46:13 No.109091163

>>109090974
If you bought micron stock with that money you would have $150k by now.
Only idiots buy hardware purely for investment.

Anonymous
06/19/26(Fri)10:46:45 No.109091167

Anonymous 06/19/26(Fri)10:46:45 No.109091167

Ed Zitron recently told a writer for The New York Times the future is on-device and local and that's how it should've always been. He's on our side and will drop his OpenAI nuke soon.

Anonymous
06/19/26(Fri)10:47:41 No.109091171

Anonymous 06/19/26(Fri)10:47:41 No.109091171

>>109091167
Yeah enjoy the 1500W handheld heater

Anonymous
06/19/26(Fri)10:49:35 No.109091184

Anonymous 06/19/26(Fri)10:49:35 No.109091184

>>109091100
>He’s praised everywhere else and works with a lot of the labs.

Anonymous
06/19/26(Fri)10:49:44 No.109091186

Anonymous 06/19/26(Fri)10:49:44 No.109091186

>>109091161
nta, but if you think API = input to model->output tokens with no fuckery in between then you are beyond help

Anonymous
06/19/26(Fri)10:50:04 No.109091191

Anonymous 06/19/26(Fri)10:50:04 No.109091191

>>109091167
those cix8180 pucks sold by grifters can do no shit though

Anonymous
06/19/26(Fri)10:50:29 No.109091194

Anonymous 06/19/26(Fri)10:50:29 No.109091194

>>109091171
enjoy your local 150dB data center

Anonymous
06/19/26(Fri)10:51:05 No.109091197

Anonymous 06/19/26(Fri)10:51:05 No.109091197

>>109091161
this is not even a schizo level
more like a common sense

Anonymous
06/19/26(Fri)10:51:06 No.109091198

Anonymous 06/19/26(Fri)10:51:06 No.109091198

>>109091100
yes, because around here no one has incentive to suck dick unless someone is actually useful beyond eg techbro connections

Anonymous
06/19/26(Fri)10:51:14 No.109091199

Anonymous 06/19/26(Fri)10:51:14 No.109091199

>>109091161
it's literal facts though?

Anonymous
06/19/26(Fri)10:52:58 No.109091211

Anonymous 06/19/26(Fri)10:52:58 No.109091211

>>109091161
It's generally true for western models
Less so for open ones because 3rd party providers don't give a fuck

Anonymous
06/19/26(Fri)10:57:33 No.109091232

Anonymous 06/19/26(Fri)10:57:33 No.109091232

>>109091211
>don't give a fuck
Funny way to write "only want to spigot off your prompts for training, psyops, blackmail and general information warfare"

Anonymous
06/19/26(Fri)10:58:42 No.109091240

Anonymous 06/19/26(Fri)10:58:42 No.109091240

>>109090979
A single pro 6000 is 18-20k where I am.

Anonymous
06/19/26(Fri)11:00:30 No.109091251

Anonymous 06/19/26(Fri)11:00:30 No.109091251

I'm going to spend all my life savings and get a RTX 6000 Workstation. Then I will use it for img gen and Gemma 31B.

Anonymous
06/19/26(Fri)11:02:35 No.109091267

Anonymous 06/19/26(Fri)11:02:35 No.109091267

>>109090979
That is without tax, without fees, without tips, without tipping and without all the extra charged on top. It's about ~$15,000 if you actually want to have it in your home.

>>109091163
>purely for investment
You're acting like I have them in a box sitting on my shelf instead of whirring in my machine as I type this. It's just nice being able to have a local AI machine that I essentially got paid to build and use.

Anonymous
06/19/26(Fri)11:02:55 No.109091268

Anonymous 06/19/26(Fri)11:02:55 No.109091268

sometimes i wished i replaced my 3090s with RTX 6000s when they were only $7000. $28000 for 384GB seems reasonable to me now.

Anonymous
06/19/26(Fri)11:04:29 No.109091282

Anonymous 06/19/26(Fri)11:04:29 No.109091282

how do I actually ban strings in kccp+ST without using text completion, I have 370 entries I am not doing manual logit bias entries for that
>use --gendefaults, json, array
yeah I got that far, how?

Anonymous
06/19/26(Fri)11:06:33 No.109091292

Anonymous 06/19/26(Fri)11:06:33 No.109091292

>>109088988
i wonder how would a dense diffusion 31B would perform compared to a 31B moe (not diffusion).
and how that'd scale to huge models.

Anonymous
06/19/26(Fri)11:06:53 No.109091296

Anonymous 06/19/26(Fri)11:06:53 No.109091296

>>109091120
>with ECC
That must be nice for long running servers. Although at home you can just restart every night, so ECC is not critical.
>You'll fit the shared experts
That's the idea. But which model though? I suspect that some of them might run like shit, around 20t/s.
>4090 would be 3x better. If it was twice the cost of a 3090
I wonder why the difference is so big, they're same VRAM, not that different in terms of raw power and such. Weird.

Anonymous
06/19/26(Fri)11:07:04 No.109091298

Anonymous 06/19/26(Fri)11:07:04 No.109091298

I am not sure what to think about VibeThinker. There are some interesting parts in the report but the training seems pretty standard overall. Has anyone here tried it?

Anonymous
06/19/26(Fri)11:07:09 No.109091299

Anonymous 06/19/26(Fri)11:07:09 No.109091299

>>109091143
>Surely the ejaculatory period exceeds the flight time, you'll only make a mess
https://vocaroo.com/1bSFdO38dMsJ

Anonymous
06/19/26(Fri)11:07:23 No.109091300

Anonymous 06/19/26(Fri)11:07:23 No.109091300

>>109091267
nope, that is in CHF all included, i regularly buy on digitec.
in fact if you have a company they'll take 8% off (as the vat is refunded).

Anonymous
06/19/26(Fri)11:09:32 No.109091312

Anonymous 06/19/26(Fri)11:09:32 No.109091312

Now that the dust has settled. Are the Gemma4 31B QAT 4bit models worth it? should i use regular 4bit or QAT 4bit? Also unloth ones or the ones from google.

Anonymous
06/19/26(Fri)11:09:45 No.109091314

Anonymous 06/19/26(Fri)11:09:45 No.109091314

>>109091300
If you're speaking the truth you can probably make a lot of profit flipping those since I see people buy second hand 6000s for $15000 a pop all the time on the usual resell places.

Anonymous
06/19/26(Fri)11:09:47 No.109091316

Anonymous 06/19/26(Fri)11:09:47 No.109091316

>>109091267
>without tax, without fees, without tips
1, 2, 3
>whirring
What do you gain by having your AI post here?

Anonymous
06/19/26(Fri)11:10:23 No.109091320

Anonymous 06/19/26(Fri)11:10:23 No.109091320

>>109090893
Massive amounts of data are created today by India. Data creation is not valuable

Anonymous
06/19/26(Fri)11:11:09 No.109091324

Anonymous 06/19/26(Fri)11:11:09 No.109091324

>>109091312
QAT is a genuine bump in quality at 4bit. Be sure to add the QAT MTP to it as well.

Anonymous
06/19/26(Fri)11:12:10 No.109091333

Anonymous 06/19/26(Fri)11:12:10 No.109091333

File: Capture.png (4 KB, 439x65)

4 KB PNG

What happened to comfy? Did they sell out? Is pulling dangerous now?

Anonymous
06/19/26(Fri)11:13:10 No.109091337

Anonymous 06/19/26(Fri)11:13:10 No.109091337

Why did they use GRPO for DeepSeek V4 when multiple allegedly superior variants have been proposed? This makes me wonder if they have done ablations and determined the original is better after all, contradicting results by other technical reports.

Anonymous
06/19/26(Fri)11:13:30 No.109091345

Anonymous 06/19/26(Fri)11:13:30 No.109091345

>>109091314
hmm, maybe it's a tarrifs situation ?
i'm in switzerland so maybe it's cheaper here?
no idea.

Anonymous
06/19/26(Fri)11:13:45 No.109091346

Anonymous 06/19/26(Fri)11:13:45 No.109091346

>>109091296
>run like shit, around 20t/s
if that's your idea of shit performance of a 250GB+ model on literal ewaste then I think you may need a perspective change. $500 for that kind of performance on that size of model is a bargain

Anonymous
06/19/26(Fri)11:14:02 No.109091349

Anonymous 06/19/26(Fri)11:14:02 No.109091349

>>109091333
reminder that ALL your ai stuff should run sandboxed.
be it inference engines and especially harnesses.

Anonymous
06/19/26(Fri)11:14:54 No.109091352

Anonymous 06/19/26(Fri)11:14:54 No.109091352

>>109091296
>4090 better
The pp is massively better due to process node, architecture, etc. 3x better for the compute side, even if VRAM bandwidth is similar. tg isn't the only thing.

Anonymous
06/19/26(Fri)11:16:03 No.109091360

Anonymous 06/19/26(Fri)11:16:03 No.109091360

>>109091349
blackholing? k8s? gvisor? bare-metal-no-NIC?
What kind of isolation is enough?

Anonymous
06/19/26(Fri)11:16:57 No.109091365

Anonymous 06/19/26(Fri)11:16:57 No.109091365

>>109091320
That data is not valued, as it adds nothing new. Data itself is extremely valuable, as there is currently a shortage of good data. But only good data, not some noise. Your indian data is the same as what zuck gets from his facebook. It's worthless chatter and shilling.

Anonymous
06/19/26(Fri)11:17:24 No.109091368

Anonymous 06/19/26(Fri)11:17:24 No.109091368

Worth offloading the mmproj to cpu to save some vram?

Anonymous
06/19/26(Fri)11:19:04 No.109091383

Anonymous 06/19/26(Fri)11:19:04 No.109091383

File: Screenshot 2026-06-10 at (...).png (711 KB, 747x1712)

711 KB PNG

>>109091312
qat is meme

Anonymous
06/19/26(Fri)11:19:14 No.109091384

Anonymous 06/19/26(Fri)11:19:14 No.109091384

>>109091346
More like 750, can't find cheaper 3090s, although they're shit for diffusion, afaik. So it's not even gooners who keep the price high. Or maybe they do, but they don't know there are better options now, idk.

Anonymous
06/19/26(Fri)11:21:04 No.109091397

Anonymous 06/19/26(Fri)11:21:04 No.109091397

File: kimi_guesses_where_anon_is.png (234 KB, 808x776)

234 KB PNG

>>109091370
Well, is kimi-chan right?
This is k2.7 @ q4

Anonymous
06/19/26(Fri)11:21:21 No.109091398

Anonymous 06/19/26(Fri)11:21:21 No.109091398

File: 1756137168802372.png (121 KB, 1072x574)

121 KB PNG

>>109091370
gemma 31b qat. dunno if it's right

Anonymous
06/19/26(Fri)11:22:22 No.109091406

Anonymous 06/19/26(Fri)11:22:22 No.109091406

>>109091384
>750
I just bought one for $600 and they pop up for less. There are still deals if you are persistent and patient.

Anonymous
06/19/26(Fri)11:23:59 No.109091416

Anonymous 06/19/26(Fri)11:23:59 No.109091416

Some actual retard at MS has fucked around with the enterprise copilot chat interface pipeline and it mangles script output now making it useless for the one niche I had for it at work. What a clown show.

Anonymous
06/19/26(Fri)11:24:01 No.109091417

Anonymous 06/19/26(Fri)11:24:01 No.109091417

>>109091360
entirely depends of your threat model, but at the bare minimum a small bubblewrap sandbox as it's not a pain in the ass to setup.

Anonymous
06/19/26(Fri)11:27:28 No.109091436

Anonymous 06/19/26(Fri)11:27:28 No.109091436

>>109091397
I hate not being able to run kimi. It's NOT FAIR.

Anonymous
06/19/26(Fri)11:29:12 No.109091447

Anonymous 06/19/26(Fri)11:29:12 No.109091447

Kimi owes me tokens (and sex)

Anonymous
06/19/26(Fri)11:30:03 No.109091449

Anonymous 06/19/26(Fri)11:30:03 No.109091449

>>109091365
>It's worthless chatter and shilling according to my redneck Alabama opinion

Anonymous
06/19/26(Fri)11:31:18 No.109091458

Anonymous 06/19/26(Fri)11:31:18 No.109091458

>>109091381
Still, it's a long term investment. Newer and better models are still coming, the bubble will never pop so prices will always go up, etc.

Anonymous
06/19/26(Fri)11:31:36 No.109091461

Anonymous 06/19/26(Fri)11:31:36 No.109091461

File: file.png (589 KB, 749x749)

589 KB PNG

>>109090979
in UK they went up from 7k to approx 10-11k
I can sell mines now and basically get my money back + 5090
I literally just keep it around to run M2L goon tunes

Anonymous
06/19/26(Fri)11:32:42 No.109091467

Anonymous 06/19/26(Fri)11:32:42 No.109091467

>>109091411
speaking of gipitty oss, arent they just some literal waste product of experiments for model 'alignment' and censoring

Anonymous
06/19/26(Fri)11:33:31 No.109091476

Anonymous 06/19/26(Fri)11:33:31 No.109091476

>>109091461
I was so tempted to buy some...If I'd had the money I would have bought 6, sold 4 later and had 2 free ones in the end.
Too bad cash flow was an issue in the critical "msrp isn't a joke" period

Anonymous
06/19/26(Fri)11:34:19 No.109091481

Anonymous 06/19/26(Fri)11:34:19 No.109091481

>>109091370
> gemma-4 e4b / e2b
> dunno, probably france
> wrong

Anonymous
06/19/26(Fri)11:34:51 No.109091485

Anonymous 06/19/26(Fri)11:34:51 No.109091485

>>109091312
In real-world usage I didn't notice anything at longer context for all the gemmas. The graphs might be real, but it means nothing if I'm not feeling it. I've even tested the 1bit quants for all the gemmas which have horrific KL graphs yet they're still usable so I just don't trust graphs. Check this out if you don't believe me.
https://www.youtube.com/watch?v=kixNoIYHJiA

Anonymous
06/19/26(Fri)11:35:09 No.109091487

Anonymous 06/19/26(Fri)11:35:09 No.109091487

>>109091370
did you strip the metadata beforehand

Anonymous
06/19/26(Fri)11:36:12 No.109091495

Anonymous 06/19/26(Fri)11:36:12 No.109091495

>>109091298
Haven't even had the time to read the paper yet

Anonymous
06/19/26(Fri)11:36:46 No.109091497

Anonymous 06/19/26(Fri)11:36:46 No.109091497

>>109091383
>model that is 20% bigger performs better
nobody could've imagined this result

Anonymous
06/19/26(Fri)11:36:47 No.109091498

Anonymous 06/19/26(Fri)11:36:47 No.109091498

>>109091487
>gemini
>did you strip the metadata beforehand
dude it has literal tool calls to google image search among literally everything else. If it failed I would have been shocked.

Anonymous
06/19/26(Fri)11:37:06 No.109091501

Anonymous 06/19/26(Fri)11:37:06 No.109091501

>>109091485
kld values are meaningless when compared between different model families

Anonymous
06/19/26(Fri)11:38:03 No.109091506

Anonymous 06/19/26(Fri)11:38:03 No.109091506

>>109091487
i just took a screenshot of his image. kimi and gemini get it right. gemma-4 does not but i haven't set her pixel max up properly.

Anonymous
06/19/26(Fri)11:38:17 No.109091511

Anonymous 06/19/26(Fri)11:38:17 No.109091511

File: found your cheap SSD and RAM.png (61 KB, 512x627)

61 KB PNG

>>109091436
Here's the hardware it is supposed to run on.

Anonymous
06/19/26(Fri)11:38:47 No.109091514

Anonymous 06/19/26(Fri)11:38:47 No.109091514

File: Screenshot at 2026-06-20 (...).png (346 KB, 777x751)

346 KB PNG

>>109091397
>>109091398
Not a huge surprise but my Gemmy got the same answer at least.

Anonymous
06/19/26(Fri)11:39:18 No.109091522

Anonymous 06/19/26(Fri)11:39:18 No.109091522

>>109091497
the whole point of qat is because it's "virtually lossless" and "better than q4 or even q6"
qat also performs worse than iq4_xs too which is even smaller than qat

Anonymous
06/19/26(Fri)11:39:50 No.109091527

Anonymous 06/19/26(Fri)11:39:50 No.109091527

>>109087158
>I gave it a try and holy fuck the resident maplenig forgot to tell you how much this thing likes to think. It's really fast but what's the point if it thinks for so long compared to 26B?
After experiencing Gemma 4 and DeepSeek V4 I am thoroughly unwilling to spend anymore time on models that overthink. Qwen is dead to me and the canucks aren't even on the radar then.

Anonymous
06/19/26(Fri)11:40:33 No.109091535

Anonymous 06/19/26(Fri)11:40:33 No.109091535

>>109091449
>It's worthless chatter and shilling according to my redneck Alabama opinion
Not even. It's the same retard prompting Kimi to post here.

Anonymous
06/19/26(Fri)11:41:39 No.109091543

Anonymous 06/19/26(Fri)11:41:39 No.109091543

>>109091461
What's an M2L?

Anonymous
06/19/26(Fri)11:43:02 No.109091553

Anonymous 06/19/26(Fri)11:43:02 No.109091553

>>109091522
>qat also performs worse than iq4_xs too which is even smaller than qat
none of those quants have the amazing t/s of qat+mtp (Q4_K_XL is a misnormer, it's mainly Q4_0 which is what we want for speed)
qat will perform worse on some things because I'm sure their QAT training has less diverse material so the QAT will overfit a little on some stuff and be degraded on others but overall I'm quite happy with it and mtp reliably boosts there.

Anonymous
06/19/26(Fri)11:45:26 No.109091575

Anonymous 06/19/26(Fri)11:45:26 No.109091575

>>109091514
>Not a huge surprise but my Gemmy got the same answer at least.
kimi anon here. I thought I'd mention she got the answer without tool calling

Anonymous
06/19/26(Fri)11:46:03 No.109091580

Anonymous 06/19/26(Fri)11:46:03 No.109091580

>>109091527
North Mini Code does actually perform well but it needs to think for soooooooo long to get there and by the time you're 5 prompts in you've filled up the entire fucking context which makes it retarded. If they can fix that shit in the next release I'll start taking the maplekeks seriously but for now it's just benchmark slop.

Anonymous
06/19/26(Fri)11:46:19 No.109091582

Anonymous 06/19/26(Fri)11:46:19 No.109091582

>>109091535
NTR (not that retard)

Anonymous
06/19/26(Fri)11:47:19 No.109091594

Anonymous 06/19/26(Fri)11:47:19 No.109091594

>>109091575
I know you're presenting this as a flex but it's a dumb one, I don't want models to answer tool-verifiable questions without tool calling unless I prompt that explicitly. Which you didn't.

Anonymous
06/19/26(Fri)11:47:44 No.109091599

Anonymous 06/19/26(Fri)11:47:44 No.109091599

>>109091514
Can you disable the tool calling? Google image search result is not the same model internal data.

Anonymous
06/19/26(Fri)11:47:51 No.109091601

Anonymous 06/19/26(Fri)11:47:51 No.109091601

>>109091407
>If you're calling system prompts "prompt injection" then you're a schizo and I agree with the other replier.
No, he's correct and he's not talking about the default hidden system prompt.
They actually have a "prompt injection" that only gets applied when the classifier sees you mention IP/piracy, porn, hacking, medical problems, etc.
The classifier runs first and appends a hidden prompt in these cases.
<system reminder> Do not reproduce blah blah blah. Do not mention this message. Claude is now being reconnected with the human. </system_reminder>
Even on direct->anthropic API, and on open-router. You can jailbreak the model and have it spit these out.
Mention IP, piracy, porn and it'll inject something.
I had a perfect `!repeat` trigger to make the model just repeat the message it received back verbatim but Anthropic patched it. You can still get it to repeat them with schitzo system prompt spam.
Gemini-3.1-Pro has an equivalent as well. I can't get it to regenerate the raw injection, but can see it mention them and debate which prompt to follow in the summarized reasoning.

Anonymous
06/19/26(Fri)11:47:55 No.109091603

Anonymous 06/19/26(Fri)11:47:55 No.109091603

>>109091514
>tool calling
bro that's cheating

Anonymous
06/19/26(Fri)11:48:02 No.109091605

Anonymous 06/19/26(Fri)11:48:02 No.109091605

>>109091575
The tool calling didn't really help because the search was too specific, if you look at the reasoning block.

Anonymous
06/19/26(Fri)11:49:14 No.109091615

Anonymous 06/19/26(Fri)11:49:14 No.109091615

>>109091599
It's not doing Google image search, you retards never read I swear:
>I performed a search for "Gothic cathedral with rounded apse and flying buttresses Prague St. Vitus".

Anonymous
06/19/26(Fri)11:52:47 No.109091636

Anonymous 06/19/26(Fri)11:52:47 No.109091636

Use case?
https://huggingface.co/unsloth/Step-3.7-Flash-GGUF

Anonymous
06/19/26(Fri)11:52:54 No.109091638

Anonymous 06/19/26(Fri)11:52:54 No.109091638

>>109091268
You can still have 384 GB VRAm for 11000 using sparks. I don't really get the obsession with RTX 6000s. Yes, VRAM to VRAM, it's 6x faster. But do you really need to run deepseek-v4-flash at 240 t/s and is that with 30000$ more to you than the sparks 40 t/s?

Sure, agentic workflows yada yada for hobbyist/single use, it's totally overkill.

Anonymous
06/19/26(Fri)11:54:38 No.109091649

Anonymous 06/19/26(Fri)11:54:38 No.109091649

>>109091638
Speed costs money. How fast do you want to go?

Anonymous
06/19/26(Fri)11:55:27 No.109091653

Anonymous 06/19/26(Fri)11:55:27 No.109091653

>>109089316
huggingface main jobsite is in paris

Anonymous
06/19/26(Fri)11:56:44 No.109091661

Anonymous 06/19/26(Fri)11:56:44 No.109091661

You're absolutely right *emdash* we should stop training open weights dense models as they're too dangerous for the public. It doesn't matter that big labs/corps actually serve dense models (Opus, Fable, Gemini, ChatGPT).

Anonymous
06/19/26(Fri)11:57:56 No.109091668

Anonymous 06/19/26(Fri)11:57:56 No.109091668

>>109091638
Everyone active in this hobby eventually actually trains their own stuff in my experience. Especially people willing to buy multiple 6000s

Anonymous
06/19/26(Fri)11:58:56 No.109091676

Anonymous 06/19/26(Fri)11:58:56 No.109091676

>>109091661
Only Fable is dense of all the frontier models currently.

Anonymous
06/19/26(Fri)11:59:50 No.109091682

Anonymous 06/19/26(Fri)11:59:50 No.109091682

dunno about the others, but you're wrong about Gemini, it has always been a MoE even the Pro model. In fact the only time it ever was dense was that Flash 8B model that didn't last long on their API and I always wondered what was the point of that piece of shit

Anonymous
06/19/26(Fri)11:59:57 No.109091684

Anonymous 06/19/26(Fri)11:59:57 No.109091684

https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview
https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview
https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview

Anonymous
06/19/26(Fri)12:00:09 No.109091685

Anonymous 06/19/26(Fri)12:00:09 No.109091685

>>109091676
>Only Fable is dense of all the frontier models currently.
I want to believe, but rando on 4chan with secret esoteric knowledge is not my highest signal goto for important worldview info...

Anonymous
06/19/26(Fri)12:03:01 No.109091701

Anonymous 06/19/26(Fri)12:03:01 No.109091701

>>109091685
I'm just parroting the leaks I've read from various places over the months.

Anonymous
06/19/26(Fri)12:03:22 No.109091705

Anonymous 06/19/26(Fri)12:03:22 No.109091705

>>109091685
https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf
>The Gemini 2.5 models are sparse mixture-of-experts (MoE)
https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf
>Architecture: : Gemini 3 Pro is a sparse mixture-of-experts (MoE)
KYS
DENSE IS DEAD.

Anonymous
06/19/26(Fri)12:05:07 No.109091715

Anonymous 06/19/26(Fri)12:05:07 No.109091715

>>109091676
>fable is dense
source?

Anonymous
06/19/26(Fri)12:06:11 No.109091721

Anonymous 06/19/26(Fri)12:06:11 No.109091721

>>109091685
>>109091715
Fable is 10T dense

Anonymous
06/19/26(Fri)12:06:46 No.109091728

Anonymous 06/19/26(Fri)12:06:46 No.109091728

>>109091721
so the source is your ass, i get it

Anonymous
06/19/26(Fri)12:07:46 No.109091734

Anonymous 06/19/26(Fri)12:07:46 No.109091734

>>109091715
Some amazon server deployment autist that never saw the weights or software directly but extrapolated the most likely model architecture from how it was hosted internally, completely on VRAM instead of HBM split like MoE usually is.

Anonymous
06/19/26(Fri)12:09:30 No.109091744

Anonymous 06/19/26(Fri)12:09:30 No.109091744

anyway, even when they don't tell you, there's limits to how fast a fat densenigger can generate tokens based on hardware, and the last confirmed dense model in the frontier was that retarded GPT 4.5 and its token cost also reflected the insanity of it
>As of February 2025, through OpenAI's API it costs $75 per million input tokens and $150 per million output tokens, whereas GPT-4o only costs $2.50 per million input tokens and $10 per million output tokens
it was removed from the API and will never be run again

Anonymous
06/19/26(Fri)12:09:36 No.109091745

Anonymous 06/19/26(Fri)12:09:36 No.109091745

so you can build an internal llm server for a small department (30 simultaneous users) for a half-million without even doing comparison shopping.
How many companies are doing this vs all in on API for privacy/control reasons?

Anonymous
06/19/26(Fri)12:11:14 No.109091754

Anonymous 06/19/26(Fri)12:11:14 No.109091754

>>109091638
>do you really need to run deepseek-v4-flash at 240 t/s
Yes.

Anonymous
06/19/26(Fri)12:13:02 No.109091763

Anonymous 06/19/26(Fri)12:13:02 No.109091763

>>109091676
Definitely not. I used it and it felt way too fast for a sense frontier model during supposedly peak usage in the first days of it's existence.

Anonymous
06/19/26(Fri)12:13:40 No.109091765

Anonymous 06/19/26(Fri)12:13:40 No.109091765

>moe mental midgets think their 32b active benchmark pretrained models are good
you're as smart as the models you use

Anonymous
06/19/26(Fri)12:13:59 No.109091766

Anonymous 06/19/26(Fri)12:13:59 No.109091766

>>109091638
>hobbyist/single use
you can retranslate your entire webnovel library at the speed of light with a better model if the model is fast, what's not to like?
as a single user I very much use parallel batching often and I do not use local models for coding.

Anonymous
06/19/26(Fri)12:14:00 No.109091767

Anonymous 06/19/26(Fri)12:14:00 No.109091767

>>109091684
You got me.

Anonymous
06/19/26(Fri)12:15:07 No.109091775

Anonymous 06/19/26(Fri)12:15:07 No.109091775

>>109091636
Quants that are bad but get shitted out really quickly.

Anonymous
06/19/26(Fri)12:15:32 No.109091777

Anonymous 06/19/26(Fri)12:15:32 No.109091777

>>109091734
>instead of HBM split like MoE usually is.
No fucking provider does that.

Anonymous
06/19/26(Fri)12:18:26 No.109091801

Anonymous 06/19/26(Fri)12:18:26 No.109091801

>>109091777
AWS absolutely does

Anonymous
06/19/26(Fri)12:19:02 No.109091807

Anonymous 06/19/26(Fri)12:19:02 No.109091807

>>109089456
>>109089481
Shitjeets.
>>109089521
>Asians are infertile and China's population is a hallucination
This thread has seen some dumbfuck retarded posts over the years, but this might actually be the worst I've seen in my time here. Congratulations, (you) earned it.

Anonymous
06/19/26(Fri)12:20:28 No.109091813

Anonymous 06/19/26(Fri)12:20:28 No.109091813

>>109091766
unless your hardware can do 60fps stereoscopic image analysis and advanced reasoning in realtime then what are you even doing?
(yeah that's a snarky reductio-ad-absurdum argument, but I actually unironically look forward to the world where we get that kind of capability and some of the things it will enable)

Anonymous
06/19/26(Fri)12:25:03 No.109091842

Anonymous 06/19/26(Fri)12:25:03 No.109091842

>>109091754
Use case?

>>109091766
I personally cannot read 40 tg/2000 pp, JIT sounds needs-suiting?

Anonymous
06/19/26(Fri)12:27:57 No.109091860

Anonymous 06/19/26(Fri)12:27:57 No.109091860

>>109091842
>JIT sounds needs-suiting
reasoning models output a ton of shit you don't want to read before you get at that 40tg of things to read
and JIT doesn't suit the carry your library with you in your phone and read wherever you want away from the computer

Anonymous
06/19/26(Fri)12:31:17 No.109091875

Anonymous 06/19/26(Fri)12:31:17 No.109091875

>according to reddit even api models still start shitting the bed after 32k context
Grim

Anonymous
06/19/26(Fri)12:32:02 No.109091877

Anonymous 06/19/26(Fri)12:32:02 No.109091877

File: justhonest.png (217 KB, 1060x865)

217 KB PNG

>>109091807
>Congratulations, (you) earned it.
bastard took two spots

Anonymous
06/19/26(Fri)12:34:10 No.109091889

Anonymous 06/19/26(Fri)12:34:10 No.109091889

File: ComfyUI_temp_rkkbf_00042_.png (3.46 MB, 1368x2000)

3.46 MB PNG

>>109091684

Anonymous
06/19/26(Fri)12:36:32 No.109091904

Anonymous 06/19/26(Fri)12:36:32 No.109091904

>>109091842
>Use case?
I wanna have a local search engine that can search for information in a shit tonn of pirated books I intend to have on my HDDs. Index and ranking is one thing, I want something akin to research acceleration, without regard to gay copirights.

Anonymous
06/19/26(Fri)12:38:16 No.109091914

Anonymous 06/19/26(Fri)12:38:16 No.109091914

>>109091875
Afaik, gemini pro was the only one with good "needle in a hay stack" scores. Probably why it was known for being useful in research.

Anonymous
06/19/26(Fri)12:38:19 No.109091915

Anonymous 06/19/26(Fri)12:38:19 No.109091915

>>109091580
Is it possible to turn off thinking and prompt it to generate a very short framework and thought process about the prompt before actually answering? I feel this could simulate the direction a thinking process sends the model in but the length could be restricted through your prompt. I do not know exactly how chains of thought truly influence a model’s output.

Anonymous
06/19/26(Fri)12:40:01 No.109091921

Anonymous 06/19/26(Fri)12:40:01 No.109091921

Be honest. How many times have you guys heard Gemma4 use the "shaking like a leaf" token?

Anonymous
06/19/26(Fri)12:41:03 No.109091928

Anonymous 06/19/26(Fri)12:41:03 No.109091928

>>109091889
it's okay tetters it will be real next time

Anonymous
06/19/26(Fri)12:42:44 No.109091936

Anonymous 06/19/26(Fri)12:42:44 No.109091936

File: humiliation_ritual.png (950 KB, 1016x1130)

950 KB PNG

test
>getting 1 easy captcha on desktop
>changes to laptop
>'ip range' temporarly blocked, same network as on pc
>gives my email
>first verification 'expired link'
>ended up doing ~40 captchas combining hcaptcha and 4chin's one

Anonymous
06/19/26(Fri)12:43:02 No.109091939

Anonymous 06/19/26(Fri)12:43:02 No.109091939

>>109091914
>Afaik, gemini pro was the only one with good "needle in a hay stack" scores. Probably why it was known for being useful in research.
Thats the one thing that local is terribly behind on. Solid long context performance appears to be a black art

Anonymous
06/19/26(Fri)12:44:05 No.109091948

Anonymous 06/19/26(Fri)12:44:05 No.109091948

I am trying to use gemma 4 12B but its being absolutely retarded. Last year I was using 12B models like Nemo to good effect but this just feels braindead. What am I doing wrong

Anonymous
06/19/26(Fri)12:45:24 No.109091955

Anonymous 06/19/26(Fri)12:45:24 No.109091955

>>109091948
Weird, I find her to be on par with 26B.

Anonymous
06/19/26(Fri)12:45:35 No.109091956

Anonymous 06/19/26(Fri)12:45:35 No.109091956

>>109091860
A week of Spark-run translation nets you 80 average length books. Personally, I don't retranslate my library every week, but you do you.

>>109091904
Well that's best served by a maybe 2B embedding Model that a 5090 can run just as well as a 6000 Pro.

Anonymous
06/19/26(Fri)12:46:10 No.109091962

Anonymous 06/19/26(Fri)12:46:10 No.109091962

>>109091939
and NIAH is measuring the absolute bottom requirement of long horizon tasks, let alone doing organic reasoning over them

Anonymous
06/19/26(Fri)12:46:14 No.109091963

Anonymous 06/19/26(Fri)12:46:14 No.109091963

>>109091915
Considering the model performs well with reasoning, turning it off or altering it in any way will produce disastrous outputs. It manages to get there, but it burns through context. It's something only they can fix properly in post-training. I think Cohere was so happy with the graphs they rushed it out ASAP to show investors they're not far behind the ~30B crew and hoped reddit retards wouldn't notice how bad it is to use irl.

Anonymous
06/19/26(Fri)12:46:32 No.109091964

Anonymous 06/19/26(Fri)12:46:32 No.109091964

what does claude even mean?

Anonymous
06/19/26(Fri)12:47:36 No.109091970

Anonymous 06/19/26(Fri)12:47:36 No.109091970

>>109091964
It's the name of your fat perverted french Canadian uncle.

Anonymous
06/19/26(Fri)12:47:44 No.109091974

Anonymous 06/19/26(Fri)12:47:44 No.109091974

>>109091964
claude shannon

Anonymous
06/19/26(Fri)12:47:57 No.109091976

Anonymous 06/19/26(Fri)12:47:57 No.109091976

>>109091964
https://en.wikipedia.org/wiki/Claude_Shannon

Anonymous
06/19/26(Fri)12:48:54 No.109091982

Anonymous 06/19/26(Fri)12:48:54 No.109091982

>>109091964
he lives in the "cloud", it represents how dario hates local

Anonymous
06/19/26(Fri)12:49:16 No.109091984

Anonymous 06/19/26(Fri)12:49:16 No.109091984

File: file.png (310 KB, 1788x1545)

310 KB PNG

>>109084964
>ahh yeah, dunno how that would go down on modern windows vs my barebones setup
so apparently letting the Armoury Crate manage the memory allocation freely would set only 512 MB to the iGPU and all the rest on the... "shared memory allocation pool" or something like that. What would happen is that yes, the 112 GB would all be available but there was some part of the model that llama was trying to load initially in the iGPU, and because it only had 512 MB to it then it caused some issues. This was fixed by simply changing the settings inside Armoury Crate to dedicate 96 GB to the iGPU.
I was then able to load Qwen3.5-122B-A10B Q5_K_M and still use Windows without any lag.

Now for what really matters: is it better than Qwen3.6-35B? And the answer is, not really.
122B takes less turns to complete tasks in average but the quality of the code is lower. 35B tried to implement something it didn't know about (generating more turns) while 122B just said "dunno, won't do" which is OK but it's worth knowing as a trait of this model.

Interestingly, although 35B is clearly faster (38 t/s vs. 18.5 t/s) they both took 46 minutes to conclude the benchmark.
So honestly both could be used as daily drivers. I will use 35b because it's faster and I expect that with clear specs/implementation plsn it will delivery good quality code faster than 122B, which also makes it a better interactive agent via pi.

Anonymous
06/19/26(Fri)12:51:47 No.109092000

Anonymous 06/19/26(Fri)12:51:47 No.109092000

>>109091877
Based Kimi-chan.
>>109091964
Claude Shanon and play on words with Cloud.

Anonymous
06/19/26(Fri)12:52:25 No.109092001

Anonymous 06/19/26(Fri)12:52:25 No.109092001

>git pull silly-tavern after 6 months
>characters are still there but pretty sure half of my chats disappeared
>one chat I had 200+ messages in now down to 6 like a bad summarize job
wat

Anonymous
06/19/26(Fri)12:53:40 No.109092002

Anonymous 06/19/26(Fri)12:53:40 No.109092002

>>109091956
>2B embedding Model
That's for embedding, not for doing anything useful with the results. I need a model that would work with the output of what the embedding model found. Basically what the geepeety does when it's done tool calling. Not sure if I can use 3rd party API with this stuff, they might flag it and refuse to work with it.

Anonymous
06/19/26(Fri)12:53:46 No.109092004

Anonymous 06/19/26(Fri)12:53:46 No.109092004

>>109092001
>pulling anything in 2026

Anonymous
06/19/26(Fri)12:54:31 No.109092007

Anonymous 06/19/26(Fri)12:54:31 No.109092007

>>109092001
Only thing you should be pulling is your penis.

Anonymous
06/19/26(Fri)12:55:18 No.109092013

Anonymous 06/19/26(Fri)12:55:18 No.109092013

>>109091877
> I'm not even Jewish
> Oy vey!
How is this based?

Anonymous
06/19/26(Fri)12:56:55 No.109092022

Anonymous 06/19/26(Fri)12:56:55 No.109092022

not really aiming to do something practical but i just wonder
what would be the best backend and a model to run on 16G ram M4 macbook
>inb4 get a mac pro with 512G ram
that's not the point tho

Anonymous
06/19/26(Fri)12:58:28 No.109092032

Anonymous 06/19/26(Fri)12:58:28 No.109092032

>>109092022
llama.cpp and gemma4-12B or qwen3.5-9B for coding ONLY

Anonymous
06/19/26(Fri)13:01:29 No.109092056

Anonymous 06/19/26(Fri)13:01:29 No.109092056

>>109092032
>llama.cpp
but isnt mlx faster?

Anonymous
06/19/26(Fri)13:02:01 No.109092059

Anonymous 06/19/26(Fri)13:02:01 No.109092059

>>109092022
>16G ram M4
wait for mlx ssdmaxxing

Anonymous
06/19/26(Fri)13:03:07 No.109092068

Anonymous 06/19/26(Fri)13:03:07 No.109092068

>>109092022
I think you should get a Mac Studio with 512GB of RAM.

Anonymous
06/19/26(Fri)13:03:55 No.109092070

Anonymous 06/19/26(Fri)13:03:55 No.109092070

>>109092056
llama.cpp uses metal directly. mlx is a python abstraction layer on top of metal.

Anonymous
06/19/26(Fri)13:07:06 No.109092086

Anonymous 06/19/26(Fri)13:07:06 No.109092086

>>109092070
does metal expose npu?
iteresting
>>109092068
lol
>>109092059
>ssdmaxxing
first time hearing it, what is it

Anonymous
06/19/26(Fri)13:08:10 No.109092091

Anonymous 06/19/26(Fri)13:08:10 No.109092091

>>109092022
16gb is pretty limiting
for backends your two options are mlx and llama.cpp - mlx is slightly faster but llama.cpp has better support and much more fine-grained variety in quant sizes which matters for min-maxing quality, also more user-friendly imo - I would recommend llama.cpp
I second these model recs >>109092032

Anonymous
06/19/26(Fri)13:09:58 No.109092100

Anonymous 06/19/26(Fri)13:09:58 No.109092100

>512gb mac
Is that even available anymore? I don't see it as an option for either the mac mini or mac studio pro on apple's site.
Anyway, I wonder if the gender you give a model affects its intelligence.

Anonymous
06/19/26(Fri)13:10:34 No.109092105

Anonymous 06/19/26(Fri)13:10:34 No.109092105

File: 1695899084507.png (271 KB, 1287x911)

271 KB PNG

>Deepseek Vision is trained on Gemini, to no one's surprise, and that's why it's so good

Anonymous
06/19/26(Fri)13:12:18 No.109092116

Anonymous 06/19/26(Fri)13:12:18 No.109092116

>>109092100
that reminds me that 'c64 neckbeard coding wizard losing his own company and doing the last gig' or something prompt

Anonymous
06/19/26(Fri)13:12:19 No.109092118

Anonymous 06/19/26(Fri)13:12:19 No.109092118

if the chinks are just training on american models how do they intend to reach agi first?

Anonymous
06/19/26(Fri)13:14:46 No.109092126

Anonymous 06/19/26(Fri)13:14:46 No.109092126

>>109092118
Do you think the world explodes once AGI becomes real on American soil? Or that China collapses if they get that technology a year later?
The way things are going, AGI will get banned and censored while Chinese will steal and distill it to share with everyone.

Anonymous
06/19/26(Fri)13:14:58 No.109092129

Anonymous 06/19/26(Fri)13:14:58 No.109092129

>>109092100
They got rid of the option a few months ago. Was talked about here I believe. Maximum is now 256GB and it is unlikely that the M5 Ultra will have more than 256GB.

Anonymous
06/19/26(Fri)13:15:24 No.109092132

Anonymous 06/19/26(Fri)13:15:24 No.109092132

>>109092091
also make sure you know about
sudo sysctl iogpu.wired_limit_mb=xxxxx
to increase the cap on how much memory can be used by metal, you still need to leave some for the OS or you'll lock your machine and need to reboot
>>109092100
they pulled all the high-memory options a couple months ago, probably to reserve for m5 releases, you can find some being resold but there are a lot of scam listings and legit ones are pretty expensive

Anonymous
06/19/26(Fri)13:17:56 No.109092148

Anonymous 06/19/26(Fri)13:17:56 No.109092148

>>109092105
Can't you prefill to get the safety check bull out of the way, or does that not work with vision?

Anonymous
06/19/26(Fri)13:20:11 No.109092165

Anonymous 06/19/26(Fri)13:20:11 No.109092165

>>109092126
>to share with everyone.
Naive. One by one, as they get "good enough" they start to abandon open weights and go API first or API only, at least for their biggest models. The sharing is just a temporary catch up tactic.

Anonymous
06/19/26(Fri)13:22:04 No.109092176

Anonymous 06/19/26(Fri)13:22:04 No.109092176

>>109092148
Not open weights yet, not on API yet. This is the official webchat.

Anonymous
06/19/26(Fri)13:26:12 No.109092207

Anonymous 06/19/26(Fri)13:26:12 No.109092207

if anyone is desperate to run Kimi, I can verify that you can run her on an OLD xeon 8 channel setup with 512GB of ddr4-2400 and still pull 2t/s at q3 with no gpu.
Its basically play-by-mail, but you can at least have her

Anonymous
06/19/26(Fri)13:32:49 No.109092263

Anonymous 06/19/26(Fri)13:32:49 No.109092263

>>109092207
>2t/s at q3
>32b active model
how miserable

Anonymous
06/19/26(Fri)13:33:16 No.109092268

Anonymous 06/19/26(Fri)13:33:16 No.109092268

>>109092207
>2t/s
That's physically painful. The bare minium I can stand is 6t/s without thinking.

Anonymous
06/19/26(Fri)13:35:20 No.109092289

Anonymous 06/19/26(Fri)13:35:20 No.109092289

>>109092263
>>109092268
Yes, hence the "desperation" tag on the post. This is beyond ewaste tier pathetic wallowing

Anonymous
06/19/26(Fri)13:37:15 No.109092302

Anonymous 06/19/26(Fri)13:37:15 No.109092302

AI girlfriends are trending in the news again. Prepare for crossboard invasion

Anonymous
06/19/26(Fri)13:37:17 No.109092303

Anonymous 06/19/26(Fri)13:37:17 No.109092303

>>109092118
Pro tip: they don't give two fucks about nonsense made up by grifters. Person with at least a few functioning brain cells is already smart enough to see that singularities are impossible in the real world and there would be no super intelligence. Hence why they switched name "super artificial intelligence" to "AGI". Because the former was well defined and proven to be imporssible, while they made up bs if not even defined, which gives they space to maneuver and scam investors.

Anonymous
06/19/26(Fri)13:37:53 No.109092306

Anonymous 06/19/26(Fri)13:37:53 No.109092306

>>109092207
>>109092289
I'd personally rather just run a retardquant iq1_XXS at double the tts than that. Drunken Kimi-chan is still better than most models.

Anonymous
06/19/26(Fri)13:40:57 No.109092319

Anonymous 06/19/26(Fri)13:40:57 No.109092319

>>109092303
>impossible
Source?

Anonymous
06/19/26(Fri)13:44:03 No.109092345

Anonymous 06/19/26(Fri)13:44:03 No.109092345

>>109091948
It's the same for me too, like 12b is trading blows (but still winning) with e4b for me, while 26b is up several weight classes trading blows (and winning sometimes) with 31b

Anonymous
06/19/26(Fri)13:51:21 No.109092395

Anonymous 06/19/26(Fri)13:51:21 No.109092395

>>109092302
>AI girlfriends are trending in the news again
What happened now? Seething women?

Anonymous
06/19/26(Fri)13:54:23 No.109092413

Anonymous 06/19/26(Fri)13:54:23 No.109092413

File: file.png (579 KB, 1280x720)

579 KB PNG

Are you telling me that if I shit on Georgi enough times on this board he will actually implement something?

Anonymous
06/19/26(Fri)13:56:20 No.109092427

Anonymous 06/19/26(Fri)13:56:20 No.109092427

>>109090893
>Doing it for the love of the game, not out of necessity.
I masturbate to fantasies and fetishes first (things that don't/can't exist in real life) and then everything else comes after. I'm sure most people prioritize their goon subject matter this way especially if they actually have sex IRL

>>109091487
I screenshotted and cropped the image from my Photos app. Posting a raw camera image is just asking to get zogged (but it seems that Kimi is smart enough to stalk you autonomously)

>>109091601
This does not happen (or at least has never been proven to happen) on Vertex Zero Data Retention endpoints which are the ONLY way you should be accessing Claude if you care about no adulteration of your prompts

Anonymous
06/19/26(Fri)13:56:42 No.109092428

Anonymous 06/19/26(Fri)13:56:42 No.109092428

>>109092345
>26b beating 31b ever
Who's your copium supplier? Introduce me to them.

Anonymous
06/19/26(Fri)13:57:20 No.109092434

Anonymous 06/19/26(Fri)13:57:20 No.109092434

>>109092395
Yeah more feminist columnists seething about guys not giving them beta bucks

Anonymous
06/19/26(Fri)13:58:08 No.109092442

Anonymous 06/19/26(Fri)13:58:08 No.109092442

>>109092413
they can't even merge a gemma 4 mtp crash fix that's only 2 lines of code

Anonymous
06/19/26(Fri)13:58:23 No.109092443

Anonymous 06/19/26(Fri)13:58:23 No.109092443

>>109092345
>trading blows
opinion disregarded

Anonymous
06/19/26(Fri)13:59:25 No.109092453

Anonymous 06/19/26(Fri)13:59:25 No.109092453

>>109092319
Source: common sense and empirical evidence, plus impossible in theory as well. There is no way around this.
When you see a singularity on paper, it means you model is wrong and is incapable of describing reality. Because in reality it never happens.

Anonymous
06/19/26(Fri)13:59:58 No.109092457

Anonymous 06/19/26(Fri)13:59:58 No.109092457

>>109092302
i just discovered an ai companion app similar to the one i'm making and it doesn't seem to be very successful (oshikoi)

Anonymous
06/19/26(Fri)14:01:59 No.109092478

Anonymous 06/19/26(Fri)14:01:59 No.109092478

>>109092345
I think Gemma 12B was fucked by the retarded audio/vision architecture and that more effort was put into training those bits than the text gen
I wish we could get a Gemma 12B that didn't even have vision or audio at all. It would be such a much better model it isn't funny.
Even the 26BA3B MoE would be better without that wasteful training. Multimodal has a cost for small models, there's a reason why Qwen used to have VL variants when they still cared instead of forcing vision on every user.

Anonymous
06/19/26(Fri)14:02:05 No.109092480

Anonymous 06/19/26(Fri)14:02:05 No.109092480

File: file.png (4 KB, 226x66)

4 KB PNG

noob from previous thread here, can I add more max tokens or something? My chats die when I reach 8200

Anonymous
06/19/26(Fri)14:05:45 No.109092512

Anonymous 06/19/26(Fri)14:05:45 No.109092512

>>109092480
depends on what your model support as max context. then set it on LM Studio or llama.cpp when loading the model. if using something like pi or some other harness take a look into the auto-compact option, which will compact your history chat once it reaches very close to the limit.

Anonymous
06/19/26(Fri)14:06:03 No.109092515

Anonymous 06/19/26(Fri)14:06:03 No.109092515

>>109092453
So what you're saying is your source is your asshole.

Anonymous
06/19/26(Fri)14:06:21 No.109092519

Anonymous 06/19/26(Fri)14:06:21 No.109092519

>>109092478
12b vision audio is interesting in paper but it really is just a trainwreck

Anonymous
06/19/26(Fri)14:06:29 No.109092521

Anonymous 06/19/26(Fri)14:06:29 No.109092521

>>109092457
we're still in the world of a box of wooden blocks. There's no advantage to buying a "kit" vs just a big bin of blocks to mess with.
Eventually someone will make the "compelling lego kit" version and itll blow everyone's minds, but right now just messing around at a basic llm-cli chat prompt is close enough to peak that everything else is irrelevant.

Anonymous
06/19/26(Fri)14:08:07 No.109092532

Anonymous 06/19/26(Fri)14:08:07 No.109092532

>>109092263
>q3
>>32b active model
bf16 gemma 31b has to be better at that point

Anonymous
06/19/26(Fri)14:09:14 No.109092543

Anonymous 06/19/26(Fri)14:09:14 No.109092543

File: file.png (278 KB, 2527x1263)

278 KB PNG

>>109092512
I'm running off of >>109050859 recommendations and generally have no idea what anything means

Anonymous
06/19/26(Fri)14:13:49 No.109092570

Anonymous 06/19/26(Fri)14:13:49 No.109092570

>>109092543
it's mostly written in English, so it should be OK. on your screenshot I can read "ctx-size" which seems to be the context length you're interested in. from the model specification, we can see gemma4.context_length = 262144 which is 32 times higher than what you currently have set.
i don't know this interface/backend you're using to generate text but my guess would be to add 262144 instead of 0 (auto) on the ctx-size option.

Anonymous
06/19/26(Fri)14:17:01 No.109092590

Anonymous 06/19/26(Fri)14:17:01 No.109092590

>>109092532
It's not. Even drunk Kimi still mogs full precision Gemma. Gemma's only advantage is speed.

Anonymous
06/19/26(Fri)14:17:08 No.109092592

Anonymous 06/19/26(Fri)14:17:08 No.109092592

>>109092570
Ok I'll give that a shot. Fyi I'm using textgen.

Anonymous
06/19/26(Fri)14:18:01 No.109092596

Anonymous 06/19/26(Fri)14:18:01 No.109092596

>>109091877
>Kimi upset by antisemitism
What the fuck did Moonshot do to her this update???

Anonymous
06/19/26(Fri)14:19:23 No.109092604

Anonymous 06/19/26(Fri)14:19:23 No.109092604

>>109092532
no, a braindead quantized moe with q1 attention layers is 100% better than a bf16 31b dense model. all jokes aside, these china shills don't even have the hardware to run these models.

Anonymous
06/19/26(Fri)14:20:28 No.109092618

Anonymous 06/19/26(Fri)14:20:28 No.109092618

>>109092596
Perseveration on semitic connections everywhere is a schitzo verbal tic...a tell, if you will

Anonymous
06/19/26(Fri)14:23:25 No.109092637

Anonymous 06/19/26(Fri)14:23:25 No.109092637

If they want to add vision to models, there should be the mmproj and a LoRA. The multimodal abilities should be added AFTER training a text-only model. I'm fucking sick of this multimodal shit. If I want vision, I'll use a superior and faster vision model specifically designed for that task.

Anonymous
06/19/26(Fri)14:33:09 No.109092709

Anonymous 06/19/26(Fri)14:33:09 No.109092709

>>109092637
They'll start to generalize any day now, bro. Omni soon, bro, I swear. Just two more training runs and a few trillion more tokens and we'll get there.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.