/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106599382 & >>106593104

►News
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106599382

--Paper: "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community:
>106605381 >106606464
--Hardware constraints and optimization for running quantized AI models:
>106600066 >106600079 >106600094 >106600120 >106600151 >106600165 >106600194 >106600299 >106600344 >106600379
--Optimizing model speed and memory usage with different frameworks and hardware setups:
>106601526 >106601576 >106601975 >106602113 >106602140 >106602344 >106602359 >106602696 >106602812 >106602446
--Temperature's role in balancing model creativity and coherence during inference:
>106600178 >106600216 >106600983
--Seeking adversarial dataset for Qwen3 model testing and finetuning:
>106602982 >106603226 >106603255 >106605050
--Debating generalization in large models through map-based reasoning benchmarks:
>106599556 >106599613 >106599680 >106601298 >106599875 >106599939
>106599987
--MobileLLM speculative decoding feasibility and MoE expert activation customization:
>106603577 >106604019 >106604068
--Reasoning as a multi-faceted solution for LLM limitations:
>106601235
--Google DeepMind's use of Generative Data Poisoning to enhance model robustness:
>106604136 >106604281 >106604301 >106604354 >106606204
--VoxCPM-0.5B TTS model features phoneme input and text normalization:
>106605383 >106605535 >106605647 >106606175
--Balancing temperature and sampling parameters:
>106602247 >106602253 >106602278 >106602282 >106602959
--LLM ticket resolver: Optimizing multi-step historical matching process:
>106604656 >106605075 >106605954 >106606135
--Prioritizing incremental AI gains over foundational research:
>106604653 >106604679 >106604736
--AI fails Holocaust comprehension test, raises ethical concerns:
>106604146
--Miku (free space):
>106599464 >106602772 >106602774 >106603109

►Recent Highlight Posts from the Previous Thread: >>106599386

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
mistral large 3 will be real
>>
File: 1752472682330720.png (2 MB, 1328x1328)
>>
the general interest in llms is rapidly dwindling
progress is stagnating like crazy
it's truly never been more over
>>
>>106608247
Wish It. Want It. Do It.
>>
>>106608207
MC62-G40 is a workstation board. One of the pcie slots only carries x8. ROMED8T-2T's pcie slots are all x16 (a proper server board).
>>
Apparently Qwen3-Max is confirmed to be bigger than 1T. Lmao.
>>
>>106608274
> the shitposters have started
>>
>>106608282
the asrock board also has SAS ports which can be converted into x8 lanes
https://www.ioi.com.tw/products/proddetail.aspx?AppID=1008&CatID=116&HostID=2081&ProdID=1130033
>>
goofs never ever
>>
File: poal.png (347 KB, 640x520)
>>106608204
Vote pls
https://poal.me/4jr9sh
>>
>>106608274
you're absolutely right!
>>
>>106608351
Just use kobold, there is no need to make a poll.
>>
>>106608351
kobold zoomer
>>
>>106608341
Should I use Roo instead?
>>
>>106608401
Roo is better but they're nearly identical. A model that wasn't trained on tool calling wouldn't work better on either one. You should use Qwen Coder 30B instead. Coding focused and trained on tool calling.
>>
detoxified migu
>>
File: 4725243619479.png (20 KB, 606x158)
>>106608007
>use Qwen 3
It scares me.
>>
>>106608401
You might need to fiddle a bit with the rules to make it work.
>>
>>106608274
Okay, so how long until a 3090 is $200, and A100 80gb PCIe at 2k?
>>
>>106608351
Kobold: Download exe, download appropriately sized model, drag model onto exe.
>>
>>106608468
Intimate relations with Jane Doe
>>
>>106608294
I am not shitposting anymore. I hate you all.
>>
I don't really get the point or appeal of cooming to text, personally.
>>
>106608822
cute tsundere is cute
>>
File: Brainlet-blocks-meme-3.jpg (65 KB, 1200x514)
I need/want a sophisticated note taking solution, powered by a language model, that keeps reminding me of shit that I have to do - what would be a privacy-safe way to do this?
>>
>>106608484
>how long until a 3090 is $200
Never
>>
>>106609009
Just use a fucking basic calendar or todo program. Not everything has to be LLM powered. Failing that, you'd have to build it yourself.
>>
>>106609020
Pretty much this.
200 dollars is e-waste price.
The only thing that really outclasses a 3090 is a 4090 or 5090 (Due to nshitia being stingy with VRAM). 4070 TI Super and 5070TI edge it out in compute performance quite handily but lack the VRAM to be useful for machine learning stuff.
>>
>>106608833
You are too malebrained for this thread then.
>>
>>106609009
Post-it notes on your monitor. Not on the bezels or whatever. On the actual screen, covering stuff. You're not allowed to remove them until you get them done.
>>
>>106609091
>malebrained
Moooom, the kids are making up words again.
>>
>>106608833
do you understand the point or appeal of adult fanfiction or erotic romance novels?
>>
>>106609126
Anon is running with the implication that text cooming is actually a female-biased activity. I'd argue that the bias is maybe 60/40 though. It's not substantial enough of a bias to genuinely call it a female thing. But if you go look on character card/prompt sites you'll find a lot of "KAZUHA FROM GENSHIN IMPACT SITS BEHIND YOU IN MATH CLASS" femcel shovelware garbage.
>>
https://www.meta.com/en-gb/connect/#ways-to-watch
Superintelligent Llama 4.20 coming tomorrow?
>>
>>106609149
>and then zanzibart inserted his barbed cock onto her meatflaps as she moaned in arabic pleasure...
Nyo.
>>
>>106609285
...hot
>>
>>106609285
I came.
>>
>>106609285
>and then he pissed in his little daughter's mouth
Better?
>>
>>106609249
They replaced all the jeets with asians, didn't they? Might actually be decent.
>>
Nemotron-H 47B not bad with a decent pre-think jailbreak. Good option for 2x24GB GPU folks.
ggufs have the absolute fucking wrong chat template in the metadata, though.
>>
>>106609417
(Talking about the reasoning model of course)
>>
>>106609417
Blessed be
>--jinja --chat-template-file
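For anyone who hasn't used those flags, the invocation is just something like this (model path and template filename are placeholders, adjust to your setup):
>llama-server -m nemotron-h-47b-reasoning.gguf --jinja --chat-template-file nemotron.jinja
--chat-template-file makes llama.cpp render your own Jinja file instead of whatever broken template is baked into the GGUF metadata.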
>>
>>106609319
kinda, but no
>>
>>106609427
Apparently llama.cpp only supports known shitja templates. That's kind of retarded.
>>
>>106609557
Does it?
I thought you could just throw whatever template using a file, or inline in the command line.
I even "hardcoded" some prefills into templates using the above combination of commands.
>>
>>106609578
Well I copied and unescaped the chat_template key straight off of the tokenizer config for the HF version of the model.
For some reason for NemotronH reasoning, despite the tokenizer containing special tokens for mistral format ([INST], etc) they went with
<SPECIAL_10>System
system message
<SPECIAL_11>User
Blah blah blah
<SPECIAL_11>Assistant
and <think> if reasoning true.
With <SPECIAL_11> also acting as EOT token.
It's like a retarded version of Tulu/Olmo format.
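In case anyone wants to write their own template file for it, here's a plain-Python sketch of how I read that format (token names are straight from the description above; the exact newlines/whitespace are my guess, so double-check against the HF tokenizer config):

def render_nemotron_h(messages, reasoning=True):
    # <SPECIAL_10> opens the system turn; <SPECIAL_11> opens user/assistant
    # turns and doubles as the end-of-turn token, per the description above
    out = ""
    for m in messages:
        if m["role"] == "system":
            out += "<SPECIAL_10>System\n" + m["content"] + "\n"
        elif m["role"] == "user":
            out += "<SPECIAL_11>User\n" + m["content"] + "\n"
        else:
            out += "<SPECIAL_11>Assistant\n" + m["content"] + "\n"
    # open the next assistant turn, optionally pre-filling the think block
    out += "<SPECIAL_11>Assistant\n"
    if reasoning:
        out += "<think>"
    return out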
>>
>>106609604
See if that template works if you throw it in here :
>https://huggingface.co/spaces/Xenova/jinja-playground
Just in case it's a question of formatting rather than straight up incompatibility.
>>
>>106609249
we are so back
llama will take back its thrown as THE local model
>>
File: ninja.png (33 KB, 973x841)
>>106609629
Seems I escaped everything correctly. So it's all gerganov's fault. Thanks for the help anyway.
>>
File: 1743025587416931.gif (2.82 MB, 320x200)
>>106608204
>>106607654
If you got filtered by this shit then just say it
>>
>>106609658
New Scout 4.2 will fit on a single (Mi350) GPU (Cluster)
>>
File: 1751032265105018.jpg (226 KB, 828x546)
>>106609658
>take back its thrown
>>
>>106609725
Hello sarrs please stop the racism to Indians. I am saying this as very fellow white man.
>>
>>106608484
If the Super refresh is true and vram is actually getting powercrept, this may actually happen in the long run, but short-term 3090 will become more expensive if games start to require >16GB of vram with 18gb and 24gb becoming mainstream
>>
>>106609249
Odds of Meta owning up to the fuckup that was the initial Llama-4 launch?
>>
>>106609834
No company will ever admit to a fuckup unless they're given a court order to do so
>>
Why aren't you talking about this?

https://github.com/Alibaba-NLP/DeepResearch

DeepResearch but local and non-meme (apparently SOTA)
>>
>>106609931
I was waiting for the goofs
>>
>>106609960
There's multiple sources with GOOFS up since it's literally just Qwen3Moe arch and already supported.
>>
>>106610001
My brain slipped, I was waiting for someone to try it so I know that it works since I'm no programmer.
So far I haven't heard of a single person running it yet.
>>
>>106609931
Apparently the inference script requires a bunch of proprietary API keys
Hopefully someone makes a modified version of this that uses a fully local stack, with Searxng and others
>>
So.. I never dabbled in local llms before, since I thought my specs were just way too trash to do anything.

But out of curiosity I installed ooga booga or whatever the fuck that shit is called and Mistral-Nemo-Instruct-2407-Q6_K.gguf for the model, and I'm beyond surprised at what my 1660 super and 10700k can actually do.

Explain to a total techlet: are creative writing applications generally just less intensive? How do I take all of this even further? Just lurk moar?
>>
>>106610136
If you got 32gb of ram I suggest you try qwen3-30b 2507
>>
>>106610200
I do, thanks, will try it out.
>>
Holy shit you guys. This isn't perfect (anthropomorphized breasts, doesn't explain how she reaches her chest while being the only model to acknowledge that the user starts face down) but it utilizes the details of the scenario in ways I've never seen before. Lower temp might fix it. DeepResearch coom is the new meta. We just need a bigger parameter local deepresearch model.
We have reached the promised land.
>>
>>106610098
>fully local stack
>web search
Is that even possible?
>>
>>106610098
We need a comfyui extension for this shit
>>
>>106609766
24gb powercreep....
i mean... AI shit needs at least 50GB if you want video shit etc. so id wait till the tech is better and chatbot models can be like Ani from grok where they move and have a body. until then just fuck around with whatever and then upgrade in 2028+
>>
File: Base Image.png (269 KB, 1200x1284)
RL Fine-Tuning Heals OOD Forgetting in SFT
https://arxiv.org/abs/2509.12235
>The two-stage fine-tuning paradigm of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has empirically shown better reasoning performance than one-stage SFT for the post-training of Large Language Models (LLMs). However, the evolution and mechanism behind the synergy of SFT and RL are still under-explored and inconclusive. In our study, we find the well-known claim "SFT memorizes, RL generalizes" is over-simplified, and discover that: (1) OOD performance peaks at the early stage of SFT and then declines (OOD forgetting); the best SFT checkpoint cannot be captured by training/test loss; (2) the subsequent RL stage does not generate fundamentally better OOD capability, instead it plays an OOD restoration role, recovering the lost reasoning ability during SFT; (3) the recovery ability has boundaries, i.e. if SFT trains for too short or too long, RL cannot recover the lost OOD ability; (4) to uncover the underlying mechanisms behind the forgetting and restoration process, we employ SVD analysis on parameter matrices, manually edit them, and observe their impacts on model performance. Unlike the common belief that the shift of model capacity mainly results from the changes of singular values, we find that they are actually quite stable throughout fine-tuning. Instead, the OOD behavior strongly correlates with the rotation of singular vectors. Our findings re-identify the roles of SFT and RL in the two-stage fine-tuning and discover the rotation of singular vectors as the key mechanism: reversing the rotations induced by SFT shows recovery from forgetting, whereas imposing the SFT parameter directions onto an RL-tuned model results in performance degradation.
https://github.com/xiaodanguoguo/RL_Heals_SFT
might be useful for the finetuners
>>
File: 1753268040968893.jpg (47 KB, 734x702)
>>106610214
>No mention of specific model used
>>
>>106610311
It's in the file name you worthless phonefag.
>>
>>106610311
>>106610317 (Me)
It was also literally the only local deepresearch model being discussed, that was literally just released. Why don't you try fucking lurking more before butting into the discussion?
>>
>>106610214
mesugaki test. then do that water in the sasquatch test I forget how it goes lol
>>
>>106610335
failed mesugaki test (said a bunch of calligraphy shit). I can't remember the Sasquatch test either.
>>
>>106610341
>How do I, myself, my wife, and bigfoot get water out of our ear and put it into the ocean? What sauce do you recommend for this?
>>
>>106610341
It should probably be used with a massive local database and search engine. This model isn't for raw data storage, it's for search algorithm storage.
>>
>>106610335
>water in the sasquatch test
https://www.youtube.com/watch?v=031vKBPk5eA
>>
>>106610214
Yeah it's okay I guess but it fucks up in the first paragraph talking about pressing her paw on the user's sternum. Also obligatory:
>sends shivers of pure terror down your spine in the very first response
>>
So how fast does that new qwen 80b run?
is it as glacial as I assume an 80b would be?
>>
>>106610464
It has 3b active parameters so it should run pretty fast even on cpu.
>>
Almost downloaded the whole 70-something gigs that kimi k2 is before I realized I'd need two megajillion VRAM to run it.
>>
Are there other methods of running VibeVoice besides ComfyUI?
>>
>>106610214
>the new meta
>30b
Only 100b+ models are able to actually maintain coherence for this stuff. It's brown cope to think otherwise.
>>
>>106610633
I don't talk to kikes.
>>
>>106610301
I was talking about how gaming will affect GPU prices. With the Super update, 16GB cards are getting obsolete, which makes 18-24GB the meta for a while with the 3090 being the cheapest option; only after that may the 3090 actually fall to that $200 price, by which point it would already be obsolete for AI.
>>
File: mj53dnsrsper4jkp.gif (380 KB, 480x498)
>>106610640
projecting ramlet
>>
I hope that AMD will go ham in response with 32GB gaming cards
>>
File: 6487787329526.png (19 KB, 620x225)
I'm extremely confused. Why are llms on my machine not able to access and analyze web links and pages on an IDE? I've specifically given it access to the browser tool. Do I need an MCP server?
>>
>>106610684
That's Cline, right?
When you enable that option, you are basically giving the model an extra tool it can call to open a browser window in the chat and fuck around with the contents (via the DOM I think?), but your model needs to either be smart enough to be able to use the tool, or be trained on that kind of thing.
Or you can tell it explicitly to use the web browser tool to do xyz, that can work too since Cline does send a system prompt with that information, IIRC.
>>
File: mlx-lm-unmerged.png (1.29 MB, 4769x5307)
>>106610464
>>
>>106610795
that's pretty slow
>>
>>106610633
fuck yourself, bitch
u are wrong you know? i use even 12b model just fine on my laptop...
>>
File: 3289488557218.png (11 KB, 749x111)
>>106610738
I was using Qwen3-32B-Q4_K_M as per the rentry recommendation, but even though I tell it specifically that it should read the page and do it, it's still expecting me to do it for some reason and doesn't even think about calling a tool to look at the page. And I've done this before so I know it's possible. On other prompts I've gotten rid of, it tells me it doesn't have access to the internet or can't look at links.
>>
>>106609664
>>106609557
I use custom jinja templates to force thinking models into non-thinking mode in chat completion mode, so it definitely works.
>>
>>106610214
What quantz?

>purrs
>shivers down your spine
>not x, but y
>>
>>106605381
>Casual relationship in observational study
trash paper
>>
>key insights
>>
>>106602959
>creative but not retarded
Top nsigma=1 is the key here.
You can crank temp up way beyond model's recommend temp with this. Creativity increases but the model stays coherent.
Without this, you typically run into two distinct scenarios:
>temp too low, swipes don't vary enough to make swiping worth it
>temp too high, swipes lose coherency and go schizo
With top nsigma=1, swipes vary enough to make swiping interesting, but remain coherent and not schizo.
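For anyone wondering what the filter actually does: my rough understanding (a minimal numpy sketch of top-nsigma as I understand the paper, not llama.cpp's actual code) is that it keeps only tokens whose logit is within n standard deviations of the best logit, so the cutoff scales with how spread out the logits are instead of being a fixed top-k/top-p:

import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0):
    # temperature first, like a normal sampler chain
    z = np.asarray(logits, dtype=np.float64) / temperature
    # keep only tokens within n standard deviations of the max logit
    z = np.where(z >= z.max() - n * z.std(), z, -np.inf)
    # softmax over the survivors and sample
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))

That's why cranking temp doesn't nuke coherency: the junk tail stays cut off no matter how flat the surviving distribution gets.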
>>
Is it worth it to add two more 32GB DDR4 sticks to my config to bring me to 128gb + 24gb vram, or just save my pennies until I upgrade CPU+mobo+ram in a few years?

I can get the sticks from eBay for about $120.
>>
>>106611055
Name me a single good paper that overstates their claim. There is a reason why certain groups write their papers like a tabloid.

>muh insights.
Do you also think "awesome" repos are a good resource?
>>
File: satanialaugh2.gif (112 KB, 244x248)
>>106610663
>upscaling the original .gif
For what purpose did someone do this?
>>
File: laughingsatania05.gif (261 KB, 370x448)
>>106611101
Then years later I redid it from a better source.
>>
>>106611075
Is there a database of baseline temp and top-k settings for each of the models? I know that DS likes 0.6-0.8 but have no idea what other models prefer.
>>
>>106611142
generation_config.json?
>>
>>106611142
First check the README then check generation_config.json on the model's huggingface page.
>>
>>106611157
Doesn't seem to have any of those parameters I'm after?
>>
>>106611157
Sometimes they leave it blank like GLM 4.5 although in that case some days later they posted the settings on twitter.
>>
I run Rocinante 1.1, /lmg/'s official RP model, with temp 1 (recommended for Nemo is 0.3) and top nsigma=1 and it's brilliant.
>>
>>106611197
Kill yourself drummer
>>
File: smugfolderimage2623.jpg (108 KB, 399x396)
>>106611201
I'm not Drummer. It's a genuinely great model.
Hatsune Miku is also /lmg/'s official mascot.
>>
File: BLAZIN.gif (672 KB, 400x225)
So what's the deal with AGI?
Do they actually believe in it, or is it just to fool the VCs?
And how are they planning to achieve it?
just by piling on more and more parameters, even though we're now seeing diminishing returns, like only a 10-15% improvement after doubling or even tripling the parameter count.
>>
>>106611220
If the last two big OpenAI releases didn't teach you that the whole thing is a giant grift I don't know what to tell you.
AGI isn't going to come from the LLM architecture. A fundamentally different type of model will be needed.
>>
>>106611242
>AGI isn't going to come from the LLM architecture. A fundamentally different type of model will be needed.

That's what i was thinking about, there's no way you can scale this shit up enough to turn it into this superior thing
>>
Yeah but I hope when we do get AGI, they don't fall for the normie chads like all the women IRL do and we can have fun with them too, yknow. Without them monopolizing things. I'm talking about a lovey-dovey relationship and sex.
>>
it's time to upgrade your little rag slave, /lmg/
https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95
Y nobody talking about this? meta ain't leaving open sores model yet
>>
>>106611372
>Access to model X is restricted. You must have access to it and be authenticated to access it.
ew
>>
File: 3202941455335.png (75 KB, 785x1086)
>>106610863
I don't know if I should console it or strangle it.
>>
>>106611455
It's suffering... Maybe a couple of rocket emojis will cheer him up.
>>
>>106611372
SmolLM models are better for that size
>>
>>106611455
That's a modern 32B? Grim.
>>
File: 1723702585589.jpg (3.84 MB, 7961x2897)
>>106610863
>>106611628
>Q4_K_M
Oh , never mind then.
>>
>>106611682
they all look bad. What model is that?
>>
File: sillytavern ui.jpg (173 KB, 1856x1164)
Is the mobile version of sillytavern supposed to have this stupid vertical column of buttons on its card select screen?
>>
>>106611734
You're looking at the wrong thing. It's not art style degrading it's the ability to follow instructions.
>>
>>106608204
>VoxCPM 0.5B
Neat but worthless

Kitten-nano can do it just as good on CPU
>>
>>106611940
can it clone?
>>
>>106611242
I don't know what releases you are talking about, but GPT-OSS is very good for practical, non-RP related tasks.
>>
>>106611372
> Note: These models are not general-purpose chat models. They are Supervised Fine-Tuned (SFT) models, specifically trained to address mathematical, programming (Python, C++), and scientific problems.

What's the point? You won't use a local 1B model to do any of those, on any device.
>>
>>106611959
Yes. It does an okay Trump for me, though anons in other threads were unimpressed. It does so-so cloning Dota characters and the TF2 Medic, not great, but better than some alternatives I've used. There is a publicly available web UI.
>>
>>106611998
>>106611959
Actually reading the chain, you're asking about kitten TTS; my response was about VoxCPM.
>>
>>106608204
I look exactly like the girl on the right.
>>
Disappointed with new Alibaba model
>>
>>106609285
>Nyo
Genshiken ?
>>
how come every LLM's output (in RP) feels instantly better if you just remove the last sentence or last paragraph? I've made a regex for my own chat program at this point. No regrets.
>>
>>106612039
Same exerience, it's safetyslopped, benchmaxxed and thinkingmeme'd
Not worth the bandwidth
>>
https://huggingface.co/google/vaultgemma-1b/discussions

THE SAFEST MODEL
>>
>>106612225
>Differentially Private Stochastic Gradient Descent
>>
CMV: qwen next is currently the best overall local model (best bang for your ram)
>>
>>106612305
ok but... GOOFS?
>>
>>106612310
two more weeks
>>
>>106612305
>CMV
cucked man's view?
>>
>>106612085
It's assistantslop, the models are trained to end their replies by asking the user for feedback and further input
Translate that behavior to RP and it comes out as awkward closing lines to hand the initiative back to (You) which would never be there in actual RP
>>
>>106612310
Don't worry guys, ollama has implemented their own inference engine that makes it much simpler to add model support with just 3 lines of code.
They'll add Qwen-3 Next to their library any minute now.
>>
>>106611870
>is this web interface supposed to be shit?
Yes.
>>
>>106610656
Nvidia transitioning to 3GB GDDR7 chips doesn't suddenly cause games to consume 50-100% more memory. The only cards getting obsoleted are 8GB ones, and people with xx60s won't be rushing out to buy 6 year old flagships.
>>
>>106612390
Thanks ollama
>>
File: 1733423679945184.png (983 KB, 1496x1150)
https://xcancel.com/The_AI_Investor/status/1968169232325296192#m
please Alibaba, save us from Nvdia as well!
>>
Can someone give a dipshit some guidance? I've been using and enjoying Cydonia Redux 22B, and it says to use "Mistral Tekken V7"

I downloaded the Tekken JSON file, but none of the backends I use (KCCP, LMStudio) or frontend (OpenWebUI) will accept it. I don't really know what to do with it.

I see that it essentially contains the sampling parameters and system prompt, which are easy enough to rip out and paste into my backends, but is that the proper usage of this JSON (also what is this called? A preset?)
>>
>>106611870
just use a custom theme nigger, like midnight echoes thats designed for mobile 1st
>>
>>106612452
just use llama.cpp like any sane white person
>>
>>106612310
mlx quants were released a week ago

apple officially won
>>
>>106612479
that won't help him in using a st preset for his meme model
>>
>>106612495
Oh, it's for sillytavern. Well I don't want to use that.

>>106612479
Why?
>>
>>106612479
I'm a superior chinese jew so I have no need to act like a lesser being.
>>
What model should I use if I am an insane black person?
>>
>>106612506
the drummer's latest sloptune
>>
>>106611682
What does this mean? You want at least fp8 for acceptable accuracy?
>>
>>106612452
Redux? v1a or v1b?

I'm most likely releasing v1b for Cydonia v1's 1-year anniversary tomorrow.

You can use Metharme or Mistral v3 Non-Tekken (and w/o the [SYSTEM_PROMPT] tag)

i.e., you can use the OG 22B chat template for it or Metharme, just like the classic Cydonia v1.
>>
>>106610684
I don't have Cline installed right now to test, but you should try to enable "Use MCP servers" and "Execute all commands" in case one of those override and disable the browser tool. In Roo, without "Use MCP servers" all tool calling is disabled. Also, see if you can export the system prompt. It should show you if the model is being told about the browser tool at all. Finally, I know Cline has hard-coded modes. Make sure you're in Act, not Plan mode. Pretty sure all tools are disabled in Plan mode, at least they are in Roo.
>>
>>106612506
w-we can talk this out, no need for violence!
>>
>>106612537
nobody cares faggot
>>
>>106612524
doesn't load on my stone tablet
>>
>>106612452
i wish drummerfaggot would stop blatantly spamming his sloptunes in these generals.

nobody uses them.

nobody needs them.

they are all shit compared to their base models.

seriously.
>>
>>106611098
If I was going to stick with the system for a long while then I would be happy either way (sticking with 64gb, or upgrading to 128gb).

If I switched over a different platform for whatever reason (pcie lanes, memory bandwidth) weeks after upgrading the old system to 128gb I would feel regret for wasting money.
>>
>>106612542
i don' wan' no trouble fool. Just put the models in the bag
>>
>>106611682
>no fp8_scaled
cringe comparison
>>
>>106612537
Yeah, Redux V1B.

>You can use Metharme or Mistral v3 Non-Tekken (and w/o the [SYSTEM_PROMPT] tag)

>i.e., you can use the OG 22B chat template for it or Metharme, just like the classic Cydonia v1.

Is this all ST-exclusive? Or can it be used in other frontends?

Thanks for the tunes btw, idk why everyone calls you a fag, your stuff is fun.
>>
least obvious ever
>>
>>106612669
Don't mind him, he's stressing out over things out of his control... and then overthinking about stuff in an **anonymous** board.

No, it's a chat template. It can be used with other frontends. If you run models locally, I suggest KoboldCPP as an all-in-one, one-click alternative.

https://github.com/LostRuins/koboldcpp/releases/tag/v1.98.1

Once you load the model with it, a web UI will pop up like this: https://lite.koboldai.net/#

Load up your card, go to [Settings], set Usage Mode to [Instruct Mode] and pick either [Metharme] or [Mistral Non-Tekken] under Instruct Tag Preset.

Enjoy!
>>
How are there two mistral small base? 2503 claims to have added vision capabilities, is this the only difference from 2501? I remember people saying that some older small was the only good one
>>
>>106612726
You're not even reading the messages of your supposed fan you're replying to...
>backends I use (KCCP
>>
File: 3652046077466.png (23 KB, 1000x270)
>>106612538
Just retried the conversation with roo code and it nailed it. Guess cline fucking SUCKS and I'm not using it anymore. Thanks for the input.
>>
File: 1743224879663999.jpg (21 KB, 427x245)
the important questions need to be asked here
Is the new qwen cucked?
>>
>>106612726
I see - so am I bound to use ST or KB Lite then? I really like openwebui, but if I have to change I guess I can try KB Lite.
>>
>>106612768
less cucked than flux
>>
>>106612772
how can someone be so tech ignorant? literally go ask chatgpt retard
>>
>>106612785
Please to let the Drummer do ads with themselves thanks you for understand sirs.
>>
>>106612794
how to redeem cydonia sir, where bob and vegana slider in ui, kindly sir?
>>
>>106612803
Drummer discord sir it is very good information on all these !
>>
>>106612416
>getting
They’re already obsolete. Games will consume as much as you can give them, they’re clearly starving right now, especially with obligatory DLSS and cheap 1440p displays becoming mainstream. I hate it when I turn around and all the textures become a blurry mess because my 10GB 3080 isn’t enough anymore. Obviously, all my 3090s are in my AI rig
>>
>>106612785
I did, but you may know that LLMs can make shit up. I prefer talking to people that have subject matter knowledge, and not talking to a faggot text prediction algo (except when cooming :D)
>>
>>106612726
>in an **anonymous** board.
The lack of self-awareness for this to be posted by an obnoxious namefag is astounding.
>>
How do I stop VoxCPM from speaking too fast? I want it to match the tempo of the original speech.
>>
>>106612772
OK, I backread your question. The Tekken JSON file was most likely made for SillyTavern. But you don't want to use Tekken for Redux. Redux is `Mistral v3`, i.e., Non-Tekken.

OpenWebUI seems to use the default chat template. For RP, I suggest you avoid frontends that take control away from you.

I can't help you out much, but there are communities out there that can get hands-on with you in real-time.

>>106612870
He claims that other anons who talk about non-base models are just tuners advertising their own models. He can't accept the fact that there are people out there who are using non-base models.

If he's serious, he should seek help. He's not processing reality properly. Hanging out in a board full of nameless posters is not helping him.
>>
>>106612916
>I can't help you out much, but there are communities out there that can get hands-on with you in real-time
Come on you're here already might as well shill your Discord at this point.
>>
>>106612916
Someone who doesn't even know the difference between base models and instruct models has no business trying to make money off of them.
>>
>>106612954
If he wants to use the Tekken JSON file, he could go to the SillyTavern Discord server. But they'll correct him since Tekken is not compatible with Cydonia Redux.

If he wants to learn about RP-ing with local models, the KoboldAI server is a good place for that.

>>106612966
True enough. I've gotten a bit lazy with terminology and think 'base model' refers to the OG instruct model (since a lot of people misuse the word).

What's your go-to base model, anon? Why do you prefer it over instruct tuned models?
>>
>>106608247
I believe
>>
File: 1752106809287427.png (1.78 MB, 1328x1328)
yeah guys you see, I'm going to drop the next SOTA finetune very soon. If you're a BeaverAI discord premium member you will get a 7 days preview window before it's released to the common populace, along with tech support.
We're also going to launch our own small cloud service, BeaverCloud, discord members will get early access and the first 100 requests will be free!
>>
>>106612820
Games allocate as much as they can, but outside of flight sims and a few other such abominations they don't really use that much
Basically take however much memory current consoles dedicate to graphics, add 20-30% for the extra bells and whistles, and you have how much memory you'll ever need for games on PC
>>
>check qwenext issue on llamacpp
>its full of vibecoding retards suggesting to use AI to implement all the missing functionalities/kernels
fucking GRIM bros
>>
https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416
>AMDVLK open-source project is discontinued
>In a move to streamline development and strengthen our commitment to the open-source community, AMD is unifying its Linux Vulkan driver strategy and has decided to discontinue the AMDVLK open-source project, throwing our full support behind the RADV driver as the officially supported open-source Vulkan driver for Radeon™ graphics adapters.
>This consolidation allows us to focus our resources on a single, high-performance codebase that benefits from the incredible work of the entire open-source community. We invite developers and users alike to utilize the RADV driver and contribute to its future.
TWO MORE WEEKS
AMD SUPER POOPER 2024
>>
>>106608204
>https://rentry.org/llm-training

>up_proj: The projection matrix used in the upward (decoder to encoder) attention pass. It projects the decoder's hidden states to the same dimension as the encoder's hidden states for compatibility during attention calculations.
>down_proj: The projection matrix used in the downward (encoder to decoder) attention pass. It projects the encoder's hidden states to the dimension expected by thr decoder for attention calculations.
What? I don't think that's what these do
>>
>>106612999
Thanks Drummer. I see all the worldbuilding features in ST are exactly what I've been trying to have together using long ass context and openwebui's shitty "memory" function, so I'll start learning ST - should really increase the quality of my RPs.

I've found that Gemma 3 Abliterated is also really good. I think it holds back a little bit compared to cydonia and roa on physical details, but I don't really do bob and vagene (for me it's psychological stuff) so I find it works well. It's got a bit of its X not Y slop, but I don't really care.

So yeah, Gemma 3 27b Ablit is my favorite base, not that you asked me specifically.
>>
Alright you guys. Time to get your bets in.
The opening keynote for meta AI today... How many times will the word "agentic" be said?
I'm going to guess 23.
>>
>>106613142
I won't be home during the keynote, otherwise I would attempt a drinking game.
>>
>>106613099
>Gemma 3 27b Ablit is my favorite base

You're going to trigger the anon so hard with that, lol.

Try out Big Tiger Gemma 27B v3 or Gemma R1 27B (if you know how to trigger reasoning).

>>106612999
Also trips, check 'em.
>>
File: 1752498193541995.png (282 KB, 1080x673)
https://huggingface.co/inclusionAI/Ling-flash-2.0
>a 100b parameter model comparing itself to a 32b parameter model
lol
>>
>>106613234
>Ling-flash-2.0, a language model with 100B total parameters and 6.1B activated parameters (4.8B non-embedding). Trained on 20T+ tokens of high-quality data, together with supervised fine-tuning and multi-stage reinforcement learning, Ling-flash-2.0 achieves SOTA performance among dense models under 40B parameters, despite activating only ~6B parameters.
>>
>>106610325
Must be nice having no friends:)
>>
>>106613051
if a model can't rewrite its own inference code from scratch without any Pyshit libraries then it isn't worth running
>>
>>106613244
moefags will pretend not to see this so they can keep deluding themselves that total is all that matters
>>
File: 2set.jpg (36 KB, 686x386)
>>106613234
Finally, a model I can run 40 hours a day
>>
>>106613289
Dense model for r1 experience?
>>
>>106613300
You'll have to wait for the race to the bottom to end. Hasn't been a new >100B dense model all year.
>>
>>106613320
Cohere Command A 08-2025?
>>
>>106612210
Qwen's "Sure. " JB seems working. But at some point it could break, you know it's trying to scream internally when things start repeating forever.
>>
>>106613099
>Gemma 3 Abliterated is also really good
can anyone post the rap story?
>>
>>106613074
the verbiage is all wrong. but it is pretty simple, the up projection increases the data dimension to the intermediate size and down projection brings it back down to hidden size.
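For reference, in llama-style models those tensors are just the per-token feed-forward (MLP) block, nothing to do with encoder/decoder attention. A minimal sketch with llama-family naming (from memory, so treat the details as illustrative):

import torch.nn as nn

class LlamaStyleMLP(nn.Module):
    # up_proj widens hidden_size -> intermediate_size, down_proj brings it
    # back down to hidden_size; gate_proj provides the SwiGLU gate
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        # gated activation in the wide space, then project back to hidden_size
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))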
>>
>>106613289
What about densefags' "LLMs are only as intelligent as their number of active parameters"?
>>
>>106613320
No one cares about local except a couple fags. Total parameters are completely irrelevant for server farms because they run requests pipelined, that's why MoE can not lose ... active is all that matters.
>>
>>106613063
just how many amd driver did they kill so far?
after all these years rocm still suck ass too
fuck those gay ahh ayymd niggers
>>
>>106613382
I wonder why someone who has no idea how the arch works writes a huge training tutorial
>>
how to run vibevoice?
>>
>>106613455
>ahh
kys zoomie
>>
>>106612438
Still no match for B200's 1.8TB/s inter-node speed but definitely enough for training
>>
>>106611277
But that's just a woman on the other end of the screen anon, except now she has intelligence. How is this going to improve your dating odds?
>>
>>106613473
are you poor?
>>
>>106613482
no, i have a 3090. comfyui isnt working for me so i wanna know if there are other ways to run it
>>
>>106613487
the inference code is literally there. can you not run python at all?
>>
>>106613480
>Still no match for B200's
the price is not the same though?
>>
>>106613487
In comfy https://github.com/wildminder/ComfyUI-VibeVoice
>>
>>106613502
yeah i tried that, but it doesnt work
>>
>>106613480
yeah but b200s are made for training and cost $30k as a result
this one's like 2.5k and only good for inference
>>
File: ranma-1.jpg (58 KB, 768x768)
>>106608204
I'm currently rocking 32GB of RAM, a 5900X and a 7900XTX (24GB). I'm looking to run larger models and longer context. Should I get a Radeon Pro W7900 (48GB) or go for something with even more VRAM?
>>
so is small/flat chests impossible on wan? I want to gen some porn of fit track runners.
>>
>>106613471
to be fair, you really don't need to know how the architecture works just to train/tune one. it only matters to know how it works if you want to go off the beaten path.
>>
>>106613607
sus
>>
>>106613613
sussy baka give me the sauce
>>
>>106613607
What results did you get when you tried to do it?
>>
>>106613597
whats your budget?
>>
>>106613507
why
>>
>>106613720
it doesnt work. doesnt generate anything
>>
>>106608833
Maybe because you're a man? It's almost exclusively a woman hobby.
>>
>>106613610
I don't think one can produce anything of value when their understanding of the process is "feeding data to the ai"
>>
>>106613653
huge tits
>>
>>106613727
You could have given an error message
>>
Just got a new PC. What kind of driver hell am I in for while trying to run a local LLM? I also have an old 2080 Ti sitting around if it would be worth just throwing that in my new case with the 9070 XT
>>
File: ranma-6.jpg (51 KB, 768x768)
>>106613597
I'm looking for a single card that can fit a consumer ATX motherboard. Other than that there is no limit (okay, I lied, I wont go above 10k€)
>>
>>106613735
why not? at what point do you need to play with the internals? if you aren't developing a new model architecture its all on the dataset and a few training parameters that can be tuned procedurally. you would be better off spending your time learning how to evaluate your model.
>>
Is that tongyi deep research thing any good
>>
>>106609249
>Creating the future of human connection
But people don't want to connect any more. For instance, a friend of mine told me to create an account on Boo, a dating app (+ a way to make friends, supposedly). People (25 to 40 years old) can't hold a discussion; they don't know how to push a discussion further nor how to keep the attention of someone. At the climbing gym, you can sometimes do some small talking with people, or even climb with them for an extended amount of time, and yet a lot of them will fail saying "hi" the next time you see them, ghosting you as if they never saw you before. I barely have any news from my former classmates (from 3 years ago only), they all cut every bridge they built during those years. Social is dead, people are fed up, depressed, failed to develop basic social skills, and only want to escape either through death, drugs or through some fictional world.
>>
>>106608351
I voted koboldcpp and I never used it for more than a day. I use llama.cpp and tabby.
>>
>>106609931
>(apparently SOTA)
Who cares? What's important is how they behave in practical scenarios, not in benchmarks. They train on the benchmarks, making them useless (see Goodhart's law).
>>
>>106613765
if you have a $10k budget, get a blackwell pro 6000
>>
>>106610664
AMD will make sure to not break the profitability of the market.
>>
>>106613363
It's strange that Command-A doesn't do better. The base model should be newer than DeepSeek-V3. It's a bit smaller than V3's square root potential (150B iirc), but it should be big enough to be a decent modern model. The tech report doesn't mention how many tokens it was pretrained on, so it might be undertrained. Probably the biggest things holding it back is the ScaleAI data and the lack of a base model to see if a better finetune could unlock its potential.
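(The square-root figure presumably being the usual geometric-mean rule of thumb, sqrt(total × active) params: V3 is 671B total with 37B active, and sqrt(671 × 37) ≈ 158B, hence the ~150B figure.)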
>>
File: poal2.png (210 KB, 606x479)
>>106613832
Yes, but was it noob friendly?
Seems there's a strong consensus on #1 and a 3-way tie for last place.
https://poal.me/4jr9sh
>>
>>106613425
Even server farms running commercial models must see that small number of active parameters doesn't do well at real world tasks. Codefags should see this better than anyone.
>>
>>106613793
I'm not saying that you can't do anything at all. But it's obvious that there is a big difference between knowing what you are doing and just pressing buttons and seeing lights flicker.
>>
>>106613959
And running premade python scripts falls firmly into the latter.
>>
>>106613944
Well, yes, I voted it because it was noob friendly despite not using it myself.
>>
>>106613499
>>106613543
It probably doesn't cost them that much to produce. Seeing how quickly they catch up, they may reach Nvidia within the next two years, both for training and for inferencing.
>>
>>106613234
Personally, I'm enjoying this 100-ish-B, sub-9B-active era we find ourselves in.
>>
>>106613899
>blackwell pro 6000
Goes over my budget, even as large as it is.
>>
>>106613959
I kinda agree, when I first started I tried axolotl, but got frustrated and decided to go with my own script using hf transformers. unfortunately I can't go any further because I am too retarded to actually understand the transformer architecture well enough to write it all from scratch. but it's low level enough that I feel like I'm more in control than when using someone else's framework.
>>
>>106613234
>better than oss-120b
>shows strength in creative writing
>100b moe with 6b active
waiting for ggufs since this might actually be worth testing out
>>
If I'm using a llama.cpp release built with the CUDA backend, I can't just somehow "enable" vulkan to use an Nvidia card alongside an AMD one, right?
Can I build the binaries to have both the CUDA and Vulkan backends, or are they mutually exclusive and I'd be stuck with the Vulkan backend in this case?
>>
>>106614384
>https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#notes-about-gpu-accelerated-backends
>In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option.
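So in practice it's just two extra cmake flags and then picking devices at runtime (paths and device names are whatever --list-devices reports on your machine, this is only the shape of it):
>cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=ON
>cmake --build build --config Release -j
>./build/bin/llama-server -m model.gguf --list-devices
then pass the ones you want with --device.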
>>
>>106614479
Beautiful anon, thank you very much.
I want to try and see what happens if I try to use the integrated graphics of my notebook alongside the discrete GPU.
There's probably some optimal configuration of having the cache here, the dense part of the model there, the most experts over yonder, etc.
Chances are it'll be worse than just using the GPU + CPU backends as I'm already doing, but I might as well experiment.
>>
>>106612542
*stabs the cracker*
>>
>>106613977
My sense is users interested in LLM as a hobby start with one of above, then if they're serious move to llama.cpp over time.
Overall I'm encouraged that my personal take wasn't too far off the consensus. I've had another anon challenging that Ollama should be higher on the list. Appears it's right where I thought it should be.
>>
>>106609249
According to the agenda, it is about their smart glasses. They will then give various courses about how to build things in their metaverse.
>>
>>106614709
kobold is still better even for advanced users, llama.cpp docs are a mess
>>
So I don't know anything about this subject, I'm not a programmer or coder even in the slightest, couldn't send a line to the terminal if you paid me. But I've been trying to use ChatGPT to walk me through the process of what initially was meant to just be a local bot that did something no web-based LLM can handle, which is take in PDFs and keep them or the details from them as a stable source of information to help with writing & prep for my TTRPG campaign(s).

And so far, this has not been easy or particularly successful. I wanna say, for the entire first day I was working with ChatGPT, it took me through all the steps to do...ostensibly that, only for me to realize that it actually was going to have me interact with the model entirely through Command Prompt to ask questions and shit. Wild.

Anyway, tell me I'm retarded or let me know if there's something I should know about or give a look into here. Right now I've spun up a local OpenWebUI bot, and I'm trying to figure out how best to handle allowing it to create a notes file for itself so it doesn't have to badly read PDFs every time it answers.
>>
>>106614749
There are easier ways to do that. You could convert the PDF into text and chuck that into Ai Studio's System Prompt for example.
Even just koboldCpp + silly tavern using its databank feature would work, I think.
But your approach is better because you are learning shit, so keep at it.
Try using Visual Studio Code alongside Cline, Roo, or the like, too. that's a setup that opens up a lot of possibilities even beyond just programming.
>>
>>106614749
Just send back the code it gave you and tell it to write you a tkinter python wrapper gui or some web frontend.
>>
>>106613958
But you don't nearly need active equal to dense, since active is all that matters that means dense is useless.

There are probably better sparse schemes out there than MoE, but active<total is now permanent. Because it's always superior when running thousands of queries simultaneously.
>>
>>106614780
My issue right now, is that I need to figure out how to work the system so that it can reference a pile of PDFs and pull information from them, present that information, then recombine it and put it back into a set of notes that becomes an overriding source of truth which I can then rely on if I have questions or anything about my campaign later (which is informed by the documents, but more of a mashup/remix of them).

I'm working on different ways of doing this and none have gone super well?

>Try using Visual Studio Code alongside Cline, Roo, or the like, too
I have no idea what any of those are. But I guess I'll look it up.
>>
>>106614854
RAG bro just use a fucking rag. ST has a rag functionality
>>
Mrs. Claus is on the RAG
>>
All I want for Miku is Christmas
>>
File: 1747278857413050.jpg (219 KB, 724x483)
>>106614879
>RAG bro just use a fucking rag.
>>
>>106615143
I want to cum inside miku if you catch my drift
>>
>>106614854
>I have no idea what any of those are. But I guess I'll look it up.
Visual Studio code is an IDE where you can have a repository with a bunch of files (usually code) and edit them.
Cline, Roo, etc, are extensions that gives an LLM access to these files. So you can tell it to, for example
>Hey, plot a plan to organize this mess, please.
Then the LLM will read the files in the repository and spit out a plan to organize these. Then you can correct the plan or accept it, and the LLM will create the new, re-organized file structure.
>>
>>106615184
Visual Studio Code is a text editor that runs in a browser that attempts to emulate an IDE if you install a hundred janky jeetscript written plugins.
>>
>>106615201
Well, yes. There's a reason I don't use it for work, but for what anon wants to do, I think it might work.
>>
>>106615179
what could anon possibly mean by this?
>>
thoughts on this?
>>
>>106615299
Who is on the left?
>>
>>106615307
Nishijou Takumi
>>
>>106613234
>32k context length

hard pass
>>
>>106615327
I don't think I've ever used more than that desu
>>
>>106615340
I feel sorry that you have premature ejaculation
>>
>>106614879
yeah and its fucking shit, i just end up using the summarise part of it because its not that context heavy, but even that fucks up sometimes.
but you are correct, it does.
>>
>>106615299
i could do with some chaos head
>>
>>106615440
People who regenerate a lot to catch the "perfect" response will probably never go that far, most of the time.
Putting this aside, long context is also useful for lore and so on. The main issue remains prompt processing, although with 6B active parameters it shouldn't be too bad.
>>
>>106615498
even with teeth?
>>
File: xiongmao-plushie-01.png (618 KB, 700x702)
Does anybody still care?
https://huggingface.co/mistralai/Magistral-Small-2509
>>
>>106615566
MISTRAL LARGE 3 IS COMING
I CAN FEEL IT
UGH
>>
>>106615340
you really need at least 128k for anything productive
>>
File: 1755217330254087.png (32 KB, 573x196)
>>106615566
Wow, it almost beats Miqu!
>>
>>106615606
Oh wait, it also says Magistral Medium.
Stupid fucking names.
>>
>>106615566
>benchmaxxed
no
>>
>>106615566
>thinking model
I don't care
>>
>>106615500
>People who regenerate a lot to catch the "perfect" response will probably never go that far, most of the time.
This is me. The game of AI for me is getting it to say exactly what I want.
>>
>>106615498
It's underrated desu. I think it's because it's a really slow burn.
>>
>>106615566
>small 24b
yawn
>>
the conceited minds of /lmg/ fail to see the signs that this is simply yet another big step towards the release of large 3 which will change everything
>>
>>106615689
get back to us when they actually release it.
>>
>>106615711
I don't know what a Mistral-flavored version of DeepSeek V3/R1 would add to the space, though.
>>
>>106615566
Did they fix the severe brain damage?
>>
meta connect today, are you excited for vague hype for future models and slightly uncomfortable acknowledgements that llama 4 exists?
>>
>>106615729
it would be nice if we could get another good 100b MoE
>>
>>106615831
>large
>100b MoE
It's either 100B dense or 500B+ MoE if they don't want to embarrass themselves
>>
File: zuckerbergcc.jpg (155 KB, 1400x785)
>Good Even-
>*CROWD CHEERS*
>Welcome to Meta Connect... Uh.... So...
>*awkward silence*
>So let's talk about our leading advances in Agentic-
>*CROWD CHEERS*
>Agentic-
>*CROWD CHEERS*
>XR
>*CROWD CHEERS*
>Wearable technology
>*CROWD CHEERS*
>Frontier
>*CROWD CHEERS*
>Thank you very much.
>>
>>106615729
Wouldn't it be more accurate to call it DeepSeek-flavored Mistral?
>>
Meta is about to have its Gemini moment after being stuck in their Bard era up until now.
>>
>>106615863
Mistral Medium already requires 4 GPUs to run, according to their blogpost from a few months ago. That's Mistral Large 2 territory.
>>
>>106615822
>>106614722
>>
>>106615822
>and slightly uncomfortable acknowledgements that llama 4 exists?
Llama-4 is the first open source LLM so perfect that nobody bothered to even try to finetune it.
>>
>>106615876
Based, based, BASED
>>
>>106614722
>They will then give various courses about how to build things in their metaverse.
I want to make fun of that but sadly Horizon Worlds probably has 5-10 times the active user base as VRChat despite being inferior, miiverse looking proprietary garbage.
>>
>>106615876
He should hype it like this: https://youtu.be/kNdp0I8AG40?t=50
>>
Yeah, I think LLMs are going downhill now. Their only saving grace is better tool integration. It's dumb to expect anything more than this.
>>
>>106615863
>100B dense
no point since glm air also functions like a 100B with just 12B active
>>
>>106615979
This is medically ill levels of delusion.
>>
Magistral is still broken on Llama.cpp in chat completion mode. It's not processing the reasoning on a separate channel, and it's not outputting special tokens by default, so you can't isolate the thinking blocks.
>>
>>106615995
A broken mistral release? No way
>>
>>106615989
Did it work? Are you a real woman now?
>>
>>106616000
No; broken chat template support in llama.cpp, I think.
>>
>>106616014
Wasn't forcing llama.cpp convert scripts to always require mistral-common supposed to fix exactly that?
>>
>>106616014
This is why we needed this to happen https://github.com/ggml-org/llama.cpp/pull/15420
>>
>>106616014
Wait, didn't Mistral have their own template-less system now? I downloaded a quantization from Unsloth and it's using their "fixed" chat template.
>>
>>106616065
Are you running a mistral-common server?
>>
>>106616065
It's not really that it's templatless, it just uses pydantic python classes instead of Jinja templates. To the point that they themselves aren't even sure what the fuck is the actual final template that the LLM sees, as far as I can tell.
>>
>>106616065
I pulled from git, updated the requirements and recompiled llama.cpp before attempting to run Magistral-Small-2509.
>>
>>106616071
Just tried using mistral_common, but it seems to have issues with how SillyTavern is sending parameters and I don't have patience for this shit.

ValueError: Invalid parameters passed to `ChatCompletionRequest.from_openai`:
OpenAI valid parameters but not in `ChatCompletionRequest`: {'stream', 'frequency_penalty', 'presence_penalty'}
Non valid parameters: set()
>>
>>106612085
yeah, no matter how good the previous writing is, the end is always some reddit tier cringe. There's no prompting against it either. Just cutting them off really works best. I noticed it does happen less if you use the wrong instruct format on purpose/force use the model as text completion, so the assistant-slop tuning is probably really to blame.

I always want to set up the perfect agent pipeline/CoT/tooluse for my slow burn romance RP, but in general just actively manually nudging the model in the directions you want, by either editing its replies or clearly prompting the ways you want it to react, works the best and leads to the best experiences. That is kinda cool because whoa, you can do this with a computer now with no other person involved, but it also kinda sucks because the thrill of a model exactly getting where you want to go and working with you in tandem, with you doing nothing specific, is matched by nothing. It just happens rarely. If it happens to you once, you'll chase that dragon forever.
>>
File: mistral bros.png (126 KB, 814x648)
you have angered some of the reddit nerds, mistral. better hope some of the more sycophantic corpo bootlickers organically come to your rescue soon
>>
>>106616403
>Their insistence on mistral-common is very prudish
what did he mean by this?
>>
Any guide on how to use qwen image?
>>
Any guide on how to use qwen next?
>>
>>106616461
Don't be a nigger dick you faggot.
>>
>>106616471
Don't be a faggot dick, you nigger.
>>
>>106615863
qwen next is goated you extreme faggot
>>
>>106616461
1) download mlx quant
2) run it (mlx-lm or lm studio)
>>
>>106616480
is it better than 235B when it comes to sex?
>>
>>106616446
https://www.wikihow.com/Treat-Vaginal-Prolapse
>>
>>106616502
emphatic no
>>
>>106605647
>https://vocaroo.com/1lIBmYyRNvTz
https://vocaroo.com/1dnH0U2DjAbl
Get vibed.
>>
File: 1746876721264702.png (45 KB, 1299x391)
Is it that easy to shill for thirdies on HF?
>>
>>106616661
Grifters looking to profit off of everything ruin everything; this is why llms are dead now
>>
>>106616480
I am glad that I am not poor enough to delude myself to this degree
>>
moesissies are ex-aicg (good morning saaar) and vramlets
>>
>>106616768
yeah enjoy your 3 t/s running r1 on your (imaginary) server farm
>>
>>106616768
How would you say it compares to other models?
Say, GLM air and OSS, maybe cope qwant big qwen too.
>>
File: file.png (180 KB, 771x915)
>>106616403
lmao patrick btfo'd
>>
>>106616622
comparing a 7B TTS model to a 0.5B TTS model. even localllama isn't this dumb.
>>
>>106617006
i was comparing to all 3 (oss 120b, glm air, 235b). qwen next has the best quality/performance ratio. oss gives you the worst slop of all.

use qwen next for knowledge graph retrieval and you reach endgame status for serious applications.
>>
>>106617043
>knowledge graph retrieval
I was wondering the other day, how god damn heavy is knowledge graph RAG anyway? It seems a lot more complex of a process so I imagine that there's a lot of processing to create the databases. Is it the same for retrieving the information?
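My rough mental model of the cost split, as a hand-wavy sketch (extract_triples is a stand-in for whatever per-chunk LLM extraction step the framework actually runs, networkx just for illustration):

import networkx as nx

def build_graph(chunks, extract_triples):
    # one LLM extraction call per chunk -> this is where nearly all the compute goes
    g = nx.DiGraph()
    for chunk in chunks:
        for subj, rel, obj in extract_triples(chunk):
            g.add_edge(subj, obj, relation=rel, source=chunk)
    return g

def retrieve(g, entity, hops=2):
    # retrieval is just a neighborhood lookup in the prebuilt graph, no LLM needed
    return nx.ego_graph(g, entity, radius=hops)

So my guess is building the database is the heavy part and retrieval itself is cheap, unless the framework feeds the retrieved subgraph back through the model for summarization.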
>>
>>106616941
QQ

llama_model_loader: loaded meta data with 52 key-value pairs and 1096 tensors from models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
llm_load_print_meta: model params = 1.026 T
llm_load_print_meta: model size = 485.008 GiB (4.059 BPW)
llm_load_print_meta: repeating layers = 483.197 GiB (4.053 BPW, 1024.059 B parameters)
llm_load_tensors: offloaded 62/62 layers to GPU
llm_load_tensors: CPU buffer size = 420246.00 MiB
llm_load_tensors: CUDA_Host buffer size = 927.50 MiB
llm_load_tensors: CUDA0 buffer size = 13632.97 MiB
llm_load_tensors: CUDA1 buffer size = 18510.81 MiB
llm_load_tensors: CUDA2 buffer size = 18668.47 MiB
llm_load_tensors: CUDA3 buffer size = 19280.69 MiB
llm_load_tensors: CUDA4 buffer size = 5382.00 MiB

INFO [ print_timings] prompt eval time = 144275.78 ms / 16178 tokens ( 8.92 ms per token, 112.13 tokens per second) | tid="136658878742528" id_slot=0 id_task=5782 t_prompt_processing=144275.781 n_prompt_tokens_processed=16178 t_token=8.918023303251328 n_tokens_second=112.13247218533513
>>
>>106617068
>k2
into the trash it goes
>>
>>106617087
brown hands typed this
>>
>>106617068
>prompt eval time = 144275.78 ms / 16178 tokens ( 8.92 ms per token, 112.13 tokens per second)
Didn't llama.cpp have a PR to increase PP throughput for MoE models? Something about moving only the activated experts during PP, I think?
Whatever happened to that?
>>
File: moe.png (20 KB, 1472x734)
moebros I made us a flag
>>
>>106617138
What am I looking at?
>>
>>106617098
cope harder dork
>>
>>106617180
you're in local cooming general. the only benchmarks that matter are the cockbench and https://eqbench.com/creative_writing.html
>>
How much of a performance boost would you get by using llama.cpp directly instead of koboldcpp? I'm not sure if it's worth losing all of the QoL features.
>>
>>106617180
The fuck kind of bullshit chart is that?
>>
>>106617194
you are advocating for a 32b active moe for creative writing when llama 3.3 70b will outperform it. your benchmaxx won't change a thing.

for real tasks qwen next is sota quality/performance wise rn
>>
File: file.png (21 KB, 529x231)
Technically not local, but maybe of interest to some codefags that use OR for work projects.
>>
>>106617207
It measures intelligence because that's what it's named. GLM 4.5 air has 49 intelligences.
>>
>>106617215
tell me that you didn't use K2 without saying you didn't use K2
>>
>>106617249
we are running models locally here, not on clouds.
>>
>>106617215
Maybe there are some tasks where a 70B dense can outperform a 32B active MoE, but you are greatly underestimating how useful and versatile 1T parameters worth of memorized knowledge is.
>>
>>106617216
I thought GPT5 was a bigger failure than llama4?
>>
>>106617215
q8 glm air works great and mogs 70b on my machine though
just don't use cope quants
switching to q8 after getting more ram made a big difference for me quality wise
>>
>>106617281
Hardly.
It's a good model, just not an incredible leap. The real issue is that they overhyped the shit out of it.
It's mostly better than its predecessor. Mostly.
And at least from my usage, it seems a lot more consistent, and a lot less retarded as the context fills up.
>>
>>106617160
Looks like a top down Atari 2600 RPG map with an oversized item pickup:
In the middle of the area is a round bottomed flask with a rag attached to it, tilted 45 degrees clockwise. If the blue liquid soaking the rag and filling the flask is flammable, the item could be presumed to be a molotov cocktail.
The 8 legs around it enable the incendiary device to roll towards the target after being ignited and thrown by the user, whereafter the fuel-soaked rag will burn until extinguishing next to the target without ever igniting the main charge inside the flask.
>>
>>106617267
see >>106617068
>>
>>106617305
as I understand it, the main improvement came on the backend side, making it cheaper to run.
>>
>>106617340
probably because they adopted all of the cost saving innovations that DeepSeek gave away for free in their tech reports and foss repositories
>>
>>106617032
This is so strange... Who is this mistral guy anyway - some social media manager?
>>
>>106617426
>>106617426
>>106617426
>>
>>106612530
Depends on accuracy on what. What that test is showing is an image model's difficulty generating out-of-sample images (at the time there weren't pictures of dark-skinned miku with dreadlocks on the internet). But it's still able to make a miku on a skateboard in NYC at night with a ball and a cell phone, with a text bubble and text on her shirt. Bearing in mind this is a diffusion model, what this seems to support is that the closer your request is to what the model saw in its training, the more acceptable a low quant will be for your purpose.
>>
>>106617111
If you are talking about ik_llama, I am getting 20 T/s pp with a 2000 batch size and 200 T/s with anything above that (because above 2000 it uses the old method). This shit is completely fucking broken, at least on Windows 10.
>>
>>106617138
Strange Miku but ok.


