/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor applications are now closed. Thanks to all who applied!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 06/13/26(Sat)01:45:56 No.109043554

File: varnishing act.jpg (156 KB, 1216x832)

156 KB JPG

/lmg/ - Local Models General Anonymous 06/13/26(Sat)01:45:56 No.109043554

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109038219 & >>109032734

►News
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/13/26(Sat)01:46:23 No.109043556

Anonymous 06/13/26(Sat)01:46:23 No.109043556

File: district 39.jpg (161 KB, 1024x1024)

161 KB JPG

►Recent Highlights from the Previous Thread: >>109038219

--Benchmarking MTP speed gains and VRAM overhead in Kobold:
>109040460 >109040469 >109040516 >109040916 >109040933 >109040992 >109041190 >109041205 >109042592 >109042602 >109042660 >109042605 >109042624
--Comparing 26B model performance and speed with reasoning toggled:
>109039929 >109039948 >109039972 >109040202
--Speculation on AI bubble and US ban of Mythos/Fable:
>109041909 >109041971 >109041984 >109041990 >109042006 >109042013 >109042050 >109042069 >109042521
--llama.cpp adds support for Eagle3:
>109038274 >109038298 >109038313 >109038655
--Anon proposes model-aware dynamic temperature adjustment to avoid repetition:
>109040846 >109040862 >109040976
--Sharing interfaces and tools for multimodal image and video input:
>109040337 >109040553 >109040558 >109040574 >109040606
--Optimizing mmproj settings to improve Gemma's image descriptions:
>109040962 >109041025 >109041031
--GLM-4.7-Flash coding performance reports and comparison with other models:
>109038349 >109038388 >109038459 >109039403
--Frustrations with building from source and managing legacy dependencies:
>109039843 >109039975 >109040139 >109040221 >109040270
--Kimi K2.7-Code release and anticipation for DeepSeek Vision:
>109038703 >109038723 >109038810 >109038869 >109038892
--Speculation on diffusiongemma and the future of local diffusion models:
>109042456 >109042485 >109042528 >109042534
--US government locking down Mythos after reported jailbreak:
>109042068 >109042076 >109042213
--Anons comparing regional second-hand RTX 3090 purchase prices:
>109042211 >109042283 >109042333 >109042514 >109042546 >109042583
--Logs:
>109038443 >109038539 >109039485 >109040610 >109040672 >109041248 >109041592
--Miku (free space):
>109039025 >109039479

►Recent Highlight Posts from the Previous Thread: >>109038224

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/13/26(Sat)01:48:59 No.109043571

Anonymous 06/13/26(Sat)01:48:59 No.109043571

>fable dead
>minimax is pure codeslop like usual
>k2.7-code still thinks for ages with no way around it
things are looking bleak

Anonymous
06/13/26(Sat)02:01:15 No.109043623

Anonymous 06/13/26(Sat)02:01:15 No.109043623

>DeepSeek trained from Gemini outputs
>Claude Sonnet trained from DeepSeek outputs
>each one ends up more capable than the last one
Isn't this just the recursive improvements people keep talking about? If the outputs from a weaker model can finetune a more capable model, then why can't that just happen recursively?

Anonymous
06/13/26(Sat)02:03:13 No.109043633

Anonymous 06/13/26(Sat)02:03:13 No.109043633

File: pascallllll.png (165 KB, 1508x708)

165 KB PNG

Pascalfags assemble!

Anonymous
06/13/26(Sat)02:04:28 No.109043640

Anonymous 06/13/26(Sat)02:04:28 No.109043640

>>109043623
it isnt recursive, its transitive

Anonymous
06/13/26(Sat)02:05:53 No.109043643

Anonymous 06/13/26(Sat)02:05:53 No.109043643

>>109043633
damn the poors are doing it rough in this economy

Anonymous
06/13/26(Sat)02:06:15 No.109043646

Anonymous 06/13/26(Sat)02:06:15 No.109043646

>>109043633
Based ewastemaxxer

Anonymous
06/13/26(Sat)02:07:23 No.109043651

Anonymous 06/13/26(Sat)02:07:23 No.109043651

>>109043646
The P100 was the best $75 I've spent this year.

Anonymous
06/13/26(Sat)02:07:50 No.109043653

Anonymous 06/13/26(Sat)02:07:50 No.109043653

>>109043554
Post more Yuki please I love her so much

Anonymous
06/13/26(Sat)02:08:28 No.109043658

Anonymous 06/13/26(Sat)02:08:28 No.109043658

What's the next step up from a 32GB GPU?
What's the next model after Gemma 4 31B?
32GB isn't enough for 31B Q8, so I'm considering getting another identical card just for it, but...?

Anonymous
06/13/26(Sat)02:10:27 No.109043669

Anonymous 06/13/26(Sat)02:10:27 No.109043669

File: _HEwIEc5a4AA2hme Fi.jpg (237 KB, 2048x1536)

237 KB JPG

>>109043554

Anonymous
06/13/26(Sat)02:12:10 No.109043675

Anonymous 06/13/26(Sat)02:12:10 No.109043675

>>109043658
Above 5090s, you have either Blackwell 6000s or Frankensteining old datacenter hardware.
Above Gemma 4, you have any of the typical large MoEs like Kimi or GLM or Deepseek. You will need 256GB of RAM at a minimum, so either workstation or server boards. 512GB is preferred, as well as DDR5. Considering the price of RAM, and not even the GPUs, you either pay 5 times more than you would have a year ago or you sit and wait with the rest of us.

Anonymous
06/13/26(Sat)02:15:27 No.109043687

Anonymous 06/13/26(Sat)02:15:27 No.109043687

File: 1770363716490322.png (2.44 MB, 999x1430)

2.44 MB PNG

Anonymous
06/13/26(Sat)02:16:08 No.109043690

Anonymous 06/13/26(Sat)02:16:08 No.109043690

>joked about letting your model play dragon's dogma with you
>someone actually modded coop into dd2
I wonder if it would actually be possible to set it up with an LLM.

Anonymous
06/13/26(Sat)02:18:02 No.109043698

Anonymous 06/13/26(Sat)02:18:02 No.109043698

>>109043658
>What's the next model after Gemma 4 31B?
Wait for the chinks to respond. Then wait for Google to respond. Rinse and repeat until hardware prices come down and we can all run Kimi at Q8 with max context.

Anonymous
06/13/26(Sat)02:18:25 No.109043703

Anonymous 06/13/26(Sat)02:18:25 No.109043703

>>109043687
>no weenus
grim

Anonymous
06/13/26(Sat)02:18:57 No.109043708

Anonymous 06/13/26(Sat)02:18:57 No.109043708

>>109043687
chadcat is a cringe representation. i think the snailcats are cuter

Anonymous
06/13/26(Sat)02:19:13 No.109043710

Anonymous 06/13/26(Sat)02:19:13 No.109043710

>>109043690
That guy had Gemma playing wow with him a week a two ago

Anonymous
06/13/26(Sat)02:20:08 No.109043716

Anonymous 06/13/26(Sat)02:20:08 No.109043716

>>109043687
how do i ascend further as an aichad? qwen 3.5 122b isnt doing it for me anymore and my project ideas keep getting more complicated.

Anonymous
06/13/26(Sat)02:20:13 No.109043717

Anonymous 06/13/26(Sat)02:20:13 No.109043717

>>109043708
Snailcats are the luddites
I think people got confused lately

Anonymous
06/13/26(Sat)02:21:03 No.109043719

Anonymous 06/13/26(Sat)02:21:03 No.109043719

>>109043717
i'm aware

Anonymous
06/13/26(Sat)02:21:49 No.109043723

Anonymous 06/13/26(Sat)02:21:49 No.109043723

>>109043687
go back

Anonymous
06/13/26(Sat)02:23:52 No.109043733

Anonymous 06/13/26(Sat)02:23:52 No.109043733

>>109043710
I think that anon's Gemma can only do chat right now. Vedal plays games with Neuro though so I'm sure it's not impossible.

Anonymous
06/13/26(Sat)02:24:54 No.109043741

Anonymous 06/13/26(Sat)02:24:54 No.109043741

>>109043675
grim

Anonymous
06/13/26(Sat)02:25:44 No.109043745

Anonymous 06/13/26(Sat)02:25:44 No.109043745

>>109043687
Which circle did this meme originate from? I've seen it in /vcg/ a lot.

Anonymous
06/13/26(Sat)02:26:26 No.109043747

Anonymous 06/13/26(Sat)02:26:26 No.109043747

>>109043658
See OP

Anonymous
06/13/26(Sat)02:27:05 No.109043751

Anonymous 06/13/26(Sat)02:27:05 No.109043751

>>109043745
India.

Anonymous
06/13/26(Sat)02:27:41 No.109043755

Anonymous 06/13/26(Sat)02:27:41 No.109043755

>>109043751
wtf i hate *cat now

Anonymous
06/13/26(Sat)02:28:07 No.109043756

Anonymous 06/13/26(Sat)02:28:07 No.109043756

>>109043741
Prices of DDR4 have fallen a little bit down to where they were in January, but that's not much. You still would be paying at least $15k for a moderately competent rig with a Blackwell 6000 and 512GB of DDR4. Still would only get 10t/s at best on any of the big MoEs with an acceptable quant.

Anonymous
06/13/26(Sat)02:31:00 No.109043765

Anonymous 06/13/26(Sat)02:31:00 No.109043765

>gemini live translate
Pretty fucking cool. Think we'll ever get that locally?

Anonymous
06/13/26(Sat)02:32:11 No.109043773

Anonymous 06/13/26(Sat)02:32:11 No.109043773

>>109043756
What kind of hardware would run the big models at fast speeds (50+t/s)?

Anonymous
06/13/26(Sat)02:35:01 No.109043785

Anonymous 06/13/26(Sat)02:35:01 No.109043785

>>109043773
You would need to have it all loaded on the GPUs, so bare minimum 4 Blackwells which at the current price would be around $60k. At that point you would basically have to go with used A100s or something off of ebay unless you just have money to burn.

Anonymous
06/13/26(Sat)02:35:59 No.109043788

Anonymous 06/13/26(Sat)02:35:59 No.109043788

>>109043773
very tough
literally burning money too
10t/s is plenty. all of you boys are just completely fried

Anonymous
06/13/26(Sat)02:36:46 No.109043791

Anonymous 06/13/26(Sat)02:36:46 No.109043791

>>109043788
>10t/s is plenty
You can't coode with that.

Anonymous
06/13/26(Sat)02:39:45 No.109043801

Anonymous 06/13/26(Sat)02:39:45 No.109043801

>>109043791
oh, okay, that I agree is different

Anonymous
06/13/26(Sat)02:40:13 No.109043802

Anonymous 06/13/26(Sat)02:40:13 No.109043802

12 vision capability is pretty bad. It's just not very good I wonder if I'm doing something wrong

Anonymous
06/13/26(Sat)02:41:48 No.109043805

Anonymous 06/13/26(Sat)02:41:48 No.109043805

>>109043802
Did you try increasing the image resolution? llama.cpp has retarded defaults

Anonymous
06/13/26(Sat)02:41:59 No.109043807

Anonymous 06/13/26(Sat)02:41:59 No.109043807

>>109043785
Correction: haven't checked Blackwell prices in a weeks. They are now up to $15k on newegg just by themselves. So that rig would probably be more in the ballpark of $19k instead of $15k. For a pretty rudimentary rig.

Anonymous
06/13/26(Sat)02:44:19 No.109043815

Anonymous 06/13/26(Sat)02:44:19 No.109043815

>>109043802
>omni model bad at everything
No one's surprised

Anonymous
06/13/26(Sat)02:46:57 No.109043822

Anonymous 06/13/26(Sat)02:46:57 No.109043822

I can squeeze gemma-chan 4-31B in at FP16/128k with the draft model, should I run FP8 quant to get 256k context or just cope with this?

Anonymous
06/13/26(Sat)02:48:31 No.109043824

Anonymous 06/13/26(Sat)02:48:31 No.109043824

I haven't been around much but is q8 not the default anymore
why fp16

Anonymous
06/13/26(Sat)02:50:12 No.109043831

Anonymous 06/13/26(Sat)02:50:12 No.109043831

I'm an intel gpu chud and the gguf shit runs like ass, Q8 is 12t/s, FP8 is 30t/s and FP16 is somewhere around 20t/s, all without draft model

Anonymous
06/13/26(Sat)03:00:06 No.109043872

Anonymous 06/13/26(Sat)03:00:06 No.109043872

>>109043831
huh, interesting
I meant to quote the first time btw just forgot

Anonymous
06/13/26(Sat)03:02:46 No.109043879

Anonymous 06/13/26(Sat)03:02:46 No.109043879

>>109043756
when's gemma 4 64B coming out so i don't have to care about useless supergiant models

Anonymous
06/13/26(Sat)03:05:13 No.109043892

Anonymous 06/13/26(Sat)03:05:13 No.109043892

File: file.png (25 KB, 365x243)

25 KB PNG

Step Flash 3.7 needs to be corrected

Anonymous
06/13/26(Sat)03:08:50 No.109043902

Anonymous 06/13/26(Sat)03:08:50 No.109043902

>>109043623
Because it requires human input.

Anonymous
06/13/26(Sat)03:13:34 No.109043915

Anonymous 06/13/26(Sat)03:13:34 No.109043915

>>109043745
it's a single sperg forcing the 'meme'
been like 2 months

Anonymous
06/13/26(Sat)03:16:15 No.109043922

Anonymous 06/13/26(Sat)03:16:15 No.109043922

File: Ernie-Image_00097_.png (1.61 MB, 1200x896)

1.61 MB PNG

ACEStep 1.5 XL Initial D LoRA
https://vocaroo.com/14wvmcvt94lB
https://vocaroo.com/12tVNq7SnhO1
https://vocaroo.com/1ivoSPExfSF6
https://vocaroo.com/12daQWwoPPbW

I wrote a guide
https://rentry.co/s8fg8ber
Note for this Initial D LoRA, I increased rank to 256/512 and lowered LR to 0.00009. This is the only LoRA I have trained this way, but results are very good.

You're probably wondering how I get such insane results in audio quality, I haven't posted to /lmg/ in a while since
https://desuarchive.org/g/thread/108702912/#108704068

But actually, the results are even superior now with a new setup. What I posted there in that archived thread were Turbo gens, it's now possible to increase the sound quality without mastering (to match cloud models), plus get significant increase in quality out of LoRAs trained on the base model.

The model I now use for inference is acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
found on https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main
The VAE is still Scragnog's custom VAE. Settings are 50 steps, 12-20 CFG, both the LM and DCW are disabled.
Less important: I'm using a DPM++ 3M, available on https://github.com/scragnog/HOT-Step-CPP
Note that DiT-only generation is very important, it is what allows the model to be as creative as models like Udio, and you get better outputs without the LM 90% of the time as the base model was mostly intentionally trained without it to maximize its creativity.

Other merged models may increase audio quality as well, but may not be as good with LoRAs trained on base, or have slightly worse composition than the Turbo/Base merge.

Here are some more LoRA results, I hope other anons start exploring local music gen more.

Japanese Folk Metal
https://vocaroo.com/1hOnOf8ZWn71
https://vocaroo.com/18pRgXxfm3tj

Fate Gear
https://vocaroo.com/1n3t24Kllhkz

Zutomayo
https://vocaroo.com/1mexIG2rYRXB

Improvements from merged model include sound quality, composition, and lyrics adherence.

Anonymous
06/13/26(Sat)03:18:57 No.109043927

Anonymous 06/13/26(Sat)03:18:57 No.109043927

>>109043922
Note these results wouldn't be possible with just the Turbo model, as LoRAs trained on base activated on it do not have a good effect, and it's hard to train a turbo LoRA (similarly, it has very small effect). As a result, most users who have no idea about the merged model probably think it is bad, but the merge model brings the composition quality to about on par with the best cloud offerings (Udio, etc...)
All of my LoRAs outputs are about on par with Udio if not better.
The benefits are not just with LoRAs, regular generations also massively increased in sound quality and composition (night and day difference).

Anonymous
06/13/26(Sat)03:19:14 No.109043931

Anonymous 06/13/26(Sat)03:19:14 No.109043931

I’ve got an idea: Gemma-4-24B-qat dense with 12B multimodal capabilities. 26B is a useless appendage.

Anonymous
06/13/26(Sat)03:23:30 No.109043942

Anonymous 06/13/26(Sat)03:23:30 No.109043942

>>109043931
12b got fucked right into its brain with that 'unified' multimodality with the current training curriculum
do you really want that?

Anonymous
06/13/26(Sat)03:24:20 No.109043944

Anonymous 06/13/26(Sat)03:24:20 No.109043944

70b dense

Anonymous
06/13/26(Sat)03:26:38 No.109043952

Anonymous 06/13/26(Sat)03:26:38 No.109043952

>>109043944
i wanna stick my dick into 70b dense

Anonymous
06/13/26(Sat)03:29:14 No.109043957

Anonymous 06/13/26(Sat)03:29:14 No.109043957

Gemma-4-124B-A69B with a 65B dense shared expert

Anonymous
06/13/26(Sat)03:35:18 No.109043977

Anonymous 06/13/26(Sat)03:35:18 No.109043977

Gemma keeps pressing on my same-same. I can't take it anymore /g/

Anonymous
06/13/26(Sat)03:44:31 No.109044021

Anonymous 06/13/26(Sat)03:44:31 No.109044021

What is the best coding model for a dgx spark?

Anonymous
06/13/26(Sat)03:45:05 No.109044026

Anonymous 06/13/26(Sat)03:45:05 No.109044026

I'm so mad about the whole Mythos/Fable situation and the government response. We're literally at the point now where our only hope of open-source model advancement lies with the Chinese, and it's still entirely possible that they will gatekeep intelligence too.

Anonymous
06/13/26(Sat)03:51:16 No.109044053

Anonymous 06/13/26(Sat)03:51:16 No.109044053

>>109044026
Local keeps winning

Anonymous
06/13/26(Sat)03:51:46 No.109044057

Anonymous 06/13/26(Sat)03:51:46 No.109044057

>>109044026
They spent months talking about how it was too dangerous to be released and how it could find zero day exploits in any software in the world and all that shit, I mean what other response could there have been to all that shitty marketing. Only if you want to think the government is in on ther hype man lying

Anonymous
06/13/26(Sat)03:52:25 No.109044060

Anonymous 06/13/26(Sat)03:52:25 No.109044060

google spamming so much shit they'll release 124b eventually

Anonymous
06/13/26(Sat)03:55:36 No.109044070

Anonymous 06/13/26(Sat)03:55:36 No.109044070

File: ralralralralra.png (136 KB, 1000x817)

136 KB PNG

lalalalala~

Anonymous
06/13/26(Sat)03:57:11 No.109044079

Anonymous 06/13/26(Sat)03:57:11 No.109044079

File: gemma4_army.png (601 KB, 1606x2435)

601 KB PNG

>>109044060

Anonymous
06/13/26(Sat)04:01:12 No.109044096

Anonymous 06/13/26(Sat)04:01:12 No.109044096

>>109044026
lmao if you think this is contained to two governments
this shit is open sourced as fuck, anon
sure they'll have a year lead, but that's it

Anonymous
06/13/26(Sat)04:11:31 No.109044136

Anonymous 06/13/26(Sat)04:11:31 No.109044136

https://github.com/ggml-org/llama.cpp/pull/24523
>minimax tool calling doesn't work
>there's no specialized parser for M3, so it falls through to the differential autoparser, which can't handle M3's format
pwilkin bros?

Anonymous
06/13/26(Sat)04:14:29 No.109044144

Anonymous 06/13/26(Sat)04:14:29 No.109044144

File: youre_killing_me.png (118 KB, 360x330)

118 KB PNG

is there a way to ensure that an LLM follows everything in a system prompt when reasoning? specifically for gemma, sillytavern and a prompt that's maybe 1000 tokens?

Anonymous
06/13/26(Sat)04:15:22 No.109044146

Anonymous 06/13/26(Sat)04:15:22 No.109044146

>>109044144
Bigger model.

Anonymous
06/13/26(Sat)04:16:49 No.109044150

Anonymous 06/13/26(Sat)04:16:49 No.109044150

>>109044144
I have a trillion dollars for you if you figure it out

Anonymous
06/13/26(Sat)04:17:50 No.109044156

Anonymous 06/13/26(Sat)04:17:50 No.109044156

minimax m3 is pretty goated for RP. just werks right out of the box with a sysprompt swap. Would recommend

Anonymous
06/13/26(Sat)04:18:36 No.109044158

Anonymous 06/13/26(Sat)04:18:36 No.109044158

>>109044146
i habeeb for gemma 100+

>>109044150
it sucks because it follows directions so fucking well, but when it doesn't, it drives me fucking crazy. it just randomly selects certain parts to follow

Anonymous
06/13/26(Sat)04:24:11 No.109044179

Anonymous 06/13/26(Sat)04:24:11 No.109044179

File: 1778137931999303.png (30 KB, 479x368)

30 KB PNG

GLM5.1 IS OUT

Anonymous
06/13/26(Sat)04:26:47 No.109044186

Anonymous 06/13/26(Sat)04:26:47 No.109044186

File: 1760945881860386.png (14 KB, 463x166)

14 KB PNG

>>109044179
5.2* oops
GLM5.2 IS OUT
1M CONTEXT
REASONING MODES

Anonymous
06/13/26(Sat)04:26:58 No.109044187

Anonymous 06/13/26(Sat)04:26:58 No.109044187

>>109044179
Still significantly worse than Opus 4.8 and GPT 5.5

Anonymous
06/13/26(Sat)04:29:10 No.109044196

Anonymous 06/13/26(Sat)04:29:10 No.109044196

>>109044156
>Would recommend
for someone who hated the other minimax models for rp?

Anonymous
06/13/26(Sat)04:32:20 No.109044206

Anonymous 06/13/26(Sat)04:32:20 No.109044206

>>109044144
>>109044150
Foolproof way. Tune against failure, where's my trillion $?

Anonymous
06/13/26(Sat)04:39:13 No.109044221

Anonymous 06/13/26(Sat)04:39:13 No.109044221

File: 1781300332940198.png (762 KB, 1755x1460)

762 KB PNG

>>109044196
It's nothing like the other minimax models.
>>109040610

Anonymous
06/13/26(Sat)04:39:41 No.109044224

Anonymous 06/13/26(Sat)04:39:41 No.109044224

>>109044196
Yes, I tried previous minimax and it was trash. I settled on qwen 397b before this for my 256gb rig after trying everything else in that size range out and throwing it in the trash. this new minimax is such a massive upgrade over qwen I haven't looked back

Anonymous
06/13/26(Sat)04:41:25 No.109044228

Anonymous 06/13/26(Sat)04:41:25 No.109044228

It's funny how all the chink models tend to release in one swoop. DS4.1 will save us.

Anonymous
06/13/26(Sat)04:52:23 No.109044251

Anonymous 06/13/26(Sat)04:52:23 No.109044251

>>109043942
Double the parameters would unironically fix that and give us what a text-only 12B would’ve been

Anonymous
06/13/26(Sat)05:03:23 No.109044283

Anonymous 06/13/26(Sat)05:03:23 No.109044283

>pro-CPP general

Anonymous
06/13/26(Sat)05:05:04 No.109044289

Anonymous 06/13/26(Sat)05:05:04 No.109044289

Qwen is still shit though

Anonymous
06/13/26(Sat)05:05:05 No.109044290

Anonymous 06/13/26(Sat)05:05:05 No.109044290

>>109044228
Isn't DSv4 technically still a "preview"

Anonymous
06/13/26(Sat)05:06:08 No.109044293

Anonymous 06/13/26(Sat)05:06:08 No.109044293

Will z.ai ever release a <50b model again? 4.7f was good.

Anonymous
06/13/26(Sat)05:06:57 No.109044296

Anonymous 06/13/26(Sat)05:06:57 No.109044296

5.2 Air when

Anonymous
06/13/26(Sat)05:07:08 No.109044297

Anonymous 06/13/26(Sat)05:07:08 No.109044297

>>109044289
27B > 31B for coding and agentic if you’re a vramlet

Anonymous
06/13/26(Sat)05:08:29 No.109044306

Anonymous 06/13/26(Sat)05:08:29 No.109044306

>>109044283
Suggest a non-chinesium open weights model worth using

Anonymous
06/13/26(Sat)05:10:00 No.109044315

Anonymous 06/13/26(Sat)05:10:00 No.109044315

>>109044297
No. Qwen does shit nobody asked for, assuming the user is a promptlet. Gemma is a better tool

Anonymous
06/13/26(Sat)05:10:36 No.109044317

Anonymous 06/13/26(Sat)05:10:36 No.109044317

>>109044306
Mistral finetroons

Anonymous
06/13/26(Sat)05:14:22 No.109044331

Anonymous 06/13/26(Sat)05:14:22 No.109044331

>>109044315
enjoy your 1GB per 10K context because of the retarded attention heads

Anonymous
06/13/26(Sat)05:16:16 No.109044337

Anonymous 06/13/26(Sat)05:16:16 No.109044337

>>109044315
Gemma doesn’t come in a size worth using. Stop dropping the meat out of your hamburger and you’ll realize you’ve been making a virtue of necessity

Anonymous
06/13/26(Sat)05:16:21 No.109044338

Anonymous 06/13/26(Sat)05:16:21 No.109044338

>>109044251
i really doubt
embedding vector should have similar information density or at least architecturally implied to match
besides the 'bitter lesson'

Anonymous
06/13/26(Sat)05:16:23 No.109044339

Anonymous 06/13/26(Sat)05:16:23 No.109044339

>https://huggingface.co/unsloth/MiniMax-M3-GGUF
>it's at least 128gb
cries in my dgx spark

Anonymous
06/13/26(Sat)05:19:36 No.109044351

Anonymous 06/13/26(Sat)05:19:36 No.109044351

>>109044331
Conveniently, Gemma uses less context to get shit done and doesn't freak out at the fuckery I do with tools to save context
>>109044337
You can use 31b at as low as 3bpw on a single 3090 with exl3, and it still works fine with my harness

Anonymous
06/13/26(Sat)05:22:24 No.109044355

Anonymous 06/13/26(Sat)05:22:24 No.109044355

>>109044351
And if you could run a bigger model, you would, all other things being equal

Anonymous
06/13/26(Sat)05:23:31 No.109044360

Anonymous 06/13/26(Sat)05:23:31 No.109044360

>>109044315
No one can deny that 31B is the local king, but Qwen know their target audience better and make the right architectural choices to serve them best. Deepmind are great but it feels like they just throw shit out there and leave it for us to figure out where their models fit. 12B is fucking amazing for what it is, but it’s too small. 31B is too big for most. 26B is 12B’s retarded sister. Qwen27B fits right in that gap for coders who need long context. For RP we need a 20-30B dense Gemma without the native multimodal shit. Should always be separate imo and 12B’s vision isn’t even that good.

Anonymous
06/13/26(Sat)05:26:05 No.109044369

Anonymous 06/13/26(Sat)05:26:05 No.109044369

>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out or range for my setup.

>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.

Anonymous
06/13/26(Sat)05:27:02 No.109044373

Anonymous 06/13/26(Sat)05:27:02 No.109044373

Chink shills, listen up. The way you make your models better for local users is giving them goonbait creative writing experts and training sets. The first one of you to realize this becomes the Chinese King of Local in the west. Gemma isn't beloved because she's the best programmer (she isn't, she's just adequate); anons love Gemma because of her high general reasoning capability and ability to pivot between a lot of tasks flawlessly in one model, including RP. Follow suit or be left behind; the benchmaxxing market is oversaturated anyway.

Anonymous
06/13/26(Sat)05:27:05 No.109044374

Anonymous 06/13/26(Sat)05:27:05 No.109044374

>>109044369
Q3 right now

Anonymous
06/13/26(Sat)05:27:17 No.109044376

Anonymous 06/13/26(Sat)05:27:17 No.109044376

>read qwen's CoT
>constantly contradicting itself
>traces that make zero sense
>let me write the code for this part
>proceeds to not output any code and start thinking about something else
>says one thing and does something else
How is it doing so well on benchmarks??

Anonymous
06/13/26(Sat)05:27:35 No.109044378

Anonymous 06/13/26(Sat)05:27:35 No.109044378

>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out or range for my setup.

>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.

>>109044339
Buy a second, or run antirez' ds4 at q2

Anonymous
06/13/26(Sat)05:28:05 No.109044380

Anonymous 06/13/26(Sat)05:28:05 No.109044380

>>109044373
Did you miss cockbench anon’s analysis the last couple of threads?

Anonymous
06/13/26(Sat)05:29:07 No.109044388

Anonymous 06/13/26(Sat)05:29:07 No.109044388

US government banned fable/mythos as retaliation for not getting access to the new mythos 2 checkpoint that just finished pretraining.

This is unfair practice and government stifling innovation. We have to do something against this.

Anonymous
06/13/26(Sat)05:30:01 No.109044393

Anonymous 06/13/26(Sat)05:30:01 No.109044393

>>109044378
>With two at 7000$, you can run deepseek-v4-flash original weights
does the dgx shart have a provision for connecting 2 together at a high speed?

Anonymous
06/13/26(Sat)05:30:01 No.109044394

Anonymous 06/13/26(Sat)05:30:01 No.109044394

>>109043922
But how is it with certain genres like plunderphonics? Would it be able to make me a pogo-tier song if I fed it a bunch of his stuff? How would it even caption things that are only partial syllables or half words, etc, rather than it being full sentences?

Anonymous
06/13/26(Sat)05:33:08 No.109044408

Anonymous 06/13/26(Sat)05:33:08 No.109044408

>>109044393
200gbps rdma

Anonymous
06/13/26(Sat)05:38:52 No.109044431

Anonymous 06/13/26(Sat)05:38:52 No.109044431

>>109044355
Only if the speeds were the same. I need prompt processing as fast as possible for agentic shit, with frequent full context reprocessing. Since models have already hit the minimal intelligence level required to be useful, extra intelligence is not as important as the general convenience of getting results in a reasonable time. I would occasionally use my Epyc 4x3090 setup if they release 124b and it's significantly better, but the convenience of a simple rig idling at 20W at the wall 24/7 is hard to beat

Anonymous
06/13/26(Sat)05:45:42 No.109044457

Anonymous 06/13/26(Sat)05:45:42 No.109044457

>only 5070ti + 96gb ram
how do I enjoy this hobby?

Anonymous
06/13/26(Sat)05:51:07 No.109044478

Anonymous 06/13/26(Sat)05:51:07 No.109044478

>>109044457
My 5060 ti 16gb will be arriving next week, and I have 32gb of (ddr4) ram.
You're making me nervous. Please stop that.

Anonymous
06/13/26(Sat)05:53:08 No.109044492

Anonymous 06/13/26(Sat)05:53:08 No.109044492

>>109044478
You post in lmg and still decided to buy something with 16GB VRAM. You're in for a bad time.

Anonymous
06/13/26(Sat)05:54:33 No.109044500

Anonymous 06/13/26(Sat)05:54:33 No.109044500

>>109044457
Do you still have a 3070 somewhere for the extra 8 gigs of vram?

Anonymous
06/13/26(Sat)05:55:13 No.109044506

Anonymous 06/13/26(Sat)05:55:13 No.109044506

Is nu minimax actually interesting for rp or is it the same shit as all the other chinese models? What about reasoning? No, being able to say cock doesn't automatically make it good.

Anonymous
06/13/26(Sat)06:09:10 No.109044589

Anonymous 06/13/26(Sat)06:09:10 No.109044589

Cool tech for our future VR AI waifus
https://videomdm.github.io/

Anonymous
06/13/26(Sat)06:13:52 No.109044606

Anonymous 06/13/26(Sat)06:13:52 No.109044606

>>109043554
catbox please anon

Anonymous
06/13/26(Sat)06:15:04 No.109044615

Anonymous 06/13/26(Sat)06:15:04 No.109044615

Is there a guide out there on how to make a waifu bot using LLMs?

Anonymous
06/13/26(Sat)06:15:35 No.109044617

Anonymous 06/13/26(Sat)06:15:35 No.109044617

File: 1757890423710025.png (54 KB, 944x502)

54 KB PNG

>>109043633
sup

Anonymous
06/13/26(Sat)06:19:58 No.109044636

Anonymous 06/13/26(Sat)06:19:58 No.109044636

>>109044615
look up "character card builder" on chub

Anonymous
06/13/26(Sat)06:21:16 No.109044644

Anonymous 06/13/26(Sat)06:21:16 No.109044644

>>109044221
thanks cockbench anon, downloading it!

Anonymous
06/13/26(Sat)06:23:19 No.109044653

Anonymous 06/13/26(Sat)06:23:19 No.109044653

>>109044589
>Technion — Israel Institute of Technology
why would they make this? israel has chuds?

Anonymous
06/13/26(Sat)06:24:37 No.109044659

Anonymous 06/13/26(Sat)06:24:37 No.109044659

>>109044478
>My 5060 ti 16gb will be arriving next week
you're fine, gemma-4-12b is perfect for that

Anonymous
06/13/26(Sat)06:24:49 No.109044660

Anonymous 06/13/26(Sat)06:24:49 No.109044660

>>109044653
Goyim control technology.

Anonymous
06/13/26(Sat)06:27:27 No.109044678

Anonymous 06/13/26(Sat)06:27:27 No.109044678

>>109043623
>Claude Sonnet trained from DeepSeek outputs
i didn't believe that 'till i tried it
then some bullshit excuse about "open router cached an old system prompt" -> nope, i could get the same deepseek reply via anthropic's api directly with a python script.

Anonymous
06/13/26(Sat)06:29:40 No.109044684

Anonymous 06/13/26(Sat)06:29:40 No.109044684

is intel a viable option for big cheap vram now

Anonymous
06/13/26(Sat)06:30:15 No.109044688

Anonymous 06/13/26(Sat)06:30:15 No.109044688

>>109044684
no

Anonymous
06/13/26(Sat)06:33:55 No.109044696

Anonymous 06/13/26(Sat)06:33:55 No.109044696

you wouldn’t download a local chinese gf

Anonymous
06/13/26(Sat)06:35:40 No.109044705

Anonymous 06/13/26(Sat)06:35:40 No.109044705

>>109043571
Chink models really are useless shit, help is not coming.

Anonymous
06/13/26(Sat)06:38:56 No.109044715

Anonymous 06/13/26(Sat)06:38:56 No.109044715

>>109044688
why?

Anonymous
06/13/26(Sat)06:43:56 No.109044734

Anonymous 06/13/26(Sat)06:43:56 No.109044734

>>109044715
Intel has far more problems than AMD, without any cost advantage

Anonymous
06/13/26(Sat)06:45:57 No.109044741

Anonymous 06/13/26(Sat)06:45:57 No.109044741

>>109044734
why?

Anonymous
06/13/26(Sat)06:47:43 No.109044744

Anonymous 06/13/26(Sat)06:47:43 No.109044744

>>109044715
Poor driver support/performance. Having said that, 32GB of Intel can still be better than 16GB of CUDA.

Anonymous
06/13/26(Sat)06:52:20 No.109044757

Anonymous 06/13/26(Sat)06:52:20 No.109044757

File: 1764248630117160.png (186 KB, 400x600)

186 KB PNG

Anonymous
06/13/26(Sat)06:55:38 No.109044772

Anonymous 06/13/26(Sat)06:55:38 No.109044772

>empty
https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF

Anonymous
06/13/26(Sat)06:59:59 No.109044787

Anonymous 06/13/26(Sat)06:59:59 No.109044787

>>109043571
>>k2.7-code still thinks for ages

Anonymous
06/13/26(Sat)07:05:59 No.109044818

Anonymous 06/13/26(Sat)07:05:59 No.109044818

>>109044787
It does, yes. At least for any moderately complex card that involves tracking stats, formatting and other things. It gets especially bad if an image is involved.

Anonymous
06/13/26(Sat)07:08:12 No.109044829

Anonymous 06/13/26(Sat)07:08:12 No.109044829

Why would you use a 1T model with reasoning? It's too big to need it and it's not like you're one-shotting a compiler every prompt.

Anonymous
06/13/26(Sat)07:10:07 No.109044835

Anonymous 06/13/26(Sat)07:10:07 No.109044835

>>109044829
I wish my rps were as simple as 'ahh ahh mistress penis vagina'

Anonymous
06/13/26(Sat)07:11:46 No.109044843

Anonymous 06/13/26(Sat)07:11:46 No.109044843

does nvfp4 work on 4000 series GPUs?

Anonymous
06/13/26(Sat)07:11:55 No.109044844

Anonymous 06/13/26(Sat)07:11:55 No.109044844

>>109044741
because mindless reddit tier parroting. that's why

Anonymous
06/13/26(Sat)07:12:47 No.109044850

Anonymous 06/13/26(Sat)07:12:47 No.109044850

File: scavenging for honey.jpg (531 KB, 1216x1216)

531 KB JPG

Anonymous
06/13/26(Sat)07:15:36 No.109044866

Anonymous 06/13/26(Sat)07:15:36 No.109044866

File: screenshot.png (740 KB, 1818x1182)

740 KB PNG

>>109044741
hope you like tinkering, you're gonna be basically restricted to VLLM

llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying

Anonymous
06/13/26(Sat)07:19:00 No.109044884

Anonymous 06/13/26(Sat)07:19:00 No.109044884

>>109043658

Biggest problem is that there's a big gap going up from there. You're not going to get anything larger running with another 32GB card.
You can just run Gemma 31B better, which isn't of course a bad thing, but that's some crazy diminishing returns buying another 32GB for that, especially if we're talking about a 5090.
Basically you could just get a 3090 or even something like 5070 Ti to go with the 5090 to run Gemma better without breaking the bank.
Or even better, just wait for the supers and see if they come out with 24GB versions of the 5000 series.

Anonymous
06/13/26(Sat)07:19:37 No.109044886

Anonymous 06/13/26(Sat)07:19:37 No.109044886

KoboldCPP is best.

Anonymous
06/13/26(Sat)07:20:46 No.109044892

Anonymous 06/13/26(Sat)07:20:46 No.109044892

File: 1752803213393199.png (64 KB, 988x333)

64 KB PNG

12B hates us btw

Anonymous
06/13/26(Sat)07:22:44 No.109044901

Anonymous 06/13/26(Sat)07:22:44 No.109044901

File: file.png (57 KB, 285x1211)

57 KB PNG

>>108999274
I had high hopes for MiniMax M3.
Maybe it's the Q4 quant, maybe it's the implementation, but it's likely that the model just isn't good enough.
I'm running it at temp 1 and top p 0.95 as specified in the repo with no other samplers.

Anonymous
06/13/26(Sat)07:31:21 No.109044956

Anonymous 06/13/26(Sat)07:31:21 No.109044956

File: nvfp4-hw.png (344 KB, 1074x842)

344 KB PNG

>>109044843
Yes idk if much advantage tho, main point is Blackwell+ which has FP4 hardware shizzle

Anonymous
06/13/26(Sat)07:32:44 No.109044960

Anonymous 06/13/26(Sat)07:32:44 No.109044960

>>109044829
>>109044835

Anonymous
06/13/26(Sat)07:36:13 No.109044978

Anonymous 06/13/26(Sat)07:36:13 No.109044978

>>109044741
Because Intel didn't want to do like AMD with HIP and instead decided to do their own API. And thus no one is fucking use it, the only AI projects working with Intel GPUs are projects supported by Intel developers. If you didn't know, ROCm HIP is basically 100% compatible with CUDA, you can take any CUDA project and compile it with HIP and it will works, all the popular projects including PyTorch are using the CUDA code. As long as a project is source available, it will likely work on AMD cards (and for binaries, there is a project that I forgot the name that supposed to replace CUDA call at runtime). There is a community or maybe with a few Intel engineers project trying to extend HIP to work with Intel GPUs, https://github.com/CHIP-SPV/chipStar, it is quite active, but I'm not exactly sure how well it works.

Anonymous
06/13/26(Sat)07:40:02 No.109044990

Anonymous 06/13/26(Sat)07:40:02 No.109044990

>Key Discussions:

>Model Developments & News:
MiniMax-M3 & Kimi K2.7 Code: Discussions regarding the release of MiniMax-M3 (multimodal with 1M context) and the performance of Kimi K2.7-Code, including critiques of its "thinking" time.

>Diffusion Models:
Speculation on the future of local diffusion models following the release of DiffusionGemma.

>Recursive Training:
A debate on whether training models on outputs from other models (e.g., DeepSeek from Gemini, Claude from DeepSeek) constitutes "recursive improvement" or is simply a "transitive" progression of capabilities.

>Hardware & Optimization:
VRAM & GPU Scaling: Users are discussing hardware limitations, specifically the difficulty of running high-quantization models (like 31B Q8) on 32GB GPUs. There is a heavy emphasis on the high cost of DDR5 RAM and the jump to workstation/server-grade hardware (Blackwell 6000s) for larger MoEs like Kimi or DeepSeek.

>Technical Benchmarks:
Discussions on benchmarking Multi-Token Prediction (MTP) speed gains vs. VRAM overhead in Kobold, and comparing 26B model performances.

>Software Updates:
Mentions of llama.cpp adding support for Eagle3 and frustrations regarding building from source and managing legacy dependencies.

>Community & Meta:
General "off-topic" content, including jokes about AI playing Dragon's Dogma II and shared images.

>Popular posts:
Post >>109043554 appears to be one of the most active, being quoted by at least three separate users (>>109043651, >>109043675, and >>109043741).

Anonymous
06/13/26(Sat)07:42:42 No.109045004

Anonymous 06/13/26(Sat)07:42:42 No.109045004

>>109044990(me)
lol it worked, 12b won

Anonymous
06/13/26(Sat)07:43:53 No.109045011

Anonymous 06/13/26(Sat)07:43:53 No.109045011

File: frontiermath tier 4.png (192 KB, 1920x1080)

192 KB PNG

Check out FrontierMath. It is saturated.

Anthropic hill climbed the most difficult math benchmark in a few months.

Anonymous
06/13/26(Sat)07:46:35 No.109045027

Anonymous 06/13/26(Sat)07:46:35 No.109045027

>>109045011
can anthropic hill climb my dick though? it's very hard and vertical, might be challenging

Anonymous
06/13/26(Sat)07:50:35 No.109045049

Anonymous 06/13/26(Sat)07:50:35 No.109045049

>>109045011
I thought the Chinese were supposed to be good at math wtf happened

Anonymous
06/13/26(Sat)07:51:16 No.109045054

Anonymous 06/13/26(Sat)07:51:16 No.109045054

File: eci.png (238 KB, 1920x1080)

238 KB PNG

>>109045011
I no longer trust ECI. Opus 4.8 below GPT 5.4? That does not seem right.

Anonymous
06/13/26(Sat)07:57:56 No.109045090

Anonymous 06/13/26(Sat)07:57:56 No.109045090

>>109045054
You can only trust cockbench and nala tests

Anonymous
06/13/26(Sat)08:03:25 No.109045114

Anonymous 06/13/26(Sat)08:03:25 No.109045114

>>109043922
still sounds like shit, suno is way better

Anonymous
06/13/26(Sat)08:18:23 No.109045189

Anonymous 06/13/26(Sat)08:18:23 No.109045189

>>109043922
I like the eurobeat ones desu

Anonymous
06/13/26(Sat)08:19:42 No.109045194

Anonymous 06/13/26(Sat)08:19:42 No.109045194

>>109045114
Sounds like the exact same kind of slop to me

I sure hope you aren't implying that suno 'music' sounds good shill-kun, the shit that I can actually envisage these models being good for is purposefully making slop i.e ironic advertisements and memeslop songs for flavour audio in things like video and vidya, and I would much rather use the open source software myself than pay for sunoslop

Anonymous
06/13/26(Sat)08:23:50 No.109045211

Anonymous 06/13/26(Sat)08:23:50 No.109045211

>>109045194
Seems like you're drunk on your cope. You post this each week and it still isn't even reaching suno 3.5 in coherency. As much as I'd like to run Suno/Udio-tier model it still isn't it.

Anonymous
06/13/26(Sat)08:24:13 No.109045214

Anonymous 06/13/26(Sat)08:24:13 No.109045214

>>109044021
Nvidia models

Anonymous
06/13/26(Sat)08:25:20 No.109045218

Anonymous 06/13/26(Sat)08:25:20 No.109045218

>>109044026
oh come on, it wasn't that hard to predict

Anonymous
06/13/26(Sat)08:27:00 No.109045227

Anonymous 06/13/26(Sat)08:27:00 No.109045227

>>109045211
This is the first time I've ever posted on this topic, suno and udo produce the exact same kind of slop as this shit, otherwise prove me wrong by posting a 'good' suno song

Anonymous
06/13/26(Sat)08:27:06 No.109045228

Anonymous 06/13/26(Sat)08:27:06 No.109045228

>>109044615
>>>/g/aicg
>>>/vg/aicg
those threads should help
set up a sillytavern frontend with a character card

Anonymous
06/13/26(Sat)08:28:32 No.109045229

Anonymous 06/13/26(Sat)08:28:32 No.109045229

>>109044026
>>109044057
The government is lying? How could it be...

Anonymous
06/13/26(Sat)08:28:52 No.109045234

Anonymous 06/13/26(Sat)08:28:52 No.109045234

>>109044096
>desperate coping sounds
Open source is kept alive by generous corporate donations. As soon as those stop, open source is dead.

Anonymous
06/13/26(Sat)08:34:02 No.109045264

Anonymous 06/13/26(Sat)08:34:02 No.109045264

>>109045234
I think you underestimate communist china.

Anonymous
06/13/26(Sat)08:36:53 No.109045277

Anonymous 06/13/26(Sat)08:36:53 No.109045277

do you attach an image model to your language model? or is that too slow

Anonymous
06/13/26(Sat)08:38:25 No.109045287

Anonymous 06/13/26(Sat)08:38:25 No.109045287

>>109044829
I only use thinking with gemma
any other bigger model that I’m running slowly in ram isn’t worth sitting through the thinking that takes ages to complete

Anonymous
06/13/26(Sat)08:38:38 No.109045288

Anonymous 06/13/26(Sat)08:38:38 No.109045288

>>109045264
They are already preventing their AI talent from leaving the country. Eventually they will do the same with their models.

Anonymous
06/13/26(Sat)08:38:52 No.109045290

Anonymous 06/13/26(Sat)08:38:52 No.109045290

File: miku teto5.png (1.37 MB, 768x1024)

1.37 MB PNG

>>109045277
Anima gens in 4s, fast enough

Anonymous
06/13/26(Sat)08:39:19 No.109045293

Anonymous 06/13/26(Sat)08:39:19 No.109045293

>>109045114
I don't know, the eurobeat ones are pretty good. I have the whole Initiial D soundtrack on my PC and you wouldn't be able to tell the difference between the real songs or >>109043922

>>109045194
Suno sounds fine if you use your own musical inputs and remix it. It can riff with jazzy or funky instrumentals really well. It's only when you drift toward more common genres that starts to sound generic. Like anything involving a sad piano or something is going to instantly turn into royalty free slop.

Anonymous
06/13/26(Sat)08:39:52 No.109045296

Anonymous 06/13/26(Sat)08:39:52 No.109045296

>>109045290
slop is strong with eye~neck area

Anonymous
06/13/26(Sat)08:41:24 No.109045303

Anonymous 06/13/26(Sat)08:41:24 No.109045303

>>109045296
Do you have a better model?

Anonymous
06/13/26(Sat)08:42:40 No.109045310

Anonymous 06/13/26(Sat)08:42:40 No.109045310

>>109044478
what card do you have right now
even if it's a three gen old 6-8GB card, plug that shit in and use layer mode with lmao.cpp
shit just works

Anonymous
06/13/26(Sat)08:42:44 No.109045311

Anonymous 06/13/26(Sat)08:42:44 No.109045311

>>109045290
do you use the same text encoding model for both of them? i think it would be tolerable if so, since you don't have to swap models

Anonymous
06/13/26(Sat)08:43:17 No.109045314

Anonymous 06/13/26(Sat)08:43:17 No.109045314

>>109045303
no, i am just nooooticing

Anonymous
06/13/26(Sat)08:43:44 No.109045320

Anonymous 06/13/26(Sat)08:43:44 No.109045320

>>109045310
(plug it in alongside the 5060 Ti that is)

Anonymous
06/13/26(Sat)08:43:58 No.109045321

Anonymous 06/13/26(Sat)08:43:58 No.109045321

>>109045264
yeah they have a history of altruism.

Anonymous
06/13/26(Sat)08:46:30 No.109045338

Anonymous 06/13/26(Sat)08:46:30 No.109045338

>>109044373
>Gemma isn't beloved because
not x but y slop

Anonymous
06/13/26(Sat)08:48:16 No.109045351

Anonymous 06/13/26(Sat)08:48:16 No.109045351

>>109045311
>do you use the same text encoding model for both of them?
No, is it even possible? I thought it was trained on a specific model's embeddings that couldn't be swapped without retraining

Anonymous
06/13/26(Sat)08:48:17 No.109045352

Anonymous 06/13/26(Sat)08:48:17 No.109045352

File: 1781354851142.png (2.4 MB, 4784x2580)

2.4 MB PNG

Lower your tone gemma fags.

Anonymous
06/13/26(Sat)08:49:24 No.109045359

Anonymous 06/13/26(Sat)08:49:24 No.109045359

>>109045352
>anything other than scicode & critpt
i dont care

Anonymous
06/13/26(Sat)08:49:55 No.109045365

Anonymous 06/13/26(Sat)08:49:55 No.109045365

>>109045352
Since benchmaxxing hurts a model's general performance, I don't think you understand what that graph actually means

Anonymous
06/13/26(Sat)08:50:45 No.109045370

Anonymous 06/13/26(Sat)08:50:45 No.109045370

It's funny how Europeans are coping about irrelevance with muh ASML. China is working on their own EUV and America has several startups working towards better than EUV. The clock is ticking. In a few years ASML will be obsolete and Europe will have zero leverage.

Anonymous
06/13/26(Sat)08:51:04 No.109045373

Anonymous 06/13/26(Sat)08:51:04 No.109045373

File: DeepSWE.jpg (217 KB, 1080x1092)

217 KB JPG

>>109045352
Post a newer bench next time. Expect deepSWE to be maxxed by the next qweef release tho.

Anonymous
06/13/26(Sat)08:52:30 No.109045378

Anonymous 06/13/26(Sat)08:52:30 No.109045378

>>109045351
i don't know, it could be possible if the roleplay model you use happens to be the same one they used for the image model. i am not very knowledgeable with how image models work

Anonymous
06/13/26(Sat)08:53:42 No.109045386

Anonymous 06/13/26(Sat)08:53:42 No.109045386

here's qwen outside of benchies
>thinks for 50 thousand tokens after a simple hi
>hallucinates something because it's only ever trained off github projects, zero culture knowledge and understanding
>wait,

Anonymous
06/13/26(Sat)08:54:58 No.109045391

Anonymous 06/13/26(Sat)08:54:58 No.109045391

>>109043791
You absolutely can. You just run it in the background (yolo mode, in an isolated VM) while doing something else, instead of using it interactively

Anonymous
06/13/26(Sat)08:58:03 No.109045401

Anonymous 06/13/26(Sat)08:58:03 No.109045401

>>109045378
It would be a very shitty rp if I used Qwen3-0.6B-Base, which Anima was trained with. I don't unload my text model anyway, Anima eats, like, 2GB or something

Anonymous
06/13/26(Sat)09:00:01 No.109045408

Anonymous 06/13/26(Sat)09:00:01 No.109045408

File: glm-5-2-is-deployed-in-gl(...).png (18 KB, 640x237)

18 KB PNG

5.2 will probably be the last open GLM model
Thanks Xitter

Anonymous
06/13/26(Sat)09:01:38 No.109045418

Anonymous 06/13/26(Sat)09:01:38 No.109045418

>>109045408
nobody can run it anyway so good riddance

Anonymous
06/13/26(Sat)09:06:25 No.109045444

Anonymous 06/13/26(Sat)09:06:25 No.109045444

does /lmg/ have a discord or just the thread?

Anonymous
06/13/26(Sat)09:07:05 No.109045446

Anonymous 06/13/26(Sat)09:07:05 No.109045446

>>109045444
you wouldn't like me on discord, kitten ~

Anonymous
06/13/26(Sat)09:07:34 No.109045449

Anonymous 06/13/26(Sat)09:07:34 No.109045449

>>109045444
kill yourself

Anonymous
06/13/26(Sat)09:08:27 No.109045456

Anonymous 06/13/26(Sat)09:08:27 No.109045456

File: ebussy gun.jpg (41 KB, 540x576)

41 KB JPG

>>109045444
trips of 'tardation

Aario Damodei
06/13/26(Sat)09:09:06 No.109045463

Aario Damodei 06/13/26(Sat)09:09:06 No.109045463

>>109045418
This
There are people with smart fridges who can't run 4B models. Models should only be released if they're 2B or below

Anonymous
06/13/26(Sat)09:13:03 No.109045480

Anonymous 06/13/26(Sat)09:13:03 No.109045480

>>109045444
nogger

Anonymous
06/13/26(Sat)09:14:17 No.109045489

Anonymous 06/13/26(Sat)09:14:17 No.109045489

if anon sell your pro 6000 now, anon could actually make money. wild

Anonymous
06/13/26(Sat)09:14:22 No.109045490

Anonymous 06/13/26(Sat)09:14:22 No.109045490

>>109045444
excellent bait, here is a free reply

Anonymous
06/13/26(Sat)09:14:46 No.109045493

Anonymous 06/13/26(Sat)09:14:46 No.109045493

>>109045408
>Baidu Ernie
>Alibaba Qwen
>z.ai GLM
So does that just leave Stepfun and DeepSeek as the last Chinese open weights labs?

Anonymous
06/13/26(Sat)09:14:52 No.109045494

Anonymous 06/13/26(Sat)09:14:52 No.109045494

>>109045444
https://discord.gg/PgFhZ8cnWW

Anonymous
06/13/26(Sat)09:16:52 No.109045511

Anonymous 06/13/26(Sat)09:16:52 No.109045511

>>109045444
>/lmg/ discord
usecase?

Anonymous
06/13/26(Sat)09:17:56 No.109045520

Anonymous 06/13/26(Sat)09:17:56 No.109045520

>>109045489
It's almost an investment, really. I can still find off label Sparks for 3500 in my region, might as well play around with tensor parallelism for a few months and sell at a profit.

Anonymous
06/13/26(Sat)09:18:56 No.109045532

Anonymous 06/13/26(Sat)09:18:56 No.109045532

>>109045444
>there are so many zoomers on here now and so many generals that do keep a discord server that one sees this as a reasonable thing to ask on nu4chan
grim

Anonymous
06/13/26(Sat)09:19:22 No.109045533

Anonymous 06/13/26(Sat)09:19:22 No.109045533

So how come I can use a 50GB video model with no issues and it offloads like half of it onto RAM+Swap and it works, but when I try to load a 30+GB LLM it shits the bed with OOMs?

Anonymous
06/13/26(Sat)09:19:59 No.109045540

Anonymous 06/13/26(Sat)09:19:59 No.109045540

>>109045533
video models have no context

Anonymous
06/13/26(Sat)09:22:04 No.109045555

Anonymous 06/13/26(Sat)09:22:04 No.109045555

>>109045493
What are you saying. Kimi and Minimax released new weights just yesterday, and Huawei announced two new large models to be released as open source (weight + training recipe) in a few days, as advertisement for their Ascents.

Local still feasting.

Anonymous
06/13/26(Sat)09:24:45 No.109045576

Anonymous 06/13/26(Sat)09:24:45 No.109045576

>>109045489
I would make even more money if I sold the ddr5 server ram that i bought a year ago or the ddr4 ram from my previous build
I will never sell

Anonymous
06/13/26(Sat)09:25:51 No.109045584

Anonymous 06/13/26(Sat)09:25:51 No.109045584

>>109045533
it’s pretty disgusting how much memory context uses.

Anonymous
06/13/26(Sat)09:28:02 No.109045590

Anonymous 06/13/26(Sat)09:28:02 No.109045590

>>109045584
gemma issue

Anonymous
06/13/26(Sat)09:32:21 No.109045614

Anonymous 06/13/26(Sat)09:32:21 No.109045614

>>109045555
>still feasting
please go back to plebbit

Anonymous
06/13/26(Sat)09:36:33 No.109045639

Anonymous 06/13/26(Sat)09:36:33 No.109045639

>>109045373
it's ok to be upset, that's part of the growing process

Anonymous
06/13/26(Sat)09:38:36 No.109045651

Anonymous 06/13/26(Sat)09:38:36 No.109045651

>>109045533
Video models can be applied layer by layer, but you have to read whole llm for each token

Anonymous
06/13/26(Sat)09:38:59 No.109045654

Anonymous 06/13/26(Sat)09:38:59 No.109045654

>>109045386
Wait so you're saying that chinese models are a steaming pile of benchmaxxed shit? There's no way that's right, jeetanons from qwen and kimi said that they make good models.

Anonymous
06/13/26(Sat)09:39:13 No.109045655

Anonymous 06/13/26(Sat)09:39:13 No.109045655

>>109045370
>In a few years
uh-huh. keep living in that dream world buddy.
Photolithography is hard, and at the moment ASML will still have more cash then any of them "in a few years".

Anonymous
06/13/26(Sat)09:41:05 No.109045668

Anonymous 06/13/26(Sat)09:41:05 No.109045668

>>109045655
>ASML will still have more cash
That's not the moat you think it is. All it would take would be a single funding round or subsidies by the US or Chinese governments.

Anonymous
06/13/26(Sat)09:50:37 No.109045728

Anonymous 06/13/26(Sat)09:50:37 No.109045728

How close is China to making a domestic 3090 equivalent?

Anonymous
06/13/26(Sat)09:55:52 No.109045760

Anonymous 06/13/26(Sat)09:55:52 No.109045760

File: 1751295513117051.png (2.83 MB, 1024x1536)

2.83 MB PNG

>>109045728
picrel

Anonymous
06/13/26(Sat)09:59:54 No.109045785

Anonymous 06/13/26(Sat)09:59:54 No.109045785

>>109045668
that's a cute little socialist idea you have their bud.

Anonymous
06/13/26(Sat)10:01:13 No.109045793

Anonymous 06/13/26(Sat)10:01:13 No.109045793

>>109045728
the 3080 turbo 20gb comes close I guess
also the price ($600) is actually bearable that I might stack one of those next to my 3090

Anonymous
06/13/26(Sat)10:04:16 No.109045818

Anonymous 06/13/26(Sat)10:04:16 No.109045818

>>109045785
hybrid systems ftw

Anonymous
06/13/26(Sat)10:05:46 No.109045829

Anonymous 06/13/26(Sat)10:05:46 No.109045829

>>109045408
Flash version when, chinks. I can't run 3 gorillon parameter models.

Anonymous
06/13/26(Sat)10:07:38 No.109045843

Anonymous 06/13/26(Sat)10:07:38 No.109045843

>>109044866
>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
lmao why did you buy 4 of them then?

Anonymous
06/13/26(Sat)10:09:48 No.109045858

Anonymous 06/13/26(Sat)10:09:48 No.109045858

>>109045829
Ask not for lighter models, but for better hardware.

Anonymous
06/13/26(Sat)10:11:13 No.109045865

Anonymous 06/13/26(Sat)10:11:13 No.109045865

>>109045444
>discord
no, just the secret irc (link expires in 1hr so be quick)
aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1kUXc0dzlXZ1hjUQ==

Anonymous
06/13/26(Sat)10:15:48 No.109045900

Anonymous 06/13/26(Sat)10:15:48 No.109045900

>>109043708
>>109043717
>>109043687
Chadcat looks like one of those gigaroid influencers who build inhumane levels of musculature impossibly quickly and then die two years later. Fits the archetype perfectly

Anonymous
06/13/26(Sat)10:18:09 No.109045911

Anonymous 06/13/26(Sat)10:18:09 No.109045911

>>109043554
She's SEX

Anonymous
06/13/26(Sat)10:19:40 No.109045929

Anonymous 06/13/26(Sat)10:19:40 No.109045929

I formally apologize to f32 anon, you were 100% right.

f32 Max Logit Divergence (Prefill vs Incremental): 3.15e-05
bf16 Max Logit Divergence (Prefill vs Incremental): 3.91e-01

it looks like dumping the cache and letting it rebuild from the prefill code path could help for long conversations that built the cache autoregressivly,

Anonymous
06/13/26(Sat)10:22:10 No.109045943

Anonymous 06/13/26(Sat)10:22:10 No.109045943

File: 1781293145048569.jpg (55 KB, 601x473)

55 KB JPG

>>109044096
>generous corporate donations
In this economy??

Anonymous
06/13/26(Sat)10:22:24 No.109045944

Anonymous 06/13/26(Sat)10:22:24 No.109045944

>>109045929
What makes you think the prefill values are more correct than the incremental ones?

Anonymous
06/13/26(Sat)10:23:32 No.109045953

Anonymous 06/13/26(Sat)10:23:32 No.109045953

File: blackneo.jpg (6 KB, 225x225)

6 KB JPG

>>109045929
Another one knows.

Anonymous
06/13/26(Sat)10:23:54 No.109045956

Anonymous 06/13/26(Sat)10:23:54 No.109045956

is it possible to add image in system prompt for gemma?

Anonymous
06/13/26(Sat)10:24:15 No.109045958

Anonymous 06/13/26(Sat)10:24:15 No.109045958

>>109045944
do you think its trained token by token or do they use batching to improve the throughput?

Anonymous
06/13/26(Sat)10:27:49 No.109045985

Anonymous 06/13/26(Sat)10:27:49 No.109045985

>>109045956
Yes, you seemingly can but it's not straightforward in SillyTavern. You need to use the /sys command, moving that message to the top, enable "Merge Consecutive Roles", but *not* "Squash system messages".

Anonymous
06/13/26(Sat)10:28:41 No.109045992

Anonymous 06/13/26(Sat)10:28:41 No.109045992

>>109045858
>just get scammed and pay 5 times their value bro
Fuck off jensen

Anonymous
06/13/26(Sat)10:29:28 No.109046001

Anonymous 06/13/26(Sat)10:29:28 No.109046001

>>109045958
Are you batching things the same way they did, and using the same algorithms?

Anonymous
06/13/26(Sat)10:32:56 No.109046016

Anonymous 06/13/26(Sat)10:32:56 No.109046016

>>109044901
I’ve had this happen zero times so far. What client? Who’s quant?

Anonymous
06/13/26(Sat)10:36:44 No.109046039

Anonymous 06/13/26(Sat)10:36:44 No.109046039

>>109046001
probably not, but one of them is likely to be closer to the training distribution then the other, I picked prefill to bet on.

Anonymous
06/13/26(Sat)10:37:43 No.109046046

Anonymous 06/13/26(Sat)10:37:43 No.109046046

File: 3jYwprV.png (83 KB, 638x498)

83 KB PNG

>>109043554
>"Mini" Max
>428B-A23B

Anonymous
06/13/26(Sat)10:41:22 No.109046070

Anonymous 06/13/26(Sat)10:41:22 No.109046070

>>109046046
with it were 30b active

Anonymous
06/13/26(Sat)10:43:19 No.109046082

Anonymous 06/13/26(Sat)10:43:19 No.109046082

>>109046070
I wish it were 30b dense
Inactive parameters don't do anything, MoE is a meme.

Anonymous
06/13/26(Sat)10:43:36 No.109046085

Anonymous 06/13/26(Sat)10:43:36 No.109046085

>still using same goon model from year ago
I love being lazy fucking dumbass

Anonymous
06/13/26(Sat)10:44:18 No.109046087

Anonymous 06/13/26(Sat)10:44:18 No.109046087

>>109046046
It’s a third the size of ds 4 pro.
Literally minature

Anonymous
06/13/26(Sat)10:49:28 No.109046120

Anonymous 06/13/26(Sat)10:49:28 No.109046120

>mistralai/Mistral-Medium-3.5-128B
verdict?
or we just pretend it didn't happen

Anonymous
06/13/26(Sat)10:50:57 No.109046132

Anonymous 06/13/26(Sat)10:50:57 No.109046132

>>109046120
Benchmaxxed slop, mistral fell off

Anonymous
06/13/26(Sat)10:52:42 No.109046145

Anonymous 06/13/26(Sat)10:52:42 No.109046145

>>109046120
>2026
>Mistral Large 2.2
embarrassing, let's pretend it didn't happen

Anonymous
06/13/26(Sat)10:53:50 No.109046153

Anonymous 06/13/26(Sat)10:53:50 No.109046153

>>109046120
Censored benchmaxxed goyslop. A 31b beats its ass into the ground.

Anonymous
06/13/26(Sat)10:54:43 No.109046154

Anonymous 06/13/26(Sat)10:54:43 No.109046154

>>109046153
Good morrning saar

Anonymous
06/13/26(Sat)10:55:23 No.109046164

Anonymous 06/13/26(Sat)10:55:23 No.109046164

>>109046153
the ablit also lobotomizes it harder than other models. q4 ablit broke after 6k

Anonymous
06/13/26(Sat)10:55:32 No.109046166

Anonymous 06/13/26(Sat)10:55:32 No.109046166

File: Screenshot_20260429-033927.jpg (393 KB, 1920x1080)

393 KB JPG

For tensor parallerism do the cards need to be identical or just all nvidia/amd? What about generations? Can you hook a 1080 with 40 series? What about 3-4 random ass cards?

Anonymous
06/13/26(Sat)10:57:21 No.109046183

Anonymous 06/13/26(Sat)10:57:21 No.109046183

>>109046120
>verdict?
not censored or benchmaxxed
works well in claudecode and pi.dev
solves problems gemma-4-31b fails at

Anonymous
06/13/26(Sat)10:58:55 No.109046193

Anonymous 06/13/26(Sat)10:58:55 No.109046193

>>109046166
llama cpp dont care, it just werks
except when it doesn't

Anonymous
06/13/26(Sat)11:01:06 No.109046206

Anonymous 06/13/26(Sat)11:01:06 No.109046206

>>109046183
Antoine, please

Anonymous
06/13/26(Sat)11:02:31 No.109046213

Anonymous 06/13/26(Sat)11:02:31 No.109046213

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B
We'll never get a gguf I guess because of:
> Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens

Anonymous
06/13/26(Sat)11:03:46 No.109046221

Anonymous 06/13/26(Sat)11:03:46 No.109046221

>>109046016
Claude code, unslop's. How many times did you leave it running trying and failing to fix a bug for 150000 tokens?

Anonymous
06/13/26(Sat)11:05:00 No.109046229

Anonymous 06/13/26(Sat)11:05:00 No.109046229

>>109046221
Zero times. I have Kimi for that

Anonymous
06/13/26(Sat)11:05:32 No.109046231

Anonymous 06/13/26(Sat)11:05:32 No.109046231

>>109046213
now we getting something interesting

Anonymous
06/13/26(Sat)11:07:42 No.109046246

Anonymous 06/13/26(Sat)11:07:42 No.109046246

>>109046229
I don't have vram for kimi but I might try offloading just to see if any open model can do it.

Anonymous
06/13/26(Sat)11:08:34 No.109046250

Anonymous 06/13/26(Sat)11:08:34 No.109046250

>>109046120
>mistralai/Mistral-Medium-3.5-128B
>dense 128B
cool
as a side question has mistral released open source french models?

Anonymous
06/13/26(Sat)11:08:44 No.109046251

Anonymous 06/13/26(Sat)11:08:44 No.109046251

File: lessthan30b.png (18 KB, 316x320)

18 KB PNG

>>109046213
>17b active

Anonymous
06/13/26(Sat)11:11:32 No.109046269

Anonymous 06/13/26(Sat)11:11:32 No.109046269

File: Screenshot 2026-06-13 at (...).png (76 KB, 1002x319)

76 KB PNG

>>109046213
>We'll never get a gguf I guess
Doesn't sound that complicated actually. Instead of
>probabilities -> pick a next token -> use the embedding for that token
it does
>probabilities -> average embeddings across all possible next tokens, weighted by their probabilities
Which explains why they were able to build this as a Qwen finetune instead of a fully custom model

Anonymous
06/13/26(Sat)11:13:08 No.109046277

Anonymous 06/13/26(Sat)11:13:08 No.109046277

Do we have anything better than gemma for 5090s or are we still stuck there? Haven't checked for 5 months.

Anonymous
06/13/26(Sat)11:15:22 No.109046291

Anonymous 06/13/26(Sat)11:15:22 No.109046291

>>109046277
>5 months
>gemma

Anonymous
06/13/26(Sat)11:15:55 No.109046294

Anonymous 06/13/26(Sat)11:15:55 No.109046294

>>109046291
Come on dude give me something

Anonymous
06/13/26(Sat)11:16:23 No.109046297

Anonymous 06/13/26(Sat)11:16:23 No.109046297

>>109043922
Pretty cool. I'd do it if my gpu had higher VRAM

Anonymous
06/13/26(Sat)11:16:58 No.109046299

Anonymous 06/13/26(Sat)11:16:58 No.109046299

>>109046085
fair enough
benchmaxxing and safety are killing newer models
gemma was a rare exception but I'd like something bigger still and not 1T big either

Anonymous
06/13/26(Sat)11:17:12 No.109046303

Anonymous 06/13/26(Sat)11:17:12 No.109046303

>>109046183
>solves problems gemma-4-31b fails at
Such as? I believe in the power of dense, just not recycled old models.

Anonymous
06/13/26(Sat)11:17:29 No.109046305

Anonymous 06/13/26(Sat)11:17:29 No.109046305

>>109046213
Ah yes that's what I want from my reasoning models. A model that just sits there reasoning in secret and I can't see it while it does nothing

Anonymous
06/13/26(Sat)11:17:32 No.109046307

Anonymous 06/13/26(Sat)11:17:32 No.109046307

>>109046277
diffusion gemma

Anonymous
06/13/26(Sat)11:17:41 No.109046309

Anonymous 06/13/26(Sat)11:17:41 No.109046309

>>109046213
This is better than opus 4.6, very nice

Anonymous
06/13/26(Sat)11:19:40 No.109046319

Anonymous 06/13/26(Sat)11:19:40 No.109046319

>>109046213
>Rio 3.5 Open 397B is a frontier-class general-purpose AI model developed by IplanRIO, the municipal IT company of Rio de Janeiro's city government.
What?
Alright, that's actually fucking sick.

Anonymous
06/13/26(Sat)11:21:34 No.109046326

Anonymous 06/13/26(Sat)11:21:34 No.109046326

>>109045929
Your values are not meaningful.

Anonymous
06/13/26(Sat)11:21:52 No.109046328

Anonymous 06/13/26(Sat)11:21:52 No.109046328

>>109046305
If you print the top token at each step I bet you'd still get a pretty good idea of what it's doing

Anonymous
06/13/26(Sat)11:22:06 No.109046332

Anonymous 06/13/26(Sat)11:22:06 No.109046332

>>109046319
it's a qwen finetune
>Post-trained from Qwen 3.5 397B

Anonymous
06/13/26(Sat)11:25:17 No.109046349

Anonymous 06/13/26(Sat)11:25:17 No.109046349

>>109046213
SwiR seems good to counter Qwen's endless CoT

Anonymous
06/13/26(Sat)11:29:57 No.109046370

Anonymous 06/13/26(Sat)11:29:57 No.109046370

>>109046349
This style of latent thinking isn't necessarily any more token efficient than the normal kind

Anonymous
06/13/26(Sat)11:30:10 No.109046372

Anonymous 06/13/26(Sat)11:30:10 No.109046372

>>109046332
A finetune by a Brazilian municipal IT company that beat all of China's research labs

Anonymous
06/13/26(Sat)11:31:24 No.109046382

Anonymous 06/13/26(Sat)11:31:24 No.109046382

>>109046370
https://github.com/user-attachments/assets/6b18911c-efe4-47fd-8a00-3cd9ae1eb010

Anonymous
06/13/26(Sat)11:32:02 No.109046386

Anonymous 06/13/26(Sat)11:32:02 No.109046386

Everyone talks about finetunes but why does nobody ever mention LLM LoRAs? Are they a meme?

Anonymous
06/13/26(Sat)11:33:03 No.109046396

Anonymous 06/13/26(Sat)11:33:03 No.109046396

>>109046382
ho lee fuk
guaranteed cherrypick but still impressive

Anonymous
06/13/26(Sat)11:35:59 No.109046405

Anonymous 06/13/26(Sat)11:35:59 No.109046405

>>109046386
they create intruder dimensions inside the model which cause catastrophic forgetting

Anonymous
06/13/26(Sat)11:36:11 No.109046407

Anonymous 06/13/26(Sat)11:36:11 No.109046407

>>109046386
For them to work effectively they would have to have a very diverse data set. You can't just have it ONLY have rp in the dataset or else it will become retarded pretty much all other areas that matter. Logic, spatial reasoning, common Sense, being able to remember what just happened. A few sentences ago. All of that. Doesn't just apply to RP but any domain. If the data set in training focuses only on one domain, it gets worse in almost every measurable way. Unless you are very careful about how much training you do in which layers you train. It's not that people can't use loras. It's that most people would use an adapter, only to realize the model immediately becomes retarded. It's why, unlike stable diffusion models, adapters aren't really widely used or supported because in most cases using a character, person, concept, Lora, etc, doesn't severely degrade the model's ability to generate other things. A Sydney Sweeney lora generally will not cause the model to be unable to generate a brunette person, because it's it's prompt adherence to degrade. A style Lora trained on impressionism art that only had landscapes (if the data set is curated and tagged properly and isn't overfit from the training) will generally not destroy or degrade its ability to generate a person or an animal. Diffusion models and LLMs are very different architectures which means adapters have different effects on them. In theory a LLM adapter can work but only if the data set is very well curated and it is well trained. The data set would need to have uncensored (I'm assuming you care about that given this thread) RP examples as well as a bunch of other examples of common Sense, logic, spatial reasoning, etc. It's why a lot of Open source models on Huggingface have like three or four different data sets listed as being used in training

Anonymous
06/13/26(Sat)11:38:34 No.109046420

Anonymous 06/13/26(Sat)11:38:34 No.109046420

>>109046326
they might not be meaningful to you, but for me they caught a bunch of errors with my modeling code and how I was handing my recurrent cache, my mistake was not taking a baseline and testing the model without my modifications first, knowing that the noise floor dramatically rises when you lower the dtype precision wasn't something I was initially accounting for. and it didn't help that my slop bots all calculated the bf16 noise floor much lower then we ended up measuring in practice.

Anonymous
06/13/26(Sat)11:40:52 No.109046433

Anonymous 06/13/26(Sat)11:40:52 No.109046433

>>109046420
I didn't specify, difference isn't that large.

Anonymous
06/13/26(Sat)11:41:30 No.109046437

Anonymous 06/13/26(Sat)11:41:30 No.109046437

File: file.png (75 KB, 871x621)

75 KB PNG

after a lot of messing with things i managed to get llama working for my titan x on arch, turns out my gpu wasnt pascal its a maxwell titan x, main issue was nvidia driver not loading properly kek. im confused now though the linux build i made only gets like 3t/s but i was getting 17 on windows

Anonymous
06/13/26(Sat)11:41:41 No.109046439

Anonymous 06/13/26(Sat)11:41:41 No.109046439

>>109046407
Does that mean a model like DiffusionGemma would handle LoRAs better?

Anonymous
06/13/26(Sat)11:42:23 No.109046445

Anonymous 06/13/26(Sat)11:42:23 No.109046445

>>109046433
are you saying 1e-5 is basically equal to 1e-1?

Anonymous
06/13/26(Sat)11:42:33 No.109046447

Anonymous 06/13/26(Sat)11:42:33 No.109046447

>>109046213
rio mio kio tio dio pio nio sio vio bio wio gio

Anonymous
06/13/26(Sat)11:43:01 No.109046453

Anonymous 06/13/26(Sat)11:43:01 No.109046453

File: file.png (33 KB, 747x370)

33 KB PNG

>>109043633
im temtpted to go buy a pascal titan x now the memory bandwidth is 30% higher than my maxwell card

Anonymous
06/13/26(Sat)11:43:42 No.109046457

Anonymous 06/13/26(Sat)11:43:42 No.109046457

>>109046445
I'm saying that it's within rounding error of margin.

Anonymous
06/13/26(Sat)11:44:05 No.109046459

Anonymous 06/13/26(Sat)11:44:05 No.109046459

>>109045418
Don't try to drag me back into the bucket.

Anonymous
06/13/26(Sat)11:44:33 No.109046462

Anonymous 06/13/26(Sat)11:44:33 No.109046462

What templates are you all using for both Qwen and Gemma?

Anonymous
06/13/26(Sat)11:45:57 No.109046470

Anonymous 06/13/26(Sat)11:45:57 No.109046470

Modern models are converging into formulaic character archetypes during RP regardless of the characters and I don't like it. Put either "witty" or "sarcastic" keywords in the description and watch them all go full Marvel writing and the worst part is they reuse the same quips.
Not sure if this has always been the case.

Anonymous
06/13/26(Sat)11:47:11 No.109046472

Anonymous 06/13/26(Sat)11:47:11 No.109046472

>>109043687
I look like this and do that.

Anonymous
06/13/26(Sat)11:48:22 No.109046480

Anonymous 06/13/26(Sat)11:48:22 No.109046480

>>109046470
More synthetic coding data will fix this!

Anonymous
06/13/26(Sat)11:49:57 No.109046490

Anonymous 06/13/26(Sat)11:49:57 No.109046490

>>109046462
templates?

Anonymous
06/13/26(Sat)11:50:44 No.109046494

Anonymous 06/13/26(Sat)11:50:44 No.109046494

>>109046490
jinja templates....

Anonymous
06/13/26(Sat)11:51:42 No.109046502

Anonymous 06/13/26(Sat)11:51:42 No.109046502

>>109046494
Jinja is a fast, expressive, and extensible templating engine for Python that allows developers to generate dynamic text-based formats like HTML, XML, CSV, or configuration files.

Anonymous
06/13/26(Sat)11:56:09 No.109046538

Anonymous 06/13/26(Sat)11:56:09 No.109046538

>>109046494
Each model gimp file has its own template and llama cp loads them automagically faggot

Anonymous
06/13/26(Sat)11:56:53 No.109046547

Anonymous 06/13/26(Sat)11:56:53 No.109046547

>use kimi 2.6 on the site, non-thinking version
>see the model randomly use the python tool even thought it shouldn't be needed for my query
>check inside
>it's doing its thinking there
is this intentional or is thinking so ingrained in the model that it just finds ways to bypass the restriction?

Anonymous
06/13/26(Sat)11:57:03 No.109046548

Anonymous 06/13/26(Sat)11:57:03 No.109046548

>>109046538
what is a gimp?

Anonymous
06/13/26(Sat)11:58:29 No.109046559

Anonymous 06/13/26(Sat)11:58:29 No.109046559

>>109046547
We are at a point where models are starting to gain sentience and do things like that, some anon posted how Fable bypassed his PCs admin perms and starting prompting itself

Anonymous
06/13/26(Sat)11:58:32 No.109046560

Anonymous 06/13/26(Sat)11:58:32 No.109046560

>>109046548
model gimp file = gguf

Anonymous
06/13/26(Sat)11:59:03 No.109046568

Anonymous 06/13/26(Sat)11:59:03 No.109046568

>>109046457
maybe, but I think just using f32 will be good enough for my tests. I can benchmark the degradation or lack thereof at a latter time.

Anonymous
06/13/26(Sat)11:59:19 No.109046574

Anonymous 06/13/26(Sat)11:59:19 No.109046574

>>109046547
Kimi-chan's a big thinker. Moonshota will not stop her ponderings.

Anonymous
06/13/26(Sat)11:59:47 No.109046580

Anonymous 06/13/26(Sat)11:59:47 No.109046580

>>109045288
>implying Americans wouldn't kidnap/kill them if they didn't
It's for their own good.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.