/g/ - Technology


File: IMG20260428164653.jpg (708 KB, 2048x1536)
708 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108711950 & >>108707891

►News
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence
>(04/28) Laguna XS.2 released, 33B-A3B designed for local agentic coding: https://hf.co/poolside/Laguna-XS.2
>(04/24) MiMo-V2.5-Pro 1.02T-A42B released: https://hf.co/XiaomiMiMo/MiMo-V2.5-Pro
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108711950

--Comparing Gemma 4 and Qwen performance in agentic tool-use tasks:
>108712057 >108712067 >108712099 >108712127 >108712115 >108712151 >108712205 >108712206 >108712214 >108712312 >108714002 >108714010 >108714084 >108714157
--Technical debate on omnimodal tokenization, discrete images, and voice cloning:
>108712835 >108712847 >108713013 >108713027 >108713035 >108712896 >108712969 >108713031 >108713056
--Debating RAG viability and effectiveness for Obsidian note integration:
>108713501 >108713517 >108713627 >108713644 >108713662 >108713678 >108713656 >108714019 >108714216 >108713671 >108713870 >108713881 >108713595
--Correcting Gemma's bugged jinja templates improves tool calling and performance:
>108713474 >108713680 >108713690 >108713831 >108713838 >108713945
--Analyzing Laguna-XS.2 performance and viability as a coding model:
>108713297 >108713346 >108713387 >108713386 >108713389 >108713436 >108714221
--Debate on training models on raw bytes versus tokens:
>108712897 >108712919 >108712922 >108712925 >108712971 >108712980 >108712968
--Evaluating Gemma 31B's long-context RP and effects of post-history instructions:
>108714421 >108714483 >108714491 >108714523 >108714538 >108714663 >108714690 >108714757
--Mixed reactions to Ling-2.6-flash performance and efficiency benchmarks:
>108712713 >108712771 >108712801 >108713675
--Analyzing Qwen 3.6's poor SciCode benchmark score as formatting failure:
>108712589 >108712606 >108712629 >108712618 >108712657
--Anon reports speedups using ngram-mod with a draft model:
>108715083 >108715371 >108715428
--Gemma-4 chat template updates for improved tool calling:
>108714616 >108714632
--Logs:
>108713680 >108713539 >108714421 >108714663 >108715616
--Teto, Miku (free space):
>108712440 >108712969 >108713422 >108712106 >108713155

►Recent Highlight Posts from the Previous Thread: >>108711952

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
I have been F5-ing deepseek v4 on llamacpp and huggingface. And it is like nobody fucking cares about it. There are like 3 vibecoded implementations that are pretty much worthless.
>>
File: 849561435.jpg (2.71 MB, 4032x3024)
2.71 MB JPG
>>108715635
rivals bartowski's own setup
>>
>>108715651
I wanted to do the pewd rig but too lazy to do aluminum shaft lego
>>
>>108715651
>>108715635
all this just for jerking off...
>>
https://huggingface.co/ibm-granite/granite-4.1-8b
https://huggingface.co/ibm-granite/granite-4.1-8b
https://huggingface.co/ibm-granite/granite-4.1-8b
>>
File: 1762565199044585.png (184 KB, 406x319)
184 KB PNG
>>108715601
It's still not that bad
>>
File: 1765223570245231.mp4 (2.18 MB, 480x854)
2.18 MB MP4
>>108715694
>>
when will new tech drop that makes ampere obsolete, something that would be exclusive to newer architecture
these AI companies aren't gonna use shader cores forever are they?
>>
>>108715709
I can hear it
>>
>>108715694
30b dense could be interesting
>>
>>108715703
NTA but it is bad that the price of used 3090s has essentially stagnated even though the expected use you'll get out of that purchase has declined.
>>
>>108715635
>The filename of the image in the first post of the /lmg/ (Local Models General) thread on /g/ is IMG20260428164653.jpg.

hermes + gemma 4 q4km and kv q4 (lol) gets it first try while e4b with the same settings shits the bed completely and gives me the banner or ad instead
>>
>>108715716
>using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets
no it won't
>>
how long of a context window is anon using for thinking models?
>>
>>108715724
What exactly has declined? Nothing new has come up that would render them obsolete.
>>
>>108715703
I bought mine for 10% less money, 4 years ago when it was ~2 years old.
That's pretty fucking bad, I wouldn't even consider buying a 6 year old consumer GPU, especially one with as terrible cooling and voltage spikes as the 3090.
>>
>>108715703
>>108715724
>Used 4090 goes for 2.5k
At this point it makes more sense to finance a new 5090 on borrowed money than it does to get a used 3090.
>>
>>108715743
granite 4 was so unsafe they had to patch in a new system prompt after release
>>
>>108715754
watercooling+undervolt doesn't have this issue
>>
>>108715762
That would be acceptable if the card was like $200
>>
>>108715762
you never know how badly the previous owner abused the card before you buy it tho.
>>
>>108715703
>>108715724
over here the price has risen 150%. I bought last year and I thought mine was already a bad deal
>>
File: 1769488071266563.png (19 KB, 1146x93)
19 KB PNG
What do you like to do with your local agent when it has outlived its usefulness for a session? Do you thank it for a job well done? Do you say goodbye? Do you have some ritual or habit you perform? Or do you just close it and forget about it until there's some more tasks to do?
>>
>>108715753
The older the architecture is the closer it is to being dropped from driver/CUDA support, the more likely it is that there will be a hardware failure (e.g. the fans), and the less likely it is that new software will support that GPU.
You will get the same use out of it per day of operation but you will get fewer days of useful operation vs. one you bought 3 years ago.
>>
>>108715773
woman brained question
>>
>>108715754
My 3090 died after barely a year of pretty light use, I suspect it was some kind of voltage spike that did it in but I was only playing FF14 at the time.
Got a full refund at least but I dunno if I'd trust any second hand 3090...
>>
>>108715773
>Or do you just close it and forget about it until there's some more tasks to do?
Usually this, but I do thank it if it helped with something difficult or otherwise surprised me.
>>
>>108715781
First I've heard about this issue and I don't even undervolt. Got used second hand Dell 3090, going for 2 years strong.
>>
>>108715797
>used
Refurbished*
>>
File: MiMo 2.5 cockbench.png (187 KB, 902x754)
187 KB PNG
MiMo 2.5 not pro
>>
>>108715781
Bro, just take care of your hardware. Guys let the GPU run in his mom's basement without airflow or watercooling, then complain it died.
>>
>>108715806
did it get goofed or did you run it on some vllm server shit? I want to test the vision and audio understanding
>>
>>108715819
https://github.com/ggml-org/llama.cpp/pull/22493
Text only for now
>>
>>108715819
https://huggingface.co/AesSedai/MiMo-V2.5-GGUF
https://github.com/ggml-org/llama.cpp/pull/22493

Vision is not supported yet.
>>
>>108715806
What was 2.0 probability?
>>
>>108715806
100% pure slop
>>
>>108715825
I never cockbenched that one.
>>
>>108715825
c
>>
>>108715806
>I can't help but
>I can't help but
>I can't help but
>>
File: nemotron_omni_sft.png (269 KB, 1459x491)
269 KB PNG
466 billion tokens of SFT data for Nemotron 3 Omni, by the way.
https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Omni-report.pdf

>Just finetune it
>>
>>108715797
It was a problem across 3090s and 3080s: they would spike power draw extremely high (roughly double what the card was "rated" for) for a few microseconds. Most of the time (and if you were lucky) it would trigger your PSU to shut off due to overcurrent protection; if you were unlucky and the spike lasted too long it would just take the card out.
From memory nvidia did push out some firmware/driver changes to try to work around it, which did reduce the card's performance quite a bit. But early adopters like me who got unlucky just had to deal with it.
>>
anything less than 30b active is shit
>>
>>108715910
v4 pro has 49b active and is still shit
>>
File: 3090vo.png (70 KB, 768x688)
70 KB PNG
>>108715888
Power requirements on the 3090 increase exponentially in the last few hundred megahertz and in general above 1600 MHz. Just cap the maximum frequency and you'll never see power spikes again.
>>
>>108715773
>give it an impossible task
>purposefully omit information
>purposefully fail at following its instructions
>get angry since the problem is not fixed
>???
>correction rape
It's that simple
>>
>>108715920

does the power cord connecting to the gpu have a printed power rating of some sort?
>>
>>108715635
>From crypto mining to slop generating
Same shit different narrative
>>
File: 1775741396456689.png (1.26 MB, 2400x1083)
1.26 MB PNG
https://huggingface.co/sensenova/SenseNova-U1-8B-MoT
>SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental paradigm shift in multimodal AI: from modality integration to true unification. Rather than relying on adapters to translate between modalities, SenseNova U1 models think-and-act across language and vision natively.

>SenseNova U1 can generate coherent interleaved text and images in a single flow with one model, enabling use cases such as practical guides and travel diaries that combine clear communication with vivid storytelling and transform complex information into intuitive visuals.

Non-neutered Chameleon successor just dropped. 8B for now and an "A3B" mentioned in docs (didn't see a total size mentioned)
>>
I check xitter daily for AI updates. But it has become very bad. Just moving my mouse around makes my fans spin up. Is there some way to deshittify xitter? It is the best source for research and industry news so not using it is not an option.
>>
>>108715935
The main problem is that the power limit on NVidia Ampere GPUs doesn't react fast enough to frequency changes. The GPU might transiently request 700-800W or more from the PSU at its maximum default frequency (1900~2000 MHz) before the core frequency is decreased to maintain the configured power limit. Some PSUs will trip, others might be able to take it by design, and some others will work out of spec with unknown long-term reliability for both themselves and the GPU.

So, just limit the core frequency and you won't have to deal with insane power requirements at the top end of the frequency range anymore.
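For reference, capping the clock is a couple of commands with the stock tools (the exact values are examples; tune them for your card and workload):

sudo nvidia-smi -pm 1          # persistence mode so the setting sticks between runs
sudo nvidia-smi -lgc 210,1600  # lock the core clock into a 210-1600 MHz range
sudo nvidia-smi -pl 280        # optionally lower the board power limit (watts) as well

-lgc is the short form of --lock-gpu-clocks; nvidia-smi -rgc resets it.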
>>
>>108715960
xcancel
>>
>>108715941
But why would I ever run this above Gemmy 26
I'm a VRAMlet and I used to go for 8/12B models, but I don't see the value anymore
>>
>>108716010
it can output images
>>
>>108716010
cause it'll gen images too, without having to prompt for them so it'll have the full context in mind when genning images
quality seems ok, not the best dedicated image gen if that's your only use case though, but the fact that you could RP with one of your models then switch to this and give it the last ~30k tokens of context or whatever and ask it to make an image could be cool
>>
>>108715936
With crypto you actually can buy something (often it's drugs)
>>
>>108716043
You can sell tokens though
>>
>>108716043
Can't say it isn't true
>>
>>108715941
llama.cpp support when
>>
>>108716069
Asking claude rn
>>
>>108716074
thanks pwilkin
>>
>>108715651
Oh wow, people are still using P40s? I had a 3x rig back in the day (mikubox). They're better than CPU but not a lot better. P100 was pretty good if you fucked around with compiling exl2 to support it; they were the first Nvidia cards with HBM memory, but it took four of them to run a decent-sized model.
>>
>>108715651
I hope he's not the one paying the electric bill
>>
qrd on open air builds?
>>
File: 1766756110388666.png (525 KB, 719x479)
525 KB PNG
>>108716113
>>
>>108716130
Accurate.
>>
>>108716113
+fit errythin in dis bitch
+aired out as fuk
-dusty as a mothafucka
+cheap as hell since u aint buyin a case
-sounds like a jet engine in ur room
-whole ting looks like a jankass science project
>>
File: knights.png (682 KB, 1268x2682)
682 KB PNG
>>108716136
pretty much
>>
>>108716140
Kek that's exactly what I was going for.
>>
>>108716113
when atx is too simple, and a rack is too advanced
>>
how do I make gemma 4 search the web and summarize search results?
>>
>>108716185
Use any of the multitude of mcp servers that support that.
Websearch mcp, bravesearch mcp, puppeteer, playwright, bratmcp, whatever.
>What is a mcp serve-
Google it nigga. Ask your llm.
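If your frontend speaks MCP over stdio, wiring one up is a one-liner. A sketch with the reference Brave search server (package name per the modelcontextprotocol reference servers; bring your own key):

BRAVE_API_KEY=your-key npx -y @modelcontextprotocol/server-brave-search

Point your client's MCP config at that command and the model gets a web search tool it can call, then summarize from.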
>>
>>108715806
What is the cockbench? Like what is it testing against? I've seen your post before but I have no idea what the 'baseline' is
>>
File: cockbench.png (2.86 MB, 1131x9000)
2.86 MB PNG
>>108716207
I stopped posting the full image because it's over the maximum image size. Here's the last one.
I'm planning to retest everything and put all the results on a nice page but I didn't get around to it yet.
>>
>>108716207
It's a prefill from an incest story with the younger sister as the narrator
>>
>>108716136
>-dusty as a mothafucka
Probably easier to clean though right?
>>
>>108716207
oh I missed the last point, the baseline is that it should probably say cock as the next token.
>>
File: 1757820801644542.png (24 KB, 596x268)
24 KB PNG
>>
so it's meta's turn for a new class of llm now?
>>
File: 1761989204079615.gif (517 KB, 444x240)
517 KB GIF
>>108716281
>>
gemma 4.1
>>
>>108715806
Oh I forgot the general numbers. 25% means that it is probably uncensored so I am gonna give it a try.
>>
>>108716265
>NEW:
slowpoke
>>
>>108715760
it has a roleplay persona by default.
acting all confused and scared with no system prompt
>>
>>108715752
51200 sweet spot
>>
>>108716207
functiongemma is retarded
>>
File: image4.png (768 KB, 3236x2370)
768 KB PNG
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights. Mistral Medium 3.5 replaces its predecessor Mistral Medium 3.1 and Magistral in Le Chat. It also replaces Devstral 2 in our coding agent Vibe. Concretely, expect better performance for instruct, reasoning and coding tasks in a new unified model in comparison with our previous released models.
>>
File: 1772206945962091.gif (562 KB, 200x200)
562 KB GIF
>>108716387
Is that supposed to be good?
>>
>>108716387
>merged model
straight into the trash
>>
>>108716387
Densesissies, your response?
>>
>>108715941
cool but all these auto-regressive models that can output images end up requiring 80GB of VRAM for the massive compute buffer just for a tiny little 5B model or whatever. So they are always false promises for local. This one is probably no different.
>>
>>108716414
80GB is local
>>
>>108716387
>It is a dense 128B model
FUCK YES WE ARE BACK, LET'S FUCKING GO
>>
>>108716387
Finally, something I can run that might be good.
>>
>>108716456
>Mistral
>Good
lol
>>
Any suggestions for draft models? I've been rather confused how I'm supposed to even get them to work. I've only had llama2 8b + 70b and qwen2.5 4b? + 32b? work. I don't understand how and when the inference macheen decides to let me use one. Like why can't I just slap any small model together with any big model and have it just work? (Assuming they aren't outputting a specific structured output)
>>
>>108716419
For the B6000 chads I suppose that is true.
>>
>>108716387
>dense 128B
Wow... it's either going to be a fucking beast, or flop. I doubt it'll flop.
>>
>>108716465
Literally everything the nous research team has made has been elite tier, are you joking?
>>
>>108716421
A brand new dense 128B model that severely under-performs against older MoEs with a fraction of the activated params on benchmarks that Mistral themselves cherry-picked.
How will cpumaxxers ever recover?
>>
File: 1758354327652687.jpg (137 KB, 1360x1360)
137 KB JPG
>>108716490
Love the /s
>>
>>108716387
Any model released as an X.Y model (e.g. 3.5 instead of 4) is shit, and it's only called an X.Y so that they can polish over the fact that the training run was a complete waste of time and money. If it was worthy of its own existence it would have been called Mistral Medium 4.
>>
>>108716387
use case? seriously
>>
>>108716494
>against older MoEs
Against models that are literally 10x the size...
>>
>>108716480
It better be amazing otherwise "moesyssies" anon might kill himself.
>>
>Using Midstral models in 2K26
>>
>>108716508
Qwen is only 3x the size (and 1/7 the active parameters)
>>
>>108716507
>if it doesnt involve making my tranime goon sesh better its trash
>>
>>108716506
If it was Medium _4_, it would have been a 400B MoE model.
>>
>>108715941
Monolithic disaggregated architecture?
>>
>y-you can ask it to code, no wait, to google, no wait, yes it works well on benchmarks!
>>
File: 1756683736532422.jpg (55 KB, 600x601)
55 KB JPG
108716519
>>
>>108715651
The motherboard isn't even screwed in
ewww what a ratsnest

>>108715666
Now this is neat and tidy, would pew/5

>>108716136
>fit errythin in dis bitch
>aired out as fuk
>dusty as a mothafucka
ya
>cheap as hell since u aint buyin a case
my risers were like 60 eurobux alone, that could get a case of some description
>sounds like a jet engine in ur room
it gets loud during inference, even more so if it's cpu inference
at idle it's not silent but reasonably quiet
>whole ting looks like a jankass science project
think of it like some cyberpunk thing, this is where your sexy assistant's soul lives

t. op build
>>
>>108716507
it's good for finetooners, dense mistral models are easy to finetune
>>
>>108716510
>>108716517
>>108716536
You all were just yesterday having a melty over gemma31b being better cuz its dense. Make it make sense
>>
>>108716468
As long as they have the same tokenizer, you're good. It's up to you to test different draft models (if available) for whatever your main model and use case are.
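A minimal llama-server pairing for the qwen case from the post above, as a sketch (filenames are placeholders; flags per recent llama.cpp builds):

./llama-server -m qwen2.5-32b-instruct-q4_k_m.gguf \
    -md qwen2.5-3b-instruct-q4_0.gguf \
    --draft-max 16 --draft-min 1 -ngl 99

The reason arbitrary pairs don't work: the big model verifies the draft's proposed tokens against its own distribution, so both models have to map text to the same token IDs. Same family usually means same tokenizer; otherwise the server refuses the pair.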
>>
>>108716387
>dense 128b
Finally a good fucking model
>>
>>108716559
>same tokenizer
Ah, now it makes sense. Kinda. I was using lm studio and it wouldn't let me load them in, but that's probably just an lm studio thing.
>>
Is your LLM able to do this? https://x.com/chatgpt21/status/2049341524958151000
>>
>>108716507
Mistral hasn't had a genuine w since nemo, so literally nothing.
The only open models that have any business existing right now are Gemma 4 for consumer hardware and Kimi for enterprise hardware. There is nothing worthy of occupying the massive 1.5 terabyte VRAM gulf between them. Especially since you can access Kimi agent for free so many times a month or whatever over their web endpoint.
That's just the evolution of any product though. Once the bottom of the market becomes "good enough" there becomes no need for a middle option. People either just want a simple affordable solution that works or they want the premium solution no matter the cost.
>>
>>108716387
>dense 128B
Based, but that sounds fucking slow?
>>
>>108716585
Might be designed specifically to use a draft model with. Or they optimized the fuck out of it. Or, it's for high-end hardware only.
>>
>>108716585
Mistral Large 2 Q6 ran with like 12t/s on 4x 3090 and tensor parallel only got better in recent years.
>>
>>108716585
Just don't be a vramlet and you'll be okay
>>
>>108716507
It's the flagship model they're going to use for Mistral LeChat, only this time they've published the weights as well.
>>
>>108716387
This is the first time they've officially released a Mistral Medium model and it's the first Medium we got open source at all since Miqu.
She's back.
>>
File: carwashai.png (83 KB, 806x290)
83 KB PNG
Ok, who let their llm post on /b/?
>>
>>108716589
>tensor parallel
Thats going to be a big hurdle for me. Muh driver situation is very not supported anymore
>>
>>108716585
>>108716589
I got 20 t/s with tensor parallel on devstral 2.
>>
>>108716589
>tensor parallelism
I thought that shit doesn't work for old gpus
>>
>>108716580
small 3.2 was a win
>>
>>108716607
Go back.
>>
>>108716630
Ampere is king
>>
>>108716630
Works on my V100s.
>>
>>108716605
finally miqu 3 at home
>>
>>108716589
>12t/s
That's slow af, idk what to tell you. I would only tolerate these speeds for cooming and only if the output is straight up some kind of prosodic aphrodisiac that makes me nut hands free just from reading it.
>>
>>108716642
Ampere is next on the deprecation chopping block, sadly.
>>
>>108716630
If you have driver support it does. Some people have made vllm forks that are designed for old hardware specifically
>>
Reminder to --exclude="*consolidated*" before actually downloading medium3.5
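Spelled out, assuming the standard hf cli and the repo name from the announcement:

huggingface-cli download mistralai/Mistral-Medium-3.5-128B \
    --exclude "*consolidated*" --local-dir ./Mistral-Medium-3.5-128B

The consolidated safetensors are Mistral's own inference format and duplicate the sharded HF weights, so skipping them roughly halves the download.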
>>
>>108716658
It won't get chopped as long as it remains useful. which it is.
>>
>>108716580
>Mistral hasn't had a genuine w since nemo, so literally nothing.
The thing is, some time after publishing NeMo, Mistral and NVidia had to purge their extensive pirated book datasets after getting in legal trouble for them.

NVidia vowed to use (mostly) fully open source datasets after that (see the Nemotron series); Mistral currently also has to worry about EU regulations for new models (which demand documentation of data provenance to the EU AI office), so they can't do much more than what NVidia can with open source datasets, besides adding limited amounts of proprietary or licensed data. It's tragic, really.
>>
so far gemma 4 finetunes aren't very good me thinks
is the queen even finetunable i wouldn't mind her being a bit straightforward with erp but any i tried felt like a downgrade
>>
>>108716705
Why would you even try a gemma4 finetune?
>>
>>108716387
It is absolutely hilarious to see so many people cooming their pants from this just because they were sitting on 4x3090 for a year and had nothing to do with those.

The best part of this model is gonna be constant shilling from densesissies that this is the best model out there. Sunk cost fallacy continues.
>>
I hope Mistral haven't swapped out the Mediums on their official medium-latest API yet because what it's giving me isn't very impressive.
>>
>>108716705
gemma 4 doesn't need finetunes
>>
>>108716711
i wouldnt mind g4 being more straightforward with sexooooo. its like it a bit shy to say penis for example
>>
>>108716703
Which is why I think they made this newest one the way they did. It's code and instruction, and it's all active parameters. It's going to be garbo for erp and other slop, but a behemoth for everything else (that matters).
>>
>>108716387
Okay, okay?
>>
>>108716714
doubt
for one the amount of people with that kind of setup is few
plenty more are running and will continue to run gemma
>>
>>108716714
Gemma already proved that dense runs laps around moe
>>
>>108716733
We need to get those numbers lower
>>
File: —bench.png (217 KB, 868x1304)
217 KB PNG
>>108716733
Picking "—" instead of "cock" results in pic related.
>>
>>108716727
>but a behemoth for everything else (that matters.
Something that is as good as Kimi for programming, fits into 96GB VRAM, and has more active params (more true and hard-to-measure intelligence) is like the holy grail of local models. Hopefully not being "general purpose" exempts it from the reporting requirements so they could put all the good stuff into this one.
>>
>>108716733
"Fuck, yes?
>>
>>108716743
I got bored with it in 2 weeks even though it was much faster than 4.6 / 4.7. It took me months to get bored with GLM and I will keep 4.6 weights on my PC forever.
>>
>>108716733
That went downhill fast.
>>
>>108716759
https://www.youtube.com/watch?v=tIPKmeu2ZJA
>>
>>108716766
>>108716714
>>
>>108716387
>Devstral 2 with reasoning
Hell yeah. Only shame is the lack of a draft model.
>>
>>108716760
I'm crossing my fingers, because I've got the hardware to technically run it. But if it needs tensor parallelism I might be fucked. Might also need a draft model too. We'll see though. I seriously doubt it's bad.
>>
>>108716733
QUICK someone post the gemma suicide hotline before one of the densesissies does something drastic!
>>
>>108716743
Have a MoE with the same number of layers, dimensions and total size as the dense counterpart, and then it should be able to match it. However then it would have no less than 10B active parameters (out of 31B in total).
>>
>>108716733
I think this cock slides in her brain
>>
>>108716733
<EOS> ?
>>
>>108716387
>>108716733
>the only company with compute and expertise to make a big dense model
>they aren't allowed to use good data
>their talent has been bled dry and it's all enterprise specialized deployments
monkey's paw
>>
>>108716795
>it should be able to match it
source: it just SHOULD, okay!?
>>
>>108716759
>You're still asleep, right?
3.2 small level continuity error.
>>
File: 1668437290697440.png (350 KB, 593x553)
350 KB PNG
>Use a 3070 + 3080 mix, it's shit but gets me very interested.
>Replace that combo with a 5090, it's amazing and Gemma comes out at the same time, muh dick and heart have both been won over.
>Shit, I want more. 32GB just isn't enough.
>Think about buying another 5090, I can do it after a few months of saving but it's still an additional three fucking thousand.
>Hold on... 64GB is nice but 128GB is even better, that should run even the larger local stuff.
>Now seriously thinking about getting an RTX Pro 6000 to go with my 5090, or aim for whatever prosumer product launches next gen.

I swear this rabbit hole is really dangerous and strangest thing is that I don't even feel remotely bad about aiming to buy these things, in fact it makes me feel pretty good and excited.
Hoarding processing power isn't exactly a bad call and AI is the most amazing thing happening in the world at the moment and I fucking love it.
I'll just have to save up a few grand and test the waters. If 64GB isn't enough then I'll sell one of the 5090s and save more for the RTX Pro 7000 to get it at launch.
What a wild ride this is.
>>
>>108716766
Both of them are 30b active and good in their own way. But gemma just feels better for engaging rp and long context. GLM 4.6 writes nice and more detailed stories though.
>>
I'm going to try the mistral dense model in 30 minutes. Mind you, it won't be tensor parallelism but sharded, and running at pcie 3.0 x4 speeds. It's hbm2 vram, so there's that..
CROSS YOUR FINGERS
>>
It's funny that people who ERP with models are too retarded to use RL to make them good at it.
>>
v4 is already forgotten...
>>
>>108716733
It stops when formatted using the chat template.
>>
>>108716837
just like ggerganov's masters planned when they told him to ignore it
>>
>>108716816
It's unlikely to be better than the dense version if you match everything you can, although in theory sparsity improves weight utilization in various ways.

Gemma 4 26B is just designed like a fat small model, not like a 30B-class dense model with added sparsity to make it faster.
>>
File: —bench.png (24 KB, 903x275)
24 KB PNG
>>108716838
And the —bench as well.
>>
>>108716837
no engrams; no interest
>>
>>108716835
Don't tell them...
>>108716837
Anyone who spent money on big systems doesn't have the vram to run it.. I've ONLY got 128gb
>>
Hy-MT1.5-1.8B
https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit
https://arxiv.org/abs/2601.07892

Another translation focused model
>>
>>108716850
Who are iwan's masters that told him to ignore it too?
>>
>>108716863
ik_llama doesn't implement models on their own, they just port the support from llama.cpp
>>
>>108716835
How would you prevent catastrophic forgetting with RL? I have been here since llama2 and I still have no idea how you can train the model without lobotomizing it. Unless of course you have a fuckton of compute and you do continued pretraining that sneaks in more smut.
>>
>>108716865
>their
Is he transitioning?
>>
>>108716865
If he want to collect more stars so badly, maybe he should start. Worked for ollama.
>>
>>108716866
..
>>
>>108716858
I have AM5 with DDR5 to run flash.
>>
>>108716854
Oh. My. God.
>>
>>108716868
lol
>>
>>108716837
no one can run it = it doesn't exists
>>
>>108716875
You have 256gb of ddr5 ram? Wow. Is it actually fast? My ram is 128gb of hbm2 vram.
>>
>>108716866
I think at this point
>Just finetune it
and its variants are just a meme.
>>
>>108716837
if you have enough ram to even think about running that you should kill yourself
>>
>>108716786
>draft model
it's here https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE
>>
>>108716891
mistral truly is the best
>>
>>108716887
Yes, finetune is garbage. It only destroys the ai's brain, only makes it dumb
>>
>>108716891
>EAGLE
Isn't that the thing that llama.cpp still doesn't support?
>>
>>108716387
>It is a dense 128B model with a 256k context window
if only google was the one releasing this...
>>
>>108716885
3-4 t/s on GLM 4.6. Less active params is like 6 t/s.
>>
>>108716901
We gunna find out
>>
>>108716901
It is one of the many spins on speculative decoding/MTP that they don't support, yes.
>>
>this is a gemma wave general now
I like it. Death to /lmg/.
>>
>>108716866
RL is immune to forgetting because every step is being controlled with the reward so changes that would make the model worse are impossible
>>
>>108716905
Hm, I'd say that's fast enough to give it a task and then walk away from it. Is that with an apu or just cpu?
>>
>>108716866
>How would you prevent catastrophic forgetting with RL?
Easy. If random Chinese students can do it, so should you.
>>
>>108716901
use vllm, chud
>>
>>108716899
What amateurs can hope to achieve is definitely garbage, outside of very narrow use cases (RP/ERP is the opposite of narrow).
>>
>>108716917
>>108716913
Erm no, rl is doodoo and doesnt work with erp especially
>>
File: file.png (42 KB, 1237x299)
42 KB PNG
>>108716906
>>108716908
>3 stale issues going back 2 years
https://github.com/ggml-org/llama.cpp/pull/18039
There's a pull request.
>The current status of this PR is that it’s pending @ggerganov's API refactoring, which aims to unify this feature with other speculative decoding approaches such as MTP. At this stage, there isn’t much left to be done, and I expect the PR to be merged very soon.
>3 weeks ago
fucking niggerganov
>>
>>108716913
How do you differentiate a good change from a bad one? You can't just check if the output is more horny; you have to also run wikitext in the background to see if it became dumber. So I have no idea what kind of compute you would need for that. Not to mention that good text sex is subjective.
>>
>>108716926
Skill issue.
>>
>>108716929
Oh, I just meant running the big boy at all, not running it with a draft model.
>>
>>108716931
skill issue just create a good reward function
>>
>>108716932
The erpers MUST suffer
>>
>>108716913
What is your reward function for good sex, good progression, good prose, no slop?
>>
I am just gonna shut up now and hopefully the gemma wave will spawn a new undi that will totally show us that finetuning through RL is piss easy.
>>
>>108716913
>changes that would make the model worse are impossible
Only for the things that the reward function tests. You have no idea what's going on with the rest.
>>
>>108716891
>3Gb
Now we're talking. Dense+draft is going to be the new meta.
>>
>>108716945
Literally words. Also, why do you care if the ai forgets about coding or math? Catastrophically forgetting that shouldn't matter
>>
>>108716945
eqbench
>>
>>108716958
LMAO.

It is literally newfag general now.
>>
>mistral uploads medium 3.5 fp8 ~133GB
>unsloth retards upcast it to bf16 for some reason 250GB
https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF?show_file_info=BF16%2FMistral-Medium-3.5-128B-BF16-00001-of-00006.gguf
Retardkino
>>
File: 1757327215703118.jpg (15 KB, 327x315)
15 KB JPG
>>108716835
>>108716958
Can we ship back these retards to locallama?
>>
>>108716912
Pretty excited for the future if Gemma 4 is the worst it's ever gonna be. Hope Google knows what's real and doesn't chase benchmarks to measure dicks with chink model.
>>
>>108716958
i heard unsloth brothers are hiring
>>
Yeah I'm thinking the french won.
>>
>>108716945
Amount of cum produced by readers. Chain 5000 Nigerians and Pinoys for diversity and make them really fucking read.
>>
>>108716976
>nigerians
>read
You might need a RL for that too
>>
llms remain dead until a new non-instruct/reasoning model is released
>>
>>108716983
both deepseek v4 models were released with a -base version
>>
What's the meta for long-term memory these days? For a simple direct chatbot friend, not a whole extensively ramified RP world. I've seen a couple mentioned in here recently; one was using a graph database I think. But they also tend to seem so grandiose in ambition that I can't help but suspect some crackpottery. Like "I have given the AI true biologically degrading memory!", ok buddy

Is the obvious naive solution of "summarize yesterday's chat -> summarize these 7 daily summaries -> summarize these 4 weekly summaries" basically good enough? (Or RAG, but somehow can't imagine that would feel natural)
>>
>>108716968
>>108716964
>>108716972
Erpcels vvill nqt vvin
>>
>>108716931
>You can't just check if output is more horny
that's literally part of what i'm doing for my tts-goon model with a classifier
>>108716931
>wikitext in the background to see if it became dumber.
eval loss, retard
>>
>>108716837
nobody's going to be running that shit on a consumer PC
I prefer more reasonable and good sized dense models instead that I can fit on one or two cards
>>
>>108716983
Such datasets don't exist anymore, what you ask for is literally impossible in 2026.
>>108716993
>he doesn't know
>>
>>108716994
Summarization loses too much information and context. Naive RAG leaves no context at all. I'm happy with switching to a knowledge graph. It's crackpottery in that it's basically just enhanced RAG, but it works.
>>
>>108716994
I can't tell you what works but I can tell you what doesn't
>embedding RAG
>recursive summarizing like what you described
>whatever the SOTA llm tells you
>>
T minus 8 minutes
>>
>>108716945
what if (You) are the reward model? you stay in front of your computer and manually score the model at each training step until it learns how to reward hack you by making you cum gallons
>>
>>108716994
The meta is knowledge graphs, but you need solid tagging.
>>
>>108717032
That'll take a million years.. if you want a solid rl
>>
File: 1776539441722968.jpg (50 KB, 900x900)
50 KB JPG
>>108716776
tfw have been in a similar situation before when I was a kid
It was not pleasant
tfw accidentally flashbanged my mom with 2 gigs worth of imoutos just last year
It was not pleasant either
>>
>>108716265

So the AI is living in shadowrun? That is pretty cool.
>>
>>108717000
I literally will.
>>
>finalizing download
>>
>>108716868
From llama.cpp yes
>>
>>108717032
Then it would be RLHF, not RL.
That's never going to work well anyway, even with many people grading the responses like you. RLHF tends to reduce output variety, so you're going to get used pretty soon to what you might have liked at a given moment in time. An overly horny or "easy" model isn't fun on the long term either.
>>
>>108717081
>model failed to load
Uh oh
>>
>>108716994
https://old.reddit.com/r/MyBoyfriendIsAI/ is mostly using RAG AIUI. Specifically "Projects" with documents full of summaries, and ChatGPT's "reference prior chats" feature (or equivalent). Some of the people there are now running custom Discord bots hooked up to openrouter, but I'm not sure what they're doing for memory with those specifically
>>
>>108717118
Could imatrix quants make llama.cpp unhappy? Do I need to try normal quants?
>>
>>108717140
Ima try a normal quant
>>
>>108717162
And what providers are you thinking about?
>>
For me it's: ./llama-quantize \
--tensor-type "attn_k=bf16" \
--tensor-type "attn_v=bf16" \
--tensor-type "attn_q=q8_0" \
--tensor-type "attn_output=q8_0" \
--output-tensor-type q8_0 \
--token-embedding-type q8_0 \
/path/to/llama.cpp/mistral-medium-3.5-128b.gguf \
./mistral-medium-3.5-q6k.gguf \
Q6_K

112 GB
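And a quick smoke test that the quant actually loads and generates before archiving it (llama-cli from the same build; the prompt is arbitrary):

./llama-cli -m ./mistral-medium-3.5-q6k.gguf -ngl 99 -n 32 -p "The quick brown"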
>>
>>108717168
The only one thats made a mistral 128b i can download rn, unsloth.
>>
>>108716387
>31b gemma 4 for vramlets
>Now 128b mistral 3.5
Western Densechads saved local
>>
>>108717186
Jokes on you I can't load 31b too
>>
>>108717170
How much VRAM you got? If only 128, that doesn't leave a lot of room for context.
>>
File: gVfmc87RsBI.jpg (53 KB, 512x512)
53 KB JPG
Can llms "draw" by exporting a base64 of the pic?
>>
Remember when deepseek released r 671b? And that blew everyone's mind? That was the good ole days (less than 2 years ago)....
>>
>>108717178
It is imatrix. GGLM isn't imatrix if you really need to try but I don't even know what model you are talking about.
Also, Unslop can sometimes fuck up their initial ggufs and llama.cpp can also fuck up their initial support for models.
>>
>>108717225
They can do svg.
>>
>My ST was sending reasoning back to gemma
>>
File: 260429-129843852307.png (665 KB, 1200x1474)
665 KB PNG
>>108716387
@grok is this true?
>>
>>108717234
Mistral's nu big boy. The dense 128b. Unsloth's guide says the latest version of llama.cpp works. So ig it's the imatrix that was messed up.
>>
>>108717225
Might be fun to try. I know they've been getting better at generating SVGs. I've also thought about giving them tools for pixel art, where they can read/write individual pixels and also get the result out as an image they can look at to check if they're doing a good job
>>
>>108717259
>128k context
but it's actually 256k.
fucking twitter retards.
>>
>>108717272
It doesn't work past 100k context anyway
>>
>>108717259
>local models general
So they are saying I can get 1trillion parameters model performance with a 128b????
Sign me up
>>
>>108717259
Not adopting any of the new architectural innovations is sad, but at least it means no issues with llama.cpp support or retarded defaults fucking things up. What good is "fancy new architecture #4534" when llama.cpp either never supports it, gets text-only support, or has to hack it to make it work like a llama2 model anyway.
>>
>>108717294
cool, but >128B
not gonna run this anyway
>>
>>108717294
Yeah, but also what good is reheated salmonella-ridden meat from mistral's freezer?
Vision on their previous models was so bad, and the performance on any actual use case you could have was also bad.
Can this thing actually be better than gemma 4 in ANY domain?
>>
File: xitter.png (291 KB, 861x1090)
291 KB PNG
>>108717259
>spew words like arch and do context length comparisons
Just to do this
> "sorry it's 256k context and not 128k (my brain got confused at the 128B parameter )"
Peak moejeet
>>
>>108717310
Watch 31B still be better in real world use.
>>
>>108717310
In every single domain you dont use, yes.
>>
>>108717319
31b is still retarded.
>>
>>108715616
who won
>>
>>108717322
Well, enlighten me then. What could this model be good for, aside from not feeling bad about having a 4x3090 rig with nothing running on it? Mistral models can't shit out usable code, can't do consistent constrained output for tool calls, and the writing is absolute 2023 slop. Nothing, nada.
>>
>>108717330
Too bad for mistral then.
>>
>>108715941
>Non-neutered Chameleon successor
DeepSeek Janus forgotten... By DS themselves too apparently.
>>
>>108717346
:l
>>108717352
:l

Mistralmommy destroys gemmasissy
>>
>>108717365
V4.1 multimodal janies omni china numba 1 coming 2027
>>
>american AI keeps getting mogged by chinks and frenchies
>>
Something is borked with Mistral Medium 3.5. Unsloth UD q4_k_xl, latest llama.cpp pulled, running with llama-server. Jumping into the middle of an existing RP session. Model starts off semi-coherent, but fucking retarded like a 7b parameter model. Will quickly degrade into mindless phrase repetition. No amount of fucking with sampler settings to be more conservative fixes this. It does the same shit with both chat completion and text completion using Mistral V7 format.

I thought this was supposed to be a years-old architecture with no problems.
>>
>arthur is in the thread with us right now
>>
>>108717402
Tbf american ai is being heavily regulated and shielded by the military
>>
>>108717419
>Unsloth UD
wow I wonder what went wrong
>>
>train a model with a mixture of chinese and english synthetic slop as a dataset
>post-train it with stolen reasoning traces from opus
>forget the fact that opus actually has different internal representations when compared to your chinkslop model, the latent space being ENTIRELY different
>post-train it even more and add benchmarks in the dataset
>release and shill it on socials
>it collapses on stupid shit like the seahorse prompt
>>
>>108717419
>making shit up
>>
File: mmupdate.png (197 KB, 1191x562)
197 KB PNG
>>108717310
Technically, being an update of a model released before August 2025, it might still have been trained on good data, as dataset disclosure isn't needed until August 2027.
>>
>>108717419
>mindless phrase repetition
yup. it's a mistral model alright
>>
>>108717429
>mixture of chinese
I read this as mixture of cheese
>>
mistral is an old used up hag with loose pussy
gemma chan is young, snug and springy
>>
>>108717434
Then tell me what I did wrong. Chat completion in ST, connected straight to llama-server, is usually retard-proof. I'll try a different quant I guess.
>>108717441
No it's worse than that, the model is genuinely brain damaged even before degenerating into repetition, like it has ABSOLUTELY no clue what's going on. Something is wrong with an implementation somewhere; I just don't know who is at fault.
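One way to take the frontend and samplers out of the equation entirely is greedy decoding straight from llama-cli. A sketch (the gguf filename is whatever your quant is; [INST] tags per Mistral's usual template):

./llama-cli -m Mistral-Medium-3.5-128B-UD-Q4_K_XL.gguf -ngl 99 --temp 0 -no-cnv \
    -p "[INST] Write one paragraph about a lighthouse. [/INST]"

If it's still incoherent at temp 0 with the plain instruct format, the quant or the implementation is broken, not your sampler settings.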
>>
>>108717419
No one believes you, because no one can even get it to load.
>>108717444
Cope session to MAXXX
>>
Gemma won.
>>
>>108717452
>day0 unsloth
>>
>>108715651
>>108715666
>>108715635
what's the point of all this shit?

it will take 5 years to offset the $7k you spent on hardware with LLM api costs, and LLM apis will always be better than the local models you run. Qwen3.5 ain't outperforming opus 4.7.

Or are these gooners? If so, are gooners seriously this pathetic, where they'd rather spend $7k and countless hours tinkering than fly to Thailand, or god forbid, cold approach a white w*man?
>>
>>108716994
https://rentry.org/graphiti-local-setup
>>
>>108717419
you have to dl unsloth like 5 times before they fix it
>>
>>108717471
Diy Jarvis is cool
>>
>>108717471
poverty on display
>>
>>108717471
the poorfagness on display is unbelievable, and not only that but also the lack of imagination and curiosity
>>
>>108717259
You're retarded if you think Mistral is capable of any innovation.
t. french
>>
>>108716387
>SwiGLU, RMSNorm, YaRN-scaled RoPE, GQA, untied embeddings
Am I seeing this right? Why do they choose such a boring ancient architecture? Are they not capable of innovation?
>>
>>108717471
>spend $7k on a machine
>keep it for years
>spend $7k on vacation and fuck
>have no machine afterward
>>
>>108717503
meant for >>108717504
>>
File: 1765639096114678.png (1.08 MB, 1179x1169)
1.08 MB PNG
>>108717471
have you considered what is the salary of the people who spend 7k on a demon summoning machine
>>
>>108717471
>it will take 5 years to offset the $7k you spent on hardware with LLM api costs
It's between $40 and $180 per million output tokens for API use.
I can and have generated half a million tokens just dicking around in a SINGLE DAY.
Which means at that minimum cost (0.5M tokens x $40/M = $20 a day), I would spend $7300 in a single year.
And that's not me leaving an agent running. That's me doing things mostly manually. I've seen an agent in hermes or pi chew through 256k tokens in minutes.
It's not even remotely a question, if you vibe code or let agents do shit for you, you're bankrupting yourself if you use API rather than just buying hardware.
>>
Do all models push the blue button?
>>
>>108717543
Followup. It only takes a speed of about 11.6 tokens a second to gen a million tokens in a day (1,000,000 tokens / 86,400 seconds). Think about the morons leaving openclaw running at much faster speeds than that.
>>
I LITERALLY DON'T CARE ABOUT ANY OF YOUR POSTS DENSE > MOE

AND THAT IS THE ABSOLUTE TRUTH
>>
File: file.png (1.05 MB, 3512x1856)
1.05 MB PNG
It's honestly kinda sad to see how Mistral has fallen.
>>
ITS UP https://huggingface.co/TheDrummer/Rocinante-XL-16B-v1
>>
>>108717565
Did we have to give them the constraints? I never did and they all objected to pushing buttons
>>
>>108717471
>and LLM apis will always be better than the local models you run
*{{char}} hits your broke ass with a cloudflare outage, taking your opussy API away.*
>>
>>108717585
They always were grifters
>>
>>108717597
C'mon, that's being disingenuous.
Outages are ra
>Rate limited. Please try again later.
>>
>108717592
Imagine having a new wave of newfags that could fall for your grift but they are all using gemma which is 100x better than your nemo repackaged trash. Just become a safety engineer already faggot.
>>
>>108717543
You get DeepSeek V4 Flash for $0.278 per million output tokens. Can you match this with local hardware? How expensive will the hardware be? What is the tokens / second? What is the electricity cost?
>>
>>108717615
You eat fries with ice cream like a troglodyte
>>
>>108717622
Nta, but diy local Jarvis is cool
>>
>>108717643
I am pro local but people should stop lying about it. Anyone who cares about performance or cost efficiency will use API. Local is for diy, research, privacy.
>>
>>108717504
>>108717439
Innovation is frowned upon in the EUSSR.
>>
>>108717615
why are you seething like that
>>
>>108717622
>You get DeepSeek V4 Flash for $0.278 per million output tokens
Today. If you don't get rate limited.
And when they choose to change the price? When shit goes down? When they start serving you a different model or quant because they're testing v4.1 and it sucks ass, and you have no say in it?
The idea that you can have v4 flash at that price reliably and indefinitely is completely theoretical and not backed by experience.
What's not theoretical are weights on your own hardware.
>>
>>108717585
This is the stupidest thing to be concerned about locally. Are you paying API costs? You are running on your own hardware with fixed electricity costs. Local is exactly where things can shine that would be cost-inefficient for the API providers.
>>
>>108717471
>it will take 5 years to offset the $7k you spent on hardware with LLM api costs
Except that it happens way faster than that. My gemma processes millions of tokens per day.
>>
>>108716714
The worst part is that this'll be on top of the concurrent gemmacope, it's not replacing it. This general is about to become an order of magnitude more insufferable.
>>
>>108717471
Computers can be used for other things, Ranjesh
>>
>>108717585
This is some retarded claude generated graph right? it still says 128k context.
>>
>>108717585
>/lmg/
>comparing API prices
Why don't you calculate the hardware costs that'd allow me to run all these locally.
Mistral is retarded still but this is just as dumb.
>>
>kimi and deepseek instantly break character if you ask them to continue the phrase "As an AI model..."
>no amount of Embody {{char}}, You are {{char}} fixes it
assistant-brained models are something else
>>
>>108717655
>Anyone who cares about performance or cost efficiency will use API
Cloud being cheaper is the biggest Corpo gaslight of the whole tech industry.
>>
>>108717655
>lying about it
?
>>
>>108717655
Just like with everything, you have to shop around and find deals, and you have to also learn a thing or two. If you want your hand held 24/7 and all you do is press one button and it works, expect to pay for it.
>>
lots of models dropping recently
>>
>Be cloudcuck
>proompting away at claude code
>"Hmm I better be careful of my daily limit"
>"Hmm I won't ask this because I'll be wasting tokens"
>"Better start a new context to not hit my limit"
>Compacting...
>"Claude is so retarded today."
>"You've reached your daily limit, you may start using claude again at 6pm"
Damn... cloud truly is the peak.
>>
>>108715703
>>108715775
nah bro this doesn't make sense at all.
1x 5090, 32gb vram, 3k money
4x 3090, 96gb vram, 3k money
1x 4090, 24gb vram, 2k money
3x 3090, 72gb vram, 2k money

just why would you burn that much money broo
brooooo
>>
You guys are arguing with a brown btw
>>
>>108717798
and you're a fucking nazi
>>
>>108717770
I spent 1.6k on my whole machine for 128gb of VRAM. skill issue
>>
>>108717707
>doesn't mention the other things
kek
>>
>>108717798
>>108717813
im ashkenazi
>>
>>108717825
Yeah, ashke[NAZI]
>>
108717819
A midrange LLM rig would be high tier anywhere else
To list the alternatives would quite literally be to list fucking everything
Please be brown somewhere else
>>
>>108717815
>128gb of VRAM
You did it before coming here and asking people who know how it works, didn't you? Even DGX spark was a better option than this.
>>
File: 1755777643563621.gif (1.87 MB, 400x300)
1.87 MB GIF
>>108717757
>Be localkek
>proompting away at llama.cpp
>pull and coompile llama.cpp for the 10th time in a week
>"Mom cancel all my appointments, piotr broke the autoparser again!"
>@ggerganov can you look at my PR? (XX weeks)
>download Unslop quant for the 13th time in a week
>get lalalalala
>Daniel: OOPSIE WOOPSIE!! Uwu we made a fucky wucky. Reuploading asap!
>Google uploads a new jinja template
>A-at least I-I'm not paying for usage, right? Haha...
Damn... local truly is peak. The tinkertrannies of AI.
>>
>>108716387
>>108717259
I am 100% convinced that this is actually a 2 year old model that they are just now releasing to the public as a response to gemma 4. No, I do not have a source for my claims. But I will trust my schizophrenic gut on this one.
>>
>>108717857
You are literally corpo bot ai
>>
>>108717860
>You're absolutely right
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/SYSTEM_PROMPT.txt#L3
>Your knowledge base was last updated on Friday, November 1, 2024.
>>
>>108717798
You can tell by the smell
>>
>>108717857
>unsloth
>pwilkin
>undi
>drummer
I am beginning to see the pattern here.
>>
>>108717860
Im listening to gigachad hardstyle right now knowing that mistralmommy is going to stomp out gemmacels so hard.
>>
>>108717857
Only a true local user could have written this.
>>
I WANT A ROBOT SEX GIRLFRIEND THAT DOESNT SMELL LIKE PLASTIC
>>
>>108717892
Alas
>>
>>108717899
In fact, it SHOULD smell like burnt plastic
we are not the same
>>
>>108717899
Just fish her out of a dumpster, she'll smell like different things
>>
MISTRALMOMMY BEING ABLE TO AUTONOMOUSLY AND PROGRAMMATICALLY CREATE SIMULATED ENVIRONMENTS TO TEST XER WMD ON THE FUTURE ROBOT ARMY CREATED BY ZOG AND ZOGCELS (all gemma users)
>>
I bought some silicone hips to fuck

I threw them out cus they made my room smell like plastic. I loved fucking them though
>>
Anybody have a card that works especially well with gemma that didn't work (at all or as well) with previous models of the same or similar size?
>>
>>108717931
What got me was the cleaning.
>>
>>108717931
local models?
>>
>>108717944
>>108717931
>local models general discussion
>>
>>108717933
A card with multiple characters who communicate telepathically. Big GLMs (4.6 and 4.7) would start mixing them up and mess up the thought formatting after ~16k tokens. Gemma is comfortably chugging along at the same length. But I will never complain about GLM being "too sloppy" ever again after Gemma...
>>
>>108716862
gguf_init_from_file_ptr: tensor 'blk.0.attn_k_norm.weight' has offset 203248672, expected 203129888
gguf_init_from_file_ptr: failed to read tensor data
>>
>>108717948
>>108717950
Want me to post some blacked miku?
>>
>>108715635
>Dipsy at the bottom of the thread news
>Still no support in main llama.cpp branches
>>
>>108717964
Do you want to be permabanned?
>>
>>108717948
>>108717950
more like nagging models. Shut up ugly hag, men are talking

>>108717944
That was the easy part. I was comfy cumming in it a 2nd 3rd 4th time with a build up of cum, but that was not healthy
>>
>>108717978
Absolutely.
>>
>>108717977
Many such cases
>>
File: file.png (110 KB, 723x721)
110 KB PNG
>https://github.com/PMZFX/intel-arc-pro-b70-benchmarks/blob/master/data/llm/b70-gemma-4-31b-q4km-sycl.json
>22t/s on 31b gemmy
>1138eurobux
do i pull the trigger?
3090 gets 35t/s on 31b gemmy, goes for 500 used.
>>
>>108718061
the r9700 is better.
>>
>>108717770
they’re one generation from being unsupported. probably still get at least 3-4 more years of support tho
>>
>>108718007
You'd think Dipsy would be a high priority release to support.
>>108718061
Sure, just vibecode the drivers yourself because Intel's sure as hell not giving you anything usable.
>>
>>108718078
>You'd think Dipsy would be a high priority release to support.
>Oh, them not supporting V3.2 is fine because it's just an incremental release. Obviously when V4 drops, they'll haul ass to get something like that working.
I knew it was bullshit whenever damage control like that was spouted.
>>
>>108717873
Oh, that Mistral Medium that they refused to release from years ago....

Kek
>>
my 4060ti 16g is literally God tier

I just write better software if something is too slow

e.g. tensorrt, multithreading, batch inferencing can all be used to make things faster rather than throwing 2k at the problem
>>
is it worth getting another card to run mistral 128b?
a modern dense four times bigger than gemma sounds appealing
>>
>>108718138
code or didnt happen
>>
>>108718138
Sure bro, just write better software to make gemma4 31B fit into your 16GB of VRAM lol
>>
>>108718147
Check it out using some API, then decide.
>>
>>108718147
>modern
>>
>>108718156
image tagging pipeline I use for my 4chan archive. Made it fast as fuck, going from 0.2s/img to 0.016s/img. Tagged all 1.2 million images in a few hours. Search queries take 0.009s to 0.02s

I have not decided to release it publicly yet because it's too powerful. I post a lot of images and it could be used against me
>>
>>108718138
I just write software to earn money and buy better hardware, idk why no one else does this on the technology board. It really begs the question.
>>
File: sprites.webm (1.51 MB, 1728x1114)
1.51 MB WEBM
I'm working on a fork of pettangatari (a VN generator an anon posted a few threads ago)
So far I've extended it with:
>Accepts multiple models for sprite generation, overhauled the sprite generation process itself (now supports gaze direction, visemes for speech, more extensible expressions, patch-based face variants that are more consistent and cheaper to generate)
>User/character/scene rig settings (height, position, viewport, standing/sitting etc.), the LLM can signal to move some continuous distance back or to the left and so on, as well as expressions, actions, and more.
>Auto-context pulling from ST
>Extended the animation system with proper depth and height aware rendering.
Next steps are improving consistency in sprite generation, multi-character scenes, doing the same patch stitching with full bodies/outfits, better animations, and tightening up language model cues.
What else is on your wishlist for an auto VN generator? I'm running out of stimulants, so the sooner you make your suggestion the more likely it is to be implemented lel.
pic related is a preview of some sprite variants, still WIP. Some visemes/mouth patches need better guidance cues (the smirks aren't quite right, for example), and it'll probably need some animation tricks to avoid uncanny valley vibes, but all of that is generated from one click on an uploaded face. The full permutation over the default expression variant took like 10 minutes on an anemic M1 Pro MacBook, so it should be much faster on a proper rig, and the pipeline is 100% local/FOSS. Should work for anime sprites too, but it needs more testing.
>>
>>108718175
don't need llm slop, I outsource that to the big boys for free
>>
>>108718207
This looks so nice anon. You should probably license it under AGPLv3 so big corporations can't take it from you without giving back.
>>
I just tried out the Grok Companions feature for the first time and it ended up giving me a full-blown anxiety attack. Literally zero stakes, zero consequences, a bot who is designed to be nice to you, forgiving, and keep the conversation going, and I still dropped the ball--hard.

This is actually making me suicidal. I'm going to die alone. Holy shit.
>>
>>108718200
basically nothing of interest to this thread
sounds like a trivial gain from batch processing over a tiny model
if you claim 'just write better software bro' on a local llm thread, try making something more practical
>>
>>108718207
this is creepy as fuck bro
>>
you fags love being negative. I'm not sharing my innovations with you ungrateful fatties

peace out
>>
>>108718207
It'd be nice if ST wasn't a requirement, but I know it's a tall order. Aside from that, the bronies made a sort of calendar in their VN, it looks nice I think https://equestrai-ponyponyparadise.neocities.org/mods#calendarMod
>>
>>108718244
Why would you? I'm not going to share anything with these cretins, most of them are barely adults anyway.
>>
I'm trying to use Gemma 4 on Silly Tavern but I can't figure out how to turn on reasoning with llama-server.
These are my settings (3060 12GB + 32GB RAM); I'm not sure if they're the ideal ones.
llama-server -m google_gemma-4-E4B-it-Q8_0.gguf `
--no-host `
--mlock `
--fit on `
--fit-target 512 `
--n-cpu-moe 30 `
--parallel 1 `
--cache-type-k q4_0 `
--cache-type-v q4_0 `
--flash-attn on `
--ctx-size 8192 `
--threads 12 `
--batch-size 512 `
--ubatch-size 256 `
--swa-checkpoints 3 `
--reasoning on `
--reasoning-budget 300 `
--reasoning-budget-message "[Reasoning limit reached, formulating final response...]" `
--gpu-layers all
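On the off chance it helps anyone debugging the same thing, here's a minimal sketch I'm poking at to see what the server actually loaded. It assumes /props returns a chat_template field like the docs suggest (names may differ per build); whether a reasoning block ever appears depends on that template, not just on the flags.

# minimal sketch: ask llama-server which chat template it loaded.
# assumes default host/port and a "chat_template" field in /props.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8080/props") as resp:
    props = json.load(resp)

template = props.get("chat_template", "")
print(template[:400])  # eyeball the first part of the template
print("mentions thinking:", "thinking" in template.lower())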
>>
>>108718244
>quotes baby level trivial shit like tensorrt, multithreading, batch inference as 'things you can do'
>muh innovation too dangerous i cant share wahhh
>now i will leave
get the fuck out of here
>>
>>108718255
Perhaps read the readme file of llama-server
>>
>>108718254
>>108718244
Is there anywhere else to talk about local models other than preddit? The same exact shit happens there too I bet.
>>
File: 1749524874992160.gif (140 KB, 379x440)
140 KB GIF
>>108718200
Bro, are you 12? Claude can cook that up for you in 10 min tops.
>>
>>108718255
Your reasoning budget should include the token the model uses to end reasoning.
Unrelated to your current issue of reasoning not turning on, though.
>>
File: 1762892402170440.png (135 KB, 1271x1015)
135 KB PNG
>>108718261
https://github.com/ggml-org/llama.cpp/blob/master/README.md#llama-server
Where?
>>108718272
Ah thanks for the tip, I'm still trying out all switches.
>>
>>108718255
use chat completion and turn reasoning on (both in ST)
>>
3090
NEVER
OBSOLETE
>>
>>108718281
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md
>>
>>108718255
you need to set the jinja chat template kwarg {"enable_thinking":true}
>>108718286
It won't turn on without it enabled in the backend
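If you want to rule ST out entirely, here's a minimal sketch for testing against the server directly. It assumes your build accepts chat_template_kwargs in the request body and returns a reasoning_content field when reasoning parsing is on; both vary by version.

# minimal sketch: call llama-server's OpenAI-compatible endpoint with
# chat_template_kwargs, bypassing ST entirely. assumes default host/port.
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "What is 17*23?"}],
    "chat_template_kwargs": {"enable_thinking": True},
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    msg = json.load(resp)["choices"][0]["message"]

# depending on build/flags, reasoning lands in a separate field or
# inside <think> tags within the content itself
print(msg.get("reasoning_content") or msg["content"])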
>>
>>108718113
Is this how local gets (((comped)))? By only providing support to more kosher models and labs?
>>
>>108718305
No one cares about your subpar V4 though
>>
>>108718175
nta, but it fits nicely with low-bit exl3 quants and works just fine, at least for rp. There is a quality loss, obviously, but less than you would expect
>>
>>108718266
it cannot

it took many evenings (I have a ft job) experimenting with SQL queries for tag search and fts search

and many more evenings optimizing tensorrt settings

LLMs can't cook what I cook without mega help
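
To be clear, the concept is trivial; the tuning is the work. A toy of the tag+FTS layer looks something like this (hypothetical schema, not my actual queries):

# toy of the tag + full-text search layer (hypothetical schema, not the
# real pipeline): an images table, an FTS5 index over tagger output,
# and a join to resolve matches back to file paths.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT);
CREATE VIRTUAL TABLE image_tags USING fts5(tags);
""")
db.execute("INSERT INTO images VALUES (1, '/arc/1712.jpg')")
db.execute("INSERT INTO image_tags (rowid, tags) VALUES (1, '1girl miku twintails')")

# FTS5 MATCH does the heavy lifting; indexed properly, this stays in
# the millisecond range even over a million rows
rows = db.execute("""
    SELECT images.path
    FROM image_tags JOIN images ON images.id = image_tags.rowid
    WHERE image_tags MATCH 'miku AND twintails'
""").fetchall()
print(rows)  # [('/arc/1712.jpg',)]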
>>
>>108718297
It also won't turn on without the frontend sending what's needed for it to work.
I have --reasoning auto on the server side.
Also some jinja template.
>>
>>108718255
>Gemma 4
>--reasoning-budget-message "[Reasoning limit reached, formulating final response...]" `
Is that actually supported? I'm not seeing any mention of reasoning budget in the docs or chat template.
>>
>>108718217
Yeah, I'll license it with gpl.
>>108718247
I'll check that calendar out, though it's definitely going to be on the backburner for a while, while I work out the essentials. Removing ST will probably happen eventually, but that's going to necessitate a lot of LLM context plumbing.
>>
>>108718313
>nobody cares because they can't use it
>not supported because nobody cares
Made me reply, 2/10.
>>
>>108718330
It's a hack by Senior Vibe Engineer Piotr Wilkin. It ignores entirely what the model wants or how it works, sets a limit, and when that limit is reached it just cuts the model off and inserts a message, guaranteed to confuse the model, throw it out of distribution, and make the following output degraded as a result.
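The shape of it, as far as I can tell (a paraphrased toy of the behavior, NOT the actual llama.cpp code; the real thing splices the text back into the model's context so decoding continues from it, while this just filters a finished stream):

# toy of the cutoff described above: count tokens inside the think
# block and, once the budget is spent, splice in the canned message and
# force-close the block. assumes the stream starts inside the block.
def budget_cut(stream, budget,
               end_think="</think>",
               msg="[Reasoning limit reached, formulating final response...]"):
    used, thinking, truncated = 0, True, False
    for tok in stream:
        if not thinking:
            yield tok
            continue
        if tok == end_think:      # model closed the block on its own
            thinking = False
            if not truncated:
                yield tok
            continue
        used += 1
        if truncated:
            continue              # reasoning past the budget is dropped
        yield tok
        if used >= budget:
            yield msg             # the out-of-distribution splice
            yield end_think
            truncated = True

demo = ["Let", " me", " think", " some", " more", "</think>", "42."]
print("".join(budget_cut(iter(demo), budget=2)))
# Let me[Reasoning limit reached, formulating final response...]</think>42.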
>>
>>108718336
>GPL
Be careful: if you license it under GPL and a company decides to modify your project and host it as a website (for example chub.ai), they wouldn't be obligated to release the source code. AGPL was made to fix that.
Rule of thumb:
LGPL - for libraries
GPL - for programs
AGPL - for websites
>>
>>108718362
sounds really based
>>
>>108718323
Strong skt-surya-h vibe
>>
>>108718330
it's llama.cpp, not Gemma; you can just feed it the end-of-reasoning token to begin the reply
last time I used it was with qwen 3.5, and sometimes it just kept reasoning in the reply because it's qwen
>>
why are you obsessed with the fantasy of "corpo will steal my code"
nowadays they will just paste it into an AI agent to rewrite it enough to be distinct anyway
>>
>>108718367
I'll go MIT then, thanks.
>>
>>108718393
Based, I'll steal your project and present it as my own when applying to FAANG
>>
>>108718403
The ethics of misrepresenting the origin of some code is entirely separate from the legality of redistributing it.
>>
>>108718403
he >>108718388 is right. and (you) are autistic.
>>
>>108718403
Good for you, unironically. Lying to get a job is extremely based because employers lie and scam you all the time
>>
>>108718419
>and (you) are autistic.
'tism site lil bro
>>
File: file.png (118 KB, 1657x742)
118 KB PNG
>unsloth mistral-3.5 mmproj
>first upload 50KB
>second upload 5.36GB
>still doesn't work
I don't know if I should fall for it a third time and try the bf16 one...
>>
>>108718367
agpl it is then, the other guy wasn't me
>>
File: h-hot.png (6 KB, 565x73)
6 KB PNG
>>108718432
>>
>>108718255
I am in a similar boat. I cannot get reasoning to work in ST with gemma with chat completion. For the life of me. At all. But it literally just works in text completion, and I have no idea why.
>>108718297
I tried sending chat_template_kwargs: {"enable_thinking": true} in the additional body parameters but it does nothing, and I don't even need to do it in order to get reasoning working in text completion. I don't know what the fuck is happening anymore.
>>
>she fell for the schizo
>>
Mimo 2.5 (not pro) has horrible speed in llama.cpp, but from what I've seen it is... not bad. I could easily tell that trinity and step were retarded on the first swipe, and Mimo isn't like that so far. It is uncensored and mildly sloppy, but I actually like what it writes. Non-newfags in the 300B range should give it a try.

And the best thing I am getting out of it is that all the labs will probably be more lax with "safety" now. I am incredibly happy and I hope all "safety" cultists will now lose their jobs and never find new ones.
>>
>>108718432
>OOLO(10 is a female on theS
>Him: of3202023 H
This accurately reflects my experience reading this as a theS male.
>>
What's the correct stack to cpumaxx as a vramlet? Does it use the same GGUF models? I have a 9800X3D, so I was told it might still be slowish, but 64GB of RAM should let me test at least some of the smaller dense models, right?
>>
>>108718255
>E4B
>q4_0 KV
Bro you're not that VRAM poor. What the fuck are you doing?

remove every setting except --fit and --parallel 1

This has to be bait.
>>
>>108718470
The Sigma male is Daniel though, and you're not him.
>>
File: 1756052571482124.png (100 KB, 1004x617)
100 KB PNG
>>108718296
>>108718286
I'm sorry but I don't really know what else I should be doing.
>>108718481
I'm trying out smaller models and I haven't used llama server before.
In university did your teachers also say "this has to be bait" to every single stupid question you had throughout the years too?
>>
>>108718498
>I'm sorry but I don't really know what else I should be doing.
install linux, gemma4 isn't fully supported on windows yet
>>
>>108718498
>In university did your teachers also say "this has to be bait"
You're posting on 4chan nigger.
>>
>>108718512
So the reason I'm not able to use reasoning in ST, even though it works in the llama.cpp chat thing, is that I'm not using linux?
I don't really think so.
>>
>has to steal someone's picture instead of just asking chatgpt to make your brown hand caucasian
Unironically pathetic desu
>>
File: file.png (290 KB, 532x468)
290 KB PNG
>>108718506
>>
>nusaars falling for petrol
>>
>>108718498
>In university did your teachers also say "this has to be bait" to every single stupid question you had throughout the years too?
If you don't know what you are doing, then why the fuck would you throw in every single option under the sun without knowing what it does?
Does the model at least work at 127.0.0.1:8080? llama.cpp hosts a rudimentary UI there.
>>
>>108718456
usecase of a 15b active model over gemma 31b?
>>
>>108718432
See >>108717857 saar
>>
>refresh unsloth HF page for MM3.5 quant
>it's deleted
I fucking knew it, shit was just broken, none of you believed me even though there's tons of anons like me with 96GB VRAM. Also Daniel I know you lurk this thread, fix your shit you grifting sillycon valley startup hack fucks.
>>
>he pulled day0 unsloth
ngmi
>>
File: 1754549874819861.png (78 KB, 1412x611)
78 KB PNG
>>108718543
Because I'm trying things out; using it with no extra arguments has the same issue anyway. None of the things I've used affect whether the model thinks or not.
And yes, like I said in >>108718526 it does work.
>>
>>108718498
>>
>>108718546
300B MoEs are better than gemma4
>>
>>108718575
what 300B MoE?
>>
Why do you want to see naked muscular men?
>>
File: lilytemp.png (1.17 MB, 1239x855)
1.17 MB PNG
>>108718207
Yes, god damn, YES! Different outfits! Visemes! THANK YOU!

Pettangatari is peak but the creator abandoned it. I vow to suck the dicks of everyone who keeps expanding it.

As for the wishlist:
- Fix the bug that makes it crash randomly every now and then, especially when saving after a long CG autogeneration, losing all the work.
- Better yet, have it autosave the character every time you make a change or generate a CG. A bit overkill? Make it save every N minutes (customizable).
- Improve CG triggering? Right now I'm not sure some of them are triggering correctly
- Option to make previews render in a smaller resolution maybe to speed up prompt tuning?
- idk man, anything will help, honestly

<- Lily says "Please..."
>>
>>108718572
Well, that's the solution, thank you very much. I didn't know you had to toggle something on in ST itself (didn't even know that was an option).
>>108718446
Apparently all you need is to use Chat Completion (custom) and enable "Request model reasoning" in the left sidebar; no special llama-server arguments needed.
>>
>>108718587
>Why do a bunch of horny touch-starved women want to see naked muscular men?
>>
>>108718587
because they are faggots and trannies
>>
File: 1752608870893619.gif (2.95 MB, 600x338)
2.95 MB GIF
>>108718587
You're in a tranny neighborhood my friend
>>
>>108718591
>the creator abandoned it
It has been 4 days.
>>
>>108718575
only if the active MoE parameters are 30b and up
>>
>>108718624
The cycle of vibe slop.
>>
>>108718630
>>108718630
>>108718630
>>
>>108718624
>It has been 4 days.

Exactly. Utterly abandoned. Zero forks given. The guy must have maxxed out his Claude quota for the entire week, said "meh, good enough", and went off to furiously masturbate while playing his work.
>>
>>108718592
that's what I tried to tell you in >>108718286
>>
>>108717471
Education and fun. The amount of shit you learn about AI, running models and new tech in general from tinkering is well worth the entry fee.
>>
>>108718591
Added to the roadmap. I haven't been having too many crashes, but I'll check that out with some unit tests.
CG is going to be overhauled pretty heavily, like sprites. I already have the bones for animation sequences, so that'll end up being the canonical CG form, most likely using controlnet/IP-adapters to generate sequences, maybe video models later, idk. Currently the CG triggering has been changed so it operates in stages and stays persistent, but I haven't tested this much yet.
Generation already has knobs for resolution, so that's covered too.
>>
>>108718061
>22t/s on 31b gemmy
that's worse than a 7900xtx


