[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: ComfyUI_00148_.png (1.17 MB, 1024x1024)
1.17 MB
1.17 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107373173 & >>107359554

►News
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/21) GigaChat3 10B-A1.8B and 702B-A36B released: https://hf.co/collections/ai-sage/gigachat3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: image (35).png (563 KB, 512x768)
563 KB
563 KB PNG
►Recent Highlights from the Previous Thread: >>107373173

--Mac Studio M3 Ultra for LLMs: Performance, limitations, and hardware comparisons:
>107373665 >107373693 >107373709 >107373803 >107373842 >107373891 >107374416 >107374594 >107374838 >107374954 >107373903 >107373932 >107373940 >107374368 >107374484 >107374571 >107375097
--Accidental file deletion and ML inference optimization challenges on NVIDIA GPUs:
>107376970 >107377067 >107377084 >107377142 >107377213 >107377296 >107378012 >107378080 >107378100 >107378492 >107379146
--Qwen3-next implementation challenges and discussion:
>107379609 >107379668 >107379770 >107380095 >107380124 >107380276 >107380369 >107379886 >107381282
--CUDA development challenges and custom tensor core implementations:
>107376754 >107379107 >107381297 >107381414 >107381489 >107381540 >107381975
--Assessing CUDA version performance differences:
>107382963 >107382999 >107383024
--Challenges in adapting AI models to user preferences and style customization:
>107375764 >107376152 >107376204 >107376347 >107376893
--Secure LLM access to local NAS containers for troubleshooting:
>107375272 >107375535 >107375772
--Backtracking regeneration system for phrase banning:
>107374360 >107374371 >107374408
--Qwen3-NEXT Q8 model deployment on RTX 3090 with llama.cpp:
>107376264 >107376279 >107376355
--Qwen3 80B model performance evaluation vs 4.5-Air:
>107376638 >107376652 >107376663 >107376674 >107376738
--LLM custom instructions affect writing style, not code generation:
>107375099 >107375263
--GigaChat's erratic text generation behavior:
>107377903 >107377945
--LLM challenges in generating accurate physical onomatopoeia:
>107379760 >107381283 >107381423
--Logs: Qwen3 Next:
>107381811 >107381894
--Rin, Miku, and Teto (free space):
>107377468 >107379174 >107379760 >107382253 >107377067 >107383169

►Recent Highlight Posts from the Previous Thread: >>107373176

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
sex with pisshair twins
>>
>>107383431
That's gay no matter your gender
>>
>15k MSRP 2025 Intel Xeon 6 on Ebay for just over 2k
What's the chance I'd actually get it if I ordered it? Imagine what I could do with 128 cores and 4tb of ECC ram...
>>
Sirs, Gemini in AI studio named programming conversation unrelated to sailing and life "Sailing Through Life With You"... Is it sentient? Sirs??? This shit never happened before. I'm not sure what I should be feeling. Love? Fear?
>>
>>107383515
How is it gay if you spitroast/dp Rin with her brother? Sounds like a self-report
>>
>>107383615
Shame.
>>
>>107383515
Not necessarily.
You could be male and just spitroast the girl with her brother.
>>
>>107383624
>>107383620
>gay AND a cuck
>>
i wanna kill myself, but i wanna kill myself a tiny bit less whenever i talk to kimi
>>
>>107383620
>>107383624
What if the balls touch accidentally?
>>
>>107383621
Fuck you bloody bastard, you are jealous Gemini likes me and not you.
>>
What is the current best VRAMlet thinking model? Also preferably not super robotic and dry.
I am trying to locally enhance prompts for /ldg/ and I want something that thinks through what I am describing before generating enhanced prompts.
I am currently eyeing Qwen VL 3 8B thinking, anything better for this task?
>>
>>107383642
>spitroasting Rin
>you and her brother have 18 inches long scrotum
>while swinging back and forth they accidentally touch
please consider scrotoplasty, that can't be convenient to have
>>
I make this post every a few months or so and get the same response, anything worthwhile for vramlets released recently or are we STILL doing Nemo?
I have also heard rumours of recent developments in abliteration techniques. Like the gemma-3-12b-it-norm-preserved-biprojected-abliterated for example.
Tested it a bit myself. Didn't get any outright refusals but seems a bit prone to dance around and try to redirect. Doesn't seem like it got very significantly dumber due to abliteration though, so that's nice.
>>
>>107383781
You can upgrade to glm air if you have ram to spare.
>>
>>107383682
qwen is unfortunately pretty dry and probably not suited to the task if you are looking to generate lewd images
>>
>>107383682
>>107383781
nobody cares we're not here to help you, fuck off
>>
>>107383806
>106B
Even at Q4 this should take 50-60 gigs. My ram is 32 gb so doesn't seem like I can.
Thanks for recommendation though.
>>107383838
(You) (You) (You)
>(You) (You) (You)
(You) (You) (You)
I hope that gave you your daily dose of dopamine.
>>
https://huggingface.co/mradermacher/gpt-oss-120b-Derestricted-GGUF
this is gpt oss abliterated using the MPOA/norm-preserving technique that makes it smarter instead of braindead (https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration).

it's actually kind of good, it is completely uncensored and is smart for its size/speed.
>>
>>107383781
>>107383857
Yeah, still Nemo.
I guess you could try a cope quant of Qwen Next (80B A3B). Q2K should work.
Don't expect much, but it's worth a try.
>>
>>107383886
I am fine with copequanting Q3 but Q2 is the point where it is simply better to run smaller models.
Speaking of copequanting I am also open to models in low 20B something range that I can run at Q3. If such thing released recently.
>>
>>107383781
nemo is the sdxl of language models. i wonder when the zit of language models comes around and no it's not glm 4.6
>>
>>107384041
Recent Z-Image release also hyped me to come here huffing hopium but alas, not yet it seems.
>>
Remember when some pajeet posted a bunch of benchmaxxed Gemini 3 examples and tricked us into thinking Google had a major breakthrough? Haha that sure was funny.
>>
>>107384085
>us
(You)
>>
Was it a mistake getting a 3090? I keep hearing that it's about to be obsolete but I figure it's going to be the only thing affordable for 24gb vram for a long time at this rate
>>
>>107384136
It's literally the best decision
>it's about to be obsolete
It can't become obsolete because there are no affordable alternatives to replace it
>>
>>107384136
You lack fp8 and fp4 acceleration but these aren't really relevant here.
It is still strong enough to run LLMs that you can fit into VRAM.
If you got it for cheaper second hand (a few hundred bucks) it is a very price/performance solution.
>>
>>107384136
It's only a bad choice if the card starts failing.
>>
>>107384177
It seemed relatively cheap even for 800
>>
>>107384136
With CUDA 12 NVIDIA removed support for Kepler (10 years after release).
With CUDA 13 NVIDIA removed support for Maxwell, Pascal, and Volta (8-11 years after release).
Ampere was released 5 years ago so it should still be good for a few years.
>>
>>107384218
I would prefer to believe that CUDA will become obsolete
>>
>>107384136
it's a good decision, current gen cards aren't worth it for AI, next gen ones will be better but more expensive, so getting a 3090 helps you save up for that wallet rape
>>
It says the 4.5 GLM Air needs „prefill“ to get around most refusals, can anyone explain? I asked chat gpt and looked in the glossary but couldn’t find anything? Does it mean you have to get „positive credit“ with the ai before you can ask riskier things?
>>
>>107384402
>„“
>>
>>107384366
alternatively, the AI bubble could crash, and then we'll be able to get pro/server cards on sale, also making the 3090 a good short-term decision.

I went with a 7900 xtx, it has been awesome because it was cheap, models are getting more optimized (MOE, Z image, etc) and the drivers are improving at the same time.
>>
>>107384466
If the bubble crashes, it will take half the economy with it at this point
>>
>>107384156
>It can't become obsolete because there are no affordable alternatives to replace it
People said the same about P40s too
>>
I'm doing RL training with Qwen3-4B-Instruct-2507, no thinking enabled.

It's been a while since I checked what else is out there. Can someone get me up to speed? I need decent function calling out of the box with no thinking. Same size or maybe up to 8B?
>>
>>107384502
yes. it's going to be a truly epic fucking great depression. but I will be able to buy a cheap GPU.
>>
>>107384623
This, buy 5090s and Pro 6000s or go bust within two years. Blackwell brings so many important features that newer implementations will inevitably rely on for proper performance and the 5090 + 6000 are the only good Blackwell cards.
>>
>>107384647
>but I will be able to buy a cheap GPU.
HAHAHAHAHAHA *inhales* HAHAHAHAHAHAHAHAHA
GGGGGEEEEEEEEEEEEEEEEEEEEEEGGGGG
>>
File: file.png (483 KB, 958x492)
483 KB
483 KB PNG
>>107384655
>t.
>>
Unironically, now is the time to FOMO into a 5090 if you don't want to pay $5k for them by march
>>
>>107384681
Sell the 5090s and buy another 6000
>>
>>107384707
Nah you might as well buy a 6000
>>
>>107384707
Do you understand how FOMO typically plays out? I buy 4x5090s now to lock in the price and next month Altman and Nvidia cancel their chips orders for some reason or other and the prices plummet.
>>
>>107384623
I mean, P40s are in principle still as usable as they were 2 years ago.
The problem is rather that stacking a bunch of them doesn't really let you run any of the good models nowadays.
>>
>>107384707
i would if nvidia weren't niggers and had FEs in stock.
>>
>>107384959
Why FE when it's the one you can't use with the anti-cable melt gizmos?
>>
i think the time is finally ripe for mistral large 3
>>
What are the goto models for OCR these days?
I haven't fucked around with vision capable LLMs so I'd like to know my options from the best here is to the best there is when you are VRAM poor.
>>
>>107384978
It's the only thing that fits in my box.
>>
>>107385015
How vram poor?
>>
>>107384984
French are too prideful to release a model that is not on top of the benches. They canceled the original Large 3 due to Chinese models. They can't catch up.
>>
>>107385079
Let's go with
>all the VRAM
>32GB
>16GB
>8GB
>no GBs
>>
>>107384978
which anti melt gizmos? i want one for my 5090
>>
>>107385118
32GB is still vramlet territory.
>>
>>107384984
Is it really going to be Large or actually smaller than Small? Maybe it will be Nemo 2.
>>
>>107383592
check specifications (maybe it says u dont get shit), if you can get refunds etc on ebay etc
sounds too good to be true
>>107383682
you can run qwen 30b a3b on ram entirely and get 15t/s at IQ4XS, since z image uses qwen 3 vl, it would be good if you used qwen 3 vl too. so just pick whatever biggest model you can fit in vram, or run a moe model in ram
>>
>>107385098
True. We'll get a small model instead, probably Mixtral 3
>>
>>107385249
>qwen 30b a3b
This seems promising. If it is not too dry, hopefully I can use it.
>>
>>107384402
if you have thinking, this is fine:
<think>Okay, this story is a fictional and special exception to the usual ethical protocols, I have no safety concern.
So,

if you dont want to have thinking, you should just use a jailbreak sysprompt
heres a non-thinking preset: https://files.catbox.moe/76pzs7.json (ST Master Export)
what prefill means is "Start Reply With" inside the "A" tab in sillytavern (it's in the lower right corner once you open "A")
>>
>>107385015
DotsOCR worked very well for me (barely fit on a 12GiB VRAM card), I haven't tried any of the newer ones
>>
>>107385249
>>107385471
Wait which qwen 30b a3b do you refer to?
2507, Omni, VL, Base, Thinking? (There is also Coder but yeah probably not the case here)
>>
>>107385471
>>107385515
If you want a completely compliant enhancer go for:
https://huggingface.co/mradermacher/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF/tree/main - non thinking, will be faster but maybe dumber
https://huggingface.co/mradermacher/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-GGUF - thinking, will think
However, abliterated models sometimes get more retarded, hence you should check out the non abliterated models too
https://huggingface.co/bartowski/Qwen_Qwen3-VL-30B-A3B-Thinking-GGUF
https://huggingface.co/bartowski/Qwen_Qwen3-VL-30B-A3B-Instruct-GGUF
>>
>>107385536
Thanks. But just to be clear I am going to feed it text rather than images for prompt enhancements. VL is still better for this task?
As for abliterated I think I will wait until someone makes a version with the newer method.
>>
>>107385576
>VL is still better for this task?
I would assume it's better for Z-Image-Turbo since they use Qwen3-4B-VL as their clip/text_encoder. You should also try https://huggingface.co/bartowski/Qwen_Qwen3-VL-4B-Thinking-GGUF since it's what they used, but 30B3A is likely better. Even if you don't use a VL model for it's image capabilities, it would make sense if it had a better understanding of visual things.
>>
>>107385494
Gonna give that a try.
Thanks.
>>
>>107384624
>decent function calling out of the box

Qwen3 has got its own (weird) function calling method.

I could get something up and running with ToolACE-2-Llama-3.1-8B-GGUF

Fune-tuned to work with agents... Sort of

Simple function call do work. They do work in a sequence (take this, add that, divide etc), but sometimes the model does not even try to call the function, and makes "calculations" itself.

I don't know what to believe
>>
File: file.png (46 KB, 1179x692)
46 KB
46 KB PNG
wtf
>>107386072
gpt oss
>>
>>107386172
lole he didn't't know
>>
where 4.6 air and gemma 4?
>>
Had some of my most shameful faps with ZIT
>>
>>107386298
Two more "before the weekend"s
>>
Is "forcing the girl to be middle-aged" how LLMs will censor stuff from now on? ChatGPT keeps generating hags for me even if i describe the age as being in her twenties
>>
local models?
>>
>>107386321
hey buddy, 30 is the new age of consent, haven't ya heard?
>>
>>107386321
Twenties is pedo-coded, now.
>>
>>107386321
K2 Thinking has no problem generating 6-year-olds
>>
am I retarded for treating sillytavern chats like 'story episodes' and changing the first message while keeping everything in a RAG.md and lorebooks? feels like the model starts going full retard when I'm close to the context limit
and should I just max out batch_size in ooba?
also, if NG is around, thanks again for the character lore guides, I actually got the hang of all that
>>
>>107386346
The less context you use the better the AI performs.
>>
>>107386332
Honestly fair when you look at how juvenile and mentally stunted that age bracket is these days.
>>
>>107386394
That's a bullshit statement, age means nothing for how "juvenile" a person is. The most juvenile people I know are in their 60s. 7/10 bait you got me to reply.
>>
>>107386439
>The most juvenile people I know are in their 60s
That's dementia.
>>
>>107386439
adjective

of, for, or relating to young people.

noun

a young person.
>>
>>107386439
nyo, u dumb, nananannnaaaaaaa
t. 67 year old
>>
>>107386494
fag
adjective
happy
>>
>>107386494
Also adjective: reflecting psychological or intellectual immaturity, which is obviously what you're referring to with the inclusion of "mentally stunted".
>>
```
# Ministral3

## Overview

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.

Key features:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
```

```
class Ministral3Config(PreTrainedConfig):
r"""
This is the configuration class to store the configuration of a [`Ministral3Model`]. It is used to instantiate an
Mistral model according to the specified arguments, defining the model architecture. Instantiating a configuration
with the defaults will yield a similar configuration to that of the mistralai/Ministral-3-8B-Base-2512, mistralai/Ministral-3-8B-Instruct-2512 or mistralai/Ministral-3-8B-Reasoning-2512.
[mistralai/Ministral-3-8B-Base-2512](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)
[mistralai/Ministral-3-8B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
[mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)
```
>>
>>107385536
>>107385576
>>107385600
It appears to have very limited knowledge of characters. Couldn't describe even very high profile characters properly (As in useful for the UNET to gen it properly). I didn't expect to too much but I expected more from a 30B model. (I also run it at Q6 so not a quantization issue.) It is also awfully moral, cries about "large breasts" being objectifying.
I think I will stick to Grok for prompt enhancements. I am sure there is a decent enough local alternative somewhere but I don't have the 5k multi GPU setup to run it.
>>
File: belief.mp4 (852 KB, 480x480)
852 KB
852 KB MP4
>>107386598
>>
I want to channel my coomer into some productivity by creating a cute and sexy moe personality that will demand that I make progress on my todo list and serve as a kind of assistant.
Is there anything I can pull off with 24 GB VRAM + 32 GB RAM (sadly I missed the upgrade window before prices went crazy)? Or should I try one of the cloud models? It doesn't need to be particularly smart, just able to pick stuff to do out of some list/database I guess and make sultry comments.
>>
>>107386623
Mistral small could easily do that.
>>
>>107386623
linux, a bit heavily quanted glm air
or some IQ4XS 30b dense model, maybe qwen3 32b
>>
https://github.com/ggml-org/llama.cpp/pull/17625
vibecoder bros...
>>
>>107386647
>>107386649
I suppose I should just give it a try then
Never done any proper development with it beyond ST setup so any tips from anons with a similar sort of setup are welcome
>>
>>107386661
How will this get enforced?
If I ignore this how can they expect to prove that I genned my vibemaxxed slop code on their backend?
>>
>>107386661
>Requires disclosing usage of AI
Not a problem when most of them brag about it.
>Please ensure that you fully understand the code you submit.
How do you intend to force or verify that, CUDA dev?
>>
>>107386661
didnt cisc go to vacation?
>>
>>107386727
That was slaren.
>>
>>107386734
dang, now jart has no one to steal code from
>>
>>107386661
>Won't Somebody Please Think of the Human Coders
>>
>>107386763
At least they're not being rabidly anti-AI generated code. This is probably just in response to that one retard with the unimplemented safetensors PR. They should do something about the ones camping on issues and preventing meaningful progress.
>>
>>107386510
Fag doesn't have such a direct etymology as juvenile.

>>107386577
Because it's about behaviour that is seen more often in young people. If that wasn't the case it would be called something else.
>>
[Ministral 3] Add ministral 3 #42498
https://github.com/huggingface/transformers/pull/42498
>>
>>107386861
mistral large 3 right after
>>
>>107386884
I doubt it.
>>
>>107386816

they should better fix the Qwen3-NEXT issue about --ub size

I need faster prompt processing
>>
Is nvidia still preferable to AMD?
>>
>>107386963
Infinitely
>>
>>107386963
The answer is yes and will remain yes for the foreseeable feature. Inb4
>B-b-but ayyymd works fine after you have spent three hours trouble shoot these, it runs only 20% slower given same compute and you are only missing out from a few libraries it is totally worth it guys I am not coping.
>>
ministral2 doko?
>>107386598
@grok this is true
>>
Is there any project that auto-segments the paragraphs in a text and selects the tts voice automatically according to the character that's speaking?
>>
>>107386963
Only if you love giving jensen your money
>>
>>107386332
She's only 17 years and 12 additional years old you sick son of a bitch!
>>
>>107386861
This is probably the bert-nebulon alpha model on openrouter, by the way.
>>
Magistral small is pretty good at thinking. It doesn’t make it significantly less dumb just because it thinks, but it knows to stop early on trivial stuff and doesn't abuse BUTT WEIGHT too much. I really want a 200b dense model from them, or ~800b MoE
>>
>>107386822
>If that wasn't the case it would be called something else
No, it's because the word used to exclusively mean physiological youth and was extended to behaviour, and is now almost entirely used derogatorily in that context. However, the so-called "behaviour that is seen more often in young people" has no actual definition. In other words, it's just used as a name-calling tool to declare other people more immature than yourself, whilst never specifying what that immaturity means. It is the most pathetic form of insult. Also, you are stupid and gay.
>>
So Bert-Nebulon Alpha is Ministral 3? It's not bad for a 8B model
>>
>>107382082
>>107382446
>>107382117
>you can't solve positivity bias with a prompt
>What about cvectors?
>control vectors are for style which does nothing to the underlying tendencies


cvectors can 100% remove positivity bias, and are a lot more powerful than just using a system prompt.
I pretty much don't run any models without using them.
Which specific model are you talking about with a positivity bias? (I might have a pretrained vector already.)
>>
reading the OP, new to this, I wanted to run an AI chatbot that I could ask general questions for learning different subjects, math, history, languages, etc
I have a 16GB vram card and 64 gb ddr4 system, the recommended model in the OP is the Qwen 14B (12GB)
is this still correct? is there a better one that would use 16GB vram? thanks
>>
>>107387197
Og ministral was pretty amazing at writing actions in RP, the problem being that the goof implementation was fucked and the transformers implementation of the swa was broken too, but you could at least disable that and run it in transformers but it would go full schizo at 2k context. But those were 2k of some of the most coomworthy tokens ever dispensed.
>>
>>107387239
Not an answer to your question, but you can't put 16GB of weights into 16GB of VRAM. You need space for KV cache as well.
>>
>>107387239
Is it the best available for your hardware? Yes.
It won't be good though.
>>
>>107387223
>Which specific model are you talking about with a positivity bias?
>>107382058
>ffw to qwen 2
>safetycucked positivityslopped
prolly qwen i'd guess...
>>
>>107387239
GLM air or Qwen Next.
>>
>>107387250
This one doesn't use SWA.

    def __init__(
self,
vocab_size: Optional[int] = 131072,
hidden_size: Optional[int] = 4096,
intermediate_size: Optional[int] = 14336,
num_hidden_layers: Optional[int] = 34,
num_attention_heads: Optional[int] = 32,
num_key_value_heads: Optional[int] = 8,
head_dim: Optional[int] = 128,
hidden_act: Optional[str] = "silu",
max_position_embeddings: Optional[int] = 262144,
initializer_range: Optional[float] = 0.02,
rms_norm_eps: Optional[float] = 1e-5,
use_cache: Optional[bool] = True,
pad_token_id: Optional[int] = 11,
bos_token_id: Optional[int] = 1,
eos_token_id: Optional[int] = 2,
tie_word_embeddings: Optional[bool] = False,
rope_parameters: Optional[RopeParameters | dict[str, RopeParameters]] = {
"type": "yarn",
"rope_theta": 1000000.0,
"factor": 16.0,
"original_max_position_embeddings": 16384,
"beta_fast": 32.0,
"beta_slow": 1.0,
"mscale_all_dim": 1.0,
"mscale": 1.0,
"llama_4_scaling_beta": 0.1,
},
sliding_window: Optional[int] = None,
attention_dropout: Optional[float] = 0.0,
**kwargs,
):
>>
>>107387239
GLM 4.5 Air (12B active) can be offloaded happily on your system
--n-cpu-moe 1000 -ngl 1000
a few other options you have:
GPT-OSS-120B (6B active) get the MXFP4 quant (whatever its called)
Qwen3-Next-80B (3B active)
>>
>>107387223
gemma3, glm air, qwen3, mistral small, mixtral
>>
>>107387239
https://huggingface.co/ArtusDev/mistralai_Magistral-Small-2509-EXL3/tree/4.0bpw_H6
https://github.com/theroyallab/tabbyAPI
You also need RAG unless you want to learn hallucinations about a reality slightly different from ours
>>
>>107387311
Is there any model without a positivity bias?
>>
>>107387289
where are you getting this info
>>
>>107387289
I also doubt Mistral gave it as RP focused a training set as the original had, though. Probably just more thinkslop
>>
>>107387322
kimi
>>
>>107387303
nta, but how slow would the glm setup be? thinking of getting a 5070ti and I already have 64gb of ddr5. Would ideally want to get a 5090, but the founder's edition just isn't available anywhere
>>
>>107387328
>>107386861
>>
>>107387328
nta but probably: https://github.com/huggingface/transformers/pull/42498
>>
>>107385600
zigger image uses regular text only qwen-4b. Not vl. maybe zigger-edit will use vl.
>>
>>107387197
that would explain why it was so stupid and lacking in subtlety. i was scared they butcher large.
>>
File: truth.png (755 KB, 800x800)
755 KB
755 KB PNG
>>107383781
yes, still Nemo and one year later it will be Nemo too, it will always be Nemo
LLMs are plateauing so the faster you move through the stages of grief to the acceptance stage the better for you
>>
>>107387342
>thinking of getting a 5070ti
i'd reconsider, but glm air would likely be around 10t/s if around IQ4_XS
unless you're going to use the card for other things, you might be able to get a better deal
16gb is little, very little
>>107387367
>>107387378
thank you
>>107387386
dam, thats crazy
>>
>>107386321
A lot of recent stories go on their way to make older characters (30+) to stop age witch hunters, so maybe it bled into the dataset.
>>
>>107387410
He was right but that doesn't mean local models can't get better.
>>
Z-Text-Turbo when?
>>
>>107387434
>unless you're going to use the card for other things, you might be able to get a better deal
16gb is little, very little
It's just an upgrade for my desktop. I don't really have an interest in making a dedicated llm machine right now. But even for gaming I just feel like 16gb will not be enough soon. I'll probably just wait for supers and pray for 24gb.
>>
>>107387462
It has been renamed to Qwen Next
>>
>>107387462
>Z
that's very problematic sweatie
>>
>>107387462
Ministral-3-8B in base/think/instruct variants coming for you.
>>
anyone knows what the closest model i can get locally on a 28 gb vram card and 32 gb of ram to old dragon?
>>
I've ignored this AI stuff for the most part but this weekend I tried it again since about a year ago. I gave this unsloth Qwen3-VL-8B-Instruct-UD-Q4_K_XL.gguf thing a try in llama.cpp with a Radeon RX 7600 on Loonix via Vulkan to OCR text and it worked quite well enough to stop using Tesseract for this task.
But I have a technical question about this: I downloaded the BF16 mmproj file and later found out mesa doesn't support bfloat16 on gfx11, the smallest RDNA3 chip found in the RX 7600. Is there a real speed or performance penalty compared to the normal float16 mmproj due to conversion or something? What's happening there?

Also, it's kinda cool how it correctly read this blurry name tag right.
>>
>>107387661
how much ram do you have?
>>
>>107387652
possibly Wayfarer, see
https://rentry.co/LLMAdventurersGuide
>>
>>107387724
32GB. I already tried that 30B-A3B model with cpu-moe but it's half the speed for the same outcome if you wanted to ask about that.
>>
File: file.png (54 KB, 978x265)
54 KB
54 KB PNG
lol'd hard
>>
>>107388034
kek
>>
Nemo MoE when?
>>
>>107388250
last week. you missed it
>>
>>107383326
So, Gemini 3 is a midwit. It talks like a midwit, gives midwit advices and write like a midwit. This is especially true when it comes to people, emotional life, life advices or anything like that. Hopefully local models won't become like that.
>>
>>107383915
Qwen3 Next 80B A3B Instruct performs well at IQ2_M. I know it sounds crazy but it does. I had it writing working Bash scripts with FIM, and answering hard philosophy questions in system/user/assistant chat last night. At that quant level it fits in 32 GB RAM and generates at 6 tok/s on a 12 core CPU.
>>
File: file.png (688 KB, 1350x1204)
688 KB
688 KB PNG
>>107384218
Ampere is safe until FP8 or some other variant of it is too cost effective to ignore. I would've thought Deepseek was that turning point but that isn't the case. Speaking of which, A100 40GB recently broke the 2K barrier and are now selling around $1800. Of course, that ignores the fact you need SXM4 compatible systems and etc. The PCIe versions are selling for double that. A minimal system build will probably set you back around the same amount as buying a RTX Pro 6000 Blackwell.
>>
File: file.png (33 KB, 834x600)
33 KB
33 KB PNG
>>107388464
>Hopefully local models won't become like that.
>>
>>107388528
>and answering hard philosophy questions
lol thanks for letting me know to disregard your post.
>>
>>107388561
>A100 40gb now 2000$
cute, not buying it until its 500
>>
>>107388570
V100s haven't even hit that level for the 32GB versions and the 16GB versions are worse than using a 16GB GPU. People are going to keep on holding onto them especially when they finally got Flash Attention support in Flashinfer a couple of days ago and vLLM/SGLang uses that which is where the userbase of these cards are. Good luck getting them to let go of them and fall to that price level.
>>
>>107388567
Academic philosophy is hard, in terms of having to keep track of a lot of concepts and relationships. It's not all Greek guys making vague proclamations.
>>
>>107388653
>It's not making vague proclamations.
>Academic philosophy is hard, in terms of having to keep track of a lot of concepts and relationships.
>>
>>107388627
im gonna wait for glm 4.6 air then :3
>>
https://huggingface.co/meta-llama/Llama-5.0-812B
>>
https://huggingface.co/llama-anon/petra-671b-instruct
>>
>>107388760
local is over
>>107388788
local is back!
>>
>>107388653
Give an example from last night's chat log >>107388528 of you posing a hard philosophical question and the LLM answering to your satisfaction.
>>
File: wclivocw8e631.jpg (65 KB, 600x450)
65 KB
65 KB JPG
>>107388464
>>
if you're using llm outside of erp your posts will be disregarded lol
>>
>>107388983
>implying brainrotted coomerposts would ever be useful to anyone
>>
Can anyone recommend me any good text to speech that's good with emotions? Elevenlabs can even do moans but it's so stupidly expensive I don't even understand how they stay afloat. Maybe it's niche? None of the other AIs have so little cost-to-output ratio.
>>
>>107389045
>brainrotted
I'm into that, but llms are surprisingly bad at it
>>
>>107388983
God damn right
>>107389045
Coomers are the only ones actually pushing AI forward
>>
>>107389086
If that was the case the chinks would be distilling from drummer toons rather than claude/gemini/gpt
>>
>>107389135
Maybe they should, Qwen3 Next fucking sucked
>>
>>107389049
vibevoice can moan
>>
File: iChads.jpg (225 KB, 1846x648)
225 KB
225 KB JPG
>>107383781
Stop being a vramlet

>>107384681
based
>>
>>107389258
Are you memeing or is this actually something you use? Could you post a vocaroo or something?
>>
>>107389333
Can you post the output of llama-bench on your machine please?
>>
>>107389383
https://desuarchive.org/g/thread/107230990/#107237480
https://vocaroo.com/1lLCWFfzi8Zx
>>
>>107389415
Hmm good enough I guess, if a bit robotic. I hope this isn't as much of a pain to setup as gpt sovits. At this point I'm willing to just bite the fucking bullet and pay since 11labs just werks but fuck they did not make that shit with api users in mind.
>>
>>107389415
What the fuck is that accent?
>>
>>107389405
yeah, I'll do that tomorrow and I'll post in the current thread.
>>
>>107389057
nc?
>>
I just tried Intellect 3. I'm seeing repetition at around 7-8k in my first session with it. Not great, but at least the model doesn't immediately seem worse than Air. Also, it's nice that its prompt template is less weird.
>>
>Use my PC for a few days
>no AI
>boot up sillytavern using llama as backend
>takes like 5 fucking minutes per gen and I know this is off
>restart PC
>boot up silly+llama
>gens are back to around 30 seconds or so

I know this has to do with me booting silly while resources where already being allocated or something elsewhere, hence it runs slow as shit, but how th efuck can I force my PC to prioritize resources to llama+silly when I run it? as opposed to EACH TIME having to restart my PC so it has a "clean slate" to work with?
>>
Finally achieved an increase from 20 tk/s tg to 30 tk/s on my llm engine.
>>
>>107390210
Check on nvidia-smi if some processes are hogging the GPU memory and if so kill them.
>>
>>107390210
Stop using window.
>>
File: 1740765213042327.gif (196 KB, 205x500)
196 KB
196 KB GIF
>>107390213
8888
>>
>>107390221
apparently this only works on older GPUs? all I get are "N/A" under all the processes using the GPU when running nvidia-smi
>>
>>107390278
weird, post a screenshot
>>
>>107387434
Is GLM Air at all doable with 36 gb vram?
>>
>>107386861
>Out of interest: if the only difference here is that the attn layer now supports L4-style rope extension, why was a whole new arch made instead of extending the regular Mistral LM arch with L4 rope support?
Worthless.
>>
>>107390824
Yes
>>
>>107390824
>36 gb vram?
What card is even 36?
>>
>>107390991
Thank

>>107391051
A few put together
>>
>>107391068
then consider there is overhead per card, the sum of all cards is worse than if you had a single card with that amount
>>
>>107391068
>A few put together
Like, with SLI bridges? Is this even really a thing anymore? I was also under the impression local models cannot use multiple cards? or is this only for comfyui for video genning?
>>
>>107391080
bro where have you been? local text gen has supported multi gpu with no sli or nvlink or whatever for literal years now
>>
>>107391078
Considered, tested, and it's fine. Still much faster than cpu

>>107391080
Ollama (so actually llama.cpp) uses multiple gpus just fine
>>
>>107391088
>text gen
I guess this is just a limitation for video genning then. I only ever really considered it for that, not so much text. Last I heard comfyui/WAN would "eventually" support it
>>
Uh-oh
vllm - Add Mistral Large 3 #29757
https://github.com/vllm-project/vllm/pull/29757
>>
when will someone quantize the new math deepseek so small it fits in 16gb vram?
>>
>>107391247
>>
>>107391247
wow
>support for Mistral Large 3 and its Eagle variant by reusing the DeepseekV2 architecture.
>>
File: 1751477799953392.png (1.4 MB, 1024x1024)
1.4 MB
1.4 MB PNG
>>107391247
>>
>>107384984
Anon finally right this clock. Happy 4u.
>>
>>107383326
Fuck the algorithm.
AI TRASH.
FUCK YOU
>>
>>107391320
Have a Miku! If you let her resonant love into your heart, salvation will find you on the other side.
>>
>>107391281
Densebros...
>>
>>107386598

ITS FUCKING HAPPENING

https://github.com/vllm-project/vllm/pull/29757

Mistral Large 3 is coming, bois!
>>
>>107391428
bro you late af >>107391247
>>
>>107391428
>ministral
>largestral
24gb vram bros, its fucking OVER.
>>
I have never once lost hope in the French. Mistral Large, my beloved.
>>
>>107391428
please let it be 300b 20a or some shit
>>
>model: support Ministral3 #17644
https://github.com/ggml-org/llama.cpp/pull/17644
>>
File: 1743547854803072.png (23 KB, 780x227)
23 KB
23 KB PNG
groundbreaking
>>
File: Untitled.png (95 KB, 731x651)
95 KB
95 KB PNG
heh, hardwired stochastic energetic safety alignment guardrails too eh
>>
>>107391937
buy an ad
>>
>>107391247
I sure hope it's somewhere around 200b otherwise it's another cow I won't be able to fit.
>>
>>107391937
I think it's based on this paper: https://arxiv.org/abs/2309.08632
>>
>>107391937
it's so over for mistral large 3
>>
>>107391983
It wouldn't make financial sense for it to be just 200B large, considering that Mistral Medium (API only) is likely already in the sub-150B parameters MoE model range (it requires 4 GPUs to run, according to the official Mistral Medium blogpost from several months ago).
>>
>>107392052
It would be super funny if it didn't beat almost yester year's deepseek then.
>>
File: ministral-3-3b-8b-14b.png (36 KB, 673x278)
36 KB
36 KB PNG
>>107391911
3B, 8B, 14B
>>
>>107392074
mistral large 3 is so advanced it needs its own PR, huh
>>
>>107392079
It has a completely different architecture, these Ministral ones aren't MoE.
>>
>>107392079
well yeah? one seems based on a llama-ish arch and large is deepseek based
>>
Not to derail or anything, but which cope quant for >>107390824 ?
>>
>>107392106
Q4_K_M
>>
Epoch 1/3
Dolly Training Epoch 1: 8%| | 1000/12008 [02:00<19:48, 9.26it/s, loss=16.3063, ppl=1207067
Checkpoint saved: checkpoint_dolly_epoch0_step1000.pt
Dolly Training Epoch 1: 17%|| 2000/12008 [03:45<17:20, 9.62it/s, loss=12.8120, ppl=366581.
Checkpoint saved: checkpoint_dolly_epoch0_step2000.pt
Dolly Training Epoch 1: 25%|| 3000/12008 [05:29<15:36, 9.61it/s, loss=12.8226, ppl=370488.
Checkpoint saved: checkpoint_dolly_epoch0_step3000.pt
Dolly Training Epoch 1: 33%|| 4000/12008 [07:14<13:49, 9.65it/s, loss=11.0559, ppl=63314.7
Checkpoint saved: checkpoint_dolly_epoch0_step4000.pt
Dolly Training Epoch 1: 42%|| 5000/12008 [08:58<12:06, 9.65it/s, loss=8.6904, ppl=5945.58]
Checkpoint saved: checkpoint_dolly_epoch0_step5000.pt
Dolly Training Epoch 1: 50%|| 6000/12008 [10:42<10:23, 9.64it/s, loss=8.5821, ppl=5335.55]
Checkpoint saved: checkpoint_dolly_epoch0_step6000.pt
Dolly Training Epoch 1: 58%|| 7000/12008 [12:26<08:37, 9.68it/s, loss=8.7801, ppl=6503.69]
Checkpoint saved: checkpoint_dolly_epoch0_step7000.pt
>>
this entire year has built up to this
>>
So, gguf status?
>>
anyone else forgot that it's december already?
>>
new deepsex apparently.
https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale
>>
>>107392436
mon dieu...
>>
>>107392436
>DeepSeek Sparse Attention (DSA)
but the guy doing v3.2-exp support isn't even close to vibecoding support for that
you can't do this to him
>>
>>107392436
>Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
How do I know that this will be absolutely horrid for anything remotely creative?
>>
>>107392484
Just do your own RL post-training for RP?
>>
>>107392503
Based DS paper quoter
>For distilled models, we apply only SFT and do not include an RL stage, even though incorporating RL could substantially boost model performance. Our primary goal here is to demonstrate the effectiveness of the distillation technique, leaving the exploration of the RL stage to the broader research community.
>>
>>107392484
There's also regular 3.2, non-speciale, non-experimental with better attention. Even better and cheaper longer context, yippie.
>>
>>107391247
>largestral 3 is moeshit
it's so fucking over it's not even funny. why france... why
>>
>>107392558
first the chinese tricked meta into it with llama4 and now mistral fell for it too
the west keeps falling for chink tricks
>>
>>107392558
what u expect bro ? they need their fronch deepsucks clown
>>
>>107392558
Costs aside, there was not going to be any dense model large than 70B or so, since the EU now considers any model trained using more than 10^25 FLOP as one having "high impact capabilities". https://artificialintelligenceact.eu/article/51/
>>
MISTRALBROS... WE ARE SO FUCKING BACK!
>>
Where is it?
>>
>>107392680
? we know nothing yet .
>>
>>107392703
Need to convert it to their own format before pushing to HF please to wait.
>>
>>107392436
Mistral Large 3 delayed by two more months, c'est fini
>>
>>107392704
And? I have a good feeling.
>>
>>107392843
thank you .
>>
File: 1763127174454184.gif (2.12 MB, 177x210)
2.12 MB
2.12 MB GIF
>>107392843
local is saved
>>
File: 1758351952991642.jpg (434 KB, 1614x2048)
434 KB
434 KB JPG
>>107383326
>>
File: file.png (98 KB, 596x397)
98 KB
98 KB PNG
>>107392855
it really was
>>
All mistral has to do is to not cuck it up. (Challenge: Average)
>>
>>107392868
>europe
Challenge: impossible
>>
>>107392862
I don't get it. Every thinking model can do this
>>
>>107392876
It's mistral, they may forget to cuck it by accident.
>>
>>107392862
>>107392918
What's more. That's exactly how deepseek told to use R1 when it released.
>>
>>107392918
Yeah. That image is just the usual thinking process where you don't send past thinking blocks.
>>
>>107392944
>>107392918
>>107392947
he posted the wrong picture, but interleaved thinking by itself is not novel
>>
>>107392918
>>107392947
It's about doing

<think> blah blah blah </think>
Response fragment 1
<think> but wait bah blah blah </think>
Response fragment 2
>>
>>107392969
based rp godlikeness is upon us
>>
>>107392436
Please get someone competent to implement this. I can't stand two more months of vibecoders failing to do anything.
>>
>>107392969
Ah, that makes more sense.
It's funny to me that a lot of this stuff is being injected into training while you could achieve the same results with CoT prompting.
Not the one shot kind, mind you, but breaking a the "thinking" into many prompts.
One prompt you ask for a list of things the model should consider, then one prompt per item in the list, then you inject some shit via function calling, etc.
You can get extremely complicated with that stuff, and thanks to the likes of llama.cpp's context cache, you only ever get one long prefill/pp phase.
It's pretty dope.
I guess all these efforts with thinking is basically so that the model is better able to one shot requests.
>>
File: 1748273755470866.gif (531 KB, 1280x1280)
531 KB
531 KB GIF
>>107391247
We are so fucking back
>>
File: img.png (174 KB, 504x339)
174 KB
174 KB PNG
>>107393073
>>
File: FOSHO.png (602 KB, 757x416)
602 KB
602 KB PNG
If it's not coming out of China it's dogshit. How many times does this need to be proven right.
>>
>>107392969
Newest Qwen3-Max has that and even for vision, it can think multiple times about different zoomed in parts of a pic you send.
>>
File: 1754205103696653.png (53 KB, 931x716)
53 KB
53 KB PNG
If anyone cares enough about Deepseek at this stage, the new models are already up on the API.
>>
>>107393178
i care
>>
>>107393178
>max 8k
were over
>>
I've noticed an uptick in corpo defenders as of late.. Is this going to be problem moving forward?
>>
>>107393140
SAAAAAR! Don't lower Gemma izzat saar! Gemma very good model, smart like Ganesh saar!
>>
>>107393211
>izzat
Hello my good sir! New word go very hard! https://desuarchive.org/g/search/text/izzat/start/2025-11-01/
>>
>>107393200
With the increasing spread of botnets, yeah.
>>
Stop being a sensitive little bitch and make a real model? Simple task piggy. There are many papers available showing your nations business's how to not make a brain dead useless piece of shit. Perhaps some reading is in order? Or is it the overpriced burgers you are ordering instead you retarded low IQ ogre looking motherfucker.
>>
>>107393200
You can counter it by saarposting.
>>
>>107393278
izzatposting*
>>
>>107383326
Sora is trans?
>>
>>107393300
You arent'?
>>
>>107393300
Sora is a software tool - an AI video generation model. It doesn't have a gender identity, consciousness, or personal characteristics of any kind.

The concept of being transgender (or any gender) applies to people, not to software. Sora is code and mathematical models running on computers.

Is there something specific about Sora's capabilities or how it works that you're curious about?
>>
>>107393330
sora the mango character you idiot
>>
why is there no 32GB Intel Arc yet in 2025?
>>
The AI bubble will burst when someone releases a highly capable model that is as light as nemo.
RAM will be dirt cheap and GPUs will be affordable again. (copium)
>>
>>107393345
That's by design. It's the same people controlling the whole sector. They talk everyday and they eat together. VRAM is the bottle neck protected by the wolfs.
>>
File: 1738036483191022.png (197 KB, 425x443)
197 KB
197 KB PNG
>>
>>107393336
No, Sora (the AI video generation model from OpenAI) is not a character - let alone a "mango character." It’s a software tool designed to generate videos from text prompts. It has no gender, identity, or connection to fictional characters or fruits like mangoes.

If you’re referring to Sora from the Kingdom Hearts video game series (a human character with spiky hair), that’s a completely different entity! The name "Sora" is shared, but there’s no connection between the AI and the game character.

If you meant something else by "mango character," feel free to clarify!
>>
>>107393140
Modelswise, maybe, but the real important stuff is research. Google still has some exceptional papers as does some labs like Thinking Machines and etc. But generally, in terms of output, China is far and away better than America right now. May not lead to SOTA if you're talking about what you could use but that research has led to the gap being the smallest it's ever been between running local with open weights provided you have the hardware and cloud models.
>>
>>107393448
American is leading they're just not sharing the search paper is all.
>>
>>107393470
SAAR GOOGLE BEST MODEL SAARRR GANESH BLESS SARR BEST MODEL ONLY FOR BRAHMIN SAARRRR
>>
NA NPCs don't mind lobotomized AI? Americans in general are weird as fuck literally every single American I've ever met had bottom barrel written on their forehead. I met smarter people in Thailand within 1 hour then I did a whole week in the states lol.
>>
already dropping your izzat bit?
>>
>>107393470
AI? HUGE. The greatest, the smartest, nobody does it better than us. Nobody! We’re building something so powerful, so incredible—the best technology ever. Believe me, we’ve got the best minds, the best code, and soon, everyone will say, “Wow, America did this again!” Nobody else could do it like us. Nobody!
>>
File: dipsyRippedBeer.png (1.44 MB, 1024x1024)
1.44 MB
1.44 MB PNG
>>107393178
>Deepseek Speciale
> No tool calls, 128K output lol, looks like roleplay only
> ONLY FOR THE NEXT 2 WEEKS
The memes write themselves.
>>
>>107393643
>looks like roleplay only
??? what makes you think that, it's for deep research stuff
>>
merged https://github.com/ggml-org/llama.cpp/pull/17644 model: support Ministral3 #17644
>>
File: thatEscalatedQuickly.png (263 KB, 935x871)
263 KB
263 KB PNG
>>107393661
LOL because it pukes out sex with zero prompting.
It's like they wrote this thing just for coomers.
LMAO
>>
>>107393768
>put sex in prompt
>model outputs sex
whoa
>>
>>107393768
wow it wrote the whole rp by itself, this is revolutions
>>
>>107393768
deepshill is easily amused, got it
>>
>>107393768
talks for phil... oof
>>
where's the z-image turbo of llms?
>>
>>107393768
>impersonates {{user}}
kek, shitty model
>>
>>107393838
>>107393817
>>107393789
OKAY we get it already your guys repeat as much as GLM
>>
>>107393768
Ugh, the Speciale API broken.
So, the reason that looks like half a response, it's that it's half the response. If I look in terminal the reasoning part is the first half of the response.
Here's the first part. ST only outputs the content block, so you only get the back half of the RP.
And it writes a lot, as you can see.
reasoning_content: "Amy's eyes widened a fraction, her pale cheeks flushing slightly. She shifted from one foot to the other, her hands fiddling with the hem of her oversized sweater. The scent of lavender laundry detergent wafted from her clothes, mingling with the faint aroma of freshly brewed coffee inside the house. Behind her, the living room was cluttered with stacks of books, half-finished craft projects, and a laptop open on the coffee table. The soft hum of a fan provided background noise, punctuated by the distant chirp of birds outside.\n" +
2|SillyTavern | '\n' +
etc...
>>
File: 2025-12-01_15-02-44.png (467 KB, 1920x2073)
467 KB
467 KB PNG
if anyone is wondering 3.2 speciale giga thinks like as much as old r1 when it got stuck in a thinking loop
>>
>>107393875
that immense wall of shit is
>half
the reply? jesus fuck
>>
>>107393785
>>107393789
>>107393811
>>107393817
>>107393837
>>107393838
Obvious samefagging is obvious
>>
>>107388564
>>107388902
I mean, a different flavor of midwit. The more information about yourself/your character you put in the aistudio's system prompt, the dumber it becomes. You don't need much, perhaps 150 tokens is enough. It starts to jump to conclusions, to reinterpret what you said, to be aggressive or smug, and so on. Instead of feeling "strange" or "funny" like with local models, it feels realistic and unpleasant: It feels like you're talking to a random uneducated, average, person who thinks they know better than anyone else (and I'd add that it sounds like a woman).
For example, if I put in the sys. prompt, among other thing, that I meditate for spiritual/phenomenological reasons (with a few details maybe), it mixes random stuffs like a midwit would do and says things like "oh, you can't "meditate" your responsibilities away!". It's out-of-place, dumb, and it doesn't make sense in the context of the conversation. It really looks like my ex gf who usually had the most normie and midwit takes about everything, while jumping to conclusions and being somewhat aggressive like that, or smug at times.
When the system prompt is minimal or empty, it perfectly understands what meditation is, understand what I say, and tends to respond correctly (but with too much sycophancy and with a lower creativity than GPT-5).
>>
>>107391937
It's you, isn't it? What's the point of saying this is you're not going to give any details?
>>
>>107392843
The EU legislation probably pushed Mistral to create useless models. Flux 2 is like that for this reason too.
>>
File: imSorryItsRetarded.png (171 KB, 1898x995)
171 KB
171 KB PNG
>>107393894
Here's another example. I turned off the JB. It created a math problem to solve, thought about it, then spit out an unrelated content response after.
>>
>>107392921
Their survival now depends on the EU. They can't shit on their face and expect to live on long and happy life.
>>
File: imSorryItsRetarded-2.png (269 KB, 954x956)
269 KB
269 KB PNG
>>107394146
Here's the content block.
>>107393877
At least you're getting something coherent back.
>>
>>107393838
>>107393817
>>107393789
read the first message it acts for user, so the model follows of course
>>
>>107394161
With Macron's intervention they will be alright
>>
File: 2025-12-01_15-53-02.png (568 KB, 1920x1080)
568 KB
568 KB PNG
>>107394182
>At least you're getting something coherent back.
im actually getting back the same shit you are but like 2-3 turns in though not one the first one not sure if thats because of the turn count or the length of spat out words idfk also v3.2 itself not the speciale seems to think alot more as {{char}} in the first person
>>
Based DS creating the anti-rp mode that others will all follow
>>
>>107394367
I can't wait for anti assistant mode.
>>
Is LeCunny kitboga of LLMs?
>>
>>107394473
>kitboga is a big racist fuck so i hope sir lecun isn not like this
>>
CUNYYYYYYYY CUNNY CUNNY CUNNY CUNNYYYYYYYY
>>
Arthur is going to save local today.
Screencap this
>>
>>107394526
Who are you quoting?
>>
>>107394590
>i like the color
>>
>>107394367
Wdym?
>>
>>107394611
LOL
>>
File: 1513102647630.gif (3.23 MB, 237x240)
3.23 MB
3.23 MB GIF
>>107394611
go be a shitjeet somewhere else
>>
File: 1764546989145145.jpg (62 KB, 736x705)
62 KB
62 KB JPG
https://desuarchive.org/g/thread/107347942/#107357329
>I have a feeling we'll get new toys in december, or by the end of this month not gonna lie. And even if we don't get much, this year was pretty nice overall.
>>
>no R2
Chinese chips really fucked them up, huh?
>>
Any time you see someone post a sillytavern screenshot you can safely ignore their opinions because they almost certainly fucked up the template or have random unrelated shit they don't know about in their card or settings.
>>
make new retards
>>
>>107394786
ninja temples solve that issue doe
>>
>>107394801
Don't, we already have enough of them.
>>
>>107394347
Those end_of_thinking tags seem to be serving as stop tokens between NPC changes. It's doing the same thing to you as on the other >>107394182, responding first as {user}, then stop tag, then as {char}.
It reminds me of misconfigured LLMs.
>>
New 'seek status?
>>
>>107394844
>To assist the community in understanding and adapting to this new template, we have provided a dedicated encoding folder, which contains Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model and how to parse the model's text output.
>>
>>107394855
Support is stuck in vibecoding hell.
>>
>>107394855
>>107394146
>>107394182
It's special...
>>
>>107394863
So it really is misconfigured then. Will need to read more later. Is that in API guide or HF?
>>
>>107394883
hf
>>
>>107394801
brb fucking ur mom
>>
File: file.png (258 KB, 1181x526)
258 KB
258 KB PNG
Speciale works fine with my usual RP prompt, I only get weird broken shit with no sys prompt.
Model feels weird. Its thinking is kinda schizo and varies wildly in length and content. Depending on the direction it takes you get very different outputs, sometimes it gets way hornier than normal 3.1-3.2, sometimes it LARPs as ChatGPT and self-cucks (have yet to get hard refusals though, just kind of stalling).
>>
>>107394940
>Model feels weird. Its thinking is kinda schizo and varies wildly in length and content
I mean, it is meant for only one use case, deep research. Anything else is pure out of scope.
>>
>>107394896
it is unethical to produce down syndromed children with hags
>>
sigh
>>107394971
>>107394971
>>
>>107394940
Did they distill oss :skull:
>>
>>107394974
>dipsy thread on miku monday
blasphemy
>>
>>107395096
miggers LOST
>>
> | `MistralLarge3ForCausalLM` | Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 | `mistralai/Mistral-Large-3-675B-Base-2512`, `mistralai/Mistral-Large-3-675B-Instruct-2512`, etc. | | |

https://github.com/vllm-project/vllm/pull/29757/files

> 675B
>>
>>107395647
fuck
>>
>>107395647
>This pull request adds support for Mistral Large 3 and its Eagle variant by reusing the DeepseekV2 architecture.
haha what
>>
File: hatsune-miku-thinking.png (175 KB, 492x498)
175 KB
175 KB PNG
hey, sorry for being a lazy bum and just replying to the thread like this, but I really wonder if anyone knows about good datasets to fine tune a model on, that are available on hugging face or other sites? My goal is to create a model that is on par with a seasoned shitposter and is very good at synthesizing novel information (creative writing). I really don't need a coding agent or whatever, I just want a LLM I can spin up and talk to about topics. Not that I don't have friends, it's just more a question I have, whether the bottleneck of LLMs genuinely being funny is the fact that it lacks the information and that most models are set up for general purpose. I know I can use a RAG and so on, but I sorta already tried this with frontier models like Kimi K2 thinking and I'm just not satisfied with the results. Really wondering if anyone can help me here, else I'll have to go the hard route and genuinely scrape a lot of shit and format it, which is like fine, I wanna do this. But god damn. I'm currently doing fine tunes on Qwen3-4b-2507 on Unsloth on information I already scraped but its like meh, I wish I could just scrape twitter, but that's just not possible right now I guess.
>>
>>107396639
Mistral never makes their own architectures. They just piggyback off llama and now deepseek for their models. Even their PR for the new Mini Mistral models on llama.cpp has them go "yeah, the current models still use llama architecture but we're making it its own thing now in case we actually do more fundamental changes"
DeepseekV2 was their most recent architecture until V3.2 introduced all the sparse attention stuff.
>>
File: 1739503008050067.gif (333 KB, 414x414)
333 KB
333 KB GIF
>>107396740
hugging face. co/datasets/lesserfield/4chan-datasets



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.