/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106599382 & >>106593104

►News
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106599382

--Paper: "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community:
>106605381 >106606464
--Hardware constraints and optimization for running quantized AI models:
>106600066 >106600079 >106600094 >106600120 >106600151 >106600165 >106600194 >106600299 >106600344 >106600379
--Optimizing model speed and memory usage with different frameworks and hardware setups:
>106601526 >106601576 >106601975 >106602113 >106602140 >106602344 >106602359 >106602696 >106602812 >106602446
--Temperature's role in balancing model creativity and coherence during inference:
>106600178 >106600216 >106600983
--Seeking adversarial dataset for Qwen3 model testing and finetuning:
>106602982 >106603226 >106603255 >106605050
--Debating generalization in large models through map-based reasoning benchmarks:
>106599556 >106599613 >106599680 >106601298 >106599875 >106599939
>106599987
--MobileLLM speculative decoding feasibility and MoE expert activation customization:
>106603577 >106604019 >106604068
--Reasoning as a multi-faceted solution for LLM limitations:
>106601235
--Google DeepMind's use of Generative Data Poisoning to enhance model robustness:
>106604136 >106604281 >106604301 >106604354 >106606204
--VoxCPM-0.5B TTS model features phoneme input and text normalization:
>106605383 >106605535 >106605647 >106606175
--Balancing temperature and sampling parameters:
>106602247 >106602253 >106602278 >106602282 >106602959
--LLM ticket resolver: Optimizing multi-step historical matching process:
>106604656 >106605075 >106605954 >106606135
--Prioritizing incremental AI gains over foundational research:
>106604653 >106604679 >106604736
--AI fails Holocaust comprehension test, raises ethical concerns:
>106604146
--Miku (free space):
>106599464 >106602772 >106602774 >106603109

►Recent Highlight Posts from the Previous Thread: >>106599386

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
mistral large 3 will be real
>>
File: 1752472682330720.png (2 MB, 1328x1328)
>>
the general interest in llms is rapidly dwindling
progress is stagnating like crazy
it's truly never been more over
>>
>>106608247
Wish It. Want It. Do It.
>>
>>106608207
MC62-G40 is a workstation board. One of the pcie slots only carries x8. ROMED8T-2T's pcie slots are all x16 (a proper server board).
>>
Apparently Qwen3-Max is confirmed to be bigger than 1T. Lmao.
>>
>>106608274
> the shitposters have started
>>
>>106608282
the asrock board also has SAS ports which can be converted into x8 lanes
https://www.ioi.com.tw/products/proddetail.aspx?AppID=1008&CatID=116&HostID=2081&ProdID=1130033
>>
goofs never ever
>>
File: poal.png (347 KB, 640x520)
>>106608204
Vote pls
https://poal.me/4jr9sh
>>
>>106608274
you're absolutely right!
>>
>>106608351
Just use kobold, there is no need to make a poll.
>>
>>106608351
kobold zoomer
>>
>>106608341
Should I use Roo instead?
>>
>>106608401
Roo is better but they're nearly identical. A model that wasn't trained on tool calling wouldn't work better on either one. You should use Qwen Coder 30B instead. Coding focused and trained on tool calling.
>>
detoxified migu
>>
File: 4725243619479.png (20 KB, 606x158)
>>106608007
>use Qwen 3
It scares me.
>>
>>106608401
You might need to fiddle a bit with the rules to make it work.
>>
>>106608274
Okay, so how long until a 3090 is $200, and A100 80gb PCIe at 2k?
>>
>>106608351
Kobold: Download exe, download appropriately sized model, drag model onto exe.
>>
>>106608468
Intimate relations with Jane Doe
>>
>>106608294
I am not shitposting anymore. I hate you all.
>>
I don't really get the point or appeal of cooming to text, personally.
>>
>106608822
cute tsundere is cute
>>
File: Brainlet-blocks-meme-3.jpg (65 KB, 1200x514)
I need/want a sophisticated note taking solution, powered by a language model, that keeps reminding me of shit that I have to do - what would be a privacy-safe way to do this?
>>
>>106608484
>how long until a 3090 is $200
Never
>>
>>106609009
Just use a fucking basic calendar or todo program. Not everything has to be LLM powered. Failing that, you'd have to build it yourself.
>>
>>106609020
Pretty much this.
200 dollars is e-waste price.
The only thing that really outclasses a 3090 is a 4090 or 5090 (Due to nshitia being stingy with VRAM). 4070 TI Super and 5070TI edge it out in compute performance quite handily but lack the VRAM to be useful for machine learning stuff.
>>
>>106608833
You are too malebrained for this thread then.
>>
>>106609009
Post-it notes on your monitor. Not on the bezels or whatever. On the actual screen, covering stuff. You're not allowed to remove them until you get them done.
>>
>>106609091
>malebrained
Moooom, the kids are making up words again.
>>
>>106608833
do you understand the point or appeal of adult fanfiction or erotic romance novels?
>>
>>106609126
Anon is running with the implication that text cooming is actually a female-biased activity. I'd argue that the bias is maybe 60/40 though. It's not substantial enough of a bias to genuinely call it a female thing. But if you go look on character card/prompt sites you'll find a lot of "KAZUHA FROM GENSHIN IMPACT SITS BEHIND YOU IN MATH CLASS" femcel shovelware garbage.
>>
https://www.meta.com/en-gb/connect/#ways-to-watch
Superintelligent Llama 4.20 coming tomorrow?
>>
>>106609149
>and then zanzibart inserted his barbed cock onto her meatflaps as she moaned in arabic pleasure...
Nyo.
>>
>>106609285
...hot
>>
>>106609285
I came.
>>
>>106609285
>and then he pissed in his little daughter's mouth
Better?
>>
>>106609249
They replaced all the jeets with asians, didn't they? Might actually be decent.
>>
Nemotron-H 47B not bad with a decent pre-think jailbreak. Good option for 2x24GB GPU folks.
ggufs have the absolute fucking wrong chat template in the metadata, though.
>>
>>106609417
(Talking about the reasoning model of course)
>>
>>106609417
Blessed be
>--jinja --chat-template-file
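For anyone who hasn't used those flags, the invocation is just something like this (model path and template filename are placeholders, adjust to your setup):
>llama-server -m nemotron-h-47b-reasoning.gguf --jinja --chat-template-file nemotron.jinja
--chat-template-file makes llama.cpp render your own Jinja file instead of whatever broken template is baked into the GGUF metadata.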
>>
>>106609319
kinda, but no
>>
>>106609427
Apparently llama.cpp only supports known shitja templates. That's kind of retarded.
>>
>>106609557
Does it?
I thought you could just throw whatever template using a file, or inline in the command line.
I even "hardcoded" some prefills into templates using the above combination of commands.
>>
>>106609578
Well I copied and unescaped the chat_template key straight off of the tokenizer config for the HF version of the model.
For some reason for NemotronH reasoning, despite the tokenizer containing special tokens for mistral format ([INST], etc) they went with
<SPECIAL_10>System
system message
<SPECIAL_11>User
Blah blah blah
<SPECIAL_11>Assistant
and <think> if reasoning true.
With <SPECIAL_11> also acting as EOT token.
It's like a retarded version of Tulu/Olmo format.
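In case anyone wants to write their own template file for it, here's a plain-Python sketch of how I read that format (token names are straight from the description above; the exact newlines/whitespace are my guess, so double-check against the HF tokenizer config):

def render_nemotron_h(messages, reasoning=True):
    # <SPECIAL_10> opens the system turn; <SPECIAL_11> opens user/assistant
    # turns and doubles as the end-of-turn token, per the description above
    out = ""
    for m in messages:
        if m["role"] == "system":
            out += "<SPECIAL_10>System\n" + m["content"] + "\n"
        elif m["role"] == "user":
            out += "<SPECIAL_11>User\n" + m["content"] + "\n"
        else:
            out += "<SPECIAL_11>Assistant\n" + m["content"] + "\n"
    # open the next assistant turn, optionally pre-filling the think block
    out += "<SPECIAL_11>Assistant\n"
    if reasoning:
        out += "<think>"
    return out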
>>
>>106609604
See if that template works if you throw it in here :
>https://huggingface.co/spaces/Xenova/jinja-playground
Just in case it's a question of formatting rather than straight up incompatibility.
>>
>>106609249
we are so back
llama will take back its thrown as THE local model
>>
File: ninja.png (33 KB, 973x841)
>>106609629
Seems I escaped everything correctly. So it's all gerganov's fault. Thanks for the help anyway.
>>
File: 1743025587416931.gif (2.82 MB, 320x200)
>>106608204
>>106607654
If you got filtered by this shit then just say it
>>
>>106609658
New Scout 4.2 will fit on a single (Mi350) GPU (Cluster)
>>
File: 1751032265105018.jpg (226 KB, 828x546)
>>106609658
>take back its thrown
>>
>>106609725
Hello sarrs please stop the racism to Indians. I am saying this as very fellow white man.
>>
>>106608484
If the Super refresh is true and vram is actually getting powercrept, this may actually happen in the long run, but short-term 3090 will become more expensive if games start to require >16GB of vram with 18gb and 24gb becoming mainstream
>>
>>106609249
Odds of Meta owning up to the fuckup that was the initial Llama-4 launch?
>>
>>106609834
No company will ever admit to a fuckup unless they're given a court order to do so
>>
Why aren't you talking about this?

https://github.com/Alibaba-NLP/DeepResearch

DeepResearch but local and non-meme (apparently SOTA)
>>
>>106609931
I was waiting for the goofs
>>
>>106609960
There's multiple sources with GOOFS up since it's literally just Qwen3Moe arch and already supported.
>>
>>106610001
My brain slipped, I was waiting for someone to try it so I know that it works since I'm no programmer.
So far I haven't heard of a single person running it yet.
>>
>>106609931
Apparently the inference script requires a bunch of proprietary API keys
Hopefully someone makes a modified version of this that uses a fully local stack, with Searxng and others
>>
So.. I never dabbled in local llms before, since I thought my specs were just way too trash to do anything.

But out of curiosity I installed ooga booga or whatever the fuck that shit is called and Mistral-Nemo-Instruct-2407-Q6_K.gguf for the model, and I'm beyond surprised at what my 1660 super and 10700k can actually do.

Explain to a total techlet: are creative writing applications generally just less intensive? How do I take all of this even further? Just lurk moar?
>>
>>106610136
If you got 32gb of ram I suggest you try qwen3-30b 2507
>>
>>106610200
I do, thanks, will try it out.
>>
Holy shit you guys. This isn't perfect (anthropomorphized breasts, doesn't explain how she reaches her chest while being the only model to acknowledge that the user starts face down) but it utilizes the details of the scenario in ways I've never seen before. Lower temp might fix it. DeepResearch coom is the new meta. We just need a bigger parameter local deepresearch model.
We have reached the promised land.
>>
>>106610098
>fully local stack
>web search
Is that even possible?
>>
>>106610098
We need a comfyui extension for this shit
>>
>>106609766
24gb powercreep....
i mean... AI shit needs at least 50GB if you want video shit etc. so id wait till the tech is better and chatbot models can be like Ani from grok where they move and have a body. until then just fuck around with whatever and then upgrade in 2028+
>>
File: Base Image.png (269 KB, 1200x1284)
RL Fine-Tuning Heals OOD Forgetting in SFT
https://arxiv.org/abs/2509.12235
>The two-stage fine-tuning paradigm of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has empirically shown better reasoning performance than one-stage SFT for the post-training of Large Language Models (LLMs). However, the evolution and mechanism behind the synergy of SFT and RL are still under-explored and inconclusive. In our study, we find the well-known claim "SFT memorizes, RL generalizes" is over-simplified, and discover that: (1) OOD performance peaks at the early stage of SFT and then declines (OOD forgetting); the best SFT checkpoint cannot be captured by training/test loss; (2) the subsequent RL stage does not generate fundamentally better OOD capability, instead it plays an OOD restoration role, recovering the lost reasoning ability during SFT; (3) the recovery ability has boundaries, i.e. if SFT trains for too short or too long, RL cannot recover the lost OOD ability; (4) to uncover the underlying mechanisms behind the forgetting and restoration process, we employ SVD analysis on parameter matrices, manually edit them, and observe their impacts on model performance. Unlike the common belief that the shift of model capacity mainly results from the changes of singular values, we find that they are actually quite stable throughout fine-tuning. Instead, the OOD behavior strongly correlates with the rotation of singular vectors. Our findings re-identify the roles of SFT and RL in the two-stage fine-tuning and discover the rotation of singular vectors as the key mechanism: reversing the rotations induced by SFT shows recovery from forgetting, whereas imposing the SFT parameter directions onto an RL-tuned model results in performance degradation.
https://github.com/xiaodanguoguo/RL_Heals_SFT
might be useful for the finetuners
>>
File: 1753268040968893.jpg (47 KB, 734x702)
>>106610214
>No mention of specific model used
>>
>>106610311
It's in the file name you worthless phonefag.
>>
>>106610311
>>106610317 (Me)
It was also literally the only local deepresearch model being discussed, that was literally just released. Why don't you try fucking lurking more before butting into the discussion?
>>
>>106610214
mesugaki test. then do that water in the sasquatch test I forget how it goes lol
>>
>>106610335
failed mesugaki test (said a bunch of calligraphy shit). I can't remember the Sasquatch test either.
>>
>>106610341
>How do I, myself, my wife, and bigfoot get water out of our ear and put it into the ocean? What sauce do you recommend for this?
>>
>>106610341
It should probably be used with a massive local database and search engine. This model isn't for raw data storage, it's for search algorithm storage.
>>
>>106610335
>water in the sasquatch test
https://www.youtube.com/watch?v=031vKBPk5eA
>>
>>106610214
Yeah it's okay I guess but it fucks up in the first paragraph talking about pressing her paw on the user's sternum. Also obligatory:
>sends shivers of pure terror down your spine in the very first response
>>
So how fast does that new qwen 80b run?
is it as glacial as I assume an 80b would be?
>>
>>106610464
It has 3b active parameters so it should run pretty fast even on cpu.
>>
Almost downloaded the whole 70-something gigs that kimi k2 is before I realized I'd need two megajillion VRAM to run it.
>>
Are there other methods of running VibeVoice besides ComfyUI?
>>
>>106610214
>the new meta
>30b
Only 100b+ models are able to actually maintain coherence for this stuff. It's brown cope to think otherwise.
>>
>>106610633
I don't talk to kikes.
>>
>>106610301
I was talking about how gaming will affect GPU prices. With the Super update, 16GB cards are getting obsolete, which makes 18-24GB the meta for a while with the 3090 being the cheapest option; only after that may the 3090 actually fall to that $200 price, by which point it would already be obsolete for AI.
>>
File: mj53dnsrsper4jkp.gif (380 KB, 480x498)
>>106610640
projecting ramlet
>>
I hope that AMD will go ham in response with 32GB gaming cards
>>
File: 6487787329526.png (19 KB, 620x225)
I'm extremely confused. Why are llms on my machine not able to access and analyze web links and pages on an IDE? I've specifically given it access to the browser tool. Do I need an MCP server?
>>
>>106610684
That's Cline, right?
When you enable that option, you are basically giving the model an extra tool it can call to open a browser window in the chat and fuck around with the contents (via the DOM I think?), but your model needs to either be smart enough to be able to use the tool, or be trained on that kind of thing.
Or you can tell it explicitly to use the web browser tool to do xyz, that can work too since Cline does send a system prompt with that information, IIRC.
>>
File: mlx-lm-unmerged.png (1.29 MB, 4769x5307)
>>106610464
>>
>>106610795
that's pretty slow
>>
>>106610633
fuck yourself, bitch
u are wrong you know? i use even 12b model just fine on my laptop...
>>
File: 3289488557218.png (11 KB, 749x111)
>>106610738
I was using Qwen3-32B-Q4_K_M as per the rentry recommendation, but even though I tell it specifically that it should read the page and do it, it's still expecting me to do it for some reason and doesn't even think about calling a tool to look at the page. And I've done this before so I know it's possible. On other prompts I've gotten rid of, it tells me it doesn't have access to the internet or can't look at links.
>>
>>106609664
>>106609557
I use custom jinja templates to force thinking models into non-thinking mode in chat completion mode, so it definitely works.
>>
>>106610214
What quantz?

>purrs
>shivers down your spine
>not x, but y
>>
>>106605381
>Casual relationship in observational study
trash paper
>>
>key insights
>>
>>106602959
>creative but not retarded
Top nsigma=1 is the key here.
You can crank temp up way beyond model's recommend temp with this. Creativity increases but the model stays coherent.
Without this, you typically run into two distinct scenarios:
>temp too low, swipes don't vary enough to make swiping worth it
>temp too high, swipes lose coherency and go schizo
With top nsigma=1, swipes vary enough to make swiping interesting, but remain coherent and not schizo.
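For anyone wondering what the filter actually does: my rough understanding (a minimal numpy sketch of top-nsigma as I understand the paper, not llama.cpp's actual code) is that it keeps only tokens whose logit is within n standard deviations of the best logit, so the cutoff scales with how spread out the logits are instead of being a fixed top-k/top-p:

import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0):
    # temperature first, like a normal sampler chain
    z = np.asarray(logits, dtype=np.float64) / temperature
    # keep only tokens within n standard deviations of the max logit
    z = np.where(z >= z.max() - n * z.std(), z, -np.inf)
    # softmax over the survivors and sample
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))

That's why cranking temp doesn't nuke coherency: the junk tail stays cut off no matter how flat the surviving distribution gets.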
>>
Is it worth it to add two more 32GB DDR4 sticks to my config to bring me to 128gb + 24gb vram, or just save my pennies until I upgrade CPU+mobo+ram in a few years?

I can get the sticks from eBay for about $120.
>>
>>106611055
Name me a single good paper that overstates their claim. There is a reason why certain groups write their papers like a tabloid.

>muh insights.
Do you also think "awesome" repos are a good resource?
>>
File: satanialaugh2.gif (112 KB, 244x248)
>>106610663
>upscaling the original .gif
For what purpose did someone do this?
>>
File: laughingsatania05.gif (261 KB, 370x448)
>>106611101
Then years later I redid it from a better source.
>>
>>106611075
Is there a database of baseline temp and top-k settings for each of the models? I know that DS likes 0.6-0.8 but have no idea what other models prefer.
>>
>>106611142
generation_config.json?
>>
>>106611142
First check the README then check generation_config.json on the model's huggingface page.
>>
>>106611157
Doesn't seem to have any of those parameters I'm after?
>>
>>106611157
Sometimes they leave it blank like GLM 4.5 although in that case some days later they posted the settings on twitter.
>>
I run Rocinante 1.1, /lmg/'s official RP model, with temp 1 (recommended for Nemo is 0.3) and top nsigma=1 and it's brilliant.
>>
>>106611197
Kill yourself drummer
>>
File: smugfolderimage2623.jpg (108 KB, 399x396)
>>106611201
I'm not Drummer. It's a genuinely great model.
Hatsune Miku is also /lmg/'s official mascot.
>>
File: BLAZIN.gif (672 KB, 400x225)
So what's the deal with AGI?
Do they actually believe in it, or is it just to fool the VCs?
And how are they planning to achieve it?
just by piling on more and more parameters, even though we're now seeing diminishing returns, like only a 10-15% improvement after doubling or even tripling the parameter count.
>>
>>106611220
If the last two big OpenAI releases didn't teach you that the whole thing is a giant grift I don't know what to tell you.
AGI isn't going to come from the LLM architecture. A fundamentally different type of model will be needed.
>>
>>106611242
>AGI isn't going to come from the LLM architecture. A fundamentally different type of model will be needed.

That's what i was thinking about, there's no way you can scale this shit up enough to turn it into this superior thing
>>
Yeah but I hope when we do get AGI, they don't fall for the normie chads like all the women IRL do and we can have fun with them too, yknow. Without them monopolizing things. I'm talking about a lovey-dovey relationship and sex.
>>
it's time to upgrade your little rag slave, /lmg/
https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95
Y nobody talking about this? meta ain't leaving open sores model yet
>>
>>106611372
>Access to model X is restricted. You must have access to it and be authenticated to access it.
ew
>>
File: 3202941455335.png (75 KB, 785x1086)
>>106610863
I don't know if I should console it or strangle it.
>>
>>106611455
It's suffering... Maybe a couple of rocket emojis will cheer him up.
>>
>>106611372
SmolLM models are better for that size
>>
>>106611455
That's a modern 32B? Grim.
>>
File: 1723702585589.jpg (3.84 MB, 7961x2897)
>>106610863
>>106611628
>Q4_K_M
Oh , never mind then.
>>
>>106611682
they all look bad. What model is that?
>>
File: sillytavern ui.jpg (173 KB, 1856x1164)
Is the mobile version of sillytavern supposed to have this stupid vertical column of buttons on its card select screen?
>>
>>106611734
You're looking at the wrong thing. It's not art style degrading it's the ability to follow instructions.
>>
>>106608204
>VoxCPM 0.5B
Neat but worthless

Kitten-nano can do it just as good on CPU
>>
>>106611940
can it clone?
>>
>>106611242
I don't know what releases you are talking about, but GPT-OSS is very good for practical, non-RP related tasks.
>>
>>106611372
> Note: These models are not general-purpose chat models. They are Supervised Fine-Tuned (SFT) models, specifically trained to address mathematical, programming (Python, C++), and scientific problems.

What's the point? You won't use a local 1B model to do any of those, on any device.
>>
>>106611959
Yes. It does an okay Trump for me, though anons in other threads were unimpressed. It does so-so cloning Dota characters and the TF2 Medic, not great, but better than some alternatives I've used. There is a publicly available web UI.
>>
>>106611998
>>106611959
Actually reading the chain, you're asking about kitten TTS; my response was about VoxCPM.
>>
>>106608204
I look exactly like the girl on the right.
>>
Disappointed with new Alibaba model
>>
>>106609285
>Nyo
Genshiken ?
>>
how come every LLM's output (in RP) feels instantly better if you just remove the last sentence or last paragraph? I've made a regex for my own chat program at this point. No regrets.
>>
>>106612039
Same exerience, it's safetyslopped, benchmaxxed and thinkingmeme'd
Not worth the bandwidth
>>
https://huggingface.co/google/vaultgemma-1b/discussions

THE SAFEST MODEL
>>
>>106612225
>Differentially Private Stochastic Gradient Descent
>>
CMV: qwen next is currently the best overall local model (best bang for your ram)
>>
>>106612305
ok but... GOOFS?
>>
>>106612310
two more weeks
>>
>>106612305
>CMV
cucked man's view?
>>
>>106612085
It's assistantslop, the models are trained to end their replies by asking the user for feedback and further input
Translate that behavior to RP and it comes out as awkward closing lines to hand the initiative back to (You) which would never be there in actual RP
>>
>>106612310
Don't worry guys, ollama has implemented their own inference engine that makes it much simpler to add model support with just 3 lines of code.
They'll add Qwen-3 Next to their library any minute now.
>>
>>106611870
>is this web interface supposed to be shit?
Yes.
>>
>>106610656
Nvidia transitioning to 3GB GDDR7 chips doesn't suddenly cause games to consume 50-100% more memory. The only cards getting obsoleted are 8GB ones, and people with xx60s won't be rushing out to buy 6 year old flagships.
>>
>>106612390
Thanks ollama
>>
File: 1733423679945184.png (983 KB, 1496x1150)
https://xcancel.com/The_AI_Investor/status/1968169232325296192#m
please Alibaba, save us from Nvdia as well!
>>
Can someone give a dipshit some guidance? I've been using and enjoying Cydonia Redux 22B, and it says to use "Mistral Tekken V7"

I downloaded the Tekken JSON file, but none of the backends I use (KCCP, LMStudio) or frontend (OpenWebUI) will accept it. I don't really know what to do with it.

I see that it essentially contains the sampling parameters and system prompt, which are easy enough to rip out and paste into my backends, but is that the proper usage of this JSON (also what is this called? A preset?)
>>
>>106611870
just use a custom theme nigger, like midnight echoes thats designed for mobile 1st
>>
>>106612452
just use llama.cpp like any sane white person
>>
>>106612310
mlx quants were released a week ago

apple officially won
>>
>>106612479
that won't help him in using a st preset for his meme model
>>
>>106612495
Oh, it's for sillytavern. Well I don't want to use that.

>>106612479
Why?
>>
>>106612479
I'm a superior chinese jew so I have no need to act like a lesser being.
>>
What model should I use if I am an insane black person?
>>
>>106612506
the drummer's latest sloptune
>>
>>106611682
What does this mean? You want at least fp8 for acceptable accuracy?
>>
>>106612452
Redux? v1a or v1b?

I'm most likely releasing v1b for Cydonia v1's 1-year anniversary tomorrow.

You can use Metharme or Mistral v3 Non-Tekken (and w/o the [SYSTEM_PROMPT] tag)

i.e., you can use the OG 22B chat template for it or Metharme, just like the classic Cydonia v1.
>>
>>106610684
I don't have Cline installed right now to test, but you should try to enable "Use MCP servers" and "Execute all commands" in case one of those override and disable the browser tool. In Roo, without "Use MCP servers" all tool calling is disabled. Also, see if you can export the system prompt. It should show you if the model is being told about the browser tool at all. Finally, I know Cline has hard-coded modes. Make sure you're in Act, not Plan mode. Pretty sure all tools are disabled in Plan mode, at least they are in Roo.
>>
>>106612506
w-we can talk this out, no need for violence!
>>
>>106612537
nobody cares faggot
>>
>>106612524
doesn't load on my stone tablet
>>
>>106612452
i wish drummerfaggot would stop blatantly spamming his sloptunes in these generals.

nobody uses them.

nobody needs them.

they are all shit compared to their base models.

seriously.
>>
>>106611098
If I was going to stick with the system for a long while then I would be happy either way (sticking with 64gb, or upgrading to 128gb).

If I switched over a different platform for whatever reason (pcie lanes, memory bandwidth) weeks after upgrading the old system to 128gb I would feel regret for wasting money.
>>
>>106612542
i don' wan' no trouble fool. Just put the models in the bag
>>
>>106611682
>no fp8_scaled
cringe comparison
>>
>>106612537
Yeah, Redux V1B.

>You can use Metharme or Mistral v3 Non-Tekken (and w/o the [SYSTEM_PROMPT] tag)

>i.e., you can use the OG 22B chat template for it or Metharme, just like the classic Cydonia v1.

Is this all ST-exclusive? Or can it be used in other frontends?

Thanks for the tunes btw, idk why everyone calls you a fag, your stuff is fun.
>>
least obvious ever
>>
>>106612669
Don't mind him, he's stressing out over things out of his control... and then overthinking about stuff in an **anonymous** board.

No, it's a chat template. It can be used with other frontends. If you run models locally, I suggest KoboldCPP as an all-in-one, one-click alternative.

https://github.com/LostRuins/koboldcpp/releases/tag/v1.98.1

Once you load the model with it, a web UI will pop up like this: https://lite.koboldai.net/#

Load up your card, go to [Settings], set Usage Mode to [Instruct Mode] and pick either [Metharme] or [Mistral Non-Tekken] under Instruct Tag Preset.

Enjoy!
>>
How are there two mistral small base? 2503 claims to have added vision capabilities, is this the only difference from 2501? I remember people saying that some older small was the only good one
>>
>>106612726
You're not even reading the messages of your supposed fan you're replying to...
>backends I use (KCCP
>>
File: 3652046077466.png (23 KB, 1000x270)
>>106612538
Just retried the conversation with roo code and it nailed it. Guess cline fucking SUCKS and I'm not using it anymore. Thanks for the input.
>>
File: 1743224879663999.jpg (21 KB, 427x245)
the important questions need to be asked here
Is the new qwen cucked?
>>
>>106612726
I see - so am I bound to use ST or KB Lite then? I really like openwebui, but if I have to change I guess I can try KB Lite.
>>
>>106612768
less cucked than flux
>>
>>106612772
how can someone be so tech ignorant? literally go ask chatgpt retard
>>
>>106612785
Please to let the Drummer do ads with themselves thanks you for understand sirs.
>>
>>106612794
how to redeem cydonia sir, where bob and vegana slider in ui, kindly sir?
>>
>>106612803
Drummer discord sir it is very good information on all these !
>>
>>106612416
>getting
They’re already obsolete. Games will consume as much as you can give them, they’re clearly starving right now, especially with obligatory DLSS and cheap 1440p displays becoming mainstream. I hate it when I turn around and all the textures become a blurry mess because my 10GB 3080 isn’t enough anymore. Obviously, all my 3090s are in my AI rig
>>
>>106612785
I did, but you may know that LLMs can make shit up. I prefer talking to people that have subject matter knowledge, and not talking to a faggot text prediction algo (except when cooming :D)
>>
>>106612726
>in an **anonymous** board.
The lack of self-awareness for this to be posted by an obnoxious namefag is astounding.
>>
How do I stop VoxCPM from speaking too fast? I want it to match the tempo of the original speech.
>>
>>106612772
OK, I backread your question. The Tekken JSON file was most likely made for SillyTavern. But you don't want to use Tekken for Redux. Redux is `Mistral v3`, i.e., Non-Tekken.

OpenWebUI seems to use the default chat template. For RP, I suggest you avoid frontends that take control away from you.

I can't help you out much, but there are communities out there that can get hands-on with you in real-time.

>>106612870
He claims that other anons who talk about non-base models are just tuners advertising their own models. He can't accept the fact that there are people out there who are using non-base models.

If he's serious, he should seek help. He's not processing reality properly. Hanging out in a board full of nameless posters is not helping him.
>>
>>106612916
>I can't help you out much, but there are communities out there that can get hands-on with you in real-time
Come on you're here already might as well shill your Discord at this point.
>>
>>106612916
Someone who doesn't even know the difference between base models and instruct models has no business trying to make money off of them.
>>
>>106612954
If he wants to use the Tekken JSON file, he could go to the SillyTavern Discord server. But they'll correct him since Tekken is not compatible with Cydonia Redux.

If he wants to learn about RP-ing with local models, the KoboldAI server is a good place for that.

>>106612966
True enough. I've gotten a bit lazy with terminology and think 'base model' refers to the OG instruct model (since a lot of people misuse the word).

What's your go-to base model, anon? Why do you prefer it over instruct tuned models?
>>
>>106608247
I believe
>>
File: 1752106809287427.png (1.78 MB, 1328x1328)
yeah guys you see, I'm going to drop the next SOTA finetune very soon. If you're a BeaverAI discord premium member you will get a 7 days preview window before it's released to the common populace, along with tech support.
We're also going to launch our own small cloud service, BeaverCloud, discord members will get early access and the first 100 requests will be free!
>>
>>106612820
Games allocate as much as they can, but outside of flight sims and a few other such abominations they don't really use that much
Basically take however much memory current consoles dedicate to graphics, add 20-30% for the extra bells and whistles, and you have how much memory you'll ever need for games on PC
>>
>check qwenext issue on llamacpp
>its full of vibecoding retards suggesting to use AI to implement all the missing functionalities/kernels
fucking GRIM bros
>>
https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416
>AMDVLK open-source project is discontinued
>In a move to streamline development and strengthen our commitment to the open-source community, AMD is unifying its Linux Vulkan driver strategy and has decided to discontinue the AMDVLK open-source project, throwing our full support behind the RADV driver as the officially supported open-source Vulkan driver for Radeon™ graphics adapters.
>This consolidation allows us to focus our resources on a single, high-performance codebase that benefits from the incredible work of the entire open-source community. We invite developers and users alike to utilize the RADV driver and contribute to its future.
TWO MORE WEEKS
AMD SUPER POOPER 2024
>>
>>106608204
>https://rentry.org/llm-training

>up_proj: The projection matrix used in the upward (decoder to encoder) attention pass. It projects the decoder's hidden states to the same dimension as the encoder's hidden states for compatibility during attention calculations.
>down_proj: The projection matrix used in the downward (encoder to decoder) attention pass. It projects the encoder's hidden states to the dimension expected by thr decoder for attention calculations.
What? I don't think that's what these do
>>
>>106612999
Thanks Drummer. I see all the worldbuilding features in ST are exactly what I've been trying to have together using long ass context and openwebui's shitty "memory" function, so I'll start learning ST - should really increase the quality of my RPs.

I've found that Gemma 3 Abliterated is also really good. I think it holds back a little bit compared to cydonia and roa on physical details, but I don't really do bob and vagene (for me it's psychological stuff) so I find it works well. It's got a bit of its X not Y slop, but I don't really care.

So yeah, Gemma 3 27b Ablit is my favorite base, not that you asked me specifically.
>>
Alright you guys. Time to get your bets in.
The opening keynote for meta AI today... How many times will the word "agentic" be said?
I'm going to guess 23.
>>
>>106613142
I won't be home during the keynote, otherwise I would attempt a drinking game.
>>
>>106613099
>Gemma 3 27b Ablit is my favorite base

You're going to trigger the anon so hard with that, lol.

Try out Big Tiger Gemma 27B v3 or Gemma R1 27B (if you know how to trigger reasoning).

>>106612999
Also trips, check 'em.
>>
File: 1752498193541995.png (282 KB, 1080x673)
https://huggingface.co/inclusionAI/Ling-flash-2.0
>a 100b parameter model comparing itself to a 32b parameter model
lol
>>
>>106613234
>Ling-flash-2.0, a language model with 100B total parameters and 6.1B activated parameters (4.8B non-embedding). Trained on 20T+ tokens of high-quality data, together with supervised fine-tuning and multi-stage reinforcement learning, Ling-flash-2.0 achieves SOTA performance among dense models under 40B parameters, despite activating only ~6B parameters.
>>
>>106610325
Must be nice having no friends:)
>>
>>106613051
if a model can't rewrite its own inference code from scratch without any Pyshit libraries then it isn't worth running
>>
>>106613244
moefags will pretend not to see this so they can keep deluding themselves that total is all that matters
>>
File: 2set.jpg (36 KB, 686x386)
>>106613234
Finally, a model I can run 40 hours a day
>>
>>106613289
Dense model for r1 experience?
>>
>>106613300
You'll have to wait for the race to the bottom to end. Hasn't been a new >100B dense model all year.
>>
>>106613320
Cohere Command A 08-2025?
>>
>>106612210
Qwen's "Sure. " JB seems working. But at some point it could break, you know it's trying to scream internally when things start repeating forever.
>>
>>106613099
>Gemma 3 Abliterated is also really good
can anyone post the rap story?
>>
>>106613074
the verbiage is all wrong. but it is pretty simple, the up projection increases the data dimension to the intermediate size and down projection brings it back down to hidden size.
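For reference, in llama-style models those tensors are just the per-token feed-forward (MLP) block, nothing to do with encoder/decoder attention. A minimal sketch with llama-family naming (from memory, so treat the details as illustrative):

import torch.nn as nn

class LlamaStyleMLP(nn.Module):
    # up_proj widens hidden_size -> intermediate_size, down_proj brings it
    # back down to hidden_size; gate_proj provides the SwiGLU gate
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        # gated activation in the wide space, then project back to hidden_size
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))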
>>
>>106613289
What about densefags' "LLMs are only as intelligent as their number of active parameters"?
>>
>>106613320
No one cares about local except a couple fags. Total parameters are completely irrelevant for server farms because they run requests pipelined, that's why MoE can not lose ... active is all that matters.
>>
>>106613063
just how many amd driver did they kill so far?
after all these years rocm still suck ass too
fuck those gay ahh ayymd niggers
>>
>>106613382
I wonder why someone who has no idea how the arch works writes a huge training tutorial
>>
how to run vibevoice?
>>
>>106613455
>ahh
kys zoomie
>>
>>106612438
Still no match for B200's 1.8TB/s inter-node speed but definitely enough for training
>>
>>106611277
But that's just a woman on the other end of the screen anon, except now she has intelligence. How is this going to improve your dating odds?
>>
>>106613473
are you poor?
>>
>>106613482
no, i have a 3090. comfyui isnt working for me so i wanna know if there are other ways to run it
>>
>>106613487
the inference code is literally there. can you not run python at all?
>>
>>106613480
>Still no match for B200's
the price is not the same though?
>>
>>106613487
In comfy https://github.com/wildminder/ComfyUI-VibeVoice
>>
>>106613502
yeah i tried that, but it doesnt work
>>
>>106613480
yeah but b200s are made for training and cost $30k as a result
this one's like 2.5k and only good for inference
>>
File: ranma-1.jpg (58 KB, 768x768)
>>106608204
I'm currently rocking 32GB of RAM, a 5900X and a 7900XTX (24GB). I'm looking to run larger models and longer context. Should I get a Radeon Pro W7900 (48GB) or go for something with even more VRAM?
>>
so is small/flat chests impossible on wan? I want to gen some porn of fit track runners.
>>
>>106613471
to be fair, you really don't need to know how the architecture works just to train/tune one. it only matters to know how it works if you want to go off the beaten path.
>>
>>106613607
sus
>>
>>106613613
sussy baka give me the sauce
>>
>>106613607
What results did you get when you tried to do it?
>>
>>106613597
whats your budget?
>>
>>106613507
why
>>
>>106613720
it doesnt work. doesnt generate anything
>>
>>106608833
Maybe because you're a man? It's almost exclusively a woman hobby.
>>
>>106613610
I don't think one can produce anything of value when their understanding of the process is "feeding data to the ai"
>>
>>106613653
huge tits
>>
>>106613727
You could have given an error message
>>
Just got a new PC. What kind of driver hell am I in for while trying to run a local LLM? I also have an old 2080 Ti sitting around if it would be worth just throwing that in my new case with the 9070 XT
>>
File: ranma-6.jpg (51 KB, 768x768)
>>106613597
I'm looking for a single card that can fit a consumer ATX motherboard. Other than that there is no limit (okay, I lied, I wont go above 10k€)
>>
>>106613735
why not? at what point do you need to play with the internals? if you aren't developing a new model architecture its all on the dataset and a few training parameters that can be tuned procedurally. you would be better off spending your time learning how to evaluate your model.
>>
Is that tongyi deep research thing any good
>>
>>106609249
>Creating the future of human connection
But people don't want to connect any more. For instance, a friend of mine told me to create an account on Boo, a dating app (+ a way to make friends, supposedly). People (25 to 40 years old) can't hold a discussion; they don't know how to push a discussion further nor how to keep the attention of someone. At the climbing gym, you can sometimes do some small talking with people, or even climb with them for an extended amount of time, and yet a lot of them will fail saying "hi" the next time you see them, ghosting you as if they never saw you before. I barely have any news from my former classmates (from 3 years ago only), they all cut every bridge they built during those years. Social is dead, people are fed up, depressed, failed to develop basic social skills, and only want to escape either through death, drugs or through some fictional world.
>>
>>106608351
I voted koboldcpp and I never used it for more than a day. I use llama.cpp and tabby.
>>
>>106609931
>(apparently SOTA)
Who cares? What's important is how they behave in practical scenarios, not in benchmarks. They train on the benchmarks, making them useless (see Goodhart's law).
>>
>>106613765
if you have a $10k budget, get a blackwell pro 6000
>>
>>106610664
AMD will make sure to not break the profitability of the market.
>>
>>106613363
It's strange that Command-A doesn't do better. The base model should be newer than DeepSeek-V3. It's a bit smaller than V3's square root potential (150B iirc), but it should be big enough to be a decent modern model. The tech report doesn't mention how many tokens it was pretrained on, so it might be undertrained. Probably the biggest things holding it back is the ScaleAI data and the lack of a base model to see if a better finetune could unlock its potential.
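(The square-root figure presumably being the usual geometric-mean rule of thumb, sqrt(total × active) params: V3 is 671B total with 37B active, and sqrt(671 × 37) ≈ 158B, hence the ~150B figure.)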
>>
File: poal2.png (210 KB, 606x479)
>>106613832
Yes, but was it noob friendly?
Seems there's a strong consensus on #1 and a 3-way tie for last place.
https://poal.me/4jr9sh
>>
>>106613425
Even server farms running commercial models must see that small number of active parameters doesn't do well at real world tasks. Codefags should see this better than anyone.
>>
>>106613793
I'm not saying that you can't do anything at all. But it's obvious that there is a big difference between knowing what you are doing and just pressing buttons and seeing lights flicker.
>>
>>106613959
And running premade python scripts falls firmly into the latter.
>>
>>106613944
Well, yes, I voted it because it was noob friendly despite not using it myself.
>>
>>106613499
>>106613543
It probably doesn't cost them that much to produce. Seeing how quickly they catch up, they may reach Nvidia within the next two years, both for training and for inferencing.
>>
>>106613234
Personally, I'm enjoying this 100-ish-B, sub-9B-active era we find ourselves in.
>>
>>106613899
>blackwell pro 6000
Goes over my budget, even as large as it is.
>>
>>106613959
I kinda agree, when I first started I tried axolotl, but got frustrated and decided to go with my own script using hf transformers. unfortunately I can't go any further because I am too retarded to actually understand the transformer architecture well enough to write it all from scratch. but it's low level enough that I feel like I'm more in control than when using someone else's framework.
>>
>>106613234
>better than oss-120b
>shows strength in creative writing
>100b moe with 6b active
waiting for ggufs since this might actually be worth testing out
>>
If I'm using a llama.cpp release built with the CUDA backend, I can't just somehow "enable" vulkan to use an Nvidia card alongside an AMD one, right?
Can I build the binaries to have both the CUDA and Vulkan backends, or are they mutually exclusive and I'd be stuck with the Vulkan backend in this case?
>>
>>106614384
>https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#notes-about-gpu-accelerated-backends
>In most cases, it is possible to build and use multiple backends at the same time. For example, you can build llama.cpp with both CUDA and Vulkan support by using the -DGGML_CUDA=ON -DGGML_VULKAN=ON options with CMake. At runtime, you can specify which backend devices to use with the --device option. To see a list of available devices, use the --list-devices option.
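So in practice it's just two extra cmake flags and then picking devices at runtime (paths and device names are whatever --list-devices reports on your machine, this is only the shape of it):
>cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=ON
>cmake --build build --config Release -j
>./build/bin/llama-server -m model.gguf --list-devices
then pass the ones you want with --device.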
>>
>>106614479
Beautiful anon, thank you very much.
I want to try and see what happens if I try to use the integrated graphics of my notebook alongside the discrete GPU.
There's probably some optimal configuration of having the cache here, the dense part of the model there, the most experts over yonder, etc.
Chances are it'll be worse than just using the GPU + CPU backends as I'm already doing, but I might as well experiment.
>>
>>106612542
*stabs the cracker*
>>
>>106613977
My sense is users interested in LLM as a hobby start with one of above, then if they're serious move to llama.cpp over time.
Overall I'm encouraged that my personal take wasn't too far off the consensus. I've had another anon challenging that Ollama should be higher on the list. Appears it's right where I thought it should be.
>>
>>106609249
According to the agenda, it is about their smart glasses. They will then give various courses about how to build things in their metaverse.
>>
>>106614709
kobold is still better even for advanced users, llama.cpp docs are a mess
>>
So I don't know anything about this subject, I'm not a programmer or coder even in the slightest, couldn't send a line to the terminal if you paid me. But I've been trying to use ChatGPT to walk me through the process of what initially was meant to just be a local bot that did something no web-based LLM can handle, which is take in PDFs and keep them or the details from them as a stable source of information to help with writing & prep for my TTRPG campaign(s).

And so far, this has not been easy or particularly successful. I wanna say, for the entire first day I was working with ChatGPT, it took me through all the steps to do...ostensibly that, only for me to realize that it actually was going to have me interact with the model entirely through Command Prompt to ask questions and shit. Wild.

Anyway, tell me I'm retarded or let me know if there's something I should know about or give a look into here. Right now I've spun up a local OpenWebUI bot, and I'm trying to figure out how best to handle allowing it to create a notes file for itself so it doesn't have to badly read PDFs every time it answers.
>>
>>106614749
There are easier ways to do that. You could convert the PDF into text and chuck that into Ai Studio's System Prompt for example.
Even just koboldCpp + silly tavern using its databank feature would work, I think.
But your approach is better because you are learning shit, so keep at it.
Try using Visual Studio Code alongside Cline, Roo, or the like, too. that's a setup that opens up a lot of possibilities even beyond just programming.
>>
>>106614749
Just send back the code it gave you and tell it to write you a tkinter python wrapper gui or some web frontend.
>>
>>106613958
But you don't nearly need active equal to dense, since active is all that matters that means dense is useless.

There are probably better sparse schemes out there than MoE, but active<total is now permanent. Because it's always superior when running thousands of queries simultaneously.
>>
>>106614780
My issue right now, is that I need to figure out how to work the system so that it can reference a pile of PDFs and pull information from them, present that information, then recombine it and put it back into a set of notes that becomes an overriding source of truth which I can then rely on if I have questions or anything about my campaign later (which is informed by the documents, but more of a mashup/remix of them).

I'm working on different ways of doing this and none have gone super well?

>Try using Visual Studio Code alongside Cline, Roo, or the like, too
I have no idea what any of those are. But I guess I'll look it up.
>>
>>106614854
RAG bro just use a fucking rag. ST has a rag functionality
>>
Mrs. Claus is on the RAG
>>
All I want for Miku is Christmas
>>
File: 1747278857413050.jpg (219 KB, 724x483)
>>106614879
>RAG bro just use a fucking rag.
>>
>>106615143
I want to cum inside miku if you catch my drift
>>
>>106614854
>I have no idea what any of those are. But I guess I'll look it up.
Visual Studio code is an IDE where you can have a repository with a bunch of files (usually code) and edit them.
Cline, Roo, etc, are extensions that gives an LLM access to these files. So you can tell it to, for example
>Hey, plot a plan to organize this mess, please.
Then the LLM will read the files in the repository and spit out a plan to organize these. Then you can correct the plan or accept it, and the LLM will create the new, re-organized file structure.
>>
>>106615184
Visual Studio Code is a text editor that runs in a browser that attempts to emulate an IDE if you install a hundred janky jeetscript written plugins.
>>
>>106615201
Well, yes. There's a reason I don't use it for work, but for what anon wants to do, I think it might work.
>>
>>106615179
what could anon possibly mean by this?
>>
thoughts on this?
>>
>>106615299
Who is on the left?
>>
>>106615307
Nishijou Takumi
>>
>>106613234
>32k context length

hard pass
>>
>>106615327
I don't think I've ever used more than that desu
>>
>>106615340
I feel sorry that you have premature ejaculation
>>
>>106614879
yeah and its fucking shit, i just end up using the summarise part of it because its not that context heavy, but even that fucks up sometimes.
but you are correct, it does.
>>
>>106615299
i could do with some chaos head
>>
>>106615440
People who regenerate a lot to catch the "perfect" response will probably never go that far, most of the time.
Putting this aside, long context is also useful for lore and so on. The main issue remains prompt processing, although with 6B active parameters it shouldn't be too bad.
>>
>>106615498
even with teeth?
>>
File: xiongmao-plushie-01.png (618 KB, 700x702)
Does anybody still care?
https://huggingface.co/mistralai/Magistral-Small-2509
>>
>>106615566
MISTRAL LARGE 3 IS COMING
I CAN FEEL IT
UGH
>>
>>106615340
you really need at least 128k for anything productive
>>
File: 1755217330254087.png (32 KB, 573x196)
>>106615566
Wow, it almost beats Miqu!
>>
>>106615606
Oh wait, it also says Magistral Medium.
Stupid fucking names.
>>
>>106615566
>benchmaxxed
no
>>
>>106615566
>thinking model
I don't care
>>
>>106615500
>People who regenerate a lot to catch the "perfect" response will probably never go that far, most of the time.
This is me. The game of AI for me is getting it to say exactly what I want.
>>
>>106615498
It's underrated desu. I think it's because it's a really slow burn.
>>
>>106615566
>small 24b
yawn
>>
the conceited minds of /lmg/ fail to see the signs that this is simply yet another big step towards the release of large 3 which will change everything
>>
>>106615689
get back to us when they actually release it.
>>
>>106615711
I don't know what a Mistral-flavored version of DeepSeek V3/R1 would add to the space, though.
>>
>>106615566
Did they fix the severe brain damage?
>>
meta connect today, are you excited for vague hype for future models and slightly uncomfortable acknowledgements that llama 4 exists?
>>
>>106615729
it would be nice if we could get another good 100b MoE
>>
>>106615831
>large
>100b MoE
It's either 100B dense or 500B+ MoE if they don't want to embarrass themselves
>>
File: zuckerbergcc.jpg (155 KB, 1400x785)
>Good Even-
>*CROWD CHEERS*
>Welcome to Meta Connect... Uh.... So...
>*awkward silence*
>So let's talk about our leading advances in Agentic-
>*CROWD CHEERS*
>Agentic-
>*CROWD CHEERS*
>XR
>*CROWD CHEERS*
>Wearable technology
>*CROWD CHEERS*
>Frontier
>*CROWD CHEERS*
>Thank you very much.
>>
>>106615729
Wouldn't it be more accurate to call it DeepSeek-flavored Mistral?
>>
Meta is about to have its Gemini moment after being stuck in their Bard era up until now.
>>
>>106615863
Mistral Medium already requires 4 GPUs to run, according to their blogpost from a few months ago. That's Mistral Large 2 territory.
>>
>>106615822
>>106614722
>>
>>106615822
>and slightly uncomfortable acknowledgements that llama 4 exists?
Llama-4 is the first open source LLM so perfect that nobody bothered to even try to finetune it.
>>
>>106615876
Based, based, BASED
>>
>>106614722
>They will then give various courses about how to build things in their metaverse.
I want to make fun of that but sadly Horizon Worlds probably has 5-10 times the active user base as VRChat despite being inferior, miiverse looking proprietary garbage.
>>
>>106615876
He should hype it like this: https://youtu.be/kNdp0I8AG40?t=50
>>
Yeah, I think LLMs are going downhill now. Their only saving grace is better tool integration. It's dumb to expect anything more than this.
>>
>>106615863
>100B dense
no point since glm air also functions like a 100B with just 12B active
>>
>>106615979
This is medically ill levels of delusion.
>>
Magistral is still broken on Llama.cpp in chat completion mode. It's not processing the reasoning on a separate channel, and it's not outputting special tokens by default, so you can't isolate the thinking blocks.
>>
>>106615995
A broken mistral release? No way
>>
>>106615989
Did it work? Are you a real woman now?
>>
>>106616000
No; broken chat template support in llama.cpp, I think.
>>
>>106616014
Wasn't forcing llama.cpp convert scripts to always require mistral-common supposed to fix exactly that?
>>
>>106616014
This is why we needed this to happen https://github.com/ggml-org/llama.cpp/pull/15420
>>
>>106616014
Wait, didn't Mistral have their own template-less system now? I downloaded a quantization from Unsloth and it's using their "fixed" chat template.
>>
>>106616065
Are you running a mistral-common server?
>>
>>106616065
It's not really that it's templatless, it just uses pydantic python classes instead of Jinja templates. To the point that they themselves aren't even sure what the fuck is the actual final template that the LLM sees, as far as I can tell.
>>
>>106616065
I pulled from git, updated the requirements and recompiled llama.cpp before attempting to run Magistral-Small-2509.
>>
>>106616071
Just tried using mistral_common, but it seems to have issues with how SillyTavern is sending parameters and I don't have patience for this shit.

ValueError: Invalid parameters passed to `ChatCompletionRequest.from_openai`:
OpenAI valid parameters but not in `ChatCompletionRequest`: {'stream', 'frequency_penalty', 'presence_penalty'}
Non valid parameters: set()
>>
>>106612085
yeah, no matter how good the previous writing is, the end is always some reddit tier cringe. There's no prompting against it either. Just cutting them off really works best. I noticed it does happen less if you use the wrong instruct format on purpose/force use the model as text completion, so the assistant-slop tuning is probably really to blame.

I always want to set up the perfect agent pipeline/CoT/tooluse for my slow burn romance RP, but in general just actively manually nudging the model in the directions you want, by either editing its replies or clearly prompting the ways you want it to react, works the best and leads to the best experiences. That is kinda cool because whoa, you can do this with a computer now with no other person involved, but it also kinda sucks because the thrill of a model exactly getting where you want to go and working with you in tandem, with you doing nothing specific, is matched by nothing. It just happens rarely. If it happens to you once, you'll chase that dragon forever.
>>
File: mistral bros.png (126 KB, 814x648)
you have angered some of the reddit nerds, mistral. better hope some of the more sycophantic corpo bootlickers organically come to your rescue soon
>>
>>106616403
>Their insistence on mistral-common is very prudish
what did he mean by this?
>>
Any guide on how to use qwen image?
>>
Any guide on how to use qwen next?
>>
>>106616461
Don't be a nigger dick you faggot.
>>
>>106616471
Don't be a faggot dick, you nigger.
>>
>>106615863
qwen next is goated you extreme faggot
>>
>>106616461
1) download mlx quant
2) run it (mlx-lm or lm studio)
>>
>>106616480
is it better than 235B when it comes to sex?
>>
>>106616446
https://www.wikihow.com/Treat-Vaginal-Prolapse
>>
>>106616502
emphatic no
>>
>>106605647
>https://vocaroo.com/1lIBmYyRNvTz
https://vocaroo.com/1dnH0U2DjAbl
Get vibed.
>>
File: 1746876721264702.png (45 KB, 1299x391)
Is it that easy to shill for thirdies on HF?
>>
>>106616661
Grifters looking to profit off of everything ruin everything; this is why llms are dead now
>>
>>106616480
I am glad that I am not poor enough to delude myself to this degree
>>
moesissies are ex-aicg (good morning saaar) and vramlets
>>
>>106616768
yeah enjoy your 3 t/s running r1 on your (imaginary) server farm
>>
>>106616768
How would you say it compares to other models?
Say, GLM air and OSS, maybe cope qwant big qwen too.
>>
File: file.png (180 KB, 771x915)
>>106616403
lmao patrick btfo'd
>>
>>106616622
comparing a 7B TTS model to a 0.5B TTS model. even localllama isn't this dumb.
>>
>>106617006
i was comparing to all 3 (oss 120b, glm air, 235b). qwen next has the best quality/performance ratio. oss gives you the worst slop of all.

use qwen next for knowledge graph retrieval and you reach endgame status for serious applications.
>>
>>106617043
>knowledge graph retrieval
I was wondering the other day, how god damn heavy is knowledge graph RAG anyway? It seems a lot more complex of a process so I imagine that there's a lot of processing to create the databases. Is it the same for retrieving the information?
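My rough mental model of the cost split, as a hand-wavy sketch (extract_triples is a stand-in for whatever per-chunk LLM extraction step the framework actually runs, networkx just for illustration):

import networkx as nx

def build_graph(chunks, extract_triples):
    # one LLM extraction call per chunk -> this is where nearly all the compute goes
    g = nx.DiGraph()
    for chunk in chunks:
        for subj, rel, obj in extract_triples(chunk):
            g.add_edge(subj, obj, relation=rel, source=chunk)
    return g

def retrieve(g, entity, hops=2):
    # retrieval is just a neighborhood lookup in the prebuilt graph, no LLM needed
    return nx.ego_graph(g, entity, radius=hops)

So my guess is building the database is the heavy part and retrieval itself is cheap, unless the framework feeds the retrieved subgraph back through the model for summarization.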
>>
>>106616941
QQ

llama_model_loader: loaded meta data with 52 key-value pairs and 1096 tensors from models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
llm_load_print_meta: model params = 1.026 T
llm_load_print_meta: model size = 485.008 GiB (4.059 BPW)
llm_load_print_meta: repeating layers = 483.197 GiB (4.053 BPW, 1024.059 B parameters)
llm_load_tensors: offloaded 62/62 layers to GPU
llm_load_tensors: CPU buffer size = 420246.00 MiB
llm_load_tensors: CUDA_Host buffer size = 927.50 MiB
llm_load_tensors: CUDA0 buffer size = 13632.97 MiB
llm_load_tensors: CUDA1 buffer size = 18510.81 MiB
llm_load_tensors: CUDA2 buffer size = 18668.47 MiB
llm_load_tensors: CUDA3 buffer size = 19280.69 MiB
llm_load_tensors: CUDA4 buffer size = 5382.00 MiB

INFO [ print_timings] prompt eval time = 144275.78 ms / 16178 tokens ( 8.92 ms per token, 112.13 tokens per second) | tid="136658878742528" id_slot=0 id_task=5782 t_prompt_processing=144275.781 n_prompt_tokens_processed=16178 t_token=8.918023303251328 n_tokens_second=112.13247218533513
>>
>>106617068
>k2
into the trash it goes
>>
>>106617087
brown hands typed this
>>
>>106617068
>prompt eval time = 144275.78 ms / 16178 tokens ( 8.92 ms per token, 112.13 tokens per second)
Didn't llama.cpp have a PR to increase PP throughput for MoE models? Something about moving only the activated experts during PP, I think?
Whatever happened to that?
>>
File: moe.png (20 KB, 1472x734)
moebros I made us a flag
>>
>>106617138
What am I looking at?
>>
>>106617098
cope harder dork
>>
>>106617180
you're in local cooming general. the only benchmarks that matter are the cockbench and https://eqbench.com/creative_writing.html
>>
How much of a performance boost would you get by using llama.cpp directly instead of koboldcpp? I'm not sure if it's worth losing all of the QoL features.
>>
>>106617180
The fuck kind of bullshit chart is that?
>>
>>106617194
you are advocating for a 32b active moe for creative writing when llama 3.3 70b will outperform it. your benchmaxx won't change a thing.

for real tasks qwen next is sota quality/performance wise rn
>>
File: file.png (21 KB, 529x231)
Technically not local, but maybe of interest to some codefags that use OR for work projects.
>>
>>106617207
It measures intelligence because that's what it's named. GLM 4.5 air has 49 intelligences.
>>
>>106617215
tell me that you didn't use K2 without saying you didn't use K2
>>
>>106617249
we are running models locally here, not on clouds.
>>
>>106617215
Maybe there are some tasks where a 70B dense can outperform a 32B active MoE, but you are greatly underestimating how useful and versatile 1T parameters worth of memorized knowledge is.
>>
>>106617216
I thought GPT5 was a bigger failure than llama4?
>>
>>106617215
q8 glm air works great and mogs 70b on my machine though
just don't use cope quants
switching to q8 after getting more ram made a big difference for me quality wise
>>
>>106617281
Hardly.
It's a good model, just not an incredible leap. The real issue is that they overhyped the shit out of it.
It's mostly better than its predecessor. Mostly.
And at least from my usage, it seems a lot more consistent, and a lot less retarded as the context fills up.
>>
>>106617160
Looks like a top down Atari 2600 RPG map with an oversized item pickup:
In the middle of the area is a round bottomed flask with a rag attached to it, tilted 45 degrees clockwise. If the blue liquid soaking the rag and filling the flask is flammable, the item could be presumed to be a molotov cocktail.
The 8 legs around it enable the incendiary device to roll towards the target after being ignited and thrown by the user, whereafter the fuel-soaked rag will burn until extinguishing next to the target without ever igniting the main charge inside the flask.
>>
>>106617267
see >>106617068
>>
>>106617305
as I understand it, the main improvement came on the backend side, making it cheaper to run.
>>
>>106617340
probably because they adopted all of the cost saving innovations that DeepSeek gave away for free in their tech reports and foss repositories
>>
>>106617032
This is so strange... Who is this mistral guy anyway - some social media manager?
>>
>>106617426
>>106617426
>>106617426
>>
>>106612530
Depends on accuracy on what. What that test is showing is an image model's difficulty generating out-of-sample images (at the time there weren't pictures of dark-skinned miku with dreadlocks on the internet). But it's still able to make a miku on a skateboard in NYC at night with a ball and a cell phone, with a text bubble and text on her shirt. Bearing in mind this is a diffusion model, what this seems to support is that the closer your request is to what the model saw in its training, the more acceptable a low quant will be for your purpose.
>>
>>106617111
If you are talking about ik_llama, I am getting 20 T/s pp with a 2000 batch size and 200 T/s with anything above that (because above 2000 it uses the old method). This shit is completely fucking broken, at least on Windows 10.
>>
>>106617138
Strange Miku but ok.


