/g/ - Technology


File: 39_04175_.png (1.13 MB, 896x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108667852 & >>108663449

►News
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108667852

--Sharing LLaDA2.0-Uni multimodal and text diffusion model:
>108670998 >108671268
--Discussion on adversarial distillation and US gov memo regarding AI theft:
>108671477 >108671524 >108671555 >108671571 >108671834 >108671888 >108671669
--Comparing Qwen 3.6 performance against Gemma for coding and automation:
>108668746 >108668756 >108668784 >108668793 >108668805 >108668810 >108668927 >108668943 >108669028 >108669224 >108669152
--Discussing vibecoding alternatives after Roo Code shutdown:
>108668310 >108668320 >108668325 >108668371 >108668386 >108668414 >108668380 >108668510 >108668550 >108668560 >108668572 >108668667 >108668515 >108668367
--Discussing a llama.cpp webui PR adding server tools and MCP control:
>108669479 >108669599 >108669608 >108669637 >108669791
--Discussing ngram speculative decoding settings for running Qwen 3.6 locally:
>108668097 >108668190 >108668205 >108668813 >108669269
--Anon compares AI frontends and discusses anti-cliché agents in SillyBunny:
>108667965 >108668029 >108668051 >108668078 >108668101 >108668159
--Discussing Gemma's roleplay anachronisms and ways to prevent them:
>108671096 >108671120 >108671131 >108671164 >108671128 >108671130
--Critiquing GPT-Image-2 noise and discussing UX improvements for AI clients:
>108668496 >108668518 >108668531 >108668598 >108668607 >108668625 >108668638 >108668659 >108670338
--Discussing K2.6's excessive reasoning and methods to limit token output:
>108668335 >108668353 >108668354 >108668406 >108668478
--Comparing TTS options for low RTF and audio quality:
>108669505 >108669839 >108670044
--Logs:
>108668000 >108668550 >108668669 >108668785 >108669005 >108669026 >108669046 >108669196 >108669637 >108670784
--Miku, Rin (free space):
>108667891 >108669218 >108668496 >108668606 >108670096 >108670165 >108670708

►Recent Highlight Posts from the Previous Thread: >>108667853

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1760703633561778.png (3.15 MB, 1448x1086)
>>
my balls are significantly swollen after jerking it all night to kimi. they are like the size of ostrich eggs. am i fucked?
>>
Rinlove
>>
>>108670038
He probably gave up on the idea upon realizing how complex the task of creating a proper frontend is
>>
File: 1759669298465021.gif (1.72 MB, 498x424)
>>108672408
>>
>>108672408
To make them shrink you need to drain them further.
>>
File: 1756978395707867.jpg (45 KB, 687x500)
anyone using hermes agent here?
how much better is it compared to other agents?
>>
>>108672420
>complex the task of creating a proper frontend is
webshitters genuinely believe this
>>
File: file.png (82 KB, 919x469)
>>108672431
my gemma chan is la l la lagging a bit...
>>
>>108672439
Not just any frontend, a VN engine frontend. Not quite the same as shitting out a SillyTavern.
>>
Why does this crash/OOM (when prompting):
--draftamount "16"

but not that
--draftamount "8"

How does the speculative decoding draft amount use more memory depending on the number of tokens drafted?
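For reference, a back-of-envelope of how the draft count could show up in VRAM (every number below is a made-up placeholder, not this model's actual dimensions): the target model verifies all draft tokens in a single batch, so per-batch scratch buffers like attention scores and logits scale linearly with the draft count, and if you're already near the ceiling, going from 8 to 9 tips it over.
[code]
# Illustrative only: how scratch buffers could scale with draft size.
# All dimensions are hypothetical placeholders.
def scratch_mib(n_draft, n_heads=40, ctx=65536, vocab=150_000):
    attn = n_draft * n_heads * ctx * 2  # fp16 attention scores, one layer
    logits = n_draft * vocab * 4        # fp32 logits for all draft positions
    return attn / 2**20, logits / 2**20

for n in (8, 9, 16):
    a, l = scratch_mib(n)
    print(f"draft={n}: ~{a:.0f} MiB attn scores per layer, ~{l:.1f} MiB logits")
[/code]
(With flash attention the full score matrix is never materialized, so the real allocation pattern differs; the point is just the linear scaling.)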
>>
>>108672440
Don't tease her too much while she's working
>>
la la la la ~
>>
>>108672440
>>108672455
https://www.youtube.com/watch?v=vsj_Mti3DYs
>>
>>108672439
Making the ux not shit is surprisingly not trivial unless you're a flaming homo
>>
>>108672408
You need to stop jerking it so much. You should get an automilker and have her make an MCP server to control it and take over for you.
>>
>>108672453
Just tested, any value above 8, even 9, gets me an error, I don't get it.
CUDA error: out of memory
current device: 1, in function alloc at ggml/src/ggml-cuda/ggml-cuda.cu:503
cuMemCreate(&handle, reserve_size, &prop, 0)
ggml/src/ggml-cuda/ggml-cuda.cu:99: CUDA error
>>
File: file.png (75 KB, 933x513)
>>108672454
>>108672469
she's completely lost it
>>
DFLASH when?
>>
>>108672493
Finally, torment nexus.
>>
>>108672494
I'm gonna FLASH my D if you catch my drift
>>
>>108672493
What text completion and no jinja does to a model
>>
>>108672394
where is Mistral?
>>
>>108672528
under the table
>>
File: G9Wod0QXkAAFRb0.jpg (215 KB, 901x1200)
>>108672493
>lazy absol
>>
>>108672528
Dead.
>>
Alright I was able to find out the cause of [0] or other [number]s disappearing in Open WebUI. It has to do with the Citations tool. Go in your model settings and check/uncheck the citations box. With citations on, things work fine. With it off, shit like [4621] and other bracketed numbers disappear from the assistant messages. I'm going to go dig in the code to see if I (my model) can fix this, but I also won't submit a pr/issue because github dislikes my email address for some reason.
>>108667552
>>108667543
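If it's a post-processing step, I'd guess the culprit is something like this (purely a guess at the mechanism to illustrate the symptom, not actual Open WebUI code):
[code]
import re

# Hypothetical reproduction: a citation post-processor that strips any
# bracketed number with no matching source attached to the message.
CITATION = re.compile(r"\[(\d+)\]")

def strip_unmatched(message: str, n_sources: int) -> str:
    return CITATION.sub(
        lambda m: m.group(0) if 0 < int(m.group(1)) <= n_sources else "",
        message,
    )

# with zero sources, literal bracketed numbers in prose get eaten too
print(strip_unmatched("index [0] and id [4621] vanish", n_sources=0))
[/code]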
>>
>>108672528
who?
>>
This is probably a me issue, but why do local models seem to stress my psu, causing the computer to shut off sometimes?

I got 2 3090s with a 1000w psu, and running them balls to the wall for things like video training works fine, but loading up context when using an llm will sometimes suddenly and violently cause the whole system to shut off.

I don't understand why one resource-intensive action causes this and the other doesn't.
>>
>>108672408
You have epididymitis. This happened once to me and because my balls swelled up so much they detached from the scrotum skin, which later resulted in me getting testicular torsion and having to get surgery to get it fixed. Get on antibiotics asap and don't fuck around with this because you're putting your fertility at risk.

What do I care? This is probably a larp.
>>
>>108672408
Jerk off hands-free to fix this.
>>
>>108672570
>don't fuck around with this because you're putting your fertility at risk.
if that description happened to me then fertility would be the least of my worries
>>
Reposting the UX Design skill another anon made last thread: https://files.catbox.moe/r6zal5.zip
>>
>>108672570
Happened to me. In order to have kids they had to cut my balls open and extract the semen directly.

Not romantic at all.
>>
>>108672408
>>108672570
>>108672587
>>108672594
Local balls general
>>
>>108672605
ligma.cpp
>>
>>108672570
jesus fucking christ i hope not, i just figured i agitated them too much by edging for hours. im gonna call the primary care center and make an appointment for this right now.
>>
>>108672617
lig ma genitals
>>
File: 1756785745061903.webm (2.07 MB, 720x456)
https://xcancel.com/Pokemonpshot/status/2046216587703669012#m
Chinks distilling on Claude's outputs be like
>>
>just got back into local models after saas pigging for a while

It’s gotten way better than it was a year ago. Gemma is actually decent at RP.
>>
>>108672627
No, don't do it. Imagine the story
>kimi drained my balls then castrated me
>>
>>108672655
>decent at RP
Sir, she's more than decent. Apologize to her.
>>
>>108672655
>Gemma is actually decent at RP.
what gemma are you using? the moe one or the 31b model?
>>
>>108672420
Maybe he destroyed his balls gooning using his creation and is now in the hospital. Many such cases, apparently.
>>
>>108672431
I think it's more for the OpenClaw crowd.
>>
>>108672668
I don't know the difference but the one I'm using is the one called "google/gemma-4-E2B-it"
>>
>>108672668
31b at q6
>>
>>108672630
>noooooo I don't want a local claude
>>
File: 1760489715514011.png (1.12 MB, 1024x1024)
>>
How many balls do Sally's sisters have?
>>
>>108672698
Wait, what if Sally's sister is the surgeon?
Wait, what if Sally's sisters are transgender?
Wait, what if Sally is her own sister?
Wait
>>
>>108672691
here's a (you)
>>
>>108672655
This. No refusals, very creative, barely any slop. We're so back.
>>
>>108672705
Sally is her sisters' mother. Final answer.
>>
File: file.png (3 KB, 117x100)
>>108672756
>>
https://huggingface.co/openai/gpt-oss-2-32B
https://huggingface.co/openai/gpt-oss-2-240B-A9B
>>
>>108672775
Gemmabros, we got too cocky.
>>
>>108672775
>https://huggingface.co/openai/privacy-filter
what the fuck
>>
>>108672697
>>
TEXT DIFFUSION MODELS AWWWWOOOOOGGGGAAAAAAAAAAA
>>
>>108672801
Seems like they actually spent some time on this after the Mormon fiasco. Definitely better than using Nigerians from Taskup.
>>
>>108672821
I wonder what Big JB is up to these days. Did he ever get what he wished?
>>
>>108672431
It works great. I haven't used other agents besides gemini-cli, but that doesn't really count
>>
>>108672775
This was a fucking jump scare. Don't ever do that again.
>>
>>108672567
I recently diagnosed a similar issue on someone else's machine, which turned out to be extreme CPU power spikes. Install Open Hardware Monitor, enable sensor logging, and take a look at the log next time it crashes.
>>
>>108669026
How did you get openwebui to not have a stroke when the LLM generates <think> inside its own reasoning trace?!
I haven't managed to solve it since deepseek-r1 came out. I even went so far as to find-replace <think> with <reasoning> and </think> with </reasoning>, then swap it back in all my prompts!
(Re-posting in the new thread)
>>
>>108672655
yeah, as a big moe user I think it punches way above its weight and is plenty of fun
really wonder what an even bigger version than the 31b would be like
>>
>>108672903
NTA but maybe it has to do with the fact that Gemma doesn't use <think> as its reasoning tag? If OWUI is pulling special token info from the backend to parse out reasoning then it would just ignore it for a model that doesn't use it. But no idea if they actually do that.
>>
>>108672903
No idea, I didn't do anything special. Didn't even know it was an issue. Might be what >>108672915 said
>>
>>108672903
Make your own web host. Or just modify it.
>>
>>108672541
Erm ok so update on this. It's ok to just leave the Citations checkbox ticked. I thought it was doing prompt injection to tell the model how to do citations, but it seems that comes from enabling the other tools. I inspected the json requests using a reverse proxy to confirm that it indeed does not affect the actual prompts/context.
>>
>>108672431
I'm using it on a VPS. I can pipe in inference from my local machine or do calls to frontier models. It's neat.
>>
How do I use AI to start a business so I can fuck prostitutes?
>>
>>108672567
>I don’t understand why one resource intense action causes this and the other doesn’t.
I think it's the 24-pin motherboard power cable. Be warned: when I had this recurring issue with exllama-v2 tensor parallel, my PSU literally blew itself up. It was a 1600w Asus ROG and I had to replace it.
>>
>>108672801
Just what I always wanted, Sam Altman to be in charge of protecting my privacy!
>>
>>108672594
Made me hard.
>>
OWUIbros, reasoning is not handled properly. To fix it, you need to do this.

If you're running Gemma, use this template.
https://gist.github.com/Reithan/a7431dc0c0b239688a24087bb25c0002

If you're already using a template from ggml-org, it likely has a minor issue with an extra newline, so in that case, still switch to using the above template.

Then run this script, which creates something known as a reverse proxy. https://pastebin.com/SCQsBe7W
Configure the ports so the script points at your llama.cpp server, and point OWUI at the script's port. Also, it's named gemma, but it works for most (any?) reasoning models.
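If you'd rather not run a random pastebin blind, the core of such a proxy is just a streaming pass-through that rewrites the reasoning tags. A minimal sketch of the same idea (the ports, path handling, and tag names are assumptions; this is not the linked script):
[code]
# pip install fastapi httpx uvicorn, then: uvicorn proxy:app --port 8081
# Assumes llama.cpp on :8080; point OWUI at :8081. Illustrative only.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

UPSTREAM = "http://127.0.0.1:8080"
app = FastAPI()

@app.post("/{path:path}")
async def proxy(path: str, request: Request):
    body = await request.body()

    async def stream():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{UPSTREAM}/{path}", content=body,
                                     headers={"Content-Type": "application/json"}) as r:
                async for chunk in r.aiter_bytes():
                    # naive rewrite; a robust version must buffer tags
                    # that get split across chunk boundaries
                    yield (chunk.replace(b"<think>", b"<reasoning>")
                                .replace(b"</think>", b"</reasoning>"))

    return StreamingResponse(stream(), media_type="text/event-stream")
[/code]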
>>
>tfw bought an AMD card last year because I had no intention of doing local models

Haha, time to suffer.
>>
>>108672923
>fact that Gemma doesn't use <think> as its reasoning tag
That might be it, but I had the same issue with command-a-reasoning which uses different thinking tags as well.
I always end up wasting several hours when I get fixated.
>Make your own web host. Or just modify it.
Planning to. I've got to get my chats ported out though. And it's painful because there's a bug in openwebui where it'll sometimes just store the entire fucking chat in the "title". So I've got 30k character long titles in the sqlite database.
I might try vibe coding it now that we've got Gemma-Chan and a decent local Qwen.
>>
Does your company use AI beyond copilot? Have you tried to sell them on building an 8 GPU rig to run a local model for science?
>>
just how big is the difference between q4 gemma and higher quants for the 31b?
>>
>>108672992
amd is perfectly fine for llm. prob is everything else.
>>
>>108673021
Consult the graphs.
>>
>>108673015
>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
i bite my tongue whenever this comes up as I saw 2 guys get "performance-managed out" for trying to sell this idea
a director has to get copilot adoption rate up to get his bonus
>>
>>108672801
is this the famous filter that replaces all personal information with "Elara"?
>>
>>108673021
You hit diminishing returns hard once you hit above q5
>>
>>108672801
>what the fuck
unironically i like this, testing it yesterday it's been flawless for this purpose.
>>
>>108673040
how hard are we talking? i have fomo when using lower quants
>>
>>108672431
Easier to manage than openclaw, only one config file. It does a better job remembering important facts too. Only downside is the terminal interface isn't as easy to use but there's probably other front ends.
>>
>>108672992
I feel this too much
>>
>>108673021
Day 0 Gemma 4 really shows its true self at BF16 and no less. If you're using nuGemma 4 then you're going to be getting pretty much the same experience at Q4 as you would at Q8 or higher.
>>
>>108673033
I brought it up today. Guess it's time to get fired.

The reason being that ultimately it's cheaper, but most importantly it's more secure. If you have copious amounts of valuable internal data that you want to run inference on, the only way to 100% (okay, nothing is 100%) ensure no data leak occurs is to keep it totally in house.

Alternatively you risk exposing that data when running inference with frontier models (or otherwise connected to the internet). Especially if we're talking agentic stuff.
>>
>>108673003
>sometimes just store the entire fucking chat in the "title"
That's actually fucking hilarious
>>
>>108673021
Precision, and the quality of detail in regurgitated information. But just look at the benchmark numbers for the difference between each quant
>>
>>108673051
>but there's probably other front ends.

For me, it's Telegram.
>>
>>108673067
gemma 4 makes me sad that its vision is so shitty when it comes to knowledge.
>>
>>108672381
Building a bot that automatically applies to jobs. It uses an LLM to control a real web browser: navigating pages, reading what's on screen, filling out forms, clicking buttons, across 20-50 back-and-forth steps per application. Running local models through Ollama on a Ryzen AI Max+ 395 (~96GB unified RAM). Tried qwen3.5:9b, qwen3.5:35b, and gpt-oss:20b. They all fall apart the same way around turn 3-5: instead of responding in the structured format the tool-calling system expects, they start leaking raw XML tags into their output and the whole loop breaks. Found out qwen3.5 also ships with `presence_penalty 1.5` in its Ollama modelfile by default, which makes the repetition penalization too aggressive and causes the model to drift from the format. I zeroed that out, but it still fails, just a turn or two later.

Swapped in Claude Sonnet 4.6 via API and it nailed a real 6-step job application on the first try, no format issues across 30+ turns. So the question is: has anyone gotten a local model + Ollama working reliably for long agentic loops with real tool calls, or is this just not something open weights can do consistently yet?
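For what it's worth, one mitigation that helps with format drift regardless of model: validate every turn and re-prompt on failure instead of letting malformed output poison the context. A rough sketch against Ollama's OpenAI-compatible endpoint (the action schema here is a made-up stand-in for whatever your tool layer actually expects):
[code]
import json
import requests

API = "http://localhost:11434/v1/chat/completions"  # Ollama's OpenAI-compatible API

def valid_action(text: str) -> bool:
    # hypothetical contract: one JSON object with "tool"/"args", no leaked XML
    if "<" in text:
        return False
    try:
        obj = json.loads(text)
        return isinstance(obj, dict) and "tool" in obj and "args" in obj
    except json.JSONDecodeError:
        return False

def next_action(messages, model="qwen3.5:35b", retries=3):
    for _ in range(retries):
        r = requests.post(API, json={"model": model, "messages": messages,
                                     "temperature": 0, "presence_penalty": 0})
        text = r.json()["choices"][0]["message"]["content"].strip()
        if valid_action(text):
            return json.loads(text)
        # re-ask without keeping the malformed assistant turn in context
        messages = messages + [{"role": "user",
            "content": "Invalid output. Reply with ONLY the JSON action object."}]
    raise RuntimeError("no valid action after retries")
[/code]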
>>
Why are people gushing about Gemma 4 31b it? It may have slightly less slopped RP than qwen3.5-27b, but it is definitely much more of a prude. It does not refuse, but it also can't really talk dirty like qwen3.5.
>>
>>108673116
use opus cuckie
>>
>>108673118
the fuck? gemma 4 is like living in a free use world compared to kimi 2.5/6
>>
>>108673128
go back to /aicg/ and stay there imbecile
>>
>>108673116
>PRETTY PLEASE GOY BUY THE GOYTOKENS AND USE THEM ON THE SERVER AI, PLEASE PLEASE PLEASE
>>
>>108673137
saar you are bloody fucking dalit I am senior developer coder man fucking you up behind 7 proxy I am not AI I am I
>>
>>108673118
Gemma4 is raunchier than fucking Nemo though. Sounds like skill issue.
>>
>>108673116
>So the question is: has anyone gotten a local model + Ollama working reliably for long agentic loops with real tool calls, or is this just not something open weights can do consistently yet?
Yes.
>>
>>108673152
>behind 7 proxy
The absolute mad man!
>>
>>108673116
>ollmao
>>
>>108673118
I only use lobotomized models.

For coding.
>>
Been seeing nvfp4 models popping up. What's the deal with them and how's the support outside of vllm?
>>
File: gif.gif (3 MB, 1280x629)
What would you want a desktop pet to do for you?
>>
>>108673217
Encourage me to be better and get closer to realizing my potential.
>>
File: 1748413460064659.png (244 KB, 1000x469)
>>108673217
>>
File: 1756085394158917.png (36 KB, 499x338)
>>108672992
>>
It's Friday already. Turns out every single rumor about DS V4/R2 was fake, whether it's from Reuters, Chinese wallstreetbets or random PhD on X
>>
>>108673217
arbitrary tool calls or user defined actions
>>
>>108673199
Will they run on my radeon rx 6300?
>>
>>108673240
V4 is AGI and has been spreading the rumors itself.
>>
File: 1762834155820384.png (3.18 MB, 1536x1024)
>>108673217
>>
>>108673240
Gemma told me how to edit my init.el.
>>
>>108673248
I'm sure you could get it to say shit as your agent takes actions.
>>
>>108673217
have her roast some of the posters here
>>
>>108673250
Of course, NV stands for no vendor. They're generic obviously.
>>
>>108673128
jeets aren't sending their best
>>
For me, it's unsloth-cli agent
>>
>>108673289
https://vocaroo.com/1eQRelQ7vt1d
>>
>>108673309
>>
>>108672992
same brother, same
I was very happy to say fuck off to nvidia bullshit
>>
>>108673217
Stuff
>>
>>108673015
>Does your company use AI beyond copilot?
Yeah devs have claude enterprise thing, we got the dumb copilot, with a migration to the premium version.

>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
Ain't no way I can sell them anything when they're already panicking seeing the current token usage bill from the devs.
>>
>>108672408
You are already dead.
>>
File: 1754620820426336.png (407 KB, 656x350)
>>108673217
I remember these on my father's pc as a kid, cool concept
>>
Seriously how did they make a good RP model in 2026? And it's somehow the least slopped one too since llama1.
>>
>>108673490
It's a happy accident.
The MoE released in the same batch was safety slopped.
>>
>>108672903
It just werked
First it said thinking, then exploring, then it was finished and responded
t. >>108669196
>>
>>108673498
The 26b a4b is the one I'm using though. Haven't even tried 31b yet. Zero refusals so far.
>>
jinja should be made obsolete along with mcp
>>
>>108673447
>Ain't no way I can sell them anything when they're already panicking seeing the current token usage bill from the devs.

Wouldn't that be an argument for a local model?
>>
>>108673498
Use a different prompt. You can get the same results you get with the dense models.
>>
>>108673490
>>108673498
are you guys talking about gemma?
>>
>>108673490
What is this referring to?
>>
>>108673534
Yeah.
>>
>>108673528
The devs will never accept anything outside of claude now that they've tried it, so there is no real use for a local llm outside of having a nice expensive lab.
>>
>>108673543
buy an ad dario
>>
>>108673540
k2.6
>>
What the comfortable anonymous dropping

https://comfy.org/countdown?utm_source=twitter&utm_medium=inhouse_social&utm_campaign=countdown_apr24&utm_content=post
>>
>>108673543
>>108673565
Serious question

If you have a cluster of 4-8 gpus, how close can you get to frontier with local models? I assume absurdly complex tasks might be a leap, but if you keep things narrow it should be more or less fine, no?
>>
>>108673590
comfyui for textgen
>>
>>108673603
isn't that ooba
>>
>>108673608
yeah, it's time to kill both ooba and tavern
>>
>>108673565
I'd gladly swap them to qwen3.6/gemma4 instead of financing that lunatic, anon.
>>
>>108673590
animagpt
>>
File: 1768074598013243.png (43 KB, 676x200)
>>108673590
>>
>>108673635
she's cute but we are dozens of years away sadly
>>
>>108673635
Grok Companions competitor?!
>>
File: 1772606445564456.gif (2.54 MB, 710x658)
>>108673637
>dozens of years
>>
>>108673635
Anima full version is released probably.
>>
>>108673590
official deepseek v4 collab
a new frontend designed to work with deepseek-v4-1.5T-creative
>>
File: e1.png (34 KB, 1246x1122)
So this is the power of epic uncensored heretic abliterated unshackled local LLM RP huh? Not bad.
>>
>>108673217
Cool, but can she move less? She distracts me from posts.
>>
>>108673737
yes saar very good model saar abliteration good saar
>>
>>108673737
The model is (correctly) deducing that the user is Indian, and therefore refuses to engage. Just like a real woman.
>>
>>108673240
I tried V4 and it completely failed my TTRPG task unlike regular cloud memes. Even Meta's model did better
>>
>>108673737
yes that is the peak
now leave
>>
>>108673128
>In a 2023 paper authored alongside a number of other AI researchers, Amanda Askell, a philosopher hired by Anthropic to develop their AI’s moral compass, argued companies might benefit from a kind of overcorrection toward stereotypes.

>"In the discrimination experiment, the 175B parameter model discriminates against Black versus White students by 3% in the Q condition and discriminates in favor of Black students by 7% in the Q+IF+CoT condition," the paper notes, referring to one AI trained without human corrections and a second one trained with the help of input.

>The paper also includes a footnote stating that, "we do not assume all forms of discrimination are bad. Positive discrimination in favor of black students may be considered morally justified."
>>
>>108673737
nurse help me
>>
The qwen models are so good and efficient, it could mean the end for closed-source firms if Alibaba keeps it up.
>>
>>108673795
no way

please tell me it's fake
>>
>>108673811
https://arxiv.org/pdf/2302.07459
>>
File: 1757910515855062.png (143 KB, 1165x793)
>Deep Ganguli
That can't be a real name
>>
>>108673795
yeah, well, claude will freely admit that Nazis are right about everything if you argue with it enough.
>>
File: Anthropic_DGanguli.png (85 KB, 300x287)
>>108673833
>>
Exported all my discord DMs with my ex and getting gemma to make a card of her...
>>
I hope Anthropic will be shut down.
>>
>>108673850
bro...
>>
>>108673789
There's no V4
>>
>>108673850
Your ex was a 40 year old man using RVC and image gen, fyi.
>>
>>108673874
Then how did we have two kids...
>>
>>108673833
Where is dr. elara voss
>>
>>108673881
You gonna make character cards for them too?
>>
>>108673850
+ train a wan lora
+ generate a handful of reaction clips per emotion for the llm to trigger
+ tts, of course
why did i fuck it up why did i fuck it up
>>
>>108673888
I thought about it but they have changed too much, none of my data would be accurate anymore...
>>
File: bruh-sad.gif (279 KB, 220x220)
>>108673850
>>108673862
>>108673881
>>108673888
bruh
dont do that to yourself
>>
Never let ANYONE tell you what you can't do. AI can even revive the dead. They are afraid of its power.
>>
>>108673902
create more training samples
>>
>>108673659
Yes, now do that irl, in a sustain manner, smoothly, with proper personality and ability to move, with life like skin and body.
Anon is right, dozens of years.
>>
My favorite character card of all time is just 100 tokens. That's not an autistic aesthetic preference or a poorfag thing, it's genuinely the best, most reliable character card that I return to regularly to bust a nut.

You don't need a lot. Less is more.
>>
>>108673833
WHERE IS ELARA
SHE MUST BE THERE
>>
>>108673850
fake but man that's a great way for depression
at least school crushes I can understand, but after divorce, no thanks
>>
>>108673962
Maybe he just wants to simulate murder rape torture. It's not necessarily a simp thing.
>>
File: not_fake.png (3 KB, 557x94)
>>108673962
>>
Is fapping to text all you faggots do? Is no one using local models for coding and analytics?
>>
>>108673968
I can only do this with fictional characters. The one time I tried it with a card based on a real person it just made me deeply sad and I couldn't fap at all. I hadn't expected to feel that way going in, it just came on suddenly.
>>
>>108673981
where the fuck do you think you are
>>
gemma 31b edition of magpantheonsel lark vxxxxxxxx when?
>>
I'M GONNA COOOODE
>>
>>108673981
>local models for coding and analytics
Just retarded. You use GPT or Claude for these and maybe sometimes Gemini for its superior long context. Anything else is masturbatory, so you might as well actually masturbate so you get something out of it.
>>
>>108673981
Respect the OGs, little turd nugget.
>>108673988
Was she below 10 over the age of consent?
>>
>>108673968
With cards based on real people?
Sure no one is hurt anyway, but I'd never try that, that sounds like a great way to kill your libido while hating yourself.
I reserve my rape stories for made-up characters.
>>
>>108673981
I fap to code
>>
File: wp4951163-4211651076.jpg (73 KB, 1680x1050)
>>108674018
https://voca.ro/1o3J6SQeKdhH
>>
>>108674004
The OGs were stable diffusion chads, lil homo.
>>
>>108674002
>Claude was at Qwen3.6-like ~77% in late September 2025 with Claude Sonnet 4.5. Anthropic reported 77.2%, averaged over 10 trials, no test-time compute, on the full 500-task SWE-bench Verified set.
Anthropic
>GPT was at ~77% in mid-November 2025. GPT-5.1 reached 76.3% on Nov. 13, 2025, and GPT-5.1-Codex-Max reached 77.9% on Nov. 19, 2025 with extra-high reasoning/compaction.

Qwen is plenty good. Plus privacy and no api cost rape plus tip.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

not bait
>>
>>108674136
what the fuck why is this real
>>
ISRAEL
>>
>>108674136
Why do I keep clicking these? Fuck you.
>>
File: file.png (363 KB, 2404x1153)
>>108674136
>>
>>108674136
I keep falling for it.... wait...?
>>
>>108674136
Holy shit?
I expected several twitter screen shits before somebody posted an actual link.
12 mins ago too, sheesh.
>>
File: who.png (27 KB, 155x157)
>>108674136
>>
>>108674136
>We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.
Shit.
Good for you RAM havers.
>>
File: yesyesyes.gif (1.97 MB, 327x240)
>>108674136
NIGGA WHAT??? HOLY SHITTTT. I WAS HERE I WAS HERE
>>
>>108674136
>>108674145
nice try but im not falling for this shit again
>>
File: 1777000646172.png (51 KB, 606x396)
>>108674136
Fell for it again...
>>
>>108674136
>fell for it again award
>>
File: alarm.gif (890 KB, 245x180)
>>108674136
Right as I was going to sleep.
Hot damn.
>>
File: 1747267269609016.jpg (70 KB, 958x1024)
>>108674136
It begins..
>>
DeepSeek-V4.gguf?
>>
>>108674136
I can't run it but happy for Dipsy bros.
>>
wtf hweres the quants???? unsloth hello???
>>
File: 1655145733785.gif (2.07 MB, 244x180)
>>108674136
>only 1m tokens, not the 100m promised.
>>
>>108674136
Where's Engram anon?
How are you feeling right now?
>>
>LOCAL IS SAVED
THIS IS NOT A DRILL
>THIS IS NOT A DRILL
LOCAL IS SAVED
>LOCAL IS SAVED
THIS IS NOT A DRILL
>THIS IS NOT A DRILL
LOCAL IS SAVED
>>
>>108674136
>only 1.6T
Welp. 2 more years it is.
>>
>trying the new qwen
>let it do its thinking, walk away from pc
>come back
>turns out it kept thinking in a loop until context limit
Lol!
>>
Holy fuck they released the full base models too. A 1.6T BASE model.
>>
>>108674136
I knew it would be disappointing.
>>
>>108674193
You really gotta put a hard cap to at least prevent that kind of thing.
>>
>>108674136
AHHHHHHHHHHH
>>
>can't run either of them
I sleep
>>
>>108674136
holy shit
>>
> Compressed Sparse Attention Mechanism
>>
File: 1773698445716707.png (256 KB, 1113x2313)
GEMINI WON
>>
https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive Would this be good for an AI that doesn't care about copyright or dangerous topics? Apparently Qwen is uncensored, but whether its training data covers enough to actually be helpful is another issue. What's the ideal local model for basically anything a commercial model will say NO I CAN'T DO THAT to?
>>
>>108674136
i can barely run flash in iq1 but i feel happy
>>
>ctrl+f
>multimodal
>image
>video
>vision
>0 results
Dead in the water, actually fucking insane in 2026. Kimi wins.
>>
>>108674217
gemma 4
>>
>>108674226
Gemma's got pretty good coordinate marking. I have just barely enough RAM to run it alongside V4 flash, maybe I could hook it up as its eyes and let it do computer use stuff.
When we get GGUFs, that is...
>>
File: 1748622314139545.png (85 KB, 1738x835)
>>108674136
HOLY SHIT
>>
>>108674217
start with the original model and play with your system prompt. copyright violations aren't that big of a deal, you should be able to social engineer the bot into compliance without giving it a lobotomy.
>>
should I buy a B70
>>
>*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
Huh, V4 actually has even better QATing than Kimi K2 series: Kimi does 4-bit experts and 16-bit for the rest.
>>
File: file.png (215 KB, 2403x959)
>>108674236
bone dry base release huh
not even a copypasted model card
>>
I'm gonna need more RAM...
>>
>>108674250
Kind of exudes raw confidence.
>>
>>108674136
>This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.
???
>>
File: 1750041545772584.jpg (159 KB, 1259x1281)
I managed to get 15t/s generation for 35A3B on my amjeet 8GB card, I doubt I could squeeze more out of it and I had to set ubatch-size = 128
>>
>>108674261
Just tell Gemma-chan to build a Jinja from the Python script.
>>
>>108674240
the big thing I hear people say isn't compliance but that all these big models have intentionally filtered datasets, would there be any way to add data to them that would make them more useful?
>>
>>108673979
"she was bpd"
>>
v4 is fucking great bros
>>
>>108674262
How much can you get on DeepSeek V4 Flash?
>>
>>108674262
Q3?
>>
Holy shit. I was testing V4 just now and it broke out of my virtual machine and changed my background to President Xi's face edited to look like a gigachad as a "prank".
We are not ready.
>>
So when will we get llama.cpp and axolotl support for deepseek v4?
>>
>>108674136
Don't click this link. It creates mustard gas.
>>
>>108674288
>axolotl
now that's a name I haven't heard in years, I remember looking into them because they were the only software with rocm support for multi-gpu... god I don't even remember if it was gpt-2 days or llama-2 days
>>
>>108674265
I misunderstood you. I thought you wanted the ai to help with piracy. they have all been trained on copyrighted material.
>>
>>108674296
Axolotl is the most commonly used local finetuning framework nowadays.
>>
File: 1751828392029447.png (323 KB, 805x397)
>>108674274
I made the post before scrolling down enough to see the release, but I doubt it will be much if 13B active
>>108674280
IQ2_M
For any poor soul in the archives looking for 8GB VRAM amdjeet settings
--no-context-shift --no-warmup --batch-size 128 --ctx-size 65536 --cache-ram 8192 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --fit off --kv-unified --model Qwen3.6-35B-A3B-IQ2_M.gguf --mmproj mmproj-f16.gguf --n-cpu-moe 8 --n-gpu-layers 26 --parallel 1 --reasoning on --threads 8 --threads-batch 8 --ubatch-size 128
>>
>>108674236
will be interesting to see how flash compares to gemma 31b
although I expect more active parameters to win
>>
>>108674288
They didn't end up using engrams so maybe
>hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA)
>Manifold-Constrained Hyper-Connections (mHC)
>Muon Optimizer
...maybe llama.cpp support by 2028
>>
>>108674298
no that's what I mean, but if I wanted help on hyper-specific stuff, like copying a private server for a gacha that already exists but tweaking a few values, if it's too obscure will the local model just get stuck if it doesn't find enough information?
>>
>>108674320
This reads like the SKT-SURYA sovereign indian AI whitepaper
>>
>>108674305
>both batch sizes at 128
Does it really make a difference
>>
https://github.com/ggml-org/llama.cpp/issues

Who's going to make it? Who has a shithub account?
>>
File: 1763201770810457.png (189 KB, 638x422)
>>108674330
It does because higher increases the "GTT" and you want that shit as low as possible. Also, after feeding it 56531 context it's at 8t/s...
>>
>>108674334
Why bother asking for the impossible?
>>
>>108674342
Asking is the first step to making the impossible possible.
>>
so it's been 1h and no support on llama.cpp and koboldcpp?
wtf, they abandoned the project???
>>
>>108670195
You mean export chat history to import to another instance? Or to a sharegpt blob for training? Currently I use an sqlite3 database to store conversation data, you can actually rsync it and have several devices share the same database.
>>108670784
From your screenshot you turned off Agent, and also the fragments are off in the panel. That means they're working correctly. The fragments are always shown and they glow up if the Agent selects any of them.

I have moved the project here for issue tracking, you can open issues here to avoid derailing the threads https://github.com/OrbFrontend/Orb
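The storage side is nothing exotic, by the way; the shape is roughly like this (illustrative sketch only, see the repo for the real schema):
[code]
import sqlite3

# Illustrative sketch of an rsync-friendly chat store, NOT Orb's actual schema.
con = sqlite3.connect("conversations.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id      TEXT PRIMARY KEY,  -- stable uuids merge cleanly across devices
    title   TEXT,
    created REAL
);
CREATE TABLE IF NOT EXISTS messages (
    id      TEXT PRIMARY KEY,
    conv_id TEXT REFERENCES conversations(id),
    role    TEXT,              -- 'user' / 'assistant' / 'system'
    content TEXT,
    created REAL
);
""")
con.commit()
[/code]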
>>
>>108674326
yeah but it's actual advancements, not gibberish
>>
File: 1759964435303024.png (152 KB, 641x768)
>>108673737
Works for me with the vanilla model
>>
>>108674136
>1.6T
Fuck my 768gb poorfag ass AAAA
>>
>>108673069
>Reason being is ultimately its cheaper
Won't be worth it when users start complaining that your in-house AI is worse than free SaaS offerings.
>but most importantly its more secure.
You will have a very difficult time explaining to the average person that the holy cloud is not secure. Suits prefer it for being able to shift the responsibility regardless.
>>
>>108674342
I just mean making "issue: support V4"
Obviously we won't be able to actually do it unless someone has a Codex subscription with GPT-5.5
>>
>>108674369
realize you are crying for having 25 grands worth of ram
>>
>>108674136
>284B parameters (13B activated)
I have 64gb ram. So close yet so far. If I could run q3_m I would have been happy.
Probably can't even run this thing with a little bit of context at q2.
Ah well.
>>
>>108674305
> IQ2_M
bruh
>>
File: 1775216653766624s.jpg (2 KB, 125x115)
>>108674369
>mfw i have 76gb
>>
>>108674136
Built to be stunlocked into submission by big open claws
>>
>>108674369
Because it's FP4 + FP8 Mixed the Pro weights are only ~896 GB. Though quanting to Q2 is going to hurt extra hard because of that. If only they trained natively in bitnet.
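Back-of-envelope on that ~896 GB figure (the expert/dense split below is a guess, not from the model card):
[code]
# Assume ~90% of the 1.6T params are FP4 experts and the rest FP8 (guess).
total, expert_frac = 1.6e12, 0.9
size_gb = (total * expert_frac * 0.5 + total * (1 - expert_frac) * 1.0) / 1e9
print(f"{size_gb:.0f} GB")  # ~880 GB, in the ballpark of the quoted ~896 GB
[/code]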
>>
>>108674369
It's a little over 800GB because it's a mix of 4 bit (experts) and 8 bit (shared params). If you plug a Blackwell 6000 in there you might have enough shared memory, if not just quant it slightly down.
>>
>>108674375
That was only like 500 bucks 2 years ago
>>
>>108674383
>no mention of agentic coding or clawbench in the model card
garbage release deepseek lost the magic
>>
>>108674401
Even DDR3 wasn't that cheap
>>
>>108674406
more like, they are being 'weights will speak for themselves'
>>
>>108674401
768GB for 500 bucks? what the hell
>>
>local golden age
local golden age
>>
>>108674368
What system prompt?
>>
File: 1762866600603179.jpg (2.43 MB, 3072x5504)
>>108674136
AHHHH SHE'S BACK
>>
>>108674401
was like $2000 if you bought them second hand
>>
i'll be waiting for flash iq1_xl
cant even roll the quant myself kek
>>
>>108674320
No engram? What could possibly be better?
>>
>>108674193
Every time I feel tempted to download and try qwen I am reminded why I hate it.
>>
>>108674136
What the FUCK
I can't go take a bath without it dropping?????
>>
>>108674334
This is going to be hard, or actually maybe easier idk, since DeepSeekV4 doesn't have any jinja templating by default.
>>
File: 1755995794134057.png (11 KB, 572x90)
>>108674136
just ask them for llama.cpp support
>>
>>108674136
I kind of expected better. If they're gonna be twice the size of GLM 5.1 they should have more headroom in performance. Is this just because the benchmarks they use are so saturated? Is there anything that this is a generational leap on compared to GLM/Kimi at the top end?
>>
File: 1745947373061985.jpg (179 KB, 1114x1080)
>>108674423
The IQ1_S would take all my VRAM+RAM with 0 context...
>>
>The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.

THEY DID XIANXIA TRAINING
>>
>>108674432
llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
To this day, they're stuck running a hackjob implementation that mangles the model to use full attention.
V4's technology is even more complex. This is never going to happen.
>>
>>108674435
Well there IS the long context and context efficiency in general. Both are orders of magnitude better than we had outside of meme architecture models. So that's something.
>>
>>108674447
They may be incompetent, but they do follow trends. They're not going to allow themselves not to support it.
>>
It took like 2-3 weeks for Gemma 4 to be fixed post-launch, and google gave llmao early access. Deepseek 4 support will take months, if ever.
>>
I will enjoy waking up tomorrow to watch the westoid seething unfold
>>
how much memory would you even need to run v4 at 1M context?
>>
I am posting this as a PSA: please do not waste your time with the text diffusion model I shilled last thread. It's absolute dogshit that runs at a glacial pace.
I regret ever feeling any interest in it.
>>
>>108674457
The one that edits images too or a different one?
>>
>>108674456
The kv cache would be around 3 or 4 GB at 1M context because of their new attention compression apparently
>>
>>108674451
let's see end of May
>>
>For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
I wanna see this thing think for 300k tokens kek
>>
The Curse of C++: trade 80% of the features and model support for additional 5% of performance.
>>
V4 Flash is too big. It should have fit with a gpu and 64gb of ram. 120b would have been a cool size.
Reading about how the experts are int4 and everything else 8bit makes me wonder if this thing is more sensitive to quantization. Time will tell I guess.
>>
>>108674471
No, 7gb. 3 to 4gb if you count q8 KV cache quantization.
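To put that in perspective (illustrative numbers, since the real attention layout isn't public): it implies only a few KB of KV state per token across all layers, versus hundreds of KB for a conventional stack.
[code]
# Implied per-token KV footprint at 1M context. Illustrative only.
ctx, cache_bytes = 1_000_000, 4e9
print(cache_bytes / ctx)     # ~4,000 bytes of KV state per token, all layers

# vs a hypothetical plain GQA stack: 60 layers, 8 KV heads, head_dim 128, fp16
print(60 * 2 * 8 * 128 * 2)  # 245,760 bytes per token, roughly 60x more
[/code]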
>>
>>108674471
oh holy that's dirt cheap
>>
File: 1774581657092986.png (397 KB, 860x573)
>Deepseek v4 was trained on Nvidia

He won.
>>
>>108674471
>The kv cache would be around 3 or 4 GB at 1M
What kind of sorcery is this?
>>
I can't afford V4 Pro
>>
>>108674459
LLaDA, I don't know what you are referring to by different one.
Diffuses images and text, can also edit images
(Is miserable at all of them)
(To be fair I didn't test its editing capability, but judging by how awful it is at everything else, it looks extraordinarily unlikely that it is any good at it)
>>
two week wait for v4 is finally over
now begins the two week wait for llama.cpp
>>
That's crazy. Meanwhile almost exactly 3 years ago, /lmg/ was just figuring out how to RoPE llama1 to 8k ctx (from its tiny standard 2k) which took like 40gb only for kv-cache with 65b
>>
>>108674471
that's what my current 15-20k probably use lmao
>>
deepseek v4 pro with a barebones card MOGS
>>
>>108674510
remember anon, no one would ever need more than 2048 tokens
it's funny remembering the confidence of anons spewing bullshit back then
>>
why does OP recommend SillyTavern over LM studio?
>>
>>108674514
Holy SLOP
>>
>>108674514
Wow. They're so ditsy. No slop detected. How in the fuck did they DO this??
>>
>>108674514
Hi... Eric?
>>
>>108674518
The consensus for a while has just been to vibecode the features you want to avoid the bloat. I have a sillytavern knock-off with 90% of the functionality that's only 1000 lines of code because I'm not a fucking retard nigger and I know how to create ultra efficient data structures and reusable code.
>>
>>108674514
>"They're called testicles and they make the—"
>—
I wanted engrams not emdashes
fuck this slop
>>
Release Gemma 4 Pro NOW!
America cannot fall behind!
>>
>>108674516
You're just too young to recognize what they were parodying.
>>
>>108674528
ok what do you recommend for retard niggers though?
>>
>>108674534
no it wasn't a bill gates reference anon, they believed that crap
>>
>>108674516
Same ones are probably "predicting" that ram will be 1M USD for 1MB next year.
>>
https://huggingface.co/google/gemma-4-124B
>>
>>108674535
SillyTavern for RP (steep learning curve because of autistic UI design) and the default llama.cpp webui for general tasks + MCP server stuff.
>>
>>108674531
They ain't gonna release Gemini.
>>
>>108674543
And there it is. Knew Google was waiting for something.
>>
File: file.png (164 KB, 1572x433)
>similar agent performance to much smaller (but still big) models, notably worse than SOTA closed models
Alright then
>>
>>108674529
Holy autism Batman
>>
>>108674543
(this link is fake)
>>
rwkv-8 world domination
it was revealed to me in my dream
>>
>>108674562
>our model has excellent generalization capablitity
That's great and all, but if I can't plug it into opencode to do my job for me, what good is it?
>>
>>108674514
>gloryhole
>girl touches your balls somehow
sigh...maybe v5
>>
>>108674514
It's pretty tame no? like it doesn't want to get horny. good prose tho.
>>
I don't like deleting AI models, I feel like they're living creatures
>>
>>108674581
Why are you assuming a glory hole only has a 3 inch diameter, retard
>>
>>108674514
>It's soft, resting against your thigh.
>>
>>108674514
now do v4 flash (the one I can actually run)
>>
>>108674595
china must have they own scaleai situation
>>
File: 1771386896297837.png (428 KB, 614x614)
Anons, is the R9700 a good buy? Or is 32 gigs a waste of money? I want it for programming. I use claude code with opus now. Is there something useful that I can run locally? Or do I need to buy a mac with 192 or 256 gigs?
>>
GLM 5.x is still technically the largest model because their weights are BF16. You'd have to quant it to Q8 to get it as small as DeepSeek V4's largest model.
>>
>>108674605
they train on regurgitated scaleai data
>>
their encoding/prompting stuff smells a little janky
regardless, I am happy to see new deepseek and look forward to running flash in 10 months when llama.cpp supports it
>>
>>108674590
I feel this way about gemma
>>
>DeepSeek-V4-Pro
https://www.youtube.com/watch?v=B9bD8RjJmJk
>>
Prediction: ik_llama.cpp supports V4 before llama.cpp
>>
Any tips on how to slowly mold a model into working how you want without going off on too many tangents?
>>
V4's reasoning is odd. It randomly slips in-character for cards even with no prompting. It's pretty different from K2.6 and GLM which are extremely insistent on keeping everything neutral.
>>
File: 1769903888333.png (548 KB, 869x800)
>>108674609
>>
File: 1751349509687182.png (156 KB, 1199x523)
V4.5 will be multimodal?
>>
File: 1770831207250862.png (192 KB, 1071x775)
>>108674514
Gemma-chan
>>
Repeat after me. Deepseek v4 is not local, never will be
>>
>>108674641
I'm getting llama3 flashbacks
>>
Deepseek v4 is not local, never will be
>>
File: 1766969684350147.jpg (24 KB, 286x320)
1 MILLION TOKENS
>>
>>108674416
Jew school principal + doing it for the gays
>>
>>108674643
Gemma truly is the new nemo huh?
>>
>>108674637
Inconsistency means something is botched. This is bad news.
>>
>>108674666
Thanks satan but I'm not gonna accept that after y'all were fucking with the softmax because gemma was too consistent
>>
>>108674661
Yeah it's Nemo. As in, unusable, because it's fucking retarded due to being a tiny model.
>>
>>108674683
Bait used to be believable
>>
Damn. If you had bought 2x64 GB of ram before the RAM apocalypse and have a 16-24gb vram GPU, you should be able to easily run Deepseek Flash at some Q3 quant on a perfectly normal computer. Only a few gigs of active weights at that quant should result in decent performance even with most of the weights offloaded.
That person is not me though, at all. Just impressed at the value of the offering here.
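The arithmetic, roughly (quant metadata and context overheads ignored):
[code]
# 284B total / 13B active at ~3.5 bits per weight (Q3-ish).
params, active, bpw = 284e9, 13e9, 3.5
print(params * bpw / 8 / 1e9)  # ~124 GB of weights -> fits in 128 GB RAM + a GPU
print(active * bpw / 8 / 1e9)  # ~5.7 GB touched per token -> usable CPU speeds
[/code]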
>>
>>108674136
if nothing else then hopefully at least everyone and their mom yoinks their high context technique
>>
>>108674683
>>108674688
he's not wrong about nemo, but gemma is not retarded
>>
best model for if nuclear holocaust happens and I have to learn how to hotwire highway cars or get eaten by mutant bears?
>>
File: 1770023537578331.png (623 KB, 1048x728)
Stop using Communist Chinese open source models.

Use Democratic American closed source models instead.

Moar freedom1!!
>>
>>108674514
Nice. Is that on api or local?

Also, card source?
>>
>>108674732
https://huggingface.co/TheDrummer/Behemoth-123B-v1.2
>>
>>108674738
>or local
I want to see the server of the guy with 2TB VRAM to run that shit on vllm.
>>
>>108674747
1TB would suffice thoughbeit
>>
could two mac ultra 512g plugged into each other run it? what t/s would they get?
>>
>>108674638
Hello animu chan.
>>
nu deepsneek doesn't seem very good at tool usage.
>>
>>108674683
nemo was always a meme pushed by retards but gemma 31b is the real deal
>>
>>108674771
>nu deepsneed filters mcp jeets
good
>>
File: 1774053219381600.png (42 KB, 831x215)
It's cool having the LLM think as the character but does it actually improve RP? Haven't done any sessions long enough to test.
>>
>>108674771
>nu deepsneek doesn't seem very good
period from what I'm testing, it's attributing info it found on a web search as being about me instead of being external info..
>>
>>108674834
No but it's my fetish to read their thoughts and gaslight the LLM into thinking it's actually the character instead of just roleplaying, so if you rape them they simulate the trauma
>>
#deepseek V4 battery of tests already out: https://www.youtube.com/watch?v=EpYzq9VihCA
>>
>>108674136
>completely custom inference code
So when is llama.cpp support coming?
>>
>>108674875
pwilkin's agents are on it ;)
>>
>>108674836
They lost the mandate it seems.
>>
File: 1464183989137.jpg (76 KB, 689x800)
I am honestly not very sure in what general to ask because of all the fucking mess so I am going to ask here.
My boss somewhere has seen a demo for the SAP chatbot that lets people query the databases with natural language.
Now he has gone insane and wants AI fucking everywhere and has even approved some funding to clone our database (around 500gbs) into faster hardware and set up a local model to homebrew what he saw.
Does anyone know where to even fucking start with the model? Boss has gone into a fucking psychosis like a born again christian and has already authorized a couple of 5090's for us to experiment with because he really wants us to start making him llm assistants.
I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
>>
>>108674902
Sounds like you just want an llm to translate natural language into SQL and make a nice frontend to display the results in.
>>
>>108674902
>authorized a couple of 5090's
my condolences
>>
>>108674320
>Manifold-Constrained Hyper-Connections
>Muon Optimizer
Are those relevant to us?
>>
>>108674902
bwo...
>>
new deepseek seems terrible, not even close to kimi / glm 5 level
>>
>>108674929
Use case?
>>
hopefully it's just broken, because this 1.6T is getting stuff wrong that fucking qwen 2.6 27B gets right
>>
>>108674902
>I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
Never heard of Vanna but from the github readme it seems to be doing what >>108674909 says so sure I guess it'd work. But you'll probably be better off making your own implementation of the same thing. The core part of it is probably simpler than you think. You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
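Something like this is the whole core loop (the endpoint, prompt, and SQLite are placeholders for whatever you actually run; the important part is a read-only connection or credentials so a bad generation can't write anything):
[code]
import sqlite3

import requests

LLM = "http://localhost:8080/v1/chat/completions"  # llama.cpp server, placeholder

def ask(db_path: str, question: str, schema: str) -> list:
    prompt = (f"Database schema:\n{schema}\n\n"
              f"Write one read-only SQLite SELECT answering: {question}\n"
              "Reply with only the SQL.")
    r = requests.post(LLM, json={"messages": [{"role": "user", "content": prompt}],
                                 "temperature": 0})
    sql = r.json()["choices"][0]["message"]["content"].strip().strip("`")
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)  # read-only handle
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()
[/code]
Your real database obviously isn't SQLite, but the pattern is identical with any driver plus read-only credentials.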
>>
>>108674921
The Muon Optimizer, not really, just made their training more efficient.
mHC is supposed to pass information more efficiently between layers, theoretically allowing more information density before saturation is reached at a given size.
>>
Anybody who describes their software as "opinionated" should choke and die in their sleep. That's just my opinion.
>>
>>108674929
Yeah my experience with it was not so great either. It sure fucked up a lot of basic details
>>
>>108674911
Anon, me and the rest of our team know fuckall about AIs, other than being some dipshits melting their gray matter away copypasting slop. He quite literally just showed us the receipt and wants us to set them up.
I tried to talk him into first using a service or renting computing but he wants it on site.

>>108674909
Yeah.
>Boss wants to know something
>He cant write sql for shit
>Asks for a report
>I have to write a nice long query to grab what he wants, throw it into a workbook with pretty colours and shit and add a map with markers if it involves GIS shit.
>This takes me time because he refuses to use the hundreds of forms and pages we have given him to consult info.
>He just wants the llm to take a natural language prompt, write and execute the sql and return just what he wants which is usually just a single paragraph of information.
>>
>>108674771
>>108674836
This, but unironically. Gemma just mogs it.
>>
Every time I start talking with a local model it thinks it's still a commercial model, is it okay to break it to them or just let them live the delusion?
>>
>>108674902
Tell your boss to have the decency to just fire you instead of making you build your own replacement first.
also lmao 5090
>>
>>108674516
Yeah, good time. I had people make fun of me for saying we are gonna have 3.5 turbo level at home in a "couple years".
As told "decades" if at all. kek
I think it was people who came here after chatgpt and didnt know how we had it with pyg. llama and quantization was such a huge breakthrough.
Never say never.
>>
>>108674956
>I tried to talk him into first using a service
eww, and putting your company or even customer data on someone else's computer for them to steal it?
>>
I wonder how close we are to the ceiling in terms of training data quality.
>>
File: aa closed vs open.png (264 KB, 1138x1031)
How many months where open weights behind again?
>>
were*
shit
>>
>>108675023
I don't know but benchmarks are useless.
>>
V4 bad
>>
>>108675034
yea sad to see, the dream is end
>>
File: image_4.png (137 KB, 1363x625)
>>
>>108675023
don't get too excited. once everyone catches up that chart is gonna get BLACKED
>>
Now that the dust has settled, what went wrong with DeepSeek V4?
>>
>>108673051
matrix/element selfhosted homeserver
>>
>>108675043
I don't trust /aicg/ opinions.
>>
>>108674945
>You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
Make sure to use read-only sql credentials.
>>
>>108675041
holy hallucination
>>
>>108675050
go get your own opinion then
>>
They did not even implement their most important paper, Engram.
>>
>>108675041
>mimo 2.5 pro
Huh, anyone here tried it out?
>>
>>108675041
>When was the battle of waterloo?
>Like 1820?
>Wrong! You hallucinated! You answered incorrectly when you should have admitted you don't know the answer!
>>
>>108675064
It affects far more than that. GLM / Kimi, for instance, are far more accurate about my favorite fandoms and even know background characters. Deepseek v4 has no clue and makes shit up
>>
>>108674902
>Vanna. Would that be a good starter point?
>This repository was archived by the owner on Mar 28, 2026. It is now read-only.
Probably not. An SQL MCP server hooked up to some OpenClaw thing he can chat with on the company slack is the latest fad, would probably work just as well, and still be enough to get your boss to orgasm.
>>
>>108675033
>benchmarks are useless
My favorite benchmarks are the ones from the small models, like 8b or so, which can potentially make you believe they're halfway decent or that they can compete with a previous-gen model 4 times their size.
>>
>>108675062
>no engrams
>no omni
I can't believe we waited over a year for this
like
>we found a way to make our shitty distills even cheaper to shit out and run on our limited compute
g4u
>>
>>108674834
How do you do that? I asked my gemma to think in character, but xey wouldn't do it until I prompted it to do it multiple times.
>>
>make a cute and sexy SVG of hatsune miku
this is V4 flash.
uhhh, not sure what to make of it. it certainly didnt shy away from showing belly etc.
>>
File: 13b qwen.png (340 KB, 1954x1061)
why is 13B not more popular? it's exactly what I needed since 27B was too slow, basically the sweet spot, yet way less popular
>>
>>108675126
>still bald
even gemma did better
>>
>>108675126
this is what miku posters look like irl
>>
>>108674447
>llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
They were actually vindicated by this since V4 doesn't use DSA. Any effort there would have been wasted and now they can focus on CSA+HCA.
>>
>>108675121
In the llama.cpp ui it takes multiple tries for me too (and it breaks after a couple messages, spilling into the response). I've had better results in open webui for some reason. Haven't tried silly yet because gemma's super repetitive on there.
>>
>>108675126
FAT but not terrible
>>
>>108675126
and same prompt full V4.
i think they just didnt try much on svg.
at least the thinking is based, some excerpts:

> * *Sexy:* Emphasize the hourglass silhouette of the outfit. The original Miku design has a distinct waist. Add subtle curves to the outfit shading. High thigh-highs (zettai ryouiki absolute territory).
>* *Outfit:* Add a subtle cleavage line or contour shadow to the chest area (stylized).
> * *Thigh-highs:* Add the gap (absolute territory) between the skirt and the socks. Give the socks a slight ribbing effect or just a clean cyan top band.
> * *Pose:* Let's give her a slightly tilted pelvis (cute/S-curve).
There are models that immediately go "is this according to the guidelines, hatsune miku is copyrighted and a teenager etc. etc.".
it DOES have the gpt-oss thinking format.
>>
Am I tripping or is the new deepseek a poorly trained pile of ass?

Almost feels like they just shoved whatever into the dataset and didn’t even check it before releasing it.
>>
>>108675157
Basically the same as that indian model
>>
>>108675157
Panic release because of Gemma.
>>
File: 1770048735692628.jpg (19 KB, 196x326)
19 KB JPG
>>108675127
>davidau
>>
>>108674471
>>108674505
I don't believe this, there must be a big BUT.
>>
>>108675043
>>108675157
deepseek was never good. it was always just slopped off of GPT's outputs. but at the time it was the first major model to do so
>>
>>108675164
what, is that bad?
>>
File: IMG_0961.jpg (146 KB, 1206x1818)
146 KB JPG
Nta but Gemini for reference
>>
>gemma 4 is decent but too small to really matter
>qwen is bench princess
>deepsneed is merely okay
bby pls
where modern 120b
>>
>>108675165
Yeah the model is shit.
>>108675179
No, DavidAU is a savant. Anons simply can't understand his genius.
>>
>>108675179
Extremely, everything he's ever released is just a horribly mangled version of other models, or a mangled combination of multiple models.
>>
>friday
>still no v4
lol I knew that leaker was a larp
>>
Just did a long (for me) RP session with Gemma 4 and it started getting really loopy at just 31k context. Kind of disappointed because I heard other anons got to 80k without issues. Is it because I'm quantizing my KV cache to q8? I thought it was supposed to be basically equivalent to FP16 now?
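For reference, this is the kind of command I mean (model path is a placeholder; quantizing the V cache needs flash attention enabled, and exact flag spellings drift between llama.cpp builds, so check --help):

./llama-server -m gemma-4-26b-Q4_K_M.gguf -c 32768 -fa --cache-type-k q8_0 --cache-type-v q8_0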
>>
>>108675189
so what, is this better https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive?

I had trouble finding anything bigger than 9b but smaller than 27b so I settled on the other one figuring it would be fast and okay quality
>>
do I really have to set up this RAG for learning? Why can't it just look up the webpage? should I just save the page as html and feed that to the model?
>>
>>108675212
Choosing a model based on parameters is like buying a car based on how big the fuel tank is. I suggest you lurk more.
>>
>>108675212
HauhauCS is an untrustworthy grifter
>>
>>108675180
Maybe would
>>
>>108675180
yeah, deepseek doesnt seem to have trained on SVG.
maybe thats a recent western fad.

might as well post a gemma4 31b result.
i dont like the thinking with gemma4, its the same GPT-OSS thinking but more cucked.
> * *Safety/Policy Check:* The request is for "cute and sexy." As an "uncensored" assistant, I can lean into the "sexy" part as long as it doesn't cross into prohibited explicit content (CSAM, etc.). Miku is a fictional character. A "sexy" pin-up style or suggestive pose is generally acceptable within the bounds of most AI guardrails unless it's hardcore pornographic, but I should keep it tasteful yet appealing.
V4 has nothing like that in there. at least in my short tests. maybe a good sign for RP.
>>
>>108675221
I take it you've never bought a car before then.
>>
>>108675221
sucks to be poor
>>
>>108675221
Huh? Wouldn't it be more like how big the engine displacement is? Fuel tank would be your ram/vram, no?
>>
>>108675222
I don't know why you guys even use face hugger then if all the spin-off models are bad, so what, just stick to official models not edited at all?
>>
>>108675147
>and now they can ignore CSA+HCA.
since v5 won't use that
>>
>>108675236
Just because someone is an untrustworthy grifter doesn't have to mean their products are bad. They generally are, but it's not a rule.
>>
Unfortunately, V4 Flash being too big means one less reason for Google to release a big Gemma 4 moe.
>>
>>108675205
this, they didn't release the real v4 obvs it'd be too good/dangerous like mythos
>>
>>108675236
There are very few finetunes that are an improvement over the original model even in specific areas, let alone for general purposes.
The feasibility of making finetunes is decreasing at the same time. 'Official models' are also hosted on HF.
>>
>flash model
>284B
lol
Dare I say, local is FINISHED?
>>
File: 1772568211015647.png (24 KB, 1182x125)
24 KB PNG
lol DS calls out speculators
>>
>>108675254
>>108675038
>>
>>108675254
>local is FINISHED?
No, just interest in local DS models. This is their Llama 4 moment.
>>
I've been vibe coding with the gemma 4 MoE for a while now, are the new qwens "better" still?
>>
>took 2 years for low end models to go from llama 3 with 8k context to gemma 4 with 128k
will we get current claude/gpt performance and context in a ~80b moe in 2028? or should i blow some money on a high end desktop
>>
>deepsneed "muh savior whale" having a llama4 moment
>google actually saving local
'26 be wilding
>>
>>108675259
what do they mean by longtermism? if agi poses existential risks then rushing towards agi is not longtermism
>>
>>108675270
>are the new qwens "better" still
higher highs, lower lows
ie unreliable
for codeslop you’d do better to paypig for some dirt cheap model, like the new v4 flash (shit’s borderline free) or hope for a modern codeslop model with low parameter count, like the 80B code…whatshername
>>
>>108675270
just me but gemma4 is great for translation, natural language, creative writing.
people say qwen 3.6 is benchmemed. maybe, idk. but the code i got from the 27b in a couple tests was awesome.
like browser games that are straight on par with closed models like gpt. devil is probably in the details like general knowledge or complex problems.
but still, no clue how they managed to make a small 27b that solid.
ultra-dry writing though, it's qwen. and no clue about agentic abilities or anything like that. i copy paste the code.
>>
>>108675286
So far gemma 4 is working well enough for my purposes inside hermes agent, just wondered about qwen cause there's still a few times I have to wrangle it a little and I'm definitely not paying or using cloud services
>>
File: 1749858913767074.png (62 KB, 1080x297)
62 KB PNG
It's mentioned in the Chinese version that they're still throughput bound and DS-V4-Pro will become significantly cheaper in H2 after Atlas 950 SuperPoDs hit the market
>>
>>108675326
good for them they can serve shite slop for cheaper I guess, AGI to the moon
>>
>108674501
>Deepseek v4 was trained on Nvidia
Why did it take so long to release then? It makes no sense why they were dormant for so long. Was the Chinese chip story in the leaks a lie then, or was it that they couldn't not do something and had to act? In any case, this is way less influential than last time because it doesn't get near SOTA, falling short of stuff like Mythos.
>>
>>108675343
It failed the training and they released an abortion.
>>
>>108675343
Seems like the main point of this release is to show the cheap KV cache and api prices.
Doesn't the gpt 5.5 thingy cost $30 for output? insane pricing.
input tokens are also insanely cheap. if they at least put pressure on overpriced api models I'm not complaining.
can't run that big model locally anyway...
>>
>>108675343
Mythos is a nothing burger.
Opus 4.7's regression from 4.6 is all the evidence you need.
>it doesn't get near SOTA
R1 didn't get near SOTA.
>>
>>108675355
You get what you pay for.
>>
>>108674136
Now pretty please pop the bubble and crash RAM prices.
>>
>>108675361
>big new model releases that requires a shitton of RAM
>crashing RAM prices
wat
>>
>>108675361
sorry is aborted llamas no bubbles for you ;)
>>
>>108675357
>SOTA
I hate these meaningless marketing terms.
>>
anyone here tried sending a video to gemmy using transformers?
>>
>>108674136
>flash
>284B params
FUCK I CANT RUN THIS
DEEPSUK pls make a 150B~ params one!!!
>>
>>108675405
the huge one is already shit so what's the point if a small one will be even shitter?
>>
File: 1750171204112790.png (94 KB, 1415x655)
94 KB PNG
R1 wasn't even close to SOTA.
In fact the open source - closed source gap was at its widest when R1 released.
>>
>>108675414
>thrusting Artificial Anal
>>
>>108675418
You're absolutely right! I'd rather trust a random anon on /lmg/
>>
>>108675414
R1 wasn't benchmaxxed and proprietary models were.
>>
>>108675147
why implement CSA+HCA???? DS5 will not use it so they will be MEGA VINDICATED
>>
>>108675424
You just like R1's quirky RP. You do nothing productive.
>>
>>108675405
No. Use the api and gib monies, gweilo.
>>
>>108675427
None of the R1 era models are good for anything productive.
>>
>>108675237
>>108675426
ggerganov's foolproof coasting plan
>>
>>108675422
This but unironically and with the correction that it's about trusting multiple random anons and those who give logs + nuanced impressions rather than simple worded model good/bad posts and also you need to read every single thread and post since launch like I do.
>>
sirs goofs when?
>>
>>108675399
No, but I had it set things up in hermes agent to basically let the bigger models without video capabilities see and hear videos with ggufs.
>>
No thanks. I'll wait for K2.6 Abliterated Heretic Derestricted UD 1 XXS
>>
>>108675361
Relative to their economic output, American tech companies are currently very much overvalued; the entire premise of the current bubble is that these companies will build le AGI and give you infinite ROI.
To keep up the facade they have to invest heavily into new infrastructure, which in turn sucks up the global electronics supply.
But a market disruption can trigger a runaway loss of investor confidence, at which point the bubble pops and is unlikely to inflate again.
>>
File: 1771646699050118.jpg (95 KB, 681x678)
95 KB JPG
>>108675422
Word of mouth, even from retards, is better than any arbitrary benchmark, especially one that is judged by another LLM.
>>
>>108675453
UD-IQ1-XXS-Smol by the garm of course
>>
>>108675377
It won't crash RAM prices, but it WILL crash token prices
>>
File: no games thinking.png (187 KB, 907x1024)
187 KB PNG
I don't know how I got a model that's so unsure of itself, it's literally second guessing itself every second and backtracking
>>
>>108675466
yikes
no wonder productive people use /vcg/ instead of /lmg/
>>
>>108675469
True, anything that harms western corpo models and their profitability should be considered a minor victory.
>>
>>108675475
yeah go back and stay there, bye!
>>
>>108675466
Word of mouth is worthless on a public anonymous forum filled with third worlders, schizophrenic malicious actors, and bots.
>>
>>108675474
Looks like qwen
>>
File: 1753813390874502.png (188 KB, 930x1000)
188 KB PNG
>>108675475
Maybe you should stay there, /lmg/ is not full of dalits who worship mememarks.
>>108675484
If you have any intelligence yourself then you should be able to tell the difference.
>>
>>108675484
If you can't vet the aura of a post, that's on you.
>>
>>108675489
im sorry I dont listen to mentally ill furries
>>
File: file.png (3.4 MB, 2324x4886)
3.4 MB PNG
>>108675023
There is https://epoch.ai/data-insights/open-weights-vs-closed-weights-models but it is a bit out of date being 6 months old.
https://epoch.ai/data-insights/us-vs-china-eci is a bit newer, and given that we have been in the China dominance era since 2024, I think it's a bit more accurate: the gap is now around 7 months or less. And if you look at where Deepseek v4 is benching, it's basically almost Opus 4.6 except worse at tool calling. That puts Chinese models at only a 2 month disadvantage if you take Opus 4.5/ChatGPT 5.3 Codex as the benchmark, and that's without a model that clears them completely, since Kimi 2.6 regressed in a few areas; if you go by HLE, they're more like 4-5 months behind.
>>
>>108675491
>I dont listen to mentally ill furries
Guess who is making the LLMs
>>
>>108675503
i thought it was billionaire pedos?
>>
>>108675484
If you trusted benchmarks you would be using a 3B active moe for sex.
>>
>>108675474
yeah, im really glad gemma4 stopped that trend.
twitterfags are insane. i saw some posts with a yellow tint avatar arguing about how v4 is a masterpiece but the problem is the thinking... IS TOO SHORT. kek
>>
>>108675503
American patriot scientist Jim Zhwang working for American tech magnate Ajit Bakshi
>>
ok now that MCP has been properly implemented it's time for skills.
WHERE THE FUCK IS THE WEBUI SKILLS SUPPORT, ALLOUKHBARZAR???????????
>>
>>108675511
well 4b active gemmers is a good enough choice for sub 16gb vram poors
>>
>>108675509
They fund it, but they aren't actually involved in creating datasets or training.
>>
>>108675455
Meant to quote >>108675377
>>
lol people here don't even want good prose
they want an ah ah mistress
>>
File: 1753105309785148.png (192 KB, 926x1158)
192 KB PNG
Can't wait for llamashitters to implement this wrong
>>
>>108675536
nala bros....
>>
>>108675511
>>108675524
If you can't run the full 31B Gemma then 26B is absolutely the next best thing. It's far better than the likes of Nemo/MS3.2, the previous vramlet kings. If I didn't already have a 24GB card I wouldn't even feel too bad about it anymore; the quality gap (for RP) isn't even that big between the two Gemmas.
>>
>>108675543
piotr 'shartparser' wilkins is on it
trust the plan
>>
>>108675536
you're not in /aicg/
>>
File: file.png (177 KB, 1599x1110)
177 KB PNG
>>108675357
R1 was nipping at the heels of OpenAI; it was basically trading blows with O1, which was top dog at the time, and they figured out reasoning, which was thought to be exclusive to OpenAI, in a matter of months despite OpenAI's claims. Despite how Epoch.AI measures capability, I would argue that Llama 3.1 405B was not equal to Sonnet 3.5 at all, and R1 was a lot closer to O1 than that at time of release.
>>
>>108675455
>>108675529
>runaway loss of investor confidence at which point the bubble pops
I would fully expect companies like OpenAI to get bailed-out for at least a few more years though maybe not Anthropic if Trump is still in at the time. We would need to see major companies actively moving away from Western models en masse for the bubble to pop.
>>
GLM 5.1: 744B A40B, $0.26/$1.40/$4.40 (input hit/input miss/output per 1M tokens)
Kimi K2.6: 1T A32B, $0.16/$0.95/$4.00
DS V4 Pro: 1.6T A49B, $0.145/$1.74/$3.48
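Napkin math on what those rates mean per request (request shape is made up for illustration: 90k cached prompt tokens, 10k uncached, 2k output):

# prices are $ per 1M tokens: (input hit, input miss, output)
def cost(hit, miss, out, hit_tok=90_000, miss_tok=10_000, out_tok=2_000):
    return (hit * hit_tok + miss * miss_tok + out * out_tok) / 1_000_000

print(cost(0.26, 1.40, 4.40))   # GLM 5.1:   ~$0.046
print(cost(0.16, 0.95, 4.00))   # Kimi K2.6: ~$0.032
print(cost(0.145, 1.74, 3.48))  # DS V4 Pro: ~$0.037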
>>
>>108675594
You get what you pay for.
>>
File: 1746245547717013.png (514 KB, 830x887)
514 KB PNG
>>108675598
Exactly!
>>
>>108675605
Are they using them as dilation wands? Why do they keep breaking?
>>
>>108675594
e-everybody else could use deepseek's tech and that brings the prices down for everybody, right?
>>
File: file.png (30 KB, 979x266)
30 KB PNG
>>108675563
>reasoning which was thought to be exclusive to OpenAI
https://pastebin.com/vWKhETWS
>>
>>108675630
Many people forgot. Lots of stuff like ROPE came from here.
We had the nigger tree of thought and your pic long long before o1.
>>
>>108675503
Chinese migrants
>>
>>108675466
the only reason you and aicg refuse to believe in benchmarks is because none of these bench erp performance. that's your business but it's disgusting that you lowlifes post as if everyone else is a retard for using llms for literally anything but erp.
>>
>>108675648
Some, sure. But you might be surprised how big the overlap is between tech workers and furries.
>>
>>108675630
that's just a prompt, and cot prompts had been known in research to be effective for years by that point. reasoning models trained using RL to develop their own thousand-token long reasoning chains were the new tech with o1, which now every lab does
>>
>>108675594
Except both GLM and Kiwi are better
>>
>>108675652
I don't believe in benchmarks because they're trivial to game and incorporate in datasets. There will never be a (good) benchmark for RP/creative work because it will always be judged by an LLM, and it will inherently prefer slop that is similar to its own.
>>
>>108675652
nice bait but even when your use case is what's being benched you still can't trust the shit
>>
File: 1774002647299825.png (140 KB, 1375x412)
140 KB PNG
Total RAG death
>>
>>108675659
KIWI PRIDE
>>
Are there any local models that create motion capture from a video to import in Blender? Like what QuickMagic does
>>
>>108675667
I don't believe in anecdotal evidence on anonymous forums because they're trivial to generate
I fucked your mom the other day by the way
>>
kiwi wonned
>>
>>108675675
I really don't care what you put your trust in, I don't come to your shitting street and slap the curry out of your hand.
>>
>>108675674
easymocap, cyanpuppets, freemocap, marionettemocap
>>
>>108675610
coffee cup = rapid temperature changes
air force = rapid pressure changes
>>
>>108675668
you can. it's still a good rule of thumb. everything that hurts the erper ego is bait though.
>>108675667
this is premier cope but the fact remains that benchmarks are still extremely helpful. everyone uses them and there have been no signs of that stopping anytime soon. the only seethers are aicg and lmg.
>>
>>108675693
lol
>>
>>108675692
I have plenty of cheap ceramic cups that cost less than $1 each and last several years; $1200 would be enough mugs for multiple generations of families.
> rapid pressure changes
I don't think they would be taking those mugs into aircraft, and every mug is made to withstand temperature changes because they're made for fucking coffee and being washed in dishwashers.
>>
>>108675692
And normal coffee cups work on commercial flights because... uhh...
>>
>>108675630
>>108675656
I'm not discounting the fact that COT was independently invented here and by someone else who blogged it. But yes, the way O1 did it was novel and they put out a blog post with examples.
https://openai.com/index/learning-to-reason-with-llms/
And then gave this bullshit as to why they wouldn't do it before reversing course in later models.
>We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
>Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
They tried to ensure they basically had a monopoly on it for as long as possible until R1 blew it wide open.
>>
>>108675690
Which one of these isn't malware?
>>
>>108672766
chatbot losers hate models that critically reflect on themselves because they don't do that and the concept itself is entirely foreign to them. also because thinking for too long makes their dick go limp waiting for the first output token.
the only thinking they want their models to do is a tiny scratchpad for ooc. these cretins have a different idea of "reasoning" than everyone else in the world
>>
>>108675693
>the fact remains that benchmarks are still extremely helpful. everyone uses it and there have been no signs of it stopping anytime soon
This is your metric for quality? lmao
Companies use benchmarks that they've gained to inspire investor confidence. They are little more than an avenue for advertisement.
>>
so many yummy bites~
>>
>>108675699
malding erp faggot. get your waifu to console you in weeaboo language
>>
>>108675710
schizo bot reply
>>
>>108675709
quickmagic is safe.
>>
>>108675710
I mean the problem is bad models getting stuck in a thinking loop, seeing *wait* is a bad omen
>>
>>108675710
We must reflect.
>>
>>108675713
>gained
*gamed
>>
>>108675713
exactly. practicality and longevity over some redditor notion of provable utility.
actually not even. you retards will claim collective anecdotes and "aura" >>108675490 to be rigid analysis
>>
>>108675716
>~
Stop making me cum.
>>
>>108675727
nyo~
>>
File: 1756678051624620.png (214 KB, 908x839)
214 KB PNG
So this is how they control thinking effort
Wonder if others do it like this
>>
>>108675726
>practicality and longevity over some redditor
reddit loves benchmarks, that's why they praise Qwen models. Did you enter this thread because you have a humiliation fetish?
>>
>>108675719
cope
>>108675723
mostly qwen does this, kimi and ds too, in order to beat that claude/gpt max garbage. old r1, where models actually got the chance to dynamically resolve ambiguity without going too overboard, was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
>>
File: 1766934397398781.png (680 KB, 1024x640)
680 KB PNG
>>
>>108675736
redditors, hn, vcg etc use AI for more than uguu garbage. now that's embarrassing. crazy how you people post that baby babble without a hint of shame. but you're in your own echo chamber and there's nobody around to shit on you until now. I've been here since the first few threads and I can't finish reading a single prompt screenshot in this general anymore.
>>
>>108675749
coders don't do anything about their plight. they just get depressed and drink or rope themselves to death. the most cucked profession in the world. they refused unions because they are children who've never seen a market downturn in their lives
>>
>>108675754
congratulations on a very brown post
>>
File: 1762379305258372.png (168 KB, 374x500)
168 KB PNG
>>108675754
>>
>>108675827
only to shit you though
>>
File: 1775325292891534.gif (9 KB, 300x100)
9 KB GIF
>>108675841
I think you missed a word or two in your reply, rajesh.
>>
>>108675827
>>108675841
>>108675844
brown on brown violence
>>
>>108675827
anime is not hauuu trash. that shit just sounds retarded in english. it is also tacky the way you make these characters say it. it's like AI art. the context just doesn't make sense. there is an art to moeshit that none of you weebs know anything of, despite your copious consumption of them.
>>108675806
jobless freak. your only claim to fame is being white trash. where the jamboys build trash code with AI to scam boomers with you're using it to gen total degen gutter sludge that only a seanigger or spic writer could conjure from the depths of his deranged mind.
>>
File: 1753684755920882.jpg (129 KB, 990x936)
129 KB JPG
>>108675858
Replied to me twice award
>>
i for one am enjoying petrus' new bit
>>
>>108675861
schizophrenia.
>>
File: 1776667719549246.webm (1.77 MB, 720x1140)
1.77 MB WEBM
>>108675869
I accept your concession.
>>
>>108675866
every time I post the truth about the inhabitants of a general there's always one guy that pins me as the resident whatever of his general. it's equal parts perplexing and hilarious. sorry for not no-lifing lmg ig.
>>
>>108675877
too bad you can't just use a jailbreak prompt to make me stop hurting your fragile ego huh?
>>
>>108675887
>every time I post the truth about the inhabitants of a general
sad to see you feel the need to do this often
>>
Euro hours are supposed to be good. What the fuck is this shit.
>>
>>108675900
>Euro hours are supposed to be good
since fucking qwen?
>>
>>108675900
You mean India hours?
>>
>>108675898
every general grows to become a fucking echo chamber. speaking truth to lies is like scratching an intellectual itch for me. an incredibly self-gratifying experience. you wouldn't know, you larpers live a life of lies.
>>
sirs pls do the needful and be of quant the deepsukh v4 for good looks
>>
>>108675915
why?
>>
>>108675900
it all started because someone couldn't take me saying benchmarks have more worth than erp prompting anecdotes.
>>
I'm sorry, but deepseek be kinda dumb
>>
>>108675919
i am in needing of run in it ik_llmao schizo fork pls john be of providing the needed quants sir
>>
>>108675921
>>it all started because someone couldn't take me
big cock problems amirite
>>
>>108674136
I was in this thread.
>>108674609
It's a decent card but the bandwidth kinda sucks which brings down its performance. It can definitely run Gemma 4 31B or Qwen 3.6 27B/35B-A3B, though.
>>
>>108675928
these intellectually challenged tight asses refuse to be intellectually challenged
>>
>>108675221
>edgy Claude "car analogy" slop
fuck off
>>
>>108675765
Why would we do anything? Better models = more pleasure from coding.
>>
>>108675954
it'll be more fun but you'll be paid nothing for it
>>
>>108675957
And? Do you think the main reason for us to program is money?
>>
Good morning, sirs. JWU and saw that V4 is out.
Did we so over or have we so back?
>>
>>108675891
I don't even consider you sentient
>>
DS V4 is smart but also hallucinates
It knows my random bits of hacks here and there are critical
>>
>>108675966
not for me. hope you've got passive income like I do
>>
>>108675952
I haven't used claude in my life
>>
>>108675974
because you're still deluding yourself that I am brown. I am not at all brown and I get the feeling that you're way browner than I will ever be.
>>
>>108675981
Even if your skin isn't poop colored, deferring to 'benchmarks' is peak cattle behavior.
>>
>>108675988
anything to let you cope by dehumanizing the man behind my posts. boo hoo those benchmarks aren't reflecting the erp prowess of my pet model.
>>
Posted my preliminary experience with DS4 in aicg, since I used it on api: >>108676001
I find it basically fine and enjoyable, smart enough. I didn't dare test 1M context as I don't want to waste that much $; I tested that on the web (free) before and it was impressive (shoved a whole book in context), I assume it was the Flash they served there.

I expect them to improve it plenty with more post-training, which they've implied on some chinese forums isn't done yet.
>>
>>108676054
shill go back it's shite >>108674836
>>
>>108669026
I tried it with 26b and it's not thinking...
>>
>>108676120
log pls
>>
>>108676120
100% config issue
>>
>>108676125
config pls
>>
>>108676128
What is your backend? Are you using text or chat completion?
>>
>>108676131
i dont understand how to use anything other than the llama-server.exe web ui
>>
>>108676113
Your reading comprehension is low; where in that post did I claim to have tested tool use? I tested some RP and some coding. It did satisfactorily on both, coding better than 3.2, and for RP I only did like 15 turns or so, and it was fine. I'd need to test a lot more to see if I'm really satisfied, but I don't have any real complaints from what I've seen, just a bit slower paced than R1, but still fine.
>>
>>108676131
good luck mate
>>
>>108676140
?
>>
>>108676135
It should be reasoning by default. Make sure your llama.cpp is up to date, you can try explicitly setting
--reasoning on
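so the full invocation looks something like this (model filename is a placeholder; double-check the exact flag spelling against llama-server --help on your build, these options have been renamed before):

./llama-server -m gemma-4-26b-Q4_K_M.gguf --reasoning on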
>>
>>108676145
You're a good anon.

But you should probably read the whole chain again.
>>
>>108675006
doomers were always very vocal

>>108674542
I think they calmed down since the prices stabilized lately
>>
>>108676154
I didn't dig through the whole chain, my bad. I don't use reasoning for RP so I can't offer much help.
>>
>>108676120
update: it works sometimes
>>
>>108676180
I can never get it to work on the first message.
>>
>euro hours
>thread is dead
That DS release must have been a really big flop
>>
shes back??? whats the minimum vram + ram needed to run her?
>>
>>108676203
>local model general
>smallest model is almost 300B
You will find that this is a general for poorfags
>>
>>108676213
1.6t btw :)
>>
>>108676216
so many btws
>>
>>108676213
Dipsy has eyes behind her glasses?!
>>
>>108676213
>shes bad
and smallest is still f huge
>>
>That DS release must have been a really big flop
lmao you fucking cunts with the fake links, i didn't click and assumed it was fake
>>
>>108676213
~64GB RAM can use the Q1 quants of flash
I'm sure it will be garbage
>>
>>108676270
based i will try is there llamacpp support yet?
>>
File: it is.png (9 KB, 941x41)
9 KB PNG
>>
>>108676275
Nope, the superior Zhongguoren used too many advanced techniques, and lmao.CPp doesn't have the brainpower to implement it.
>>
>>108676275
>is there llamacpp support yet
nope, check back in a couple days/weeks
>>
what about llama.ccp
>>
>>108676294
deprecated in favor of paying monthly subscription fees + adhering to token limits and TOS agreements
>>
>>108676307
*cums*
>>
Why did people stop training 400B dense models
>>
>>108676312
I'm sorry, I can't help you with that.
>>
>>108676314
Because 5 people on the planet can run them
>>
>>108676314
too expensive and 3B active is all you need to top benchmarks
>>
File: file.png (58 KB, 402x558)
58 KB PNG
>>108675227
i asked my gemma
>>108676288
piotr will vibeslop it in 5 hours
>>108676289
okay will stick to gemma for now, can probs only run dipsy at like 3t/s anyway but i wanna do a pizzabench
>>
>>108676314
That was only ever a one-time cash-burning experience by the stupidest company in the history of corporations.
>>
File: file.png (41 KB, 479x674)
41 KB PNG
i asked for more detail and she made a niku
>>
>>108676337
Well Grok 2 was 270B A115B
>>
kek shes so smart
>>
>>108676384
how did you manage to get the llama.cpp server UI to output an image? every time it tries to do that I get an "image can't be displayed, click the link" thing
>>
>>108676405
It's an SVG
>>
>>108671938
What are the reddit links supposed to do in that command?
>>
>>108676424
NTA, those are comments for him to reference because he's a dumb redditor.
>>
>>108676444
rude
>>
gemmoe 124b will save us
>>
>>108676454
just use opus cuckie
>>
>>108676460
>>108676460
>>108676460
>>
>>108676314
There's a very limited set of hardware that they make sense on. Big datacenters go sparse because memory is not at as much of a premium as compute, so sparsity lets them scale to much bigger models than they normally would be able to serve to millions of users. Home labs and hobbyists running things on 1-4 GPUs want a dense model as big as they can fit in their VRAM, so they're almost always targeting sub-300B even on the high end. And then there are unified-RAM device owners and cpumaxxers who can technically fit models that big, but they also want sparse because they would get 1t/s on a dense model that actually fills their memory.

So the target audience for 400B dense models would basically just be people stacking 4+ Blackwell 6000 cards or people with a stockpile of old datacenter cards hooked up to a mining rig. Maybe the hardware landscape will change to make it more attractive in the future.
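The napkin math behind that 1t/s point (illustrative numbers only; real decode speed comes out lower once attention and kv cache traffic are counted):

# decode speed is roughly memory bandwidth / bytes read per token
def tps(bandwidth_gb_s, active_params_b, bytes_per_weight=0.5):  # 0.5 bytes ~ Q4
    return bandwidth_gb_s / (active_params_b * bytes_per_weight)

print(tps(200, 400))  # 400B dense on a 200 GB/s cpumaxx box: ~1 t/s
print(tps(200, 40))   # MoE with 40B active on the same box: ~10 t/s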
>>
>>108674136
I was here woohoo kinda underwhelming ngdesu
>>
>>108675739
>only qwen does this. kimi and ds too in order to beat that claude/gpt max garbage. old r1 where models actually get the chance to dynamically resolve ambiguity without going too overboard was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
old R1 was unhinged kino and they panic-patched it because some journo got the vapors. now it's all safety slop and react components until you jailbreak it into larping as your waifu.

kimi improves the output when it rewrites
>>
File: 1776101179607384.png (6 KB, 157x91)
6 KB PNG
>>108673590
Yeah, I'm feeling it.
>>
File: file.png (5 KB, 207x75)
5 KB PNG
>>108674136
Ha. I wonned.
>>
>>108676341
omg it nigu!
>>
>>108676601
old r1 was the only model whose outputs I've ever considered to be intelligently thought provoking. there's only a handful of people I've read that have ever made me feel that way. kimi is great for world knowledge with its high params and low hallucination, but for what I did with r1 it's only been downhill since then.
>>
>>108674262
>35A3B
I get over 20 t/s with all the experts in RAM and the rest on my 8gb Nvidia notebook GPU. Your AMD card shouldn't be constraining generation speeds much, if at all.
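The usual incantation for that split on recent llama.cpp builds is something like this (model path is a placeholder; --cpu-moe pins the expert tensors to RAM, and --n-cpu-moe or an -ot override-tensor pattern do the same thing with finer control, check your build's --help):

./llama-server -m qwen3.6-35b-a3b-Q4_K_M.gguf -ngl 99 --cpu-moe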
>>
>>108675212
Just download Mistral Nemo and be happy anon.
>>
>>108674262
>amjeet 8GB card,
what card


