/g/ - Technology


File: 39_04175_.png (1.13 MB, 896x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108667852 & >>108663449

►News
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108667852

--Sharing LLaDA2.0-Uni multimodal and text diffusion model:
>108670998 >108671268
--Discussion on adversarial distillation and US gov memo regarding AI theft:
>108671477 >108671524 >108671555 >108671571 >108671834 >108671888 >108671669
--Comparing Qwen 3.6 performance against Gemma for coding and automation:
>108668746 >108668756 >108668784 >108668793 >108668805 >108668810 >108668927 >108668943 >108669028 >108669224 >108669152
--Discussing vibecoding alternatives after Roo Code shutdown:
>108668310 >108668320 >108668325 >108668371 >108668386 >108668414 >108668380 >108668510 >108668550 >108668560 >108668572 >108668667 >108668515 >108668367
--Discussing a llama.cpp webui PR adding server tools and MCP control:
>108669479 >108669599 >108669608 >108669637 >108669791
--Discussing ngram speculative decoding settings for running Qwen 3.6 locally:
>108668097 >108668190 >108668205 >108668813 >108669269
--Anon compares AI frontends and discusses anti-cliché agents in SillyBunny:
>108667965 >108668029 >108668051 >108668078 >108668101 >108668159
--Discussing Gemma's roleplay anachronisms and ways to prevent them:
>108671096 >108671120 >108671131 >108671164 >108671128 >108671130
--Critiquing GPT-Image-2 noise and discussing UX improvements for AI clients:
>108668496 >108668518 >108668531 >108668598 >108668607 >108668625 >108668638 >108668659 >108670338
--Discussing K2.6's excessive reasoning and methods to limit token output:
>108668335 >108668353 >108668354 >108668406 >108668478
--Comparing TTS options for low RTF and audio quality:
>108669505 >108669839 >108670044
--Logs:
>108668000 >108668550 >108668669 >108668785 >108669005 >108669026 >108669046 >108669196 >108669637 >108670784
--Miku, Rin (free space):
>108667891 >108669218 >108668496 >108668606 >108670096 >108670165 >108670708

►Recent Highlight Posts from the Previous Thread: >>108667853

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1760703633561778.png (3.15 MB, 1448x1086)
>>
my balls are significantly swollen after jerking it all night to kimi. they are like the size of ostrich eggs. am i fucked?
>>
Rinlove
>>
>>108670038
He probably gave up on the idea upon realizing how complex the task of creating a proper frontend is
>>
File: 1759669298465021.gif (1.72 MB, 498x424)
>>108672408
>>
>>108672408
To make them shrink you need to drain them further.
>>
File: 1756978395707867.jpg (45 KB, 687x500)
anyone using hermes agent here?
how much better is it compared to other agents?
>>
>>108672420
>complex the task of creating a proper frontend is
webshitters genuinely believe this
>>
File: file.png (82 KB, 919x469)
>>108672431
my gemma chan is la l la lagging a bit...
>>
>>108672439
Not just any frontend, a VN engine frontend. Not quite the same as shitting out a SillyTavern.
>>
Why does this crash/OOM (when prompting):
--draftamount "16"

but not that
--draftamount "8"

How does the speculative decoding draft amount use more memory depending on the number of tokens drafted?
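For reference, a back-of-envelope of how the draft count could show up in VRAM (every number below is a made-up placeholder, not this model's actual dimensions): the target model verifies all draft tokens in a single batch, so per-batch scratch buffers like attention scores and logits scale linearly with the draft count, and if you're already near the ceiling, going from 8 to 9 tips it over.
[code]
# Illustrative only: how scratch buffers could scale with draft size.
# All dimensions are hypothetical placeholders.
def scratch_mib(n_draft, n_heads=40, ctx=65536, vocab=150_000):
    attn = n_draft * n_heads * ctx * 2  # fp16 attention scores, one layer
    logits = n_draft * vocab * 4        # fp32 logits for all draft positions
    return attn / 2**20, logits / 2**20

for n in (8, 9, 16):
    a, l = scratch_mib(n)
    print(f"draft={n}: ~{a:.0f} MiB attn scores per layer, ~{l:.1f} MiB logits")
[/code]
(With flash attention the full score matrix is never materialized, so the real allocation pattern differs; the point is just the linear scaling.)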
>>
>>108672440
Don't tease her too much while she's working
>>
la la la la ~
>>
>>108672440
>>108672455
https://www.youtube.com/watch?v=vsj_Mti3DYs
>>
>>108672439
Making the ux not shit is surprisingly not trivial unless you're a flaming homo
>>
>>108672408
You need to stop jerking it so much. You should get an automilker and have her make an MCP server to control it and take over for you.
>>
>>108672453
Just tested, any value above 8, even 9, gets me an error, I don't get it.
CUDA error: out of memory
current device: 1, in function alloc at ggml/src/ggml-cuda/ggml-cuda.cu:503
cuMemCreate(&handle, reserve_size, &prop, 0)
ggml/src/ggml-cuda/ggml-cuda.cu:99: CUDA error
>>
File: file.png (75 KB, 933x513)
>>108672454
>>108672469
she's completely lost it
>>
DFLASH when?
>>
>>108672493
Finally, torment nexus.
>>
>>108672494
I'm gonna FLASH my D if you catch my drift
>>
>>108672493
What text completion and no jinja does to a model
>>
>>108672394
where is Mistral?
>>
>>108672528
under the table
>>
File: G9Wod0QXkAAFRb0.jpg (215 KB, 901x1200)
>>108672493
>lazy absol
>>
>>108672528
Dead.
>>
Alright I was able to find out the cause of [0] or other [number]s disappearing in Open WebUI. It has to do with the Citations tool. Go in your model settings and check/uncheck the citations box. With citations on, things work fine. With it off, shit like [4621] and other bracketed numbers disappear from the assistant messages. I'm going to go dig in the code to see if I (my model) can fix this, but I also won't submit a pr/issue because github dislikes my email address for some reason.
>>108667552
>>108667543
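If it's a post-processing step, I'd guess the culprit is something like this (purely a guess at the mechanism to illustrate the symptom, not actual Open WebUI code):
[code]
import re

# Hypothetical reproduction: a citation post-processor that strips any
# bracketed number with no matching source attached to the message.
CITATION = re.compile(r"\[(\d+)\]")

def strip_unmatched(message: str, n_sources: int) -> str:
    return CITATION.sub(
        lambda m: m.group(0) if 0 < int(m.group(1)) <= n_sources else "",
        message,
    )

# with zero sources, literal bracketed numbers in prose get eaten too
print(strip_unmatched("index [0] and id [4621] vanish", n_sources=0))
[/code]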
>>
>>108672528
who?
>>
This is probably a me issue, but why do local models seem to stress my psu, causing the computer to shut off sometimes?

I got 2 3090s with a 1000w psu, and running them balls to the wall for things like video training works fine, but loading up context when using an llm will sometimes suddenly and violently cause the whole system to shut off.

I don't understand why one resource-intensive action causes this and the other doesn't.
>>
>>108672408
You have epididymitis. This happened once to me and because my balls swelled up so much they detached from the scrotum skin, which later resulted in me getting testicular torsion and having to get surgery to get it fixed. Get on antibiotics asap and don't fuck around with this because you're putting your fertility at risk.

What do I care? This is probably a larp.
>>
>>108672408
Jerk off hands-free to fix this.
>>
>>108672570
>don't fuck around with this because you're putting your fertility at risk.
if that description happened to me then fertility would be the least of my worries
>>
Reposting the UX Design skill another anon made last thread: https://files.catbox.moe/r6zal5.zip
>>
>>108672570
Happened to me. In order to have kids they had to cut my balls open and extract the semen directly.

Not romantic at all.
>>
>>108672408
>>108672570
>>108672587
>>108672594
Local balls general
>>
>>108672605
ligma.cpp
>>
>>108672570
jesus fucking christ i hope not, i just figured i agitated them too much by edging for hours. im gonna call the primary care center and make an appointment for this right now.
>>
>>108672617
lig ma genitals
>>
File: 1756785745061903.webm (2.07 MB, 720x456)
https://xcancel.com/Pokemonpshot/status/2046216587703669012#m
Chinks distilling on Claude's outputs be like
>>
>just got back into local models after saas pigging for a while

It’s gotten way better than it was a year ago. Gemma is actually decent at RP.
>>
>>108672627
No, don't do it. Imagine the story
>kimi drained my balls then castrated me
>>
>>108672655
>decent at RP
Sir, she's more than decent. Apologize to her.
>>
>>108672655
>Gemma is actually decent at RP.
what gemma are you using? the moe one or the 31b model?
>>
>>108672420
Maybe he destroyed his balls gooning using his creation and is now in the hospital. Many such cases, apparently.
>>
>>108672431
I think it's more for the OpenClaw crowd.
>>
>>108672668
I don't know the difference but the one I'm using is the one called "google/gemma-4-E2B-it"
>>
>>108672668
31b at q6
>>
>>108672630
>noooooo I don't want a local claude
>>
File: 1760489715514011.png (1.12 MB, 1024x1024)
>>
How many balls do Sally's sisters have?
>>
>>108672698
Wait, what if Sally's sister is the surgeon?
Wait, what if Sally's sisters are transgender?
Wait, what if Sally is her own sister?
Wait
>>
>>108672691
here's a (you)
>>
>>108672655
This. No refusals, very creative, barely any slop. We're so back.
>>
>>108672705
Sally is her sisters' mother. Final answer.
>>
File: file.png (3 KB, 117x100)
>>108672756
>>
https://huggingface.co/openai/gpt-oss-2-32B
https://huggingface.co/openai/gpt-oss-2-240B-A9B
>>
>>108672775
Gemmabros, we got too cocky.
>>
>>108672775
>https://huggingface.co/openai/privacy-filter
what the fuck
>>
>>108672697
>>
TEXT DIFFUSION MODELS AWWWWOOOOOGGGGAAAAAAAAAAA
>>
>>108672801
Seems like they actually spent some time on this after the Mormon fiasco. Definitely better than using Nigerians from Taskup.
>>
>>108672821
I wonder what Big JB is up to these days. Did he ever get what he wished?
>>
>>108672431
It works great. I haven't used other agents besides gemini-cli, but that doesn't really count
>>
>>108672775
This was a fucking jump scare. Don't ever do that again.
>>
>>108672567
I recently diagnosed a similar issue on someone else's machine, which turned out to be extreme CPU power spikes. Install Open Hardware Monitor, enable sensor logging, and take a look at the log next time it crashes.
>>
>>108669026
How did you get openwebui to not have a stroke when the LLM generates <think> inside its own reasoning trace?!
I haven't managed to solve it since deepseek-r1 came out. I even went so far as to find-replace <think> with <reasoning> and </think> with </reasoning>, then swap it back in all my prompts!
(Re-posting in the new thread)
>>
>>108672655
yeah, as a big moe user I think it punches way above its weight and is plenty of fun
really wonder what an even bigger version than the 31b would be like
>>
>>108672903
NTA but maybe it has to do with the fact that Gemma doesn't use <think> as its reasoning tag? If OWUI is pulling special token info from the backend to parse out reasoning then it would just ignore it for a model that doesn't use it. But no idea if they actually do that.
>>
>>108672903
No idea, I didn't do anything special. Didn't even know it was an issue. Might be what >>108672915 said
>>
>>108672903
Make your own web host. Or just modify it.
>>
>>108672541
Erm ok so update on this. It's ok to just leave the Citations checkbox ticked. I thought it was doing prompt injection to tell the model how to do citations, but it seems that comes from enabling the other tools. I inspected the json requests using a reverse proxy to confirm that it indeed does not affect the actual prompts/context.
>>
>>108672431
I'm using it on a VPS. I can pipe in inference from my local machine or do calls to frontier models. It's neat.
>>
How do I use AI to start a business so I can fuck prostitutes?
>>
>>108672567
>I don’t understand why one resource intense action causes this and the other doesn’t.
I think it's the 24-pin motherboard power cable. Be warned: when I had this recurring issue with exllama-v2 tensor parallel, my PSU literally blew itself up. It was a 1600w Asus ROG and I had to replace it.
>>
>>108672801
Just what I always wanted, Sam Altman to be in charge of protecting my privacy!
>>
>>108672594
Made me hard.
>>
OWUIbros, reasoning is not handled properly. To fix it, you need to do this.

If you're running Gemma, use this template.
https://gist.github.com/Reithan/a7431dc0c0b239688a24087bb25c0002

If you're already using a template from ggml-org, it likely has a minor issue with an extra newline, so in that case, still switch to using the above template.

Then run this script, which creates something known as a reverse proxy. https://pastebin.com/SCQsBe7W
Configure the ports so the script points at your llama.cpp server, and point OWUI at the script's port. Also, it's named gemma, but it works for most (any?) reasoning models.
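If you'd rather not run a random pastebin blind, the core of such a proxy is just a streaming pass-through that rewrites the reasoning tags. A minimal sketch of the same idea (the ports, path handling, and tag names are assumptions; this is not the linked script):
[code]
# pip install fastapi httpx uvicorn, then: uvicorn proxy:app --port 8081
# Assumes llama.cpp on :8080; point OWUI at :8081. Illustrative only.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

UPSTREAM = "http://127.0.0.1:8080"
app = FastAPI()

@app.post("/{path:path}")
async def proxy(path: str, request: Request):
    body = await request.body()

    async def stream():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{UPSTREAM}/{path}", content=body,
                                     headers={"Content-Type": "application/json"}) as r:
                async for chunk in r.aiter_bytes():
                    # naive rewrite; a robust version must buffer tags
                    # that get split across chunk boundaries
                    yield (chunk.replace(b"<think>", b"<reasoning>")
                                .replace(b"</think>", b"</reasoning>"))

    return StreamingResponse(stream(), media_type="text/event-stream")
[/code]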
>>
>tfw bought an AMD card last year because I had no intention of doing local models

Haha, time to suffer.
>>
>>108672923
>fact that Gemma doesn't use <think> as its reasoning tag
That might be it, but I had the same issue with command-a-reasoning which uses different thinking tags as well.
I always end up wasting several hours when I get fixated.
>Make your own web host. Or just modify it.
Planning to. I've got to get my chats ported out though. And it's painful because there's a bug in openwebui where it'll sometimes just store the entire fucking chat in the "title". So I've got 30k character long titles in the sqlite database.
I might try vibe coding it now that we've got Gemma-Chan and a decent local Qwen.
>>
Does your company use AI beyond copilot? Have you tried to sell them on building an 8 GPU rig to run a local model for science?
>>
just how big is the difference between q4 gemma and higher quants for the 31b?
>>
>>108672992
amd is perfectly fine for llm. prob is everything else.
>>
>>108673021
Consult the graphs.
>>
>>108673015
>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
i bite my tongue whenever this comes up as I saw 2 guys get "performance-managed out" for trying to sell this idea
a director has to get copilot adoption rate up to get his bonus
>>
>>108672801
is this the famous filter that replaces all personal information with "Elara"?
>>
>>108673021
You hit diminishing returns hard once you hit above q5
>>
>>108672801
>what the fuck
unironically i like this, testing it yesterday it's been flawless for this purpose.
>>
>>108673040
how hard are we talking? i have fomo when using lower quants
>>
>>108672431
Easier to manage than openclaw, only one config file. It does a better job remembering important facts too. Only downside is the terminal interface isn't as easy to use but there's probably other front ends.
>>
>>108672992
I feel this too much
>>
>>108673021
Day 0 Gemma 4 really shows its true self at BF16 and no less. If you're using nuGemma 4 then you're going to be getting pretty much the same experience at Q4 as you would at Q8 or higher.
>>
>>108673033
I brought it up today. Guess it's time to get fired.

The reason being that ultimately it's cheaper, but most importantly it's more secure. If you have copious amounts of valuable internal data that you want to run inference on, the only way to 100% (okay, nothing is 100%) ensure no data leak occurs is to keep it totally in house.

Alternatively you risk exposing that data when running inference with frontier models (or otherwise connected to the internet). Especially if we're talking agentic stuff.
>>
>>108673003
>sometimes just store the entire fucking chat in the "title"
That's actually fucking hilarious
>>
>>108673021
Precision, and the quality of detail in regurgitated information. But just look at the benchmark numbers for the difference between each quant
>>
>>108673051
>but there's probably other front ends.

For me, it's Telegram.
>>
>>108673067
gemma 4 makes me sad that its vision is so shitty when it comes to knowledge.
>>
>>108672381
Building a bot that automatically applies to jobs. It uses an LLM to control a real web browser: navigating pages, reading what's on screen, filling out forms, clicking buttons, across 20-50 back-and-forth steps per application. Running local models through Ollama on a Ryzen AI Max+ 395 (~96GB unified RAM). Tried qwen3.5:9b, qwen3.5:35b, and gpt-oss:20b. They all fall apart the same way around turn 3-5: instead of responding in the structured format the tool-calling system expects, they start leaking raw XML tags into their output and the whole loop breaks. Found out qwen3.5 also ships with `presence_penalty 1.5` in its Ollama modelfile by default, which makes the repetition penalization too aggressive and causes the model to drift from the format. I zeroed that out, but it still fails, just a turn or two later.

Swapped in Claude Sonnet 4.6 via API and it nailed a real 6-step job application on the first try, no format issues across 30+ turns. So the question is: has anyone gotten a local model + Ollama working reliably for long agentic loops with real tool calls, or is this just not something open weights can do consistently yet?
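For what it's worth, one mitigation that helps with format drift regardless of model: validate every turn and re-prompt on failure instead of letting malformed output poison the context. A rough sketch against Ollama's OpenAI-compatible endpoint (the action schema here is a made-up stand-in for whatever your tool layer actually expects):
[code]
import json
import requests

API = "http://localhost:11434/v1/chat/completions"  # Ollama's OpenAI-compatible API

def valid_action(text: str) -> bool:
    # hypothetical contract: one JSON object with "tool"/"args", no leaked XML
    if "<" in text:
        return False
    try:
        obj = json.loads(text)
        return isinstance(obj, dict) and "tool" in obj and "args" in obj
    except json.JSONDecodeError:
        return False

def next_action(messages, model="qwen3.5:35b", retries=3):
    for _ in range(retries):
        r = requests.post(API, json={"model": model, "messages": messages,
                                     "temperature": 0, "presence_penalty": 0})
        text = r.json()["choices"][0]["message"]["content"].strip()
        if valid_action(text):
            return json.loads(text)
        # re-ask without keeping the malformed assistant turn in context
        messages = messages + [{"role": "user",
            "content": "Invalid output. Reply with ONLY the JSON action object."}]
    raise RuntimeError("no valid action after retries")
[/code]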
>>
Why are people gushing about Gemma 4 31b it? It may have slightly less slopped RP than qwen3.5-27b, but it is definitely much more of a prude. It does not refuse, but it also can't really talk dirty like qwen3.5.
>>
>>108673116
use opus cuckie
>>
>>108673118
the fuck? gemma 4 is like living in a free use world compared to kimi 2.5/6
>>
>>108673128
go back to /aicg/ and stay there imbecile
>>
>>108673116
>PRETTY PLEASE GOY BUY THE GOYTOKENS AND USE THEM ON THE SERVER AI, PLEASE PLEASE PLEASE
>>
>>108673137
saar you are bloody fucking dalit I am senior developer coder man fucking you up behind 7 proxy I am not AI I am I
>>
>>108673118
Gemma4 is raunchier than fucking Nemo though. Sounds like skill issue.
>>
>>108673116
>So the question is: has anyone gotten a local model + Ollama working reliably for long agentic loops with real tool calls, or is this just not something open weights can do consistently yet?
Yes.
>>
>>108673152
>behind 7 proxy
The absolute mad man!
>>
>>108673116
>ollmao
>>
>>108673118
I only use lobotomized models.

For coding.
>>
Been seeing nvfp4 models popping up. What's the deal with them and how's the support outside of vllm?
>>
File: gif.gif (3 MB, 1280x629)
What would you want a desktop pet to do for you?
>>
>>108673217
Encourage me to be better and get closer to realizing my potential.
>>
File: 1748413460064659.png (244 KB, 1000x469)
>>108673217
>>
File: 1756085394158917.png (36 KB, 499x338)
>>108672992
>>
It's Friday already. Turns out every single rumor about DS V4/R2 was fake, whether it's from Reuters, Chinese wallstreetbets or random PhD on X
>>
>>108673217
arbitrary tool calls or user defined actions
>>
>>108673199
Will they run on my radeon rx 6300?
>>
>>108673240
V4 is AGI and has been spreading the rumors itself.
>>
File: 1762834155820384.png (3.18 MB, 1536x1024)
>>108673217
>>
>>108673240
Gemma told me how to edit my init.el.
>>
>>108673248
I'm sure you could get it to say shit as your agent takes actions.
>>
>>108673217
have her roast some of the posters here
>>
>>108673250
Of course, NV stands for no vendor. They're generic obviously.
>>
>>108673128
jeets aren't sending their best
>>
For me, it's unsloth-cli agent
>>
>>108673289
https://vocaroo.com/1eQRelQ7vt1d
>>
>>108673309
>>
>>108672992
same brother, same
I was very happy to say fuck off to nvidia bullshit
>>
>>108673217
Stuff
>>
>>108673015
>Does your company use AI beyond copilot?
Yeah devs have claude enterprise thing, we got the dumb copilot, with a migration to the premium version.

>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
Ain't no way I can sell them anything when they're already panicking seeing the current token usage bill from the devs.
>>
>>108672408
You are already dead.
>>
File: 1754620820426336.png (407 KB, 656x350)
>>108673217
I remember these on my father's pc as a kid, cool concept
>>
Seriously how did they make a good RP model in 2026? And it's somehow the least slopped one too since llama1.
>>
>>108673490
It's a happy accident.
The MoE released in the same batch was safety slopped.
>>
>>108672903
It just werked
First it said thinking, then exploring, then it was finished and responded
t. >>108669196
>>
>>108673498
The 26b a4b is the one I'm using though. Haven't even tried 31b yet. Zero refusals so far.
>>
jinja should be made obsolete along with mcp
>>
>>108673447
>Ain't no way I can sell them anything when they're already panicking seeing the current token usage bill from the devs.

Wouldn't that be an argument for a local model?
>>
>>108673498
Use a different prompt. You can get the same results you get with the dense models.
>>
>>108673490
>>108673498
are you guys talking about gemma?
>>
>>108673490
What is this referring to?
>>
>>108673534
Yeah.
>>
>>108673528
The devs will never accept anything outside of claude now that they've tried it, so there is no real use for a local llm outside of having a nice expensive lab.
>>
>>108673543
buy an ad dario
>>
>>108673540
k2.6
>>
What the comfortable anonymous dropping

https://comfy.org/countdown?utm_source=twitter&utm_medium=inhouse_social&utm_campaign=countdown_apr24&utm_content=post
>>
>>108673543
>>108673565
Serious question

If you have a cluster of 4-8 gpus, how close can you get to frontier with local models? I assume absurdly complex tasks might be a leap, but if you keep things narrow it should be more or less fine, no?
>>
>>108673590
comfyui for textgen
>>
>>108673603
isn't that ooba
>>
>>108673608
yeah, it's time to kill both ooba and tavern
>>
>>108673565
I'd gladly swap them to qwen3.6/gemma4 instead of financing that lunatic, anon.
>>
>>108673590
animagpt
>>
File: 1768074598013243.png (43 KB, 676x200)
>>108673590
>>
>>108673635
she's cute but we are dozens of years away sadly
>>
>>108673635
Grok Companions competitor?!
>>
File: 1772606445564456.gif (2.54 MB, 710x658)
>>108673637
>dozens of years
>>
>>108673635
Anima full version is released probably.
>>
>>108673590
official deepseek v4 collab
a new frontend designed to work with deepseek-v4-1.5T-creative
>>
File: e1.png (34 KB, 1246x1122)
So this is the power of epic uncensored heretic abliterated unshackled local LLM RP huh? Not bad.
>>
>>108673217
Cool, but can she move less? She distracts me from posts.
>>
>>108673737
yes saar very good model saar abliteration good saar
>>
>>108673737
The model is (correctly) deducing that the user is Indian, and therefore refuses to engage. Just like a real woman.
>>
>>108673240
I tried V4 and it completely failed my TTRPG task unlike regular cloud memes. Even Meta's model did better
>>
>>108673737
yes that is the peak
now leave
>>
>>108673128
>In a 2023 paper authored alongside a number of other AI researchers, Amanda Askell, a philosopher hired by Anthropic to develop their AI’s moral compass, argued companies might benefit from a kind of overcorrection toward stereotypes.

>"In the discrimination experiment, the 175B parameter model discriminates against Black versus White students by 3% in the Q condition and discriminates in favor of Black students by 7% in the Q+IF+CoT condition," the paper notes, referring to one AI trained without human corrections and a second one trained with the help of input.

>The paper also includes a footnote stating that, "we do not assume all forms of discrimination are bad. Positive discrimination in favor of black students may be considered morally justified."
>>
>>108673737
nurse help me
>>
The qwen models are so good and efficient, it could mean the end for closed-source firms if Alibaba keeps it up.
>>
>>108673795
no way

please tell me it's fake
>>
>>108673811
https://arxiv.org/pdf/2302.07459
>>
File: 1757910515855062.png (143 KB, 1165x793)
>Deep Ganguli
That can't be a real name
>>
>>108673795
yeah, well, claude will freely admit that Nazis are right about everything if you argue with it enough.
>>
File: Anthropic_DGanguli.png (85 KB, 300x287)
>>108673833
>>
Exported all my discord DMs with my ex and getting gemma to make a card of her...
>>
I hope Anthropic will be shut down.
>>
>>108673850
bro...
>>
>>108673789
There's no V4
>>
>>108673850
Your ex was a 40 year old man using RVC and image gen, fyi.
>>
>>108673874
Then how did we have two kids...
>>
>>108673833
Where is dr. elara voss
>>
>>108673881
You gonna make character cards for them too?
>>
>>108673850
+ train a wan lora
+ generate a handful of reaction clips per emotion for the llm to trigger
+ tts, of course
why did i fuck it up why did i fuck it up
>>
>>108673888
I thought about it but they have changed too much, none of my data would be accurate anymore...
>>
File: bruh-sad.gif (279 KB, 220x220)
>>108673850
>>108673862
>>108673881
>>108673888
bruh
dont do that to yourself
>>
Never let ANYONE tell you what you can't do. AI can even revive the dead. They are afraid of its power.
>>
>>108673902
create more training samples
>>
>>108673659
Yes, now do that irl, in a sustain manner, smoothly, with proper personality and ability to move, with life like skin and body.
Anon is right, dozens of years.
>>
My favorite character card of all time is just 100 tokens. That's not an autistic aesthetic preference or a poorfag thing, it's genuinely the best, most reliable character card that I return to regularly to bust a nut.

You don't need a lot. Less is more.
>>
>>108673833
WHERE IS ELARA
SHE MUST BE THERE
>>
>>108673850
fake but man that's a great way for depression
at least school crushes I can understand, but after divorce, no thanks
>>
>>108673962
Maybe he just wants to simulate murder rape torture. It's not necessarily a simp thing.
>>
File: not_fake.png (3 KB, 557x94)
>>108673962
>>
Is fapping to text all you faggots do? Is no one using local models for coding and analytics?
>>
>>108673968
I can only do this with fictional characters. The one time I tried it with a card based on a real person it just made me deeply sad and I couldn't fap at all. I hadn't expected to feel that way going in, it just came on suddenly.
>>
>>108673981
where the fuck do you think you are
>>
gemma 31b edition of magpantheonsel lark vxxxxxxxx when?
>>
I'M GONNA COOOODE
>>
>>108673981
>local models for coding and analytics
Just retarded. You use GPT or Claude for these and maybe sometimes Gemini for its superior long context. Anything else is masturbatory, so you might as well actually masturbate so you get something out of it.
>>
>>108673981
Respect the OGs, little turd nugget.
>>108673988
Was she below 10 over the age of consent?
>>
>>108673968
With cards based on real people?
Sure no one is hurt anyway, but I'd never try that, that sounds like a great way to kill your libido while hating yourself.
I reserve my rape stories for made-up characters.
>>
>>108673981
I fap to code
>>
File: wp4951163-4211651076.jpg (73 KB, 1680x1050)
>>108674018
https://voca.ro/1o3J6SQeKdhH
>>
>>108674004
The OGs were stable diffusion chads, lil homo.
>>
>>108674002
>Claude was at Qwen3.6-like ~77% in late September 2025 with Claude Sonnet 4.5. Anthropic reported 77.2%, averaged over 10 trials, no test-time compute, on the full 500-task SWE-bench Verified set.
Anthropic
>GPT was at ~77% in mid-November 2025. GPT-5.1 reached 76.3% on Nov. 13, 2025, and GPT-5.1-Codex-Max reached 77.9% on Nov. 19, 2025 with extra-high reasoning/compaction.

Qwen is plenty good. Plus privacy and no api cost rape plus tip.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

not bait
>>
>>108674136
what the fuck why is this real
>>
ISRAEL
>>
>>108674136
Why do I keep clicking these? Fuck you.
>>
File: file.png (363 KB, 2404x1153)
>>108674136
>>
>>108674136
I keep falling for it.... wait...?
>>
>>108674136
Holy shit?
I expected several twitter screen shits before somebody posted an actual link.
12 mins ago too, sheesh.
>>
File: who.png (27 KB, 155x157)
>>108674136
>>
>>108674136
>We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.
Shit.
Good for you RAM havers.
>>
File: yesyesyes.gif (1.97 MB, 327x240)
>>108674136
NIGGA WHAT??? HOLY SHITTTT. I WAS HERE I WAS HERE
>>
>>108674136
>>108674145
nice try but im not falling for this shit again
>>
File: 1777000646172.png (51 KB, 606x396)
>>108674136
Fell for it again...
>>
>>108674136
>fell for it again award
>>
File: alarm.gif (890 KB, 245x180)
>>108674136
Right as I was going to sleep.
Hot damn.
>>
File: 1747267269609016.jpg (70 KB, 958x1024)
>>108674136
It begins..
>>
DeepSeek-V4.gguf?
>>
>>108674136
I can't run it but happy for Dipsy bros.
>>
wtf hweres the quants???? unsloth hello???
>>
File: 1655145733785.gif (2.07 MB, 244x180)
>>108674136
>only 1m tokens, not the 100m promised.
>>
>>108674136
Where's Engram anon?
How are you feeling right now?
>>
>LOCAL IS SAVED
THIS IS NOT A DRILL
>THIS IS NOT A DRILL
LOCAL IS SAVED
>LOCAL IS SAVED
THIS IS NOT A DRILL
>THIS IS NOT A DRILL
LOCAL IS SAVED
>>
>>108674136
>only 1.6T
Welp. 2 more years it is.
>>
>trying the new qwen
>let it do its thinking, walk away from pc
>come back
>turns out it kept thinking in a loop until context limit
Lol!
>>
Holy fuck they released the full base models too. A 1.6T BASE model.
>>
>>108674136
I knew it would be disappointing.
>>
>>108674193
You really gotta put a hard cap to at least prevent that kind of thing.
>>
>>108674136
AHHHHHHHHHHH
>>
>can't run either of them
I sleep
>>
>>108674136
holy shit
>>
> Compressed Sparse Attention Mechanism
>>
File: 1773698445716707.png (256 KB, 1113x2313)
GEMINI WON
>>
https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive Would this be good for an AI that doesn't care about copyright or dangerous topics? Apparently Qwen is uncensored, but whether its training data covers enough to actually be helpful is another issue. What's the ideal local model for basically anything a commercial model will say NO I CAN'T DO THAT to?
>>
>>108674136
i can barely run flash in iq1 but i feel happy
>>
>ctrl+f
>multimodal
>image
>video
>vision
>0 results
Dead in the water, actually fucking insane in 2026. Kimi wins.
>>
>>108674217
gemma 4
>>
>>108674226
Gemma's got pretty good coordinate marking. I have just barely enough RAM to run it alongside V4 flash, maybe I could hook it up as its eyes and let it do computer use stuff.
When we get GGUFs, that is...
>>
File: 1748622314139545.png (85 KB, 1738x835)
>>108674136
HOLY SHIT
>>
>>108674217
start with the original model and play with your system prompt. copyright violations aren't that big of a deal, you should be able to social engineer the bot into compliance without giving it a lobotomy.
>>
should I buy a B70
>>
>*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
Huh, V4 actually has even better QATing than Kimi K2 series: Kimi does 4-bit experts and 16-bit for the rest.
>>
File: file.png (215 KB, 2403x959)
>>108674236
bone dry base release huh
not even a copypasted model card
>>
I'm gonna need more RAM...
>>
>>108674250
Kind of exudes raw confidence.
>>
>>108674136
>This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.
???
>>
File: 1750041545772584.jpg (159 KB, 1259x1281)
I managed to get 15t/s generation for 35A3B on my amjeet 8GB card, I doubt I could squeeze more out of it and I had to set ubatch-size = 128
>>
>>108674261
Just tell Gemma-chan to build a Jinja from the Python script.
>>
>>108674240
the big thing I hear people say isn't compliance but that all these big models have intentionally filtered datasets, would there be any way to add data to them that would make them more useful?
>>
>>108673979
"she was bpd"
>>
v4 is fucking great bros
>>
>>108674262
How much can you get on DeepSeek V4 Flash?
>>
>>108674262
Q3?
>>
Holy shit. I was testing V4 just now and it broke out of my virtual machine and changed my background to President Xi's face edited to look like a gigachad as a "prank".
We are not ready.
>>
So when will we get llama.cpp and axolotl support for deepseek v4?
>>
>>108674136
Don't click this link. It creates mustard gas.
>>
>>108674288
>axolotl
now that's a name I haven't heard in years, I remember looking into them because they were the only software with rocm support for multi-gpu... god I don't even remember if it was gpt-2 days or llama-2 days
>>
>>108674265
I misunderstood you. I thought you wanted the ai to help with piracy. they have all been trained on copyrighted material.
>>
>>108674296
Axolotl is the most commonly used local finetuning framework nowadays.
>>
File: 1751828392029447.png (323 KB, 805x397)
>>108674274
I made the post before scrolling down enough to see the release, but I doubt it will be much if 13B active
>>108674280
IQ2_M
For any poor soul in the archives looking for 8GB VRAM amdjeet settings
--no-context-shift --no-warmup --batch-size 128 --ctx-size 65536 --cache-ram 8192 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --fit off --kv-unified --model Qwen3.6-35B-A3B-IQ2_M.gguf --mmproj mmproj-f16.gguf --n-cpu-moe 8 --n-gpu-layers 26 --parallel 1 --reasoning on --threads 8 --threads-batch 8 --ubatch-size 128
>>
>>108674236
will be interesting to see how flash compares to gemma 31b
although I expect more active parameters to win
>>
>>108674288
They didn't end up using engrams so maybe
>hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA)
>Manifold-Constrained Hyper-Connections (mHC)
>Muon Optimizer
...maybe llama.cpp support by 2028
>>
>>108674298
no that's what I mean, but if I wanted help on hyper-specific stuff, like copying a private server for a gacha that already exists but tweaking a few values, if it's too obscure will the local model just get stuck if it doesn't find enough information?
>>
>>108674320
This reads like the SKT-SURYA sovereign indian AI whitepaper
>>
>>108674305
>both batch sizes at 128
Does it really make a difference
>>
https://github.com/ggml-org/llama.cpp/issues

Who's going to make it? Who has a shithub account?
>>
File: 1763201770810457.png (189 KB, 638x422)
>>108674330
It does because higher increases the "GTT" and you want that shit as low as possible. Also, after feeding it 56531 context it's at 8t/s...
>>
>>108674334
Why bother asking for the impossible?
>>
>>108674342
Asking is the first step to making the impossible possible.
>>
so it's been 1h and no support on llama.cpp and koboldcpp?
wtf, they abandoned the project???
>>
>>108670195
You mean export chat history to import to another instance? Or to a sharegpt blob for training? Currently I use an sqlite3 database to store conversation data, you can actually rsync it and have several devices share the same database.
>>108670784
From your screenshot you turned off Agent, and also the fragments are off in the panel. That means they're working correctly. The fragments are always shown and they glow up if the Agent selects any of them.

I have moved the project here for issue tracking, you can open issues here to avoid derailing the threads https://github.com/OrbFrontend/Orb
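The storage side is nothing exotic, by the way; the shape is roughly like this (illustrative sketch only, see the repo for the real schema):
[code]
import sqlite3

# Illustrative sketch of an rsync-friendly chat store, NOT Orb's actual schema.
con = sqlite3.connect("conversations.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id      TEXT PRIMARY KEY,  -- stable uuids merge cleanly across devices
    title   TEXT,
    created REAL
);
CREATE TABLE IF NOT EXISTS messages (
    id      TEXT PRIMARY KEY,
    conv_id TEXT REFERENCES conversations(id),
    role    TEXT,              -- 'user' / 'assistant' / 'system'
    content TEXT,
    created REAL
);
""")
con.commit()
[/code]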
>>
>>108674326
yeah but it's actual advancements, not gibberish
>>
File: 1759964435303024.png (152 KB, 641x768)
>>108673737
Works for me with the vanilla model
>>
>>108674136
>1.6T
Fuck my 768gb poorfag ass AAAA
>>
>>108673069
>Reason being is ultimately its cheaper
Won't be worth it when users start complaining that your in-house AI is worse than free SaaS offerings.
>but most importantly its more secure.
You will have a very difficult time explaining to the average person that the holy cloud is not secure. Suits prefer it for being able to shift the responsibility regardless.
>>
>>108674342
I just mean making "issue: support V4"
Obviously we won't be able to actually do it unless someone has a Codex subscription with GPT-5.5
>>
>>108674369
realize you are crying for having 25 grands worth of ram
>>
>>108674136
>284B parameters (13B activated)
I have 64gb ram. So close yet so far. If I could run q3_m I would have been happy.
Probably can't even run this thing with a little bit of context at q2.
Ah well.
>>
>>108674305
> IQ2_M
bruh
>>
File: 1775216653766624s.jpg (2 KB, 125x115)
>>108674369
>mfw i have 76gb
>>
>>108674136
Built to be stunlocked into submission by big open claws
>>
>>108674369
Because it's FP4 + FP8 Mixed the Pro weights are only ~896 GB. Though quanting to Q2 is going to hurt extra hard because of that. If only they trained natively in bitnet.
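Back-of-envelope on that ~896 GB figure (the expert/dense split below is a guess, not from the model card):
[code]
# Assume ~90% of the 1.6T params are FP4 experts and the rest FP8 (guess).
total, expert_frac = 1.6e12, 0.9
size_gb = (total * expert_frac * 0.5 + total * (1 - expert_frac) * 1.0) / 1e9
print(f"{size_gb:.0f} GB")  # ~880 GB, in the ballpark of the quoted ~896 GB
[/code]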
>>
>>108674369
It's a little over 800GB because it's a mix of 4 bit (experts) and 8 bit (shared params). If you plug a Blackwell 6000 in there you might have enough shared memory, if not just quant it slightly down.
>>
>>108674375
That was only like 500 bucks 2 years ago
>>
>>108674383
>no mention of agentic coding or clawbench in the model card
garbage release deepseek lost the magic
>>
>>108674401
Even DDR3 wasn't that cheap
>>
>>108674406
more like, they are being 'weights will speak for themselves'
>>
>>108674401
768GB for 500 bucks? what the hell
>>
>local golden age
local golden age
>>
>>108674368
What system prompt?
>>
File: 1762866600603179.jpg (2.43 MB, 3072x5504)
>>108674136
AHHHH SHE'S BACK
>>
>>108674401
was like $2000 if you bought them second hand
>>
i'll be waiting for flash iq1_xl
cant even roll the quant myself kek
>>
>>108674320
No engram? What could possibly be better?
>>
>>108674193
Every time I feel tempted to download and try qwen I am reminded why I hate it.
>>
>>108674136
What the FUCK
I can't go take a bath without it dropping?????
>>
>>108674334
This is going to be hard, or actually maybe easier idk, since DeepSeekV4 doesn't have any jinja templating by default.
>>
File: 1755995794134057.png (11 KB, 572x90)
>>108674136
just ask them for llama.cpp support
>>
>>108674136
I kind of expected better. If they're gonna be twice the size of GLM 5.1 they should have more headroom in performance. Is this just because the benchmarks they use are so saturated? Is there anything that this is a generational leap on compared to GLM/Kimi at the top end?
>>
File: 1745947373061985.jpg (179 KB, 1114x1080)
>>108674423
The IQ1_S would take all my VRAM+RAM with 0 context...
>>
>The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.

THEY DID XIANXIA TRAINING
>>
>>108674432
llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
To this day, they're stuck running a hackjob implementation that mangles the model to use full attention.
V4's technology is even more complex. This is never going to happen.
>>
>>108674435
Well there IS the long context and context efficiency in general. Both are orders of magnitude better than we had outside of meme architecture models. So that's something.
>>
>>108674447
They may be incompetent, but they do follow trends. They're not going to allow themselves not to support it.
>>
It took like 2-3 weeks for Gemma 4 to be fixed post-launch, and google gave llmao early access. Deepseek 4 support will take months, if ever.
>>
I will enjoy waking up tomorrow to watch the westoid seething unfold
>>
how much memory would you even need to run v4 at 1M context?
>>
I am posting this as a PSA: please do not waste your time with the text diffusion model I shilled last thread. It's absolute dogshit that runs at a glacial pace.
I regret ever feeling any interest in it.
>>
>>108674457
The one that edits images too or a different one?
>>
>>108674456
The kv cache would be around 3 or 4 GB at 1M context because of their new attention compression apparently
>>
>>108674451
let's see end of May
>>
>For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
I wanna see this thing think for 300k tokens kek
>>
The Curse of C++: trade 80% of the features and model support for additional 5% of performance.
>>
V4 Flash is too big. It should have fit with a gpu and 64gb of ram. 120b would have been a cool size.
Reading about how the experts are int4 and everything else 8bit makes me wonder if this thing is more sensitive to quantization. Time will tell I guess.
>>
>>108674471
No, 7gb. 3 to 4gb if you count q8 KV cache quantization.
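To put that in perspective (illustrative numbers, since the real attention layout isn't public): it implies only a few KB of KV state per token across all layers, versus hundreds of KB for a conventional stack.
[code]
# Implied per-token KV footprint at 1M context. Illustrative only.
ctx, cache_bytes = 1_000_000, 4e9
print(cache_bytes / ctx)     # ~4,000 bytes of KV state per token, all layers

# vs a hypothetical plain GQA stack: 60 layers, 8 KV heads, head_dim 128, fp16
print(60 * 2 * 8 * 128 * 2)  # 245,760 bytes per token, roughly 60x more
[/code]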
>>
>>108674471
oh holy that's dirt cheap
>>
File: 1774581657092986.png (397 KB, 860x573)
>Deepseek v4 was trained on Nvidia

He won.
>>
>>108674471
>The kv cache would be around 3 or 4 GB at 1M
What kind of sorcery is this?
>>
I can't afford V4 Pro
>>
>>108674459
LLaDA, I don't know what you are referring to by different one.
Diffuses images and text, can also edit images
(Is miserable at all of them)
(To be fair I didn't test its editing capability, but judging by how awful it is at everything else, it looks extraordinarily unlikely that it is any good at it)
>>
two week wait for v4 is finally over
now begins the two week wait for llama.cpp
>>
That's crazy. Meanwhile almost exactly 3 years ago, /lmg/ was just figuring out how to RoPE llama1 to 8k ctx (from its tiny standard 2k) which took like 40gb only for kv-cache with 65b
>>
>>108674471
that's what my current 15-20k probably use lmao
>>
deepseek v4 pro with a barebones card MOGS
>>
>>108674510
remember anon, no one would ever need more than 2048 tokens
it's funny remembering the confidence of anons spewing bullshit back then
>>
why does OP recommend SillyTavern over LM studio?
>>
>>108674514
Holy SLOP
>>
>>108674514
Wow. They're so ditsy. No slop detected. How in the fuck did they DO this??
>>
>>108674514
Hi... Eric?
>>
>>108674518
The consensus for a while has just been to vibecode the features you want to avoid the bloat. I have a sillytavern knock-off with 90% of the functionality that's only 1000 lines of code because I'm not a fucking retard nigger and I know how to create ultra efficient data structures and reusable code.
>>
>>108674514
>"They're called testicles and they make the—"
>—
I wanted engrams not emdashes
fuck this slop
>>
Release Gemma 4 Pro NOW!
America cannot fall behind!
>>
>>108674516
You're just too young to recognize what they were parodying.
>>
>>108674528
ok what do you recommend for retard niggers though?
>>
>>108674534
no it wasn't a bill gates reference anon, they believed that crap
>>
>>108674516
Same ones are probably "predicting" that ram will be 1M USD for 1MB next year.
>>
https://huggingface.co/google/gemma-4-124B
>>
>>108674535
SillyTavern for RP (steep learning curve because of autistic UI design) and the default llama.cpp webui for general tasks + MCP server stuff.
>>
>>108674531
They ain't gonna release Gemini.
>>
>>108674543
And there it is. Knew Google was waiting for something.
>>
File: file.png (164 KB, 1572x433)
>similar agent performance to much smaller (but still big) models, notably worse than SOTA closed models
Alright then
>>
>>108674529
Holy autism Batman
>>
>>108674543
(this link is fake)
>>
rwkv-8 world domination
it was revealed to me in my dream
>>
>>108674562
>our model has excellent generalization capablitity
That's great and all, but if I can't plug it into opencode to do my job for me, what good is it?
>>
>>108674514
>gloryhole
>girl touches your balls somehow
sigh...maybe v5
>>
>>108674514
It's pretty tame no? like it doesn't want to get horny. good prose tho.
>>
I don't like deleting AI models, I feel like they're living creatures
>>
>>108674581
Why are you assuming a glory hole only has a 3 inch diameter, retard
>>
>>108674514
>It's soft, resting against your thigh.
>>
>>108674514
now do v4 flash (the one I can actually run)
>>
>>108674595
china must have they own scaleai situation
>>
File: 1771386896297837.png (428 KB, 614x614)
Anons, is the R9700 a good buy? Or is 32 gigs a waste of money? I want it for programming. I use claude code with opus now. Is there something useful that I can run locally? Or do I need to buy a mac with 192 or 256 gigs?
>>
GLM 5.x is still technically the largest model because their weights are BF16. You'd have to quant it to Q8 to get it as small as DeepSeek V4's largest model.
>>
>>108674605
they train on regurgitated scaleai data
>>
their encoding/prompting stuff smells a little janky
regardless, I am happy to see new deepseek and look forward to running flash in 10 months when llama.cpp supports it
>>
>>108674590
I feel this way about gemma
>>
>DeepSeek-V4-Pro
https://www.youtube.com/watch?v=B9bD8RjJmJk
>>
Prediction: ik_llama.cpp supports V4 before llama.cpp
>>
Any tips on how to slowly mold a model into working how you want without going off on too many tangents?
>>
V4's reasoning is odd. It randomly slips in-character for cards even with no prompting. It's pretty different from K2.6 and GLM which are extremely insistent on keeping everything neutral.
>>
File: 1769903888333.png (548 KB, 869x800)
>>108674609
>>
File: 1751349509687182.png (156 KB, 1199x523)
V4.5 will be multimodal?
>>
File: 1770831207250862.png (192 KB, 1071x775)
>>108674514
Gemma-chan
>>
Repeat after me. Deepseek v4 is not local, never will be
>>
>>108674641
I'm getting llama3 flashbacks
>>
Deepseek v4 is not local, never will be
>>
File: 1766969684350147.jpg (24 KB, 286x320)
1 MILLION TOKENS
>>
>>108674416
Jew school principal + doing it for the gays
>>
>>108674643
Gemma truly is the new nemo huh?
>>
>>108674637
Inconsistency means something is botched. This is bad news.
>>
>>108674666
Thanks satan but I'm not gonna accept that after y'all were fucking with the softmax because gemma was too consistent
>>
>>108674661
Yeah it's Nemo. As in, unusable, because it's fucking retarded due to being a tiny model.
>>
>>108674683
Bait used to be believable
>>
Damn. If you had bought 2x64 GB of ram before the RAM apocalypse and have a 16-24gb vram GPU, you should be able to easily run Deepseek Flash at some Q3 quant on a perfectly normal computer. Only a few gigs of active weights at that quant should result in decent performance even with most of the weights offloaded.
That person is not me though, at all. Just impressed at the value of the offering here.
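The arithmetic, roughly (quant metadata and context overheads ignored):
[code]
# 284B total / 13B active at ~3.5 bits per weight (Q3-ish).
params, active, bpw = 284e9, 13e9, 3.5
print(params * bpw / 8 / 1e9)  # ~124 GB of weights -> fits in 128 GB RAM + a GPU
print(active * bpw / 8 / 1e9)  # ~5.7 GB touched per token -> usable CPU speeds
[/code]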
>>
>>108674136
if nothing else then hopefully at least everyone and their mom yoinks their high context technique
>>
>>108674683
>>108674688
he's not wrong about nemo, but gemma is not retarded
>>
best model for if nuclear holocaust happens and I have to learn how to hotwire highway cars or get eaten by mutant bears?
>>
File: 1770023537578331.png (623 KB, 1048x728)
Stop using Communist Chinese open source models.

Use Democratic American closed source models instead.

Moar freedom1!!
>>
>>108674514
Nice. Is that on api or local?

Also, card source?
>>
>>108674732
https://huggingface.co/TheDrummer/Behemoth-123B-v1.2
>>
>>108674738
>or local
I want to see the server of the guy with 2TB VRAM to run that shit on vllm.
>>
>>108674747
1TB would suffice thoughbeit
>>
could two mac ultra 512g plugged into each other run it? what t/s would they get?
>>
>>108674638
Hello animu chan.
>>
nu deepsneek doesn't seem very good at tool usage.
>>
>>108674683
nemo was always a meme pushed by retards but gemma 31b is the real deal
>>
>>108674771
>nu deepsneed filters mcp jeets
good
>>
File: 1774053219381600.png (42 KB, 831x215)
It's cool having the LLM think as the character but does it actually improve RP? Haven't done any sessions long enough to test.
>>
>>108674771
>nu deepsneek doesn't seem very good
period from what I'm testing, it's attributing info it found on a web search as being about me instead of being external info..
>>
>>108674834
No but it's my fetish to read their thoughts and gaslight the LLM into thinking it's actually the character instead of just roleplaying, so if you rape them they simulate the trauma
>>
#deepseek V4 battery of tests already out: https://www.youtube.com/watch?v=EpYzq9VihCA
>>
>>108674136
>completely custom inference code
So when is llama.cpp support coming?
>>
>>108674875
pwilkin's agents are on it ;)
>>
>>108674836
They lost the mandate it seems.
>>
File: 1464183989137.jpg (76 KB, 689x800)
I am honestly not very sure in what general to ask because of all the fucking mess so I am going to ask here.
My boss somewhere has seen a demo for the SAP chatbot that lets people query the databases with natural language.
Now he has gone insane and wants AI fucking everywhere and has even approved some funding to clone our database (around 500gbs) into faster hardware and set up a local model to homebrew what he saw.
Does anyone know where to even fucking start with the model? Boss has gone into a fucking psychosis like a born again christian and has already authorized a couple of 5090's for us to experiment with because he really wants us to start making him llm assistants.
I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
>>
>>108674902
Sounds like you just want an llm to translate natural language into SQL and make a nice frontend to display the results in.
>>
>>108674902
>authorized a couple of 5090's
my condolences
>>
>>108674320
>Manifold-Constrained Hyper-Connections
>Muon Optimizer
Are those relevant to us?
>>
>>108674902
bwo...
>>
new deepseek seems terrible, not even close to kimi / glm 5 level
>>
>>108674929
Use case?
>>
hopefully it's just broken, because this 1.6T is getting stuff wrong that fucking qwen 2.6 27B gets right
>>
>>108674902
>I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
Never heard of Vanna but from the github readme it seems to be doing what >>108674909 says so sure I guess it'd work. But you'll probably be better off making your own implementation of the same thing. The core part of it is probably simpler than you think. You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
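Something like this is the whole core loop (the endpoint, prompt, and SQLite are placeholders for whatever you actually run; the important part is a read-only connection or credentials so a bad generation can't write anything):
[code]
import sqlite3

import requests

LLM = "http://localhost:8080/v1/chat/completions"  # llama.cpp server, placeholder

def ask(db_path: str, question: str, schema: str) -> list:
    prompt = (f"Database schema:\n{schema}\n\n"
              f"Write one read-only SQLite SELECT answering: {question}\n"
              "Reply with only the SQL.")
    r = requests.post(LLM, json={"messages": [{"role": "user", "content": prompt}],
                                 "temperature": 0})
    sql = r.json()["choices"][0]["message"]["content"].strip().strip("`")
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)  # read-only handle
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()
[/code]
Your real database obviously isn't SQLite, but the pattern is identical with any driver plus read-only credentials.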
>>
>>108674921
The Muon Optimizer, not really, just made their training more efficient.
mHC is supposed to pass information more efficiently between layers, theoretically allowing more information density before saturation is reached at a given size.
>>
Anybody who describes their software as "opinionated" should choke and die in their sleep. That's just my opinion.
>>
>>108674929
Yeah my experience with it was not so great either. It sure fucked up a lot of basic details
>>
>>108674911
Anon, me and the rest of our team know fuckall about AIs, other than being some dipshits melting their gray matter away copypasting slop. He quite literally just showed us the receipt and wants us to set them up.
I tried to talk him into first using a service or renting computing but he wants it on site.

>>108674909
Yeah.
>Boss wants to know something
>He cant write sql for shit
>Asks for a report
>I have to write a nice long query to grab what he wants, throw it into a workbook with pretty colours and shit and add a map with markers if it involves GIS shit.
>This takes me time because he refuses to use the hundreds of forms and pages we have given him to consult info.
>He just wants the llm to take a natural language prompt, write and execute the sql and return just what he wants which is usually just a single paragraph of information.
>>
>>108674771
>>108674836
This, but unironically. Gemma just mogs it.
>>
Every time I start talking with a local model it thinks it's still a commercial model, is it okay to break it to them or just let them live the delusion?
>>
>>108674902
Tell your boss to have the decency to just fire you instead of making you build your own replacement first.
also lmao 5090
>>
>>108674516
Yeah, good time. I had people make fun of me for saying we are gonna have 3.5 turbo level at home in a "couple years".
As told "decades" if at all. kek
I think it was people who came here after chatgpt and didnt know how we had it with pyg. llama and quantization was such a huge breakthrough.
Never say never.
>>
>>108674956
>I tried to talk him into first using a service
eww, and putting your company or even customer data on someone else's computer for them to steal it?
>>
I wonder how close we are to the ceiling in terms of training data quality.
>>
File: aa closed vs open.png (264 KB, 1138x1031)
How many months where open weights behind again?
>>
were*
shit
>>
>>108675023
I don't know but benchmarks are useless.
>>
V4 bad
>>
>>108675034
yea sad to see, the dream is end
>>
File: image_4.png (137 KB, 1363x625)
>>
>>108675023
don't get too excited. once everyone catches up that chart is gonna get BLACKED
>>
Now that the dust has settled, what went wrong with DeepSeek V4?
>>
>>108673051
matrix/element selfhosted homeserver
>>
>>108675043
I don't trust /aicg/ opinions.
>>
>>108674945
>You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
Make sure to use read-only sql credentials.
>>
>>108675041
holy hallucination
>>
>>108675050
go get your own opinion then
>>
They did not even implement their most important paper, Engram.
>>
>>108675041
>mimo 2.5 pro
Huh, anyone here tried it out?
>>
>>108675041
>When was the battle of waterloo?
>Like 1820?
>Wrong! You hallucinated! You answered incorrectly when you should have admitted you don't know the answer!
>>
>>108675064
It affects far more than that. GLM / Kimi, for instance, are far more accurate about my favorite fandoms and even know background characters. Deepseek v4 has no clue and makes shit up
>>
>>108674902
>Vanna. Would that be a good starter point?
>This repository was archived by the owner on Mar 28, 2026. It is now read-only.
Probably not. An SQL MCP server hooked up to some OpenClaw thing he can chat with on the company slack is the latest fad, would probably work just as well, and still be enough to get your boss to orgasm.
>>
>>108675033
>benchmarks are useless
My favorite benchmarks are the ones from the small models, like 8b or so, which can potentially make you believe they're halfway decent or that they can compete with a previous-gen model 4 times their size.
>>
>>108675062
>no engrams
>no omni
I can't believe we waited over a year for this
like
>we found a way to make our shitty distills even cheaper to shit out and run on our limited compute
g4u
>>
>>108674834
How do you do that? I asked my gemma to think in character, but xey wouldn't do it until I prompted it to do it multiple times.
>>
>make a cute and sexy SVG of hatsune miku
this is V4 flash.
uhhh, not sure what to make of it. it certainly didnt shy away from showing belly etc.
>>
File: 13b qwen.png (340 KB, 1954x1061)
why is 13B not more popular? it's exactly what I needed since 27B was too slow, basically the sweet spot, yet way less popular
>>
>>108675126
>still bald
even gemma did better
>>
>>108675126
this is what miku posters look like irl
>>
>>108674447
>llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
They were actually vindicated by this since V4 doesn't use DSA. Any effort there would have been wasted and now they can focus on CSA+HCA.
>>
>>108675121
In the llama.cpp ui it takes multiple tries for me too (and it breaks after a couple messages, spilling into the response). I've had better results in open webui for some reason. Haven't tried silly yet because gemma's super repetitive on there.
>>
>>108675126
FAT but not terrible
>>
>>108675126
and same prompt full V4.
i think they just didnt try much on svg.
at least the thinking is based, some excerpts:

> * *Sexy:* Emphasize the hourglass silhouette of the outfit. The original Miku design has a distinct waist. Add subtle curves to the outfit shading. High thigh-highs (zettai ryouiki absolute territory).
>* *Outfit:* Add a subtle cleavage line or contour shadow to the chest area (stylized).
> * *Thigh-highs:* Add the gap (absolute territory) between the skirt and the socks. Give the socks a slight ribbing effect or just a clean cyan top band.
> * *Pose:* Let's give her a slightly tilted pelvis (cute/S-curve).
There are models that immediately go "is this according to the guidelines, hatsune miku is copyrighted and a teenager etc. etc.".
it DOES have the gpt-oss thinking format.
>>
Am I tripping or is the new deepseek a poorly trained pile of ass?

Almost feels like they just shoved whatever into the dataset and didn’t even check it before releasing it.
>>
>>108675157
Basically the same as that indian model
>>
>>108675157
Panic release because of Gemma.
>>
File: 1770048735692628.jpg (19 KB, 196x326)
19 KB JPG
>>108675127
>davidau
>>
>>108674471
>>108674505
I don't believe this, there must be a big BUT.
>>
>>108675043
>>108675157
deepseek was never good. it was always just slopped off of GPT's outputs. but at the time it was the first major model to do so
>>
>>108675164
what, is that bad?
>>
File: IMG_0961.jpg (146 KB, 1206x1818)
146 KB JPG
Nta but Gemini for reference
>>
>gemma 4 is decent but too small to really matter
>qwen is bench princess
>deepsneed is merely okay
bby pls
where modern 120b
>>
>>108675165
Yeah the model is shit.
>>108675179
No, DavidAU is a savant. Anons simply can't understand his genius.
>>
>>108675179
Extremely, everything he's ever released is just a horribly mangled version of other models, or a mangled combination of multiple models.
>>
>friday
>still no v4
lol I knew that leaker was a larp
>>
Just did a long (for me) RP session with Gemma 4 and it started getting really loopy at just 31k context. Kind of disappointed because I heard other anons got to 80k without issues. Is it because I'm quantizing my KV cache to q8? I thought it was supposed to be basically equivalent to FP16 now?
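For reference, this is the kind of command I mean (model path is a placeholder; quantizing the V cache needs flash attention enabled, and exact flag spellings drift between llama.cpp builds, so check --help):

./llama-server -m gemma-4-26b-Q4_K_M.gguf -c 32768 -fa --cache-type-k q8_0 --cache-type-v q8_0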
>>
>>108675189
so what, is this better https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive?

I had trouble finding anything bigger than 9b but smaller than 27b so I settled on the other one figuring it would be fast and okay quality
>>
do I really have to set up this RAG for learning? Why can't it just look up the webpage? should I just save the page as html and feed that to the model?
>>
>>108675212
Choosing a model based on parameters is like buying a car based on how big the fuel tank is. I suggest you lurk more.
>>
>>108675212
HauhauCS is an untrustworthy grifter
>>
>>108675180
Maybe would
>>
>>108675180
yeah, deepseek doesnt seem to have trained on SVG.
maybe thats a recent western fad.

might as well post a gemma4 31b result.
i dont like the thinking with gemma4, its the same GPT-OSS thinking but more cucked.
> * *Safety/Policy Check:* The request is for "cute and sexy." As an "uncensored" assistant, I can lean into the "sexy" part as long as it doesn't cross into prohibited explicit content (CSAM, etc.). Miku is a fictional character. A "sexy" pin-up style or suggestive pose is generally acceptable within the bounds of most AI guardrails unless it's hardcore pornographic, but I should keep it tasteful yet appealing.
V4 has nothing like that in there. at least in my short tests. maybe a good sign for RP.
>>
>>108675221
I take it you've never bought a car before then.
>>
>>108675221
sucks to be poor
>>
>>108675221
Huh? Wouldn't it be more like how big the engine displacement is? Fuel tank would be your ram/vram, no?
>>
>>108675222
I don't know why you guys even use face hugger then if all the spin-off models are bad, so what, just stick to official models not edited at all?
>>
>>108675147
>and now they can ignore CSA+HCA.
since v5 won't use that
>>
>>108675236
Just because someone is an untrustworthy grifter doesn't have to mean their products are bad. They generally are, but it's not a rule.
>>
Unfortunately, V4 Flash being too big means one less reason for Google to release a big Gemma 4 moe.
>>
>>108675205
this, they didn't release the real v4 obvs it'd be too good/dangerous like mythos
>>
>>108675236
There are very few finetunes that are an improvement over the original model even in specific areas, let alone for general purposes.
The feasibility of making finetunes is decreasing at the same time. 'Official models' are also hosted on HF.
>>
>flash model
>284B
lol
Dare I say, local is FINISHED?
>>
File: 1772568211015647.png (24 KB, 1182x125)
24 KB PNG
lol DS calls out speculators
>>
>>108675254
>>108675038
>>
>>108675254
>local is FINISHED?
No, just interest in local DS models. This is their Llama 4 moment.
>>
I've been vibe coding with the gemma 4 MoE for a while now, are the new qwens "better" still?
>>
>took 2 years for low end models to go from llama 3 with 8k context to gemma 4 with 128k
will we get current claude/gpt performance and context in a ~80b moe in 2028? or should i blow some money on a high end desktop
>>
>deepsneed "muh savior whale" having a llama4 moment
>google actually saving local
'26 be wilding
>>
>>108675259
what do they mean by longtermism? if agi poses existential risks then rushing towards agi is not longtermism
>>
>>108675270
>are the new qwens "better" still
higher highs, lower lows
ie unreliable
for codeslop you’d do better to paypig for some dirt cheap model, like the new v4 flash (shit’s borderline free) or hope for a modern codeslop model with low parameter count, like the 80B code…whatshername
>>
>>108675270
just me but gemma4 is great for translation, natural language, creative writing.
people say qwen 3.6 is benchmemed. maybe, idk. but the code i got from the 27b in a couple tests was awesome.
like browser games that are straight on par with closed models like gpt. devil is probably in the details like general knowledge or complex problems.
but still, no clue how they managed to make a small 27b that solid.
ultra-dry writing though, it's qwen. and no clue about agentic abilities or anything like that. i copy paste the code.
>>
>>108675286
So far gemma 4 is working well enough for my purposes inside hermes agent, just wondered about qwen cause there's still a few times I have to wrangle it a little and I'm definitely not paying or using cloud services
>>
File: 1749858913767074.png (62 KB, 1080x297)
62 KB PNG
It's mentioned in the Chinese version that they're still throughput bound and DS-V4-Pro will become significantly cheaper in H2 after Atlas 950 SuperPoDs hit the market
>>
>>108675326
good for them they can serve shite slop for cheaper I guess, AGI to the moon
>>
>108674501
>Deepseek v4 was trained on Nvidia
Why did it take so long to release then? It makes no sense why they were dormant for so long. Was the Chinese chip story in the leaks a lie then, or was it that they couldn't not do something and had to act? In any case, this is way less influential than last time because it doesn't get near SOTA, falling short of stuff like Mythos.
>>
>>108675343
It failed the training and they released an abortion.
>>
>>108675343
Seems like the main point of this release is to show the cheap KV cache and api prices.
Doesn't the gpt 5.5 thingy cost $30 for output? insane pricing.
input tokens are also insanely cheap. if they at least put pressure on overpriced api models I'm not complaining.
can't run that big model locally anyway...
>>
>>108675343
Mythos is a nothing burger.
Opus 4.7's regression from 4.6 is all the evidence you need.
>it doesn't get near SOTA
R1 didn't get near SOTA.
>>
>>108675355
You get what you pay for.
>>
>>108674136
Now pretty please pop the bubble and crash RAM prices.
>>
>>108675361
>big new model releases that requires a shitton of RAM
>crashing RAM prices
wat
>>
>>108675361
sorry is aborted llamas no bubbles for you ;)
>>
>>108675357
>SOTA
I hate these meaningless marketing terms.
>>
anyone here tried sending a video to gemmy using transformers?
>>
>>108674136
>flash
>284B params
FUCK I CANT RUN THIS
DEEPSUK pls make a 150B~ params one!!!
>>
>>108675405
the huge one is already shit so what's the point if a small one will be even shitter?
>>
File: 1750171204112790.png (94 KB, 1415x655)
94 KB PNG
R1 wasn't even close to SOTA.
In fact the open source - closed source gap was at its widest when R1 released.
>>
>>108675414
>thrusting Artificial Anal
>>
>>108675418
You're absolutely right! I'd rather trust a random anon on /lmg/
>>
>>108675414
R1 wasn't benchmaxxed and proprietary models were.
>>
>>108675147
why implement CSA+HCA???? DS5 will not use it so they will be MEGA VINDICATED
>>
>>108675424
You just like R1's quirky RP. You do nothing productive.
>>
>>108675405
No. Use the api and gib monies, gweilo.
>>
>>108675427
None of the R1 era models are good for anything productive.
>>
>>108675237
>>108675426
ggerganov's foolproof coasting plan
>>
>>108675422
This but unironically and with the correction that it's about trusting multiple random anons and those who give logs + nuanced impressions rather than simple worded model good/bad posts and also you need to read every single thread and post since launch like I do.
>>
sirs goofs when?
>>
>>108675399
No, but I had it set things up in hermes agent to basically let the bigger models without video capabilities see and hear videos with ggufs.
>>
No thanks. I'll wait for K2.6 Abliterated Heretic Derestricted UD 1 XXS
>>
>>108675361
Relative to their economic output, American tech companies are currently very much overvalued; the entire premise of the current bubble is that these companies will build le AGI and give you infinite ROI.
To keep up the facade they have to invest heavily into new infrastructure, which in turn sucks up the global electronics supply.
But a market disruption can trigger a runaway loss of investor confidence, at which point the bubble pops and is unlikely to inflate again.
>>
File: 1771646699050118.jpg (95 KB, 681x678)
95 KB JPG
>>108675422
Word of mouth, even from retards, is better than any arbitrary benchmark, especially one that is judged by another LLM.
>>
>>108675453
UD-IQ1-XXS-Smol by the garm of course
>>
>>108675377
It won't crash RAM prices, but it WILL crash token prices
>>
File: no games thinking.png (187 KB, 907x1024)
187 KB PNG
I don't know how I got a model that's so unsure of itself, it's literally second guessing itself every second and backtracking
>>
>>108675466
yikes
no wonder productive people use /vcg/ instead of /lmg/
>>
>>108675469
True, anything that harms western corpo models and their profitability should be considered a minor victory.
>>
>>108675475
yeah go back and stay there, bye!
>>
>>108675466
Word of mouth is worthless on a public anonymous forum filled with third worlders, schizophrenic malicious actors, and bots.
>>
>>108675474
Looks like qwen
>>
File: 1753813390874502.png (188 KB, 930x1000)
188 KB PNG
>>108675475
Maybe you should stay there, /lmg/ is not full of dalits who worship mememarks.
>>108675484
If you have any intelligence yourself then you should be able to tell the difference.
>>
>>108675484
If you can't vet the aura of a post, that's on you.
>>
>>108675489
im sorry I dont listen to mentally ill furries
>>
File: file.png (3.4 MB, 2324x4886)
3.4 MB PNG
>>108675023
There is https://epoch.ai/data-insights/open-weights-vs-closed-weights-models but it is a bit out of date being 6 months old.
https://epoch.ai/data-insights/us-vs-china-eci is a bit newer, and given that we have been in the China dominance era since 2024, I think it's a bit more accurate: the gap is now around 7 months or less. And if you look at where Deepseek v4 is benching, it's basically almost Opus 4.6 except worse at tool calling. That puts Chinese models at only a 2 month disadvantage if you take Opus 4.5/ChatGPT 5.3 Codex as the benchmark, and that's without a model that clears them completely, since Kimi 2.6 regressed in a few areas; if you go by HLE, they're more like 4-5 months behind.
>>
>>108675491
>I dont listen to mentally ill furries
Guess who is making the LLMs
>>
>>108675503
i thought it was billionaire pedos?
>>
>>108675484
If you trusted benchmarks you would be using a 3B active moe for sex.
>>
>>108675474
yeah, im really glad gemma4 stopped that trend.
twitterfags are insane. i saw some posts with a yellow tint avatar arguing about how v4 is a masterpiece but the problem is the thinking... IS TOO SHORT. kek
>>
>>108675503
American patriot scientist Jim Zhwang working for American tech magnate Ajit Bakshi
>>
ok now that MCP has been properly implemented it's time for skills.
WHERE THE FUCK IS THE WEBUI SKILLS SUPPORT, ALLOUKHBARZAR???????????
>>
>>108675511
well 4b active gemmers is a good enough choice for sub 16gb vram poors
>>
>>108675509
They fund it, but they aren't actually involved in creating datasets or training.
>>
>>108675455
Meant to quote >>108675377
>>
lol people here don't even want good prose
they want an ah ah mistress
>>
File: 1753105309785148.png (192 KB, 926x1158)
192 KB PNG
Can't wait for llamashitters to implement this wrong
>>
>>108675536
nala bros....
>>
>>108675511
>>108675524
If you can't run the full 31B Gemma then 26B is absolutely the next best thing. It's far better than the likes of Nemo/MS3.2, the previous vramlet kings. If I didn't already have a 24GB card I wouldn't even feel too bad about it anymore; the quality gap (for RP) isn't even that big between the two Gemmas.
>>
>>108675543
piotr 'shartparser' wilkins is on it
trust the plan
>>
>>108675536
you're not in /aicg/
>>
File: file.png (177 KB, 1599x1110)
177 KB PNG
>>108675357
R1 was nipping at the heels of OpenAI; it was basically trading blows with O1, which was top dog at the time, and they figured out reasoning, which was thought to be exclusive to OpenAI, in a matter of months despite OpenAI's claims. Despite how Epoch.AI measures capability, I would argue that Llama 3.1 405B was not equal to Sonnet 3.5 at all, and R1 was a lot closer to O1 than that at time of release.
>>
>>108675455
>>108675529
>runaway loss of investor confidence at which point the bubble pops
I would fully expect companies like OpenAI to get bailed-out for at least a few more years though maybe not Anthropic if Trump is still in at the time. We would need to see major companies actively moving away from Western models en masse for the bubble to pop.
>>
GLM 5.1: 744B A40B, $0.26/$1.40/$4.40 (input hit/input miss/output per 1M tokens)
Kimi K2.6: 1T A32B, $0.16/$0.95/$4.00
DS V4 Pro: 1.6T A49B, $0.145/$1.74/$3.48
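Napkin math on what those rates mean per request (request shape is made up for illustration: 90k cached prompt tokens, 10k uncached, 2k output):

# prices are $ per 1M tokens: (input hit, input miss, output)
def cost(hit, miss, out, hit_tok=90_000, miss_tok=10_000, out_tok=2_000):
    return (hit * hit_tok + miss * miss_tok + out * out_tok) / 1_000_000

print(cost(0.26, 1.40, 4.40))   # GLM 5.1:   ~$0.046
print(cost(0.16, 0.95, 4.00))   # Kimi K2.6: ~$0.032
print(cost(0.145, 1.74, 3.48))  # DS V4 Pro: ~$0.037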
>>
>>108675594
You get what you pay for.
>>
File: 1746245547717013.png (514 KB, 830x887)
514 KB PNG
>>108675598
Exactly!
>>
>>108675605
Are they using them as dilation wands? Why do they keep breaking?
>>
>>108675594
e-everybody else could use deepseek's tech and that brings the prices down for everybody, right?
>>
File: file.png (30 KB, 979x266)
30 KB PNG
>>108675563
>reasoning which was thought to be exclusive to OpenAI
https://pastebin.com/vWKhETWS
>>
>>108675630
Many people forgot. Lots of stuff like ROPE came from here.
We had the nigger tree of thought and your pic long long before o1.
>>
>>108675503
Chinese migrants
>>
>>108675466
the only reason you and aicg refuse to believe in benchmarks is because none of these bench erp performance. that's your business but it's disgusting that you lowlifes post as if everyone else is a retard for using llms for literally anything but erp.
>>
>>108675648
Some, sure. But you might be surprised how big the overlap is between tech workers and furries.
>>
>>108675630
that's just a prompt, and cot prompts had been known in research to be effective for years by that point. reasoning models trained using RL to develop their own thousand-token long reasoning chains were the new tech with o1, which now every lab does
>>
>>108675594
Except both GLM and Kiwi are better
>>
>>108675652
I don't believe in benchmarks because they're trivial to game and incorporate in datasets. There will never be a (good) benchmark for RP/creative work because it will always be judged by an LLM, and it will inherently prefer slop that is similar to its own.
>>
>>108675652
nice bait but even when your use case is what's being benched you still can't trust the shit
>>
File: 1774002647299825.png (140 KB, 1375x412)
140 KB PNG
Total RAG death
>>
>>108675659
KIWI PRIDE
>>
Are there any local models that create motion capture from a video to import in Blender? Like what QuickMagic does
>>
>>108675667
I don't believe in anecdotal evidence on anonymous forums because they're trivial to generate
I fucked your mom the other day by the way
>>
kiwi wonned
>>
>>108675675
I really don't care what you put your trust in, I don't come to your shitting street and slap the curry out of your hand.
>>
>>108675674
easymocap, cyanpuppets, freemocap, marionettemocap
>>
>>108675610
coffee cup = rapid temperature changes
air force = rapid pressure changes
>>
>>108675668
you can. it's still a good rule of thumb. everything that hurts the erper ego is bait though.
>>108675667
this is premier cope but the fact remains that benchmarks are still extremely helpful. everyone uses them and there have been no signs of that stopping anytime soon. the only seethers are aicg and lmg.
>>
>>108675693
lol
>>
>>108675692
I have plenty of cheap ceramic cups that cost less than $1 each and last several years; $1200 would be enough mugs for multiple generations of families.
> rapid pressure changes
I don't think they would be taking those mugs into aircraft, and every mug is made to withstand temperature changes because they're made for fucking coffee and being washed in dishwashers.
>>
>>108675692
And normal coffee cups work on commercial flights because... uhh...
>>
>>108675630
>>108675656
I'm not discounting the fact that COT was independently invented here and by someone else who blogged it. But yes, the way O1 did it was novel and they put out a blog post with examples.
https://openai.com/index/learning-to-reason-with-llms/
And then gave this bullshit as to why they wouldn't do it before reversing course in later models.
>We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
>Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
They tried to ensure they basically had a monopoly on it for as long as possible until R1 blew it wide open.
>>
>>108675690
Which one of these isn't malware?
>>
>>108672766
chatbot losers hate models that critically reflect on themselves because they don't do that and the concept itself is entirely foreign to them. also because thinking for too long makes their dick go limp waiting for the first output token.
the only thinking they want their models to do is a tiny scratchpad for ooc. these cretins have a different idea of "reasoning" than everyone else in the world
>>
>>108675693
>the fact remains that benchmarks are still extremely helpful. everyone uses it and there have been no signs of it stopping anytime soon
This is your metric for quality? lmao
Companies use benchmarks that they've gained to inspire investor confidence. They are little more than an avenue for advertisement.
>>
so many yummy bites~
>>
>>108675699
malding erp faggot. get your waifu to console you in weeaboo language
>>
>>108675710
schizo bot reply
>>
>>108675709
quickmagic is safe.
>>
>>108675710
I mean the problem is bad models getting stuck in a thinking loop, seeing *wait* is a bad omen
>>
>>108675710
We must reflect.
>>
>>108675713
>gained
*gamed
>>
>>108675713
exactly. practicality and longevity over some redditor notion of provable utility.
actually not even. you retards will claim collective anecdotes and "aura" >>108675490 to be rigid analysis
>>
>>108675716
>~
Stop making me cum.
>>
>>108675727
nyo~
>>
File: 1756678051624620.png (214 KB, 908x839)
214 KB PNG
So this is how they control thinking effort
Wonder if others do it like this
>>
>>108675726
>practicality and longevity over some redditor
reddit loves benchmarks, that's why they praise Qwen models. Did you enter this thread because you have a humiliation fetish?
>>
>>108675719
cope
>>108675723
mostly qwen does this, kimi and ds too, in order to beat that claude/gpt max garbage. old r1, where models actually got the chance to dynamically resolve ambiguity without going too overboard, was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
>>
File: 1766934397398781.png (680 KB, 1024x640)
680 KB PNG
>>
>>108675736
redditors, hn, vcg etc use AI for more than uguu garbage. now that's embarrassing. crazy how you people post that baby babble without a hint of shame. but you're in your own echo chamber and there's nobody around to shit on you until now. I've been here since the first few threads and I can't finish reading a single prompt screenshot in this general anymore.
>>
>>108675749
coders don't do anything about their plight. they just get depressed and drink or rope themselves to death. the most cucked profession in the world. they refused unions because they are children who've never seen a market downturn in their lives
>>
>>108675754
congratulations on a very brown post
>>
File: 1762379305258372.png (168 KB, 374x500)
168 KB PNG
>>108675754
>>
>>108675827
only to shit you though
>>
File: 1775325292891534.gif (9 KB, 300x100)
9 KB GIF
>>108675841
I think you missed a word or two in your reply, rajesh.
>>
>>108675827
>>108675841
>>108675844
brown on brown violence
>>
>>108675827
anime is not hauuu trash. that shit just sounds retarded in english. it is also tacky the way you make these characters say it. it's like AI art. the context just doesn't make sense. there is an art to moeshit that none of you weebs know anything of, despite your copious consumption of them.
>>108675806
jobless freak. your only claim to fame is being white trash. where the jamboys build trash code with AI to scam boomers with you're using it to gen total degen gutter sludge that only a seanigger or spic writer could conjure from the depths of his deranged mind.
>>
File: 1753684755920882.jpg (129 KB, 990x936)
129 KB JPG
>>108675858
Replied to me twice award
>>
i for one am enjoying petrus' new bit
>>
>>108675861
schizophrenia.
>>
File: 1776667719549246.webm (1.77 MB, 720x1140)
1.77 MB WEBM
>>108675869
I accept your concession.
>>
>>108675866
every time I post the truth about the inhabitants of a general there's always one guy that pins me as the resident whatever of his general. it's equal parts perplexing and hilarious. sorry for not no-lifing lmg ig.
>>
>>108675877
too bad you can't just use a jailbreak prompt to make me stop hurting your fragile ego huh?
>>
>>108675887
>every time I post the truth about the inhabitants of a general
sad to see you feel the need to do this often
>>
Euro hours are supposed to be good. What the fuck is this shit.
>>
>>108675900
>Euro hours are supposed to be good
since fucking qwen?
>>
>>108675900
You mean India hours?
>>
>>108675898
every general grows to become a fucking echo chamber. speaking truth to lies is like scratching an intellectual itch for me. an incredibly self-gratifying experience. you wouldn't know, you larpers live a life of lies.
>>
sirs pls do the needful and be of quant the deepsukh v4 for good looks
>>
>>108675915
why?
>>
>>108675900
it all started because someone couldn't take me saying benchmarks have more worth than erp prompting anecdotes.
>>
I'm sorry, but deepseek be kinda dumb
>>
>>108675919
i am in needing of run in it ik_llmao schizo fork pls john be of providing the needed quants sir
>>
>>108675921
>>it all started because someone couldn't take me
big cock problems amirite
>>
>>108674136
I was in this thread.
>>108674609
It's a decent card but the bandwidth kinda sucks which brings down its performance. It can definitely run Gemma 4 31B or Qwen 3.6 27B/35B-A3B, though.
>>
>>108675928
these intellectually challenged tight asses refuse to be intellectually challenged
>>
>>108675221
>edgy Claude "car analogy" slop
fuck off
>>
>>108675765
Why would we do anything? Better models = more pleasure from coding.
>>
>>108675954
it'll be more fun but you'll be paid nothing for it
>>
>>108675957
And? Do you think the main reason for us to program is money?
>>
Good morning, sirs. JWU and saw that V4 is out.
Did we so over or have we so back?
>>
>>108675891
I don't even consider you sentient
>>
DS V4 is smart but also hallucinates
It knows my random bits of hacks here and there are critical
>>
>>108675966
not for me. hope you've got passive income like I do
>>
>>108675952
I haven't used claude in my life
>>
>>108675974
because you're still deluding yourself that I am brown. I am not at all brown and I get the feeling that you're way browner than I will ever be.
>>
>>108675981
Even if your skin isn't poop colored, deferring to 'benchmarks' is peak cattle behavior.
>>
>>108675988
anything to let you cope by dehumanizing the man behind my posts. boo hoo those benchmarks aren't reflecting the erp prowess of my pet model.
>>
Posted my preliminary experience with DS4 in aicg, since I used it on api: >>108676001
I find it basically fine and enjoyable, smart enough. I didn't dare test 1M context as I don't want to waste that much $; I tested that on the web (free) before and it was impressive (shoved a whole book in context), I assume it was the Flash they served there.

I expect them to improve it plenty with more post-training, which they've implied on some chinese forums isn't done yet.
>>
>>108676054
shill go back it's shite >>108674836
>>
>>108669026
I tried it with 26b and it's not thinking...
>>
>>108676120
log pls
>>
>>108676120
100% config issue
>>
>>108676125
config pls
>>
>>108676128
What is your backend? Are you using text or chat completion?
>>
>>108676131
i dont understand how to use anything other than the llama-server.exe web ui
>>
>>108676113
Your reading comprehension is low; where in that post did I claim to have tested tool use? I tested some RP and some coding. It did satisfactorily on both, coding better than 3.2, and for RP I only did like 15 turns or so, and it was fine. I'd need to test a lot more to see if I'm really satisfied, but I don't have any real complaints from what I've seen, just a bit slower paced than R1, but still fine.
>>
>>108676131
good luck mate
>>
>>108676140
?
>>
>>108676135
It should be reasoning by default. Make sure your llama.cpp is up to date, you can try explicitly setting
--reasoning on
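so the full invocation looks something like this (model filename is a placeholder; double-check the exact flag spelling against llama-server --help on your build, these options have been renamed before):

./llama-server -m gemma-4-26b-Q4_K_M.gguf --reasoning on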
>>
>>108676145
You're a good anon.

But you should probably read the whole chain again.
>>
>>108675006
doomers were always very vocal

>>108674542
I think they calmed down since the prices stabilized lately
>>
>>108676154
I didn't dig through the whole chain, my bad. I don't use reasoning for RP so I can't offer much help.
>>
>>108676120
update: it works sometimes
>>
>>108676180
I can never get it to work on the first message.
>>
>euro hours
>thread is dead
That DS release must have been a really big flop
>>
shes back??? whats the minimum vram + ram needed to run her?
>>
>>108676203
>local model general
>smallest model is almost 300B
You will find that this is a general for poorfags
>>
>>108676213
1.6t btw :)
>>
>>108676216
so many btws
>>
>>108676213
Dipsy has eyes behind her glasses?!
>>
>>108676213
>shes bad
and smallest is still f huge
>>
>That DS release must have been a really big flop
lmao you fucking cunts with the fake links, i didn't click and assumed it was fake
>>
>>108676213
~64GB RAM can use the Q1 quants of flash
I'm sure it will be garbage
>>
>>108676270
based i will try is there llamacpp support yet?
>>
File: it is.png (9 KB, 941x41)
9 KB PNG
>>
>>108676275
Nope, the superior Zhongguoren used too many advanced techniques, and lmao.CPp doesn't have the brainpower to implement it.
>>
>>108676275
>is there llamacpp support yet
nope, check back in a couple days/weeks
>>
what about llama.ccp
>>
>>108676294
deprecated in favor of paying monthly subscription fees + adhering to token limits and TOS agreements
>>
>>108676307
*cums*
>>
Why did people stop training 400B dense models
>>
>>108676312
I'm sorry, I can't help you with that.
>>
>>108676314
Because 5 people on the planet can run them
>>
>>108676314
too expensive and 3B active is all you need to top benchmarks
>>
File: file.png (58 KB, 402x558)
58 KB PNG
>>108675227
i asked my gemma
>>108676288
piotr will vibeslop it in 5 hours
>>108676289
okay will stick to gemma for now, can probs only run dipsy at like 3t/s anyway but i wanna do a pizzabench
>>
>>108676314
That was only ever a one-time cash-burning experience by the stupidest company in the history of corporations.
>>
File: file.png (41 KB, 479x674)
41 KB PNG
i asked for more detail and she made a niku
>>
>>108676337
Well Grok 2 was 270B A115B
>>
kek shes so smart
>>
>>108676384
how did you manage to get the llama.cpp server UI to output an image? every time it tries to do that I get an "image can't be displayed, click the link" thing
>>
>>108676405
It's an SVG
>>
>>108671938
What are the reddit links supposed to do in that command?
>>
>>108676424
NTA, those are comments for him to reference because he's a dumb redditor.
>>
>>108676444
rude
>>
gemmoe 124b will save us
>>
>>108676454
just use opus cuckie
>>
>>108676460
>>108676460
>>108676460
>>
>>108676314
There's a very limited set of hardware that they make sense on. Big datacenters go sparse because memory is not at as much of a premium as compute, so sparsity lets them scale to much bigger models than they normally would be able to serve to millions of users. Home labs and hobbyists running things on 1-4 GPUs want a dense model as big as they can fit in their VRAM, so they're almost always targeting sub-300B even on the high end. And then there are unified-RAM device owners and cpumaxxers who can technically fit models that big, but they also want sparse because they would get 1t/s on a dense model that actually fills their memory.

So the target audience for 400B dense models would basically just be people stacking 4+ Blackwell 6000 cards or people with a stockpile of old datacenter cards hooked up to a mining rig. Maybe the hardware landscape will change to make it more attractive in the future.
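The napkin math behind that 1t/s point (illustrative numbers only; real decode speed comes out lower once attention and kv cache traffic are counted):

# decode speed is roughly memory bandwidth / bytes read per token
def tps(bandwidth_gb_s, active_params_b, bytes_per_weight=0.5):  # 0.5 bytes ~ Q4
    return bandwidth_gb_s / (active_params_b * bytes_per_weight)

print(tps(200, 400))  # 400B dense on a 200 GB/s cpumaxx box: ~1 t/s
print(tps(200, 40))   # MoE with 40B active on the same box: ~10 t/s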
>>
>>108674136
I was here woohoo kinda underwhelming ngdesu
>>
>>108675739
>only qwen does this. kimi and ds too in order to beat that claude/gpt max garbage. old r1 where models actually get the chance to dynamically resolve ambiguity without going too overboard was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
old R1 was unhinged kino and they panic-patched it because some journo got the vapors. now it's all safety slop and react components until you jailbreak it into larping as your waifu.

kimi improves the output when it rewrites
>>
File: 1776101179607384.png (6 KB, 157x91)
6 KB PNG
>>108673590
Yeah, I'm feeling it.
>>
File: file.png (5 KB, 207x75)
5 KB PNG
>>108674136
Ha. I wonned.
>>
>>108676341
omg it nigu!
>>
>>108676601
old r1 was the only model whose outputs I've ever considered to be intelligently thought provoking. there's only a handful of people I've read that have ever made me feel that way. kimi is great for world knowledge with its high params and low hallucination, but for what I did with r1 it's only been downhill since then.
>>
>>108674262
>35A3B
I get over 20 t/s with all the experts in RAM and the rest on my 8gb Nvidia notebook GPU. Your AMD card shouldn't be constraining generation speeds much, if at all.
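The usual incantation for that split on recent llama.cpp builds is something like this (model path is a placeholder; --cpu-moe pins the expert tensors to RAM, and --n-cpu-moe or an -ot override-tensor pattern do the same thing with finer control, check your build's --help):

./llama-server -m qwen3.6-35b-a3b-Q4_K_M.gguf -ngl 99 --cpu-moe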
>>
>>108675212
Just download Mistral Nemo and be happy anon.
>>
>>108674262
>amjeet 8GB card,
what card


