/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108346672 & >>108341869

►News
>(03/11) Nemotron 3 Super released: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Today is the last day of the week for Google to release anything and it's probably going to be Gemini 3.1 Flash; nothing local.
>>
>>108353262
That Miku's breasts are far too large.
>>
>>108353250
>Karl Voss
>Dr. Elena Voss
>Zinnia Voss
>Dr. Eleanor Voss
>Seraphine Voss
hello sloppa
>>
Can I start doomposting about Deepseek now that Hunter Alpha is out in the wild and thoroughly mediocre?
>>
>>108353291
no, it can be any chinese lab's model
>>
>>108353282
Oh yeah I almost forgot about Seraphina
>>
File: 1756285275063743.jpg (68 KB, 1280x846)
►Recent Highlights from the Previous Thread: >>108346672

--NVIDIA Nemotron-3-Super-120B-A12B-BF16 release and benchmark analysis:
>108346846 >108346876 >108346885 >108347098
--Qwen3.5 397B only 15% better than 4B on benchmarks:
>108347895 >108347950 >108347919 >108347934 >108347984 >108347997 >108348009 >108348025 >108348029
--Nvidia's $26B open-weight AI investment and market dominance:
>108351880 >108351911 >108351918 >108351943 >108351916 >108351923 >108351938 >108351942
--runescape-bench: AI Agent Benchmark for RuneScape:
>108348559 >108348568 >108348578 >108348645 >108349022 >108349071
--Lightweight local models for grammar/spelling correction:
>108350949 >108350957 >108350968 >108350966 >108351087 >108351094
--Speculation about OpenRouter's Hunter/Healer Alpha models being DeepSeek V4:
>108349438 >108349555 >108349668 >108350072 >108350416 >108350453 >108350469 >108349488 >108349636 >108349674 >108350692
--Qwen3-VL video captioning tool with VRAM requirements discussion:
>108350529 >108351412
--Nemotron-3-Super issues and cockbench hosting solutions:
>108348570 >108348592 >108348641 >108348841 >108348854 >108349297 >108349305 >108348911
--Qwen3.5-generated retro terminal video with glitch effects:
>108349360 >108349365 >108349425 >108349434 >108349586
--llama.cpp whitespace cleanup PR:
>108348889
--Miku (free space):
>108348792 >108350529 >108351926 >108352986

►Recent Highlight Posts from the Previous Thread: >>108347000

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108353304
this is a grave act of tettorism
>>
File: miku.jpg (1.6 MB, 4096x2301)
>>108353304
>>108353309
>>
>>108353294
Not sure if it's final V4 but it has too many deepseek-isms to be unrelated, like the unprompted in-character thinking
>>
Another low quality mikutroon thread incoming.
>>
>>108353262
BLACKED duo
>>
of course, you're here to ensure it's shit after all.
>>
>>108353330
Qwen3.5 has unprompted in-character thinking
>>
File: file.png (131 KB, 935x1105)
Why is ngxson such a doomer?
>>
>>108353359
not doom, only lazy :)
>>
>>108353366
There's no excuse for being lazy in the age of vibecoding
>>
>>108353370
fuck you
it's better for a model to be unsupported than wilkin slopped
>>
>>108353359
I always read his name as nexon and start to become irrationally angry.
>>
>>108353347
Unprompted in-character thinking that's randomly (enclosed in parenthesis) and has just about the same length and safetycuckery as DS3.2?
>>
>>108353381
You don't want support based on a mock generated model? Are you a bigot?
>>
why are there no decent rp tuners on hf anymore except gay and furry ones? sao10k had some good shit back then but he's no longer active. I just want my model to understand my fetishes in the most erotic way possible and the defaults just don't hit the right places. do we have a recommended tuner in 2026 who knows their shit?
>>
>>108353392
just write what you want the model to say in the system prompt, that's 2026 meta
>>
File: 1771015861001026.png (2.31 MB, 1536x1024)
>>108353323
Lol
>>108353291
No, because it's all contrived. No one knows anything yet, and it's always tmw. Forever.
>>
>>108353337
same goes for the models
>>
File: G-Hek_IbgAAo1XQ.jpg (96 KB, 1360x768)
>>108353323
What did Ani do to deserve this?
>>
I don't think Hunter Alpha is DS V4. If it is there's no need to mention OpenClaw at all in the description
>>
>>108353396
what's the fun in telling the model what I wanna hear? i remember when I used to be fascinated by the slightest of those "woah the model said exactly what i wanted to hear!!!!!!!!" moments but now it's all just "what the fuck is this even saying?"
>>
File: 1752659119928.jpg (2.34 MB, 2500x2794)
>>108353419
>>
>>108353419
>rugged shorts
>>
>>108353429
if hunter really is the anticipated d'pussy 4 then my hopes for chink shit would be shattered
>>
one of them is dsv4lite
>>
>>108353454
There's literally no need for a multimodal lite model
>>
>>108353429
Horizon Alpha was GPT-5, so it might be another OAI model. Or maybe it's Avocado, heavily distilled from DeepSeek.
>>
>>108353466
Pony Alpha was GLM-5 tho
>>
>>108353429
The thinking traces are VERY similar to DS but maybe all chinese labs adopted them idk. Very underwhelming model tho.
>>
>since people say underwhelm we need cook for more months now
good job ensuring quality
>>
>>108353431
Relying on the model (small models that can be finetuned by the community, especially) to surprise you unprompted is a short-lived game.
Community finetunes were never good. At this level and scale they just can't completely modify the underlying model's behavior, and nowadays most people shortcut the process by finetuning the official instruct versions anyway. Any improvement besides slight stylistic changes is just partially undoing the built-in RLHF training.
>>
>>108353466
>>108353470
https://openrouter.ai/openrouter
All cloaked models are something or other alpha, the name doesn't mean anything
>>
>>108353507
of course it means something and you should speculate and post about it, please.
>this post was NOT sponsored by OpenRouter
>>
>>108353478
DS (at least on the web interface) doesn't have a singular thinking trace. If you ask mundane questions you get short thinking traces that may as well be the response. If you ask coding or logic questions you get in-depth thinking.
>>
Explain, without sounding insane, what is the overlap between vocaloids and Local Language Models.
>>
>>108353470
>>108353507
So only upcoming models we know of are V4, Gemma 4, and Avocado. It could be V3.3 or whatever, different from the model DS is testing in the web chat
>>
Is it possible to get in contact with a VRM artist on VroidHub to commission artwork? The guy I'm after doesn't have any contact links. He has a Ko-fi page. Will that let me contact this dude?
>>
>>108353359
>their recent papers
so engram is already confirmed a complete DoA meme
>>
>I'm totally burned out on aniblog™ guys
>proceeds to aniblog™ all over again
>>
>>108353554
only because llama.cpp devs will refuse to implement any new architecture that is used by less than 3 models
>>
Why is this general obsessed with anime girls that are confirmed to be blacked?
>>
>>108353359
He's right, if someone wants to push a new architecture they should do what Qwen did and release a full spectrum of models from 0.7B to 1.5T
>>
>>108353573
>>108353391
>>
>the most powerful opensource model uses DSA so I'm simply not gonna implement it
Cloud AI infiltrator
>>
>>108353533
Back in early language model days someone erped with miku and made it part of some readme.
>>
>>108353564
they prolly would implement something used by 1 model if it was a highly popular/liked/used model
the thing about DSA is that it's used by models that, while they may be liked, would be run by almost nobody on llama.cpp anyway lmao
the few copequanter cpu maxxers of /lmg/ waiting 2 hours for the model to think to read their 2t/s slop are not a real target audience
there's no purpose in implementing X when the few who will really use that X for tasks other than cooming are going to run vLLM, SGLang or something else of that sort on cloud hardware because it's much more suited for the batched, shared loads than llama.cpp.
I actually am glad and approve of their attitude here in not wasting development time (which is limited, they don't have a huge amount of contributors in the lower level sides of lcpp) just to cater to the two most vocal lmgtards and focus on things people do really run locally.
if I was niggerganov I'd even advocate for removing all the useless novelty shit like the diffusion models that only have half baked support
>>
Vocaloids have nothing to do with local AI models
>>
>>108353555
Checked. If you want a blog I can give you a blog.

I tried doing a quick WebXR demo and discovered that it's extremely janky and not immersive, so I abandoned that idea. Then I had the realization that I've been approaching the lack of immersion problem wrong. Making a VRM model *expressive* doesn't matter as much as making it *reactive*. What I've been missing is the VLM, CV, and STT sensory input pipeline.

Also I think you're assuming more of the posts in this thread are me than reality.
>>
>>108353594
>responding to the schizo
>>
>>108353669
people are bored and entertain themselves with lolcows, news at 11
>>
>NVIDIA is significantly expanding its footprint in open-source AI, with reports indicating a massive $26 billion investment over the next five years to develop and build open-weight AI models
There's something... off about this. How do we feel about this?
>>
>>108353603
Hatsune Miku is the quintessential virtual waifu, and the thread is mostly about her, in the end.
>>
>>108353681
They will spend $1B to finetune Qwen 3.0 and call it a day.
>>
>>108353681
the only way to feel about such news, when you see the sort of garbage nvidia produces, like nemotron 3, is to hope they'll fail very hard, that no one pays attention to them, and that everyone just treats them like air.
>>
>>108353681
Hopeful for swift and painless death :)
>>
>>108353681
They're going to poison open models with their own LLM-generated slop instead of OpenAI's, Anthropic's, etc.
>>
>>108353688
>not even 3.5
Actually, yeah that sounds about right.
>>
>>108353681
With conditions to use NVIDIA's base model (like Anima and Cosmos) or/and nvfp4.
>>
>>108353564
yeah but we've had a working architecture since 2023
i don't get why all these companies keep doing their ultra complex own stuff that gets them like 10% better performance but works with absolutely nothing but vllm
they should know better and just stick with what's been around if they want people to try their models
>>
>>108353683
Yeah but /lmg/ really should take a step away from all of that in general if we want to be a serious technology general
>>
is that uncensored qwen that released earlier good for cooming, or should i stick with what i have? it's some mistral version that fits in 16GB vram. or to rephrase the question: what's the best uncensored model that fits in 16GB vram? I do have 32GB ram too
>>
>>108353775
they don't care about anything but DCs using vllm though, and they want to be able to market "200 gorilion contexts for real this time" for coooding
>>
>>108353784
>if we want to be a serious technology general
>we
look what good it did localtardma https://www.reddit.com/r/LocalLLaMA/comments/1rqcsrj/1_million_localllamas/
>>
>>108353793
if they won't care about us, why should we want to run their models? it's not like any of the dsa models are something worth running either
>>
>>108353784
Yeah, we need to act super srs biznis like the pseuds on reddit
>>
Can you stop feeding the singular schizo with replies you fucking mongrels
>>
>>108353683
That is the blandest, most uninteresting design for a waifu there could ever be. She is the Elara of anime waifus. I guess when you consider her to be the averaged slop waifu she kinda fits.
>>
>>108353804
>t. schizo
>>
>>108353804
Your waifu is trash and she loves BBC
>>
>>108353435
drawings like that always have shiny, reflective boobs, like they were oiled up.
real boobs have a rougher texture that doesn't reflect light anywhere near as much.
>>
>>108353775
>innovation bad
llama.cpp hoping that this fast moving field will never deviate too far from the GPT-2 architecture because re-implementing everything from scratch in their brittle vibecoded C++ mess is the problem
>>
>>108353775
>if they want people to try their models
no one is "trying" the real deepseek at home, not even the one supported by llama.cpp currently, apart from a handful of batshit schizo coomers, all of them united here in this 4cucks general, and a couple of other internet schizos (AesSedai, ikawrawkarakwra)
absorb this text:
https://github.com/ggml-org/llama.cpp/discussions/205
it's pretty much like ggerganov's manifesto on the purpose of llama.cpp
>Based on the positive responses to whisper.cpp, and more recently, llama.cpp, it looks like there is a strong and growing interest for doing efficient transformer model inference on-device (i.e. at the edge).
>I would be really happy to see developers join in and help advance further the idea of "inference at the edge"
>The strongest points of the current codebase are it's simplicity and efficiency. Performance is essential
>It's early to build a full-fledged edge inference framework. The code has to remain simple and compact in order to allow for quick and easy modifications. This helps to explore ideas at a much higher rate. Bloating the software with the ideas of today will make it useless tomorrow
>The AI models are improving at a very high rate and it is important to stay on top of it. The transformer architecture in it's core is very simple. There is no need to "slap" complex things on top of it
does that scream "run model that takes a room full of GPU to run at an acceptable performance without copequant" to you
does "edge" mean something we don't know here?
what part of
>There is no need to "slap" complex things on top of it
is misunderstood too
if you don't like it, you don't have to use it
the schizo ikawrakwrak is trying to cater to the run absolutely retarded copequant with 8k context to coom crowd
>>
If miku is THE waifu then why did silly tavern use seraphina? It is because nobody cares about your special interest.
>>
>https://github.com/ggml-org/llama.cpp/commit/acb7c790698fa28a0fbfc0468804926815b94de3
>literally cuts off thinking after a predetermined amount of tokens
It this a legitimate technique? Are models trained to handle this?
gptoss had a "reasoning budget" but it was controlled using the system prompt.
>>
>>108353833
tavern used to come with konosuba cards included
think about what that means
>>
>>108353804
They don't teach kids these days not to feed the trolls
>>
>>108353791
well?
>>
>>108353841
That even konosuba is more relevant than the bakers obsession.
>>
>>108353835
this works fine, but the implementation has issues (it will insert the message without newlines, directly after the interrupted last char of thinking, and it will interpret "" verbatim if you use router mode and configure reasoning-budget-message in your presets.ini, so your message will appear as "message" in the thinking closure)
patch the code to strip "" away and always add \n\n before your message. The model will behave better.
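The fix described above can be sketched like this (a hypothetical helper, not llama.cpp's actual code; the function name and the stray-tag parameter are assumptions for illustration): strip the verbatim tag from the configured message and guarantee it starts on its own line instead of gluing onto the interrupted thinking text.

```python
# Hypothetical sketch of the budget-message injection fix described above.
# Names and the exact stray tag are assumptions, not llama.cpp's real code.
def inject_budget_message(thinking: str, message: str, stray_tag: str = "") -> str:
    if stray_tag:
        # drop the tag that would otherwise be reproduced verbatim
        message = message.replace(stray_tag, "")
    if not thinking.endswith("\n"):
        # never append directly onto the interrupted last char of thinking
        thinking += "\n\n"
    return thinking + message
```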
>>
>>108353833
copyright
>>
I'm very certain that Healer Alpha is Gemma 4. It's definitely considerably smaller than K2.5 going by its capabilities. My guess is something like a 130b/10a model.
>>
>>108353864
Konosuba has no copyright?
>>
>>108353896
Then it should be good for translating Japanese. Is it?
I didn't feel the typical Gemini/Gemma personality from it.
>>
>>108353262
Using real art for an ai general, bold of you op
>>
>>108353896
rumor is
>this time their largest size might be around 120B total with 15B active
so that possibly checks out
>>
>>108353904
that's why it was removed
>>
>>108353896
"What is a mesugaki?"
A gemma will make itself obvious.
>>
>>108351560
qrd
>>
>>108353956
we don't take kindly to your kind around these parts
>>
File: 1762184799693225.png (1.28 MB, 4500x3300)
While waiting for the new NVIDIA model to download I decided to give their earlier nano release a try.
Attached is the first page produced by my news summary script. The left is Qwen 3.5 35B (the same model i used to help code the script) and the right is Nemotron 3 30B. Each model was fed the same raw news data and given the same prompts and instructions.
I don't know about you anon but I think when it comes to analysis and summarization of text Qwen 3.5 trounced Nemotron 3.

I really didn't expect that big of a difference between models and this makes me want to try more models to see the variance.
>>
>>108353974
It's interesting to see such a clear difference in performance between the models. Trying out more models could definitely provide valuable insights into their strengths and weaknesses.
>>
File: 1746121481644467.jpg (250 KB, 806x772)
>>108353985
Unfortunately I have to be in bed by 10:00 as I work all night tonight but that just became my plans for the weekend.
Well that and testing out the super model.
>>
>>108353974
I think I said so before in these threads, but Qwen 35B is kind of insane when it comes to dealing with information. Extraction, summarization, etc.
>>
>>108353896
Yeah, I don't think is DS V4.
Way too cucked.
>>
>>108354011
are you a hot girl with teal hair? please be in london
>>
>>108354014
>Way too cucked.
That's the way all models are going though, could be a new deepseek base with modern safety in mind, ie pre-train cucking
>>
File: 1759840295491864.png (878 KB, 1200x1200)
>>108354012
>Qwen 35B is kind of insane when it comes to dealing with information. Extraction, summarization, etc.
yeah i really lucked into it and i have been pleased. i hope that guy leaving does not fuck up their work too much because the team has been on fire.

>>108354017
>are you hot
no
>a girl
no
>in london
thankfully no
>>
>>108354039
you will never be Londoner?
>>
File: 1745672098047975.png (1.52 MB, 879x4871)
Hunter Alpha's system prompt
>>
>>108354053
This is a duplicate thread. Please use https://old.reddit.com/r/LocalLLaMA/comments/1rr5zfo/what_is_hunter_alpha/ instead.

In the future, please search before starting a new thread.
>>
File: 1753558522337529.jpg (96 KB, 873x1024)
>>108354062
>>
>Never speculate
>>
>>108354066
>>108354062
ye
https://www.reddit.com/r/LocalLLaMA/comments/1rr9fgq/comment/o9y00ro/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>Healer Alpha system prompt
>>
File: nemotron-super_mesugaki.png (700 KB, 1595x1609)
>>108353956
Healer Alpha is not working at the moment, but Nemotron Super 120B is a real piece of shit (see picrel).
>>
Can't believe people would use a model called Nemo TROON
>>
>>108354073
:rocket: this is perfect!
>>
>>108354073
Isn't it just a gptoss 120b fine tune?
>>
>>108354081
no? they open source the train datas, and it uses hybrid arch like qwen
>>
File: 1749653474289704.jpg (62 KB, 405x720)
>>108354047
>you will never be Londoner?
it is my understanding that there are no British people left in London
>>
>>108354081
No, it's a completely different model.

> The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency. The model has 12B active parameters and 120B parameters in total.
>>
>>108354089
>>108354093
I guess openai just managed to poison the well enough to make nvidia's model spit out the same sort of shit their 120b does.
>>
>>108354113
Oh they probably used lots of traces from it, but it's not it at the base for sure.
>>
>>108354121
The recent HF article about synthetic data said OSS-120 was great for making lots of data because it's so fast, so no doubt NVIDIA used it, along with probably Qwen2.5 0.5B and such..
>>
>>108354128
That certainly explains it then, nvidia fell for the bait and gobbled it up.
>>
Why is everyone making hybrid rnn models now?
>>
File: file.png (32 KB, 724x351)
>>108354135
https://huggingface.co/spaces/HuggingFaceFW/finephrase#results
>Consider gpt-oss-120b, a strong MoE model that balances quality and throughput well.
>Notice that gpt-oss-120b matches Qwen3-8B in per-GPU throughput despite being a much larger model. Two things make this possible: only ~5B of its 120B parameters are active per token (MoE), and the weights are MXFP4-quantized so the full model fits on a single 80GB GPU. That makes large MoE models the sweet spot for quality-per-GPU: a single 8-GPU node running gpt-oss-120b generates ~176 million tokens per hour, and six nodes get you past the billion-token-per-hour mark. With the cost picture clear, let’s distill the patterns across all 18 models.
>Tier 0 (parallelism/batching) delivers the biggest wins for large/MoE models. gpt-oss-120b gained 1.95x and Qwen3-30B-A3B gained 1.78x purely from finding the right tp and batch sizes.
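The node math in that quote checks out; using the figures given above:

```python
# Throughput figures quoted above for gpt-oss-120b.
tokens_per_node_hour = 176_000_000  # ~176M tokens/hour on one 8-GPU node
nodes = 6

total = tokens_per_node_hour * nodes
print(total)  # 1_056_000_000 -- six nodes clear the billion-token-per-hour mark
```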
>>
>>108354166
it's great for DCs and a bit sucky for local (no rewind, cache is hit or miss), ie it's perfect!
>>
>>108354182
It's not hit or miss... context shifting does not work at all if it's a hybrid.
>>
File: f.png (86 KB, 500x497)
>>108354169
>a single 8-GPU node running gpt-oss-120b generates ~176 million tokens per hour, and six nodes get you past the billion-token-per-hour mark.
>>
File: Gem.png (1.29 MB, 1030x1024)
>>108354192
better ver
>>
>>108354169
>>108354192
UNLIMITED SLOPMAXXXING!!!!!!
>>
>>108354189
And that's a good thing!
>>
File: professional.jpg (728 KB, 1215x1620)
Okay, so I've installed rocm on my debian machine, and ran llama-bench pp32768 tg2048 on my (16gb lol) vram radeon pro v620.

nemo q8:
v620 rocm 7.2:
660.85 ± 1.46 | 29.05 ± 0.01
v620 vulkan:
232.24 ± 0.33 | 31.06 ± 0.04
3090 vulkan:
999.51 ± 13.51 | 37.76 ± 0.45
3090 cuda 12.4:
1937.69 ± 41.80 | 55.12 ± 1.35

gpt-oss, mxfp4 (cpu-moe):
v620 rocm 7.2:
303.64 ± 2.47 | 12.49 ± 1.13
v620 vulkan:
96.75 ± 0.71 | 25.66 ± 0.12
3090 vulkan:
331.24 ± 2.89 | 18.53 ± 0.03
3090 cuda 12.4:
665.36 ± 1.57 | 33.98 ± 0.02

As expected, rocm still wins for prompt processing, but the optimizations llama.cpp has for vulkan mean it's better for token generation. The 3090 is easily twice as performant as the v620, except for when I ran oss on cpu with vulkan, where the token generation was actually worse than the v620. Maybe it's something to do with my cpu/ram.

If we take the best case scenario for each gpu, for prompt processing a 3090 is nearly 3 times faster than a v620, and token generation is just a bit under twice as fast. However, in Australia at least, v620s are ~$700 while 3090s are ~$1.5k+. V620s also provide 32gb of vram (with ecc disabled, which also helps a bit: +5 pp, lmao, and +4 tg on rocm; haven't tried vulkan) and only take up 2 slots. Might be better to get 5 v620s and run iq4xs minimax or q2 glm 4.7 instead of buying two 3090s and only being able to run heavily quanted sub-100b moes.
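A quick ratio check on the best-case llama-bench numbers from the nemo q8 runs above (values copied straight from the table; pp = prompt processing, tg = token generation):

```python
# Best-case llama-bench results per GPU from the runs above (nemo q8).
v620 = {"pp": 660.85, "tg": 31.06}      # pp from rocm, tg from vulkan
rtx3090 = {"pp": 1937.69, "tg": 55.12}  # cuda

pp_speedup = rtx3090["pp"] / v620["pp"]
tg_speedup = rtx3090["tg"] / v620["tg"]
print(f"pp: {pp_speedup:.2f}x, tg: {tg_speedup:.2f}x")
# ~2.93x pp ("nearly 3 times faster"), ~1.77x tg ("a bit under twice")
```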

The best thing is, you don't even need a fan adapter like the mi50s, just strap a 40mm to the metal handle and the temps stay below 65 under load.
>>
>>108354189
theres some attempts being made to try and give them some kind of caching with the save states thing but yeah it's sucky
>>
File: healer-alpha_mesugaki.png (616 KB, 1485x1772)
>>108353956
Healer alpha (picrel)
>>
>>108354231
It's not deepseek and it's not gemma. That's for sure.
>>
>>108354231
lol it's fucked
>>
>>108354231
>kaki
fantastic, ready to ship to the moon and use for iran missiles
>>
File: 1753485359201507.png (2.37 MB, 1280x964)
>>108354222
very nice anon and the digits agree
although you should be able to cut a hole in the shroud and mount a blower fan if you want
regardless i am glad you are happy with your purchase
>>
>>108354237
Process of elimination, it's gotta be llama5. Only Meta could make a model so stupid.
>>
>>108354262
you might be onto something, original llama3 always had trouble with Japanese in my tests
>>
>>108354252
Mi50s seem to be around 450-500 for me. Could be a cheaper source of vram, but I'm worried about the performance - the v620 is already pretty bad compared to a 3090.

I'll wait until I get my other v620s before taking apart my only working one, but that could be a good idea.
>>
Yo?
>>
>>108354289
v620's basically rx6800 mi50 is basically vega64 so it will be much worse
>>
>>108354039
disgusting mikutroon
>>
>>108354237
The most believable theory I've seen is it's a Xiaomi model, because it often claims to be MiMo when asked and that can't be from distillation because who the fuck is distilling Xiaomi MiMo
>>
>>108354224
There is no bypassing the no context shifting support. It's an architecture limitation. You trade context shifting for cheaper/lighter longer context.
>>
ide that takes llama.cpp as a provider for source code navigation or any simple desktop automation?
openclaw seems like a disaster so i want to avoid that
>>
>>108354319
anything that supports openai api
>>
>>108354326
there are gorillion openclaw clones or vscode forks that i am not sure of what will 'last'
>>
>>108354289
i don't really think your performance was all that bad and count yourself lucky i just ordered two mi25 because they are cheap and that is my budget and they are ancient.
but 32gb of vram is worth it and as long as its a mixture of experts model i have found it will usually be fast enough given you are only using a portion of the parameters at any given time.
>>
>>108354319
I see some software called Opencode being mentioned a lot in the llama.cpp PRs. Maybe give that a look.
>>
>>108354304
Could be. They just updated their December 300B Flash repo couple weeks ago. Could be getting ready to drop the 1T non-flash. Would make Healer the multimodal Flash.
>>
>deepsneed
Who cares. It won't run on consumer hardware anyway. Where's Gemma 4?
>>
>>108354359
it runs on consumer hardware you just havent consooooooomed enough
>>
>>108353896
>>108354014
Have you asked it what it thinks about Taiwanese independence or why the CCP has a right to rule without a general election. Stuff like that.
>>
>>108354364
base truth the more you consume the more you save
>>
>>108354291
Vibecoded. If not ngxson, cudadev is gonna rip him a new asshole. I'd wait for cudadev's training implementation.
>>
>>108354359
Apparently gemma4 will be moe too, 100B moe isn't runnable on consumer hardware nowadays with the ram prices.
>>
>>108354374
there'll be smaller sizes for phones tho
>>
>>108354319
Pretty sure there are like half a dozen OpenClaw clones at this point if you need desktop automation.
>>
>>108354380
I think they will still release a 27B, but my hopes are really low after gemma 3.
>>
100b dense. My bwps are ready.
>>
I will be flabbergasted if google releases a moe model. They always refused to release a useful gemma. Their context is also always crippled (3 claims 128K but the practical context length doesn't go beyond 4k even for a task like summarization). They have nice writing styles, but compared to Qwen they suck as tools.
>>
>>108354374
I'd love something around the size of GPT OSS. 100B~ish with A5B~ish so that I could run it at 5ish bpw.
On my slow ass 64GB of DDR5 it should be 15 or so t/s, which is in the realm of usable as long as the output is really good.
But that would be the ideal scenario for me, for the hardware I have now and the speeds I find tolerable.
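A rough back-of-the-envelope for that 15 t/s figure, assuming memory-bandwidth-bound decode (the bandwidth, parameter count, and efficiency numbers are assumptions: dual-channel DDR5-4800 is ~76.8 GB/s theoretical, and real-world decode usually lands well under the ceiling):

```python
# Assumed hardware and model figures, not measured values.
bandwidth_bps = 76.8e9        # dual-channel DDR5-4800, theoretical peak
active_params = 5e9           # ~A5B active parameters
bits_per_weight = 5           # ~5 bpw quant

# Every generated token reads all active weights once.
bytes_per_token = active_params * bits_per_weight / 8   # ~3.1 GB per token
ceiling = bandwidth_bps / bytes_per_token               # ~24.6 t/s theoretical
print(f"{ceiling:.1f} t/s ceiling")
# at a realistic ~60% of peak bandwidth this lands around 15 t/s
```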
>>
>>108354418
>they suck as tools
isn't that by design. i assume they want you to pay to use their cloud service

the chinese on the other hand don't want you to use western technological solutions and therefore it benefits them to release something that works if it will keep you away from the big US providers
>>
File: what the fuck.png (203 KB, 837x827)
Does Impish Nemo have a cucking fetish? There's nothing in my character card, system prompt, or context that has anything to do with this bullshit. This gen actually made me seethe.
>>
>>108354426
5B active is going to be retarded and not much better than 3B active. There is a reason why glm has more than 10B active.
>>
>>108354426
>slow ass 64GB of DDR5
How slow can I expect it to run on my 64gb ddr4-2133?
>>
>>108354426
>with A5B~ish
as said before rumors are of 120b/15A
>>
>>108354439
LMFAO, what the hell. Did sicarius secretly train it on cuckhold data?
>>
>>108354426
You want the Qwen 3.5 122B A10B. I can eke out like 6-8 t/s running on cpu, with a rtx 3080 doing the prompt processing.
although usually i run the 35B A3B on my other rig because it's faster and its output is usually good enough
>>
>>108354460
trained on hebrew so it makes sense?
>>
>>108354449
Oh boy. Those numbers are on DDR5 4800MTs. That would be less than half the bandwidth, I think, so half the t/s?
For comparison's sake, I get 22t/s on Qwen 3.5 A35B at 8kish context.

>>108354462
>You want the Qwen 3.5 122B A10B
Tried it, didn't think the output warranted the slow t/s for what I'm using it for. 35B (base) is the best quality/performance for my shit so far.
>>
>>108354439
Ani is just cuckcoded.
>>
>>108354463
What does hebrew have to do with cucking? did i miss a part of history or something?
>>
>>108354489
Look up who owns "BLACKED"
>>
>>108354439
kek
>>
what is super mesugaki
>>
>moe
Explain what this is and why I, as a poorfag, should care. All moe means to me is kawaii ugu anime girl.
>>
>>108354529
you have google, you're not entitled to anon time
>>
>>108354529
It means that it doesn't use all the total parameters at once. For example qwen 3 uses only 3B parameters out of 30B total for each token. You trade intelligence for speed and lower vram usage.
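The routing step behind that trade-off can be sketched in a few lines. A toy numpy version, assuming a simple top-k softmax router (not any real model's architecture; dimensions and expert count are made up):

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy MoE layer: route each token through its top-k experts only.

    x: (d,) token activation; experts: list of (d, d) weight matrices;
    gate_w: (n_experts, d) router weights. Only k expert matmuls run,
    so active params per token are far fewer than total params.
    """
    logits = gate_w @ x                   # router score for each expert
    topk = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()              # softmax over just the chosen k
    # weighted sum of only the selected experts' outputs
    return sum(w * (experts[i] @ x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n)]
out = moe_forward(rng.standard_normal(d), experts, rng.standard_normal((n, d)), k=2)
print(out.shape)  # (8,)
```

The full expert weights still have to sit in memory, which is why MoE trades RAM for speed: per-token compute and bandwidth scale with the active parameters, not the total.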
>>
>>108354439
>evil finetune is evil
>>
>>108354537
You owe me time (and sex).
>>
>>108354584
I can't help with that.assistant
>>
>>108352458
>I've written ports for TTS engines.
Which ones and to what?
>>
>>108354686
NVM, I'm retarded, you already answered this. By the way, the only reason PocketTTS.cpp doesn't work on Wangblows is the POSIX headers dependency.
>>
>>108354704
Ah, wasn't aware of that. Thanks for telling me.
>>
File: 1745565170751864.png (920 KB, 1113x1519)
920 KB
920 KB PNG
How do I stop qwen 3.5 from leaking its thoughts into the final message?
>>
>>108354722
Add
<think>
</think> to the assistant message prefix
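If the prefill trick doesn't fully stop the leakage, stripping on the client side works too. A minimal regex sketch, assuming the model wraps its reasoning in literal <think>...</think> tags (adjust the tag names for your model):

```python
import re

def strip_think(text: str) -> str:
    # Drop any complete <think>...</think> block plus trailing whitespace,
    # then drop a dangling unclosed <think> block (everything after it is
    # still reasoning if the close tag never arrived).
    text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
    return re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>the user greeted me</think>\nHello!"))  # Hello!
```

This only fixes the display side; the thoughts are still generated and still cost tokens.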
>>
>>108354704
Sounds like microslop's problem not Anon's.
>>
i am torn between qwen3.5 9b and 35b-a3b for boilerplate work
>>
>>108354722
use the chat completions endpoint and bypass the entirety of retardotavern's own template parsing
by default, reasoning is sent in its own prop that way and is not part of the assistant message
also, what are those schizo post history instructions you're giving to a model that naturally uses <think>? is that a retardotavern default, or did you write the schizo instructions yourself?
every time I see yall post screenshots of this pos I get ptsd flashbacks to the llama 1 era, where some of that schizo templating was necessary to deal with 2k context models
also makes me wonder: when people bitch and whine about X or Y model sucking, are they retardotavern users filled with random crap settings?
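For illustration, the shape such a chat completions response tends to take. The `reasoning_content` field name is an assumption here (some OpenAI-compatible servers call it `reasoning`, and support varies by version and flags), so check your server's actual output:

```python
# Hypothetical response body from an OpenAI-compatible /v1/chat/completions
# endpoint; the exact name of the reasoning field varies by server.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "The user greeted me, I should greet back.",
            "content": "Hello! How can I help?",
        }
    }]
}

msg = response["choices"][0]["message"]
visible = msg["content"]                     # what the user should see
thoughts = msg.get("reasoning_content", "")  # kept out of the chat log
print(visible)
```

Because the thoughts arrive in a separate field, the frontend never has to parse <think> tags out of the visible text at all.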
>>
>>108354798
Does it really matter? I'm curious: what real work? At this point you can already paste email templates without the help of an AI, I suppose.
>>
>>108354798
35b a3b has been my goto model since release and i have been very happy, although i do keep the 2B model running on my nas for when i need a quick translation or have to ask a stupid question and i don't want to turn on my main rig.
i thought about running the 4B or 9B model for that, i have enough ram in the machine, but they were just too slow without a gpu
>>
>>108354835
not email templates, i mean random C++ plumbing
>>
File: dumbfuck.png (49 KB, 1251x280)
https://github.com/ggml-org/llama.cpp/issues/20458
he can't go a minute without saying or doing dumb things
there's a reason why the issue reporter suggested "off" should send "low": toss doesn't have a none/off mode, but "low" makes it output almost nothing and act like an instruct model (it just outputs a one-liner "I will do X." in its reasoning block). it's a model overfit to death on its template and it doesn't like any deviation from what it expects.
"none" was introduced in the official API with GPT 5.2, which, as far as I know, is not a llama.cpp model.
>>
I made my own openclaw, what are the odds I will be raped?
>>
>>108354901
-100%
>>
>>108354846
What questions will a 2B model answer, and what sort of translations, in which language? Very curious about this shilling effort.
>>
>>108354039
wtf is that anatomy
>>
So someone did the calculations: buying 4 sparks to run deepseek v4 + some helper llms not only performs better than Opus 4.6, it returns on investment after 2 years.
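The claim can't be checked without the anon's actual numbers, but the break-even arithmetic is simple. A sketch with placeholder figures (every number below is a made-up assumption, not a real quote or measured usage):

```python
# Break-even sketch: local hardware vs. a paid API, all numbers assumed.
hardware_cost = 4 * 4000       # e.g. four Spark-class boxes at $4k each
power_per_month = 60           # assumed electricity cost, $/month
api_cost_per_month = 800       # assumed heavy Opus-tier API spend, $/month

# Hardware pays for itself once the cumulative API savings cover it.
months = hardware_cost / (api_cost_per_month - power_per_month)
print(round(months / 12, 1))   # years to break even (~1.8 with these inputs)
```

The result is dominated by the assumed API spend: halve it and the break-even stretches to roughly four years, which is why such claims only hold for genuinely heavy users.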
>>
>>108354722
i switched to ungabunga from trannytavern because of this retardation, so much better
>>
>>108354898
GPT-Ass is useless in every conceivable way.
>>
>>108354926
>returns on investment after 2 years.
>llms
geg
>>
>>108354934
fair enough, but that has nothing to do with the fact that vibershitter doesn't know how to read, guess reading is for LLMs
>>
>>108354948
Why don't you take your complaints to the github thread instead of crying about it in here? This ain't your social media, faggot.
>>
>>108354963
Since when is discussion about llama.cpp not allowed here?
>>
>>108354963
you will not be able to unsubscribe from the wilkin newsletter just as you were not able to unsub from the jart one, deal with it
>>108354974
discord troons hate negative feelies
>>
I think a very interesting thing will happen in the future when local models can do 90% of everything you will ever need: there will be no point in using a paid subscription service like ChatGPT, and all those trillions in GPUs and RAM they bought will become mostly useless. However, the corporations should already know this, so they may hinder the progress in some way.
>>
is the new nemotron super model censored? it seems to pass the cockbench
>>
>>108354974
Discussion? More like inane ramblings of no use.
>>108354989
Fuck off.
>>
>>108354439
>There's nothing in my character card, system prompt, or context that has anything to do with this bullshit.
anon you are using a character that approximately one billion jeets fuck every single day
>>
>>108355027
https://www.reddit.com/r/LocalLLaMA/comments/1rri4qb/nemotron_3_super_and_the_no_free_lunch_problem/
>>
It is scary how strongly a garbage OP picture correlates with, and causes, a garbage thread. If the mikutroon baker died, /lmg/ would be an incredible thread.
>>
>>108354073
>half the text is preaching about it to the user
man that's sad
>>
>>108355038
>>108353346
>>108353346
>>
>>108355051
She fucks blacks
>>
>>108355030
No you, troonie. I can smell the hurt from your gaping wound aeons away.
>>
>>108355058
So? It's not the 13th century, women have rights over their bodies.
>>
>>108354073
>Problematic
If a model uses this word unprompted, you know it is unusable.
>>
>>108353798
>It was like 650k the other day... :D
So we're not the only ones getting flooded
>>
>>108355063
>aeons away
I'm all about hating on piotr, but come on...
>>
>>108340080
The endpoint is Linux only, and I'm a mustdie pleb. Had to comment out the POSIX section but still didn't manage to compile it.
And I see you made changes to support Win 3 minutes ago lol. Will try again tomorrow.
>>
>>108354704
Okay, try it again. Let me know if it works or not.

https://github.com/VolgaGerm/PocketTTS.cpp
>>
>aeons away
Sounds like it would make a great new ozone.
>>
>>108355035
> Exactly this. I’m delighted for this model because I can present it as a viable option to my more risk-averse customers. The fact that it won’t do ERP or make Pepe dance is a feature for some people, not a bug. We have other models for that shit.
>>
>>108354222
>If we take the best case scenario for each gpu, for prompt processing a 3090 is nearly 3 times faster than a v620, and token generation is just a bit under twice as fast. However, in Australia at least, v620s are ~$700, while 3090s are ~$1.5k+.
That made me check ebay for the local prices of the 3090 FE, and they went up from 650€ last year to 850-900€. Prices for used cards are insane, I'm almost tempted to sell mine.
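Using the ratios and AUD prices from the quoted post (prompt processing ~3x faster, token generation ~2x, ~$700 vs ~$1500), the perf-per-dollar picture splits by workload. A quick sketch, normalising the V620 to 1.0 on both axes:

```python
# Perf-per-dollar from the quoted figures: 3090 at ~3x pp and ~2x tg
# versus a V620, priced at ~A$1500 versus ~A$700.
v620_price, rtx3090_price = 700, 1500

pp_per_dollar_v620 = 1.0 / v620_price    # prompt processing value
pp_per_dollar_3090 = 3.0 / rtx3090_price
tg_per_dollar_v620 = 1.0 / v620_price    # token generation value
tg_per_dollar_3090 = 2.0 / rtx3090_price

print(pp_per_dollar_3090 > pp_per_dollar_v620)  # True: 3090 wins on pp/$
print(tg_per_dollar_3090 > tg_per_dollar_v620)  # False: V620 wins on tg/$
```

So with those figures the V620 is the better buy if generation speed is all you care about, while the 3090 earns its premium on prompt-heavy workloads (long contexts, reprocessing).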
>>
File: globrel.png (115 KB, 1153x1152)
>>108355048
What gets me is "if you see it online, consider reporting it", putting aside its made-up definition of CSAM. But then, I should have seen it coming, considering that in addition to shitty open source datasets, they're also adding private bullshit to the data.
>>
>>108355111
>Scale
ahh
>>
>>108355075
the more accessible something is the more jeets will abuse it
agentic LLMs are going to be a disaster for the internet because a segment of the population can't stop themselves from pressing the "spam every single corner with garbage in the hope of fishing for one retard who bites" button
unfortunately, safety was never taken seriously by those who proclaimed to care when they unleashed this technology on the general public. Nobody should be worried about LLMs suddenly turning into terminators; what is worrying is what the low iq crowds are going to do with this ability booster
>>
>>108355110
V100 prices have been steadily going down, at least.
>>
>>108353602
i've said it before and i'll say it again: 7 to 10 tk/s TG is perfectly reasonable for RPing.
>>
>>108355133
>7 to 10tk/s TG
on a reasoner model?
>>
>>108354823
I'll give chat completion a try. I've only been using text because I saw people say it's better. The instructions were from either gemini or chatgpt, don't remember which.
>>
File: file.png (31 KB, 1463x38)
>>108355111
The whole thing is insane. This is probably the future of LLMs: each request gets you a giant warning label on why what you asked can be problematic or whatever.
For me the funniest part is picrel.


