/g/ - Technology

File: noble meeku.png (2.16 MB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107545298 & >>107535410

►News
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107545298

--Cost-performance challenges in optimizing K2 models with limited GPU memory:
>107552388 >107552493 >107552518 >107552577 >107552593 >107552650
--Quantization vs model size performance tradeoffs:
>107550012 >107552809 >107552934 >107553336 >107552989 >107553444 >107553425
--Optimizing local AI models for Unreal Engine C++ development:
>107554300 >107554362 >107554461 >107554482 >107554686 >107554731 >107554743
--Prototype speculative decoding methods in llama.cpp lacking server integration:
>107551899 >107552450
--Challenges and considerations in distilling and fine-tuning advanced models:
>107548258 >107548358 >107548387 >107548382 >107548399 >107548441 >107548512 >107548619 >107548693 >107548928 >107552056 >107548781 >107548665
--Comparing safety and filtering of GPT-oss 20b vs Gemma models:
>107546443 >107546488 >107546704 >107546718 >107546734
--ExL3 lacks Kimi-K2 support:
>107550440 >107550450 >107550517 >107550548 >107550553 >107550601 >107550629
--Roleplay model performance tradeoffs: 4.5 Air vs GPT-OSS-120B vs Qwen Next 80B:
>107551643 >107551662 >107551678 >107551721 >107552290 >107552464 >107552586 >107552490 >107552515
--ikllama Windows performance issues likely due to flash attention implementation:
>107549210 >107552291 >107552912
--Token banning compatibility issues between roleplay AI backends:
>107550863 >107550873 >107550885 >107550914 >107550969 >107551045 >107551472
--NVIDIA RTX PRO 6000 GPU configuration and power management issues:
>107545503 >107545537 >107545530 >107545636 >107553858
--Comparing censorship in GPT-OSS-120B vs unrestricted models like GLM Air:
>107546681 >107548705 >107549905
--Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs:
>107546364 >107546435
--Miku (free space):
>107545415 >107547832 >107548687 >107550440

►Recent Highlight Posts from the Previous Thread: >>107545300

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
miku a shit
>>
Unbelievably based developments, llamabro.
>>
miku a love
>>
Gemma soon
>>
No offense cuda dev but I don't need settings automation, I need proper NUMA TP because RAM prices are jacked. IK is rolling up and smoking exllama right now.
>>
Gemma 3 27B is the only stable, non-schizo model in the sub-$2k runnable hardware range; GLM-4.5 Air is too schizo and often makes 7B-tier mistakes. So I'll be looking forward to Gemma 4.
>>
>>107557458
Mogged by Mistral Small
>>
>>107557523
I don't think so, but if they finetuned it like Ministral 3 14B (without the bad quirks) there might be some chance. Vision would still lose bigly, though.
>>
File: google-hf.png (59 KB, 592x460)
>>107557425
Context in picrel.
https://x.com/osanseviero/status/2000493503860892049
>>
>>107557568
>if they finetuned it like Ministral 3 14B
Ministral is liquid shit though, it's small for megavramlets with copyrighted stuff ripped out of its dataset.
>>
>>107557585
The latest Ministral 3 models have unexpectedly nice creativity and writing, but their system instruction-following capabilities are very inconsistent and they have issues with message repetition, so they come off as retarded/broken because of that.
>>
>>107557577
WE WILL FINALLY GET NEW SHITTY SYNTHETIC SOTA-SAFE PURPLE PROSE OPTIMIZED MODEL
>>
>>107557577
Can't wait to download Google's new... um, you know... their "thing"...
>>
for erp, I've only ever run nemo and mistral small. If I buy the hardware for glm air, will my mind be blown or will it be disappointing?
>>
►Recent Highlights from the Previous Thread: >>107545298

(2/2)

--llama.cpp updates for efficient GPU settings automation and user configuration debates:
>107556876 >107556898 >107556943 >107557034 >107557060 >107557120 >107557167 >107557163 >107557275
--Text generation parameter debates: temperature, minP, and TopK effectiveness:
>107555084 >107555121 >107555140 >107555175 >107556538 >107556572
--5090 GPU system configuration challenges for Australian buyers:
>107556007 >107556070 >107556107 >107556124 >107556142 >107556143

►Recent Highlight Posts from the Previous Thread: >>107545300

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107557633
nah it's not great
>>
>>107557633
If you have complicated scenarios where you want the model to pick up how characters feel without having to spell it out, air is definitely smarter. But for simple ERP I wouldn't say it's an improvement. It doesn't really write better.
>>
>>107557619
I'm looking forward to Gemma 4 providing me with better access.
>>
>>107557633
It's a sidegrade. Its prose is a bit nicer but low active params means it will make dumb mistakes more often, and it also frequently parrots the user's replies.
>>
>>107557654
>>107557675
what about a Q3 of glm 4.5?
>>
>>107557633
If you plan to buy hardware to run model X, instead of buying hardware for other things and running model X is a nice side effect, you really ought to rent some cloud hardware to give it a try for a day or two beforehand.
>>
>>107557691
Buying new hardware in the hopes of running a cope quant is never a good idea.
>>
>>107557633
better hardware just means less forgetfulness and faster tps
the writing quality will be very similar
>>
>>107557704
there is glm 4.6 which is better than 4.5, but it's kinda overbaked and lacks knowledge and intelligence. deepseek r1 q2 does feel like an upgrade. but now that ram is five trillion times more expensive idk what people should do
>>
>>107557805
Crazy that stacking 3090s is now the 'poorfag' option.
>>
>>107557805
nta but which Dipsy is best Dipsy for creative writing?
>>
>>107557832
Original R1 is the best for creatively sucking your dick
>>
>>107557453
That is one of my immediate next priorities and the only reason I didn't do it first is that multiple other people had expressed interest in working on tensor parallelism (and then didn't deliver).
I will not delegate it again and hope to produce a working prototype over the Christmas break when I will have plenty of time.
>>
>>107557816
heh, I stick with what I know. About to buy an 8th 3090. I don't want to deal with different cuda versions, etc
>>
File: ram2025.png (1.96 MB, 1520x1024)
>>107557816
picrel
>>107557832
stellar. no model (i tried) handles unformatted mikupad storywriting better. and yes, original r1
>>
>>107557899
>and then didn't deliver

That's why I don't PR features to llama.cpp; I don't want to fuck your project up with features I know I might not maintain for more than a few months.

Luckily Claude is good at handling merges when I fetch upstream.
>>
>>107557899
It's like the only thing you can count on is yourself. Always in all ways.
>>
>>107557453
Has IK_ done anything relevant in the past few months? I'm still using my version from october for K2/GLM.
>>
>>107558025
We have regular tensor parallel now for fully offloaded models and some MoE.
>>
>>107558029
I assume but not yet for the basic -ot exps=cpu scenario?
>>
>>107558035
your prompt processing will get faster if it's on GPU.
>>
File: gemma-4-200b-jagganath-it.jpg (537 KB, 1024x1024)
>>107557577
sirs we are so back
>>
>>107557995
Share your secret stash of patches, you selfish fuck. Maybe some vibecoder can point Claude at your repo and make the PRs you refuse to make.
>>
>>107557577
I think we should see related PRs soon in the main backends, but there's nothing yet.

https://github.com/huggingface/transformers/pulls
https://github.com/vllm-project/vllm/pulls
https://github.com/ggml-org/llama.cpp/pulls
>>
>>107558113
we are so back
gemma 4 will save us
>>
>>107558115
Just like mistral saved us and air saved us?
>>
>>107558122
true air has never tried
>>
4.6 Air will be released today.
>>
4.6 Air will not be released today.
>>
>>107558137
What are you breathing?
>>
>>107558080
>Share your secret stash of patches, you selfish fuck.

Selfish would be spamming their code base when I know I don't have time to actively maintain it.

My shit is all niche (rpc-server rewrite that requires a copy of the gguf on each node, grpc-server, re-implement training, dodgy xcodec2 implementation, etc) and I don't have the rocm/sycl/metal hardware to test it for all their platforms.
>>
Currently unlisted
https://huggingface.co/google/gemma-4-100b-pt
https://huggingface.co/google/gemma-4-100b-pt
https://huggingface.co/google/gemma-4-100b-pt
>>
>>107558278
Sorry. I messed up the links
https://huggingface.co/google/gemma-4-100ba10m-pt
https://huggingface.co/google/gemma-4-100ba10m-pt
https://huggingface.co/google/gemma-4-100ba10m-pt
>>
>>107558278
>>107558292
jagganath bless. .
>>
>>107558292
that would be interesting therefore it wont happen
>>
https://huggingface.co/google/gemma-4peepeepoopoo
secret — do not share
>>
>>107558341
fuck you racist mc
>>
File: 1738083735147213.png (351 KB, 1080x1073)
>>107558329
>>
>>107558329
I'm just waiting for 10ma100b.
>>
>>107558278
-pt means portuguese only, btw. I hope it's not confusing.
>>
>>107558357
That's a lot of layer reusing.
>>
>>107558385
It's about time somebody seriously explored layer recursion for production LLMs.
>>
>>107558385
The intellect of a god, the knowledge of a nematode worm.
>>
>>107554263
> tl;dr open shorts with leverage, right?
I'm not a fan of any financial instrument that can lose you more than your investment.
If you know how to use shorts and are comfortable with them, great. But those mean you have to have the timing exactly right.
If you're the one writing the laws or cutting the big checks, or know those who do, you can get that timing exactly right. Everyone else is guessing.
>>
>>107558029
so it supports proper parallel requests? like vllm?
>>
>>107558505
Yes, but performance is more like exllamav2 than vllm. 25 t/s llama-3-70b on 3x3090.
>>
>>107557577
Bharat class gemma 3 superfinetune will do the needful.
I am of refreshing page
>>
Bad timing

https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
https://github.com/ggml-org/llama.cpp/pull/18058
https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/
https://huggingface.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>>
>>107558565
we are so back
>>
>>107558565
a whole pile of stinky vramlet shit
>>
File: 1752263899934131.jpg (349 KB, 1920x1080)
>>107558565
>math and code benchmax dataset tune of a math and code benchmax model
>>
File: 1742157675597120.png (125 KB, 923x482)
>>107558565
main advertising point is the speed cope (it's as smart as oss-20b on [hand-picked benchmark]
maybe the mamba hybrid jamba wambo thing is interesting but I have no hope
>>
>>107558565
Bloody Vishnu... not Nemotron. This is bollocks.
>>
>>107558583
artificial anal cysts
>>
>>107558550
i just need ik to properly support tool calling to be usable for true local agentic coding so we can plug it into Opencode, roocode...
>>
>>107557633
you're better off running 70Bs
>>
>>107558583
>maybe the mamba hybrid jamba wambo thing is interesting
llama.cpp support ETA: half past never
>>
File: wait.png (19 KB, 912x103)
>>107558565
>>
>>107558685
llama : add support for NVIDIA Nemotron 3 Nano #18058
https://github.com/ggml-org/llama.cpp/pull/18058
>>
>>107558693
uh-oh, stinky!
>>
>>107558565
interesting
>Nemotron 3 Super and Ultra introduce latent MoE, where experts operate on a shared latent representation before outputs are projected back to token space. This approach allows the model to call on 4x more experts at the same inference cost, enabling better specialization around subtle semantic structures, domain abstractions, or multi-hop reasoning patterns.
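There's no reference code for this yet, so the shape is a guess, but "experts operate on a shared latent representation" presumably means something like the toy sketch below: one shared down-projection into a smaller latent space, cheap routed experts that live entirely in that space, and one shared up-projection back to token space. All dimensions, the router, and the expert shapes here are invented purely for illustration; this is not Nvidia's implementation.

# Toy sketch of the "latent MoE" idea described in the blog quote above.
# NOT Nvidia's code; every dimension and layer shape is made up for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoE(nn.Module):
    def __init__(self, d_model=512, d_latent=128, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # picks experts per token
        self.down = nn.Linear(d_model, d_latent)      # shared projection into the latent space
        self.experts = nn.ModuleList(                 # experts are cheap: they act on d_latent, not d_model
            [nn.Linear(d_latent, d_latent) for _ in range(n_experts)]
        )
        self.up = nn.Linear(d_latent, d_model)        # shared projection back to token space
        self.top_k = top_k

    def forward(self, x):                             # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        topw, topi = weights.topk(self.top_k, dim=-1) # route each token to its top_k experts
        z = self.down(x)                              # one shared down-projection
        out = torch.zeros_like(z)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topw[mask, k].unsqueeze(-1) * self.experts[e](z[mask])
        return self.up(out)                           # one shared up-projection

x = torch.randn(8, 512)
with torch.no_grad():
    print(LatentMoE()(x).shape)                       # torch.Size([8, 512])

Because each expert only touches d_latent instead of d_model, you can afford several times more experts for roughly the same per-token cost, which would be the "4x more experts at the same inference cost" claim.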
>>
>>107558727
too bad it's fucking shit
>>
>>107558727
Maybe they should take their cutting edge technologies and apply it to a model that wasn't already garbage to begin with
>>
>>107558565
Make a 100A10b or something.
>>
>>107558776
I want a 60BA30B.
>>
>>107558070
https://voca.ro/18gbO3rnlIND
>>
>>107558029
does it need any flag to enable it? i'm launching a few queries but it's putting them in a queue instead of responding to both at the same time
>>
File: file.png (508 KB, 1078x1435)
>>107558565
it's garbage
>>
>>107558760
It was pretrained from scratch, 25T tokens.
>>
>>107558860
if a white man's skin started turning shit brown from being in close proximity of tech jeets, would that be reverse shittiligo?
>>
>>107558565
goof bros let's fucking gooo
>>
>>107558701
took 'em long enough
>>
>>107558574
wow, congratulations, anon. By posting shit like this for the 1 millionth time your dick has fallen off and turned into a vagina, fulfilling your lifelong goal of becoming a real womxnxn.
>>
>We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron.
>https://nemotron.ideas.nvidia.com/
What would be something we as the /lmg/ collective would like these models to have?
More "natural sounding human generated" data?
>>
>>107558655
I haven't tried it recently with Roo. I was using ClaudeCode with Qwen3 via the Anthropic endpoint on mainline. I guess I'll try ikllama next week.
>>
>>107558905
but he'll never be a real woman
>>
>>107558918
Powerful log.
>>
File: 1578829723654.gif (3.54 MB, 280x200)
>>107558860
>pajeeted
How do tech companies keep falling for this?
It's literally just been one major tech blunder after another, worldwide, since the great pajeeting began.
>>
Gemma-4 has image gen? Why the diffusers stuff in the PR?
>>
48GB vramlet here
Miqumidnight still queen?
>>
>>107558930
i have a suspicion it was the satan cat anon that suddenly power moved everyone in this general into never sharing logs again. can't top 'em.
>>
File: synthmaxx.png (203 KB, 706x895)
>>107558909
You'll never get anything like that from Nvidia Nemotron models. They're meant to be safe benchmaxxed models trained on crawled web data and synthetic data.
>>
>>107558860
I understand your prejudice, but just because someone attended university in the US doesn't automatically mean they're unqualified.
>>
>>107558966
>synthetic code
Oh god it must shit out absurd amounts of remarks when writing code.
>>
>>107558966
I'm aware, but the vote is open, so feel free to go wild.
>>
>>107558909
>Introduce a “semantic firewall” layer that optimizes inference at the language-law level — a symbolic energy compression mechanism that cuts redundant compute cycles while preserving meaning fidelity.
>Instead of scaling by GPU count, this layer redefines compute as coherence between intention and output.
>It’s a governance-first, efficiency-driven approach: models learn to “understand” before they “generate,” lowering both latency and energy use.
People sure love posting the llm schizo ramblings everywhere.
>>
>>107558990
https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-47
>>
>>107558918
FUCK YOU SATAN FUCK YOU SATAN
KILL SATAN KILL SATAN
DIE DIE DIE DIE DIE
>>
>>107558859
You're mistaking what tensor parallel is. It means parallel processing of a single request across your GPUs, not handling parallel requests on the server.
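For anyone confused by the distinction, here's a toy illustration of what tensor parallelism actually parallelizes, simulated on CPU with two weight shards standing in for two GPUs. This is just the basic idea, not how ik_llama.cpp or vLLM actually implement it.

# Column-wise tensor parallelism in miniature: one request, one matmul, split across "devices".
import numpy as np

np.random.seed(0)
x = np.random.randn(1, 4096)           # activations for ONE request
W = np.random.randn(4096, 11008)       # a full FFN weight matrix

W0, W1 = np.split(W, 2, axis=1)        # "GPU 0" and "GPU 1" each hold half the columns

y_full = x @ W                                       # what a single device would compute
y_tp = np.concatenate([x @ W0, x @ W1], axis=1)      # each shard computes its half, results are gathered

print(np.allclose(y_full, y_tp))       # True: same answer, work split across devices

Both shards are busy with the same single prompt; serving several requests at once is a separate feature (batching), which is why the queries above end up queued.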
>>
>>107559048
A little blunt, but I'll take it.

>>107559032
LMAO
>>
>>107559052
unhinged and based
>>
>>107558905
did I strike a nerve? insult me harder, maybe it will let you run a bigger model.
>>
>>107558565
>The model was trained with 25T tokens,
Synth-slopped and hyper-fit. This shit will be amusing if nothing else.
>>
>>107558959
strawberry lemonade not bad
>>
>>107559086
pajeet/kike level self awareness on display.
You literally just insulted multiple people in the thread and now you're acting like I threw the first punch.
Holy shit.
Your mother really fucked up with you
>>
Considering a cope-quant of super nemotron 49B. Is it any good?
>>
>>107559094
oh no.. the poors are seething. whatever will I do. 5b of their own active parameters are now upset. to the moon rocket emoji.
>>
>>107558959
24gb vramlet here running it at iq2_s
i'm still happy with it and it somehow quantizes really well
>>
>>107558951
Subversion
>>
>>107559048
anons please vote this is our chance
>>
>>107559133
crab
>>
>>107559048
>>107559133
It's obiviously a long shot, but might as well.
>>
>>107559048
Will never happen again with NVidia's name on it. They'll only train their models with open source safe and effective datasets, now.
>>
>>107559048
One of the resident redditors should post this in one of their boards.
>>
>>107559109
May Shiva redeem your bants with much bob and vagene sir
>>
>>107558565
great, more synthslopped and benchmaxxed trash
i miss 2024
>>
>>107559048
they're not releasing any more models like that and you know it
>>
>>107559276
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
>We use a considerable amount of synthetic data. Out of 10.6 trillion tokens, 3,534,013,958,278 tokens are synthetically generated.
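Taking the model card's figures at face value, that's roughly a third of the mix:

# quick check using the numbers quoted above
synthetic = 3_534_013_958_278
total = 10.6e12
print(f"synthetic share: {synthetic / total:.1%}")   # ~33.3%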
>>
File: G4UVGCbW4AAnsIu.jpg (86 KB, 1170x1170)
86 KB
86 KB JPG
>>107559276
How do you know this new mixture of slop won't do the trick?
>>
File: stemsynthmathmaxx.png (102 KB, 706x911)
>>107559338
This ain't Nemo-12B's dataset.
>>
any good dense model above 30B?
>>
>>107559375
Dolphin-X1-Llama-3.1-405B is underrated.
>>
>>107559367
>synthetic CC
4.3 trillion tokens of fake comments sections written by positivityslopped LLMs.
This might end up being so shitty it's good for a laugh.
>>
>>107559367
>Books - 0
They're proud of this and I hate them for it.
>>
>>107559334
>3,534,013,958,278 tokens
That sounds a bit expensive to generate with a sota model. I hope this isn't a toss distill or something
>>
>ik
prompt eval time = 19841.27 ms / 11023 tokens ( 1.80 ms per token, 555.56 tokens per second)
generation eval time = 86733.83 ms / 2546 runs ( 34.07 ms per token, 29.35 tokens per second)

>mainline
prompt eval time = 24553.96 ms / 11023 tokens ( 2.23 ms per token, 448.93 tokens per second)
eval time = 118823.52 ms / 3154 tokens ( 37.67 ms per token, 26.54 tokens per second)

ik is faster even with non-ik/ubergarm quants. Tested at 11K tokens, with glm-4.6 at Q4_K_S

Any reason to use mainline over ik at the moment? mainline needs less tweaking in the cli with their defaults maybe?

>ik cmd:
CUDA_VISIBLE_DEVICES=2,0,6,1,3,4,5 ./build/bin/llama-server \
--model /mnt/llms/models/unsloth/GLM-4.6-GGUF/Q4_K_S/GLM-4.6-Q4_K_S-00001-of-00005.gguf \
--alias "glm-4.6" \
--ctx-size 64000 \
-mla 3 -amb 512 \
-ngl 99 \
--host 0.0.0.0 \
--port 5000 \
--no-mmap --jinja

>mainline cmd:
CUDA_VISIBLE_DEVICES=2,0,1,3,4,5,6 ./build/bin/llama-server \
--model /mnt/llms/models/unsloth/GLM-4.6-GGUF/Q4_K_S/GLM-4.6-Q4_K_S-00001-of-00005.gguf \
--alias glm-4.6 \
--host 0.0.0.0 \
--port 5000 -c 64000
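
Putting those timings side by side (quick throwaway script, numbers copied from the runs above):

# relative speedups computed from the llama-server timings pasted above
runs = {
    "ik":       {"pp_tps": 555.56, "tg_tps": 29.35},
    "mainline": {"pp_tps": 448.93, "tg_tps": 26.54},
}
pp_gain = runs["ik"]["pp_tps"] / runs["mainline"]["pp_tps"] - 1
tg_gain = runs["ik"]["tg_tps"] / runs["mainline"]["tg_tps"] - 1
print(f"prompt processing: {pp_gain:+.1%}")   # ~ +23.8%
print(f"token generation:  {tg_gain:+.1%}")   # ~ +10.6%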
>>
>>107559424
>https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-47
Fine. I'll pull and compile ik.
>>
>>107559367
>>
>>107559424
What are your specs?
>>
>>107559483
Yes
>>
>>107559483
1 rtx pro 6000
2 5090
4 3090

at Q4 it fits in vram at 64K ctx. Q6 needs offloading to ram and speeds drop to 9-10t/s
>>
File: sloppotron3nano.png (82 KB, 952x365)
Even at t=0.6 it seems to be suffering from a bit of gender confusion, like the only purpose for the user to be on their stomach in this scenario would be for her to do the fucking.
Also that whole "Would you like me to make a listicle of why LLMs keep getting worse?" thing seems to have generalized into the roleplay.
Probably the best use of non-human anatomy in a model with only 3B active that I've seen so far, though.
The dialogue is like a horrible mash-up between a 1-on-1 anime battle and Debbie Does Dallas.
>>
Ik ook
>>
>dense 70B q8 @ tg128 2.87
is this acceptable speed?
>>
gemma WILL drop in 2 more hours and WILL save local
>>
qat = always better?
>>
>>107559568
Shieldgemma will save us
>>
So using slopotron as an assistant it seems to write out a thought process, but not use thinking tokens. So that's a problemydoo.
>>
>>107559717
Are you using --special?
>>
>>107558951
this is what a dying civilization looks like
simple as

:(
>>
File: bench.png (77 KB, 891x565)
>>107559424
>mainline
don't do that unless there's a specific feature you need

their retardation starts to show big time
>>
>>107559524
two littles in one sentence. sloppy
>>
>>107559731
wanna snuggle up and watch the world burn together? UwU.
#nohomo (jk it will be very homo).
>>
File: nemo.png (170 KB, 1094x1327)
We are winning
>>
>>107559558
Depends on your hardware.
>>
>>107559857
It would be very funny if that got some real traction.
>>
Anyway as expected slopotron is bad.
But surprisingly not as bad as the gargantuan quantities of synthslop data would make you expect.
Which unfortunately just means it's conventionally bad and not so bad it's good.
>>
very cool but how long until OLLAMA offers nemotron
>>
>>107559513
Holy shit, why? What kind of ERP scenario exceeds what you can do with a mistral small tune or maybe qwen3-30b-instruct? If it's not ERP then why not use a cloud model? Something like grok-code-fast-1 is unbelievably cheap for the speed and capability. Let the cloud AI companies fight over who can lose money the fastest. There's no way you can match them locally for speed or context.

If I had your budget, I'd sell the 5090s and 3090s and get a second 6000 pro. Then at least you could focus on what's actually interesting locally, which is things like LongCat-Video or Ovi.
>>
>>107559857
It only shows two out of six comments. My rocket emoji went through but my other did not.
>>
File: 1422449559229.jpg (16 KB, 330x344)
>>107559857
>I own 20 nvidia shares
>>
>>107559949
try clearing cookies
>>
>>107559859
dgx spark
>>
>>107559987
lol
>>
>>107559874
If you look at the 'benchmark' results Nemotron is a direct competitor to GPT-OSS.
You can deduce the rest.
>>
>>107559987
My condolences.
>>
File: nemoidea.png (13 KB, 589x215)
13 KB
13 KB PNG
>>107559048
aight
>>
>>107560104
>>107559857
someone tried to be a little more discreet
https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-48
>no vote
kek
>>
>>107560174
this is both hilarious and terrifying; the man is asking for "non-synthetic, real human conversation data" with the most AI-slopped post ever.


