/g/ - Technology

File: noble meeku.png (2.16 MB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107545298 & >>107535410

►News
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107545298

--Cost-performance challenges in optimizing K2 models with limited GPU memory:
>107552388 >107552493 >107552518 >107552577 >107552593 >107552650
--Quantization vs model size performance tradeoffs:
>107550012 >107552809 >107552934 >107553336 >107552989 >107553444 >107553425
--Optimizing local AI models for Unreal Engine C++ development:
>107554300 >107554362 >107554461 >107554482 >107554686 >107554731 >107554743
--Prototype speculative decoding methods in llama.cpp lacking server integration:
>107551899 >107552450
--Challenges and considerations in distilling and fine-tuning advanced models:
>107548258 >107548358 >107548387 >107548382 >107548399 >107548441 >107548512 >107548619 >107548693 >107548928 >107552056 >107548781 >107548665
--Comparing safety and filtering of GPT-oss 20b vs Gemma models:
>107546443 >107546488 >107546704 >107546718 >107546734
--ExL3 lacks Kimi-K2 support:
>107550440 >107550450 >107550517 >107550548 >107550553 >107550601 >107550629
--Roleplay model performance tradeoffs: 4.5 Air vs GPT-OSS-120B vs Qwen Next 80B:
>107551643 >107551662 >107551678 >107551721 >107552290 >107552464 >107552586 >107552490 >107552515
--ikllama Windows performance issues likely due to flash attention implementation:
>107549210 >107552291 >107552912
--Token banning compatibility issues between roleplay AI backends:
>107550863 >107550873 >107550885 >107550914 >107550969 >107551045 >107551472
--NVIDIA RTX PRO 6000 GPU configuration and power management issues:
>107545503 >107545537 >107545530 >107545636 >107553858
--Comparing censorship in GPT-OSS-120B vs unrestricted models like GLM Air:
>107546681 >107548705 >107549905
--Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs:
>107546364 >107546435
--Miku (free space):
>107545415 >107547832 >107548687 >107550440

►Recent Highlight Posts from the Previous Thread: >>107545300

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
miku a shit
>>
Unbelievably based developments, llamabro.
>>
miku a love
>>
Gemma soon
>>
No offense cuda dev but I don't need settings automation, I need proper NUMA TP because RAM prices are jacked. IK is rolling up and smoking exllama right now.
>>
Gemma 3 27B is the only stable, non-schizo model in the sub-$2k runnable hardware range; GLM-4.5 Air is too schizo and often makes 7B-tier mistakes. So I'll be looking forward to Gemma 4.
>>
>>107557458
Mogged by Mistral Small
>>
>>107557523
I don't think so, but if they finetuned it like Ministral 3 14B (without the bad quirks) there might be some chance. Vision would still lose bigly, though.
>>
File: google-hf.png (59 KB, 592x460)
>>107557425
Context in picrel.
https://x.com/osanseviero/status/2000493503860892049
>>
>>107557568
>if they finetuned it like Ministral 3 14B
Ministral is liquid shit though, it's small for megavramlets with copyrighted stuff ripped out of its dataset.
>>
>>107557585
The latest Ministral 3 models have unexpectedly nice creativity and writing, but their system instruction-following capabilities are very inconsistent and they have issues with message repetition, so they come off as retarded/broken because of that.
>>
>>107557577
WE WILL FINALLY GET NEW SHITTY SYNTHETIC SOTA-SAFE PURPLE PROSE OPTIMIZED MODEL
>>
>>107557577
Can't wait to download Google's new... um, you know... their "thing"...
>>
for erp, I've only ever run nemo and mistral small. If I buy the hardware for glm air, will my mind be blown or will it be disappointing?
>>
►Recent Highlights from the Previous Thread: >>107545298

(2/2)

--llama.cpp updates for efficient GPU settings automation and user configuration debates:
>107556876 >107556898 >107556943 >107557034 >107557060 >107557120 >107557167 >107557163 >107557275
--Text generation parameter debates: temperature, minP, and TopK effectiveness:
>107555084 >107555121 >107555140 >107555175 >107556538 >107556572
--5090 GPU system configuration challenges for Australian buyers:
>107556007 >107556070 >107556107 >107556124 >107556142 >107556143

►Recent Highlight Posts from the Previous Thread: >>107545300

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107557633
nah it's not great
>>
>>107557633
If you have complicated scenarios where you want the model to pick up how characters feel without having to spell it out, air is definitely smarter. But for simple ERP I wouldn't say it's an improvement. It doesn't really write better.
>>
>>107557619
I'm looking forward to Gemma 4 providing me with better access.
>>
>>107557633
It's a sidegrade. Its prose is a bit nicer but low active params means it will make dumb mistakes more often, and it also frequently parrots the user's replies.
>>
>>107557654
>>107557675
what about a Q3 of glm 4.5?
>>
>>107557633
If you plan to buy hardware to run model X, instead of buying hardware for other things and running model X is a nice side effect, you really ought to rent some cloud hardware to give it a try for a day or two beforehand.
>>
>>107557691
Buying new hardware in the hopes of running a cope quant is never a good idea.
>>
>>107557633
better hardware just means less forgetfulness and faster tps
the writing quality will be very similar
>>
>>107557704
there is glm 4.6 which is better than 4.5, but it's kinda overbaked and lacks knowledge and intelligence. deepseek r1 q2 does feel like an upgrade. but now that ram is five trillion times more expensive idk what people should do
>>
>>107557805
Crazy that stacking 3090s is now the 'poorfag' option.
>>
>>107557805
nta but which Dipsy is best Dipsy for creative writing?
>>
>>107557832
Original R1 is the best for creatively sucking your dick
>>
>>107557453
That is one of my immediate next priorities and the only reason I didn't do it first is that multiple other people had expressed interest in working on tensor parallelism (and then didn't deliver).
I will not delegate it again and hope to produce a working prototype over the Christmas break when I will have plenty of time.
>>
>>107557816
heh, I stick with what I know. About to buy an 8th 3090. I don't want to deal with different cuda versions, etc
>>
File: ram2025.png (1.96 MB, 1520x1024)
>>107557816
picrel
>>107557832
stellar. no model (i tried) handles unformatted mikupad storywriting better. and yes, original r1
>>
>>107557899
>and then didn't deliver

That's why I don't PR features to llama.cpp; I don't want to fuck your project up with features I know I might not maintain for more than a few months.

Luckily Claude is good at handling merges when I fetch upstream.
>>
>>107557899
It's like the only thing you can count on is yourself. Always in all ways.
>>
>>107557453
Has IK_ done anything relevant in the past few months? I'm still using my version from october for K2/GLM.
>>
>>107558025
We have regular tensor parallel now for fully offloaded models and some MoE.
>>
>>107558029
I assume but not yet for the basic -ot exps=cpu scenario?
>>
>>107558035
your prompt processing will get faster if it's on GPU.
>>
File: gemma-4-200b-jagganath-it.jpg (537 KB, 1024x1024)
>>107557577
sirs we are so back
>>
>>107557995
Share your secret stash of patches, you selfish fuck. Maybe some vibecoder can point Claude at your repo and make the PRs you refuse to make.
>>
>>107557577
I think we should see related PRs soon in the main backends, but there's nothing yet.

https://github.com/huggingface/transformers/pulls
https://github.com/vllm-project/vllm/pulls
https://github.com/ggml-org/llama.cpp/pulls
>>
>>107558113
we are so back
gemma 4 will save us
>>
>>107558115
Just like mistral saved us and air saved us?
>>
>>107558122
true air has never tried
>>
4.6 Air will be released today.
>>
4.6 Air will not be released today.
>>
>>107558137
What are you breathing?
>>
>>107558080
>Share your secret stash of patches, you selfish fuck.

Selfish would be spamming their code base when I know I don't have time to actively maintain it.

My shit is all niche (rpc-server rewrite that requires a copy of the gguf on each node, grpc-server, re-implement training, dodgy xcodec2 implementation, etc) and I don't have the rocm/sycl/metal hardware to test it for all their platforms.
>>
Currently unlisted
https://huggingface.co/google/gemma-4-100b-pt
https://huggingface.co/google/gemma-4-100b-pt
https://huggingface.co/google/gemma-4-100b-pt
>>
>>107558278
Sorry. I messed up the links
https://huggingface.co/google/gemma-4-100ba10m-pt
https://huggingface.co/google/gemma-4-100ba10m-pt
https://huggingface.co/google/gemma-4-100ba10m-pt
>>
>>107558278
>>107558292
jagganath bless. .
>>
>>107558292
that would be interesting therefore it wont happen
>>
https://huggingface.co/google/gemma-4peepeepoopoo
secret — do not share
>>
>>107558341
fuck you racist mc
>>
File: 1738083735147213.png (351 KB, 1080x1073)
>>107558329
>>
>>107558329
I'm just waiting for 10ma100b.
>>
>>107558278
-pt means portuguese only, btw. I hope it's not confusing.
>>
>>107558357
That's a lot of layer reusing.
>>
>>107558385
It's about time somebody seriously explored layer recursion for production LLMs.
>>
>>107558385
The intellect of a god, the knowledge of a nematode worm.
>>
>>107554263
> tl;dr open shorts with leverage, right?
I'm not a fan of any financial instrument that can lose you more than your investment.
If you know how to use shorts and are comfortable with them, great. But those mean you have to have the timing exactly right.
If you're the one writing the laws or cutting the big checks, or know those who do, you can get that timing exactly right. Everyone else is guessing.
>>
>>107558029
so it supports proper parallel requests? like vllm?
>>
>>107558505
Yes, but performance is more like exllamav2 than vllm. 25 t/s llama-3-70b on 3x3090.
>>
>>107557577
Bharat class gemma 3 superfinetune will do the needful.
I am of refreshing page
>>
Bad timing

https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
https://github.com/ggml-org/llama.cpp/pull/18058
https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/
https://huggingface.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>>
>>107558565
we are so back
>>
>>107558565
a whole pile of stinky vramlet shit
>>
File: 1752263899934131.jpg (349 KB, 1920x1080)
>>107558565
>math and code benchmax dataset tune of a math and code benchmax model
>>
File: 1742157675597120.png (125 KB, 923x482)
>>107558565
main advertising point is the speed cope (it's as smart as oss-20b on [hand-picked benchmark]
maybe the mamba hybrid jamba wambo thing is interesting but I have no hope
>>
>>107558565
Bloody Vishnu... not Nemotron. This is bollocks.
>>
>>107558583
artificial anal cysts
>>
>>107558550
i just need ik to properly support tool calling to be usable for true local agentic coding so we can plug it into Opencode, roocode...
>>
>>107557633
you're better off running 70Bs
>>
>>107558583
>maybe the mamba hybrid jamba wambo thing is interesting
llama.cpp support ETA: half past never
>>
File: wait.png (19 KB, 912x103)
>>107558565
>>
>>107558685
llama : add support for NVIDIA Nemotron 3 Nano #18058
https://github.com/ggml-org/llama.cpp/pull/18058
>>
>>107558693
uh-oh, stinky!
>>
>>107558565
interesting
>Nemotron 3 Super and Ultra introduce latent MoE, where experts operate on a shared latent representation before outputs are projected back to token space. This approach allows the model to call on 4x more experts at the same inference cost, enabling better specialization around subtle semantic structures, domain abstractions, or multi-hop reasoning patterns.
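There's no reference code for this yet, so the shape is a guess, but "experts operate on a shared latent representation" presumably means something like the toy sketch below: one shared down-projection into a smaller latent space, cheap routed experts that live entirely in that space, and one shared up-projection back to token space. All dimensions, the router, and the expert shapes here are invented purely for illustration; this is not Nvidia's implementation.

# Toy sketch of the "latent MoE" idea described in the blog quote above.
# NOT Nvidia's code; every dimension and layer shape is made up for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoE(nn.Module):
    def __init__(self, d_model=512, d_latent=128, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # picks experts per token
        self.down = nn.Linear(d_model, d_latent)      # shared projection into the latent space
        self.experts = nn.ModuleList(                 # experts are cheap: they act on d_latent, not d_model
            [nn.Linear(d_latent, d_latent) for _ in range(n_experts)]
        )
        self.up = nn.Linear(d_latent, d_model)        # shared projection back to token space
        self.top_k = top_k

    def forward(self, x):                             # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        topw, topi = weights.topk(self.top_k, dim=-1) # route each token to its top_k experts
        z = self.down(x)                              # one shared down-projection
        out = torch.zeros_like(z)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topw[mask, k].unsqueeze(-1) * self.experts[e](z[mask])
        return self.up(out)                           # one shared up-projection

x = torch.randn(8, 512)
with torch.no_grad():
    print(LatentMoE()(x).shape)                       # torch.Size([8, 512])

Because each expert only touches d_latent instead of d_model, you can afford several times more experts for roughly the same per-token cost, which would be the "4x more experts at the same inference cost" claim.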
>>
>>107558727
too bad it's fucking shit
>>
>>107558727
Maybe they should take their cutting edge technologies and apply it to a model that wasn't already garbage to begin with
>>
>>107558565
Make a 100A10b or something.
>>
>>107558776
I want a 60BA30B.
>>
>>107558070
https://voca.ro/18gbO3rnlIND
>>
>>107558029
does it need any flag to enable it? i'm launching a few queries but it's putting them in a queue instead of responding to both at the same time
>>
File: file.png (508 KB, 1078x1435)
>>107558565
it's garbage
>>
>>107558760
It was pretrained from scratch, 25T tokens.
>>
>>107558860
if a white man's skin started turning shit brown from being in close proximity of tech jeets, would that be reverse shittiligo?
>>
>>107558565
goof bros let's fucking gooo
>>
>>107558701
took 'em long enough
>>
>>107558574
wow, congratulations, anon. By posting shit like this for the 1 millionth time your dick has fallen off and turned into a vagina, fulfilling your lifelong goal of becoming a real womxnxn.
>>
>We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron.
>https://nemotron.ideas.nvidia.com/
What would be something we as the /lmg/ collective would like these models to have?
More "natural sounding human generated" data?
>>
>>107558655
I haven't tried it recently with Roo. I was using ClaudeCode with Qwen3 via the Anthropic endpoint on mainline. I guess I'll try ikllama next week.
>>
>>107558905
but he'll never be a real woman
>>
>>107558918
Powerful log.
>>
File: 1578829723654.gif (3.54 MB, 280x200)
>>107558860
>pajeeted
How do tech companies keep falling for this?
It's literally just been one major tech blunder after another, worldwide, since the great pajeeting began.
>>
Gemma-4 has image gen? Why the diffusers stuff in the PR?
>>
48GB vramlet here
Miqumidnight still queen?
>>
>>107558930
i have a suspicion it was the satan cat anon that suddenly power moved everyone in this general into never sharing logs again. can't top 'em.
>>
File: synthmaxx.png (203 KB, 706x895)
>>107558909
You'll never get anything like that from Nvidia Nemotron models. They're meant to be safe benchmaxxed models trained on crawled web data and synthetic data.
>>
>>107558860
I understand your prejudice, but just because someone attended university in the US doesn't automatically mean they're unqualified.
>>
>>107558966
>synthetic code
Oh god it must shit out absurd amounts of remarks when writing code.
>>
>>107558966
I'm aware, but the vote is open, so feel free to go wild.
>>
>>107558909
>Introduce a “semantic firewall” layer that optimizes inference at the language-law level — a symbolic energy compression mechanism that cuts redundant compute cycles while preserving meaning fidelity.
>Instead of scaling by GPU count, this layer redefines compute as coherence between intention and output.
>It’s a governance-first, efficiency-driven approach: models learn to “understand” before they “generate,” lowering both latency and energy use.
People sure love posting the llm schizo ramblings everywhere.
>>
>>107558990
https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-47
>>
>>107558918
FUCK YOU SATAN FUCK YOU SATAN
KILL SATAN KILL SATAN
DIE DIE DIE DIE DIE
>>
>>107558859
You're mistaking what tensor parallel is. It means parallel processing of a single request across your GPUs, not handling parallel requests on the server.
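For anyone confused by the distinction, here's a toy illustration of what tensor parallelism actually parallelizes, simulated on CPU with two weight shards standing in for two GPUs. This is just the basic idea, not how ik_llama.cpp or vLLM actually implement it.

# Column-wise tensor parallelism in miniature: one request, one matmul, split across "devices".
import numpy as np

np.random.seed(0)
x = np.random.randn(1, 4096)           # activations for ONE request
W = np.random.randn(4096, 11008)       # a full FFN weight matrix

W0, W1 = np.split(W, 2, axis=1)        # "GPU 0" and "GPU 1" each hold half the columns

y_full = x @ W                                       # what a single device would compute
y_tp = np.concatenate([x @ W0, x @ W1], axis=1)      # each shard computes its half, results are gathered

print(np.allclose(y_full, y_tp))       # True: same answer, work split across devices

Both shards are busy with the same single prompt; serving several requests at once is a separate feature (batching), which is why the queries above end up queued.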
>>
>>107559048
A little blunt, but I'll take it.

>>107559032
LMAO
>>
>>107559052
unhinged and based
>>
>>107558905
did I strike a nerve? insult me harder, maybe it will let you run a bigger model.
>>
>>107558565
>The model was trained with 25T tokens,
Synth-slopped and hyper-fit. This shit will be amusing if nothing else.
>>
>>107558959
strawberry lemonade not bad
>>
>>107559086
pajeet/kike level self awareness on display.
You literally just insulted multiple people in the thread and now you're acting like I threw the first punch.
Holy shit.
Your mother really fucked up with you
>>
Considering a cope-quant of super nemotron 49B. Is it any good?
>>
>>107559094
oh no.. the poors are seething. whatever will I do. 5b of their own active parameters are now upset. to the moon rocket emoji.
>>
>>107558959
24gb vramlet here running it at iq2_s
i'm still happy with it and it somehow quantizes really well
>>
>>107558951
Subversion
>>
>>107559048
anons please vote this is our chance
>>
>>107559133
crab
>>
>>107559048
>>107559133
It's obiviously a long shot, but might as well.
>>
>>107559048
Will never happen again with NVidia's name on it. They'll only train their models with open source safe and effective datasets, now.
>>
>>107559048
One of the resident redditors should post this in one of their boards.
>>
>>107559109
May Shiva redeem your bants with much bob and vagene sir
>>
>>107558565
great, more synthslopped and benchmaxxed trash
i miss 2024
>>
>>107559048
they're not releasing any more models like that and you know it
>>
>>107559276
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
>We use a considerable amount of synthetic data. Out of 10.6 trillion tokens, 3,534,013,958,278 tokens are synthetically generated.
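Taking the model card's figures at face value, that's roughly a third of the mix:

# quick check using the numbers quoted above
synthetic = 3_534_013_958_278
total = 10.6e12
print(f"synthetic share: {synthetic / total:.1%}")   # ~33.3%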
>>
File: G4UVGCbW4AAnsIu.jpg (86 KB, 1170x1170)
86 KB
86 KB JPG
>>107559276
How do you know this new mixture of slop won't do the trick?
>>
File: stemsynthmathmaxx.png (102 KB, 706x911)
>>107559338
This ain't Nemo-12B's dataset.
>>
any good dense model above 30B?
>>
>>107559375
Dolphin-X1-Llama-3.1-405B is underrated.
>>
>>107559367
>synthetic CC
4.3 trillion tokens of fake comments sections written by positivityslopped LLMs.
This might end up being so shitty it's good for a laugh.
>>
>>107559367
>Books - 0
They're proud of this and I hate them for it.
>>
>>107559334
>3,534,013,958,278 tokens
That sounds a bit expensive to generate with a sota model. I hope this isn't a toss distill or something
>>
>ik
prompt eval time = 19841.27 ms / 11023 tokens ( 1.80 ms per token, 555.56 tokens per second)
generation eval time = 86733.83 ms / 2546 runs ( 34.07 ms per token, 29.35 tokens per second)

>mainline
prompt eval time = 24553.96 ms / 11023 tokens ( 2.23 ms per token, 448.93 tokens per second)
eval time = 118823.52 ms / 3154 tokens ( 37.67 ms per token, 26.54 tokens per second)

ik is faster even with non-ik/ubergarm quants. Tested at 11K tokens, with glm-4.6 at Q4_K_S

Any reason to use mainline over ik at the moment? mainline needs less tweaking in the cli with their defaults maybe?

>ik cmd:
CUDA_VISIBLE_DEVICES=2,0,6,1,3,4,5 ./build/bin/llama-server \
--model /mnt/llms/models/unsloth/GLM-4.6-GGUF/Q4_K_S/GLM-4.6-Q4_K_S-00001-of-00005.gguf \
--alias "glm-4.6" \
--ctx-size 64000 \
-mla 3 -amb 512 \
-ngl 99 \
--host 0.0.0.0 \
--port 5000 \
--no-mmap --jinja

>mainline cmd:
CUDA_VISIBLE_DEVICES=2,0,1,3,4,5,6 ./build/bin/llama-server \
--model /mnt/llms/models/unsloth/GLM-4.6-GGUF/Q4_K_S/GLM-4.6-Q4_K_S-00001-of-00005.gguf \
--alias glm-4.6 \
--host 0.0.0.0 \
--port 5000 -c 64000
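
Putting those timings side by side (quick throwaway script, numbers copied from the runs above):

# relative speedups computed from the llama-server timings pasted above
runs = {
    "ik":       {"pp_tps": 555.56, "tg_tps": 29.35},
    "mainline": {"pp_tps": 448.93, "tg_tps": 26.54},
}
pp_gain = runs["ik"]["pp_tps"] / runs["mainline"]["pp_tps"] - 1
tg_gain = runs["ik"]["tg_tps"] / runs["mainline"]["tg_tps"] - 1
print(f"prompt processing: {pp_gain:+.1%}")   # ~ +23.8%
print(f"token generation:  {tg_gain:+.1%}")   # ~ +10.6%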
>>
>>107559424
>https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-47
Fine. I'll pull and compile ik.
>>
>>107559367
>>
>>107559424
What are your specs?
>>
>>107559483
Yes
>>
>>107559483
1 rtx pro 6000
2 5090
4 3090

at Q4 it fits in vram at 64K ctx. Q6 needs offloading to ram and speeds drop to 9-10t/s
>>
File: sloppotron3nano.png (82 KB, 952x365)
Even at t=0.6 it seems to be suffering from a bit of gender confusion, like the only purpose for the user to be on their stomach in this scenario would be for her to do the fucking.
Also that whole "Would you like me to make a listicle of why LLMs keep getting worse?" thing seems to have generalized into the roleplay.
Probably the best use of non-human anatomy in a model with only 3B active that I've seen so far, though.
The dialogue is like a horrible mash-up between a 1-on-1 anime battle and Debbie Does Dallas.
>>
Ik ook
>>
>dense 70B q8 @ tg128 2.87
is this acceptable speed?
>>
gemma WILL drop in 2 more hours and WILL save local
>>
qat = always better?
>>
>>107559568
Shieldgemma will save us
>>
So using slopotron as an assistant it seems to write out a thought process, but not use thinking tokens. So that's a problemydoo.
>>
>>107559717
Are you using --special?
>>
>>107558951
this is what a dying civilization looks like
simple as

:(
>>
File: bench.png (77 KB, 891x565)
>>107559424
>mainline
don't do that unless there's a specific feature you need

their retardation starts to show big time
>>
>>107559524
two littles in one sentence. sloppy
>>
>>107559731
wanna snuggle up and watch the world burn together? UwU.
#nohomo (jk it will be very homo).
>>
File: nemo.png (170 KB, 1094x1327)
We are winning
>>
>>107559558
Depends on your hardware.
>>
>>107559857
It would be very funny if that got some real traction.
>>
Anyway as expected slopotron is bad.
But surprisingly not as bad as the gargantuan quantities of synthslop data would make you expect.
Which unfortunately just means it's conventionally bad and not so bad it's good.
>>
very cool but how long until OLLAMA offers nemotron
>>
>>107559513
Holy shit, why? What kind of ERP scenario exceeds what you can do with a mistral small tune or maybe qwen3-30b-instruct? If it's not ERP then why not use a cloud model? Something like grok-code-fast-1 is unbelievably cheap for the speed and capability. Let the cloud AI companies fight over who can lose money the fastest. There's no way you can match them locally for speed or context.

If I had your budget, I'd sell the 5090s and 3090s and get a second 6000 pro. Then at least you could focus on what's actually interesting locally, which is things like LongCat-Video or Ovi.
>>
>>107559857
It only shows two out of six comments. My rocket emoji went through but my other did not.
>>
File: 1422449559229.jpg (16 KB, 330x344)
>>107559857
>I own 20 nvidia shares
>>
>>107559949
try clearing cookies
>>
>>107559859
dgx spark
>>
>>107559987
lol
>>
>>107559874
If you look at the 'benchmark' results Nemotron is a direct competitor to GPT-OSS.
You can deduce the rest.
>>
>>107559987
My condolences.
>>
File: nemoidea.png (13 KB, 589x215)
13 KB
13 KB PNG
>>107559048
aight
>>
>>107560104
>>107559857
someone tried to be a little more discreet
https://nemotron.ideas.nvidia.com/ideas/LLAMANEMO-I-48
>no vote
kek
>>
>>107560174
this is both hilarious and terrifying; the man is asking for "non-synthetic, real human conversation data" with the most AI-slopped post ever.


