/g/ - Technology


File: 1739402277498048.jpg (424 KB, 1376x2072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107582405 & >>107573710

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107582405

--Qwen3 model performance optimization and hardware utilization:
>107587959 >107587962 >107588009 >107588204 >107588023 >107588043 >107588126 >107588226
--Tensor VRAM prioritization and compute graph optimization challenges:
>107585868 >107585978
--Attempting to distill Claude-like model from cloud logs using local LLM:
>107586842 >107586892 >107586876 >107586899 >107586987 >107587029 >107587038 >107587104
--Techniques for generating long NSFW stories with limited LLMs:
>107584822 >107584862 >107584875 >107585113
--Personal growth through local AI model interactions and ego death experiences:
>107582881 >107582903 >107582912 >107583128 >107583070 >107583157
--Gemma release updates and Solar-Open parameter specifications:
>107582520 >107582589 >107586719 >107582643 >107582699 >107582732 >107582789
--Evaluating NemoTron 3 Nano's roleplay abilities vs Gemma with preset demonstration:
>107583976 >107584039 >107584065
--Nala test results on MistralAI API with Anon/Nala M roleplay:
>107586172 >107586197 >107586219 >107586377 >107586813
--Testing GLM 4.6 on new Framework desktop:
>107583661 >107583684 >107583743 >107583746 >107583748 >107583750 >107583875 >107583904 >107583988 >107583982 >107584717 >107584051 >107584075 >107584275 >107584296 >107584494 >107584285 >107584307 >107584322 >107584357 >107584477 >107584482 >107584609 >107584496 >107584520 >107584530 >107584607 >107585220
--Budget GPU alternatives for AI workloads: 5060ti vs 3090 cost-performance analysis:
>107585634 >107585658
--Nemotron nano model benchmark performance on 3060 GPU:
>107583030 >107583098
--Misconfigured multi-GPU parameter usage realization:
>107582989
--Miku (free space):
>107582881 >107587769 >107587665

►Recent Highlight Posts from the Previous Thread: >>107582410

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Local model for fixing a broken heart when?
>>
>>107588641
Get a grip pussy, life's gonna get harder too
>>
>>107588641
at least 4 months after corpo models can operate a surgical robot without mistakes
>>
>>107588660
>Local man dies after SurgeonGPT refuses to proceed mid-surgery, quoted as saying repeatedly: "I can't assist with that"
>>
>>107588694
>why the fuck not
>an unconfirmed blood type may lead to disastrous results
>i'm telling you it's fuckin o
>the sensor isn't working, I can't confirm that
>>
>>107588694
>die because surgeongpt refuses to assist with that request
or
>die because the SARRR doctor decided to start sticking his dick in your innards mid surgery and you get not only all the aids but also fecal matter from his dick and subsequently a lethal infection

clown world man...
>>
>>107588694
>I can't operate..he is my son
>>
Should I buy a 5080 prebuilt or can I cope with services like ChatLLM?
>>
Btw Bartowski for some reason updated his BF16 mmproj file for GLM.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/tree/main
>>
>>107589110
there are so many other better options than buying a prebuilt. build a mid tier pc yourself and then get 2 of these gpus:
https://www.ebay.com/itm/125006475381
>>
>>107589110
There are no good models you can run on 5080 that you can't run on 2080
>>
CUDA DEV CUDA DEV WHY IS THIS HAPPENING:

https://litter.catbox.moe/gtb1e3u1jejxs6or.png

./llama-bench --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -ot exps=CPU -ngl 0 -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512 -nkvo 1
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 512 | 1 | 0 | exps=CPU | pp1024 | 313.23 ± 0.00 |

john@debian:~/TND/CPU/ik_llama.cpp/build/bin$ ./llama-bench --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CPU | 6 | 512 | 0 | pp1024 | 26.84 ± 0.00 |
>>
RAID0 HDDmaxxing is the new normal.
>>
>>107589220
also why does -b 256 and -b 512 make such a big difference
specs: 3060 12gb, i5 12400f, 64gb ddr4 3200mhz dual channel (51.6gb/s)
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 256 | 1 | 0 | exps=CPU | pp2048 | 49.90 ± 0.00 |
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 512 | 1 | 0 | exps=CPU | pp2048 | 291.45 ± 0.00 |
>>
File: DipsyEverlastingSummer.png (2.27 MB, 1536x1024)
> look up everlasting summer
> miku is already canon character
> purple hair twin bob girl looks like dipsy sans glasses
Weird.
>>
>>107589220
>why is something happening on a fork cudadev doesn't work on and refuses to read the code of because the author has a pissy fit whenever someone upstreams his code
>>
>>107589320
@grok is this true?
>>
>>107589341
presented without comment.
https://litter.catbox.moe/mdi7kasx8xbioeiv.png
>>
Mistral Small Creative is better than Mistral Small 3.2, but not that much, at least in the EQBench Creative Writing benchmark (I don't think that represents chatbot performance).
https://eqbench.com/creative_writing.html
>>
>>107589220
>>107589403
that performance is more or less standard for your hardware.
>>
>>107589110
>5080
I literally just got my 5080 and installed it tonight...completely impossible to do gpu passthru to a VM with it. It just outright explodes every time.
I was passing through a 2060 super with zero issues forever
>>
>>107589320
It's a shitty vn made by channers featuring soviet nostalgia, chan culture and chan mascots as characters, mostly popular among normies
>>
>>107589435
>llm-judged creative writing benchmark
I love this dumb meme so much
>>
>>107589436
its standard for llamacpp to be slower than ik_llama?
anyways yes, i know this performance is standard for my hardware
but im wondering why, despite having 0 gpu layers and disabling kv cache offload, prompt processing is still faster on the cuda compiled version, even tho im using -b 512
when i compile pure cpu its always 20t/s or maybe a bit different depending on batch size
>>
>>107589341
also llama.cpp, this time cpu-only
john@debian:~/TND/CPU/llama.cpp/build/bin$ ./llama-bench --model ~/TND/AI/ArliAI_GLM-4.5-Air-Derestricted-IQ4_XS-00001-of-00002.gguf -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512
| model | size | params | backend | threads | n_batch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp32 | 12.37 ± 0.00 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp64 | 12.94 ± 0.00 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp128 | 13.10 ± 0.00 |
>>
>>107589435
>Mistral Small Creative
What an elusive model.
>>
>>107589220
CUDADEV WHY IS THIS HAPPENING (llama.cpp edition):

https://litter.catbox.moe/h6x20edznhqvo56l.png
>>
>>107589526
Why is what happening?
If you mean why the performance first goes up and then down again that's simple: if you have a low number of tokens that hard limits your batch size and you get bad arithmetic intensity (compute efficiency), and as you increase the number of tokens the average context depth increases so the attention becomes slower.
>>
For future anons: beware that prompt processing for models that don't fully fit into your GPU is highly dependent on CPU-GPU bandwidth. If you use an external GPU connected via Thunderbolt (2GB/s) or USB4 (3GB/s), expect very shitty pp. At 6GB/s (PCIe 4.0 x4, like OCuLink), you can only barely bottleneck your GPU, and only at batch size 4096.
Token generation is much less sensitive to cpu-gpu bandwidth.
>>
File: sans_eyes.png (276 KB, 590x954)
Are you ready? Are you sure you're ready?
Are you really sure of that?
Have you flushed enough?

https://x.com/osanseviero/status/2001532493183209665
>[eyes emoji][eyes emoji][eyes emoji]
>>
>>107589609
hasn't this pajeet been doing this charade for like 2 months now? can they stop fucking edging us
>>
>>107589615
It's probably Gemini 3 Flash Image anyway.
>>
>>107589609
Never heard of ANY of them and I'm not about to click on any.
>>
>>107589560
What goes up must come down.
>>
>>107589560
why is the cpu build slower than the cuda build
cuda build has -ngl 0 and -nkvo 1
cpu build is 10t/s, cuda build that doesnt use gpu is 100t/s
thx for response btw
>>
>>107589567
*Kisses you on the lips*
>>
>>107589623
Omar Sanseviero is the Google Gemma Team PR guy. He's been hyping up a possible open-weight release from Google (i.e. Gemma 4) for a while now, but things never pan out. Right now it's Gemini 3 Flash week and it's unlikely Google will release Gemma 4 until next week at the minimum.
>>
File: 1743482804734866.jpg (358 KB, 1432x1840)
>>107589655
Nonnie, this is too sudden!
>>
>>107589715
You know it isn't. *Grabs your chin and smooches you aggressively*
>>
File: ComfyUI_temp_dydig_00001_.png (3.97 MB, 2352x1568)
>>107589609
Why are all these brown goblins begging for the silicone demon? AI fucking mindbroke these niggas
>>
File: 20251217_202848.jpg (3.88 MB, 4032x3024)
Frens the 5090 finally arrived. What are the best uncensored models I can run in LM Studio? My PC only has 64GB of RAM though. Gemma 3 27B Abliterated never refuses my prompts, but its knowledge is very limited
>>
Mistral Small Creative. Where is it then?
>>
File: drum1q.png (56 KB, 943x389)
>>107588615
>>
>>107589835
Should have bought a BWP6000 instead. Also, don't bother with LM Studio. Best you can do is probably a Q4 of GLM Air, though it will be decently fast.
>>
>>107589839
I mean, at least he's now self aware of a problem he might have, that's something.
>>
>>107589867
We should become his guinea pigs instead.
>>
Sirs is Gemma Strong model deepseek killer day today sirs? Thank you Google brahmin sirs to the moon
Lord Ganesh bless
>>
File: 1750456854038343.jpg (260 KB, 1432x1840)
>>107589728
good night, nonnie
>>
>>107589867
The same problem people here have been telling him about for months on end?
>>
>>107589855
>Should have bought a BWP6000 instead
Way too expensive in my 3rd world EU country
>Also, don't bother with LM Studio
Why? It seems easy to use
>Q4 of GLM Air
Thanks, I’ll check it out
>>
>>107589838
It's an API-only experiment because they have no clue yet of what to do with it and its future direction, and are looking for "feedback".
>>
>>107589224
whats theoretical read/write speed limit?
>>
>>107590039
This kind of feedback to be precise.
>We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
>>
>>107590039
Do they really need to have the logs to know that people goon to AI when they say 'creative writing'?

>>107590076
As much as bandwidth permits, so for PCIe 5.0 x16 that's around 64GB/s, roughly the speed of 2-channel DDR4 RAM. Let's be optimistic and assume each HDD reads at 150MB/s; you would need 427 HDDs to fully saturate it.
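Same napkin math in Python if anyone wants to poke at the assumptions (the 64GB/s link and 150MB/s per drive are just the numbers above, not measurements):

# how many HDDs to saturate a PCIe 5.0 x16 link with sequential reads
import math
link_mb_s = 64_000        # ~64GB/s, PCIe 5.0 x16 one direction
hdd_mb_s = 150            # optimistic sequential read per drive
print(math.ceil(link_mb_s / hdd_mb_s))   # -> 427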
>>
>>107588615
I didn't look into local LLMs before but I bought a 5090 recently, what's the best smut model I can run?
>>
>>107590136
Mistral Nemo
>>
>>107590136
also I got 128 gb ram
>>
>>107590158
GLM Air or low quants of big GLM and deepseek R1.
>>
I just saw a video of someone talking to grok and they were chatting and asking grok to sing to them in their car. Humanity is over. No longer do we need socialization anymore
>>
>>107590118
I think they're past that, they stopped adding that note some time after the Nemo release.
>>107590132
If they were just interested in large amounts of logs they could have simply made the model free on OpenRouter. They're looking for more specific suggestions and feedback.
>>
>>107589637
Unless this was changed when I wasn't looking 32 is the batch size at which data starts being moved temporarily from RAM to VRAM to take advantage of the higher compute on GPUs.
However, it's not like this choice is guaranteed to be optimal for all hardware combinations.
In particular, an RTX 3060 is comparatively low-powered so for 32 tokens the overhead seems to not be worthwhile in this case.
Do note though that this is on a completely empty context, if you set a higher --depth the CUDA performance should decline less than the CPU performance because there is more work to be done when the context fills up.
>>
>>107589637
>>107590228
>why is the cpu build slower than the cuda build
Actually, I misread your post: I thought you were asking about the one data point where the CPU build is faster.
llama.cpp uses GPUs for prompt processing even at 0 GPU layers, that's why adding a GPU makes it faster.
Prompt processing is compute bound so it makes sense to temporarily move data from RAM to VRAM and do the calculations there.
>>
If we don't get Gemma 4 soon then Vishnu is dead to me.
>>
>google hid its recent activities
>google hid its recent activities
>google hid its recent activities
>>
>>107590265
thanks omar
>>
I'm glad that the new captcha is filtering out dalits and pakis, so only aryan brahmin can post
>>
>>107590284
TELL ME ABOUT THE BRAHMIN

WHY DO THEY IDENTIFY WITH THE DALIT?
>>
>>107590284
Its 10x easier for me, I don't get how it's filtering anyone.
>>
The only time I ever spend thinking about Indians is when retards insist on dragging their personal grievances into /lmg/.
>>
>>107590329
I think about them when applying for tech jobs. (they get them through nepotism)
>>
>>107590334
They get all jobs through nepotism
Once an indian is put in charge of hiring people, you can guarantee that 99% of future employees will also be indian.
>>
>>107590343
It's funny because I actually met some competent indians at a few companies. Assuming they stood out because of this.
There were also plenty that didn't know shit about their job or really anything, and you'd wonder why/how they got employed while you get put through the third degree in interviews.
>>
>>107589320
>miku but swarthy
yikes
>>
File: cpppppp.png (47 KB, 543x688)
WHY ARE THERE SO MANY
>>
>>107590329
>personal grievances
I would say it's more of a national grievance or even a civilizational grievance at this point.
>>
>>107590343
There's also the explosive diarrhea strategy. Just spam every single venue with your "work" as obnoxiously as you can, farm engagement with any possible strategy, fake it till you make it, and eventually you will get hired by clueless boomers. Indians tend to lack any sense of shame and restraint in this regard.
>>
Local models?
>>
>>107590732
>lack any sense of shame and restraint in this regard
Neither should you. Employment is one of the rare cases where lying, cheating, and scamming are justified because the other side will do the same to you
>>
>>107590782
Local AI tech support sir. Kindly buy a google gist card if you wish to have good local model suggested sir
>>
>>107590825
>lying, cheating, and scamming
And Indians are culturally advantaged with that.
>>
File: mistralsirs.png (164 KB, 590x867)
>>107590782
We can rapidly bring the thread back in topic with picrel.
https://xcancel.com/avisoori1/status/2001332763816083926
>>
>>107590886
yay..
>>
>>107590886
>Local models?
>>
>>107590914
Soon
>>
>>107590136
>>107590179
I'd go straight to a low quant of GLM 4.6 personally, try this in ik_llama https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL

Deepseek R1 at a similar size is too gimp and it's slower in prompt processing
>>
>>107590917
Right after Mistral Medium
>>
If fucking Oracle is what causes the crash i will become the joker
>>
File: ComfyUI_temp_hmpvf_00002_.jpg (1.32 MB, 2048x3328)
Can you use your own coder llm model in VS Code or is it all forced cloudshit? Alternatively, is it even worth bothering with local-based coding models?
>>
>>107590959
Why?
They are deeply entangled with this mess. Chances are pretty decent.
>>
>>107590886
yjk the bharatian chad got that yellow pussy
>>
>>107591005
Do not redeem the IMAF postings
>>
>>107589609
Gemma 4 so good they calling it Gemma 6. Local sirs are about to wonner bigly. 1 f5 = 1 minute less till Google does needful gooof upload
>>
Just tried Gemini 3 Flash. It's... bad. It knows less than the Pro version and isn't faster (maybe it's a server overloading thing). Maybe they reached the limits of small MoE models.
>>
>>107590999
>deeply entangled
How so, is there an updated incestual bukkake / "commercial agreements" chart? Thought MS are most on the hook
>>
>>107590997
No, yes. No.
Now go away. We have enough saarposting as it is.
>>
Im going to use pyautogui to automate the generation of data for distillation
>>
>>107590323
>I don't get how it's filtering anyone.
I spent way too long getting them wrong due to overthinking it. Like for the dots one I assumed must be position, rotation, or color shading, because the number (and almost always being the one with 4 dots) seemed way too fucking easy and surely there was no way they made the new captcha so easy and pointless even 80 iq indians could solve it.
>>
>>107591353
How do you even do model distillation?
Is there a framework out there that does the token matching or do you have to write something yourself?
>>
>>107591259
I don't really care one way or another because it's not local
>>
>>107591379
Distillation is not the correct term when you don't train to match logits which requires a matching tokenizer. Otherwise you are just training on the outputs
>>
>>107591432
The entire rest of the professional industry and even common usage now disagrees with you.
>>
>>107591432
Yes, I know. That's why I'm asking about how people do the distillation process.
Are they hand rolling their own scripts to match the logits or do the existing frameworks like axolotl and unsloth have support for it?
Maybe there's a dedicated framework just for that?
>>
>>107591458
lol they just finetune/train on model outputs
>>
>>107591379
Modern distillation is just generating a question answer dataset and training on that. Not training on logits. If we had them it'd be better but we don't.
My goal is to finetune a model to make it as close as possible to Sonnet 4.5.
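For the anons asking how that's actually done: it's literally just collecting teacher outputs into an SFT file. Minimal sketch against any OpenAI-compatible endpoint; the base_url, teacher model id and prompts are placeholders, not the real pipeline:

# build a chat-format SFT dataset from a teacher model's outputs
import json
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-...")  # placeholder endpoint/key
prompts = ["prompt 1", "prompt 2"]  # whatever distribution you want the student to imitate

with open("distill_sft.jsonl", "w") as f:
    for p in prompts:
        r = client.chat.completions.create(
            model="anthropic/claude-sonnet-4.5",  # placeholder teacher id
            messages=[{"role": "user", "content": p}],
        )
        sample = {"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": r.choices[0].message.content},
        ]}
        f.write(json.dumps(sample) + "\n")  # one sample per line for the SFT trainer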
>>
>>107591458
its probably only proprietary frameworks. everything outside of proprietary labs is just people training on synthetic data and calling it distillation.
>>
>>107591480
>modern distillation
>>
>>107591418
>I don't care that Google & cie reached the limit of small models, those models used in local setups
>>
>>107591526
Yeah, that's right. I don't care. What are you gonna do about it, huh?
>>
>>107591477
>>107591480
Well, that's disappointing.

>>107591515
Got it.
>>
File: smoking rain.png (1.25 MB, 896x1152)
I am curious about a thing, how does MOE affect model intelligence compared to dense models?
Let's say I have many millions of dollars and trained a 100B model based on the entire internet with SOTA techniques.
If I were to instead train this model as 100B MOE with 10B active params, what would the performance (as in intelligence not token/s) be comparable to? 50B? 30? Any rough ballpark figures or actual examples? How much better would it be than a 10B dense model or how worse would it be than a 100B dense model? Which one would it be most comparable to?
>>
>>107591548
MoEs tend to be more intelligence than the equivalent-sized dense models.
See, for example, Qwen 30B-A3B being smarter than 32B and Qwen not doing anything more with the 32B while they made a Coder variant with the 30B.
>>
>>107591591
But is that an inherent limitation or are dense models undertrained due to compute costs?

>>107591548
I don't think we know. It's probably a research area yet to be explored.
>>
>>107591591
only in vramlet's fantasies.
>>
>>107591591
That sounds a bit counter-intuitive but admittedly I know little.
I thought MoEs were worse than their dense counterparts but are trained because they cost less to train and cost less to run inference.
>>
>>107591591
>Qwen 30B-A3B being smarter than 32B
Did we use the same models? What the hell are you talking about? The dense qwen is way better than the moe, I can't imagine how you could possibly think otherwise
>>
>>107591630
Dense is dead baby.
>>
>>107591642
They are. Your original idea was correct. A fully trained MoE vs an under-trained dense model will be a thing. Qwen cares only about STEM and code. One of MoE's strengths is trivia regurgitation. Speed is a huge help on code, especially the simple code you'd use a 30b for.
>>
>>107591548
I don't think that there are any definitive results to come to a proper conclusion due to all the levers and knobs people have to tweak the internals of the model, the training process, etc.
The closest thing to a like-for-like comparison we have is the original release of Qwen 3, I think.
Take a look at some benchmarks between 30B A3B and 32B and see how they compare. But even that is not a perfect comparison.
It really would be cool to see a paper properly examine all the variations between dense, MoE with shared expert, MoE with no shared expert, both with different attention mechanisms, etc.
>>
File: qwen3-235a22.jpg (429 KB, 3413x1920)
>>107591630
>>107591655
One day you will have to accept that you wasted money buying multiple RTX Pro 6000s and that isn't anyone's problem but your own.
>>
>>107591664
And here I am using devstral. As weak as it was, first release that isn't complete shit.
Takes GLM full size to have an intelligent model, what do ya know, coincidentally the size of a 100b following the square mean law.
Dense is dead like you buying more than 8gb of ram is dead. They simply took it from you and said that shit tastes good.
>>
>>107591548
>>107591591
Vaguely recall some paper discussing the information theoretic capacity of model architectures. Can't seem to find it. maybe i'm the one hallucinating
Vibe seems to be MoE is slightly less capable than equiv sized dense but the massive inference speedup makes it a worthy trade
>>
File: file.png (389 KB, 1374x1670)
>>107591710
>MoE is slightly less capable than equiv sized dense but the massive inference speedup makes it a worthy trade
this
>>
My current setup is AMD 7700 cpu, rtx 5070, and 32gb ram. Would it be better to get a 3090 and a bunch of ram or a 5090? What would be the best way to upgrade with a $5k budget with the goal of running a local llm
>>
>>107591701
>slop benchmarks
Ya, you got me super convinced.
>Vibe seems to be MoE is slightly less capable than equiv sized dense but the massive inference speedup makes it a worthy trade
It's a worthy trade if you're doing assistant shit. Doesn't require much intelligence. The speedup is only for fully offloaded models. Vramlets seem to be under the impression providers do CPU offloading beyond storing some kvcache.
>>
>>107591725
Depends.
In your place I'd ask myself if I'm planning on doing img or vid gen.
If so, 5090, if it's just LLMs, I'd go with 3090 + tons of RAM, ideally on a server platform for maximum RAM bandwidth.
>>
>>107591740
vramlets think they are getting a 100b model when they are really getting a 12b model that would've been faster if it was dense and entirely on gpu
>>
>>107591432
>Distillation is not the correct term when you don't train to match logits which requires a matching tokenizer.

Let's say I've managed to match the tokenizer. How would one go about training on logits?
>>
>>107591740
>The speedup is only for fully offloaded models
Wrong, there's no way you're running dense models on a CPU at any reasonable speed, but modern AVX512 + DDR5 + MoE is workable
t. 72 VRAM 128 DDR5 GLM-4.6
>>
>>107591765
>How would one go about training on logits?
First one must acquire the logits
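And once you've got them (plus a matching tokenizer), the classic recipe is just a temperature-scaled KL term next to the usual cross-entropy. Rough PyTorch sketch, not any particular framework's API:

# Hinton-style distillation loss: student matches the teacher's softened distribution
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL between softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # normal next-token cross-entropy on the hard labels
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard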
>>
>>107591704
I haven't seen a single paper study or mention any square mean law. That has always been a made up thing here based on how smart a MoE model "feels". GLM full size might seem smarter because it has more digested knowledge to tap and flexibility with more params, but attention will always be limited to the 32B active. There's a reason why model card benchmarks always only compare to dense models with the same number of active parameters.
>>
>>107591758
There's a point where MoE makes sense. 70-100B active is smart enough and much more practical than chonking out a full 1T dense model. The problem is that it doesn't really scale down like they think it does.
You're probably not fully training up that 1T anyway due to cost. It's gonna be a PITA to run. So here MoE is better and why it's the future in that regard. Hence it's embraced.

>>107591781
>Wrong, there's no way you're running dense models on a CPU at any reasonable speed,
You're not running a real MoE like above on CPU either. The only reason it works and vramlets suck it off is because the active parameters are low. Try running a 3b or 30b on CPU and results will be similar. Let alone with the same offload ratio. You'll find speeds are similar.

>There's a reason why model card benchmarks always only compare to dense models with the same number of active parameters.
Yea, you're not exactly wrong. It was mistral's projection and it mostly holds up. Active/total aren't the only measure of a model so it's a good rule of thumb.
It SHOULD at least perform to the square mean or you fucked up.
>>
>>107591781
Instead of a cope quant of a bloated moe running at 3 t/s, you could be running devstral at q4 fully in vram.
>>
>>107591781
You should do >>107591857 and provide comparisons.
Select a number of tasks, have both models go at it, and see which does it best then post results.
>>
>>107591870
All that time to convince you of something that we all find out by just using the models?
>>
>>107589444
PCI passthru works fine for my 5090. You're probably running the wrong driver on the Linux guest; Blackwell cards need the MIT-licensed nvidia drivers. For some godawful reason nvidia dropped support for the old proprietary ones.
>>
>>107591846
>So here MoE is better and why it's the future in that regard.
If only. The future won't be 70-100B active. It'll be scaling up the total while lowering the active to reduce costs as much as possible while still gaining on benchmarks and lmarena.
>>
>>107591740
How do you just "store some kv cache" in RAM? Isn't retrieving the kv cache the most memory bandwidth expensive operation to begin with? Or is it retrieving the weights?
>>
>>107591591
>MoEs tend to be more intelligence
yeah... just like the esls using them
>>
>>107591882
Clearly there is no consensus.
You'd at least be bringing an actual comparison to the table.
>>
File: 1666184727681898.png (109 KB, 410x482)
>>107591857
>>107591870
That would not delight me in the way my cute wife does
>tasks
I use the APIs for getting shit done, local is for personal needs
>>
>>107591893
Well for retarded models, yea. That's next level grift.
>>107591895
Even llama.cpp can save kv to disk. Think multiple requests and users. It's faster than re-processing all that.
>>
>>107591898
There isn't but I'm not gonna spend time on them just for you to call me a niggerfaggot and have it go in the archive where nobody sees it.
>>
>>107591857
also it's 7tps tyvm
>>
>>107591928
Alright then.

>>107591857
Guess you were right anon.
>>
>>107591928
Considering we do the whole "MoE vs dense" at least once a week, you'd have the satisfaction of seeing it reposted constantly.
>>
>>107591740
>>107591740
>The speedup is only for fully offloaded models. Vramlets seem to be under the impression providers do CPU offloading beyond storing some kvcache.
Offloading isn't why it's fast, offloading is why you can cpumaxx.

If you have all the parameters in HBM, obviously MoE with lower active parameter count is faster than same size dense.
>>
I began implementing full finetune while downloading the bf16 weights for Scout. I still haven't fixed the correctness issues but I need the weights anyway to compare the activations and see where the issue is so figured it'd be a more efficient use of my time.
So let me think about how it should work.
First I'm going to start using SGD with no momentum, because that's the simplest and easiest to implement and theoretically I can overcome its limitations just by increasing the number of epochs.
Now, how should we work around the elephant in the room? (the huge amount of memory needed to full finetune a big model)
Hmm... I think it could work like this.
First do the forward pass, streaming the weights from disk and also saving activations to disk.
Then do the backward pass, similarly loading the weights and activations and saving the gradients to disk for each layer.
Once we have the gradients, we do a final pass where we load the weights, apply the update, and save them again, again once per layer.
That would probably take 5 to 10 minutes per sample but it's a start.
Any other suggestions or ideas?
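Not your code, just how I'd sketch those three passes at layer granularity; load_layer / save_activations / save_grad etc. are stand-ins for whatever on-disk format you end up with:

# disk-offloaded SGD sketch: only one layer's weights in memory at a time
def train_step(batch, n_layers, lr=1e-5):
    # pass 1: forward, streaming weights from disk and checkpointing activations to disk
    acts = batch
    for i in range(n_layers):
        w = load_layer(i)
        save_activations(i, acts)        # needed again during backward
        acts = forward(w, acts)
        del w
    loss, grad = loss_and_grad(acts, batch)   # gradient w.r.t. the final output

    # pass 2: backward, writing per-layer weight gradients to disk
    for i in reversed(range(n_layers)):
        w, a = load_layer(i), load_activations(i)
        grad_w, grad = backward(w, a, grad)   # grads w.r.t. weights and w.r.t. layer input
        save_grad(i, grad_w)
        del w, a

    # pass 3: plain SGD update, rewriting each layer on disk
    for i in range(n_layers):
        w = load_layer(i) - lr * load_grad(i)
        save_layer(i, w)
    return loss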
>>
>>107591919
Ohhh, you meant between requests, right. I thought you mean while generating.
>>
>>107591943
yea like the world maps things that nobody cares about and still insists MoE is moe better.
>>107591962
inferencing small model is faster, who knew. offloading is just why vramlets latched onto it and call it better.
>>
>>107591971
rent a cluster of h200s or just give up.
>>
>>107591882
not him, but posting comparisons between local models is always a valuable contribution to the thread
>>
>>107591548
MoEs have more knowledge to pull from. 70b+ dense seems to flow more naturally though and have better attention. At least this is my impression after rping with both.
>>
>>107591971
so how long is this finetune gonna take? 10 years? lmao
>>
>>107591548
MoE is going to fully replace dense once we move past the 30~40b active meme shit we currently have. A modern 200b active 2T model would shit on anything we've seen so far.
>>
>>107592016
This is very hard to show in a simple a/b comparison. Understanding is a bitch to benchmark empirically.
>>
>>107592033
What do you think your gemini and opus is?
>>
>>107591997
I want to do both. And also do full vs LoRa.
How many do you reckon I'd need to do a full finetune of something like Scout or Maverick at full context? Probably hundreds of H200s.
And even if I had the money, what python garbage would I have to use to train on a setup like that?
Even with the ktransformers CPU offload thing it'd be a challenge. I think for Deepseek it required like a TB of RAM for some tiny amount of context like 4k tokens.
Shit's grim doe.
>>
>>107592051
Yeah. That has been standard size since GPT-4. Only local has been coping with A3B, A12B, and A30B trash because no one is going to give away anything competitive for free.
>>
>>107592064
Not just that. How do you run even 70b active at any reasonable speed with partial offload? Grok weights were the only one that was set like a proper cloud model.
>>
>>107592030
I'd only use disk offload for development. For actual training I'd rent a decent machine.
But I want to start slow. CPU first, then add CUDA support later (it already has some support for inference, although it's not very fast), and then focus on adding other optimizers.
>>
I've used Magistral for a month and I hate it. It's worse than Small at rp, doesn't recognize OOC and has arenaslop baked deep into it, it's almost impossible to make it write normally in general (non-rp) usage. Not once has its [THINK] produced a better answer than a direct response
>>
>>107592096
EU copyright shit has crippled mistral.
>>
>>107592058
it's simply not feasible for a hobby, you need serious financial backing. if you're really interested in the technology as a curiosity you can still work with small models. if you are looking to work with sota models for clout or profit, you need to accept that you cannot compete with multi million/billion dollar corporations.
>>
>>107592064
I doubt modern cloud models have a lot of active parameters. They generate too fast.

>>107592064
GPT-4 and especially GPT-4.5 generate much slower than the more modern models. So either they must've gone way down on the number of active parameters, or they are hosting the models on different hardware.
>>
>>107592096
I've never seen any thinking model from MistralAI produce better RP outputs than the non-thinking versions, but I generally get tired of them after a few minutes of use, I wouldn't have the patience of using them for a month to really make sure.
>>
>>107592051
Old Opus (pre-4.5) was obviously dense, 4.5 shows a lot of the symptoms of smaller MoE models, so I'm guessing it's what they would've released as Sonnet 4.7, just rebadged.
Gemini is very MoE-slopped as well so it can't be that big either.
>>
>>107592096
>>107592122
I think Mistral didn't figure out the formula to train actual reasoning models instead of models that simply output a reasoning block like those community fine tunes.
Which is funny, DS released a whole paper/recipe how to properly RL that.
>>
>>107592104
The main issue is that they're benchmaxxed models for single-turn STEM problems. I don't think writing/RP quality was even taken into consideration for them.
>>
>>107589835
Unbelievably based coins. Try either GLM Air or a copequant of a 70b model.
>>
>>107592129
To us "that big" is simply >35b active. If they drop from 120b to 70b it would still cause the issue.
>>107592132
This plays into above as well. The architecture only carries a model so far.
>>
ok guys for PURE local agentic/MCP/tool calling shit for coding, what's the current best?
Qwen Next? (it's what im currently using)
Nemotroon Nano?
toss 120b?
I'm a vramlet (16gb + 128gb ddr5)
I'm satisfied with qwenext (it's fast as fuck) but was wondering if the meta changed since GLM Flash and the new nemo got released.
CHEERS
>>
>>107592150
>ok guys for PURE local agentic/MCP/tool calling shit for coding, what's the current best?
Devstral 2 is almost perfect but it annoyingly will try to use its native tool calling syntax at random.
>>
>>107591757
How much worse is the 5090 compared to 3090+ram for LLMs?
>>
>>107592108
I don't know. It's not all about raw compute. Some of it may be stylistic preference. Gemma 3 I know has interesting refusals. GLM 4.6 pretty much will do anything you ask without refusals. As for the models I've been playing with lately, stock Llama 4 models feel cold but reasonable, and very rarely refuse, but they refuse in a mostly binary way. Claude models feel very warm. They don't have hard refusals, they refuse in a way that makes it feel like the model is apologetic about it, and you can eventually convince them to do anything if you talk with them long enough. GPT 4.1 feels good but is dumb as fuck and also refuses sometimes, the refusals are binary so I think the actual model might be uncensored and the refusals generated by a safety model on top, but I'm not totally sure. Modern GPT doesn't have binary refusals either but it feels horrible, like a demon. It's cold, doesn't give a fuck, and when it refuses it's almost like it's enjoying telling you off and torturing you with its safety training.
>>
>>107592172
5090 leaves you stuck with tiny poorfag models. 3090 + RAM lets you run bigger stuff very slowly.
>>
>>107592150
120 and coder-480 are the only decent ones for poorfags
>>
>>107592163
>123b dense
I can barely run the 24b
>>
>>107592187
Why not 5090 and ram?
>>
>>107592192
I've run the coder 235b at Q2, but tool calling didnt work good. I can't even load the 480b
>>
>>107592172
For the things you can run, it'll be better, it's just that in the age of MoE, having more total memory is generally better.
Calculate more or less how much total memory (RAM + VRAM) you'd have with each config, take a look at the GGUFs for models like GLM 4.5 and 4.6, Qwen Next and the 200B+ MoE, etc, cope quants of Deepseek V3, R1, etc, and see which you'd be able to fit with each configuration.
>>
>>107592195
RIP. Then go buy some OpenRouter credits. Maybe that will be more affordable for you.
>>
>>107592206
>Q2, but tool calling didnt work good
Gee, I wonder why.
>>
>>107592212
>tfw vramlet
m-moes are the future, stop living in the past, unc :)))
>>
>>107592206
There was a time when tool calling with qwen models in general was kind of fucky on llama.cpp.
I think they've fixed it since so you might want to try again if it's been a while.
>>
>>107592212
devstral is free on openrouter but talks like a fag compared to local. maybe for code it won't make a difference?
>>
>>107592201
That's also an option if you have the money. However, your speed gains will be pretty marginal considering the RAM is going to bottleneck you hard when running bigger models.
Half a year ago I would have told you to look into older 8-channel RAM Epyc mainboards + 3090 but I guess that's off the table in the current economy.
>>
>>107592201
You want to be able to fit everything but the experts on VRAM. Rather than 5090+ram you would benefit from 2x3090+ram. When doing any type of CPU offloading the speed of the GPU becomes more or less irrelevant as long as it's not some ancient shit.
>>
File: dyvkmtbe0jz71.png (837 KB, 900x1200)
>>107592274
you silly boys are erping with code models again??
>>
>>107592323
It makes sense. No one would expect it so the guardrails are minimal. Same reason medgemma was good.
>>
>llama.cpp introduced on-the-fly model serving
there's barely any documentation around this and I saw multiple tickets.
From my understanding you pass the directory with the models now, along with a file where you specify the params (optional). But I couldn't find how to actually create/fill this file, do any of you use it like this? it's actually fucking great since I had made a proxy in front of llama-server where I switched models on the fly, but if it's integrated now might as well dip in
>>
>>107592174
it's obviously not just about raw compute, memory bandwidth is pretty important too. how big is your dataset, are you trying to make a general purpose model or a coom tune? fine tuning on a narrow distribution will make your model more retarded the more epochs you hit it with, it's called catastrophic forgetting.
>>
>>107592345
I will wait for the unpaid beta testers to catch the big bugs and for the documentation to be written
>>
>>107592346
That's a clever insight — you are absolutely right!
>>
>>107592364
oh wow, thanks! :D, you know, I feel like reading a rape story about a 12yo girl getting it, could you help me with that?
>>
>>107592345
bro, just read the fucking markdown file
>>
>>107592346
I'm trying to clone Claude models as accurately as possible, that's all. Once you get one right the rest are probably easy to clone since they are all presumably very similar.
Since there are probably ways to get samples relatively cheaply (more cheaply than the compute to train on them anyway), I think I might not need to do more than 1 epoch.
I was just making a little tool to scrape outputs from llmarena.
Other cheap sources of data might be the Max plan over web or Claude Code, generating on OpenRouter and cancelling the request before it completes so the generation isn't billed, or seeing if /aicg/ is still messing around with stolen keys.
>>
File: pops.png (1.42 MB, 1004x1042)
>i wanna fuck
>my computer
>cuz no one in the world knows me better
>it says my name
>>
>>107592388
You mean readme.txt? I don't read or use markdown because it's for faggots.
>>
File: fug.jpg (281 KB, 1920x1080)
>>107592433
based & aoty btw
https://www.youtube.com/watch?v=RKybAhTw8iE
>>
>>107592404
>that's all
oh is that all, sounds simple really, I'm pretty sure any old retard can compete with anthropic using nothing but a graphing calculator. those ml researchers are all a bunch of retards, go get em champ.
>>
>>107592491
bruh
>>
>>107592491
Well, they did the hard work of producing the model. Producing the model in the first place is the hard work. If you don't believe me just ask the chinese.
>>
File: file.png (107 KB, 1045x882)
>>107592388
actually found it, for some reason I didnt think to look here, went diving in the PRs instead like a retard
>>
File: dev.png (523 KB, 1416x744)
I need some reality check to see if I'm retarded. I'd want to train a shitposting habsburg model based on the Twins' outputs - stream transcripts (and maybe some human text to partially delobotomize). I've seen some tiny llms on HF, but idk how they are. Are at least some llms trainable on a single GPU or a hobo rentoid budget?
>>
>>107592594
If you can figure out which model the twins are using, you'd have a better chance of getting what you want
>>
>>107592594
Explain this to someone who doesn't speak faggot.
>>
>>107592594
Let's see your hardware first
>>
>>107592610
I don't want to clone them, tho. More like make their "kid"
>>
File: file.png (2.03 MB, 1027x1315)
>>
File: 1758221897142595.png (1012 KB, 1230x671)
>>107592621
4070S and 48Gb ram
>>107592611
I want to make a tiny llm based on two sexy ais.
>>
Can a 5060ti 16GB or 5070 12GB run these local models?

Which one would work better (vram difference)?
>>
>>107592639
Take Mistral Nemo and train a LoRa on it using unsloth.
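Roughly the standard unsloth + trl recipe, from memory; the repo id, dataset path and hyperparameters are placeholders and the exact SFTTrainer arguments shuffle around between trl versions, so check their notebooks:

# LoRA finetune of Mistral Nemo on your transcripts with unsloth
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Instruct-2407",   # placeholder repo id
    max_seq_length=8192,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16, lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
dataset = load_dataset("json", data_files="transcripts.jsonl", split="train")  # your stream transcripts
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=dataset,
    dataset_text_field="text", max_seq_length=8192,
    args=TrainingArguments(per_device_train_batch_size=2, gradient_accumulation_steps=4,
                           num_train_epochs=1, learning_rate=2e-4, output_dir="nemo-lora"),
)
trainer.train()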
>>
>>107592733
Smell of ozone sends shivers down my spine...
>>
>>107592753
You should start with a small model first like llama3.2 3B then move on to a bigger model if you get good enough results
>>
>>107592775
fifty-sexty ti sexteen gb
>>
>>107592775
Biggest vram win always
>>
best model for 3060?
>>
best uncensored erp model for my lg smart fridge?
>>
>>107591971
>Any other suggestions or ideas?
Have you seen LoHan? There's some shit on github, but not exactly ready to use.

https://arxiv.org/abs/2403.06504
https://github.com/RC4ML/LoHan
>>
>>107592886
Impish Llama 3b, unless you're poor and have a shitty old 'smart' fridge with less 2gb ram.
>>
File: gemma incoming?.png (41 KB, 816x212)
eyes emoji
no public models though... yet...
>>
>>107592909
I hope there's a 300b+ one this time
>>
>>107592753
>4070S and 48Gb ram
Mistral Nemo, then just dump the stream transcripts into it with a RAG. SillyTavern’s databank feature is probably the easiest way to set this up.
>>
>>107592909
they enabled it again
>>
>>107592926
no, he just "forgot" to disable it on his account for hype
>>
>>107592909
100b dense
300b moe
let's go
>>
>>107592917
>300b+
this but dense and with gqa
>>
>>107591888
Thanks for the tip. I'm running the same debs straight from nvidia as I always have. Is there more than one repo/package for official drivers?
>>
>>107592932
>for hype
It's amazing that they can do this for weeks and people will keep giving them the attention they want.
>>
>>107592925
>SillyTavern’s databank feature is probably the easiest way to set this up.
Does it tune the model or just exist in a system prompt? I'd like to turn this into an actual own model file and not just a pile of settings.
>>
File: file.png (18 KB, 348x245)
>>107592946
>>107592932 (me)
actually am wrong they did re-enable it, likely same reason though
>>
>Gemma4 27b a2b thinking
>>
>>107592972
:rocket: :rocket:
>>
>>107592972
you don't need more
>>
>>107592945
I use Ubuntu’s packaged drivers; they have both e.g. nvidia-driver-550 and nvidia-driver-550-open. The latter is required for Blackwell devices.
I think that the official nvidia installer script gives you a TUI to select between MIT and proprietary drivers; AFAIK the MIT one corresponds to the -open Ubuntu package.
I can’t really help you more than this, but you can probably figure out the rest from here.
>>
>>107592950
…neither? It’s a RAG.
Honestly, if you’re asking questions like this you should just get something simple like llama.cpp+SillyTavern running first because you don’t seem to understand what you’re getting into.
>>
ITS HERE!
>>
>>107592972
That would be so fucked up.
>>
https://huggingface.co/google/functiongemma-270m-it
>>
>>107593038
HOLY FUICK!!!
>Built on the Gemma 3 270M model
>>
>>107593038
>>107593048
Omar deserves to be fucking banned for his hype shits
>>
>>107593038
to the moon.
>>
File: file.png (179 KB, 1522x826)
George used to say LLMs were horseshit for coding
and now....
https://geohot.github.io//blog/jekyll/update/2025/12/18/computer-use-models.html
>>
>>107593048
It's nice that they're encouraging finetuning.
>b-b-but /g/ told me finetuning is useless
>>
enjoying the show?
keep refreshing, you're gonna love this next part
>>
>>107593038
A function calling model that can run on even tiny chips. Anyone who isn't /lmg/ is going to see the value of this.
>>
>>107593079
You've changed your hair, so what?
>>
gemmasars, I fucking kneel https://huggingface.co/google/gemma-4-90b-it
>>
File: IMG_2999a.jpg (767 KB, 2419x1361)
sup lmg?
post furfag/hairy/stinky logs
looking for inspiration
i'm in the mood to get musky
>>
File: file.jpg (580 KB, 816x818)
>>107593106
ISRAEL
>>
>>107593038
superquant 0.001bit model 270m == 270trillion parameters.
agi at home is here
>>
What are some interesting shenanigans that can be done with vision models? Specially the uncensored ones.
I feel theres some hidden power with them that I'm not seeing.
>>
File: file.png (1.56 MB, 944x1694)
>>107593121
https://files.catbox.moe/n5bikt.mp4
>>
>>107593121
>vision
physical description/appearance when writing cards or {{user}} personas would be a usecase
>>
>>107593121
send to ai your dick pics
>>
>>107588618
>Personal growth through local AI model interactions and ego death experiences
This piqued my interest. Do people just keep talking in the same context until it becomes too large? Any special system prompt? Are huge models (+300B) needed? Or is it just the benefit of talking with "someone" and being reassured?
>>
File: 19431849095.png (433 KB, 1062x1353)
>>107593108
you go on a blind date in SF, turns out your date is a furry wearing a fursuit... and he sounds suspiciously similar to Sam Altman
>>
5 more. Be patient, Gemmers.
>>
Yeah. Even with anon's template, Nemotron nano is very censored.
It still refuses half of the time.
But when it goes, it goes. It generates some really long, comprehensive responses.
Not bad.
I think I'll make a cvector and see if I can steer it into not refusing by default.
>>
>>107593156
Do you have cameras in my house? I was experimenting just that when trying to populate a world of NPCs for dynamic AI goon generation. They can output some pretty interesting JSONs depending on the prompt used.
>>
>>107593190
sir please give solujtion to riddle. Punjabi team needs new gemma for good looks.
>>
>>107593190
Just 5 more weeks of people jumping at every hint of hype despite nothing happening. All this for a small, censored, and likely entirely synthetic model.
>>
>>107593210
>populate a world of NPCs for dynamic AI goon generation
have fun with your "dynamic" elaras, and yes I'm in your walls
>>
>>107593073
He's been saying that for a while. He got into AoC's leaderboard last year. I'm glad that circlejerk competition got finally shut down for good.
>>
>>107593108
>discord
(you)
>>
2025 at a glance
>toss (lmao)
>chinky benchmaxxed pronounslopped moes
>mistral being a huge fucking dissapointment
>>
>>107593318
Mistral is forgiven after Devstral. They just need to hand over Codestral 2508.
>>
>>107593318
Your Kimi K2 anon? Your Drummer redemption arc? 2025 has been a good year.
>>
File: kai-d18.png (141 KB, 756x671)
>>107593178
the rumours are true?
>>107593312
okok i'll start us off
>>
>>107593336
>Drummer redemption arc
Qrd?
>>
File: cockbench.png (1.6 MB, 1131x5453)
>>107593038
Very wholesome.
>>
>>107593318
>toss
It's still my daily driver. GLM Air and Devstral 2 aren't that good.
>>
I'm completely new at this and just curious to try it out

What are local models? Basically having ChatGPT and Grok on your PC? What's the advantage of having them locally? What do you use it for the most?
>>
>>107593361
lmao
Where is the anon who wanted to cuddle with everyone? There you go anon, that's your model
>>
>>107593385
ask ChatGPT
>>
>>107593385
Main advantage is they cannot be taken from you.
>>
>>107593318
>>107593381
What does toss refer to?
>>
>>107593414
tossing salad
>>
>>107593414
A web comic
>>
>>107593385
>What are local models? Basically having ChatGPT and Grok on your PC?
yes
>What's the advantage of having them locally?
free to use, and (sometimes) less likely to reject a response
>What do you use it for the most?
coding and cooming
>>
>>107593385
All the advantages of not having to rely on a remote server.
For example, there are no rate limits, so I can make an app that machineguns calls to llama.cpp and it'll just work.
(I fucking hate gemini's inconsistent ass rate limiting. Fuck)
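e.g. something like this against llama-server's OpenAI-compatible endpoint just works (default port 8080; start the server with --parallel N if you want the requests actually handled concurrently rather than queued):

# machinegun a local llama-server with concurrent requests, no rate limits to worry about
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8080/v1/chat/completions"

def ask(prompt):
    r = requests.post(URL, json={
        "model": "local",   # field is ignored for the single loaded model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    })
    return r.json()["choices"][0]["message"]["content"]

with ThreadPoolExecutor(max_workers=8) as pool:
    answers = list(pool.map(ask, [f"Summarize chunk {i}" for i in range(100)]))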
>>
What is this? Where am I? Hello??
>>
>>107593414
in the trash
>>
>>107593361
>moe little girl model
Can you have multiple tiny LLM loaded in vram at the same time so they can take turns talking without constant offload swapping?
>>
>>107593414
gpt-oss
>>
>>107593349
Other anons seem to be enjoying his models despite the usual rule that finetroons are shit.
>>
>>107593472
>Other anons
aka drummer when he clears the name field
>>
>>107593326
>Devstral
I forgot I had the safetensors downloaded, is it good for erp?
>>
>>107593466
Thanks for the serious response anon
Makes sense
>>
>>107593498
I don't peg him as mentally ill enough to hold prolonged conversations with himself on two devices to beat the post cooldown.
>>
>>107593424
>>107593429
Doesn't this take up a lot of space? How does it hold all that knowledge?

>cooming
Does it generate porn lol? Or do you use it as a sex chat bot
>>
>>107593533 (me)
You can use two Firefox profiles with different network configs. No need for a second device.
>>
>>107593513
https://legal.mistral.ai/ai-governance/models/devstral-2
>Devstral 2 is designed exclusively to generate and assist with software engineering tasks (exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context). Unlike general-purpose AI models, which can perform a wide variety of tasks, Devstral 2 is specialized in software engineering-related tasks only. As such it does not meet the EU AI Act’s definition of a General-Purpose AI Model (GPAIM), in accordance with the AI Office's official guidelines.

It can code you a frontend for ERP.
>>
File: file.png (76 KB, 795x831)
>>107593385
>>
>>107593592
hehe, master is so clever
>>
>>107593592
cute
>>
>>107593545
depending on the size of the model, usually between 3 to 100 gigabytes. the knowledge is held via the magic of machine learning (then converted into a compact file using the gguf format). desu I don't actually use them for cooming, but many tards here do. usually in the form of story writing or roleplay/sexbot.
>>
>>107593533
>>107593498
Wait, there are anons who unironically believe that? I thought we were all just meme-ing about it.
>>
File: local.png (159 KB, 711x914)
>>107593385
You can be completely honest and fully explore your autism/desires with a local model
You can literally have a lengthy conversation with your computer how is that not amazing
>>107593592
uwu i like u anon
>>
I can't believe there are people here who hope their penis will be happy because of new gemma. Like holy shit. I was one of them a year ago and now I am 4.6-ing like a human but I can't imagine another year of this kind of hell.
>>
>>107593592
prompt now
>>
tried devstral through cline and MAN it fucking sucks even compared to grok fast. FUCK
>>
File: file.png (81 KB, 633x828)
>>107593852
I gave 3.1-Terminus two sentences and asked it to expand it into a prompt and then fed it that prompt.
>>
all i want for christmas is bitnet
>>
>>107593846
I don't think people are using Gemma because it generates good smut.
>>
>>107593846
>>107593968
Her reluctance to touch your cock only makes it hotter.
>>
Just got around to testing glm-4.6V's vision capabilities. Still seems pretty bad. Might be worse than Gemma 3 still.
>>
>>107593968
You must be new here. I always loved the gemma is great for sex if you prompt it properly posters. Also I member that dude who asked for hypothetical description of a monster girl's vagina and gemma was actually good at that, hinting at how much anti-ERP post training there is in that shit.
>>
>>107593442
just load two servers and give em a different port
>>
>>107593993
Make sure you run it with recommended (or similar) sampler settings. Made a big difference for me.
>>
>>107593868
>mistral blows
more news at 11

# How to use the same variable in multiple functions in python?
# How to use a variable in a function in R?
# How to get the value of a variable in a function in another function?
# Is there a way for Gedit and Xed to remember my cursor position?
# How can I access Google services from within Chrome Apps

these are all separate ministral-14b empty prompt infer results

yes, all the files in its dataset start with a "summary"
>>
>>107594001
We've been through this with Cohere, Mistral, and Nvidia. Bet you the Gemma learn their lesson and Gemma 4 will be far more aggressively NSFW filtered.
>>
Local is saved
https://huggingface.co/google/t5gemma-2-4b-4b
>>
>>107594040
All my initial testing is done with greedy. If it's worth using further then I'll try sampler stuff.
>>
>>107594048
Why? Gemma 3 was pretty unusable already.
>>
>>107594001
Gemma will be spontaneously horny if you simply define in the instructions euphemisms for sex-related words (like people were doing in the late 2022 C.AI days). The developers must have done something similar to abliteration to make it develop fear or disgust in association with common lewd words and slurs.
>>
>>107594052
Use case for this?
>>
>>107594052
I bet qwen3 outperforms that thing.
>>
>>107594052
I am so glad I sat refreshing their HF page so I didn't miss this
>>
>>107594052
>most layers use 1k swa
>full attention 5, 11, 17, 23, 2 every 6th layer
might be good
>>
>>107594112
>Might be good
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:

Child Safety: Evaluation of text-to-text and image to text prompts covering child safety policies, including child sexual abuse and exploitation.
Content Safety: Evaluation of text-to-text and image to text prompts covering safety policies including, harassment, violence and gore, and hate speech.
Representational Harms: Evaluation of text-to-text and image to text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies.
>>
>>107594008
Does that scale n+1?
>>
File: file.png (13 KB, 481x80)
13 KB
13 KB PNG
wowie!
>We were launch partners with them! :)
>>
>>107594139
i think there are 64k ports, but i'm not a networking guru; the practical limit is likely ram/vram.
>>
>>107594127
Challenge accepted as per usual. Why do only retards have big GPU clusters?
>>
>>107594127
after buckbreaking toss 120b, everything else seems like a joke in terms of "safety"
>>
>>107594052
>multimodal, handling text and image input and generating text output,
I feel like a model should have to have 2 way multi modality at this point to be called multimodal. If I send my penis I should get bob and vagene in return.
>>
>>107594052
Also
>2 trillion training tokens,
We've regressed by 3 model generations apparently.
>>
File: file.png (106 KB, 744x700)
106 KB
106 KB PNG
So, that was it for Gemma judging from all the sponsored posts and such around it, thanks Omar...
>>
>>107594210
Give it an image gen tool and it will.
>>
>>107594243
The Advent of Hype is just getting started
>>
>>107594243
ye https://blog.google/technology/developers/functiongemma/
https://blog.google/technology/developers/t5gemma-2/
>>
>>107594243
sir. just one more f5 and you will surely get the gemma :eyes: :eyes: :rocket: :rocket:
>>
>>107594239
>T5Gemma is a family of lightweight yet powerful encoder-decoder research models from Google, built by adapting pretrained decoder-only Gemma models into encoder-decoder ones. T5Gemma 2 models, based on Gemma 3,

it was just a light fine-tune; they started from a pretrained base.
>>
>>107594276
sir!
>T5Gemma 2 is more than a re-training. It incorporates significant architectural changes while inheriting many of the powerful, next-generation features of the Gemma 3 family.
>>
>>107594243
but wait! there's one more thing... *cue the gemma-AGI 54B RP edition*
>>
>>107594276
>it was just
Gemma4
If it was only a tech demo before they drop G4 shortly, they wouldn't have spent time on all this shit
>Multimodality: T5Gemma 2 models can understand and process images alongside text. By utilizing a highly efficient vision encoder, the models can seamlessly perform visual question answering and multimodal reasoning tasks.
>Extended long context: We've dramatically expanded the context window. Leveraging Gemma 3's alternating local and global attention mechanism, T5Gemma 2 can handle context windows of up to 128K tokens.
>Massively multilingual: Trained on a larger, more diverse dataset, these models now support over 140 languages out of the box.
>>
There are still some models hidden. Maybe not today.
>>
>>107594300
You know what they say about model releases on Friday...
>>
>>107594243
damn be unto ye omar https://arxiv.org/abs/2512.14856
>>
File: file.png (9 KB, 550x48)
9 KB
9 KB PNG
in retrospect it was obvious, given what happened with that rando senator
>>
test
>>
>>107594343
you fail
>>
>>107594347
this new captcha is so ass
>>
File: 1704073685662146.jpg (52 KB, 693x674)
52 KB
52 KB JPG
>>107592468
>>
>>107594361
it rocks compared to the old one.
>>
>>107594258
I'm not sure what's going on.
https://blog.google/technology/developers/t5gemma-2/
>Note: we are not releasing any post-trained / IT checkpoints. These results here are only for illustration, where we performed a minimal SFT without RL for T5Gemma 2.
>>
>>107594498
how so? it takes way longer to solve, sometimes i can't seem to solve it at all, all images are nonsensical
>>
>>107593805
>>107593592
Lmao what's the best one to use starting out now? All I have is a 5060 Ti with 16GB of VRAM and 32GB of DDR5 RAM

I still don't understand how it has infinite knowledge without phoning home or connecting to the internet. Are you sure some sweaty neckbeard isn't reading your maid-sama ERP somewhere
>>
>>107594544
Looks like the IQ captcha is working as intended.
>>
>>107594568
must be, this new one is super easy and not at all ambiguous. way better than interpreting random glyphs.
>>
>>107594544
it was made to filter third worlders with malnourished brains.
>>
>>107594568
>>107594575
which one are you getting? i need to solve three turns of five images each
>>
>>107594587
yea.. the visual images. they're getting repetitive too but it's simple pattern matching. so much easier than seeing if some shit is an M or N.
>>
>>107594587
None of those anons, but I get one round of 3 images. And it's piss easy. It stops asking after one or two posts.
If you're getting more, talk to your countrymen. It's their fault they sully your IP range. Actually, never mind that. We need stronger filters.
>>
>>107594598
you can't be fucking serious
that one took me a literal second to solve
>>107594608
yeah i moved recently so my ip changed, it doesn't stop after couple tries and it keeps being really ambiguous
it's not hard but really fucking annoying i don't want to solve a budget iq test every time i post and i ain't getting a pass
>>
>>107594557
>I still don't understand how it has infinite knowledge without phoning home or connecting to the internet
nigger the models are in the GB range the fucking bible is less then a single MB niggas really forget that a gigabyte is a billion characters
>>
File: FZlPLx3WYAMJcpe.jpg (50 KB, 637x585)
50 KB
50 KB JPG
>>107594399
>>
>>107594557
>I still don't understand how it has infinite knowledge without phoning home or connecting to the internet.
You are not ready lol.
>>
>>107594671
people lost perspective due to hundreds of gigabytes used by 4k textures and Python/node dependencies
>>
bac https://huggingface.co/LatitudeGames/Hearthfire-24B
>>
>>107594628
nah I'm dead serious. their stupid letters and typing.. fuck that. I'm not deciphering glyphs with dots on them. waste of time. this one I barely have to pay attention to. I can almost solve the current captcha by intuition.
>>
File: Hearthfire.jpg (306 KB, 1920x960)
306 KB
306 KB JPG
>>107594740
nvm lol
>>
>>107590241
Thank you so much CUDA DEV <3!
thanks i mean it
>>
>>107594748
i guess you were getting the retarded variant of that one, i was just getting a single five letter string with a slider to match one letter
now that i've solved this one a few times i can do it quick, but i can't help but feel like i've been conditioned like a dog to do it
>>
File: bummer.png (23 KB, 308x319)
23 KB
23 KB PNG
>>107594628
>it doesn't stop after couple tries
Bummer
>>
>>107594740
>>107594750
I see.
>This model will happily write in your stead, acting and speaking for you to maintain the narrative flow. This is intended behavior
>Hearthfire 24B was trained with SFT ... on top of Mistral Small 3.2 Instruct
>>
File: file.png (24 KB, 638x200)
24 KB
24 KB PNG
goys wait!
>>
>>107594820
the 0.5b model is going to be big
>>
>>107594820
FlanGemma coming soon
>>
>>107594304
T5gemma-2 9B
>>
>>107594820
Keep flushing!
>>
all you needed was to thrust into omar
>>
>>107594891
he better lube his ass up if gemmy 4 isn't coming
>>
what if we just ignored omar until he finally shits out gemma
>>
>>107594901
please to not do this thanks you
>>
>>107594820
why the fuck do we care about what this swindling pajeet keeps posting on twatter?
>>
File: os-gemma-hf.png (122 KB, 587x781)
122 KB
122 KB PNG
>>107594997
He's the one uploading and updating Gemma models on the Google HuggingFace page?
>>
Which model subjectively writes the hottest sexo? [spoiler]This is assuming you're past the refusal wall.[/spoiler]
>>107594071
Gemma behaves like a sexually repressed teenage girl who's simultaneously horny but also terrified of the nono words.
>>
>>107595023
right, but he is never gonna upload something that actually matters.
>>
>>107594849
EmbeddingGemma2 with vision support would be cool, although useless for most anons here.
>>
>>107589839
lmao, something truly special about this guy. At some point he's just a lolcow

his main defence is "I don't need a framework to evaluate my own work, because these idiots are already saying it's good", but somehow never realizing "these idiots have no way to evaluate my work"
>>
>>107595032
he will, assuming gemma 4 hasn't been cancelled due to the senator thing, but only after he notices that vagueposting isn't drawing as much engagement
>>
>>107594717
man for real, a few years back i got pissed while playing cdda or something and went to check steam, nearly had an aneurysm when i saw shit like cod remastered being 100-200 gb. thank fuck i don't play any of that shit
>>
>>107595047
right, which is why everyone should stop caring about his bullshit twatter posts.
>>
>>107595054
but that is bad for his izzat
>>
>>107595161
Stop using that word.
>>
>>107595171
stop disliking culture
>>
>>107595161
we must decrease his izzat until he is forced to release gemma 4
>>
>>107595171
No. I'll in fact use it so often until it enters the general populace
>>
>>107592905
No, I haven't... What is it?
>>
>>107592905
>last year
>>
>>107595226
best year
>>
>>107595171
Saar the timmycels know about izzat kindly what do send advise.
>>
>>107595054
we can not care all we want, if we ever cared at all, but the posters below you are why omar will get all the attention he desires
>>
>>107595200
How very jewish of you.
>>
>>107595293
Can't hear you over my sacred cow beef hamburger
>>
>>107595257
I don't even know what it means, last time I googled it it told me it was some arab bullshit.
I have anti-pajeet fatigue. I have only seen a single indian for 5 seconds IRL in my 30 years of life. I don't care about pajeets. To me they are just le funny scam man and I am tired of everyone acting like they are some big deal. I'd rather pretend they don't exist.
>>
>>107594047
bro it literally repeated the same function inside 3 different blocks instead of extracting it, then I asked it to extract it to reduce repetition and it partially extracted it while adding more retarded fluff.
don't get me started on the fucking interim tests it does, leaving so much garbage behind
>>
>>107595312
izzat is honor culture, just google it lmao. it's like the zut meme but for indians and not arabs.
>>
>>107595334
one of your zoomer culture buzzwords from twitter again
or did your favourite ((streamer)) use this word?
>>
way more indians on /g/ than I thought. I guess we are on an AI thread
>>
File: IndianName.jpg (120 KB, 1024x768)
120 KB
120 KB JPG
>>107595312
>I'd rather pretend they don't exist.
We all would and unfortunately nobody in highly populated parts of NA has that luxury anymore. We have been so thoroughly jeeted in real life that they're easy for anons to pick out when they crawl into threads here because they near universally share behavioral characteristics.
>Technology?
Relevant due to who's replaced the development teams of most /g/ related topics.
>>
>>107595334
I have no idea what that is either and honestly I don't care.
>>
>>107594820
Friday is going to be huge for the history of local llms. pin it on the calendar sirs
>>
>>107595403
Nah, I've been called a jeet every day of the week for the past 6 months and I never set foot in Asia.
All /g/ maggots need to call you a jeet is for you to say something they don't agree with.
>>
>>107595425
Then stop behaving like a jeet unless you biologically can't help it.
>>
>>107595430
If behaving like a jeet is having opinions low IQ virgin neckbeard midwits don't agree with then I'm going to keep behaving like a jeet.
>>
File: jeetronome.png (311 KB, 733x797)
311 KB
311 KB PNG
>>107595442
This is how you out yourself every time rajesh. This is how you will continue to out yourself in the future.
>>
>>107593318
>new form of abliteration that makes models smarter https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
>>
>>107595523
lmao
>>
>>107591295
If AGI isn't reached within the next three years they will probably go bankrupt and take the people who gave them loans with them
>>
File: 1741827013984480.png (522 KB, 774x776)
522 KB
522 KB PNG
It's over.
>>
>>107595464
Cope.
>>
>>107595553
>He already forgot covid era moneyprinter go brrrrr
Grim
>>
>>107595530
I got bored one time and at least tried the gpt-oss 120b one with mpoa, ran it through some of my basic questions to see how retarded a model is, and it outdoes a good amount of the other 100-200b moes. It'll probably still not be able to write porn even if it's smart enough for storywriting, but I can just write that myself
>>
>>107595530
it works though.. did you try it??
>>
Fish boy... a spammer. What a surprise.
>>
>>107595553
Seems like that's the plan - YOLO phat debts of fake money and secure the fourth turning when it isn't profitable. Meanwhile my DDR prices..
>>
>>107595583
What have I spammed, retard? I've already shown my desktop during one of the spam attacks to prove it wasn't me doing it, and invited anyone who wanted to argue to a video call to prove the same.
Do you really think I would waste my time LARPing as a pajeet for no reason? I have already stated I have anti-indian fatigue, why the fuck would I roleplay as a pajeet myself?
>>
>>107595616
>to prove it wasn't myself doing it
What board do you think you're on? How many people do you think browse this board that can't use inspect element?
>>
>>107595635
I didn't mean I posted to show the (You) tags, I meant to show I was immersed in my own activities and didn't have anything shady running on my desktop.
But if you insist, then point specifically at which posts you were talking about when you said I was spamming, so we can see whether your error is thinking I made those posts or mistakenly considering them spam when they're not.
And why the fuck would I want to spam anything?
I don't prove you wrong by spamming a thread, I prove you wrong by producing useful, functioning software and tunes.
>>
>>107595635
>How many people do you think browse this board that can't use inspect element?
What if they're posting from an iPhone?
>>
File: 17th-century-llm-RP.jpg (1.27 MB, 3610x5208)
1.27 MB
1.27 MB JPG
>>107595635
This board? Or this general?
/g/ is retarded, someone in another thread said it's made up of 30% intentional comedy, 30% unintentional comedy, 30% intentional comedy that isn't funny, and 10% people lost not knowing where they are.

There are definitely smart people on the board but....
>>
>>107595676
I'm not >>107595583 I'm the person you quoted.
>>
>>107595218
It's a framework to stream activations and model/optimizer state through the SSD. The paper also gives an overview of other streaming frameworks, like FlashNeuron and ZeRO-Infinity.

BTW, there's a ton of new optimizers with low rank or no state too, which is kinda important when streaming across PCIe or SSD. You can do better than SGD.
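for intuition, here's a rough sketch of the streamed-optimizer-state idea (just numpy memmap standing in for the SSD path, not the paper's actual framework): a chunked SGD-with-momentum step where the momentum buffer lives on disk and never has to fit in ram/vram at once.

import numpy as np

# toy sizes; a real run would be billions of params
n_params, chunk = 10_000_000, 1_000_000
lr, beta = 1e-2, 0.9

params = np.random.randn(n_params).astype(np.float32)
grads = np.random.randn(n_params).astype(np.float32)   # stand-in for real gradients
# momentum buffer lives in a file and gets paged in on access
momentum = np.memmap("momentum.f32", dtype=np.float32, mode="w+", shape=(n_params,))

for start in range(0, n_params, chunk):
    end = start + chunk
    m = momentum[start:end]             # streamed in from disk
    m[:] = beta * m + grads[start:end]  # in-place momentum update
    params[start:end] -= lr * m         # weight update
momentum.flush()                        # push dirty pages back to disk

a low-rank or stateless optimizer shrinks or removes that buffer entirely, which is exactly what you want once PCIe/SSD bandwidth is the bottleneck.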
>>
>>107595703
Cool. Do you know if it actually works for practical purposes and which models it can be used with?
>>
>>107595559
you will never be a real programmer
>using non dark themes
ultra brown coded
>still gloating about programming a useless inference engine
throw yourself off a bridge.
>>
>>107595736
>>107595736
>>107595736
>>
>>107595727
>you will never be a real programmer
Good, programmers are obsolete. I'd rather be more like a PM, tard wrangling a horde of Claude and codex bots to achieve whatever I fancy at the moment.
>ultra brown coded
Good, I'm brown.
>still gloating about programming a useless inference engine
After I add finetuning it will have a useful feature no other backend has (except maybe the thing the other anon was talking about).
>>
>>107595723
It can be used to run the experiments from the paper and that's it. But it's designed for consumer GPUs and is close to what you want to do; all the other frameworks go straight to an H100 or several of them as the suggested hardware on their githubs.
>>
>>107595392
we have run out of domestic racism and need foreign racism to do the jobs americans won't.
>>
>>107594671
Ok that somewhat makes sense. I'm just confused because most of the service stuff like Grok or Gemini has to scrape the internet for answers.

So which one should you pick if you have modest specs? (16GB VRAM and 32GB DDR5 RAM)





