File: 1739402277498048.jpg (424 KB, 1376x2072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107582405 & >>107573710

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107582405

--Qwen3 model performance optimization and hardware utilization:
>107587959 >107587962 >107588009 >107588204 >107588023 >107588043 >107588126 >107588226
--Tensor VRAM prioritization and compute graph optimization challenges:
>107585868 >107585978
--Attempting to distill Claude-like model from cloud logs using local LLM:
>107586842 >107586892 >107586876 >107586899 >107586987 >107587029 >107587038 >107587104
--Techniques for generating long NSFW stories with limited LLMs:
>107584822 >107584862 >107584875 >107585113
--Personal growth through local AI model interactions and ego death experiences:
>107582881 >107582903 >107582912 >107583128 >107583070 >107583157
--Gemma release updates and Solar-Open parameter specifications:
>107582520 >107582589 >107586719 >107582643 >107582699 >107582732 >107582789
--Evaluating NemoTron 3 Nano's roleplay abilities vs Gemma with preset demonstration:
>107583976 >107584039 >107584065
--Nala test results on MistralAI API with Anon/Nala M roleplay:
>107586172 >107586197 >107586219 >107586377 >107586813
--Testing GLM 4.6 on new Framework desktop:
>107583661 >107583684 >107583743 >107583746 >107583748 >107583750 >107583875 >107583904 >107583988 >107583982 >107584717 >107584051 >107584075 >107584275 >107584296 >107584494 >107584285 >107584307 >107584322 >107584357 >107584477 >107584482 >107584609 >107584496 >107584520 >107584530 >107584607 >107585220
--Budget GPU alternatives for AI workloads: 5060ti vs 3090 cost-performance analysis:
>107585634 >107585658
--Nemotron nano model benchmark performance on 3060 GPU:
>107583030 >107583098
--Misconfigured multi-GPU parameter usage realization:
>107582989
--Miku (free space):
>107582881 >107587769 >107587665

►Recent Highlight Posts from the Previous Thread: >>107582410

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Local model for fixing a broken heart when?
>>
>>107588641
Get a grip pussy, life's gonna get harder too
>>
>>107588641
at least 4 months after corpo models can operate a surgical robot without mistakes
>>
>>107588660
>Local man dies after SurgeonGPT refuses to proceed mid-surgery, quoted as saying repeatedly: "I can't assist with that"
>>
>>107588694
>why the fuck not
>an unconfirmed blood type may lead to disastrous results
>i'm telling you it's fuckin o
>the sensor isn't working, I can't confirm that
>>
>>107588694
>die because surgeongpt refuses to assist with that request
or
>die because the SARRR doctor decided to start sticking his dick in your innards mid-surgery and you get not only all the AIDS but also fecal matter from his dick and subsequently a lethal infection

clown world man...
>>
>>107588694
>I can't operate..he is my son
>>
Should I buy a 5080 prebuilt or can I cope with services like ChatLLM?
>>
Btw Bartowski for some reason updated his BF16 mmproj file for GLM.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/tree/main
>>
>>107589110
There are so many better options than buying a prebuilt. Build a mid-tier PC yourself and then get 2 of these GPUs:
https://www.ebay.com/itm/125006475381
>>
>>107589110
There are no good models you can run on a 5080 that you can't run on a 2080
>>
CUDA DEV CUDA DEV WHY IS THIS HAPPENING:

https://litter.catbox.moe/gtb1e3u1jejxs6or.png

./llama-bench --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -ot exps=CPU -ngl 0 -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512 -nkvo 1
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 512 | 1 | 0 | exps=CPU | pp1024 | 313.23 ± 0.00 |

john@debian:~/TND/CPU/ik_llama.cpp/build/bin$ ./llama-bench --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CPU | 6 | 512 | 0 | pp1024 | 26.84 ± 0.00 |
>>
RAID0 HDDmaxxing is the new normal.
>>
>>107589220
also, why do -b 256 and -b 512 make such a big difference?
specs: 3060 12GB, i5 12400F, 64GB DDR4-3200 dual channel (51.2 GB/s)
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 256 | 1 | 0 | exps=CPU | pp2048 | 49.90 ± 0.00 |
| glm4moe 106B.A12B IQ4_KSS - 4.0 bpw | 53.05 GiB | 106.85 B | CUDA | 0 | 512 | 1 | 0 | exps=CPU | pp2048 | 291.45 ± 0.00 |
>>
File: DipsyEverlastingSummer.png (2.27 MB, 1536x1024)
> look up everlasting summer
> miku is already canon character
> purple hair twin bob girl looks like dipsy sans glasses
Weird.
>>
>>107589220
>why is something happening on a fork cudadev doesn't work on and refuses to read the code of because the author has a pissy fit whenever someone upstreams his code
>>
>>107589320
@grok is this true?
>>
>>107589341
presented without comment.
https://litter.catbox.moe/mdi7kasx8xbioeiv.png
>>
Mistral Small Creative is better than Mistral Small 3.2, but not by much, at least on the EQBench Creative Writing benchmark (I don't think that represents chatbot performance).
https://eqbench.com/creative_writing.html
>>
>>107589220
>>107589403
that performance is more or less standard for your hardware.
>>
>>107589110
>5080
I literally just got my 5080 and installed it tonight... completely impossible to do GPU passthrough to a VM with it. It just outright explodes every time.
I was passing through a 2060 super with zero issues forever
>>
>>107589320
It's a shitty VN made by channers, featuring Soviet nostalgia, chan culture, and chan mascots as characters; mostly popular among normies
>>
>>107589435
>llm-judged creative writing benchmark
I love this dumb meme so much
>>
>>107589436
It's standard for llama.cpp to be slower than ik_llama?
Anyway, yes, I know this performance is standard for my hardware,
but I'm wondering why prompt processing is still faster on the CUDA-compiled version despite having 0 GPU layers and disabling KV cache offload, even though I'm using -b 512.
When I compile pure CPU it's always 20 t/s, or maybe a bit different depending on batch size.
>>
>>107589341
also llama.cpp, this time cpu-only
john@debian:~/TND/CPU/llama.cpp/build/bin$ ./llama-bench --model ~/TND/AI/ArliAI_GLM-4.5-Air-Derestricted-IQ4_XS-00001-of-00002.gguf -t 6 -fa 1 --mmap 0 -r 5 -p 32,64,128,256,512,1024,2048,4096 -r 1 -p 0 -b 512
| model | size | params | backend | threads | n_batch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp32 | 12.37 ± 0.00 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp64 | 12.94 ± 0.00 |
| glm4moe 106B.A12B IQ4_XS - 4.25 bpw | 56.62 GiB | 110.47 B | CPU | 6 | 512 | 1 | 0 | pp128 | 13.10 ± 0.00 |
>>
>>107589435
>Mistral Small Creative
What an elusive model.
>>
>>107589220
CUDADEV WHY IS THIS HAPPENING (llama.cpp edition):

https://litter.catbox.moe/h6x20edznhqvo56l.png
>>
>>107589526
Why is what happening?
If you mean why the performance first goes up and then down again, that's simple: with a low number of tokens your batch size is hard-limited and you get bad arithmetic intensity (compute efficiency), and as you increase the number of tokens the average context depth increases, so the attention becomes slower.
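A rough illustration of the arithmetic-intensity half of that, with made-up shapes (d=4096, fp16) just to show the trend:

# toy estimate: FLOPs per byte moved for one (n_tokens x d) @ (d x d) matmul
def intensity(n_tokens, d=4096, bytes_per=2):
    flops = 2 * n_tokens * d * d                          # multiply-adds
    bytes_moved = bytes_per * (2 * n_tokens * d + d * d)  # read A, read W, write C
    return flops / bytes_moved

for n in (32, 128, 512, 2048):
    print(n, round(intensity(n), 1))
# intensity climbs with n_tokens until the weight read stops dominating,
# which is why tiny batches underutilize the GPU; the later slowdown is
# the attention cost growing with average context depth.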
>>
For future anons: beware that prompt processing for models that don't fully fit into your GPU is highly dependent on CPU-GPU bandwidth. If you use an external GPU connected via Thunderbolt (~2 GB/s) or USB4 (~3 GB/s), expect very shitty pp. At ~6 GB/s (PCIe 4.0 x4, like OCuLink), the GPU only barely becomes the bottleneck, and only at batch size 4096.
Token generation is much less sensitive to CPU-GPU bandwidth.
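A very crude ceiling estimate of why, assuming the streamed weights dominate and nothing overlaps (the ~50 GiB figure and the link speeds are placeholder assumptions, not measurements):

# upper bound on pp t/s when RAM-resident weights must cross the CPU<->GPU link per batch
streamed_gib = 50          # assumed bytes of weights streamed per batch pass
batch = 4096               # tokens processed per pass
links_gbs = {"thunderbolt": 2, "usb4": 3, "oculink (pcie4 x4)": 6, "pcie4 x16": 25}
for name, gbs in links_gbs.items():
    seconds = streamed_gib * 2**30 / (gbs * 1e9)   # time to move the weights once
    print(f"{name:18s} ceiling ~{batch / seconds:5.0f} t/s")
# token generation doesn't need to re-stream weights like this,
# which is why it's far less sensitive to the link speed.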
>>
File: sans_eyes.png (276 KB, 590x954)
Are you ready? Are you sure you're ready?
Are you really sure of that?
Have you flushed enough?

https://x.com/osanseviero/status/2001532493183209665
>[eyes emoji][eyes emoji][eyes emoji]
>>
>>107589609
hasn't this pajeet been doing this charade for like 2 months now? can they stop fucking edging us
>>
>>107589615
It's probably Gemini 3 Flash Image anyway.
>>
>>107589609
Never heard of ANY of them and I'm not about to click on any.
>>
>>107589560
What goes up must come down.
>>
>>107589560
why is the cpu build slower than the cuda build?
the cuda build has -ngl 0 and -nkvo 1
the cpu build is 10 t/s, the cuda build that doesn't use the gpu is 100 t/s
thx for the response btw
>>
>>107589567
*Kisses you on the lips*
>>
>>107589623
Omar Sanseviero is the Google Gemma Team PR guy. He's been hyping up a possible open-weight release from Google (i.e. Gemma 4) for a while now, but things never pan out. This week is Gemini 3 Flash week, and it's unlikely Google will release Gemma 4 until next week at the earliest.
>>
File: 1743482804734866.jpg (358 KB, 1432x1840)
>>107589655
Nonnie, this is too sudden!
>>
>>107589715
You know it isn't. *Grabs your chin and smooches you aggressively*
>>
File: ComfyUI_temp_dydig_00001_.png (3.97 MB, 2352x1568)
>>107589609
Why are all these brown goblins begging for the silicon demon? AI fucking mindbroke these niggas
>>
File: 20251217_202848.jpg (3.88 MB, 4032x3024)
Frens the 5090 finally arrived. What are the best uncensored models I can run in LM Studio? My PC only has 64GB of RAM though. Gemma 3 27B Abliterated never refuses my prompts, but its knowledge is very limited
>>
Mistral Small Creative. Where is it then?
>>
File: drum1q.png (56 KB, 943x389)
>>107588615
>>
>>107589835
Should have bought a BWP6000 instead. Also, don't bother with LM Studio. Best you can do is probably a Q4 of GLM Air, though it will be decently fast.
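Rough size math for that suggestion (treating the quant as a flat bits-per-weight average, which real GGUF mixes only approximate):

# ~110B params at ~4.25 bpw (IQ4_XS-ish)
params = 110e9
bpw = 4.25
weights_gib = params * bpw / 8 / 2**30
print(f"~{weights_gib:.0f} GiB of weights")   # roughly 54 GiB
# with -ot exps=CPU the experts sit in the 64 GB of system RAM, while the
# shared layers plus KV cache fit comfortably in the 5090's 32 GB of VRAM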
>>
>>107589839
I mean, at least he's now self-aware of a problem he might have; that's something.
>>
>>107589867
We should become his guinea pigs instead.
>>
Sirs is Gemma Strong model deepseek killer day today sirs? Thank you Google brahmin sirs to the moon
Lord Ganesh bless
>>
File: 1750456854038343.jpg (260 KB, 1432x1840)
>>107589728
good night, nonnie
>>
>>107589867
The same problem people here have been telling him about for months on end?
>>
>>107589855
>Should have bought a BWP6000 instead
Way too expensive in my 3rd world EU country
>Also, don't bother with LM Studio
Why? It seems easy to use
>Q4 of GLM Air
Thanks, I’ll check it out
>>
>>107589838
It's an API-only experiment because they have no clue yet what to do with it or its future direction, and are looking for "feedback".
>>
>>107589224
what's the theoretical read/write speed limit?
>>
>>107590039
This kind of feedback, to be precise.
>We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
>>
>>107590039
Do they really need to have the logs to know that people goon to AI when they say 'creative writing'?

>>107590076
As much as bandwidth permits, so for PCIe 5.0 x16 that's around 64 GB/s, roughly the speed of dual-channel DDR4 RAM. Let's be optimistic and assume each HDD reads at 150 MB/s; you would need about 427 HDDs to fully saturate it.
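Same arithmetic spelled out (all figures approximate):

pcie5_x16 = 64e9    # ~64 GB/s in one direction
ddr4_dual = 51.2e9  # dual-channel DDR4-3200, for comparison
hdd = 150e6         # optimistic sequential read per drive
print(pcie5_x16 / hdd)   # ~427 drives in RAID0 to saturate the link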
>>
>>107588615
I didn't look into local LLMs before, but I bought a 5090 recently. What's the best smut model I can run?
>>
>>107590136
Mistral Nemo
>>
>>107590136
also I got 128 GB of RAM
>>
>>107590158
GLM Air or low quants of big GLM and deepseek R1.
>>
I just saw a video of someone talking to Grok, chatting and asking it to sing to them in their car. Humanity is over. We don't need socialization anymore.
>>
>>107590118
I think they're past that; they stopped adding that note some time after the Nemo release.
>>107590132
If they were just interested in large amounts of logs, they could have simply made the model free on OpenRouter. They're looking for more specific suggestions and feedback.
>>
>>107589637
Unless this was changed when I wasn't looking, 32 is the batch size at which data starts being moved temporarily from RAM to VRAM to take advantage of the higher compute on GPUs.
However, it's not like this choice is guaranteed to be optimal for all hardware combinations.
In particular, an RTX 3060 is comparatively low-powered, so for 32 tokens the overhead seems not to be worthwhile in this case.
Do note though that this is on a completely empty context; if you set a higher --depth, the CUDA performance should decline less than the CPU performance because there is more work to be done when the context fills up.
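A toy break-even model of that threshold (all the throughput and bandwidth numbers below are assumptions, not measurements of any real card):

# is it worth shipping a d x d fp16 weight to the GPU for one matmul over n tokens?
def cpu_time(n, d=4096, cpu_gflops=300):
    return 2 * n * d * d / (cpu_gflops * 1e9)

def gpu_time(n, d=4096, gpu_gflops=10_000, pcie_gbs=8, bytes_per=2):
    transfer = bytes_per * d * d / (pcie_gbs * 1e9)   # copy the weight across once
    compute = 2 * n * d * d / (gpu_gflops * 1e9)
    return transfer + compute

for n in (8, 16, 32, 64, 128):
    print(n, "offload wins" if gpu_time(n) < cpu_time(n) else "CPU wins")
# a weaker GPU or slower link pushes the crossover above 32 tokens,
# which matches the 3060 result above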
>>
>>107589637
>>107590228
>why is the cpu build slower than the cuda build
Actually, I misread your post: I thought you were asking about the one data point where the CPU build is faster.
llama.cpp uses GPUs for prompt processing even at 0 GPU layers; that's why adding a GPU makes it faster.
Prompt processing is compute-bound, so it makes sense to temporarily move data from RAM to VRAM and do the calculations there.
>>
If we don't get Gemma 4 soon then Vishnu is dead to me.
>>
>google hid its recent activities
>google hid its recent activities
>google hid its recent activities
>>
>>107590265
thanks omar
>>
I'm glad that the new captcha is filtering out dalits and pakis, so only aryan brahmin can post
>>
>>107590284
TELL ME ABOUT THE BRAHMIN

WHY DO THEY IDENTIFY WITH THE DALIT?
>>
>>107590284
It's 10x easier for me; I don't get how it's filtering anyone.
>>
The only time I ever spend thinking about Indians is when retards insist on dragging their personal grievances into /lmg/.
>>
>>107590329
I think about them when applying for tech jobs. (They get them through nepotism.)
>>
>>107590334
They get all jobs through nepotism.
Once an Indian is put in charge of hiring people, you can guarantee that 99% of future employees will also be Indian.
>>
>>107590343
It's funny because I actually met some competent Indians at a few companies. I assume they stood out because of this.
So many didn't know shit about their job, or really anything, and you'd wonder why/how they got employed while you get put through the third degree in interviews.
>>
>>107589320
>miku but swarthy
yikes
>>
File: cpppppp.png (47 KB, 543x688)
WHY ARE THERE SO MANY
>>
>>107590329
>personal grievances
I would say it's more of a national grievance or even a civilizational grievance at this point.
>>
>>107590343
There's also the explosive diarrhea strategy. Just spam every single venue with your "work" as obnoxiously as you can, farm engagement with any possible strategy, fake it till you make it, and eventually you will get hired by clueless boomers. Indians tend to lack any sense of shame and restraint in this regard.
>>
Local models?
>>
>>107590732
>lack any sense of shame and restraint in this regard
Neither should you. Employment is one of the rare cases where lying, cheating, and scamming are justified because the other side will do the same to you
>>
>>107590782
Local AI tech support sir. Kindly buy a google gist card if you wish to have good local model suggested sir
>>
>>107590825
>lying, cheating, and scamming
And Indians are culturally advantaged with that.
>>
File: mistralsirs.png (164 KB, 590x867)
>>107590782
We can rapidly bring the thread back on topic with picrel.
https://xcancel.com/avisoori1/status/2001332763816083926
>>
>>107590886
yay..
>>
>>107590886
>Local models?
>>
>>107590914
Soon
>>
>>107590136
>>107590179
I'd go straight to a low quant of GLM 4.6 personally; try this in ik_llama: https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL

DeepSeek R1 at a similar size is too gimped, and it's slower at prompt processing.
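If anyone wants to grab just that quant without cloning the whole repo, something like this should work (the pattern assumes the files sit under the IQ2_KL/ folder shown in the link; double-check the repo tree first):

from huggingface_hub import snapshot_download

# pull only the IQ2_KL split files from ubergarm's GLM-4.6 GGUF repo
snapshot_download(
    repo_id="ubergarm/GLM-4.6-GGUF",
    allow_patterns=["IQ2_KL/*"],
    local_dir="GLM-4.6-IQ2_KL",
)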
>>
>>107590917
Right after Mistral Medium
>>
If fucking Oracle is what causes the crash, I will become the Joker.
>>
File: ComfyUI_temp_hmpvf_00002_.jpg (1.32 MB, 2048x3328)
Can you use your own coder LLM in VS Code, or is it all forced cloudshit? Alternatively, is it even worth bothering with local coding models?
>>
>>107590959
Why?
They are deeply entangled with this mess. Chances are pretty decent.
>>
>>107590886
yjk the bharatian chad got that yellow pussy
>>
>>107591005
Do not redeem the IMAF postings
>>
>>107589609
Gemma 4 so good they calling it Gemma 6. Local sirs are about to wonner bigly. 1 f5 = 1 minute less till Google does needful gooof upload
>>
Just tried Gemini 3 Flash. It's... bad. It knows less than the Pro version and isn't faster (maybe it's a server overloading thing). Maybe they reached the limits of small MoE models.
>>
>>107590999
>deeply entangled
How so? Is there an updated incestual bukkake / "commercial agreements" chart? I thought MS was most on the hook.
>>
>>107590997
No, yes. No.
Now go away. We have enough saarposting as it is.
>>
I'm going to use pyautogui to automate the generation of data for distillation.
>>
>>107590323
>I don't get how it's filtering anyone.
I spent way too long getting them wrong due to overthinking it. Like for the dots one, I assumed it must be position, rotation, or color shading, because the number (and it almost always being the one with 4 dots) seemed way too fucking easy, and surely there was no way they made the new captcha so easy and pointless that even 80 IQ Indians could solve it.
>>
>>107591353
How do you even do model distillation?
Is there a framework out there that does the token matching or do you have to write something yourself?
>>
>>107591259
I don't really care one way or another because it's not local
>>
>>107591379
Distillation is not the correct term when you don't train to match logits, which requires a matching tokenizer. Otherwise you are just training on the outputs.
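For reference, the logit-matching version looks roughly like this (a minimal sketch of the standard temperature-scaled KL loss; it only makes sense when both models share a vocabulary/tokenizer):

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # soften both distributions with temperature T and match them with KL;
    # both logit tensors must be over the same vocabulary
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# usage: typically mixed with normal cross-entropy on the target tokens,
# e.g. loss = alpha * distill_loss(s, t) + (1 - alpha) * ce_loss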
>>
>>107591432
The entire rest of the professional industry and even common usage now disagrees with you.
>>
>>107591432
Yes, I know. That's why I'm asking about how people do the distillation process.
Are they hand-rolling their own scripts to match the logits, or do existing frameworks like axolotl and unsloth have support for it?
Maybe there's a dedicated framework just for that?
>>
>>107591458
lol they just finetune/train on model outputs
>>
>>107591379
Modern distillation is just generating a question-answer dataset and training on that, not training on logits. If we had them it'd be better, but we don't.
My goal is to finetune a model to make it as close as possible to Sonnet 4.5.
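The boring version of that pipeline, for anyone curious (a sketch only: the endpoint URL, model name, and prompt list are placeholders for whatever OpenAI-compatible API and teacher you actually use):

import json
import requests

API_URL = "http://localhost:8000/v1/chat/completions"   # placeholder OpenAI-compatible endpoint
MODEL = "teacher-model"                                  # placeholder teacher name
prompts = ["Write a short story about a broken heart."]  # replace with your real prompt set

with open("distill_sft.jsonl", "w", encoding="utf-8") as out:
    for prompt in prompts:
        resp = requests.post(API_URL, json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        }, timeout=300)
        answer = resp.json()["choices"][0]["message"]["content"]
        # one chat transcript per line, in the "messages" format most SFT trainers accept
        out.write(json.dumps({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}) + "\n")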



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.