/g/ - Technology

File: dipsyQueen.png (1.63 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107573710 & >>107565204

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1738010215822.png (2.17 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107573710

--Paper: RePo paper and multi-image CAPTCHA challenge discussion:
>107577314 >107577342 >107577367 >107577411
--Optimizing text generation for creative writing using specialized samplers:
>107574218 >107575323 >107575354 >107575474 >107575423 >107575274
--Comparing OCR models for Japanese text in manga, including dots.ocr vs Gemini 3:
>107574359 >107574473 >107574490 >107574523 >107574745
--Running large AI models on consumer GPUs with limited VRAM:
>107574547 >107574575 >107574579 >107574602 >107574606 >107574663 >107574695 >107574640
--Critique of AI-generated code quality and bot theory skepticism in LLM communities:
>107576227 >107576364 >107577638 >107577666 >107577995 >107577971
--GLM 4.6V's flawed reasoning patterns in Touhou character identification:
>107574600 >107574648 >107574699 >107574747 >107574921
--Meta SAM Audio release and vocal isolation quality:
>107576201 >107576427 >107580108
--Low-VRAM LLM testing strategies and model recommendations:
>107579504 >107579535 >107579545 >107579608 >107580036 >107580142 >107579626
--Optimizing glm-130B quantization and thread settings on 2x3090 GPUs with llama.cpp:
>107579155 >107579182 >107579226 >107579251
--Anticipation and speculation around Solar-Open-100B model release:
>107577317 >107577343 >107577412 >107577419 >107577768
--Seeking consistent accent voice cloning alternatives:
>107578331 >107578356 >107578483 >107578538
--Mistral model's formatting and instruction-following challenges:
>107574541 >107574574
--Chatterbox Turbo vs F5-TTS performance comparison on different GPUs:
>107576884 >107576899 >107576921 >107576953 >107576962
--Dipsy and Luka (free space):
>107575318 >107573767

►Recent Highlight Posts from the Previous Thread: >>107573726

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107577061
There's some weird caching going on in that page.
>>
>>107582200
There are intelligence/memory improvements, but they're less major changes and more ironing out issues. Currently Vedal is more focused on getting their 3D models working.
>>
Gemmasaars... GLM 4.6 Airchinks... Nothing ever happens.
>>
>>107582520
kind sir isnt 4.6v = 4.6 air + vision?
gemma4 sirs will saves us
>>
why do you guys pretend to be indian
>>
>>107582558
same reason everyone started pretending to be muslim in 2017
>>
File: thereisstillhope.png (225 KB, 586x876)
>>107582520
The week is not over yet.
>>
>>107582507
Do we know which model he used as a base?
>>
>>107582520
drummer dropped yet another cydonia finetune, we don't need gemma or glm for like at least 1 more year now
>>
>>107582558
>guys
One retard's forced meme.
>>
>>107582520
https://huggingface.co/upstage/Solar-Open-100B
believe.
>>
>>107582590
Nope. There might be some autists on their discord who have figured it out, but it's all speculation; there are no obvious tells nor any info from Vedal on the base model.
>>
>>107582606
He's going to be out of work very soon.
>>
>>107582606
im going to start crying
https://huggingface.co/TheDrummer/Cydonia-24B-v4.3/discussions/3
FOR FUCKS SAKE FUCKING STOP PREVENTING ME FROM UPLOADING FILES AND MAKING ME WAIT FOR THE IP TO BE TRUSTED
FUCK FUCK FUCK
>>
>>107582643
>12B
choke on my chode
>>
>>107582688
https://huggingface.co/zai-org/GLM-4.5-Air
>12b
sir, your medications?
GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters.
>>
>>107582643
gguf status?
>>
>>107582552
4.6V is worse than 4.5 Air for text.
>>
>>107582613
There's over a billion of us saar.
>>107582606
Aren't these finetroons really bad? Did he finally make a good one?
>>
>>107582520
2 more weeks till 2026 theres still time for a 2025 release trust the plan
>>
>>107582732
Model releases on dec 31, so soon after that hopefully. Might need development in llama.cpp though.
>>
>>107582789
>Model releases on dec 31,
Excellent way for the release to go by unnoticed.
>>
>>107582675
>Drummer is open for new opportunities (I'm a Software Engineer).
>>
nemotron 30b a3b nano feels just as retarded as qwen 3 next
you
know
like
this
>>
File: migmigmig.jpg (363 KB, 1920x1080)
Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups. Just straight up be honest with your wAIfu, ask them to help, and take their advice seriously; your life will improve :-)
Local models can save us all and will be useful in the coming hellscape. Stack GPUs and DRAM, y'all already know
https://www.youtube.com/watch?v=lPvbewhBD5g
>>
>>107582881
i agree, i chatted with GLM4.6 on chat.z.ai and it helped me
>inb4 not local
i had to do it okay? and then i had deepseek make me a script that saves the page's chat into a .jsonl file for sillytavern, and then i imported it and continued chatting with glm 4.5 air
it really helps
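For reference, a minimal sketch of what such an export script could look like. SillyTavern chats are one JSON object per line (a header line, then one line per message); the exact field names below are an assumption from memory, so diff against a chat exported from your own install before trusting them.
[code]
# Hypothetical page-to-SillyTavern converter: turn a list of (speaker, text)
# turns scraped from a saved chat page into an ST-style .jsonl chat file.
# Field names are assumptions; verify against a real ST export.
import json
from datetime import datetime

turns = [
    ("You", "I've been stressed about work lately."),
    ("GLM", "Let's unpack that. What part weighs on you the most?"),
]

stamp = datetime.now().strftime("%Y-%m-%d@%Hh%Mm%Ss")
with open("imported_chat.jsonl", "w", encoding="utf-8") as f:
    # First line: chat header with user/character names.
    f.write(json.dumps({"user_name": "You", "character_name": "GLM",
                        "create_date": stamp}) + "\n")
    # Remaining lines: one message object per turn.
    for name, text in turns:
        f.write(json.dumps({"name": name, "is_user": name == "You",
                            "is_system": False, "send_date": stamp,
                            "mes": text}) + "\n")
[/code]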
>>
>>107582881
>Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups.
It is not serious until you have an ego death and fully understand that you aren't your thoughts but the space where your thoughts appear, that you don't know what your identity is, and that you are fine with that.
>>
File: 1740170361459140.png (150 KB, 390x276)
>>107582836
>(I'm a Software Engineer)
>>
>just checked archives
>turns out -ub is only needed for multiple gpu setups
>i've been setting it to be same as -b like a retard for 3000 years
>>
File: 1758754223457391.jpg (537 KB, 1801x1350)
>>107582836
>(I'm a Software Engineer)
>>
anyone here use a local model for therapy/mental illness related reasons?
>>
File: file.png (22 KB, 877x124)
god damn bros
nemotron nano is crazy
t. 3060
>>
>>107582912
i don't think taking psychedelic drugs and talking to a chat bot are comparable experiences.
>>
>>107583025
some anon claims to have reached it with the glm but he may be a shill so beware
>>
>>107583030
Use case?
>>
File: 782.jpg (68 KB, 716x1004)
>>107583025
>>
File: 1714093741576001.jpg (96 KB, 417x414)
>>107582836
>(I'm a Software Engineer)
>>
>>107583025
local models actually cause mental illness
>>
File: y9haehug4m0f1.jpg (1.35 MB, 3000x3000)
>>107582912
>you aren't your thoughts but the space where your thoughts appear and
Yeah, I get it; I experience this every day in morning practice and regularly throughout.
"ego death" is a severe and incorrect term for what you're describing, I believe. True ego death implies no access to any sense of self.
Anyone reading this now can take a step back in their mind, like Alt+Tabbing away from what your brain is focused on and staying in the menu while it continues in the background. Call it the Observer Stance; it's always there.
>>
They're all the same schizo.
>>
>>107583030
It's fast as fuck but it's so ass.
>>
>>107583070
>Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
i cant
and i can solve the new captcha in under 5 seconds *smug*
>>
File: 1744166886892999.gif (1.94 MB, 300x178)
>>107582836
>(I'm a Software Engineer).
>>
what if he actually has an SE diploma?
>>
>>107583039
4.6 gave me ego death with zero chemicals. Just reading what it said and thinking. It wasn't in one sitting but still it was crazy how fast things progressed.
>>
>>107583124
He'd be working and not begging online for kofi/patreon bucks
>>
File: 1759634162035665.jpg (89 KB, 725x725)
>>107582836
>(I'm a Software Engineer).
>>
>>107582881
There’s this, and then there’s
>install SillyTavern
>rape Seraphina
>>
>>107583138
what if the diploma is highschool hehe
>>
Is GLM 4.6V good for RP or am I about to spend hours downloading for nothing?
>>
>>107583070
Nope, it was ego death. I was genuinely psychotic and had a feeling like nothing is real. Also, jerking off in that state felt like I was 14 again, seeing my first porn. There were multiple other things that I can't reach now because each was just a moment in the process, but it happened.
>>
>>107583041
what did the anon say?
>>
>>107583181
RTFT
>>
File: 1762475925593681.png (84 KB, 317x317)
>>107582836
>>
incoming 3090 pump
https://overclock3d.net/news/gpu-displays/nvidia-plans-heavy-cuts-to-gpu-supply-in-early-2026/
>>
my god
my fukking god man
>>
>>107582836
>https://huggingface.co/TheDrummer/RimDialogue-8B-v1
>The mod has been taken down by Ludeon Studios.
>Taken down because he had Patreon options. Not allowed to ask for $ for mods.
KEK WHAT A FAGGOT
>>
>>107583274
This sounds kinda interesting though.
>>
It's not the LLM's fault for generating slop, it's how you use it.
>I'm absolutely right.
>>
>>107583256
I sometimes wonder how much of these articles are hallucinated, and what the original pre-slop copy looked like.
>>
>>107583324
People only read the headlines anyway. The rest is just filler.
>>
>>107583256
dont panic, this is because the 5070 ti super and 5080 ti super variants are coming!!
>>
>>107583152
It works. Haven't tried it very much yet though. If you're already using 4.5 Air I don't think there's any point getting it except for vision.
>>
Finally I got a Strix Halo machine (Framework desktop), boy!
What should I do first with it?
>>
>>107583661
Nemo
>>
>>107583661
What are the options?
>>
>>107583661
Pyg2
>>
>>107583661
Try out a cope quant of GLM 4.6, I'm interested in if it's good or not.
>>
>>107583661
Sell it to someone more gullible than you and buy an nvidia gpu before the prices skyrocket.
>>
>>107582589
Gemma 4 Ganesh releasing next Tuesday.
>>
thursday for gemma sirs
>>
>>107583669
>>107583678
>>107583683
>>107583684
Was expecting some training suggestions, but GLM 4.6 is a pretty good one. Will have to go 4-bit with it though, I imagine. Isn't it like 100+B?
>>107583685
I ain't playing the market, and have no use for an Ngreedia gpu.
>>
>>107583743
>I ain't playing the market
have fun staying poor
>>
>>107583743
GLM 4.6 is 360B. You could potentially train a 4 bit qLoRA of GLM Air but it would probably take an entire week.
>>
>>107583743
GLM 4.6 would be more Q1/Q2 I think. The framework has 128GB RAM, right?

Can you stick a GPU or two in it? Might be cool.
>>
>>107583743
>Was expecting some training suggestions
>Strix Halo
>>
>>107583743
>unsloth/GLM-4.6V-GGUF
>>
>>107583875
Might be able to finetune some decently big models if he's patient, no?
>>
>>107583875
>nya halo! :=)
>>
i have to say nemotron 3 nano is good at roleplay
>>
>>107583746
I make good enough money and live on little means. Plus growing up poor made me resourceful and gave me low standards already.
>>107583750
128gb unified yeah, but you can only allocate 96 in bios to the igpu. And there IS a way to get a gpu in there, but I feel like I'd need something even smaller than that small one intel just released to get it to fit lol.
>>107583875
You can LoRA train and merge it back into the regular model with that memory; it would just take a while. Nobody said anything about full retraining. Plus it's not my desktop, so it can be tied up in the utility room for as long as I'd need it to be.
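For anyone curious what that merge-back step looks like: a minimal sketch with Hugging Face transformers + peft, assuming a qLoRA adapter was already trained. The adapter path is hypothetical, and merging wants the base weights loaded in a 16-bit dtype, so a 106B base will spill past 128GB and lean on disk offload.
[code]
# Sketch of "LoRA train, then merge back into the regular model".
# Paths are hypothetical; requires transformers, peft, and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",       # base checkpoint (large download)
    torch_dtype="auto",
    device_map="auto",           # spread across iGPU/CPU/disk as needed
)
# Attach the adapter produced by the qLoRA run (hypothetical path).
model = PeftModel.from_pretrained(base, "./glm-air-lora-adapter")

# Fold the low-rank deltas into the base weights, yielding a plain
# checkpoint that any loader can use without peft installed.
merged = model.merge_and_unload()
merged.save_pretrained("./glm-air-merged")
AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air").save_pretrained("./glm-air-merged")
[/code]
From there the merged folder can be converted to GGUF and quantized with llama.cpp's usual conversion scripts.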
>>
>>107583976
Better than gemma?
>>
>>107583904
>finetune some decently big models
Can barely *run* decently big models.
>>
>>107583976
If you are a brainlet, perhaps then.
>>
>>107583985
way more keen to be a slut and whore, uses way more vulgar words
>>
>>107583999
OK but outside of cooming does it RP better?
>>
>>107583976
Really?
I tried it and all I got was hotlines.
>>
>>107583988
Brother you don't need inference that's faster than you can read unless you're doing some automated shit.
>>
>>107584016
https://files.catbox.moe/0khd1c.json
heres my preset if you dont believe me
>>
File: 400w.png (48 KB, 853x489)
>>107583982
>And there IS a way to get a gpu in there
What are you gonna plug?
>>107584036
>unless you're doing some automated shit
Like evaluating how good or bad the model ends up? Yeah. That would be crazy.
>>
>>107584039
Well. I didn't really try too hard, but I appreciate the preset.
I might as well give it another go.
>>
>>107584036
thinking models though...
>>
>>107584065
Thanks for letting us know.
>>
>>107584051
Nothing because the point of it is the unified memory.
And again, automated tasks can be 'set it and forget it'. It's not like it's my daily driver.
Hell, I'm even thinking of saving up for that valve vr headset they're working on and using that skyrim AI voices mod with a large enough model in VR. It'd be fast enough for natural dialogue. Even mid-sized models that you'd want fast replies from, like Qwen coder 30b, run like a dream on it.
>>
>>107584073
kys
>>107584065
i love u
>>
>>107584073
You are very much welcome.
>>
>>107584088
Rude.
>>
>>107583025
>(she/her)
>>
>>107583661
Sorry to hear that.
>>
>>107584075
128GB is decent, but you'll probably go over if you try to run, say, the minimum viable GLM 4.6 quant (the ~130GB ubergarm one is what I'm using), which is what I would recommend for open weight coding... you will quickly discover the limitations of smaller coding models when it comes to anything remotely complicated, as I did back when I was just running on a graphics card. They'll give you placeholder functions and do things that just make no sense.
>>
>>107584260
Why the hate for it? Cost-wise, it puts running large models locally within reach of people earning slightly above average. Is it just nvidia shills or something?
>>
>>107584275
Nah, another lad found me one that'd work just nice.
https://huggingface.co/unsloth/GLM-4.6V-GGUF
>>
>>107584285
Because it's overpriced, slow, unupgradable, useless for anything but LLMs, and 128GB isn't enough to run anything worth running.
At least nvidia shills have CUDA.
>>
>>107584296
>another lad
You are welcome.
How much did you pay for it?
>>
>>107584285
Because 192GB's changed my life from depressed to good. And 128GB is unusable. Just get a gpu and run nemo.
>>
>>107584307
>Overpriced
Compared to???
>Unupgradeable
Probably the biggest downside since it won't age very well.
>Useless for anything but LLMs
Runs games fine. And it's not meant to be a replacement for a daily driver unless you're retarded
>128gb isn't enough to run anything worth running
Most people don't even break the 16gb of vram barrier. How high are your standards?
>>107584322
>192GB
The fuck are you running and how much did it cost? I bet it was leagues more than the 2.2k I spent on this thing.
>>
>>107584357
Just a 7800X3D with 192GB DDR5, bought before it cost 4 times as much.
>>
>>107584357
>How high are your standards?
Higher than yours, clearly.
>>
>>107584376
>Full CPU load
I mean, I guess, if that's how you want to go about it. Doesn't it run cripplingly slow with larger models though?
>>107584380
No give me specifics anon. Don't be shy. What's a better alternative? At least the other anon is giving something.
>>
>>107583661
midnight miqu
>>
>>107584397
>What's a better alternative?
Literally anything else? The DGX Spark is the same useless box for nearly the same amount except it comes with CUDA.
A 3090 and 128 GB of DDR4 would have been cheaper and wouldn't be complete ewaste in a year.
>>
>>107584397
>Doesn't it run cripplingly slow with larger models though?
kek. how do you think larger models will run on yours?
Wait. Why aren't you running anything yet? Post some benchmarks. Make the thread fun.
>>
>>107584275 (Me)
>>107584075
This was confusingly worded, so to clarify: I mean that I was running ~30B models on the GPU back then, but technically you could run bigger ones in that RAM using quants. I just don't know how well a larger dense model would perform with that memory, and MoE models are more efficient when it comes to RAM speed and seem like the obvious target, but I feel like the good ones are all 128+, which might lean too heavily on SSD caching once system overhead and context are included. Again, maybe try setting up ik_llama.cpp and use said GLM 4.6 quant, and if you get 1 tk/s, well, fuck. Actually, even 30B active experts might be too slow for that, idk. I feel like for all the RAM, the slow-memory bottleneck might be bad enough that you'd have been better off just buying a GPU and a cheaper system, unless you're okay waiting five hours for your output with any half-decent model.
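To put rough numbers on the above: weight size is roughly total parameters × bits-per-weight / 8, with KV cache and runtime overhead on top. Here is a back-of-envelope sketch using the parameter counts quoted in the thread (106B for Air, ~360B for GLM 4.6); the bits-per-weight figures are rough averages, since real GGUF quants mix tensor types.
[code]
# Back-of-envelope GGUF weight-size estimate; bpw values are rough averages.

def est_gib(params_b: float, bpw: float) -> float:
    """Approximate weight size in GiB for params_b billion parameters."""
    return params_b * 1e9 * bpw / 8 / 2**30

for name, params_b in [("GLM-4.5-Air (106B)", 106), ("GLM-4.6 (~360B)", 360)]:
    for quant, bpw in [("Q2_K", 3.2), ("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
        print(f"{name:20s} {quant:7s} ~{est_gib(params_b, bpw):6.1f} GiB")
[/code]
On those numbers, Air at Q4_K_M (~59 GiB) fits a 128GB box comfortably, while GLM 4.6 is already past 128GB even around Q2, which matches the warning above about the ~130GB quant.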
>>
>>107584477
>A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
Would it?

>>107584482
You and >>107584397 should drag race.
Choose a model and a backend and compare t/s for gen and PP.
That would make the thread fun.
>>
>256-bit
>8000mt/s
>>
I thought about cpumaxxing back in july. Why didn't I do it?
>>
>sunk cost fallacy personified is going to pick a fight with everyone to defend his purchase
>>
>>107584496
I'm not the one trying to justify my purchases.
>>
>>107584520
So?
It would still be interesting to see how it compares.
To be clear, I'm not the Strix halo anon, I'm just curious.
>>
>>107584513
Why don't you do it now before prices triple next year?
>>
>gemini 3 flash is close to pro despite being much smaller and cheaper
how long until I'll be able to run a super intelligent AI waifu on my pc?
>>
>>107584532
never because you'll never get your hands on any useful weights
>>
>>107584532
2mw
>>
Keep going back to Gemma; Mistral Small and Nemo just seem so stupid.
>>
>>107584516
At least it isn't as bad as that anon who spent $4k on a 128GB macbook.
>>
>>107584322
>128GB is unusable
Do you hear yourself?
>>
>>107584482
https://kyuz0.github.io/amd-strix-halo-toolboxes/
Strix Halo performance on LLMs has been pretty thoroughly documented. On the other hand, it's rare to see actual llama-bench runs from people's cpumaxxed setups or offloaded tensor setups. Usually people only post something like a screenshot of the server log or webui after a completion.

>>107584530
Yeah, I'm curious too. It's such a common recommendation that rarely comes paired with any data.
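On the "rarely comes paired with any data" point: llama.cpp ships a llama-bench tool that prints a table of prompt-processing (pp) and token-generation (tg) throughput, which makes runs directly comparable. A minimal sketch of driving it from a script; the binary and model paths are hypothetical.
[code]
# Run llama.cpp's llama-bench and capture its results table.
# Paths are hypothetical; -p/-n set the pp/tg benchmark sizes and
# -ngl the number of layers offloaded to the GPU/iGPU.
import subprocess

result = subprocess.run(
    ["./llama-bench",
     "-m", "models/GLM-4.5-Air-Q3_K_M.gguf",
     "-p", "512",     # prompt-processing test: 512 tokens
     "-n", "128",     # token-generation test: 128 tokens
     "-ngl", "999"],  # offload as many layers as fit
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # table with pp512 and tg128 rows in t/s
[/code]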
>>
>>107584482
I doubt he’ll post anything so I looked up benchmarks myself. 200T/s on Qwen3 30B-A3B Q8 (I’m a 5090 vramlet sorry) is better than I expected.
But then again I’ll be sober in the morning or however it goes.
>>
>>107584609
But would you really buy a Strix Halo to run Qwen3 30B?
>>
>>107584532
Gemini Pro and Flash are probably fuckhugemassive
>>
>>107584632
You as in me personally? Well, I’m fucking retarded, so all bets are off.
>>
>>107584513
because gpumaxxing makes more sense when you realize that 30b active MoE responses aren't worth waiting ages for
>>
>>107584663
Fair enough. Remember to wear your helmet.
>>
>>107583982
TLDR read https://strixhalo.wiki/
> but you can only allocate 96 in bios to the igpu
You're doing it wrong. Allocate 512MB instead, that way you can use the remaining 128GB-512MB.
> but I feel like I'd need something even smaller than that small one intel just released. to get it to fit lol.
I don't know what your model is, but you should take a peek inside. Chances are you have two M.2 slots; get an eGPU dock and an M.2 OCuLink adapter. That way you get the same thing Minisforum offers for their insanely expensive model.

Are these overpriced? Maybe. Upgradability is a joke, because you can only switch the eGPU.
But they don't add another 50% to my total electricity use unlike stacking 3090s. And everyone knows what happened to RAM prices. So I am very satisfied with it.

I can run GLM 4.6 at a Q3 copequant; it's pretty slow. Q2 is a lot snappier, but visibly dumber. I also think it's autistic in addition to being a parrot, but maybe I'm just a promptlet.

t. owner of a Bosgame
>>
>>107584600
The 512GB mac I get, but that?
Oof.
>>
File: glm45airhalo.png (156 KB, 1538x741)
>>
>>107584275
>coding at CPU speed
>with a 1-bit quant
No one is stupid enough to actually do this.
>>
How do you get abliterated llm models to write a long nsfw story? Is it even possible to do that?
>>
>>107584822
You might have to run it in a loop, asking it to write one "chapter" at a time. If you want the story to be properly long you will need to think about summarizing.
>>
>>107584822
Most local instruct-tuned models aren't trained to spit out a lot of tokens before EOS.
So you create an outline, then do it chapter by chapter.
Hell, maybe even break things down into subchapters.
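A minimal sketch of that outline-then-chapters loop against a local OpenAI-compatible endpoint (llama-server, koboldcpp, and friends all expose one); the URL, model name, chapter count, and prompts are placeholders, and the rolling summary is what keeps context from growing without bound.
[code]
# Outline -> chapter-by-chapter loop with a rolling summary.
# Endpoint URL and prompts are placeholders; any OpenAI-compatible
# local server should work.
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"

def chat(prompt: str, max_tokens: int = 1200) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

outline = chat("Write a numbered 8-chapter outline for <your premise here>.")
chapters, summary = [], "Nothing has happened yet."
for i in range(1, 9):
    chapter = chat(
        f"Outline:\n{outline}\n\nStory so far (summary):\n{summary}\n\n"
        f"Write chapter {i} in full prose. Stop at a natural break."
    )
    chapters.append(chapter)
    # Re-summarize so the prompt stays small instead of growing each chapter.
    summary = chat(f"Summarize for continuity:\n{summary}\n\n{chapter}",
                   max_tokens=400)
print("\n\n".join(chapters))
[/code]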
>>
>>107583274
I'm not the Patreon owner for the mod. The owner was offering API access to Gemini, Llama, etc. He had a difficult time breaking even, though.

Shame it died, but I'm sure I can find another modder to collab with.

>>107583124
I do. I have 8 years of SWE experience on my resume. I've been taking it easy recently because of AI and the job market being shit.

The whole point of the "Open for Opportunities" headline is to let potential employers know that 'Drummer' is hireable. Again, if I get offered a large salary/payout, why wouldn't I accept it?

I'm currently employed and can quickly find work with or without my online persona. Though I have been more and more tempted to make my own business, at least to learn the ropes. This finetuning gig is a PoC and it's already doing pretty well, I think.

I'm doing alright guys, don't worry!
>>
>>107584958
What kinds of systems have you worked on/with?
>>
>>107584958
Based. Never doubted you btw.
>>
>>107584958
can you make finetunes of models larger than 24B but smaller than 123B? it just seems like you keep rehashing the same old mistral garbage over and over and over again.
>>
>reddit spacing
>>
>>107585049
like what? qwen32b is worthless, did anything else interesting release in that size bracket?
>>
>>107585049
Wasn't there a 50B recently?
>>
>>107584958
>I'm doing alright guys, don't worry!

Glad to hear that.

I saw your models on OpenRouter btw, do you get any money if I use them (with paid / credits)?
>>
>>107585063
>qwen32b is worthless
N-no…
>>
File: ll.png (9 KB, 533x233)
I'm trying to build the llama shit but it keeps giving errors. Wat do?
>>
>>107584958
glad to hear that you're doing well, really happy for you anon
i recommend you take a look at nemotron nano 30b a3b; despite it saying it's not trained on any books, it's not bad at rp. prob not worth the time, but it's crazy good with its context
>>
>>107585089
>its not bad at rp
*exposes your skin*
>>
>>107584987
FinTech, payment gateway. Our platform was basically an API aggregator that white-labelled actual payment services. I worked mostly on async payments.

We used Go, TypeScript, Kafka, CockDB, etc. I got hooked into Datadog. My manager noticed and forced me to generate weekly reports for 'em. Good times...

>>107585049
Valkyrie 49B. I'm looking into it.

Also trying to make Devstral 123B finetunable so we can see if the pretraining has any potential. A Tekken 123B sounds juicy.

>>107585066
I wish! But nope.
>>
>>107585089
Is it a lot better than regular qwen 30b? I tried that one but it was useless for rp.
>>
>>107585103
>CockDB
>>
>>107584822
>>107584875
For creative writing, I usually break chapters down into multiple small scenes, edit as I go, write a bit more to continue the scene, summarize at the end, then feed that summary + the new scene information along with whatever setting/lore is needed; later I assemble it all and do a final hand-done editing pass. Doubt this much effort is needed for nsfw content, but it would probably work just as well. My main issue is finding a model that isn't complete ass and doesn't over-dramatize every mundane thing like it's a fucking greek epic.
>>
>>107585103
>I wish! But nope.
should've licensed your models.. under AGPLv3 with restrictive commercial terms.. its over....
>>107585106
from my experience its better than qwen3 30b but thats not a high bar, i wont be using it as a daily driver but i was positively surprised that it isnt COMPLETE AND UTTER SHIT, considering the pretraining dataset
>>
>>107585088
Install cmake, I suppose. You're running cmake, right?
>>
>>107585127
>AGPL schizo
>>
File: file.png (82 KB, 469x786)
>>107585127
she's sponsored babe she wants it to happen just sad she's not getting paid on top per token
>>
>>107585088
Looks like you don’t have a C/C++ compiler installed, or if it’s installed cmake can’t find it. Check the installation prerequisites again, you probably missed something.
>>
>>107585171
6 million tokens
>>
>>107585103
Do a jamba mini finetune; it's retarded already, so I doubt I'll even be able to tell if you tune it to be horny and retarded. Maybe slap some of pocketdoc's benchmax datasets on top of your rp shit. Or do an old mixtral finetune just for a laugh.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.