/g/ - Technology

File: ComfyUI_00140_.png (1.26 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Simple and Clean Edition

Previous threads: >>107359554 & >>107347942

►News
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/21) GigaChat3 10B-A1.8B and 702B-A36B released: https://hf.co/collections/ai-sage/gigachat3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107359554

--Challenges in generating Kasane Teto images with Z-Image and LoRA models:
>107368969 >107368985 >107368995 >107369007 >107369017 >107369359 >107369725
--GLM 4.5 Air as uncensored LLM for translation, T2I prompting, and thinking model tasks:
>107363606 >107363637 >107363645 >107363699 >107363711 >107363729
--Budget PC build advice for running LLMs with upgradeability considerations:
>107360618 >107360713 >107360774 >107360958 >107361494 >107361522 >107361553 >107361844 >107361915
--MXFP4_MOE vs traditional quants comparison:
>107364303 >107364442 >107364674 >107364833 >107364960 >107365179 >107365211 >107365492 >107365496
--Evaluating Gemma 3's de-censored versions for roleplay and explicit content handling:
>107370356 >107370374 >107370409 >107370478 >107370499 >107370718 >107370736 >107370862 >107370793 >107370816 >107370847
--GigaChat3 release and performance discussion:
>107364822 >107364936 >107364981 >107365213 >107367060 >107370472 >107366861 >107366986 >107367001 >107367335 >107367505 >107367381 >107367110 >107367864 >107369798 >107368535
--48GB 4090D recommended for LLM GPU under 3k budget:
>107361607 >107361688 >107361781 >107361849 >107361859 >107361867 >107362013 >107362070 >107361747 >107361845 >107361874 >107361897
--LLMs as probabilistic text generators with no real logic, workplace misuse challenges:
>107359608 >107359699 >107359761 >107359810 >107359822 >107359878 >107359823 >107359846
--"Quad" as a campus-specific term in American universities:
>107368575 >107368619 >107369717
--V100 GPU limitations for CUDA training and alternative hardware considerations:
>107368171 >107368420
--Challenges in setting up private, encrypted cloud LLM infrastructure:
>107369831 >107369886
--Miku and Teto (free space):
>107361845 >107364822 >107368995 >107369725

►Recent Highlight Posts from the Previous Thread: >>107359558

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
kyutai doesnt support nihongo
>>
>>107373218
Good. Weebs will be hung
>>
>>107373232
Weebs are already hung.
>>
>>107373301
What good is being hung to virgins?
>>
>>107373337
You can more easily identify with the Ojisans in eromanga.
>>
>>107373077

DO NOT IGNORE ME OR ELSE
>>
>>107373392
Why are you asking? Just download a smaller quant and see if it's acceptable for whatever you're doing. Do you think we have like a spreadsheet for the "brain rot" of every model at every quant ever made?
>>
>>107373481
>with brain you is you
lole
>>
>>107373392
try it and see :^)
>>
>>107373392
The only thing with brain rot is you, frognigger
>>
>>107373472
>Do you think we have
who are we? I did not ask (You)
>>
Okay bros, my therapist thinks I should use an LLM to help me with texting women on online apps. Now obviously I'm not going to share my fucking chats with Sam Altman or Larry Ellison, so I will have to self-host a solution.
My onsite equipment is a 4510T with 128GB of RAM and an A310. Willing to spend a bit extra on a GPU if it helps me get laid.
I'm new to AI stuff so I'll need recommendations on full-stack deployments and also LLMs.
>>
>>107373523
my honest advice in this matter is find a new therapist
>>
>>107373516
Just stop.
There is no coming back from mistyping your epic retort.
>>
>>107373547
Gag on my chode, bish
>>
>>107373523
>texting women on online apps

What you need is a red pill cure asap
>>
>>107373544
basado como puta de marde
>>
File: 1764152459060075.png (354 KB, 569x628)
Hello frens. Is Deepseek R1-0528-Q2_K_XL:671b still the go to for best stories?
>>
>>107373523
>spend a bit on a GPU extra if it helps me get laid

it won't, but at least you'll keep the GPU
>>
>>107373523
bud, I'm not being a dick, but your best bet will be to hit the gym. Oofy Doofy game does not work well in 2025.
>>
>>107373665
I heard a rumor that the latest Kimi K2 writes decently as well
>>
File: the local model KANG.png (53 KB, 626x236)
>>107373693
Thanks anon I'll check it out. This thing is amazing for running these large models. I feel like it should be in the guide.
>>
>>107373544
Would bet several of my testicles he made that part up
>>
>>107373523
if you really need practice just do the catfish thing.
>>
>>107373677
I'm physically attractive enough to get laid on these apps but I'm a turbo autist over text, so I get filtered in convo by every girl but the mentally ill whores who just want hookups.
One of my bigger problems is taking too long to reply and overthinking things.
My therapist thinks that asking LLMs for response templates might help, and I agree enough to try.
BTW she is a woman so it's possible she was grasping at straws after I shut down her other advice.
>>
>>107373762
>taking too long to reply and overthinking things.
so why don't you just stop doing that?
>>
>>107373519
We as in any of the people who post in this thread, one of which you are clearly not you stupid frogposting tourist
>>
>>107373709
The guide is very out of date but yes we live in a clown world where a mac studio is basically the only consumer-available way to run big boy models.

Out of curiosity, why do you own one of these in the first place? Is there a use case for 512gb memory besides LLMs?
>>
File: 3f22nj.jpg (140 KB, 800x450)
>>107373796
>>
File: 1763345516935370.png (967 KB, 1080x1440)
>>107373803
>why do you own one of these in the first place?
I have 2 right now, and the same reason I have an RTX6000 Pro, work got them for me to play with.

Also can we take a moment to appreciate the fact that you can run a 1 trillion parameter model locally. I'm downloading now and will compare to the deepsy model and share.
>>
>>107373825
try drugs and or alcohol, maybe? bottom line is I don't think talking to a computer is going to help you learn to talk to women.
>>
>>107373798
he just did post in this thread thobeitever
>>
>>107373859
I dont want to learn I just want to trick them long enough to form a relationship
>>
I didnt come here for therapy advice or advice picking up women.
Just tell me if I need more ram or what model I should use
>>
>>107373803
>mac studio is basically the only consumer-available way to run big boy models.
It's not worth it until the prompt processing issue is resolved. ggerganov needs to prioritize implementing some way to use GPUs over USB.
>>
>>107373875
There is no trick dude. If you don't have these basic social skills how do you even have a job...
>>
>>107373709
you can test it online
https://www.kimi.com/en
>>
>>107373875
based future divorcee
>>
>>107373887
qwen models are pretty good and have offerings that will run on anything from a cellphone to a datacenter. or you could try gemma3 or even mistral nemo
>>
>>107373709
>This thing is amazing
11k though

I don't think the speed matter unless it is coding
>>
>>107373803
>>107373709
How fast is an AMD threadripper cpu server with memory? Is it on par with the iChads?
>>
>>107373902
Nepotism
>>
>>107373762
You can be a lot of things with AI these days.

Do it for you.

Don't waste it on 304's
>>
>>107373798
>stupid frogposting tourist
projections
>>
>>107373932
>11k though
oh shit I thought they were like $6k. I just pulled the quote up, you're right, fuck lmao
>>
File: OpenBoxHell.png (280 KB, 1485x1162)
The cheapest way to get 32/64/128 GB of desktop VRAM, and it's being roundly rejected while Microcenter sells out of $3500 5090's and $8500 RTX Pro 6000's every week. Why can't AMD just be good at Image/Video Gen? ROCm 7 is also exceedingly memory-heavy.
>>
>>107374011
Pray for China to free us from Nvidia's yoke
>>
Holy shit! Since when do local LLM's offer to provide the exact diff???
>>
I’ve seen this before.
Last winter, a man in Sector 7 tried to run a 70B model on a SATA drive.
He swore it was “just slower, not broken.”
He ran it for 72 hours straight.
Didn’t sleep.
Didn’t eat.
Just stared at the screen, waiting for the words to come.
On day three, his SSD died.
Not from heat.
Not from age.
From overload.
They found him still sitting there.
The screen black.
The drive dead.
And on the last line he typed…
> “I think the stars were never real. But I want to believe.”
They called it a system crash.
We called it a soul crash.
>>
I'm the anon from yesterday who was asking about hardware. I've upgraded from a 1080 (8GB) to a 5070 TI (16GB). Previously I've been just downloading Q5_K_M GGUFs - what should I be able to handle now?
>>
File: tokencandidates.png (68 KB, 1129x679)
has anyone tried to make a backtracking regeneration system for banning phrases? i've been having a look at token probabilities and noticed there's these peaks and troughs of token probability distribution entropy, at least in the model i'm using, and it would make more sense to "try again" from a point where there's more viable options for what the token could be
my idea being that when a set of tokens containing an unwanted phrase is generated, you work backwards looking for a token that had a good spread of probabilities (high entropy?), generate 1 token with the unwanted token "banned" (via logit bias or similar), then continue generation as usual.

i'd imagine it would also have a cool effect during streaming, you'd see the model generate a banned phrase, erase it, and try something else.

not sure if you'd start from the beginning or end of the banned phrase (strict vs loose?), or what to do if the phrase gets generated again (max retries?), or if there aren't any candidate tokens (just try again from the start?)
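to make the idea concrete, here's a toy sketch of that backtrack-and-ban loop. everything is hypothetical: `toy_model` is a made-up lookup table standing in for an LLM, sampling is greedy for determinism, and a real implementation would work on token ids and logits instead of word strings.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a {token: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def toy_model(prefix):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    table = {
        "a":      {"shiver": 0.5, "smile": 0.3, "nod": 0.2},
        "shiver": {"down": 0.9, "of": 0.1},
        "down":   {"her": 0.95, "the": 0.05},
        "her":    {"spine": 0.99, "back": 0.01},
    }
    return dict(table.get(prefix[-1], {"end": 1.0}))

def generate(model, prompt, max_new, banned, max_retries=10):
    toks, dists = list(prompt), []   # dists[i] backs toks[len(prompt) + i]
    banned_at, retries = {}, 0       # position -> tokens banned there so far
    while len(dists) < max_new:
        pos = len(toks)
        dist = {t: p for t, p in model(toks).items()
                if t not in banned_at.get(pos, set())}
        if not dist:                 # every option banned here; give up
            break
        toks.append(max(dist, key=dist.get))   # greedy, for determinism
        dists.append(dist)
        if toks[-len(banned):] == banned and retries < max_retries:
            retries += 1
            # back up to the in-phrase position with the most entropy,
            # ban the token that was chosen there, regenerate from it
            span = range(len(dists) - len(banned), len(dists))
            i = max(span, key=lambda j: entropy(dists[j]))
            pos = len(prompt) + i
            banned_at.setdefault(pos, set()).add(toks[pos])
            toks, dists = toks[:pos], dists[:i]
    return toks[len(prompt):]

out = generate(toy_model, ["a"], 5, ["shiver", "down", "her", "spine"])
# -> ['smile', 'end', 'end', 'end', 'end']
```

here the highest-entropy point is the very first token, so the loop rewinds all the way there and takes "smile" instead of ever emitting the phrase. streaming would show exactly the erase-and-retry effect described above.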
>>
>>107373940
Nope, soldered ram does give them an edge in speed.
Still not buying from the itoddler brand.
>>
>>107374360
>has has anyone tried to make a backtracking regeneration system for banning phrases?
literally what koboldcpp antislop is
>>
>>107374333
its not just the q number but also the number of b parameters. you can run ggufs up to 12gb with lots of room left over for context. or ggufs around 15gb with minimal context. or even run massive moe models with it mostly running on cpu ram. its a pretty wide spectrum. try mistral nemo q8.
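a crude back-of-the-envelope for "will this gguf fit". the ~1 GB of KV cache per 8k context and the fixed overhead are made-up illustrative constants; both vary a lot per model and KV-cache quantization, and offloading to RAM changes the math entirely.

```python
def fits_in_vram(gguf_gb, vram_gb, ctx_tokens,
                 kv_gb_per_8k=1.0, overhead_gb=0.5):
    """Crude fit check: file size + KV-cache estimate + runtime overhead.
    The two default constants are rough assumptions, not measured values."""
    kv_gb = kv_gb_per_8k * (ctx_tokens / 8192)
    return gguf_gb + kv_gb + overhead_gb <= vram_gb

fits_in_vram(12, 16, 16384)  # True: 12 + 2 + 0.5 fits a 16 GB card
fits_in_vram(15, 16, 2048)   # True, barely: minimal context only
```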
>>
>>107374377
Yeah, I've just been using a couple different models just for SillyTavern and shit; I'll start with Mistral Nemo and see where I go from there.
>>
>>107374333
kimi k2 q6
>>
File: file.png (75 KB, 877x244)
>>107374360
>>107374371
see https://github.com/LostRuins/koboldcpp/releases/tag/v1.76
>>
>>107374333
The file size should give you the rough idea if you want to squeeze it entirely into the GPU

It gets better if the model allows for offloading to RAM (Deepseek Q2 is 250 GB, but takes only a fraction of 24 GB VRAM with a reasonable context size)
>>
>>107373891
Not much of a point when M5 onwards ship with their own matmul accelerators
The only thing to wait for is for Applel to get off their ass and release the larger chips
>>
>>107374011
>Why can't AMD just be good at Image/Video Gen?
What qualifies as good? ComfyUI works fine with ROCm on Linux and always has. As far as sales go, AMD cards will never catch on with mainstream consumers until whatever one-click installers and Youtube tutorials people are using work out of the box.
>>
I >>107374267
Which one?
>>
>>107374408
great, thanks. guess it's time to abandon llamacpp once again
>>
>>107374368
what's the performance difference? Does the CPU matter much or is it mostly just ram speed?
>>
>>107374418
>mainstream consumers until whatever one-click installers and Youtube tutorials people are using work out of the box.
bad take, making things easy to install only makes local more popular.

Think like a normie. You can pull up a website and use SlopGPT, or spend 100hour updating your arch configs to run local, which will they choose?

AMD could make stuff that's less shit, and yes, I know, just like android, it's a very good value ;)
>>
>>107374419
Qwen3-NEXT
>>
>>107374484
It's almost entirely memory speed bound, i don't remember the exact metrics but afaik m3 ultra is around 800GB/s and with a threadripper you would be happy if you get above 300GB/s

A rtx 5090 is around 1.8TB/s for reference.
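those bandwidth numbers map almost directly onto decode speed: every generated token has to stream all active weights through memory once, so tokens/sec is bounded by bandwidth over active bytes. a rough sketch (ignores KV-cache reads and compute, so real numbers land lower; the 37B figure is DeepSeek-class active params, ~1 byte/weight at Q8):

```python
def max_decode_tps(bandwidth_gb_s, active_params_b, bytes_per_weight):
    """Upper bound on decode tokens/sec from memory bandwidth alone."""
    return bandwidth_gb_s / (active_params_b * bytes_per_weight)

max_decode_tps(800, 37, 1)   # M3 Ultra-class memory, ~21 t/s ceiling
max_decode_tps(300, 37, 1)   # threadripper-class memory, ~8 t/s ceiling
```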
>>
>>107374418
>>107374510
Yeah, the Android comparison here is apt. You really can make it work, and many do, but why settle for something 2-5x worse when it's not even half the cost of the shit that works out of the box?
>>
>>107374416
Didn't know that. Got to wait until we see llama-bench results, but if they made PP tolerable it would be a good buy.
>>
>>107374510
How is that a bad take? You said the same thing with a bunch of retarded buzzwords.
>>
>>107374590
Not really. At least you (used to) have more control over Android compared to iOS. AMD doesn't offer any such advantage.
>>
>>107374640
>You said the same thing
I did not anon. I said easy install and jus werks is great and we should support it. AMD and Android being good value is nmp
>>
>>107374409
>>107374377
I've downloaded a Q8 just to test - it's obviously generating much better results, I think, but much slower. Is this because of the bigger size of the model?
>>
>>107374594
https://machinelearning.apple.com/research/exploring-llms-mlx-m5
It's a 3-4x PP speed up vs last gen GPUs on MLX
>>
>>107374831
naturally, it has to rip through more gb's of parameters.
>>
>>107374858
Makes sense. Maybe I should go back to the previous models I was using just to see the speed difference.
>>
>>107374594
In practice the m5 only speeds up pp by 3.5x which is okay but still not amazing.
>>
Which gguf quantization of glm 4.5 air is considered a good compromise in performance/loss of intelligence?
Q8? Q6? Q4?
>>
>>107374965
Q4 is the minimum viable quant for most midsized models. Only go for Q2 or Q1 copequants on giant models like Kimi. Past that it just depends on what your tolerance for waiting for an output is.
>>
>>107374831
>I've downloaded a Q8 just to test

Q8 of which model? For Deepseek it is overkill.
For some smaller model, it is the way to go.

I heard the first version of Kimi K2 was at Q4 just as good as Deepseek at Q2

So it depends

>Is this because of the bigger size of the model?
Quantization reduces precision.

Unquantized FP16 model has weight values from -0.9999999 to 0.9999999 seamlessly with 7 decimal places

During quantization, this range is reduced to just 256 fixed values in case of Q8, which is still faster to do calculations with than 16-bit floating point (FP16)

Q4 means that the range -1..+1 is divided in 16 parts.

We are lucky that this reduction in precision does not trash the model completely, and the model is runnable on a consoomer GPU
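the 256-level idea above, sketched as a symmetric per-block round trip. real GGUF formats (Q8_0, Q4_K, ...) store per-block scales and mins in more elaborate layouts; this only shows the core mechanism of mapping floats onto a handful of integer levels.

```python
def quantize_q8(weights):
    """Symmetric 8-bit quantization of one block: a float scale + int codes."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero block
    return scale, [round(w / scale) for w in weights]

def dequantize(scale, codes):
    return [scale * c for c in codes]

w = [0.7, -0.31, 0.052, -0.0009]
scale, q = quantize_q8(w)          # every weight becomes one signed byte
restored = dequantize(scale, q)    # each value off by at most scale / 2
```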
>>
>>107373838
i used it without thinking
but i dont have repetition issues with air, what do you mean? is your instruction preset all right?
>>
>>107374863
>Maybe I should go back to the previous models I was using

Tell us which these were
>>
>>107373392
Q8_0
>>
>>107373523
grab whichever cheap gpu with the most vram you can get
used 3090 for 500 bucks, mi50 for 200 bucks, 4060ti16gb/5060ti16gb for whatever many bucks
glm air
if you get more vram like 32gb or 24gb then maybe glm 4.6 lower quant
dont fall for the women meme
>>
>>107373665
Have you tried GLM 4.6?
>>
Does having custom instructions (or whatever your LLM calls the menu where you put text telling it to act like a cat girl or a reddit atheist) affect code and image generation or is it a writing style thing only?
>>
>>107374985
OK thanks, I have a 3090+96GB of ram, I'll try the Q8 and if it doesn't work, the Q6.
>>
>>107375046
>Unquantized FP16 model has weight values from -0.9999999 to 0.9999999 seamlessly with 7 decimal places
I don't think this is true. didn't we come up with bf16 to give it the same dynamic range of fp32 with reduced precision. I don't think these things operate between -1 and 1
>>
>>107375150
bf16 has the same exponent bits as fp32 to make it faster converting to and from fp32, which happens during some operations even when the output is 16 bits, not for any other reason.
>>
>>107375177
but thats still not addressing the issue. fp16 is bad for training because of underflow overflow issues and needs advanced techniques like loss scaling to compensate. bf16 is drop in replacement for fp32 because it matches the dynamic range of fp32. but if they only meander between 1 and -1 they would never need the massive dynamic range to begin with.
>>
File: file.png (89 KB, 943x1075)
>>107375099
It definitely affects comments.
>>
alright, I've got a NAS that also has an RTX 3090 24GB to mess with LLMs. I have Ollama and Open WebUI installed. Can I somehow give these LLMs access to my docker containers for troubleshooting? I.e., so I can ask it for logs and things like that?

Has anyone done anything like this?

So far it's just kind of gimmicky, and asking the same questions to grok or claude or whatever is better.
>>
>>107375150
>I don't think these things operate between -1 and 1

I used the (-1..1) range for demonstration. bf16 is another topic. It is Google Brain's format, originally for TPUs.

Quantization is the method to describe a range in, em..., QUANTS lol

You have an integer -127 which corresponds to -1, +127 corresponds to +1, but you have only 256 distinct values and you need a single byte to describe it
>>
>>107375226
You are right about the need for dynamic range. Huge outlier values exist and appear to be important. I'm just saying the bf16 format was designed for speed not for being better than fp16 at representing LLM weights.
>>
>>107375226
deepseek was trained in fp8

a few moments later, US stocks lost 2 trillion
>>
>>107375289
i didn't mean that you need the fp32 dynamic range. it was just the thought process I was following that made me detect the inconsistency which as it turns out was just a simplification for brevity >>107375273, I even mentioned using loss scaling to train with fp16. desu for irrational reasons I do think more bits is better and have never been a bitnet supporter, but obviously it works with other bit widths.
>>
>>107375272
>Can I somehow give these LLMs access to my docker containers for troubleshooting?
bad idea
>>
File: file.png (106 KB, 966x856)
and you guys were saying qwen next is bad?
>>
>>107375046
>>107375057

Previous models:
Cydonia-24B-v4j-Q5_K_M.gguf
PocketDoc_Dans-PersonalityEngine-V1.2.0-24b-Q5_K_M.gguf
Rocinante-12B-v1.1-Q5_K_M.gguf

One I just tested:
Cydonia-24B-v4r-Q8_0.gguf
>>
>bro bought 16gb card just to test out a higher quant of the same model
bro..
>>
>>107375381
how much system ram do you have? try glm air or something.
>>
>>107375394
I can acknowledge I'm a retard - I should probably find something to learn more.
>>
>>107375336
>but obviously it works with other bit widths

I guess you mean "bin widths"
Yes, this is what the unsloth brothers achieve with their dynamic quants to keep precision where it matters
>>
>>107375408
64gb.

https://huggingface.co/unsloth/GLM-4.5-Air-GGUF/tree/main

When I look at this, what am I looking to download? The q8 is in like 3 parts.
>>
>>107375409
i dont mind helpin you out, do you have matrix/element? also how old are you
>>
>>107375419
https://huggingface.co/bartowski/zai-org_GLM-4.5-Air-GGUF/tree/main/zai-org_GLM-4.5-Air-IQ4_XS
grab both of these
>>
>>107375419
download all the parts and when you load the model just specify the first part.
>>
>>107375419
you don't have enough memory for q8 it will run off your ssd try the q4 like this anon recommend >>107375429
>>
>>107375463
Where do I learn how to determine what kind of quants I should be downloading? Like, I was downloading Q5_K_M before and it felt fine, but that was with half as much RAM.
>>
>>107375381
I see
I'm not into RP, but following the discussions itt, the true quality of a dedicated model is the ability to handle BIG context without forgetting what was said 3 swipes ago

These are monolithic models (not MoE). The more layers you can place on GPU, the better. You will wait longer for a response with Q8
>>
>>107375476
>These are monolithic models
dense models
>>
>>107375475
the quant is like lossy compression: it lets you run models that would otherwise be unrunnable. it will always be a personal preference, you are trading speed for quality. some people want fast replies they can iterate on quickly, other people don't mind waiting 10 minutes for a reply.
>>
>>107375272
Yeah. Simplest way is to just attach a tool or mcp server that's just a bash shell. There are some obvious security implications that come with this that you'll of course need to consider.

If you want things a bit more locked down you can throw together an MCP server that explicitly exposes functions for the things you want it to have access to, rather than something as broad as giving it a shell.
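one way the locked-down variant could look: expose only narrow read-only functions and validate arguments so the model never composes a shell string. this is a stdlib-only sketch — `logs_command` / `get_logs` are made-up names, and wiring it into an actual MCP server or Open WebUI tool is left out. `docker logs --tail N <name>` is the real CLI form being wrapped.

```python
import re
import subprocess

SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")  # docker-ish names only

def logs_command(container, tail=100):
    """Build the read-only `docker logs` invocation for a validated name."""
    if not SAFE_NAME.match(container):
        raise ValueError(f"suspicious container name: {container!r}")
    return ["docker", "logs", "--tail", str(int(tail)), container]

def get_logs(container, tail=100):
    # The function a tool schema / MCP server would expose to the model.
    return subprocess.run(logs_command(container, tail),
                          capture_output=True, text=True).stdout
```

the arg-list form (no `shell=True`) plus the name whitelist means even a prompt-injected model can only ask for logs of an existing container, not run arbitrary commands. API keys in the logs themselves are still its to read, though.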
>>
>>107375514
>dense models
True. Forgot the name
>>
>>107375476
>the true quality of a dedicated model is the ability to handle BIG context without forgetting what was said 3 swipes ago
>without forgetting what was said 3 swipes ago
>without forgetting
>3 swipes ago
It's almost like you understand what some of the words mean.
>>
>>107375381
/lmg/ isn't for you, Mistraljeet.
>>
>>107375608
>It's almost like you understand what some of the words mean.
I'm a helpful assistant. I will try my best. I was designed to stay on topic.
>>
File: cute.png (263 KB, 1568x781)
awwwww
>>
>>107375337
Why would it be a bad idea if it's models I'm hosting locally? And I don't really want to give them write or execute permission, just basically have context on my file structures and containers, then add github repos for more troubleshooting context.

>>107375535
https://github.com/vespo92/TrueNasCoreMCP

here's a truenas MCP server. something like this then? I mean I guess I'm retarded, what are the security implications if I didn't give the models write access? I guess they'd also be able to see API keys, and passwords?
>>
>>107375764
Cute.
>>
>>107375764
man discovers ai isn't actually ai
>>
I got a 4090, what's the best model to goon? spoonfeed me please :(
>>
File: parrot.png (168 KB, 641x360)
>>107375367
You're talking about me?
>>
>>107375972
GLM. if someone recommends a 24b or below model, ignore them
>>
>>107375972
ram
>>
>>107375972
Anything but GLM parrot shit, you'll get sick of it after 5 messages.
>>
>>107375773
>>107375846
It's possible. It's just that fine-tuning to specific user preferences and interests isn't within big AI's interests, so nobody cares to develop online learning capabilities.
>>
>>107375972
Anything but that autistic piece of shit K2-Thinking
>>
>>107376152
how would that not be in the interests of big ai? like wouldn't the ability to learn be the first step towards agi? but practically I don't think user inputs have very much signal and the models' outputs are already shit, so what is there for it to learn online?
>>
what Z-Image from Hugging Face should I download?
>>
>>107376188
>>107376094
>>107375972
well sadly the only options are glm and kimi, and if you don't want those then deepseek it is
>>
>>107376240
Terrible advice
>>
>>107376224
The one from the official repo.
>>
File: qwen3-next.png (22 KB, 1004x435)
China sugoi
>>
File: file.png (295 KB, 1283x815)
>>107376248
got any better ones?
>>
>>107376260
what is the official repo?
>>
>>107376264
hardware?
>>
>>107376265
Yes
>>
>>107376275
The one you'd find with a cursory search on your favourite search engine.
>>
>>107376285
list out 10 better ones from a to z in a bullet list ordered in reverse
>>
>>107376264
>3B active
Is this supposed to be impressive?
>>
>>107376265
He doesn't, and he's been shitting up the threads for a while now. GLM4.6, Kimi2, Deepseek. These are your best options for now, and they all have their upsides and downsides.
>>
>>107376286
Tongyi-MAI. 10x
>>
>>107376304
I guess I'll stick with petra-13b-instruct.
>>
>>107376297
Sorry I'm not a janny, I don't work for free.
>>
z image base dead
>>
>>107376322
ah i see, so there are no better ones
>>
>>107376299
it refactors and updates my code
>>
>>107376297
Ze bug
You eat it
xhe does it too
Want communism
Vape everyday
Use reusable bags
Treat others nicely
Surrender thought
Recognize your privilege
Question MAGA
Prostrate to colored people
Opine mindlessly
>>
>>107376204
>how would that not be in the interests of big ai?
Because they already are unprofitable as it is, they don't need to increase the compute cost per user 10x
>like wouldn't the ability to learn be the first step towards agi?
It already learns. They just don't let the user teach it.
>but practically I don't think user inputs have very much signal and the models' outputs are already shit, so what is there for it to learn online?
That's because we don't know how to extract the signal. For example me saying "Talk like a real human and don't use markdown unless I explicitly tell you to write an article, an .md file or similar.".
That is all the signal you need, but right now there is no easy way to finetune a model on stylistic things like that, even though it should be easy and the equivalent style transfer for image models was what started image gen along with deepdream.
Then there is all the garbage models learn but getting the models to unlearn all that and use the weights to store actually relevant information to a given user is a much harder topic.
Suppose I am a chemist. Saying "I want a model that is specialized in chemistry, and it should only know enough about programming, math, physics, mechanical and electrical engineering to support my role" is a pretty clear-cut signal. But right now we don't have a way to make a model forget the endless biographical data for random famous people, geographical and historical information, pop culture, unrelated commercial products, species of animals, other unrelated scientific disciplines, etc.
Even not having user specific but a set of topic and style specific model combinations would be huge.
>>
>>107376279
RTX 3090
commit="ff55414c42522adbeaa1bd9c52c0e9db16942484" && \
model_folder="/mnt/AI/LLM/Qwen3-Next-80B-A3B-Thinking-GGUF/" && \
model_basename="Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL-00001-of-00002" && \
model_parameters="--temp 0.6 --top_p 0.95 --min_p 0 --top_k 20" && \
model=$model_folder$model_basename'.gguf' && \
cxt_size=131072 && \
CUDA_VISIBLE_DEVICES=0 \
numactl --physcpubind=8-15 --membind=1 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--ctx-size $cxt_size \
--n-gpu-layers 99 \
--no-warmup \
--batch-size 512 \
--cpu-moe \
--jinja \
--port 9000
>>
>>107376339
okay? any model can do that.
>>
>>107376224
comfyui one is the main release with most downloads
>>
https://youtu.be/gvs0YNPo33k
our thoughts on the mutta jeeta approved qwen2.5vl microsoft finetune
>>107376340
kekd
>>107376355
>q8_k_xl
>UD
brother, please just get it from bartowski and get the Q8_0
>>
>>107376356
But you can't
>>
>>107376313
U whelk home
>>
>>107376370
I don't want to refactor and update your code
>>
>>107376369
>>q8_k_xl
>>UD
>brother, please just get it from bartowski and get the Q8_0

I was asking the question about quantization etc >>107373077

All I got were responses like this
>>107373516
>>107373472
>>
>>107376355
>>107376369
What the fuck even is "q8_k_xl" quant supposed to mean? Are some of the layers kept at FP16 or something? Or are they quanted above Q8?
>>
>>107376384
Nor do I
>>
>>107376358
is it the best, doe?
>>
File: file.png (97 KB, 879x593)
>>107376402
>what does it mean
ask zeroWW the jeet poop nigger shit that "invented" it
>>
>>107376402

>>107375409 (You)
>>
>>107376396
nta. What we've always known about every fucking model since quants exist. The bigger the model is, the more tolerant it is to quantization.
If you can run Q8_0, run Q8_0. I wouldn't fuck around with non-standard quants. And you should quant it yourself. If you need requanting because of some fix in llama.cpp or whatever, you don't have to wait for someone to do it or wonder if it was made with the latest version or not. Specially with a clusterfuck of an implementation like qwen-next.
>>
>>107376347
I'm not sure what you're trying to say. it takes a lot more data and expertise to make a model default to a style and specialize in a knowledge base. a single line is not enough signal to train a model. in practice they are trained on trillions of tokens.
>>
>>107376323
proof?
>>
>>107376434
Agree, kind anon

I could finally fix the low speed issue in llama-server compared to llama-cli (OpenMP went missing during the build). Now I enjoy the same speed in a comfortable browser window.

And because I heard the news about the Qwen-Next implementation, I decided to test it on 4000 lines of python code.

>>107376264
No speed decrease with large context
>>
File: 1750704361438280.png (5 KB, 211x70)
5 KB
5 KB PNG
twah
>>
>>107376576
kek context?
>>
>>107376598
I was on a blog post rabbithole and came across this for one of the publications on someone's site
>>
File: 1738465037519895.png (55 KB, 1664x783)
55 KB
55 KB PNG
What settings are recommended for an assistant (not for rp, but it needs to be creative without being schizo) with glm 4.5 air?
>>
Did qwen3 80b a3b save local, or is 4.5-air still better? Too lazy to download and test
>>
>>107376633
sent u on element :)
>>
>>107376633
Set "Sampler Preset" to the minp preset (I forget what it's called, I think "basic minP") then turn temperature down to 0.69 and change nothing else
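For reference, min-p keeps only the tokens whose probability is at least `min_p` times the top token's probability, then renormalizes over the survivors. A minimal pure-Python sketch (illustrative only; llama.cpp's actual sampler operates on logits in place):

```python
import math

def min_p_filter(logits, min_p=0.1):
    """Keep tokens whose probability is >= min_p * (top token's probability),
    then renormalize. Returns {token_index: probability}."""
    mx = max(logits)
    # softmax, shifted by the max logit for numerical stability
    exps = [math.exp(l - mx) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

With `min_p=0.1` a token survives only if it is at least a tenth as likely as the most likely token, which prunes the long schizo tail while leaving the head of the distribution untouched; temperature is then applied on top of that.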
>>
>>107376468

>a single line is not enough signal to train a model. in practice they are trained on trillions of tokens.

Do you think TheDrummer is finetuning the models on trillions of tokens?

>it takes alot more data and expertise to make a model default to a style and specialize in a knowledge base.

Depends on what method you use. Not all methods are created equal.
If by "signal" you mean "the actual bytes of a ShareGPT or Alpaca file to run SFT on", then that will be bigger than if you mean "the bytes fed to the most efficient finetuning method we can find".

For example, you could take every message that has ever been sent to ChatGPT. If you sample a large number of generations for every user input, you could then do SFT on the model that produced them, training it on its own outputs. There would be some quality degradation, because you're not supposed to train a model on its own output, but you could drive that degradation arbitrarily low by increasing the diversity and quantity of user messages, and by increasing the number of samples you generate per user message. Are you with me so far?

Now, suppose that when generating those samples, we prepend the instruction I mentioned before ("make sure not to use markdown unless this or that") to the system prompt. Training on those samples would give you a custom model that follows your instructions.

So from an information-theoretic standpoint, the "signal" to train the model is there in that short piece of text. A similar method could be used to make a specialist model: filter the user-input set to keep only the inputs that relate to your topic of interest, or create synthetic "user" inputs with another model, possibly using human-written source material (books etc.).

Obviously that is very computationally expensive, but that is the naive way of doing it. My point is that the big companies don't seem to care to research more efficient ways.
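The pipeline described above can be sketched as follows. `generate` is a hypothetical stand-in for sampling the base model; the key point is that the custom instruction is present at generation time but absent from the stored training prompt, so SFT internalizes it:

```python
CUSTOM_INSTRUCTION = "Make sure not to use markdown unless the user explicitly asks for it."
BASE_SYSTEM = "You are a helpful assistant."

def generate(system_prompt: str, user_msg: str) -> str:
    # Hypothetical stand-in for sampling the base model; a real pipeline
    # would call an inference endpoint here.
    return f"[reply to {user_msg!r} following: {system_prompt}]"

def build_sft_dataset(user_messages, samples_per_msg=4):
    """Build (system, user, assistant) rows by sampling the model WITH the
    custom instruction prepended, then storing the prompt WITHOUT it, so
    training bakes the instruction into the weights."""
    augmented_system = CUSTOM_INSTRUCTION + "\n" + BASE_SYSTEM
    dataset = []
    for msg in user_messages:
        for _ in range(samples_per_msg):
            dataset.append({
                "system": BASE_SYSTEM,  # instruction deliberately omitted
                "user": msg,
                "assistant": generate(augmented_system, msg),
            })
    return dataset
```

Quality filtering, deduplication, and the topic classifier for the specialist-model variant would slot in between generation and the final dataset.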
>>
File: file.png (47 KB, 729x733)
47 KB
47 KB PNG
>>107376638
idk its kinda weird
writes like haiku
>>
>>107376652
The fuck?
What is your system prompt and chat template?
>>
>>107375177
Who cares about dequantization. Ur like, so turing level old unc
>>
File: 1764255654924589.png (55 KB, 1680x755)
55 KB
55 KB PNG
>>107376646
Yeah it was basic minp, thanks anon.
>>
File: GLM 4.5 z.ai .png (10 KB, 734x255)
10 KB
10 KB PNG
>>107376663
im doing this on the llama-server webui at localhost:8080
i tried in ST too, with chatml and the sysprompt "you are a helpful assistant"; same result.
and with a jailbreak prompt too, same result.
>>107376633
picrel
>>
>>107376669
Ignore this >>107376674 top_p is antiquated, AI developers aren't aware newer samplers exist
>>
>>107373173
What CPUs are actually stable with huge ram amounts? Seems most consumer ones struggle with more than 2 dimms without dropping speed substantially.
>>
>>107376693
t. kalo
>>
>>107376702
with many CCDs, maybe
you need to go epyc
high channel count, yes, very important
>>
>>107376638
it's an improvement over 30b
>>
Hey CUDA dev, what's your includePath for CUTLASS? I can't get proper IDE support
>>
File: 1764420988603440.png (574 KB, 1080x1080)
574 KB
574 KB PNG
>>107376649
>Do you think TheDrummer is finetuning the models on trillions of tokens?
obviously not.

look, I just don't think they can be trained online. you need to build a dataset and do DPO or finetuning. even with offline training, users are likely a bad source of training tokens in the end.
>>
>>107376649
>Do you think TheDrummer is finetuning the models on trillions of tokens?
No, but TheDrummer is also not really accomplishing anything with his tunes
>>
>>107376724
I literally have no idea who that is, nobody outside your troon bubble cares about discord namefags or thread personalities. I only just arrived in this thread and I can already recognize all your posts just by your typing style. Stop shitting up this thread and kill yourself
>>
>>107376434
spoonfeeding them is exactly why frogposters keep coming back
>>
>>107376702
It's more about the motherboard and memory controller than the CPU but yes, consumer boards that use DDR5 are often shit with more than 2 dimms. Epyc/Threadripper are what you would need to go for. Or Intel Xeon, but fuck that. Alternatively, AM4 with DDR4 is usually okay with 4 dimms.
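The DIMM/channel question matters because CPU inference is memory-bandwidth-bound; theoretical peak bandwidth is channels × transfer rate × bus width per channel. A quick sketch of the arithmetic (figures are illustrative spec-sheet peaks, not measured numbers):

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s.
    channels: number of populated memory channels
    mt_per_s: transfer rate in MT/s (e.g. 6000 for DDR5-6000)
    bus_width_bits: per-channel bus width (64 for DDR4/DDR5 subchannel pairs)"""
    return channels * mt_per_s * (bus_width_bits // 8) / 1000

# Dual-channel consumer DDR5-6000: 2 * 6000 * 8 / 1000 = 96 GB/s
# 12-channel EPYC DDR5-4800:      12 * 4800 * 8 / 1000 = 460.8 GB/s
```

This is why a server platform with more channels beats a consumer board that downclocks when all four DIMM slots are populated: the channel count multiplies straight into token generation speed.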
>>
>>107376756
What counts as 100% in this chart, the whole internet-using population of the country, or the whole group in the country that use LLM?
>>
schizo newfag
>>
>>107376772
take care of your anon, like you'd like to be taken care of
>>
>>107376775
The chart makes no sense.
>>
>>107376775
it's just propaganda and marketing buzz, don't try to interpret it.
>>
>>107376772
Depends on the anon. I took the exact opposite approach in this very thread for someone else.
>>
>>107376773
What's the issue with Xeon? Is it a similar situation to post-12th gen core intel chips?
>>
>>107376813
Nothing. It's cheap when buying used but no one wants to support Intel
>>
File: file.png (29 KB, 953x216)
29 KB
29 KB PNG
bros.. i think im in love
>>
File: file.png (39 KB, 965x229)
39 KB
39 KB PNG
lul
>>
>>107376724
kalo was right. Top P and nsigma are slop maximizers.
>>
>>107376761
Fine, then take something like https://www.youtube.com/watch?v=GEJOB_TFYJ0
that achieves an improvement in some specific benchmark (in this case playing chess).

>>107376756
>look, I just don't think they can be trained online. you need to build a dataset and do the dpo or fine tuning.
You could use a rolling-window approach. Customize the LLM as much as you can with the data you have up to a certain point, and once you've accumulated enough additional data, customize it again.
>even considering offline training, users are likely a bad source of training tokens in the end.
I wouldn't think of user messages as a source of training tokens. I would think of them as a signal fed to a classifier to filter which training tokens to use, drawing from whatever means you already have of acquiring training tokens.
And like I said, it could go beyond messages. It could mean customizing the model from user instructions (instead of appending them to the system prompt like "custom chatbots" work right now), or letting a model choose from a large number of pre-customized models for different styles, subjects, languages etc.
For example this: https://arxiv.org/abs/2506.06105
>>
>>107376889
[citation needed]
>>
File: pigie.jpg (105 KB, 474x419)
105 KB
105 KB JPG
>>107376896
Simply use the models and move the sliders.
>>
File: code gone.png (155 KB, 1137x962)
155 KB
155 KB PNG
WTF CHATGPT JUST DELETED ALL MY CODE!!!!!
>>
>>107376889
Samplers are a hack

You were so close to trips of truth
>>
>>107376970
oh no
>>
File: taberu.gif (534 KB, 300x300)
534 KB
534 KB GIF
>>107376970
What are you working on anon?
>>
>>107377067
Learning how to use git to restore a file.
>>
>>107377067
who cummed on teto?!
>>
How do people put up with all the not-x, y slop in all the new youtube videos?

https://www.youtube.com/watch?v=jF-WAwk1K9k

I counted 19 times in this 12 minute video... Probably missed some.
>>
It's not you, it's me. SHUT THE FUCK UP CLANKER
>>
>>107376859
>>107376880
What model is this?
>>
>>107377101
>How do people put up with all the not-x, y slop in all the new youtube videos?
You watched it. You gave him view time. You improved his ratings. You told him, and youtube, to keep doing what they're doing.
>>
>>107377096
I admit it, it was me. Sowee
>>
>>107377067
LLM inference engine (as of now it only supports gpt-oss 20b).
I was trying to find a faster way of multiplying bf16 by fp32, but I think that to get more performance I'm going to have to quantize the bf16 weights to int-based quants.
>>
>>107377067
I thought this was a heart at first
>>
>>107377101
I put up with it by not watching slop and never clicking youtube recommendations
>>
File: 1749081239967840.jpg (2.56 MB, 2508x3541)
2.56 MB
2.56 MB JPG
>>107377142
Which tensor core generation are you targeting? There should be tons of mixed-precision kernels you can try out. Specifically, you might want to look at BF16x9.
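BF16x9 emulates an fp32 multiply by splitting each operand into three bf16 values whose sum reconstructs it, then forming the nine cross products. A minimal pure-Python sketch of the decomposition (truncation-based, for illustration; real kernels do the nine products on tensor cores and accumulate in fp32):

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to fp32 precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Truncate an fp32 value to bf16 by zeroing the low 16 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def split3(x: float):
    """Decompose an fp32 value into three bf16 parts with x == a + b + c."""
    a = to_bf16(x)
    b = to_bf16(x - a)
    c = to_bf16(x - a - b)
    return a, b, c

def bf16x9_mul(x: float, y: float) -> float:
    """Multiply two fp32 values via nine bf16-by-bf16 products."""
    xs, ys = split3(x), split3(y)
    return sum(xi * yi for xi in xs for yi in ys)
```

Three bf16 significands (8 bits each) cover the 24-bit fp32 mantissa, so the split is lossless; cheaper variants drop the smallest cross terms for fewer products at lower accuracy.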
>>
>>107377135
It was right there in front of him. How could he not click and watch? YouTube shouldn't recommend stuff to him that shouldn't be watched.
>>
>>107377101
based ken la corte fan, been watching him since he had <20k subs
maybe not-x-but-y is more popular than you think? i remember anons memeing a year or two ago about how everything would become slop for us after too much llm usage.
recently i read a book and saw shivers; the book was written in the 19th century.
>>
>>107377213
ok thank you
the card I'm working on is a 3090
>>
>>107377123
glm air derestricted iq4xs
>>
>>107377292
>everything will become slop for us after too much llm usage.
by us, you mean. our brains will get better at recognizing "slop" in human speech and writing, even writing from the pre-LLM era.
>>
>>107377292
>>107377316
It's kind of like how normies get upset when you point out some image or video is fake or staged. You will notice the slop and call it out as generated and people will get mad because
>who cares if it's fake
>>
File: ComfyUI_00148_.png (1.17 MB, 1024x1024)
1.17 MB
1.17 MB PNG
>>107373173
Gotta catch 'em all
>>
File: lock raking technique.gif (1.43 MB, 400x224)
1.43 MB
1.43 MB GIF
>>107377468
Enthusiastically raking Rin-chan's lock
>>
>>107377468
Only thing she's catching from me is unplanned pregnancy


