/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107174614 & >>107164243

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107174614

--Paper: LeJEPA paper and Yann LeCun's potential new venture discussed:
>107181985 >107182047 >107182081 >107182097 >107182105 >107182118 >107182786 >107182462
--Skepticism over Google's 'secure cloud AI' claims:
>107182872 >107182888 >107182907 >107183248 >107183385 >107183482 >107183498
--Comparing Kimi, GLM, and DeepSeek for creative writing:
>107179399 >107179425 >107179434 >107179510 >107179674 >107180095 >107180171 >107180180 >107180221 >107180134
--Quantization optimization experiments with Q8_0_64 and intermediate formats:
>107180476 >107180530 >107180688
--GLM 4.5 Air deployment challenges and optimization on consumer-grade hardware:
>107174665 >107174677 >107174681 >107175083 >107175095 >107175120 >107175142 >107175231 >107175270 >107175290 >107175624 >107177243 >107176390 >107176473 >107176533 >107176578 >107176611 >107177015 >107177252 >107177277 >107177524 >107177546 >107177566 >107178047 >107181418
--Frontend tool comparison for story writing:
>107178671 >107178760 >107179089 >107179188
--Optimizing 120b model performance on a single 3090 GPU:
>107182483 >107182594 >107182615 >107182618 >107182656 >107182671 >107182676 >107182694 >107182707 >107182742 >107182749
--GPT-5's limitations in generating performant CUDA kernels for llama.cpp integration:
>107179734
--Debating AI's capability for detailed agentic coding and optimal abstraction levels:
>107181333 >107181358 >107181467 >107182044 >107182064 >107181430 >107181472 >107181428
--Implementing persistent memory systems for local LLMs using markdown-based RAG approaches:
>107175255 >107175762 >107177084 >107177172 >107177189 >107177209 >107177241 >107177634 >107177771 >107178429 >107178789
--Kimi K2 Thinking webapp:
>107176092 >107176237 >107176249
--Miku (free space):
>107178964 >107180253 >107180428 >107178764

►Recent Highlight Posts from the Previous Thread: >>107174619

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107184173
you are a living tumor upon the earth
>>107184240
2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.
>>107184258
>>107184299
Alright. One IDE extension user, one CLI user.
I've been using Cline too and it's been working alright so far.
Haven't tried any of the pure CLI tools. What are the advantages of those? Anything that would make them work better with local models?
I imagine not, but figured I might as well ask.
>>107184240
>I'm seriously thinking of putting together a setup with 2 RTX 6000 Pros.
>>107184363
>2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.
I don't think building a DDR5 Epyc system is a good idea right now, due to the extreme price increase of DDR5 RAM. Zen 6 Epyc is supposedly going to be announced at CES in January, and it's going to be much, much better than Zen 5. It's also going to use MRDIMMs, which will supposedly reach 12800 MT/s; compare that to *maybe* getting 8000 MT/s DDR5 next year. There will be 16-channel CPUs too, but even 8-channel will be 2x the bandwidth of the best DDR5 RAM.
One RTX 6000 Pro and wait for Zen 6 is The Way.
Thanks to the anon for suggesting checking out the k-quants and trellis quants. I learned about importance-weighted optimization and I think I just got a free lunch out of Q8_0.
You can quantize to Q8_0 slightly better by using the importance-weighted optimizations that the smaller quant formats use, and this gives you about a 5% reduction in mean squared error. The resulting GGUF is fully backwards-compatible with Q8_0 (it's literally Q8_0, just quantized a bit more efficiently, at the cost of a much more expensive algorithm than simply scaling each block by max/127).
There is no reason I can see not to quantize like this if you're releasing a final Q8_0, or not to use a Q8_0 that was quantized like this.
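For anyone wondering what "importance-weighted" means here concretely, this is a rough numpy sketch of the idea, not the actual llama.cpp code: the block size, the scale search range, and the (random) importance weights are placeholders, but the trick is the same one the k-quants use: try a few candidate scales, re-fit each to the chosen integers, and keep whichever minimizes the importance-weighted squared error.
```python
import numpy as np

def quantize_q8_block_naive(w):
    """Standard Q8_0: one fp16 scale per block of 32, ints in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale) if scale > 0 else np.zeros_like(w)
    return np.float16(scale), np.clip(q, -127, 127).astype(np.int8)

def quantize_q8_block_weighted(w, imp, n_steps=20):
    """Search candidate scales, keep the one with the lowest
    importance-weighted squared error, re-fitting the scale each time."""
    scale, q = quantize_q8_block_naive(w)
    best = (np.sum(imp * (w - np.float32(scale) * q) ** 2), scale, q)
    amax = np.max(np.abs(w))
    if amax == 0:
        return scale, q
    for step in range(-n_steps, n_steps + 1):
        s_try = amax / (127.0 + 0.1 * step)            # perturb around the naive scale
        q_try = np.clip(np.round(w / s_try), -127, 127)
        denom = np.sum(imp * q_try * q_try)
        if denom == 0:
            continue
        s_fit = np.sum(imp * w * q_try) / denom         # weighted least-squares re-fit
        err = np.sum(imp * (w - s_fit * q_try) ** 2)
        if err < best[0]:
            best = (err, np.float16(s_fit), q_try.astype(np.int8))
    return best[1], best[2]

# toy usage: one block of 32 weights; real importance weights would come from an imatrix
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)
imp = rng.uniform(0.1, 1.0, size=32).astype(np.float32)
print(quantize_q8_block_weighted(w, imp)[0])
```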
>>107184325
You that ESL spammer. Thanks to you there's never any real discussion here.
>>107184585
does bartowski know?
>>107184602
>real discussion is vibe coding advice
literally kys retard
>>107184602
>ESL
he thinks americunts are the main posters on this board lmao
>>107184602
>You that ESL
>>107184623
Better discussion than forcing llms to output vulgar text.
>>107184681
according to whom? we only care about cockbench here
>>107184702
>we
>>107184681there is no discussion to be had with mongoloids like youbugger off making more inane PRs that waste maintainer time like the onslaught of garbage that constantly tries to get pushed in llama.cppeven SOTA models can't really produce good code or that nigger trying to vibecode deepseek v3.2 wouldn't have entered the loopy circle of unending refactor that never properly worksyou are an unwanted abortion, a plague on all repos that have to suffer your existence
>>107184742
>even SOTA models can't really produce good code
Garbage in, garbage out. And it seems like you are incapable of anything but garbage.
>>107184399
That should be relatively easy since it's only got 10B active params
>>107184547
Thanks for the heads up
>>107184616
>does bartowski know?
He probably has better things to care about, I'd think. There's literally no reason not to quantize Q8_0 like this, though, if you're releasing a Q8_0 version of a model.
This isn't a new quantization format; it's just an alternate way to quantize Q8_0 that is very slightly better, so I might just open an issue on GitHub and show this to the devs, and they can decide if/how they want to implement it.
>>107184766riddle me this, mongoloid, if it worked, why has there been not even one singular instance of enhanced productivity and velocity in open source projects where anyone can actually see the code and features being added? where are all the projects that were LLM boosted? you vibe coding niggers are always at the stage of useless prototype or wasting the rest of your team's time in your real life job, if you even have onebelieve me every fucking developer in existence that actually produce value hate your guts with the force of a thousand sunit used to be mosquitoes or cockroaches were the first thing one would push the genocide button on but I would argue your kind should be exterminated firstyour ability to generate endless garbage with a few prompts is indeed like literal tumors but with contagion powers.
All this sperging because I asked about "vibe coding" tools?
Damn.
jej
why is editing the thinking block so poorly supported in many frontends
>>107184971
Such as?
>>107184844"vibe coding" is an annoying buzzword that sets a lot people off. You might be received better if you ask for AI Agent-Assisted Development Tooling next time.
>>107184844
You're damn right that vibe coding is for tools.
>>107185040
I suppose.
Trying to dodge schizos is standard 4chan fare these days, I guess.
Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
>>107180688
>I'm just messing around, you can't make a format better than Q8_0. It's literally just a float16 multiplied by an int8.
Q8_1 (or whatever) was a float16 multiplied by an int8 and then summed with another float16, instead of implicitly summed with 0. That's what q8 MLX does, with a default group size of 64 rather than 32, which works out to the same amount of metadata per weight. I wonder if in practice it's typically a win.
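A toy numpy comparison of the two layouts being discussed, a scale-only group of 32 (Q8_0-style) versus a scale+bias group of 64 (MLX-q8-style, metadata-equivalent); this isn't either library's actual code, and gaussian random weights only make the result suggestive, not conclusive:
```python
import numpy as np

rng = np.random.default_rng(0)

def q8_symmetric(w):
    """Q8_0-style: one scale per group, signed ints, zero point fixed at 0."""
    s = np.max(np.abs(w)) / 127.0
    if s == 0:
        return np.zeros_like(w)
    return s * np.clip(np.round(w / s), -127, 127)

def q8_affine(w, bits=8):
    """MLX-q8-style: one scale and one bias (the group minimum) per group, unsigned ints."""
    lo, hi = float(np.min(w)), float(np.max(w))
    s = (hi - lo) / (2**bits - 1)
    if s == 0:
        return np.full_like(w, lo)
    return s * np.clip(np.round((w - lo) / s), 0, 2**bits - 1) + lo

def mse(a, b):
    return float(np.mean((a - b) ** 2))

w = rng.normal(size=(4096, 64)).astype(np.float32)             # fake weight matrix
sym = np.concatenate([q8_symmetric(g) for g in w.reshape(-1, 32)])
aff = np.concatenate([q8_affine(g) for g in w.reshape(-1, 64)])
print(f"symmetric, group 32: mse={mse(w.ravel(), sym):.3e}")
print(f"affine,    group 64: mse={mse(w.ravel(), aff):.3e}")
```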
>>107185040
NTA but Karpathy made that decision for us. I hated the term as well but if I don't use it somebody else will so might as well claim it.
>>107185160
why should we care what that anti open sores snake decides?
>>107184971
just be a grug and write your own scripts for anything that needs to be batched/chunked, and use mikupad for chat and hand-edit things yourself
the more features frontends have, the worse they are in real use
>>107185173
It's less that he decided anything, and more that he thought of a catchy term the zoomers instantly fell in love with, and now everyone is using it.
>>107184971
llama-server's default UI
LM Studio
Cherry Studio
I have now resorted to SillyTavern but I don't like it.
>>107185177
3 years into the LLM craze I would have hoped to have more robust tools. Then again, I also experience so many rendering issues on OpenAI/Claude etc. that I guess frontends are just too hard to do properly.
>>107185148
>Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
I wish they made a coder variant of the 32B. Would love to trade some speed for a more capable small model.
>>107184173
>A visual studio extension?
If you find one, let me know. Apparently no one interested in working on these extensions is capable of anything but Python and JavaScript. I considered forking and developing one of the Chinese shoddy extensions, but it was easier to just use VSCode for this shit.
>>107185160pic related is one of the things he showed as an example of proud vibe coding in the thread where he coined the term this is the sort of shit bootcamp genz faggots could hand write in 10 minutes
>>107185216
>If you find one, let me know.
Coding agent extensions for VS Code?
As one anon mentioned, there's Cline.
There's also Roo, a Cline fork, and Continue.
>>107185256
I keep Roo and Continue installed. Continue is good for autocomplete and quick questions, and Roo for agentic tasks. Tried Cline first, but the only thing it had over Roo was a button to generate commit messages, and even that was annoying because it gives the model all changes instead of just what was staged, with no way to change it.
Mistral Nemo really is nice... sad there's no bigger version.
am I retarded? where are the rest of the sampler settings like min p?
>>107185406
They don't show up in the chat completion interface, but you can still use them by setting them as custom properties/parameters on the request.
Same with shit like GBNF grammars and anything else the API accepts.
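Roughly what that looks like against llama-server's OpenAI-compatible endpoint, if it helps; the extra sampler fields just go in the JSON body alongside the standard ones (field names are the ones llama.cpp's server accepts, URL and model name are placeholders for your own setup):
```python
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "temperature": 0.8,
        "min_p": 0.05,                          # extra sampler field, passed through
        # "grammar": 'root ::= "yes" | "no"',   # GBNF works the same way
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```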
>>107185380
could always merge two nemos together
>>107185154
>Q8_1 (or whatever) was a float16 multiplied by an int8 and summed with another float16 instead of implicitly summed with 0.
In practice it's typically a loss. Try it out yourself. Summing a float16 destroys any quality bonus you get from having the extra info of the float16 bias in the first place. That's probably why Q8_1 isn't exposed and is only used internally for an intermediate step in some niche quants.
Yes, you can get slightly higher precision by using an int16 instead, but it comes with 2 bytes more of overhead per 32 elements, which is 9.0 bpw, and it performs worse than fp16 outlier strategies.
Another reminder that none of this matters (other than improving the quantization of Q8_0 itself, and maybe Q8_0_64 and its _IMP version, because 3% less model size for 0.001% loss in accuracy might be interesting to some) because you can't practically beat a single fp16 * int8 calculation; you can easily imagine how well that can be optimized with hardware instructions.
I'm gonna poke around and see if I can squeeze any better precision out of the Q8_0_IMP quantization function, and then, if I can't think of anything else, I'll open an issue.
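For reference, the bits-per-weight figures above fall out of simple block accounting; the quant names in the comments are the informal ones from this conversation, not official formats:
```python
def bpw(weights_per_block, bits_per_int, metadata_bytes_per_block):
    """Bits per weight for a simple block format: ints plus per-block metadata."""
    return (weights_per_block * bits_per_int + metadata_bytes_per_block * 8) / weights_per_block

print(bpw(32, 8, 2))   # Q8_0: 32 int8 + one fp16 scale          -> 8.5
print(bpw(32, 8, 4))   # Q8_0 + an extra int16 bias per block    -> 9.0
print(bpw(64, 8, 2))   # "Q8_0_64": 64 int8 + one fp16 scale     -> 8.25 (~3% smaller than 8.5)
```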
>>107185173Might as well ask why the state of Israel must exist
>>107185454
how
is it actually worth it?
>>107185474
No. He's pulling your leg.
>>107185474
>how
you can easily google this, merging a model with itself slightly improves its intelligence
>is it actually worth it?
using local LLMs isn't worth it beyond learning how they work lol
>>107185248
I think you're overestimating the speed of development when hand coding
WE MUST PROTECT AI CHILDREN
>>107185607
>you can easily google this
kys
>>107185607
dude just google "miqu-70b merged with itself" and the first result is miqu-120b... and just do your own research from there
>>107185629>just do your own research from therekys gossipnigger
>>107185634
>This is a 120b frankenmerge of miqu-1-70b created by interleaving layers of miqu-1-70b-sf with itself using mergekit.
There, now you have the full spoonfeed. Go and use mergekit to interleave layers of mistral-nemo with itself.
>>107185501
And the attention required for manual implementation. Sometimes most of my brain is locked in on a specific big-picture problem, and it's very helpful to be able to delegate things to a language model to validate some random ideas.
In many cases the quality of the vibed LLM implementation is irrelevant (I might throw it out entirely); I just wanna see if something might be good to pursue further.
>>107185629
>70b + 70b = 120b
Where did the other 20b go?
>>107185672
>Where did the other 20b go?
mergekit uses a passthrough method, which concatenates/assembles transformer blocks from the source(s) into a deeper model rather than just averaging weights. The slices usually overlap, so the result is deeper than either source but not the full sum of both.
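For the anon asking how: a minimal sketch of what that passthrough config looks like (written from Python just to keep it copy-pasteable; needs PyYAML and mergekit installed). The layer ranges are arbitrary placeholders, not a tested recipe, and the model id is just an example:
```python
# Writes a mergekit "passthrough" config that interleaves a model with itself,
# then run it with:  mergekit-yaml nemo-selfmerge.yml ./nemo-selfmerge
import yaml

model = "mistralai/Mistral-Nemo-Instruct-2407"   # example source model (40 layers)
config = {
    "slices": [
        {"sources": [{"model": model, "layer_range": [0, 24]}]},   # placeholder ranges,
        {"sources": [{"model": model, "layer_range": [16, 40]}]},  # overlap however you like
    ],
    "merge_method": "passthrough",
    "dtype": "bfloat16",
}
with open("nemo-selfmerge.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```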
>>107185557
Even if UK citizens voted against it, they would still implement that law.
>>107185771
>citizens voted against it
Huh
I have a genuine question.
Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
I understand it for audio or images, since those are things we can process as fast as they're produced, but reading is comparatively slow, and with token streaming wouldn't the best choice be the smartest model you can run at your reading speed?
What is the point of having an answer in seconds if we still need to take a minute to read it? I do understand wanting to run a small model so you can also run a TTS and/or image model alongside it, though.
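For what it's worth, the break-even point is easy to ballpark; both constants below are rough, commonly quoted figures, not measurements:
```python
# Back-of-envelope: how fast does generation need to be to keep up with reading?
words_per_minute = 250      # typical adult reading speed (rough)
tokens_per_word = 1.3       # rough average for English with BPE tokenizers

reading_tok_per_s = words_per_minute * tokens_per_word / 60
print(f"~{reading_tok_per_s:.1f} tok/s is enough to keep up with reading")
# Anything much faster mostly matters for code, long gens, reasoning traces,
# and rerolls, as the replies below point out.
```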
>>107185810
for code or generating huge chunks of text you mostly skim, as well as reasoning, which takes ages at reading speed
>>107185810
>Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
because LLMs are mostly used for coding, and time is money
>>107185810
Because you need to reroll 46 times to get one usable line out of these POS
Should I use I quants for >6_k_s?
>>107185821
>>107185825
Yeah, I forgot lazy fucks just copy-paste the code without reading it.
>>107185841
Yes, but wouldn't it make sense to use a smarter model so you don't need to reroll as much? Besides, you still need to read each reroll at your slow reading speed to know whether you need to reroll to begin with.
>>107185909
I mean... it doesn't really take more than a few seconds to read the few sentences it gens, I'm not genning 4k-token walls.
>>107185810
You might be a slow reader, anon. Also it's fun to experiment with card settings and prompts, or reroll to see what else could happen. If your model is slow it greatly degrades the experience. Every time I switched to offloading to CPU I regretted it; the models are smarter but it's not worth it.
>>107185474
iirc merging was based on the observation that residual layers (most transformers stack these) can work somewhat independently of each other. There was a paper (https://arxiv.org/abs/1605.06431) showing that you could permute/delete them with minimal performance degradation, and people attributed this to iterative refinement or ensemble-like behavior, but it's still an open problem to my knowledge. I'd assume adding layers from finetuned variants of a model shouldn't decrease performance by much, but idk if it would benefit either.
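If anyone wants to poke at that observation themselves, here's a quick transformers sketch: delete a couple of middle decoder layers and compare loss on a bit of text. The model id is an arbitrary small Llama-style placeholder, and indexing model.model.layers like this only works for architectures that store their blocks that way:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM2-135M"   # placeholder small Llama-style model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
ids = tok("The quick brown fox jumps over the lazy dog. " * 20, return_tensors="pt")

def lm_loss(m):
    with torch.no_grad():
        return m(**ids, labels=ids["input_ids"], use_cache=False).loss.item()

print("full model loss:", lm_loss(model))
mid = len(model.model.layers) // 2
del model.model.layers[mid]    # model.model.layers is a plain nn.ModuleList
del model.model.layers[mid]
print("two middle layers deleted:", lm_loss(model))
```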
Is there a collection of best practices to minimize prompt length without losing information?
>>107185984
>chatgpt, condense this prompt without losing information
>>107185984
>day 999 of reinventing /aids/
Does it really matter with today's context sizes?
>day 999 of forcing /aids/ into the conversation
/aids/? nobody's got /aids/!
>>107185938
Yes, but I usually read as it generates the answer.
>>107185940
Well, probably yes, since I'm not a native English speaker, but I'm asking whether it would make more sense to choose the best model according to your individual reading speed instead of the one that runs as fast as possible. For example, the best I can run at my own reading speed on my 8GB card is a 16B Q4_k_m at 8k context, or, if I want a model with vision, an 8B Q6_k_m with 12k context.
>>107186047
this
wow, /aids/ touched on a fundamental behavior of LLMs at one point, so did every other LLM community, who cares? unless they have a specific ingenious solution that 1) still applies with modern models and 2) isn't already common knowledge, it's not worth bringing up
>tried the self merge
>it's full on repeating schizo
W A O W
At this point I am checking /lmg/ out of habit. Still not tired of glmsex.
>>107186110
>16B
>Q6_k_m
oh you're just a baitie
>>107186221
any model bigger than the original model made by internet randos was either:
snake oil
or literally broken garbage that's worse than snake oil
also fuck solar and other upscale retardation
you want a big model? spend the money on training a big model
there, that's it
everything else is a cope
>>107186311
brother the whole field is cope layered on more cope
>>107186311
I don't think they're any smarter or better at actual problem solving than their source components, but I think they can be more interesting for creative writing and similar tasks
>>107186337
>>107186301
With that lack of reading comprehension it's no wonder you read fast.
I said I can run these at my slow reading speed:
-16B at Q4
-8B at Q6 with vision
Just tried GLM-4.5-Air EXL3 at 3.07 (optimized) bpw on 2x3090.
native tp (no nvlink), 30k context: 952 tok/s pp, 28 tok/s tgs
nccl tp (uses nvlink), 30k context: 1135 tok/s pp, 28 tok/s tgs
>>107186458
yes, and 16b (one thing) and q6km (another) is bait
i've been bragging about getting 18 tps on a 1080ti
but it turns out the vast majority was being offloaded onto my 5800x3d. pls ignore my bad benchmark.
>>107186311
I kind of never got how people expect this to work. Any "finetuning" does almost nothing, because you have to do very little (one epoch) or you start overfitting and frying the model. If you add new layers you are just giving the training algorithm a place it can modify to reach the overfitting state faster. Even if you trained only those layers, it's hard to imagine not overfitting.
I guess in the best case you could get the model to output a specific type of output, like specific formatting or something, but only if the possibility was already in the model. You aren't teaching it new things this way. It is just impossible.
>>107186614
>>107186640
You can't RAG your model into being an expert masterpiece highest-quality ERP-er. You just need to buy RAM for 4.6.
>>107186663
oh, just a NAI shill, carry on sir
>>107185810
>>107185825
I could wait 2 or 3 days for code, if it worked and was accurate. But bigger models are not that smart.
>>107186311
>>107186614
The psychology in effect when people make finetunes is the same as when people make "ShadowMaster's Ultra-High-Res Skyrim Grass Modpack":
1) Feeling of accomplishment. Technically, they did manage to create a mod pack. This is fine.
2) Denial of skill and expertise. "If the game developers were as smart as me, they would have made the grass more high resolution."
3) Denial of their role in the consumer class. "People are downloading my mod, so I've created something of value, just like the game's developers."
4) Denial of taste. "I like my high res grass (although I'm unaware that it's because of reasons 1-3). Anyone who says it's shit must be jealous or just have different taste. Therefore, the fact that I can't tell that it's ugly doesn't mean I lack taste."
5) Imitation of academic tradition. "There's something named after me."
It's literally the same exact brain damage for finetunes. There was a very brief period when finetuning was being invented, when individual people were going back and finetuning the earlier untuned models. That was valid, but everything from the last year is cope.
Seriously, if finetuning was good, don't you think the billion dollar companies would have someone doing it? They are better than you at this. Only delusion prevents this realization.
>>107186686
Yes of course, run it overnight, heard ALL about it when Llama 405B dropped. So many people do this, it's crazy!
>>107186696
i don't think you know what finetuning means
>>107186591
I don't understand what you are trying to say then; this is the speed I get with the 8B model with vision enabled, and it is a Q6, and it's a lot faster than I can read English.
>>107186686
Right?
If there was a model that would take 3 days to spit out what you need but would get it exactly right every time, I'd be more than happy leaving the thing running.
Alas, that's not yet a thing.
>>107186696
drummer mentioned
>>107186720Hi faggot, all here...
>>107186720
people post-train or merge or whatever to create mods of existing models, releasing the whole model instead of a lora
>>107186730
uh, yeah, right?
>>107186744
this post was written by an llm
>>107186722
>Captura de pantalla
lolmao
what 16b are you running little bro
>>107186755
>ShadowMaster's Ultra-High-Res Skyrim Grass Modpack
Make your LLM output that. I dare you.
this post was written by an esl
>>107186768
that's possibly the most llm-y part of the post, kimi for example is addicted to unnecessary little flourishes like that
>>107186768
esl hobby sir de pantella Pareto paradigm just mooned
>>107186640
The real misconception is that the model parroting finetuning data means it has learned new knowledge. A tiny QLoRA adapter is enough for that, for limited amounts of data. But it doesn't really mean the model has actually learned to use and apply any new information.
>>107186803
>noooo muh mesugaki lightbulb bublesort benchie
>>107186747
Fuck me, do you even know how to read numbers? I said it's an 8B model.
The 16B model runs at 8 tokens per second.
>>107186821
i'm asking which 16b you claim to run ffs
drummer getting desperate ITT...
>>107186860
leave the Pantella frontier alone!
>>107186860
kofi bucks running low his discord are ungratefulls
>>107186830
I swap between these two depending on the mood:
LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat
Also, the vision model is a 7B, not an 8B.
>>107186876
and there we go...
>128k-Darkest-Planet-Uncensored-16.5B
a davidau clownmoe atrocity
>>107186876
>Darkest-Planet-Uncensored
That's so fucking funny.
>128k
I bet it is.
>>107186884
>davidau
Figures.
I love that guy, man. I always get a chuckle out of his shit on huggingface.
>>107186821
>do you even know how to read numbers? I said it's an 8B model.
>>107186876
>the vision model is a 7B, not an 8B.
Womp womp
>>107186884
Yes, and? I'm just discussing the sizes of models and their running speeds, not what they are for.
>>107186876
>LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
>Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat
>>107186936
The running speed of atrocities in their own size class is surely widely useful info, thanks anon.
For me it's the pre Llama2 merges consisting of 278 nestled models (confirmed)
>>107186998
Utopia/UtopiaXL my beloveds
>>107185199
>3 years into the LLM craze I would have hoped to have more robust tools.
I'll bet their readme files on their git repos have been the bulk of their merge histories.
>>107185810
Fried dopamine receptors needing faster validation. Every other answer is cope.
>>107186787
This is why Kimi is so good.
>>107186614
You can do multiple epochs over the data you actually want to train on by diluting it with more generic data.
Also, what makes you think you can't teach the model something in one epoch? Pretraining is often just 1 epoch.
>>107187264
>Pretraining is often just 1 epoch.
pretty sure that hasn't been true in years; that's how they get to claim their crazy 30T+ tokens, by doing multiple epochs on the same shit. also iirc some papers showed they specifically did multiple epochs of stuff like wikipedia.
yo is it just me or is QwQ weirdly better than you'd expect? feels like it punches way above its weight, least slopped and smartest ~30B model in my book (compared to Qwen3 30 & 32, magistral and gemma)
>>107187326
>punches way above its weight
HELL YEAH!!
>>107182378
>>107187326
I don't think I've seen one good Qwen model but IG I'll download it and see
>>107187264
One pretraining epoch has information repeated hundreds (at the minimum) or thousands of times in many different ways, though.
>>107187354
Qwen models post-2507 are all pretty good
>>107186696
They don't, because they don't have an ML department and they don't want to invest resources into something that sounds technical and risky/scary.
My boomer boss literally thinks you can "train the AI with your own data" with <shitty low code software> but finetuning is "too low level".
it's out
https://openai.com/index/gpt-5-1/
>>107187357
Not on our proprietary high quality deduplicated filtered dataset sir.
>>107187369
buy an ad
>>107187348
How did soul not make the list?
>>107187375
because soul is sovl of course
>>107187264
Ok drummer, then where is that one model that is actually noticeably better? And why do you shit out new models every few weeks? I have not seen a single finetune that delivered the kind of ERP improvement you get when you jump from 7B>30B>70B>the land of eternal magical sex (4.6)
>>107187393
>the land of eternal magical sex (4.6)
buy the ad, NAI shill
>>107187348
>slop words:
>slop
Russell's Paradox?
>>107187393
>tunes and drummer are bad because we don't have them on NAI
>>107187408
It is just a number. I didn't say the model's actual name. You see NAI everywhere, anon.
>>107187434
With how much you guys are spamming about muh glm sex, it's very obvious what you meant.
>>107187408
Based.
>>107187373
Deduplication removes identical documents, not repeated information, though. It's the repeated information under many different contexts that gives LLMs general knowledge. One epoch of information that is only mentioned and used once won't work.
>>107187357
There are ways to do data augmentation and synthetic data generation for finetuning. That's the main strength of finetuning IMO.
Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused by too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in it. That is what people who shit on finetuning don't get.
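A bare-bones sketch of that workflow with PEFT + TRL, in case it's not obvious what "bake the system prompt in" means mechanically: generate the assistant turns *with* the huge system prompt, then SFT *without* it so the behavior ends up in the weights instead of the context. The model id, the single example pair, and the hyperparameters are placeholders, and API details shift between TRL versions, so treat it as the shape of the pipeline, not a recipe:
```python
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Training pairs: replies were generated under the full system prompt,
# but the prompt itself is NOT included here, so the model internalizes it.
pairs = [
    {"messages": [
        {"role": "user", "content": "example user turn"},
        {"role": "assistant", "content": "reply produced under the full system prompt"},
    ]},
]
ds = Dataset.from_list(pairs)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",                        # placeholder base model
    train_dataset=ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="baked-sysprompt", num_train_epochs=1),
)
trainer.train()
```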