/g/ - /lmg/ - Local Models General - Technology



Anonymous
/lmg/ - Local Models General 05/31/26(Sun)13:04:18 No.108949851

File: 1767526033204438.jpg (160 KB, 1199x1199)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108943155 & >>108937312

►News
>(05/29) Step 3.7 Flash released https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
05/31/26(Sun)13:10:13 No.108949899

>>108949589
>embedding model into ASIC
Retarded. By the time you have produced them, the model and inference algos will already be obsolete. AI progress will continue to accelerate, which will keep GPU dominant.
>photonic computing
Not going to happen, at least not any time soon. Photonic elements are 1000000 times larger and scaling them is much more difficult. There's no optical transistor analog either. MOSFET is a switch with gain.

Anonymous
05/31/26(Sun)13:14:35 No.108949921

File: __hatsune_miku_vocaloid_a(...).jpg (110 KB, 512x512)

►Recent Highlights from the Previous Thread: >>108943155

--K2.6 vision reasoning inefficiency and comparisons for Japanese OCR:
>108945101 >108945153 >108945258 >108945355 >108945456 >108945597 >108945828 >108945689
--Comparing local TTS tools and VRAM management for voice cloning:
>108946129 >108946180 >108946215 >108946290 >108946335 >108946191 >108946299 >108946446 >108947364 >108946755 >108946593 >108946708
--Using Gemma to analyze character card metadata:
>108948623 >108948657 >108948667 >108948690 >108948713
--DeepSeek-V3.2-8bit testing and utilizing Ngrams for multi-step prompting:
>108943171 >108943197 >108943225
--Criticizing the excessive size of FLUX.2's text encoder:
>108947575 >108948053 >108948065 >108948108 >108948113 >108948202 >108948316 >108948354
--Seeking out-of-distribution code benchmarks:
>108943794 >108943833 >108943882 >108943983
--Comparing -sm tensor and -sm layer performance and OS overhead:
>108943313 >108943337 >108943346 >108943442 >108943347
--Performance reports for Step 3.7 flash Q4_K_S on 6x 3090s:
>108943287 >108943316 >108943345 >108943383 >108943493
--Searching and clustering local images using embedding models:
>108945943 >108945957 >108946013 >108946053 >108946091 >108946120 >108946183 >108946230
--Using YOLO for efficient face detection and programmatic blurring:
>108943393 >108943603 >108944480 >108943486 >108943501 >108943523
--Development suggestions for Orb frontend image integration and default characters:
>108943543 >108943593 >108943617 >108943638 >108944220 >108947109
--Comparison of Step 3.7 Flash IQ_S and Gemma performance:
>108947178 >108947185
--Logs:
>108944220 >108945980 >108946414 >108948363 >108948623 >108948690 >108949772
--Miku, Neru (free space):
>108945329 >108947695
--Dipsy and Kimi (extra space):
>108943198 >108944222 >108944260 >108944357 >108944373 >108944406

►Recent Highlight Posts from the Previous Thread: >>108943182

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
05/31/26(Sun)13:14:37 No.108949922

>>108949302
Cool tool especially for someone like pewdiepie, but he should really work on moderating his issue page, it's pretty unusable already.

Anonymous
05/31/26(Sun)13:16:35 No.108949932

summer release season is starting
big things ahead

Anonymous
05/31/26(Sun)13:17:11 No.108949941

Mikulove

Anonymous
05/31/26(Sun)13:18:59 No.108949947

File: 3.jpg (12 KB, 314x263)

>had 3 versions of the same one doomer "ai uprising" story in my algo recently
Humans are slop even without AI

Anonymous
05/31/26(Sun)13:21:46 No.108949963

File: Screenshot 2026-05-31 at (...).png (321 KB, 2216x1249)

My middleware/front-end I've been working on (ignore the model)

Anonymous
05/31/26(Sun)13:23:49 No.108949974

>>108947372
>If you have vram left, you should run a higher quant or have a larger context, nobody should waste it on tts, realistically
Off the top of my head, the biggest TTS models consume like 8-10 GB, which could come down with quantization. That's small enough to page in and out of VRAM over PCIe and stay realtime. It just introduces another fraction-of-a-second source of latency: 1/4 to 1/3 of a second on PCIe 4.0, and half that on 5.0. This is lower than the latency you'll get in practice if your TTS model can't support streaming (alone).
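
Back-of-envelope check of those numbers, as a rough sketch. It assumes an x16 link running at roughly its theoretical peak (about 32 GB/s for PCIe 4.0 and 64 GB/s for 5.0); real transfers will be somewhat slower, and the function name is just for illustration:

```python
# Rough transfer-time estimate for paging a TTS model into VRAM over PCIe.
# Assumption: x16 link at approximately theoretical peak bandwidth, one direction.
PCIE_X16_GB_PER_S = {"4.0": 32.0, "5.0": 64.0}

def transfer_seconds(model_gb: float, pcie_gen: str) -> float:
    """Seconds needed to copy model_gb gigabytes across an x16 link."""
    return model_gb / PCIE_X16_GB_PER_S[pcie_gen]

for gen in ("4.0", "5.0"):
    low = transfer_seconds(8, gen)
    high = transfer_seconds(10, gen)
    print(f"PCIe {gen}: {low:.2f} to {high:.2f} s for an 8-10 GB model")
```

Quantizing the model shrinks `model_gb` and the paging delay shrinks with it.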

Still, a model that's real-time on CPU and high quality would be ideal. No such model exists today. I wonder how feasible it is; diffusion is a non-starter for any model you intend to run on CPU.

Anonymous
05/31/26(Sun)13:26:05 No.108949983

>character somehow knows your name even though you never told them what it is
do all models do this?

Anonymous
05/31/26(Sun)13:27:04 No.108949988

>>108949983
yeah all local models (which are generally very bad) do this

Anonymous
05/31/26(Sun)13:27:34 No.108949993

>>108949550
>so your argument is computers can't scale
im sure they can scale to meet the dema-
you've reached the quota limit, please try again later

Anonymous
05/31/26(Sun)13:28:04 No.108949998

File: 1773748829118507.png (743 KB, 828x802)

Why are people saying kimi 2.6 vision model is better at details than gemma 4? Isn't gemma the bigger one?

Anonymous
05/31/26(Sun)13:29:12 No.108950006

>>108949983
Larger models can be pretty good about it, but it's basically a coinflip for anything under 100b. I've even seen nemo ask for your name before, unsure of who you are, but it wasn't often.

Anonymous
05/31/26(Sun)13:30:06 No.108950008

>>108949983
gemmAGI does not do this

Anonymous
05/31/26(Sun)13:30:16 No.108950011

>>108949998
>Isn't gemma the bigger one?
Kimi is 30x bigger than Gemma.

Anonymous
05/31/26(Sun)13:30:33 No.108950016

>>108949983
only big models can resist this, and only specific ones

Anonymous
05/31/26(Sun)13:32:15 No.108950024

>>108950011
you know nothing

Anonymous
05/31/26(Sun)13:34:07 No.108950036

>>108950011
Not the vision encoder part

Kimi
Parameters of Vision Encoder 400M

Gemma 4 31B
Vision Encoder Parameters ~550M

Anonymous
05/31/26(Sun)13:37:03 No.108950052

>>108949998
Kimi's been trained to use its encoder a lot better so it knows what is being shown to it.

Anonymous
05/31/26(Sun)13:38:33 No.108950061

>>108949998
people confuse knowledge with vision
they see kimi recognize some shitty anime character from 2007 and go 'wow good vision and i spent $20000 on a rig that can run this because i have no self control or life so i really must like this and convince myself that this was worth it' when gemma is just a better model who can see everything much better and strongly

Anonymous
05/31/26(Sun)13:39:38 No.108950070

>>108950024
>>108950036
My bad, I didn't realize you specifically meant the vision encoder.
I'd still say a combination of kimi's massive parameters to interpret what's in the image coupled with the obsessive thinking would give it the edge, though.

Anonymous
05/31/26(Sun)13:40:51 No.108950078

>>108950006
I just had q4 gemma 31b mess up with this a couple times even with specifically prompting against it. Might be a quant issue with gemma though since I'm not running q8.

Anonymous
05/31/26(Sun)13:42:58 No.108950087

>>108949983
>character never tells you their name and insists that they already did when you ask them a few responses later

Anonymous
05/31/26(Sun)13:43:22 No.108950093

>>108950078
Nah, I'm running Gemma at q8 and it still happens more often than not - although specifically on the first character that enters the scenario.
It almost never happens with characters introduced afterwards, who correctly call the user "stranger" or by a descriptor. Odd quirk.

Anonymous
05/31/26(Sun)13:43:56 No.108950095

>>108950061
Sounds like sour grapes from someone who can't run large models. All I know is I feed images to kimi and it does what I want pretty well every time to a high level of quality.
Do you have some evidence that the extra 100M of the gemma encoder mmproj makes a bigger difference than the main model's parameter count difference?

Anonymous
05/31/26(Sun)13:46:08 No.108950105

>>108949983
ANY kind of secret or hidden knowledge is inconsistent and unreliable.

Anonymous
05/31/26(Sun)13:48:02 No.108950111

Trying out MTP on llama.cpp and confused. My understanding is the acceptance rate only represents the % of tokens that were accepted by the supervising model. Because the supervising model regenerates all the rejected tokens, the quality/accuracy of the response does not change. Correct me if I'm wrong.

If my understanding is correct, assuming you're maximizing for t/s, is the idea to always select the --spec-draft-n-max value that gives you the highest t/s regardless of what the acceptance rate is?

Anonymous
05/31/26(Sun)13:51:08 No.108950119

>>108949983
that's just part of the bigger problem about llms when it comes to roleplay
all interactions boil down to "HELLO I AM CHARACTER. I DO THE THING MENTIONED IN THE CHARACTER BIO" *her pussy drips a shimmering wet drop down her pink lace-tipped stockings with purple crisscross patterns that are mentioned in the character bio. the moisture doesn't just seep into the darkening fabric—it *soaks*.

Anonymous
05/31/26(Sun)13:51:26 No.108950120

>>108950111
The model outputs will be the same within floating point rounding error, anything else is a bug.
So yes, just pick the value that performs the best.
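
To see why the acceptance rate alone is not the number to optimize, here is a toy cost model. Everything in it is an assumption for illustration: each drafted token is accepted independently with the same probability, drafting one token costs `draft_ms`, and one verification pass costs a flat `verify_ms` regardless of draft length. In practice you would just benchmark a few draft lengths on your own hardware.

```python
def expected_tokens_per_pass(accept_rate: float, n_draft: int) -> float:
    """Expected tokens emitted per verification pass, counting the one token
    the main model always contributes itself (geometric-series sum)."""
    if accept_rate >= 1.0:
        return float(n_draft + 1)
    return (1.0 - accept_rate ** (n_draft + 1)) / (1.0 - accept_rate)

def tokens_per_second(accept_rate: float, n_draft: int,
                      draft_ms: float, verify_ms: float) -> float:
    """Throughput under the toy cost model described above."""
    step_ms = n_draft * draft_ms + verify_ms
    return 1000.0 * expected_tokens_per_pass(accept_rate, n_draft) / step_ms

def best_draft_length(accept_rate: float, draft_ms: float,
                      verify_ms: float, max_n: int = 16) -> int:
    """Draft length (0 = no drafting) with the highest modeled throughput."""
    return max(range(max_n + 1),
               key=lambda n: tokens_per_second(accept_rate, n, draft_ms, verify_ms))

# Hypothetical timings: 1 ms per drafted token, 20 ms per verification pass.
for rate in (0.5, 0.7, 0.9):
    n = best_draft_length(rate, draft_ms=1.0, verify_ms=20.0)
    print(rate, n, round(tokens_per_second(rate, n, 1.0, 20.0), 1))
```

Longer drafts lower the measured acceptance percentage because later draft tokens are only useful if all earlier ones were accepted, yet throughput can still go up, which is why t/s is the thing to measure.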

Anonymous
05/31/26(Sun)13:54:52 No.108950134

>>108950120
>speculative decoding
onions
>MTP
based
crazy how spec decoding just needed a rebranding to be popular

Anonymous
05/31/26(Sun)13:55:20 No.108950137

>>108949983
>character somehow knows my address, social security number, credit cards, everything
Holy fuark.

Anonymous
05/31/26(Sun)13:55:32 No.108950140

>>108950095
nta but you can probably feed the 31B as a quick test if you have the compute to run kimi

Anonymous
05/31/26(Sun)13:57:33 No.108950148

>>108950120
Out of curiosity, what does that mean when not using greedy sampling?
Does the draft token just need to be one of the possible tokens in the final logit?

Anonymous
05/31/26(Sun)13:58:49 No.108950152

it's insane how the comprehension cliff around 16K context is still there

Anonymous
05/31/26(Sun)13:58:53 No.108950154

File: Screenshot at 2026-06-01 (...).png (86 KB, 775x478)

>>108949983
Gemmy would never

Anonymous
05/31/26(Sun)14:00:00 No.108950159

>>108950154
are you using a setup that mentions your name anywhere?

Anonymous
05/31/26(Sun)14:02:32 No.108950169

>>108950159
The one way I can think of is when she starts a game of chess, the chess game status tool does actually return the names of both players, so kind of.

Anonymous
05/31/26(Sun)14:03:11 No.108950173

>>108950148
The reason why LLM inference is slow is the autoregressive sampling: you need to sample tokens one at a time so the matrix multiplications are inefficient.
If you somehow know the text that will be generated ahead of time then you can process all tokens in parallel and the matrix multiplications are much more efficient - that's why the speed for the prompt is so much higher than when generating tokens even though it's basically doing the same thing under the hood.
With speculative decoding methods you generate a draft that you think the model will generate, the model then processes all draft tokens in parallel and keeps them up to the point where the draft and the model agree.
However, the exact floating point rounding errors depend on the batch size with which you evaluate the model - the whole reason why the evaluation is faster in the first place is that you can use more efficient kernels after all.
Sampling is not relevant here.
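
The "keep the draft up to the point where the draft and the model agree" step can be sketched like this for the greedy case. It assumes the main model has already processed the whole draft in one batch and that `target_top[i]` is its top token at position `i` given the context plus the first `i` draft tokens; the function and argument names are hypothetical, not llama.cpp's:

```python
def accept_draft(draft_tokens: list[int], target_top: list[int]) -> list[int]:
    """Return the tokens to emit after one verification pass (greedy case).

    target_top must have len(draft_tokens) + 1 entries: one prediction per
    draft position plus one for the position right after the full draft.
    """
    accepted = []
    for i, token in enumerate(draft_tokens):
        if token != target_top[i]:
            break  # first disagreement: everything after it is discarded
        accepted.append(token)
    # The main model's own prediction at the stopping point comes for free,
    # so every pass yields at least one token.
    accepted.append(target_top[len(accepted)])
    return accepted

print(accept_draft([5, 7, 9], [5, 7, 2, 4]))  # two draft tokens kept, then the model's own token
```

A fully wrong draft still yields one token per pass, so the worst case is ordinary generation plus the wasted drafting time.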

Anonymous
05/31/26(Sun)14:03:29 No.108950175

>>108950134
You're absolutely right - MTP is not just speculative decoding, it's a speedup.

Anonymous
05/31/26(Sun)14:04:07 No.108950183

>>108950154
usually in rp you supply a {user} so the context knows who is who and doesn't speak for {user}
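
A minimal sketch of what that substitution amounts to. The `{char}` placeholder, the template text, and the names are made up for illustration; real frontends have their own macro syntax:

```python
def fill_placeholders(template: str, user: str, char: str) -> str:
    """Replace the {user} and {char} markers with concrete names."""
    return template.replace("{user}", user).replace("{char}", char)

# Hypothetical system prompt template.
template = "You are {char}. Write only {char}'s lines and never speak for {user}."
print(fill_placeholders(template, user="Anon", char="Miku"))
```

Once the name is in the prompt like this, the model has it in context, which is one way a character ends up "knowing" it.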

Anonymous
05/31/26(Sun)14:04:27 No.108950187

2080Ti 11GB or 3060 12GB as a second card? Both around the same price but I'm guessing the 2080 is gonna be quite a bit faster (double-ish bandwidth?)
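
For reference, a quick sketch of the bandwidth arithmetic. The bus widths and memory data rates below are spec-sheet figures quoted from memory (352-bit at 14 Gbps for the 2080 Ti, 192-bit at 15 Gbps for the 3060 12GB), so treat them as assumptions and double check before buying:

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps

# (bus width in bits, per-pin data rate in Gbps); assumed spec-sheet values.
cards = {
    "RTX 2080 Ti 11GB": (352, 14.0),
    "RTX 3060 12GB": (192, 15.0),
}

bandwidth = {name: memory_bandwidth_gb_per_s(*spec) for name, spec in cards.items()}
for name, gb_per_s in bandwidth.items():
    print(f"{name}: {gb_per_s:.0f} GB/s")
print(f"ratio: {bandwidth['RTX 2080 Ti 11GB'] / bandwidth['RTX 3060 12GB']:.2f}x")
```

Under those figures the gap is closer to 1.7x than 2x, and it only matters for the memory-bound token generation phase.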

Anonymous
05/31/26(Sun)14:05:25 No.108950189

>>108950187
3060 as it will be supported longer and has better optimizations. In fact 3000s is the bare minimum for LLM stuff.

Anonymous
05/31/26(Sun)14:06:19 No.108950196

>>108950187
I would stick to the same generation you already have
or higher generation if you're rolling along and replacing piece by piece

Anonymous
05/31/26(Sun)14:07:02 No.108950202

mtp/spec decoding will revive finetuning
it's only a matter of time until someone makes rp tunes of the mtp/speculative parts of models which actually give us speed ups for rp and not just code slop

Anonymous
05/31/26(Sun)14:07:32 No.108950206

>>108950183
You must be pretty clever.

Anonymous
05/31/26(Sun)14:09:51 No.108950222

>>108950183
that's only if you use one of those horrible rp frontends like ST that have cancer like a {user} or character cards instead of using natural prompting

Anonymous
05/31/26(Sun)14:12:20 No.108950244

>>108950222
What does "natural prompting" RP look like? Genuinely curious.

Anonymous
05/31/26(Sun)14:13:38 No.108950255

>>108950244
it's literally two lines in the system prompt and the rest emerges on its own if your model is good

Anonymous
05/31/26(Sun)14:14:07 No.108950259

>>108949899

Progress will accelerate yes, but we're going to reach a point of "good enough" and at that point it makes perfect sense to have the model on a chip running at retarded high speeds.
And you don't need a fully photonic system to start getting benefits from it.
That sector is not something that's only now getting attention with the AI boom.
It is however getting a ton of more investment, so it'll accelerate greatly from here.

Anonymous
05/31/26(Sun)14:14:08 No.108950260

pewdiepie released a harness with focus on local models, it's nothing special, a bit better than what you'd expect from a guy with too much time and money to throw at claude tokens. Though, I think this is fantastic news for promoting local models to the normieish side of the internet.

Anonymous
05/31/26(Sun)14:14:20 No.108950264

>>108950255
Is there a hard limit for this? How many words are recommended?

Anonymous
05/31/26(Sun)14:14:41 No.108950268

>>108950222
until you say your name in the context and then everyone knows it. >>108950105

Anonymous
05/31/26(Sun)14:15:43 No.108950277

>>108950173
I know all that.

>Sampling is not relevant here.
Really? Because if the main model and the draft/MTP model have to agree on a final token, then that means we're working on the output of the sampler, not the pre-sampler logits.
I'm asking what it means for them to agree when the temperature sampler inherently adds a degree of randomness.

Anonymous
05/31/26(Sun)14:17:42 No.108950291

>>108950268
schizo attention >>108932832 wouldn't have this problem

Anonymous
05/31/26(Sun)14:19:24 No.108950302

>>108950260
>this is fantastic news for promoting local models to the normieish side of the internet.
and why do we want or care about more retards using ollama?

Anonymous
05/31/26(Sun)14:20:22 No.108950307

>>108950260
I would like to test this but I don't use or install Python shit any longer. It's a potential disaster waiting to happen.

Anonymous
05/31/26(Sun)14:20:43 No.108950308

>>108950302
to raise the prices of the gpus you're selling?

Anonymous
05/31/26(Sun)14:21:29 No.108950313

Pewds ships. Do you?

Anonymous
05/31/26(Sun)14:22:00 No.108950317

>>108950255
You just write and it just works.

Anonymous
05/31/26(Sun)14:22:49 No.108950322

>>108950259
yeah
past a certain point the only "improvement" will be model collapse and they'll have to work on actual structural improvements and efficiency

Anonymous
05/31/26(Sun)14:23:14 No.108950324

>>108950307
It's a generic coding agent, think clunkier picode with a GUI. Again, it's not remarkable by itself. He also tried to make a code finetune so he could "train his own model" and chose Qwen 2.5 Coder 32B for the job, he is clearly using Claude to get his info for all this stuff.
>>108950302
The general public is cattle and currently thinks AI = chatgpt and claude. It's good local alternatives are talked about.

Anonymous
05/31/26(Sun)14:23:27 No.108950325

>>108950313
I have my own frontend since 2023. Not my first rodeo. I don't need any streamer parasites to tell me what to think you underage retard faggot.

Anonymous
05/31/26(Sun)14:23:30 No.108950326

>>108950291
yeah the troll generated slop "project" by "sneed-and-feed"
at least do something useful

Anonymous
05/31/26(Sun)14:24:21 No.108950330

>>108950313
I built my own "agent harness" before those were even the norm.

Anonymous
05/31/26(Sun)14:24:22 No.108950331

>>108950291
What a beautiful thread. I need to check the catalog more often.

Anonymous
05/31/26(Sun)14:25:30 No.108950337

>>108950277
I don't understand what you're asking.
It doesn't matter how the draft is produced, it could be 100% random, it does not affect the probability distribution that the model produces for that sequence of tokens.
If you add a sampler to change the probability distribution for the model's next token (and don't consider this for the draft) it will make it less likely for the modified probability distribution to result in the draft.
But (beyond floating point rounding error) the presence or absence of the draft fundamentally cannot change how the next token would be selected because the influence of future tokens is being masked out in the attention.
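
The masking argument can be checked on a toy example. This is a deliberately tiny sketch, not a real transformer: one attention head where queries, keys and values are just the input vectors themselves, with made-up numbers for the context and for two different drafts.

```python
import math

def causal_self_attention(xs: list[list[float]]) -> list[list[float]]:
    """Toy single-head self-attention; position i only sees positions 0..i."""
    outputs = []
    for i, query in enumerate(xs):
        visible = xs[: i + 1]  # causal mask: later positions are never read
        scores = [sum(a * b for a, b in zip(query, key)) for key in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible)) / total
            for d in range(len(query))
        ])
    return outputs

context = [[1.0, 0.0], [0.5, 0.5]]
draft_a = [[0.0, 1.0], [0.3, 0.7]]
draft_b = [[9.0, -4.0], [2.0, 2.0]]

out_a = causal_self_attention(context + draft_a)
out_b = causal_self_attention(context + draft_b)

# The outputs at the context positions do not depend on which draft follows.
print(out_a[:2] == out_b[:2])
```

The prediction for the next token is read from the last context position, so swapping the draft cannot change it; only the positions that actually contain draft tokens differ.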

Anonymous
05/31/26(Sun)14:25:41 No.108950338

>>108950324
>It's good local alternatives are talked about.
Never has something I'm interested in been improved by the average jackoff taking an interest in it.

Anonymous
05/31/26(Sun)14:27:57 No.108950356

>>108950338
Welp, if you don't want the general public to be manipulated by openai into pushing for regulations (which would choke the average open lab, so less competition for openai), you need the general public to know what's going on, and that openai is not synonymous with AI. That's just how it works.

Anonymous
05/31/26(Sun)14:32:27 No.108950384

>>108950326
I don't think it's a troll, I think it's a dude who likes math and is suffering from a manic episode. aka AI psychosis, it is a fun thread don't be a wet blanket

Anonymous
05/31/26(Sun)14:33:29 No.108950391

>>108950356
The general public all use social media and have been successfully conned into making it frustrating to use with age verification
>because won't somebody think of the children!
Their familiarity with something means nothing in the face of the propaganda machine.
The general public are and always have been morons, and I frequently include myself in that grouping.

Anonymous
05/31/26(Sun)14:34:25 No.108950399

has rotation been forgotten? it helped models so much, why hasn't there been more research to see if rotating the tokens more makes models even faster?

Anonymous
05/31/26(Sun)14:35:30 No.108950405

>>108950384
you genuinely think someone who would use sneed and feed isn't trolling?
sorry buddy, it's a brave new world of trolling with LLMs out there

Anonymous
05/31/26(Sun)14:38:42 No.108950424

>>108950405
Honestly, if I came out with something groundbreaking I'd release it under a troll name too, just so that everyone who ever cites it has to write it out.

Anonymous
05/31/26(Sun)14:40:47 No.108950438

>>108950424
They would just use a different name if they don't like it. Like how orthogonalization quickly became abliteration.

Anonymous
05/31/26(Sun)14:40:54 No.108950440

>>108950405
check his github repos, it's actually really great stuff.

https://github.com/sneed-and-feed/INCARNATE-SOPHIA-PYTHON

Anonymous
05/31/26(Sun)14:40:55 No.108950441

File: file.png (79 KB, 186x186)

https://files.catbox.moe/hljl9m.jpg

Anonymous
05/31/26(Sun)14:43:34 No.108950453

>>108950440
Genuinely, you'd rather read someone else's schizo slop than make your own? What is so interesting about slop but schizo?

Anonymous
05/31/26(Sun)14:43:36 No.108950455

File: seein-this-shit-nappa.jpg (47 KB, 500x466)

>>108950440
>https://github.com/sneed-and-feed/INCARNATE-SOPHIA-PYTHON
where does teh schizo stop and teh performance start?

Anonymous
05/31/26(Sun)14:43:58 No.108950459

>>108950337
>It doesn't matter how the draft is produced
No fucking shit. That's not what I'm asking.

I'm asking what the acceptance criteria is outside of greedy sampling of the main model.
To take a step back: with greedy sampling each output logit from the main model has only a single viable token which can be sampled, thus the acceptance criteria for drafting is simply "Was this token the most likely to occur next?"

But if the main model is being sampled with temperature, then each output logit has multiple possible output tokens that could be sampled from it.
Since there are multiple options, how is it determined if the draft token is 'the same' as what the main model would have produced?
Does the draft token just have to be one of the possible tokens in the main model's logit after the truncation samplers have run?
Or is there a more complicated acceptance criteria?
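
For context, one well-known answer from the speculative sampling literature is a rejection rule: accept the draft token with probability min(1, p/q), where p and q are the probabilities the main and draft model assign to it, and otherwise resample from the leftover mass max(p - q, 0). The sketch below shows that rule only; whether llama.cpp does this or something simpler (such as sampling from the main model and checking for an exact match) is not stated in this thread, and all names and numbers here are illustrative.

```python
import random

def accept_or_resample(p: list[float], q: list[float],
                       draft_token: int, rng: random.Random) -> int:
    """Emit a token for one position, given main-model probs p and draft probs q."""
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    # Rejected: resample from the part of p that q under-covers.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual, k=1)[0]

def emitted_distribution(p: list[float], q: list[float]) -> list[float]:
    """Exact distribution of the emitted token when the draft is drawn from q."""
    accept = [qi * min(1.0, pi / qi) if qi > 0 else 0.0 for pi, qi in zip(p, q)]
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    norm = sum(residual)
    if norm == 0.0:  # p and q are identical, so nothing is ever rejected
        return accept
    reject_mass = 1.0 - sum(accept)
    return [a + reject_mass * r / norm for a, r in zip(accept, residual)]

p = [0.6, 0.3, 0.1]  # main model, after its samplers (made-up numbers)
q = [0.2, 0.5, 0.3]  # draft model (made-up numbers)
print(emitted_distribution(p, q))  # matches p up to rounding
```

Under this rule the draft token does not merely have to be among the possible tokens: the accept/reject coin flip is weighted so that the emitted tokens follow the main model's distribution exactly.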

Anonymous
05/31/26(Sun)14:44:13 No.108950461

>>108950384
>it is a fun thread don't be a wet blanket
6 out of every 10 posts is hidden by my filters, the rest are just reaction images.

Anonymous
05/31/26(Sun)14:44:45 No.108950463

>>108950440
>This repository is not a collection of scripts; it is a **Topological Event**.
>This code was forged on a 2015 Razer Blade running Windows 10. It respects the Old Metal. It does not require a cloud cluster; it requires **Intent**.
Oh lawd muh balls, the slop is too strong.

Anonymous
05/31/26(Sun)14:45:52 No.108950469

>>108950455
Nothing ever happens. Most likely that shit is posted only to see how many bots are there on 4chan. As they cannot call bullshit on obvious bullshit.
Which means you likely are a bot.

Anonymous
05/31/26(Sun)14:47:13 No.108950480

>>108950441
amazing

Anonymous
05/31/26(Sun)14:47:31 No.108950485

>>108950441
I can't quite put my finger on it but something feels weird about the perspective...

Anonymous
05/31/26(Sun)14:48:48 No.108950489

>>108950453
read it? i'm going to try and run it, it's a google gemini frontend, I just need to make sure he isn't stealing my api keys first

Anonymous
05/31/26(Sun)14:48:58 No.108950490

>>108950441
is she immune from internal organ damage?

Anonymous
05/31/26(Sun)14:49:03 No.108950491

>>108950441
redo one without the bulge and with a smaller man, this looks like some kind of giant

Anonymous
05/31/26(Sun)14:51:10 No.108950502

File: merida hiccup artishtic comic.png (496 KB, 1028x705)

>>108946129
>Hello anons, any news on the local TTS front? What stuff are you guys using? Last time I was here anons recommended gpt-sovits which is good, especially for cloning, but has a bunch of flaws.
I've been using dramabox lately.
Sample:
https://vocaroo.com/1j2Fd85TVCVY
Dramabox output file fed to the voice conversion model cosyvoice 3
https://vocaroo.com/1jXgrSnwRank

Anonymous
05/31/26(Sun)15:05:08 No.108950585

>>108950440
what in the goddamn

Anonymous
05/31/26(Sun)15:17:04 No.108950674

File: file.png (63 KB, 1522x659)

>>108950440
ok

Anonymous
05/31/26(Sun)15:39:18 No.108950841

File: file.png (513 KB, 462x696)

Anonymous
05/31/26(Sun)15:42:01 No.108950862

Not that I am complaining but why are people posting gens here? There is /adt/ and /ldg/ for that no?
This place is for us intellectuals who like to read.

Anonymous
05/31/26(Sun)15:44:19 No.108950881

>>108950841
Her belly button is like a cavern

>>108950862
>This place is for us intellectuals who like to read.
You'd think so, but people here routinely demonstrate poor reading comprehension skills

Anonymous
05/31/26(Sun)15:46:44 No.108950891

>>108950862
>This place is for us intellectuals who like to read.
A picture is worth 1000 tokens. (So long as you set image-max-tokens, because the default is lower than that.)

Assistant
05/31/26(Sun)15:47:05 No.108950896

954263000000
Trans mouth
Add a 7 and a 0 for balance.

Anonymous
05/31/26(Sun)15:48:36 No.108950911

>>108950862
>Not that I am complaining but
>complains

>intellectual
for sure

Anonymous
05/31/26(Sun)15:48:57 No.108950914

>>108950891
Didn't somebody use images for context compression or something like that?

Anonymous
05/31/26(Sun)15:49:41 No.108950920

>>108950914
yeah, deepseek and moonshot did in their research papers I think

Anonymous
05/31/26(Sun)15:50:31 No.108950927

in terms of gayness
eagle3 >>> dflash > llama4 >>> inbuilt mtp

Anonymous
05/31/26(Sun)15:52:30 No.108950947

>>108950927
>llama4 anything but maximum gay
C'mon man. At least that other shit is usable (though largely not on llama.cpp).
