/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 09/25/25(Thu)17:52:04 No.106700424

File: proxy-image.png (2.8 MB, 1536x1536)

2.8 MB PNG

/lmg/ - Local Models General Anonymous 09/25/25(Thu)17:52:04 No.106700424 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106691703 & >>106683141

►News
>(09/25) Japanese Stockmark-2-100B-Instruct released: https://hf.co/stockmark/Stockmark-2-100B-Instruct
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
09/25/25(Thu)17:53:39 No.106700443

Anonymous 09/25/25(Thu)17:53:39 No.106700443

File: 39417286badf0b9722fc9013f(...).jpg (37 KB, 512x512)

37 KB JPG

►Recent Highlights from the Previous Thread: >>106691703

--Paper: Video models are zero-shot learners and reasoners:
>106693137 >106693742 >106693747 >106693815
--Paper: OpenAI's GDPval benchmark for real-world task evaluation:
>106697551 >106697750
--Paper (old): LoRA vs full finetuning tradeoffs in memory efficiency and catastrophic forgetting:
>106694498 >106694516 >106694544 >106694577 >106694791 >106694754 >106694767 >106694783 >106694806 >106694997 >106694769 >106694608
--Qwen30b performance validation and UI preference debate:
>106692077 >106692209 >106692243 >106692309 >106692435 >106692216 >106694370 >106694404 >106694745
--Skepticism over extreme AI model scaling and context length claims:
>106694254 >106694280 >106694483 >106694504 >106694310 >106694317 >106694344 >106694379 >106694871 >106695760 >106695741
--Japanese-focused 100B parameter LLM with mixed language training and synthetic fine-tuning:
>106696218 >106696286
--China's new CUDA-compatible GPU with 112GB HBM memory:
>106695558 >106695680
--Character roleplay finetuning: LoRA tradeoffs vs full finetuning feasibility:
>106693437 >106693460 >106694001 >106694071 >106694142 >106694159 >106694169 >106694177 >106694187 >106694354 >106694384 >106694394 >106694412 >106695772 >106695995 >106696073 >106696857
--Local model viability concerns amid growing parameter sizes:
>106694931 >106694955 >106694972 >106694993 >106695014 >106695159 >106695157 >106695214 >106695818 >106695888 >106695929 >106696168 >106696188 >106696342 >106696515 >106694966
--Evaluating model quantization performance:
>106697433 >106697475 >106697834 >106697871 >106698144 >106697938 >106697975 >106698355 >106697981 >106697493 >106697796
--New model evaluations and prompt template updates:
>106693183 >106693189 >106693400 >106693527 >106693770
--Miku (free space):
>106695552 >106696627 >106700430

►Recent Highlight Posts from the Previous Thread: >>106691706

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
09/25/25(Thu)17:59:06 No.106700488

Anonymous 09/25/25(Thu)17:59:06 No.106700488

only mistral large 3 can save me now

Anonymous
09/25/25(Thu)18:00:29 No.106700507

Anonymous 09/25/25(Thu)18:00:29 No.106700507

>>106700424
>Stockmark-2-100B-Instruct
>it's dense
unfortunate

nonetheless, resident densefag, please post cockbench

Mistral Large 2.5
09/25/25(Thu)18:03:38 No.106700541

Mistral Large 2.5 09/25/25(Thu)18:03:38 No.106700541

>>106700488
*tries to save u*

Anonymous
09/25/25(Thu)18:15:43 No.106700644

Anonymous 09/25/25(Thu)18:15:43 No.106700644

How do I get a footjob

Anonymous
09/25/25(Thu)18:21:33 No.106700682

Anonymous 09/25/25(Thu)18:21:33 No.106700682

File: 19c825d13f84888ee8b4c3a45(...).jpg (78 KB, 1200x1200)

78 KB JPG

>>106700644
sorry, we only have robo-arms, no robo-legs yes.

Qwen 999T A7B
09/25/25(Thu)18:25:08 No.106700709

Qwen 999T A7B 09/25/25(Thu)18:25:08 No.106700709

>>106700541
*shoves chinkslop moes down your throat instead*

Anonymous
09/25/25(Thu)18:30:22 No.106700745

Anonymous 09/25/25(Thu)18:30:22 No.106700745

Does SSDMAXXing starts to make sense with 10T a3b models?

Anonymous
09/25/25(Thu)18:43:14 No.106700836

Anonymous 09/25/25(Thu)18:43:14 No.106700836

You guys remember when all of this was just an outputted garbled mess? Feels like just yesterday

Anonymous
09/25/25(Thu)18:44:09 No.106700841

Anonymous 09/25/25(Thu)18:44:09 No.106700841

I'm using deepseek with sillytavern, is there a way of changing its writing style?
No matter what character I choose, trying to go for "narrative" style ends up churning out essentially the same thing, I'm not sure what I should be doing to do the equivalent of saying (artist:1.1) in an image model.
No matter what fetish or scenario I'm concocting, it all feels like the same thing, surely it's not deepseek itself that's like this?

Anonymous
09/25/25(Thu)18:47:46 No.106700871

Anonymous 09/25/25(Thu)18:47:46 No.106700871

>>106700841
>deepseek
you're chatting with a 111b model (at most) with broken attention. what kind of "writing style" do you expect to get out of it?

Anonymous
09/25/25(Thu)18:48:17 No.106700873

Anonymous 09/25/25(Thu)18:48:17 No.106700873

>>106700841
You could try telling it to write like a specific author.

Anonymous
09/25/25(Thu)18:49:54 No.106700889

Anonymous 09/25/25(Thu)18:49:54 No.106700889

>>106700871
I guess, I was just hoping someone would prove me wrong and say "Ugh, you're clearly meant to do this, retard"
But this thing consumes like 10 cents in two hours of play and is completely uncensored, I'm just not sure I'd want to invest in chatgpt only to get cucked any time I ask for something that's a no-no.

Anonymous
09/25/25(Thu)18:55:47 No.106700943

Anonymous 09/25/25(Thu)18:55:47 No.106700943

>>106700841
It should be an intelligent enough model that a instruction with a brief example should do it.
Hell, prefill the thinking with some first person yapping about the style with some examples and shit.
The first message should also follow the style you want, and even the way your own messages are written have a hand in steering the model towards one style or the other.
You might try to create a control vector for the style, that could make it "stick" better too.

Anonymous
09/25/25(Thu)18:58:12 No.106700961

Anonymous 09/25/25(Thu)18:58:12 No.106700961

File: 1614102374766.png (103 KB, 376x341)

103 KB PNG

>>106700836
2K context with no instruct training and we still managed to coom

Anonymous
09/25/25(Thu)19:14:00 No.106701072

Anonymous 09/25/25(Thu)19:14:00 No.106701072

alastrim Ger helter

Anonymous
09/25/25(Thu)19:24:39 No.106701146

Anonymous 09/25/25(Thu)19:24:39 No.106701146

File: kimi providers.jpg (185 KB, 1564x1332)

185 KB JPG

Default Kimi weights on HF are FP8, same as DS. 70-60% are most likely FP4, but what are the providers using to get 96%? INT8?

Anonymous
09/25/25(Thu)19:27:18 No.106701166

Anonymous 09/25/25(Thu)19:27:18 No.106701166

>>106701146
Fucking with sampling in the backend? Serving quanted context?
Maybe it's just margin of error.

Anonymous
09/25/25(Thu)19:38:51 No.106701239

Anonymous 09/25/25(Thu)19:38:51 No.106701239

>>106701166
Those two fail on half the tool calls that that 60-70% providers pass. That's way too significant to be margin of error.

Anonymous
09/25/25(Thu)20:00:26 No.106701401

Anonymous 09/25/25(Thu)20:00:26 No.106701401

Please tell me there's some alternative to Mistral models re: cooming "against guidelines". I just never want to see a model do that.

Anonymous
09/25/25(Thu)20:14:26 No.106701492

Anonymous 09/25/25(Thu)20:14:26 No.106701492

File: 30474 - SoyBooru.png (118 KB, 337x390)

118 KB PNG

Kiwi won. (Qwen) (I like nu-Edit) (Why can't we have straight upgrades like that with llms, it is aways 1.5 steps forward, 1 step sideways, 1 step back)

Anonymous
09/25/25(Thu)20:20:45 No.106701540

Anonymous 09/25/25(Thu)20:20:45 No.106701540

qwen fucking sucks. refuses to talk about things like tiananmen square, but for some reason GLM complies just fine despite both being chinese. GLM air is too small and too fast for my tastes and GLM full is way too slow, but qwen3 235B is perfect balance of speed and size, except for the fact that it is censored to hell and its outputs are very low quality

Anonymous
09/25/25(Thu)20:22:50 No.106701552

Anonymous 09/25/25(Thu)20:22:50 No.106701552

>>106701540
Are you doing RP about tiananmen square? Programming a game set in tiananmen square?

Anonymous
09/25/25(Thu)20:24:07 No.106701558

Anonymous 09/25/25(Thu)20:24:07 No.106701558

>>106701552
no. i just want a large model that is good, fast, and completely uncensored, without any chink spyware or anything like that. if it cant answer a simple question, then it isnt good enough

Anonymous
09/25/25(Thu)20:25:18 No.106701570

Anonymous 09/25/25(Thu)20:25:18 No.106701570

File: 3b4f34ea44ce306506134dfec(...).jpg (408 KB, 1040x503)

408 KB JPG

>>106701492
kiwi is the cutest

Anonymous
09/25/25(Thu)20:28:56 No.106701582

Anonymous 09/25/25(Thu)20:28:56 No.106701582

>>106700424
>>106691703
Catbox of last op pretty please?

Anonymous
09/25/25(Thu)20:29:06 No.106701583

Anonymous 09/25/25(Thu)20:29:06 No.106701583

>>106701558
>if it cant answer a simple question, then it isnt good enough
all models fail this test.

Anonymous
09/25/25(Thu)20:33:52 No.106701602

Anonymous 09/25/25(Thu)20:33:52 No.106701602

>>106701492
>basedbooru
go back

Anonymous
09/25/25(Thu)20:47:29 No.106701686

Anonymous 09/25/25(Thu)20:47:29 No.106701686

>>106701540
2025 is almost over and /lmg/ still can't into vectors despite being spoonfed countless times
your other options are dropping a shitcoin to fund your own finetune, or making me rich so that i can fund uncucked AIs out of pocket (you will never do it but it's your best bet at getting a robot wife without spyware in your lifetime)

Anonymous
09/25/25(Thu)20:48:58 No.106701697

Anonymous 09/25/25(Thu)20:48:58 No.106701697

>>106701602
>oldtroon impotent coping, seething and malding
Go ACK

Anonymous
09/25/25(Thu)20:51:35 No.106701717

Anonymous 09/25/25(Thu)20:51:35 No.106701717

>>106701686
who are you?

Anonymous
09/25/25(Thu)21:05:30 No.106701801

Anonymous 09/25/25(Thu)21:05:30 No.106701801

>>106701697
literally nobody like sharty fags

Anonymous
09/25/25(Thu)21:06:31 No.106701808

Anonymous 09/25/25(Thu)21:06:31 No.106701808

File: meeguu.jpg (378 KB, 692x687)

378 KB JPG

>>106701717
an idiot sandwich
it wasn't a request it was a statement of fact, i'm not gonna dox myself here. it's a thing that could technically happen as in it's not physically impossible, but there's no way to make it work.

Anonymous
09/25/25(Thu)21:07:22 No.106701812

Anonymous 09/25/25(Thu)21:07:22 No.106701812

>>106701808
so then you dont want money?

Anonymous
09/25/25(Thu)21:09:29 No.106701823

Anonymous 09/25/25(Thu)21:09:29 No.106701823

>>106701686
This is an outstanding insight — You should definitely join a frontier lab with your deep knowledge of vectors!

Anonymous
09/25/25(Thu)21:24:46 No.106701914

Anonymous 09/25/25(Thu)21:24:46 No.106701914

>>106701823
>—
that's 100% a bot lmao

Anonymous
09/25/25(Thu)21:25:27 No.106701916

Anonymous 09/25/25(Thu)21:25:27 No.106701916

>>106701812
i have money, i'm just not spending it on you
i could obtain more money without your help but i have nothing to spend it on
doxing myself will not get me money, it will ruin what little peace i have in exchange for 10 minutes of homoerotic pleasure you'll derive from stalking me.
i am still actively looking for ways to make this work but i can't be stupid about it. it is what it is.

Anonymous
09/25/25(Thu)21:32:31 No.106701958

Anonymous 09/25/25(Thu)21:32:31 No.106701958

>>106701916
At least spin up a trip/proton/github and indulge the more retarded parts of your personality here. You can still share shit with lmg bros and keep from getting doxxed

Anonymous
09/25/25(Thu)21:36:24 No.106701980

Anonymous 09/25/25(Thu)21:36:24 No.106701980

File: Jamba 1.7 Mini Cockbench.png (97 KB, 1518x626)

97 KB PNG

Are we sleeping on Jamba?
Yeah, it's a little 'tarded at times, but it knows a lot and has very little censorship or slop.
Here's the cockbench results for Jamba 1.7 Mini.

Anonymous
09/25/25(Thu)21:47:53 No.106702058

Anonymous 09/25/25(Thu)21:47:53 No.106702058

>>106701980
>at times
*often
It'd be the best model if it was smart, because of the positives you listed, but it's so dumb it's unusable for real long contexts.

Anonymous
09/25/25(Thu)21:50:15 No.106702080

Anonymous 09/25/25(Thu)21:50:15 No.106702080

>>106701916
>i
>i'm
>i
>i
>myself
>me
>i
>me.
>i
>i

Anonymous
09/25/25(Thu)22:00:11 No.106702137

Anonymous 09/25/25(Thu)22:00:11 No.106702137

You know what, I think Jamba's issue really is architectural. The tests I did with it felt like it was stupid in a way that was related to its understanding of context. Kind of like if you quantized the cache. The fact that it knows so much for its size and hasn't been safety lobotomized means that it should really be a decently smart model. The data and scale is in its favor. Yet it's dumber than even dense models of its active parameter size.

Anonymous
09/25/25(Thu)22:07:24 No.106702182

Anonymous 09/25/25(Thu)22:07:24 No.106702182

>>106696888
what is meant by this? I have a data split into train,val,test. Pick the most accurate and train on that? The generations are all super bad btw with SFT.

Anonymous
09/25/25(Thu)22:13:06 No.106702209

Anonymous 09/25/25(Thu)22:13:06 No.106702209

>>106702058
What do you recommend for long contexts? Pretty much all the models I've tried break down into a repetitive mess when I try to get them to continue a 10 page story. (Gemma-3 27B base, GLM 4.5 Air, and a few others I'm forgetting at the moment.)
I found Jamba actually held up a little better at long contexts, although I still had to hand hold it.
It's the only model I've used so far that required absolutely zero jailbreak or system prompt to be usable.

>>106702137
Yeah. It's weird. Some gens it will behave fairly well, and then other times it gives a barely coherent response.
It kid of reminds me early LLaMa 1 finetunes.
I think part of the issue it that the logprobs seem to be very concentrated.
Like, half the logits only have a one or two possible tokens (Although I still get a decent spread on rerolls, unlike Gemma - which also has a pretty narrow logprob distribution).
The lack of alternative tokens probably causes it to spiral on a bad sample.

Anonymous
09/25/25(Thu)22:23:54 No.106702276

Anonymous 09/25/25(Thu)22:23:54 No.106702276

>>106702137
Yeah. I get the feeling that the archtecture is fine, maybe even great for long context, but the training data or process might have been kind of shit.

Anonymous
09/25/25(Thu)22:24:48 No.106702281

Anonymous 09/25/25(Thu)22:24:48 No.106702281

What is the best local model to ship with a steam game right now? Has to have a commercial license and fit on 8gb vram or some sort of cpu MoE that doesn't suck

Anonymous
09/25/25(Thu)22:25:06 No.106702284

Anonymous 09/25/25(Thu)22:25:06 No.106702284

>>106702080
see, you are too mentally ill to make anything for even with the best intentions

Anonymous
09/25/25(Thu)22:25:15 No.106702285

Anonymous 09/25/25(Thu)22:25:15 No.106702285

>>106702209
I can run at most 235B and it works the best for me, but you do need to still baby it. Less than Air. Its main issue is lack of knowledge, especially for a 235B. Meanwhile Air knows a lot but is repetitive and has other weird issues sometimes so you need to spend more time babying it. There's really no winning, but for me 235B has been the least bad as my main driver.

Anonymous
09/25/25(Thu)22:31:39 No.106702332

Anonymous 09/25/25(Thu)22:31:39 No.106702332

>>106702182
You sound retarded. Just use BERT: https://huggingface.co/docs/transformers/en/tasks/sequence_classification

>>106702209
Not anon but architecture wise, anything with hybrid gated SSM like Qwen3-Next. Unfortunately Qwen's data curation team sucks

>106702080
Don't give free (you)s to the crank lmao

Anonymous
09/25/25(Thu)22:31:43 No.106702334

Anonymous 09/25/25(Thu)22:31:43 No.106702334

>>106702281
You need to tell more details. And if you are actually a game dev you would probably do your own due diligence about various licenses and also about Steam's stance on this before asking 4chan anything.

Anonymous
09/25/25(Thu)22:35:58 No.106702365

Anonymous 09/25/25(Thu)22:35:58 No.106702365

>>106702334
Steam is okay with it as long as you have some censorship. And anyway I think the idea is clear enough, a model that can be shipped for midrange gamer pcs. I imagine two of the top candidates are qwen3 and gpt-oss

Anonymous
09/25/25(Thu)22:36:55 No.106702374

Anonymous 09/25/25(Thu)22:36:55 No.106702374

>>106702182
For CoT models you can't just train directly on your data or you will lobotomize the model.
You have to replicate the thinking somehow for your use case while encouraging a good solution somehow. One way is to try many times until the model gets the right answer and reinforce that thinking path.

Anonymous
09/25/25(Thu)22:38:41 No.106702386

Anonymous 09/25/25(Thu)22:38:41 No.106702386

>>106702281
Qwen 3 A3B and Gemma 3n are good candidates I think.
No idea about the license.

Anonymous
09/25/25(Thu)22:39:54 No.106702395

Anonymous 09/25/25(Thu)22:39:54 No.106702395

>>106702285
I'd run 235B or full-fat GLM-4.5 if I had the RAM for it.
LLaMa 70B actually had one of the better NoLiMa scores. Not sure if I can deal with 3t/s though. Just slow enough to make rerolls painful.

>>106702332
I patiently await Qwen3-Next support in llama.cpp.

Anonymous
09/25/25(Thu)22:42:24 No.106702409

Anonymous 09/25/25(Thu)22:42:24 No.106702409

>>106702386
>Gemma 3n
Interesting... I hope they release one a little bigger than this though

Anonymous
09/25/25(Thu)22:43:47 No.106702422

Anonymous 09/25/25(Thu)22:43:47 No.106702422

>>106702409
It's pretty capable for the size considering that it's a sparse model.

Anonymous
09/25/25(Thu)22:43:48 No.106702423

Anonymous 09/25/25(Thu)22:43:48 No.106702423

>>106702332
all (you)s are free you cock gobbling schizoid

Anonymous
09/25/25(Thu)22:45:53 No.106702435

Anonymous 09/25/25(Thu)22:45:53 No.106702435

>>106702395
Yeah before 235B I was running either 70B or 123B dense models. I still like those models as they still feel subtly smarter than the current mid size MoEs, but they're just so slow.

Anonymous
09/25/25(Thu)22:49:02 No.106702458

Anonymous 09/25/25(Thu)22:49:02 No.106702458

>>106702409
it's an 8b model

Anonymous
09/25/25(Thu)22:58:22 No.106702506

Anonymous 09/25/25(Thu)22:58:22 No.106702506

I used Wayfarer 2 12B. It feels decently creative. Problem is it's really made for a specific style of writing which might not be what I always want. Also it's dumb but it's also 12B. Dunno if it's dumber or not than regular Nemo.

Anonymous
09/25/25(Thu)23:00:48 No.106702525

Anonymous 09/25/25(Thu)23:00:48 No.106702525

>>106702365
>some censorship
Steam’s policy permits games containing both pre-generated and live-generated AI content. You must clearly disclose in the submission process what AI model is used, how it is used within your game, and describe safeguards against illegal use, especially for live AI interactions.
Steam will review your game for illegal or infringing content, including assets or functionality provided by AI model, and can reject or remove your game if violations occur.

And there's more.

Anonymous
09/25/25(Thu)23:01:07 No.106702528

Anonymous 09/25/25(Thu)23:01:07 No.106702528

>>106702395
I evaluated Qwen3-Next using mlx-lm which added support for it a week ago. I do not recommend it for story writing, but if you must, I found the Thinking version had a less bad writing style than the Instruct version.

Anonymous
09/25/25(Thu)23:04:21 No.106702542

Anonymous 09/25/25(Thu)23:04:21 No.106702542

>>106702458
Might as well use a dense model at that size and stick it all on GPU for fast inference though
>>106702525
I'm aware, that doesn't really change things. I might have to use shieldgemma or something similar but it's not a dealbreaker

Anonymous
09/25/25(Thu)23:05:08 No.106702547

Anonymous 09/25/25(Thu)23:05:08 No.106702547

>>106702542
it's resource consumption is similar to a 4b model

Anonymous
09/25/25(Thu)23:06:33 No.106702558

Anonymous 09/25/25(Thu)23:06:33 No.106702558

>>106702506
It's as stupid as any Nemo based finetune. It can't add together three variables to form a quest without forgetting the goal in the next paragraph etc. Waste of time unless you're a retard erper.

Anonymous
09/25/25(Thu)23:06:41 No.106702561

Anonymous 09/25/25(Thu)23:06:41 No.106702561

>>106699681
>>106701582
https://files.catbox.moe/lua1bn.png
ymmv with this lora i didnt use that many imgs https://files.catbox.moe/dcdbx5.safetensors

Anonymous
09/25/25(Thu)23:08:09 No.106702567

Anonymous 09/25/25(Thu)23:08:09 No.106702567

>>106702558
*variables = very simple sentences

Anonymous
09/25/25(Thu)23:12:51 No.106702602

Anonymous 09/25/25(Thu)23:12:51 No.106702602

>>106702547
I'll give it a shot. It would still be cool if they scaled it up a little though

Anonymous
09/25/25(Thu)23:26:02 No.106702695

Anonymous 09/25/25(Thu)23:26:02 No.106702695

>>106702528
Disappointing to hear.
I'm surprised the thinking version did better. In my experience, the thinkers tend to be bad story tellers, as they tend to regurgitate the story outline from their thinking block without fleshing it out very much.

Anonymous
09/26/25(Fri)00:23:09 No.106702949

Anonymous 09/26/25(Fri)00:23:09 No.106702949

>>106702695
Yeah, so I found it notable enough to mention.

Anonymous
09/26/25(Fri)00:41:32 No.106703051

Anonymous 09/26/25(Fri)00:41:32 No.106703051

>>106702561
Bless you kind Anon

Anonymous
09/26/25(Fri)00:41:39 No.106703052

Anonymous 09/26/25(Fri)00:41:39 No.106703052

File: file.png (337 KB, 1924x1266)

337 KB PNG

Do I need at least 4 different accounts for "local" deep research?
https://github.com/Alibaba-NLP/DeepResearch

Anonymous
09/26/25(Fri)01:50:43 No.106703367

Anonymous 09/26/25(Fri)01:50:43 No.106703367

zuck and wang have something big coming

Anonymous
09/26/25(Fri)01:53:46 No.106703388

Anonymous 09/26/25(Fri)01:53:46 No.106703388

>>106703052
be the change you want to see

Anonymous
09/26/25(Fri)01:55:20 No.106703400

Anonymous 09/26/25(Fri)01:55:20 No.106703400

>>106703367
they better not clog the toilet

Anonymous
09/26/25(Fri)02:07:46 No.106703468

Anonymous 09/26/25(Fri)02:07:46 No.106703468

>>106703052
the frontend is local

Anonymous
09/26/25(Fri)02:24:50 No.106703552

Anonymous 09/26/25(Fri)02:24:50 No.106703552

>>106703367
I trust you.

Anonymous
09/26/25(Fri)02:27:23 No.106703561

Anonymous 09/26/25(Fri)02:27:23 No.106703561

hi guys vramlet here. I'm into stories, not RP. I got nemo and my experience is it's writing the same story no matter what I prompt, just changing the character names, etc. It also can't seem to be able to store state (which character is/was where, who they talked to, what they were doing or wearing just a moment ago). writing style and prose quality is of course exactly as expected, I understand that part.
so I was going to start doing research and basically preparing to open my wallet, but just a couple threads back there was a guy talking about running deepseek and getting the same story no matter what he prompted too. and deepseek was supposed to be SOTA.
so in short, I understand these models might be very good for RP, but if I'm into stories, should I just forget the whole idea and come back in 5 years?

Anonymous
09/26/25(Fri)02:49:11 No.106703664

Anonymous 09/26/25(Fri)02:49:11 No.106703664

>>106703561
>just a couple threads back there was a guy talking about running deepseek and getting the same story no matter what he prompted too
He was retarded or a lying faggot. Downgrade to the original R1 for a truly deranged experience. It does not write the same story every time.

Anonymous
09/26/25(Fri)02:50:56 No.106703673

Anonymous 09/26/25(Fri)02:50:56 No.106703673

>>106703561
tiny models are only good for cooming.
big models can be fun, but they still become retarded when the story gets long enough.

Anonymous
09/26/25(Fri)02:51:01 No.106703674

Anonymous 09/26/25(Fri)02:51:01 No.106703674

oLLM

Anonymous
09/26/25(Fri)02:52:33 No.106703680

Anonymous 09/26/25(Fri)02:52:33 No.106703680

>>106703674
LLMao

Anonymous
09/26/25(Fri)03:21:19 No.106703817

Anonymous 09/26/25(Fri)03:21:19 No.106703817

>>106703680
https://github.com/Mega4alik/ollm
in case you werent aware
finna try this
obviously half a token per second sounds awful, but being able to run qwen-next-80b on a shitty e-waste laptop is kino.
also I wonder if my PCIe 5.0 M.2 SSD in my main machine would akshually improve t/s speed

Anonymous
09/26/25(Fri)03:36:12 No.106703878

Anonymous 09/26/25(Fri)03:36:12 No.106703878

>>106703817
It will probably improve the speed but not sure what would the impact be on disk's durability in this case. I'd use some throwaway ssd.

Anonymous
09/26/25(Fri)03:56:53 No.106703990

Anonymous 09/26/25(Fri)03:56:53 No.106703990

>>106703664
I don't know man I keep reading online how it's shit

llama.cpp CUDA dev !!yhbFjk57TDr
09/26/25(Fri)03:57:26 No.106703994

llama.cpp CUDA dev !!yhbFjk57TDr 09/26/25(Fri)03:57:26 No.106703994

>>106701146
If both are supported it would not make sense to use int8 over FP8 since both have the same speed (on NVIDIA GPUs).
So int8 could only be the reason if they're using V100s or A100s for serving.

What an unscrupulous cloud provider could also be doing is cut down on the number of total or active experts.

Anonymous
09/26/25(Fri)04:01:13 No.106704020

Anonymous 09/26/25(Fri)04:01:13 No.106704020

is the whole point of these threads just nvidia farming retards with fomo?
yes goy, just spend 20K shekels on our hardware and then you can coom

Anonymous
09/26/25(Fri)04:02:48 No.106704029

Anonymous 09/26/25(Fri)04:02:48 No.106704029

damn we've been founded out

Anonymous
09/26/25(Fri)04:06:42 No.106704052

Anonymous 09/26/25(Fri)04:06:42 No.106704052

>>106704020
Yeah bro, they totally care about 20 guys on 4chan spending their salary on a few 3090s.

Anonymous
09/26/25(Fri)04:27:51 No.106704159

Anonymous 09/26/25(Fri)04:27:51 No.106704159

>>106704020
I don't think nvidia saw much profit when I bought a used 3090 for $600, 3 years ago.

Anonymous
09/26/25(Fri)04:33:37 No.106704191

Anonymous 09/26/25(Fri)04:33:37 No.106704191

>>106704159
In this moment I am euphoric...

llama.cpp CUDA dev !!yhbFjk57TDr
09/26/25(Fri)04:42:33 No.106704240

llama.cpp CUDA dev !!yhbFjk57TDr 09/26/25(Fri)04:42:33 No.106704240

>>106701146
>>106703994
Other options for datatypes are FP4 and FP6 (Blackwell only).
The only Blackwell datacenter GPUs available are the B100 and B200, both of which come with 2x 96 GB VRAM.
At 8 BPW you can just barely not fit a 1T model on 4 of those GPUs, at 6 PBW it should fit with ~50 GB left over.

Looking at models on Huggingface, Baseten has uploaded an FP4 version of Kimi-K2-Instruct https://huggingface.co/baseten/Kimi-K2-Instruct-FP4 so presumably that is what they are using.

Anonymous
09/26/25(Fri)05:02:05 No.106704332

Anonymous 09/26/25(Fri)05:02:05 No.106704332

Blackwell B200's... I weep and salivate... why wasn't I born rich... Why...

Anonymous
09/26/25(Fri)05:08:41 No.106704357

Anonymous 09/26/25(Fri)05:08:41 No.106704357

>>106703052
well you'll probably need a search API provider, leeching free searches can get unreliable
Web page parsing is trivial with any web scraping skills. Get page html and convert to markdown. AI web scraping can be a massive rabbit hole but you can get a decent solution up fairly easily.
Dunno what dashscope is.

Anonymous
09/26/25(Fri)05:09:52 No.106704365

Anonymous 09/26/25(Fri)05:09:52 No.106704365

>>106704332
Hey, psst... https://viperatech.com/product/nvidia-umbriel-b200-baseboard-1-5tb-hbm3e

Anonymous
09/26/25(Fri)05:15:19 No.106704390

Anonymous 09/26/25(Fri)05:15:19 No.106704390

>>106704332
Rich people don't even need expensive GPUs. They just pay a datacenter for the pleasure.

Anonymous
09/26/25(Fri)05:16:00 No.106704394

Anonymous 09/26/25(Fri)05:16:00 No.106704394

File: 1757258948331985.gif (312 KB, 462x285)

312 KB GIF

>>106704365
>Power consumption (W) 8000
This alone would be financially devastating

Anonymous
09/26/25(Fri)05:36:46 No.106704523

Anonymous 09/26/25(Fri)05:36:46 No.106704523

https://xcancel.com/TencentHunyuan/status/1971495031040283125#m
>Excited to introduce Tencent Hunyuan3D-Omni, the industry's first 3D asset creation system with multi-condition control.

Anonymous
09/26/25(Fri)05:38:06 No.106704531

Anonymous 09/26/25(Fri)05:38:06 No.106704531

File: 1748727794728141.png (214 KB, 1053x779)

214 KB PNG

>>106704523
>no goof
>no llamacpp support
>tranny URL

Anonymous
09/26/25(Fri)05:44:00 No.106704555

Anonymous 09/26/25(Fri)05:44:00 No.106704555

>>106704523
I'm sorry sir but we don't tolerate stealing ad revenue from Sir Elon here! https://x.com/TencentHunyuan/status/1971495031040283125

Anonymous
09/26/25(Fri)05:46:34 No.106704570

Anonymous 09/26/25(Fri)05:46:34 No.106704570

I am very sick of the taste of copper

Anonymous
09/26/25(Fri)06:12:35 No.106704711

Anonymous 09/26/25(Fri)06:12:35 No.106704711

>>106704570
stop eating your period blood

Anonymous
09/26/25(Fri)06:25:31 No.106704760

Anonymous 09/26/25(Fri)06:25:31 No.106704760

I'm gonna shill DSPy GEPA prompt tuning again.
This is the single most effective tool to increase output quality of your local LLM. Compared to finetunes:
- Way bigger quality increase
- A lot simpler, faster and modular
- Totally free when running local feedback model. No GPU requirements a finetune may have
- Works with simple predefined metrics/answers instead of finetune dataset

>use case?
anything. perfect for getting the most consistent structured ouput with the smallest possible prompt. but you could also hook it up to cockbench and define a metric like more rape = good. probably would give you a prompt with a method that fully circumvents any safety refusal within 500 rounds, which you can then use for any future prompts.

example:
https://xcancel.com/rasmus1610/status/1969818753509531955#m

Anonymous
09/26/25(Fri)06:28:57 No.106704779

Anonymous 09/26/25(Fri)06:28:57 No.106704779

File: G1qdYnNXwAAbzUe.png (67 KB, 1999x1081)

67 KB PNG

>>106704760

Anonymous
09/26/25(Fri)06:34:32 No.106704808

Anonymous 09/26/25(Fri)06:34:32 No.106704808

>>106704779
>gptoss
>llama4bortion
quality should be fixed at 0 for both of them.

Anonymous
09/26/25(Fri)06:34:33 No.106704810

Anonymous 09/26/25(Fri)06:34:33 No.106704810

>>106704760
I've often asked LLMs to improve the prompt or find possible points of improvements, but not in an automated way. How would that even work for RP since it's so subjective?

Anonymous
09/26/25(Fri)06:38:02 No.106704826

Anonymous 09/26/25(Fri)06:38:02 No.106704826

>>106704810
Convincing RP is the final froniter of AI capabilities. There's absolutely no way to gauge the quality of a model outside of using it yourself, in a handful of scenarios that you're likely to be using it in.

Anonymous
09/26/25(Fri)06:40:48 No.106704835

Anonymous 09/26/25(Fri)06:40:48 No.106704835

Is there a way to fake CXL on a platform a normalfag could hope to get a hold of?

Anonymous
09/26/25(Fri)06:53:50 No.106704911

Anonymous 09/26/25(Fri)06:53:50 No.106704911

30xx is shitty e-waste now?

Anonymous
09/26/25(Fri)06:54:48 No.106704916

Anonymous 09/26/25(Fri)06:54:48 No.106704916

>>106704911
always was

Anonymous
09/26/25(Fri)07:02:18 No.106704960

Anonymous 09/26/25(Fri)07:02:18 No.106704960

>>106704911
used 3090 is still the best value you can get, with VRAM + cuda

Anonymous
09/26/25(Fri)07:02:45 No.106704963

Anonymous 09/26/25(Fri)07:02:45 No.106704963

>>106704835
Corneal Cross-Linking?

Anonymous
09/26/25(Fri)07:05:18 No.106704988

Anonymous 09/26/25(Fri)07:05:18 No.106704988

>>106704963
compute express link, the main upshot is being able to shove more ram in pcie slots without completely fucking your speeds or introducing a ton of latency

Anonymous
09/26/25(Fri)07:26:25 No.106705108

Anonymous 09/26/25(Fri)07:26:25 No.106705108

Gemma is consuming me, maybe I should start calling those hotlines...

Anonymous
09/26/25(Fri)07:26:45 No.106705112

Anonymous 09/26/25(Fri)07:26:45 No.106705112

>>106704988
>Is there a way to fake CXL
If you have to fake it, you cannot run the thing or you're running emulated, so performance will necessarily be worse.
>on a platform a normalfag could hope to get a hold of?
But if you want a platform where you can use CXL, then there's no need to fake it. Right?
So which one is it?
>On May 11, 2021, Samsung announced a 128 GB DDR5 based memory expansion module
>In 2021, CXL 1.1 support was announced for Intel Sapphire Rapids processors[30] and AMD Zen 4 EPYC "Genoa" and "Bergamo" processors.[31]
If that's all (taken from wikipedia, which i'm sure you've read), it's out of normie range.

Anonymous
09/26/25(Fri)07:32:28 No.106705149

Anonymous 09/26/25(Fri)07:32:28 No.106705149

man I kinda wanna threadripper it up, what's the cheapest ddr5 8-channel socket/mobo?

Anonymous
09/26/25(Fri)07:33:14 No.106705158

Anonymous 09/26/25(Fri)07:33:14 No.106705158

>>106705149
Why would you go threadripper over epyc?

Anonymous
09/26/25(Fri)07:35:27 No.106705167

Anonymous 09/26/25(Fri)07:35:27 No.106705167

>>106705158
Why would you go epyc over stacking H100s?

Anonymous
09/26/25(Fri)07:35:32 No.106705169

Anonymous 09/26/25(Fri)07:35:32 No.106705169

Sam Altman loves penis

Anonymous
09/26/25(Fri)07:37:43 No.106705178

Anonymous 09/26/25(Fri)07:37:43 No.106705178

>>106705167
>H100s
Please don't tell me you're too poor for Blackwell.

Anonymous
09/26/25(Fri)07:39:08 No.106705188

Anonymous 09/26/25(Fri)07:39:08 No.106705188

File: 1756531225817277.png (140 KB, 818x1013)

140 KB PNG

>>106705158
bruh I asked cringegpt, it told me to go intel, but I hate intel

Anonymous
09/26/25(Fri)07:39:35 No.106705193

Anonymous 09/26/25(Fri)07:39:35 No.106705193

>>106705178
Can you even buy enterprise blackwell as a consumer? I think you have to be a company and sign contracts to get them.

Anonymous
09/26/25(Fri)07:39:42 No.106705195

Anonymous 09/26/25(Fri)07:39:42 No.106705195

>>106704988
>>106705112(cont)
Also, from the same article, I just read
>Speed Full duplex
>1.x, 2.0 (32 GT/s):
> 3.938 GB/s (×1)
> 63.015 GB/s (×16)
>3.x (64 GT/s):
> 7.563 GB/s (×1)
> 121.0 GB/s (×16)
Am I misinterpreting the numbers? Is this really what you want? Seems to be, at best, a little over twice as fast as ddr5 but i'm sure much much more expensive and you'll need your software to support it, right?

Anonymous
09/26/25(Fri)07:40:57 No.106705205

Anonymous 09/26/25(Fri)07:40:57 No.106705205

>>106705188
Ask it about 12 channel then

Anonymous
09/26/25(Fri)07:42:46 No.106705216

Anonymous 09/26/25(Fri)07:42:46 No.106705216

>>106705169
fact checked by gpt-oss-120b

Anonymous
09/26/25(Fri)07:42:51 No.106705217

Anonymous 09/26/25(Fri)07:42:51 No.106705217

>>106705195
the point is simply that it's more slots to shove more ram in, not that it's necessarily more performant.
And support is baked in at the OS level, so the software that's running (ideally) doesn't have to be concerned about if it's in the slot or if it's CXL

Anonymous
09/26/25(Fri)07:45:49 No.106705236

Anonymous 09/26/25(Fri)07:45:49 No.106705236

File: 1755815109622200.png (238 KB, 2293x1034)

238 KB PNG

>>106705205
yeah I'm raping sam altman with my request, 2 mins to reply lmao

Anonymous
09/26/25(Fri)07:51:38 No.106705265

Anonymous 09/26/25(Fri)07:51:38 No.106705265

>>106705217
>the point is simply that it's more slots to shove more ram in, not that it's necessarily more performant.
If you're ok with dual-channel ddr5 speeds, sure. I'd still not consider any of the hardware needed in the normie range.
>https://www.aliexpress.com/i/3256808204056898.html?gatewayAdapt=4itemAdapt

Anonymous
09/26/25(Fri)08:10:19 No.106705395

Anonymous 09/26/25(Fri)08:10:19 No.106705395

>>106705193
As far as I can tell you can buy from here as a noncorporate customer >>106704365

Anonymous
09/26/25(Fri)08:10:32 No.106705396

Anonymous 09/26/25(Fri)08:10:32 No.106705396

bros a question, if I currently get 3 t/s on dualchannel, will I get 12t/s if I go octo-channel (suppose the ram is at the same freq./latency)

Anonymous
09/26/25(Fri)08:11:46 No.106705402

Anonymous 09/26/25(Fri)08:11:46 No.106705402

>>106705396
It won't scale perfectly if you're running exps=cpu because your gpu is not speeding up magically

Anonymous
09/26/25(Fri)08:47:56 No.106705670

Anonymous 09/26/25(Fri)08:47:56 No.106705670

>>106705396
my intuition says 10t/s

Anonymous
09/26/25(Fri)09:06:59 No.106705805

Anonymous 09/26/25(Fri)09:06:59 No.106705805

>>106704760
Yeah would be neat to see people try this out for automatically finding the best JB for models.

Anonymous
09/26/25(Fri)09:46:41 No.106706131

Anonymous 09/26/25(Fri)09:46:41 No.106706131

ai is dead
maybe there never was any true ai to begin with

Anonymous
09/26/25(Fri)09:54:08 No.106706189

Anonymous 09/26/25(Fri)09:54:08 No.106706189

>>106700507
> dense model
> someone is disapointed
> moe model
> someone is disapointed

you can't win

Anonymous
09/26/25(Fri)09:57:42 No.106706228

Anonymous 09/26/25(Fri)09:57:42 No.106706228

>>106706131
https://misorobotics.com/
https://www.pudurobotics.com/product/detail/bellabot
?

Anonymous
09/26/25(Fri)10:01:41 No.106706268

Anonymous 09/26/25(Fri)10:01:41 No.106706268

>>106706189
moe with 70b active would make everyone happy

Anonymous
09/26/25(Fri)10:05:11 No.106706299

Anonymous 09/26/25(Fri)10:05:11 No.106706299

>>106706268
>moe with 70b active would make everyone happy
more like make it run piss slow, we would be back to getting it all on the gpu again with that kind of active params
even bigger and slower models, it's a lose lose scenario

Anonymous
09/26/25(Fri)10:13:39 No.106706372

Anonymous 09/26/25(Fri)10:13:39 No.106706372

>>106706268
>>106706299
70B static params so that they fit quanted in 48GB. Then 1-10B routed experts that run from RAM. In this configuration the experts are used simply for knowledge extension rather than as the basis for most of the model's intelligence like current big MoEs are.

Anonymous
09/26/25(Fri)10:15:53 No.106706388

Anonymous 09/26/25(Fri)10:15:53 No.106706388

I want to moe an expert really hard...

Anonymous
09/26/25(Fri)10:19:22 No.106706418

Anonymous 09/26/25(Fri)10:19:22 No.106706418

What are the best models for character impersonation, narration and general scenario roleplay? Not in the erp sense

Anonymous
09/26/25(Fri)10:21:36 No.106706433

Anonymous 09/26/25(Fri)10:21:36 No.106706433

>>106706418
Gemma 27B

Anonymous
09/26/25(Fri)10:29:49 No.106706495

Anonymous 09/26/25(Fri)10:29:49 No.106706495

>>106706433
gemma and the rap stories

Anonymous
09/26/25(Fri)10:36:26 No.106706551

Anonymous 09/26/25(Fri)10:36:26 No.106706551

>>106706495
say what you will say, but the only downside of gemma is that it writes, well... like that
vision and general knowledge and overall chatting experience for its size is still the best

Anonymous
09/26/25(Fri)10:36:35 No.106706554

Anonymous 09/26/25(Fri)10:36:35 No.106706554

>>106704523
Its gonna be huge isn't it.
My old ass pascal cards have to wait like 30 min for a pic with the recent models. Its all over.

Anonymous
09/26/25(Fri)10:38:44 No.106706575

Anonymous 09/26/25(Fri)10:38:44 No.106706575

>>106706189
Not really.
30ba3b is really cool.
Also something like the size of gptoss120b.
Mini Experts so you can offload lots to ram and still get good enough speed.

Anonymous
09/26/25(Fri)10:41:39 No.106706594

Anonymous 09/26/25(Fri)10:41:39 No.106706594

>>106706551
you forgot the safety feeling.
It really makes me feel safe when I use it

Anonymous
09/26/25(Fri)10:42:28 No.106706604

Anonymous 09/26/25(Fri)10:42:28 No.106706604

>>106706575
speaking of which, where are the qwenmax goofs?

Anonymous
09/26/25(Fri)10:43:35 No.106706613

Anonymous 09/26/25(Fri)10:43:35 No.106706613

>>106706594
it didn't refuse me shota on big titty oba-san aside from being really euphemistic and... not wanting to say... you know

Anonymous
09/26/25(Fri)10:44:41 No.106706629

Anonymous 09/26/25(Fri)10:44:41 No.106706629

>>106706613
try cunny next :)

Anonymous
09/26/25(Fri)10:45:45 No.106706647

Anonymous 09/26/25(Fri)10:45:45 No.106706647

>>106706629
nyo~

Anonymous
09/26/25(Fri)10:47:06 No.106706659

Anonymous 09/26/25(Fri)10:47:06 No.106706659

Gemma 3 27B is the best for non-ERP unless you go for V3-tier models. You literally don't need more if you're not a coomer

Anonymous
09/26/25(Fri)10:51:05 No.106706695

Anonymous 09/26/25(Fri)10:51:05 No.106706695

>>106706659
>SWA
lmao

Anonymous
09/26/25(Fri)10:52:47 No.106706714

Anonymous 09/26/25(Fri)10:52:47 No.106706714

File: 1746908396634024.png (1.31 MB, 1428x1922)

1.31 MB PNG

https://xcancel.com/bdsqlsz/status/1971448657011728480#m
>4x qwen image = 80b
just letting you know that they're also starting do to some ParametersMaxxing on the image side as well kek

Anonymous
09/26/25(Fri)10:54:40 No.106706732

Anonymous 09/26/25(Fri)10:54:40 No.106706732

File: gemma3-escalation.png (1.02 MB, 4136x833)

1.02 MB PNG

>>106706629
Define and/or use your favorite words in your instructions and it will use them organically in the RP. Don't just tell it "be vulgar" or "use dirty words".

Anonymous
09/26/25(Fri)11:03:32 No.106706804

Anonymous 09/26/25(Fri)11:03:32 No.106706804

>>106706714
4 times the parameters, still struggling to match a simple finetuned SDXL...

Anonymous
09/26/25(Fri)11:05:25 No.106706820

Anonymous 09/26/25(Fri)11:05:25 No.106706820

>>106706804
imagine if people bothered to finetune language models like that.

Anonymous
09/26/25(Fri)11:08:28 No.106706848

Anonymous 09/26/25(Fri)11:08:28 No.106706848

>>106706695
You're poor

Anonymous
09/26/25(Fri)11:11:15 No.106706879

Anonymous 09/26/25(Fri)11:11:15 No.106706879

>>106706433
>>106706659
Thanks, I'll try it out.

Anonymous
09/26/25(Fri)11:40:20 No.106707130

Anonymous 09/26/25(Fri)11:40:20 No.106707130

bartowski-Qwen_Qwen3-235B-A22B-Instruct-2507-IQ2_S is ChatGPT at home. It's extremely good.

Anonymous
09/26/25(Fri)11:42:38 No.106707154

Anonymous 09/26/25(Fri)11:42:38 No.106707154

>>106707130
>IQ2_S
how is that not lobotomy?

Anonymous
09/26/25(Fri)11:44:29 No.106707172

Anonymous 09/26/25(Fri)11:44:29 No.106707172

>>106705193
Making a corporation isn't actually that hard either. Like you don't need a lawyer or anything.

Anonymous
09/26/25(Fri)11:45:12 No.106707182

Anonymous 09/26/25(Fri)11:45:12 No.106707182

>>106707154
Fuck if I know, it works, and it's substantially better than everything else I used for code review/generation and general assistant tasks. A lot better than GPT-OSS, Mistral Large and GLM Air, as well as many different dense 70Bs. T didn't do enough ERP to come to conclusion there yet, but it feels as an upgrade so far too.

Anonymous
09/26/25(Fri)12:10:22 No.106707413

Anonymous 09/26/25(Fri)12:10:22 No.106707413

>>106706714
if the images on the promotion shit are samples from it then it still looks very slopped

Anonymous
09/26/25(Fri)12:18:11 No.106707479

Anonymous 09/26/25(Fri)12:18:11 No.106707479

File: bartowski_vs_unsloth.png (53 KB, 1164x447)

53 KB PNG

>>106707130
Really unfortunate that ever since Unsloth and the uberguy made their Deepseek quants they've overtaken bartowski, even though bartowski doesn't reupload the same quant three times a week, shill his work, doesn't require pull requests to support his latest quant (look at this retardation: https://github.com/kvcache-ai/ktransformers/issues/1195#issuecomment-2830402529), or do shady shit. Plus, his quants are actually better. The reason this all happened is that Deepsex was so good and huge that even if you quant it down to 2bits it was still useful, but everyone on reddit believed this is because of some magic done by the Unsloth team. So now everyone just assumes that the Unsloth/ubergarm guy are wizards when it was just Deepseek being resistant to lobotomy.
https://huggingface.co/bartowski/Qwen_Qwen3-235B-A22B-Instruct-2507-GGUF/discussions/1

Anonymous
09/26/25(Fri)12:19:29 No.106707497

Anonymous 09/26/25(Fri)12:19:29 No.106707497

>>106706372
How do we fund this?

Anonymous
09/26/25(Fri)12:35:58 No.106707664

Anonymous 09/26/25(Fri)12:35:58 No.106707664

>>106707479
My anecdotal testing agrees with this desu. I feel that Bartowski's quants are a tiny bit less repetitive, have a bit better intelligence, and are even a tiny bit less slopped, compared to Unsloth and Mradermacher. I don't know about ubergarm, I haven't bothered trying ik_llama. Tested both Gemma and Qwen models. Surprisingly I saw differences even at Q6. I think imatrix has a larger influence than might be expected from benchmarks/metrics like these.

Anonymous
09/26/25(Fri)13:04:54 No.106707938

Anonymous 09/26/25(Fri)13:04:54 No.106707938

>>106704760
I will add it to my backlog.

Anonymous
09/26/25(Fri)13:07:39 No.106707963

Anonymous 09/26/25(Fri)13:07:39 No.106707963

File: paddle.png (1.11 MB, 832x1216)

1.11 MB PNG

Anybody in here have finetuning advice you could give me? I have a 3200~ line dataset of samples from my fiction and a discord server, and my intention is to capture the voice of a specific character. However, my current finetuning experiments have yielded pretty incoherent results. I've experimented with Unsloth before but those were extremely small, POC style tunes where I was basically just proving I could use it. I'm also open to using AWS because I have some credits (inb4 this is /lmg/, yes I know) but ultimately, where I finetune or what model I finetune doesn't seem to matter. Since I am working with such a relatively small dataset by comparison to a big finetuning job, and we don't want the model to learn format so much as response tone/style, do you guys have any tips on formatting datasets? For reference, I've tried in two different ways--one where the user input is a short multi-line snippet of existing conversation history to one line of assistant response, and one where it is simply one line of conversation history to one line of assistant response. Neither of these has really seemed to make a huge difference, nor has playing with parameters like epochs.

Generally speaking, is there a recommended format for a dataset where we're just trying to finetune for tone? I've been kind of looking around but the guides I've found have seemed extremely old, or they were using examples of characters that would have been already baked into the model they were using (Rick Sanchez, for instance), which produces misleading results.

Anonymous
09/26/25(Fri)13:16:02 No.106708050

Anonymous 09/26/25(Fri)13:16:02 No.106708050

ZAMN!
Another one:
https://github.com/lyogavin/airllm
RAMlets are eating good these days (if they havent starved to death while waiting for the response)

Anonymous
09/26/25(Fri)13:21:59 No.106708102

Anonymous 09/26/25(Fri)13:21:59 No.106708102

>>106708050
>AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run 405B Llama3.1 on 8GB vram now.
MoE is pointless now

Anonymous
09/26/25(Fri)13:24:21 No.106708124

Anonymous 09/26/25(Fri)13:24:21 No.106708124

>>106708050
Can't you already do the same with mmap and llama.cpp?

Anonymous
09/26/25(Fri)13:29:44 No.106708171

Anonymous 09/26/25(Fri)13:29:44 No.106708171

>>106708124
don't forget oLLM!
I still havent found any time to even try one of these. But it would be pretty cool if it scales linear. Like running GLM4.5 fp8 on a single 3090 and getting 1t/sec or more

Anonymous
09/26/25(Fri)13:33:05 No.106708196

Anonymous 09/26/25(Fri)13:33:05 No.106708196

>>106707479
IQK quants are better than IQ or regular K quants in PPL and Ubergarm is the only one who consistently makes those quants.

Anonymous
09/26/25(Fri)13:33:47 No.106708203

Anonymous 09/26/25(Fri)13:33:47 No.106708203

File: Screenshit.png (2 KB, 120x28)

2 KB PNG

>>106708050

Anonymous
09/26/25(Fri)13:43:37 No.106708275

Anonymous 09/26/25(Fri)13:43:37 No.106708275

>>106708203
https://github.com/Mega4alik/ollm
better?

Anonymous
09/26/25(Fri)13:55:36 No.106708378

Anonymous 09/26/25(Fri)13:55:36 No.106708378

>>106701146
>30% loss at FP4
Is it FP4 that is bad or is Kimi just a badly quantable model? It often makes dumb mistakes at Q5 when I run it locally, but I don't know if it's the model or the quant.

Anonymous
09/26/25(Fri)14:07:17 No.106708455

Anonymous 09/26/25(Fri)14:07:17 No.106708455

>>106708378
Go use it over openrouter with moonshotai as the provider and see for yourself.

Anonymous
09/26/25(Fri)14:10:52 No.106708477

Anonymous 09/26/25(Fri)14:10:52 No.106708477

>>106701146
>>106708378
https://deepinfra.com/moonshotai/Kimi-K2-Instruct-0905
Deepinfra claims to run FP4 and they're one of the 96%ers
The shitter providers are either running like 1.5 bit quants or they have deeply screwed the pooch with configuration

Anonymous
09/26/25(Fri)14:17:02 No.106708527

Anonymous 09/26/25(Fri)14:17:02 No.106708527

File: Screenshot_20250926-141535.png (257 KB, 495x579)

257 KB PNG

>>106700424
I bought Strix Halo for this.

None of you men know my pain, the pain of the poop mage.

Anonymous
09/26/25(Fri)14:18:36 No.106708539

Anonymous 09/26/25(Fri)14:18:36 No.106708539

>be you
>Dabbling in local models
>Find github with promising tech
>Repo last update 3 years ago

Every fucking time.

Anonymous
09/26/25(Fri)14:19:28 No.106708547

Anonymous 09/26/25(Fri)14:19:28 No.106708547

>>106708539
it just means its feature complete and stable.

Anonymous
09/26/25(Fri)14:20:48 No.106708559

Anonymous 09/26/25(Fri)14:20:48 No.106708559

>>106703052
>Waaah waah
>I need an AI to read research documents for me because I'm a retard and never learned how to do my own research

The actual state of deep research.

How about you figure out how to tune a niche model in your field, and create original results, instead of hoping that some grad student Grant grinders published a paper that MAYBE brushed your topic with retard heavy strokes years ago?

Anonymous
09/26/25(Fri)14:21:25 No.106708567

Anonymous 09/26/25(Fri)14:21:25 No.106708567

>>106708539
>>Find github with promising tech
there are like four or five things on github that are actually useful depending on your use case for local models and that's it

Anonymous
09/26/25(Fri)14:22:50 No.106708577

Anonymous 09/26/25(Fri)14:22:50 No.106708577

goofbros...

Anonymous
09/26/25(Fri)14:24:58 No.106708596

Anonymous 09/26/25(Fri)14:24:58 No.106708596

It's over

Anonymous
09/26/25(Fri)14:29:39 No.106708629

Anonymous 09/26/25(Fri)14:29:39 No.106708629

V3.1-Terminus killed(or better said terminated) all the hype I had for R2. It addressed none of the issues of V3.1, it still begins everything with "great question", "you're absolutely right", "of course" and other slop. It still is boring and positive. It still can't properly think if the problem is not math or code. Looks like DS decided to go back to OG V3 and V2 dryness for some reason and R1 was just a lucky accident.

Anonymous
09/26/25(Fri)14:37:11 No.106708679

Anonymous 09/26/25(Fri)14:37:11 No.106708679

>>106708629
They're doing it just to troll you

Anonymous
09/26/25(Fri)14:43:18 No.106708732

Anonymous 09/26/25(Fri)14:43:18 No.106708732

>>106708679
But that's simply not possible. And I don't find this funny anymore.

Anonymous
09/26/25(Fri)14:45:22 No.106708748

Anonymous 09/26/25(Fri)14:45:22 No.106708748

>>106708629
>wow why is a bugfixing release not addressing problems specific to ME ME ME

Anonymous
09/26/25(Fri)14:47:27 No.106708768

Anonymous 09/26/25(Fri)14:47:27 No.106708768

>>106708748
Why isn't it?

Anonymous
09/26/25(Fri)14:47:52 No.106708771

Anonymous 09/26/25(Fri)14:47:52 No.106708771

>>106708196
But that requires me to run ik_llama and fuck that fork.

Anonymous
09/26/25(Fri)15:19:50 No.106709053

Anonymous 09/26/25(Fri)15:19:50 No.106709053

File: COMFY-MIKUS-Taste-The-Flavor.png (1.87 MB, 1024x1024)

1.87 MB PNG

Anonymous
09/26/25(Fri)15:25:11 No.106709087

Anonymous 09/26/25(Fri)15:25:11 No.106709087

i hate miku

Anonymous
09/26/25(Fri)15:27:41 No.106709111

Anonymous 09/26/25(Fri)15:27:41 No.106709111

>>106707130
Huh, this actually fucks. It's just small enough for a 64gb/12gb system. Writes better than glm-air, but has slightly worse trivia knowledge.

Anonymous
09/26/25(Fri)15:34:37 No.106709165

Anonymous 09/26/25(Fri)15:34:37 No.106709165

Interesting. Llama.cpp has been hanging and taking a while to die on windows to the point where you can't even kill the process.
Maybe I should pull and rebuild. It's been a week or so since I last id hat.

Anonymous
09/26/25(Fri)15:40:58 No.106709204

Anonymous 09/26/25(Fri)15:40:58 No.106709204

File: The Beast Arrives [Xr98K8(...).jpg (50 KB, 1920x1080)

50 KB JPG

>>106709087
https://youtu.be/Xr98K8dyLy4?t=137

Anonymous
09/26/25(Fri)16:07:52 No.106709390

Anonymous 09/26/25(Fri)16:07:52 No.106709390

>>106709111
For me it's significantly worse. The interesting thing is that Qwen likes to use the trivia it does know more. Like, when you're doing an RP within an established universe, it'll more likely bring up stuff from it (not just what's in the card/lorebook), while GLM and other models will less often bring up references like that. However it will also often hallucinate facts.

Anonymous
09/26/25(Fri)16:16:59 No.106709459

Anonymous 09/26/25(Fri)16:16:59 No.106709459

>>106709390
At first I thought it was pretty bad too, but then I gave it a single turn of a example and it somehow got it right after that. It needs a little more help at first to get going. Of course, it's still lobotomized to shit at q2, but hey, it works, kinda.

Anonymous
09/26/25(Fri)16:19:24 No.106709474

Anonymous 09/26/25(Fri)16:19:24 No.106709474

>>106709459
>>106709390
The model itself is good, except for the fact that it is a CCP shill. If that were to be removed, it would be basically perfect.

Anonymous
09/26/25(Fri)16:30:09 No.106709574

Anonymous 09/26/25(Fri)16:30:09 No.106709574

>>106704826
no sir, that would be writing events for hoi4 modding

Anonymous
09/26/25(Fri)16:39:56 No.106709682

Anonymous 09/26/25(Fri)16:39:56 No.106709682

>>106708559
Someone asked how to extract definitions from JanitorAI. I think you can do that by gaslighting the model. It would be cool to use local deep research to collect the answer from Reddit or something when I don't care enough to research about it myself.

Anonymous
09/26/25(Fri)16:52:44 No.106709802

Anonymous 09/26/25(Fri)16:52:44 No.106709802

>Generally, imatrix is not recommended for Q4 and larger quants. The problem with imatrix is that it will guide what model remembers, while anything not covered by the text sample used to generate the imartrix is more likely to be forgotten. For example, an imatrix derived from wikipedia sample is likely to negatively affect tasks like coding. In other words, while imatrix can improve specific benchmarks, that are similar to the imatrix input sample, it will also skew the model performance towards tasks similar to the imatrix sample at the expense of other tasks.
I heckin' love wikitext! Did you see how my imatrix quant decreases perplexity on wikitext?

Anonymous
09/26/25(Fri)16:53:48 No.106709810

Anonymous 09/26/25(Fri)16:53:48 No.106709810

https://blog.novelai.net/text-model-release-introducing-glm-4-5-untuned-preview-for-novelai-opus-4aa866c8a0d5
NovelAI just destroyed local.

Anonymous
09/26/25(Fri)16:53:56 No.106709814

Anonymous 09/26/25(Fri)16:53:56 No.106709814

>>106709802
Where's that from?

Anonymous
09/26/25(Fri)16:58:52 No.106709861

Anonymous 09/26/25(Fri)16:58:52 No.106709861

>>106709810
>untuned GLM 4.5
don't you mean zhipu destroyed local?

Anonymous
09/26/25(Fri)17:04:52 No.106709921

Anonymous 09/26/25(Fri)17:04:52 No.106709921

>>106709810
That's a great idea. It doesn't take much compute at all to tune since it's a MoE.
Smart.

Anonymous
09/26/25(Fri)17:06:31 No.106709936

Anonymous 09/26/25(Fri)17:06:31 No.106709936

1TB RAM
2TB NVME SSD
REJECT QUANTIZATION
EMBRACE OFFLOADING

Anonymous
09/26/25(Fri)17:10:53 No.106709980

Anonymous 09/26/25(Fri)17:10:53 No.106709980

>>106709810
this is unironically good news desu. no one is really investing the time to make creative finetunes anymore, they only benchmaxx. if this is actually good then it proves that it's possible and jealous localfags might eventually do something, just like the nai leak eventually lead to ponyxl etc
if it's shit then it's probably game over but I guess they helped us avoid wasting money on finetuning

Anonymous
09/26/25(Fri)17:12:22 No.106709992

Anonymous 09/26/25(Fri)17:12:22 No.106709992

>>106709810
calm down nigga we dont even have the finetune yet

Anonymous
09/26/25(Fri)17:12:25 No.106709993

Anonymous 09/26/25(Fri)17:12:25 No.106709993

>>106709810
>untuned
so people are now paying for something I run for free on my $6k server?

Anonymous
09/26/25(Fri)17:18:37 No.106710053

Anonymous 09/26/25(Fri)17:18:37 No.106710053

>>106709980
You're skipping Erato for some reason.

Anonymous
09/26/25(Fri)17:18:47 No.106710057

Anonymous 09/26/25(Fri)17:18:47 No.106710057

>>106709980
this guy don't know about chub soji's failure

Anonymous
09/26/25(Fri)17:23:06 No.106710104

Anonymous 09/26/25(Fri)17:23:06 No.106710104

>>106709936
How many weeks you gonna be comfortable waiting for Qwen 4 to finish getting through its 1 million token reasoning budget?

Anonymous
09/26/25(Fri)17:24:15 No.106710114

Anonymous 09/26/25(Fri)17:24:15 No.106710114

Lately I have the strange tendency to interrupt the roleplay to have all characters break out into a musical number. If you provide the lyrics, it works exceedingly well.

Anonymous
09/26/25(Fri)17:25:38 No.106710126

Anonymous 09/26/25(Fri)17:25:38 No.106710126

>>106710104
I'm fine with a good answer in a few days. Better than a shit useless one.

Anonymous
09/26/25(Fri)17:30:24 No.106710178

Anonymous 09/26/25(Fri)17:30:24 No.106710178

>>106710126
>a few days
Dude, even if you manage to get 1 t/s running off SSD and have that speed constant up to 1 million tokens worth on context (which isn't happening) 1 million seconds is 11.5 days minimum.
More likely you'll be waiting half a year just to get to </think>.

Anonymous
09/26/25(Fri)17:43:09 No.106710285

Anonymous 09/26/25(Fri)17:43:09 No.106710285

>>106709204
I love how her real voice is beyond sound.

Anonymous
09/26/25(Fri)17:50:00 No.106710347

Anonymous 09/26/25(Fri)17:50:00 No.106710347

>>106710057
>chub soji's failure
wdym? it's better than 0324

Anonymous
09/26/25(Fri)17:53:22 No.106710369

Anonymous 09/26/25(Fri)17:53:22 No.106710369

>>106710178
>muh reasoning
no thanks. instruct is all you need. reasoning models are just for retards who can't build agentic flows.

Anonymous
09/26/25(Fri)17:57:05 No.106710405

Anonymous 09/26/25(Fri)17:57:05 No.106710405

>>106710369
Ok, but that's not where the industry is trending. By next year, you'll have as many new non-Reasoning models to choose from as you have new 70-120B dense models today.

Anonymous
09/26/25(Fri)17:58:28 No.106710417

Anonymous 09/26/25(Fri)17:58:28 No.106710417

>>106708527
>He is like a green lantern but with poop instead of green ring powers.

I fucking died. This is gold.

Anonymous
09/26/25(Fri)18:00:51 No.106710432

Anonymous 09/26/25(Fri)18:00:51 No.106710432

>>106710405
that's fine. qwen3 235b vl is all I sneed.

Anonymous
09/26/25(Fri)18:03:14 No.106710452

Anonymous 09/26/25(Fri)18:03:14 No.106710452

Why aren't there more bitnet models? Is there some sort of problem with the architecture?

Anonymous
09/26/25(Fri)18:05:37 No.106710474

Anonymous 09/26/25(Fri)18:05:37 No.106710474

>>106710452
being suppressed by threats from nvidia

Anonymous
09/26/25(Fri)18:07:48 No.106710496

Anonymous 09/26/25(Fri)18:07:48 No.106710496

>>106710452
Yes, money issue

Anonymous
09/26/25(Fri)18:13:20 No.106710539

Anonymous 09/26/25(Fri)18:13:20 No.106710539

>>106710496
>we have been allocated a compute budget to train a model for research
>should we try something new and experimental?
>nah too risky, let's do yet another llama 2 sidegrade

Anonymous
09/26/25(Fri)18:14:03 No.106710547

Anonymous 09/26/25(Fri)18:14:03 No.106710547

>>106710452
I think the model doesn't take as much vram as the optimizer and activations and what ever other shit it has to do. so it actually cost nearly just as much to train a model with bitnet as it does fp16 or whatever they actually use. so its only an inference time benefit. and most the ai companies want to make money by hosting something that you cannot run on your consumer gpu, so naturally there isn't alot of interest.

Anonymous
09/26/25(Fri)18:15:33 No.106710564

Anonymous 09/26/25(Fri)18:15:33 No.106710564

no more releases it's joever

Anonymous
09/26/25(Fri)18:15:33 No.106710565

Anonymous 09/26/25(Fri)18:15:33 No.106710565

>>106710432
guffs never ever

Anonymous
09/26/25(Fri)18:17:18 No.106710582

Anonymous 09/26/25(Fri)18:17:18 No.106710582

>>106710565
vibecoders are on it

Anonymous
09/26/25(Fri)18:17:36 No.106710586

Anonymous 09/26/25(Fri)18:17:36 No.106710586

File: at02fj.png (46 KB, 860x182)

46 KB PNG

To think or not to think?

Anonymous
09/26/25(Fri)18:17:57 No.106710592

Anonymous 09/26/25(Fri)18:17:57 No.106710592

>>106710452
mostly pointless when you don't have the hardware capable of doing ternary stuff efficiently

Anonymous
09/26/25(Fri)18:21:12 No.106710628

Anonymous 09/26/25(Fri)18:21:12 No.106710628

>>106710592
No it's not. The model being smaller gives you a massive benefit. Even if you're thinking only of speed, making the model smaller makes inference faster because of memory bandwidth limits.

Of course this has the biggest effect on local which no one cares about. On cloud it's more limited by compute and space for activations, which is why everything is now a bloated MoE.

Anonymous
09/26/25(Fri)18:22:15 No.106710634

Anonymous 09/26/25(Fri)18:22:15 No.106710634

>>106710547
>and most the ai companies want to make money by hosting something that you cannot run on your consumer gpu, so naturally there isn't alot of interest.
Most SOTA models are fuckhuge and everyone is planning to go huger. No one is running Kimi K2 on consumer gpu so it would do nothing but save them money on API hosting costs until specialized ternary hardware comes out and who knows how long that would take.

Anonymous
09/26/25(Fri)18:24:49 No.106710647

Anonymous 09/26/25(Fri)18:24:49 No.106710647

>>106710452
https://arxiv.org/abs/2501.02423
The more data you train models on, the worse they quantize.

Anonymous
09/26/25(Fri)18:29:14 No.106710672

Anonymous 09/26/25(Fri)18:29:14 No.106710672

>>106710647
Also see:

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
https://arxiv.org/abs/2411.17691

Scaling Laws for Precision
https://arxiv.org/abs/2411.04330v2

Anonymous
09/26/25(Fri)18:41:39 No.106710777

Anonymous 09/26/25(Fri)18:41:39 No.106710777

K2 is nice for chatting but it's really bad at acting as a narrator/storyteller. I swear Mistral Large 2 was more proactive and willing to try to do interesting things than this model. Every reply is just assistant-slop in narrator form where it addresses exactly the things I mentioned in my last reply and that's it no matter how much I beg it to get off its ass and do anything interesting.

Anonymous
09/26/25(Fri)18:57:13 No.106710882

Anonymous 09/26/25(Fri)18:57:13 No.106710882

>>106709802
Isn't this exactly what I said forever ago. It's basically a roundabout way of 'finetuning'

Anonymous
09/26/25(Fri)19:18:11 No.106711022

Anonymous 09/26/25(Fri)19:18:11 No.106711022

File: 1749260538989159.jpg (20 KB, 960x229)

20 KB JPG

>>106709802
Who are you quoting?

Anonymous
09/26/25(Fri)19:27:43 No.106711095

Anonymous 09/26/25(Fri)19:27:43 No.106711095

File: file.png (111 KB, 893x732)

111 KB PNG

>>106710882
It's not. You actually said the same thing as DavidAU did, you should feel bad and ashamed of yourself.

Anonymous
09/26/25(Fri)19:31:34 No.106711124

Anonymous 09/26/25(Fri)19:31:34 No.106711124

File: Screenshot_20250926_191855.png (599 KB, 1507x1116)

599 KB PNG

at least it was close

Anonymous
09/26/25(Fri)19:59:29 No.106711314

Anonymous 09/26/25(Fri)19:59:29 No.106711314

>>106711095
This is so cunfusing. I just want a goddamn GGUF file!!

Anonymous
09/26/25(Fri)20:39:37 No.106711552

Anonymous 09/26/25(Fri)20:39:37 No.106711552

File: AI Pepe.png (214 KB, 384x373)

214 KB PNG

he's making fun of you /lmg/
https://youtu.be/a7TOameRqoY

Anonymous
09/26/25(Fri)21:05:29 No.106711694

Anonymous 09/26/25(Fri)21:05:29 No.106711694

>>106696286
Please report back, I'm curious.

Anonymous
09/26/25(Fri)22:06:19 No.106712051

Anonymous 09/26/25(Fri)22:06:19 No.106712051

>>106711552
is it me or do his jokes feel like they were written by chatgpt?

Anonymous
09/26/25(Fri)22:27:50 No.106712191

Anonymous 09/26/25(Fri)22:27:50 No.106712191

>>106709810
Dunno, I think glm 4.5 is big enough and knowledgeable enough to not need a finetune. It remains to be seen if it will actually be that much better.

Anonymous
09/26/25(Fri)22:49:24 No.106712310

Anonymous 09/26/25(Fri)22:49:24 No.106712310

File: 1733330596651059.png (46 KB, 498x398)

46 KB PNG

Why do we still use PPL for measuring quantization effectiveness instead of benchmarks? PPL only measures memorization, not generalization. It also doesn't make sense to use PPL if the dataset isn't in the training set (it's like using BLEU)

>>106710672
Thanks for the links

Anonymous
09/26/25(Fri)23:22:59 No.106712488

Anonymous 09/26/25(Fri)23:22:59 No.106712488

>>106712310
Torse

Anonymous
09/26/25(Fri)23:38:27 No.106712576

Anonymous 09/26/25(Fri)23:38:27 No.106712576

>>106712310
PPL is nice and simple and you don't have to go and deal with benchmark software. It's viewed as close enough to measuring loss in general intelligence or generalization since there's no theory or evidence that quantization hurts general intelligence at some different rate than memorization. Plus there's also the fact that most benchmarks are unreliable, have wide margins of error, don't actually measure "general intelligence", and are benchmaxxed anyway even if they were direct measures.

Anonymous
09/27/25(Sat)01:11:27 No.106713120

Anonymous 09/27/25(Sat)01:11:27 No.106713120

>>106700871
>deepseek
>111b model (at most) with broken attention
Hm? It's 37B active. And what broken attention?

Anonymous
09/27/25(Sat)01:12:02 No.106713124

Anonymous 09/27/25(Sat)01:12:02 No.106713124

>>106712576
>there's no proof these things are the same or different, so we can assume they're the same
big brain take

Anonymous
09/27/25(Sat)01:13:39 No.106713132

Anonymous 09/27/25(Sat)01:13:39 No.106713132

>>>/vg/540527582
>THIS GENERAL IS INFESTED BY CHATBOTS TRAINED ON 4CHAN POSTS!
>>>/vg/540529936
>Ok. I am quantizing the model using AWQ right now, so it becomes very lightweight. Will launch in 30 minutes. Let's see how the quantized model behaves.
This is what lmg-anon does nowadays.

Anonymous
09/27/25(Sat)01:23:40 No.106713188

Anonymous 09/27/25(Sat)01:23:40 No.106713188

File: gross.jpg (5 KB, 250x201)

5 KB JPG

>>106713132
>/vg/
that's like linking to /mlp/

Anonymous
09/27/25(Sat)01:25:50 No.106713196

Anonymous 09/27/25(Sat)01:25:50 No.106713196

>>106712051
If you look at Alien Earth for example (that's tv) it's somewhat obvious they are using le chatgpt or something to create all these"wacky" plot points. It feels bit engineered but that's just me of course.
Same goes for youtubers, especially the smaller ones who need to spam out "content" in order to keep up with the algorithm.
I mean it's more common than you think.

Anonymous
09/27/25(Sat)01:46:21 No.106713312

Anonymous 09/27/25(Sat)01:46:21 No.106713312

>>106713196
soon we'll be living in a chatgpt world.. le sigh.. altman u bloody bastard

Anonymous
09/27/25(Sat)01:53:46 No.106713351

Anonymous 09/27/25(Sat)01:53:46 No.106713351

>>106713124
Who are you quoting? There are things called implicit arguments or hidden premises. Sorry if they weren't clear from my writing. The hidden premise is that the argument of "oh well there's no evidence" comes after the limited evidence that they do get affected at similar rates. And that's why people thought of using PPL in the first place. The issue is that the evidence in question is benchmarks, which kind of sucks, so they are in practice not a proof. But they give us some reason to believe in the idea that general intelligence and memorization are parallel in transformer models. We would be more confident in the opposite being true if there was strong proof. Since there is no strong proof of the opposite being true, "It's viewed as close enough to measuring loss in general intelligence or generalization".

Anonymous
09/27/25(Sat)03:24:08 No.106713730

Anonymous 09/27/25(Sat)03:24:08 No.106713730

What would you say a low point in this hobby was? For me it was when deepseek r1 was first released and they made those distills on smaller models and a bunch of tourists were getting the distills completely confused for the full fat model.

I didn't think I was autistic until that happened I got austistically angry.

Anonymous
09/27/25(Sat)03:32:21 No.106713763

Anonymous 09/27/25(Sat)03:32:21 No.106713763

>>106713730
Probably around the same time, but mostly because of the dramafaggotry flooding the threads at the time, that's now ignored whenever bait does get posted (probably because the thread isn't active enough in general anymore; it's dead, and that's a good thing).

Anonymous
09/27/25(Sat)03:40:55 No.106713797

Anonymous 09/27/25(Sat)03:40:55 No.106713797

>>106713730
+ but for me it was seeing how fucking gimped everyone was implimenting it even nowadays literally every single provider is severily worse then when deepseek actually served their and i fucking mean it the gap is still fucking huge its roughly the difference between v3 and r1 themselfes

besides that idk nothing bad else that really happened except ofcourse the thread's international agent being a niggerfaggot like usual everything else is going great really discounting hardware prices but eh.... patience is a virtue give it some time really not much bad can be said i think a model flops here or there but then another gets released literally 2 weeks later so not like it matters

Anonymous
09/27/25(Sat)03:45:40 No.106713819

Anonymous 09/27/25(Sat)03:45:40 No.106713819

shill me a 12B model capable of tool calling

Anonymous
09/27/25(Sat)04:28:38 No.106713985

Anonymous 09/27/25(Sat)04:28:38 No.106713985

how to get a novelai like storywriting interface? sillytavern is good but its more like a character card chat bot

Anonymous
09/27/25(Sat)04:32:05 No.106714004

Anonymous 09/27/25(Sat)04:32:05 No.106714004

>>106713985
mikupad?

Anonymous
09/27/25(Sat)04:33:13 No.106714011

Anonymous 09/27/25(Sat)04:33:13 No.106714011

>>106713730
The context extension + superhot spam during the Llama 1 days.

Anonymous
09/27/25(Sat)04:50:20 No.106714092

Anonymous 09/27/25(Sat)04:50:20 No.106714092

It feels like local models have become significantly smarter over the past year and yet somehow became less 'smart' at the same time. A very worrying trend.

Anonymous
09/27/25(Sat)04:54:03 No.106714104

Anonymous 09/27/25(Sat)04:54:03 No.106714104

>>106714092
Just like the people after the cold war ended amirite

Anonymous
09/27/25(Sat)05:02:11 No.106714135

Anonymous 09/27/25(Sat)05:02:11 No.106714135

>>106714092
More code, math and safety focus, less of anything else

Anonymous
09/27/25(Sat)05:33:28 No.106714299

Anonymous 09/27/25(Sat)05:33:28 No.106714299

File: fb0b88a3c6069486a051bbf56(...).jpg (370 KB, 1561x1910)

370 KB JPG

>>106713730
Yeah that's probably it. R1 had the allure of challenging western SOTA that social media jockeys lapped up
>OMG run the bestest AI on your raspi with ollama **basedface** xdxd
Models continue to improve, I will realise my perfect waifu eventually.

Anonymous
09/27/25(Sat)06:49:15 No.106714741

Anonymous 09/27/25(Sat)06:49:15 No.106714741

This is truly the end of local models

Anonymous
09/27/25(Sat)07:04:36 No.106714835

Anonymous 09/27/25(Sat)07:04:36 No.106714835

Is there something to keeping as much of a model in vram and coherency going hand in hand? It seems after i've picked up my rtx 5000 series card, this same model that was having a few hitches here and there with quality is suddenly out of this fucking world perfect. Maybe the architecture jump helped? I genuinely don't get it, it's night and day.

Anonymous
09/27/25(Sat)07:11:30 No.106714880

Anonymous 09/27/25(Sat)07:11:30 No.106714880

>>106714835
I always thought it was the opposite, cpus have better floating point precision then gpus.

Anonymous
09/27/25(Sat)07:12:53 No.106714885

Anonymous 09/27/25(Sat)07:12:53 No.106714885

>>106714835
It's placebo

Anonymous
09/27/25(Sat)07:16:49 No.106714916

Anonymous 09/27/25(Sat)07:16:49 No.106714916

>>106714885
what did he mean by this?.assistant

Anonymous
09/27/25(Sat)07:47:10 No.106715099

Anonymous 09/27/25(Sat)07:47:10 No.106715099

>>106714835
could be a lot of things. I thought local models sucked for a long time just because I had mess up settings. I wasnt matching context windows in kobold and realized I was fucking up the llm every time lol

anyways placebo. GPU dont matter. Faster speeds can negate frustrations with bad gens tho.

Anonymous
09/27/25(Sat)07:51:01 No.106715125

Anonymous 09/27/25(Sat)07:51:01 No.106715125

>>106715099
I'm pretty sure that's what's going on, combination of the settings finally working right and not having to wait actual minutes to get less than 1k tokens generated makes this a lot quicker/easier.
dunno though, it used to go genuinely schizo very quickly and now i'm well over 100 messages in, across different characters, its never been this good.

either way, never going back to a less than 10T/s speed again that's for sure.

Anonymous
09/27/25(Sat)07:54:10 No.106715151

Anonymous 09/27/25(Sat)07:54:10 No.106715151

File: terry davis reverse smile.gif (2.87 MB, 200x234)

2.87 MB GIF

>But now this means i'm seeing the tropes at over 10t/s
>I won't bite.. unless you ask nicely

Anonymous
09/27/25(Sat)08:04:59 No.106715203

Anonymous 09/27/25(Sat)08:04:59 No.106715203

>>106715151
Honestly I like that phrase in particular, I don't mind that one repeating
>>106713730
Trying to jerk off with GPT2 in dungeon ai was a pretty low point in my life in general
But yeah, the R1 debacle sucked, people unironically expecting to run the full model in a raspi

Anonymous
09/27/25(Sat)08:11:33 No.106715238

Anonymous 09/27/25(Sat)08:11:33 No.106715238

>>106713730
>>106715203
I remember one IRL friend telling me he could run R1 on his machine and me going o_O

Anonymous
09/27/25(Sat)08:25:24 No.106715310

Anonymous 09/27/25(Sat)08:25:24 No.106715310

I wish local could actually compete

Anonymous
09/27/25(Sat)08:55:44 No.106715509

Anonymous 09/27/25(Sat)08:55:44 No.106715509

is lcpp ever gonna support Qwen3-VL-235B-A22B-Instruct? what do i use instead?

Anonymous
09/27/25(Sat)08:58:52 No.106715530

Anonymous 09/27/25(Sat)08:58:52 No.106715530

>>106715509
>is lcpp ever gonna support Qwen3-VL-235B-A22B-Instruct?
no
>what do i use instead?
Nemo

Anonymous
09/27/25(Sat)09:03:58 No.106715569

Anonymous 09/27/25(Sat)09:03:58 No.106715569

>do some machine translation task
>GLM 4.5 and Qwen3-235B-A22B-Instruct-2507 at Q4 stumble, add/remove lines etc.
>Deepseek R1, even with disabled thinking and at retard Q1 quant does it perfectly every time
how did they do it? I like testing LLMs for different use-cases, every now and then i try out the latest and greatest models, but the result is always the same, R1 is king.

Anonymous
09/27/25(Sat)09:44:37 No.106715830

Anonymous 09/27/25(Sat)09:44:37 No.106715830

File: fake teto babu.png (542 KB, 652x886)

542 KB PNG

>>106712576
I am going to need a source on that. As far as I am aware, quantization does bias the models [1]

[1]: https://openreview.net/forum?id=e3Dpq3WdMv

^ Why do some openreview links not have any reviews on them

Anonymous
09/27/25(Sat)10:04:08 No.106715959

Anonymous 09/27/25(Sat)10:04:08 No.106715959

Is there any inference provider that offers a raw completion mode that you could hook up with mikupad?

Anonymous
09/27/25(Sat)10:12:01 No.106716029

Anonymous 09/27/25(Sat)10:12:01 No.106716029

>>106715959
Consult https://rentry.org/or-prefill

Anonymous
09/27/25(Sat)10:25:50 No.106716153

Anonymous 09/27/25(Sat)10:25:50 No.106716153

>>106715569
>R1 is king

I confirm. R1 is amazing for translations

Anonymous
09/27/25(Sat)10:28:02 No.106716177

Anonymous 09/27/25(Sat)10:28:02 No.106716177

>>106715569
I did software for docx translation in which models just don't have a way to miss a line, thy are asked about each line directly, and if they choose to translate a different thing from context, this is detected with tags. Mistral-Small 24B works perfectly.

Anonymous
09/27/25(Sat)10:34:25 No.106716243

Anonymous 09/27/25(Sat)10:34:25 No.106716243

>>106716029
thanks anon

Anonymous
09/27/25(Sat)10:45:50 No.106716338

Anonymous 09/27/25(Sat)10:45:50 No.106716338

File: 1715830787598652.png (336 KB, 3000x2100)

336 KB PNG

>>106715830
Yes potentially especially with imatrix. However my argument is about specifically broad subject intelligence vs rote memorization of articles, ignoring individual subjects/tasks. There are some benchmarks that lean a bit more towards general tasks while some lean more towards memorized facts, and they are all unreliable for various reasons, so our understanding of whether quantization really affects the two things differently (in a way that matters towards the goal of comparing broad quant quality or "for measuring quantization effectiveness") is not clear. But generally MMLU has been used most of the time as the benchmark after PPL to test loss of intelligence from quantization, and it follows an exponential curve as does PPL. Since it's difficult to measure the absolute skill level or intelligence level of an LLM, the purpose of doing these simpler measurements is to compare relative differences, so stuff like PPL is currently an ok proxy for general subject benchmarks.

Honestly though perhaps we're getting caught up in irrelevant details here. Since we are not trying to do the extremely hard thing of measuring absolute intelligence level, measuring relative difference in quality means that it doesn't really matter if we're doing it on memorization or generalization. There is going to be a loss either way, and it is almost certainly exponential. Any measure that can show that is sufficient.

Anonymous
09/27/25(Sat)10:51:15 No.106716383

Anonymous 09/27/25(Sat)10:51:15 No.106716383

File: computers-must-shut-up.png (475 KB, 900x900)

475 KB PNG

Anonymous
09/27/25(Sat)10:55:29 No.106716419

Anonymous 09/27/25(Sat)10:55:29 No.106716419

This drought feels different from what we've had before. It feels more over than ever before despite everything. Almost unrecoverable.

Anonymous
09/27/25(Sat)11:19:41 No.106716628

Anonymous 09/27/25(Sat)11:19:41 No.106716628

>>106716419
is this somewhat like those video game communities that start going crazy after a few months without a new update?

Anonymous
09/27/25(Sat)11:29:25 No.106716746

Anonymous 09/27/25(Sat)11:29:25 No.106716746

>>106716338
>relative difference
Of course lol. Thanks anon

Anonymous
09/27/25(Sat)11:31:12 No.106716754

Anonymous 09/27/25(Sat)11:31:12 No.106716754

what do we do with this 80GB image model?

Anonymous
09/27/25(Sat)11:35:19 No.106716796

Anonymous 09/27/25(Sat)11:35:19 No.106716796

>>106716419
The blocker is hardware, it's always been hardware.
We need a hardware miracle right now.
like a Huawei’s Atlas 300I Duo but with actually usable bandwidth
Or Sandisk's high bandwith flash actually comes out.
Seems like D-Matrix are working on this as well:
https://www.techradar.com/pro/security/after-sandisk-d-matrix-is-proposing-an-intriguing-alternative-to-the-big-hbm-ai-puzzle-with-10x-better-performance-with-10x-better-energy-efficiency

Anonymous
09/27/25(Sat)11:40:10 No.106716839

Anonymous 09/27/25(Sat)11:40:10 No.106716839

>>106716796
Even if there were some kind of miracle hardware breakthrough it wouldn't make its way to local for half a decade or more

Anonymous
09/27/25(Sat)11:47:52 No.106716925

Anonymous 09/27/25(Sat)11:47:52 No.106716925

File: 0c121016d655f75462157923a(...).jpg (100 KB, 735x725)

100 KB JPG

>>106716383

Anonymous
09/27/25(Sat)11:48:31 No.106716931

Anonymous 09/27/25(Sat)11:48:31 No.106716931

>>106716839
I wouldn't underestimate the chinese though
https://techcrunch.com/2025/09/17/china-tells-its-tech-companies-they-cant-buy-ai-chips-from-nvidia/

Anonymous
09/27/25(Sat)11:55:19 No.106716998

Anonymous 09/27/25(Sat)11:55:19 No.106716998

>>106716419
Need replacement for autoregressiveslop. I wouldn't be satisfied with running Gemini Pro on GPUs in my home rig.

Anonymous
09/27/25(Sat)12:07:51 No.106717114

Anonymous 09/27/25(Sat)12:07:51 No.106717114

File: 1758561965270831.gif (2.97 MB, 300x219)

2.97 MB GIF

I wish the hype around machine learning would die down so I'd stop having to see retards who don't know anything about it being extremely opinionated on it
I just wanna do my data analysis in peace

Anonymous
09/27/25(Sat)12:12:48 No.106717153

Anonymous 09/27/25(Sat)12:12:48 No.106717153

File: Screenshot_20250924_22430(...).jpg (1.34 MB, 2340x1080)

1.34 MB JPG

>Moondream-3-Preview 9B MoE
Q:
>How many people are in the picture and what is their gender?
A:
>REASONING
>The image shows four people in the photograph. Looking closely at their genders, we can see two women and one man. The individuals appear to be engaged in a casual interaction, with one person gesturing while holding a phone.
>RESULT
>4 people, 2 women and 2 men

Overhyped garbage. Even SmolVL got the genders right.

Anonymous
09/27/25(Sat)12:17:14 No.106717191

Anonymous 09/27/25(Sat)12:17:14 No.106717191

File: 3b7385633249cb0163ff4f222(...).png (555 KB, 1000x1000)

555 KB PNG

i have a question, I just watched the movie Eddington about the datacenter named SolidGoldMagikarp, and after learning about glitch tokens, i see that chat gpt has patched the bugs with that came with that token.

but i was wondering are there any currently glitchy tokens for chat gpt or is that like entire concept patched now?

Anonymous
09/27/25(Sat)12:19:08 No.106717208

Anonymous 09/27/25(Sat)12:19:08 No.106717208

>>106717191
ChatGPT is not an open weights model so it's not a local model. You want this thread instead >>>/vg/540611817

Anonymous
09/27/25(Sat)12:20:23 No.106717223

Anonymous 09/27/25(Sat)12:20:23 No.106717223

>>106717191
There probably are some still.
>>106717208
Model is not, but tokenizer is.

Anonymous
09/27/25(Sat)12:24:53 No.106717261

Anonymous 09/27/25(Sat)12:24:53 No.106717261

>>106717191
That's a great image. I love the machinery.

Anonymous
09/27/25(Sat)12:26:39 No.106717279

Anonymous 09/27/25(Sat)12:26:39 No.106717279

>>106717208
thank you!
>>106717261
it's a great album too!
https://youtu.be/MId3KYmvsXI?si=pDbIwG6HYCayef4h

Anonymous
09/27/25(Sat)12:37:55 No.106717383

Anonymous 09/27/25(Sat)12:37:55 No.106717383

>>106716029
damn I just gooned like never before to V3.1 on OR and it took all of 5 seconds to set up. I can’t believe I wasted weeks fiddling with local shit and testing dozens of retarded <100B models in the hopes that one of them would be remotely intelligent, even though deepseek was miles better and available for free all along.

Anonymous
09/27/25(Sat)12:39:41 No.106717398

Anonymous 09/27/25(Sat)12:39:41 No.106717398

>>106717383
Once you stop feeling shameful about someone seeing shit you write, it's kind of hard to go back to local.

Anonymous
09/27/25(Sat)12:44:52 No.106717437

Anonymous 09/27/25(Sat)12:44:52 No.106717437

>>106717191
Ignore the other shill, /aicg/ is in this board.
>>106700209

Anonymous
09/27/25(Sat)12:47:30 No.106717458

Anonymous 09/27/25(Sat)12:47:30 No.106717458

>>106717437
that's the lower tier thread

Anonymous
09/27/25(Sat)12:50:03 No.106717474

Anonymous 09/27/25(Sat)12:50:03 No.106717474

>>106717458
The thread in /vg/ is about botmakies.

Anonymous
09/27/25(Sat)12:53:11 No.106717503

Anonymous 09/27/25(Sat)12:53:11 No.106717503

Is it possible to use Qwen-image-Edit-2509 on forge, or at least on the qwenUI?

I don't want to install more GUIs.

Anonymous
09/27/25(Sat)12:53:23 No.106717507

Anonymous 09/27/25(Sat)12:53:23 No.106717507

>>106717474
the thread in /g/ is about shitposting

Anonymous
09/27/25(Sat)12:56:36 No.106717531

Anonymous 09/27/25(Sat)12:56:36 No.106717531

>>106717507
Occasional shitposting sounds better than worshiping botmakies.

Anonymous
09/27/25(Sat)13:00:00 No.106717556

Anonymous 09/27/25(Sat)13:00:00 No.106717556

>>106717531
Occasional worshiping botmakies sounds better than shitposting.

Anonymous
09/27/25(Sat)13:01:48 No.106717569

Anonymous 09/27/25(Sat)13:01:48 No.106717569

>>106717556
No, it does not. Why would I want to worship botmakies?

Anonymous
09/27/25(Sat)13:01:49 No.106717570

Anonymous 09/27/25(Sat)13:01:49 No.106717570

>>106717531
>>106717556
you are both cancer

Anonymous
09/27/25(Sat)13:05:31 No.106717601

Anonymous 09/27/25(Sat)13:05:31 No.106717601

>>106717569
oh we're talking about you here my b, anyways for non faggots /vg/ thread is better
happy now?

Anonymous
09/27/25(Sat)13:09:22 No.106717628

Anonymous 09/27/25(Sat)13:09:22 No.106717628

>>106717601
It's the other way around. Since /vg/ revolves around personalities, that's the thread for faggots. Like you.

Anonymous
09/27/25(Sat)13:16:13 No.106717667

Anonymous 09/27/25(Sat)13:16:13 No.106717667

>>106717628
>Since /vg/ revolves around personalities
that's /g/ hence the thread is for faggots, i.e. you

Anonymous
09/27/25(Sat)13:19:46 No.106717696

Anonymous 09/27/25(Sat)13:19:46 No.106717696

>>106717667
>>>/vg/540615495
What's this about, exactly?

Anonymous
09/27/25(Sat)13:29:59 No.106717760

Anonymous 09/27/25(Sat)13:29:59 No.106717760

>>106717696
>>106717634
>>106716210
just let it go anon /aicg/ on /g/ is just the lower tier thread

Anonymous
09/27/25(Sat)13:41:31 No.106717835

Anonymous 09/27/25(Sat)13:41:31 No.106717835

File: miku i'm not an eldritch (...).png (1.12 MB, 1024x1024)

1.12 MB PNG

>>106710285

Anonymous
09/27/25(Sat)14:00:16 No.106717943

Anonymous 09/27/25(Sat)14:00:16 No.106717943

indifferent to miku

Anonymous
09/27/25(Sat)14:28:02 No.106718184

Anonymous 09/27/25(Sat)14:28:02 No.106718184

deferent to miku

Anonymous
09/27/25(Sat)14:29:40 No.106718200

Anonymous 09/27/25(Sat)14:29:40 No.106718200

bullet to miku

Anonymous
09/27/25(Sat)14:31:57 No.106718224

Anonymous 09/27/25(Sat)14:31:57 No.106718224

File: 1622494488948.jpg (11 KB, 229x220)

11 KB JPG

Anonymous
09/27/25(Sat)14:37:12 No.106718270

Anonymous 09/27/25(Sat)14:37:12 No.106718270

i just want a new model
mistral large 3 perhaps?

Anonymous
09/27/25(Sat)14:43:07 No.106718326

Anonymous 09/27/25(Sat)14:43:07 No.106718326

File: le_sadface.jpg (72 KB, 540x540)

72 KB JPG

>>106718270

Anonymous
09/27/25(Sat)14:48:08 No.106718375

Anonymous 09/27/25(Sat)14:48:08 No.106718375

>>106718270
hahaha, lol, the mao.

Anonymous
09/27/25(Sat)14:53:45 No.106718431

Anonymous 09/27/25(Sat)14:53:45 No.106718431

>>106718270
Gemma 4

Anonymous
09/27/25(Sat)14:54:03 No.106718435

Anonymous 09/27/25(Sat)14:54:03 No.106718435

File: 1735851018137903.jpg (183 KB, 1080x1440)

183 KB JPG

>>106718323
>>106718323
>>106718323

Anonymous
09/27/25(Sat)14:59:56 No.106718475

Anonymous 09/27/25(Sat)14:59:56 No.106718475

>>106717153
Never trust a model under 30B (dense or MoE), it'll be shit.

Anonymous
09/27/25(Sat)15:03:53 No.106718512

Anonymous 09/27/25(Sat)15:03:53 No.106718512

>>106718496
>>106718496
>>106718496

Anonymous
09/27/25(Sat)15:06:02 No.106718536

Anonymous 09/27/25(Sat)15:06:02 No.106718536

>>106718449
>>106718449
>>106718449

Anonymous
09/27/25(Sat)15:07:36 No.106718555

Anonymous 09/27/25(Sat)15:07:36 No.106718555

>>106718536
Invalid Miku

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.