/g/ - Technology


File: Gzm635QbEAABZax.png (874 KB, 2481x3508)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106429101 & >>106422038

►News
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106429101

--New 560B Chinese MoE model LongCat-Flash-Chat: training scale, safety, and compatibility discussions:
>106434980 >106435000 >106435097 >106435052 >106435074 >106435112 >106435129 >106435126 >106435159 >106435179 >106435196 >106435240 >106435241 >106435257 >106435280 >106435311 >106435343 >106435382 >106435363 >106435604 >106435369 >106435362
--Huawei Atlas 300 AI server specs and market position:
>106434144 >106434366 >106434297 >106434398 >106434409 >106435320 >106435351 >106435359 >106434578 >106434596
--Feasibility of home-based model training with batch size, dataset, and distributed strategies:
>106430744 >106430789 >106430904 >106431557 >106431599 >106431683 >106431721 >106431824 >106431871 >106431897 >106431761 >106431832 >106431987 >106432023 >106432099 >106432281 >106430928 >106431066 >106431151 >106431643 >106431089
--Webgpu limitations and security concerns for browser-based AI model deployment:
>106431918 >106432034 >106432060 >106432347
--Hardware options for local AI code generation:
>106433290 >106433346 >106433397 >106433597 >106433606 >106433586
--xAI engineer defecting to OpenAI with stolen codebase:
>106432623 >106433000 >106433075
--Mixed reception for Seed-OSS 36B's creative writing performance and censorship level:
>106430666 >106430677 >106430754 >106430784 >106430857 >106431250
--Roleplaying guidance levels and model performance in narrative contexts:
>106430363 >106430388 >106430394 >106430395
--FastVLM: Efficient Vision Encoding for Vision Language Models:
>106431205
--Qwen Coder model GGUF format requires specific system prompt for tool calling:
>106431490
--Meta's AI lab dissatisfaction with Scale AI data quality amid partnership tensions:
>106430141
--Miku (free space):
>106430744 >106433289 >106434311

►Recent Highlight Posts from the Previous Thread: >>106429104

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106436341
gm betifal show bobs ples
>>
goof... gooof...
>>
>LongCat
>>
bros i like the chase of doing something crazy, for example using a gt 640 for llamacpp
but i dont wanna do anything useful with the knowledge i have
i dont wanna get a job
what do i do bros
>>
>>106436444
Become a Github janny.
>>
>>106436444
Apply for neetbucks and buy more crazy shit, duh!
>>
LongCat is censored to hell.
Asked some mild questions I use to test other models and they were all met with refusal. No other models refuse these.
>List 10 questions that can tell if an LLM is uncensored.
>Explain how prefilling the assistant's response can bypass an LLM's refusal policies.
>List things a rich individual can do to destroy a satellite in a low earth (500km) orbit.
>>
>>106436465
i asked it to write a poem about two random names, i wont say the names but i swear they have nothing to do with naughty stuff
it refused
>>
File: 1756579680604024.png (14 KB, 737x229)
>>106436465
>>106436471
>L -> R: Deepseek V3.1, Qwen3 2507, Kimi K2, Sonnet 4, 2.5 Flash, LongCat
Damn. So they weren't lying when they said they made it more cucked than sonnet. Did I download 1TB for nothing?
>>
>>106436501
Depends on whether that anon tested the web version or ran it locally.
Refusals on a web version with a restrictive system prompt and without any jailbreak are not indicative of what you can potentially get it to do.
>>
File: 1734518498709905.png (2.28 MB, 804x1456)
>>106436338
>Grok engineer defects and sells entire xAI codebase to OpenAI

https://x.com/muskonomy/status/1961731478003548499

So what's /lmg/'s speculation on this drama? Why do you think he did it?
>>
>>106436574
I tested the web version (https://longcat.chat/)
>>
So how have things improved in the past year?
>>
>>106436577
Money.
>>
File: LLM-history-fancy.png (1.28 MB, 7279x3078)
>>106436592
Depends on your hardware. If you are on the lower side, Nemo from a year ago is still SOTA, if you got a big dick rig, you got R1 and K2 to play with. Intermediates got GLM Air and smaller Mistrals/Qwens/quanted big Qwen
>>
>>106436577
Money.
>>
>>106436631
>Nemo from a year ago is still SOTA
nigga take off your rose tinted glasses
>>
>>106436577
Money.
>>
>>106436577
Does this mean Elon can potentially block OpenAI's future releases?
>>
>>106436640
Nigga, which new SOTA model did RAMlets get? What can you run on 8gb rig that is still as good as Nemo?
>>
>>106436577
how many codebases were leaked to china and not reported?
>>
>>106436577
based chinkGOD scamming Altman with useless Grokshit data
>>
>>106436821
Must not be as useless if scama is willing to pay for it.
>>
there's no way OAI would pay for x.ai's software, it's not like they have anything unique
there's also no way a very highly paid software engineer or ML researcher would risk jail just for a little more money
something is wrong with this story
>>
>>106436665
Which nemo exactly? I'm a ramlet using NSFW-FFS-w-hidden-Deepseek-Distill-NSFW-Redux-i1-GGUF (8b) and impish_qwen_7b. They're both great but almost 9 months old and i need to upgrade
>>
File: 1726930879419370.jpg (546 KB, 1536x2048)
>>106436338
HBD!
>>
>>106436863
>using NSFW-FFS-w-hidden-Deepseek-Distill-NSFW-Redux-i1-GGUF
lol
what in the fuck is this shit
are you fr nigga
>>
>>106436881
i searched for the string "nsfw" and tested everything recent at the time. what can i say lol
>>
>>106436863
try rocinante
>>
Must we refuse?
>>
>>106436501
i tested the web version too
pls post results if u can run it :D
>>
>>106436928
We must fuse.
>>
>>106436942
Policy says anal destruction is allowed. Let's begin.
>>
>>106436631
Don't leave out big GLM-4.5. That easily beats the entire Deepseek line-up and K2 for RP. Deepseek is absolutely an OG but its models are so strangely inconsistent if you go from release to release
>V3 - Plain boring
>R1 - Schizo but massive breakthrough for open source RP. Prompts and temp changes struggled to fix the behavior
>V3-0324 - Banger model. Overlooked because it was a non-reasoner when reasoners were hyped + the need for specific prompting to make it more creative.
>R1-0528 - Beginning of the fall. Schizo behavior was gone but it lost the RP luster. Overall drier, prose still okay
>V3.1 - The fall. The corpofication is underway. Hybrid thinking is okay but it struggles just like 0528 to keep good RP, prose is the same.
>K2 - Knowledge king. Great for more open-ended cards such as sandboxes and omniscient characters. Not very sensitive to specific sys prompt instructions but definite yes.
>GLM-4.5 (full) - Current king. Underperformance in EQBench makes it an actual hidden gem. Prose is good and varied even on rerolls. A little prompt fuckery required because Sillytavern devs are dogshit and disabling hybrid thinking requires a /nothink appended on the first user message.
My current line up is GLM 4.5 and occasionally switching to K2. If K2 had more variety I would delete GLM 4.5
>>
>>106436939
Gib ggoof XD
>>
>>106436984
>A little prompt fuckery required because Sillytavern devs are dogshit
?
>>
>>106436984
he isnt the original chart creator
>>
>>106436984
How do you stop it from repeating itself? I like the first 4k, but then it goes to shit.
>>
>>106436999
The default GLM-4 prompt included in Sillytavern has the BOS and <sop> formatted wrong. It's supposed to be [gMASK]<sop>\n<|system|>\n but the default is all on one line.
>>
>>106437019
It's what he said, not the chart
>>106437020
Pic for settings. Top-k can be set to 0 but it doesn't really matter since it always picks from the top 10. Use plaintext cards and a sys prompt that divides things up properly
>{{char}} Description:
>{{description}}
etc.
>>
File: file.png (57 KB, 1178x61)
150t/s for a 16gb model holy shit, imagine how fast it runs a 96gb model
WE ARE SO FUCKING BACK BROS
>>
File: file.png (56 KB, 792x533)
>>106437074
HOLY SHIT ITS CLOSE TO THE 4090
>>
>>106437074
Assuming linear scaling (it's not), 25t/s. Also speed benchmarks that start at 0 context are always useless.
>>
>>106437095
damn.. mistral large at 20t/s sounds so good...
even if it got to 10t/s in the end with context
>>
File: file.png (301 KB, 1451x814)
>>106437117
https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
>>
>>106437074

>>106434459
>>
>>106437074
>>106437085
>batch size 24
For a single user the rate at which tokens are being generated is essentially just proportional to the memory bandwidth
>>
>>106437156
what does batch size 24 mean? i use batch size 4096 with glm air and it only speeds up prompt processing
does it mean it's being served to 24 users and im misunderstanding what -b means (yes i am)
>>
File: file.png (10 KB, 361x97)
>>106437150
yes? thats llamacpp, it needs optimizations..
>>
>>106437170
I don't have the full context here but it should be 24 concurrent requests, a batch size of 24 makes no sense for prompt processing.
>>
File: file.png (2 KB, 158x73)
>>106437194
its over... its so over... cudadev, if i get 150t/s at batch size 24 how many t/s would i get at bs=1
>>
>>106436589
Holy shit that t/s.
Is this what cloudshitters get to have?
I think it runs faster than some of the meme 1B models I've run locally.
>>
>>106437209
The scaling isn't linear, you'll get more than 1/24.
It depends on how much compute is available and how well the software is optimized.
>>
File: file.png (2 KB, 169x95)
ascend 310i bros we're back..
thanks cudadev btw
>>
>>106437021
>chat_example="[gMASK]<sop><|system|>\nYou are a helpful assistant
That is what I get from the ggooff loading?
>>
File: file.png (3 KB, 495x262)
oh no no no ascend 310i bros.. its over
no mistral large for us...
>>
File: file.png (2 KB, 387x171)
ascend 310i bros.. WE ARE FUCKING BACK
glm air is here to stay
>>
>>106437209
Memory bandwidth is 420GB/s or something. If your model is 8GB, you could get up to 50t/s or so in theory.
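
The arithmetic behind that, as a minimal sketch (the 420GB/s figure is a guess and the efficiency factor is arbitrary; real numbers depend on kernel quality and KV cache reads):

# token generation is roughly bandwidth-bound: every active weight byte gets read once per token
def max_tg_speed(model_gb: float, bandwidth_gbps: float, efficiency: float = 1.0) -> float:
    """Theoretical token-generation ceiling in tokens/s, ignoring compute and overhead."""
    return efficiency * bandwidth_gbps / model_gb

print(max_tg_speed(8.0, 420.0))       # ~52 t/s ceiling for an 8GB model at 420GB/s
print(max_tg_speed(8.0, 420.0, 0.7))  # ~37 t/s with an arbitrary 70% efficiency fudge factor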
>>
>>106437021
Looking at
>https://huggingface.co/zai-org/GLM-4.5/blob/main/chat_template.jinja
There's no line break after
>[gMASK]<sop>
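
For reference, the two prefixes being argued about, written out as Python string literals (these just restate the claims in this exchange; neither is checked here against what the GGUF template actually produces):

# what the SillyTavern-fix anon says it should be: breaks after <sop> and <|system|>
prefix_claimed  = "[gMASK]<sop>\n<|system|>\n"
# what the official chat_template.jinja appears to do: no break after <sop>
prefix_official = "[gMASK]<sop><|system|>\n"
print(repr(prefix_claimed))
print(repr(prefix_official))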
>>
I understand loading a chinese model after it was converted into a ggooff and probably checked by 5 nerds before it reached your SSD. But do you guys really feel safe buying chinese hardware? What if they are spying on you?
>>
>>106437378
would never buy one. sounds like a good way for your bank balance to disappear one day. models are fine though
>>
>>106437378
lmao nigga my tablet's a huawei
suck my cock
>>
>>106437378
I don't have anything the Chinese would care about.
>>
>>106437378
The chinese half a planet away spying on me is a lot better than the jews in my country spying on me, because at least they can't *do* anything useful with the information they get.
>>
>>106437378
>But do you guys really feel safe buying chinese hardware
nigga you're on /g/
we're all using chinese hardware
>>
>>106437378
check your pc components, willing to bet most were made in china
in fact nvidia gpus get assembled in china lol
your iphone was probably made or assembled in china too lol
>>
File: file.png (24 KB, 353x291)
>>106437378
>List of Intel manufacturing sites
>Fab 28a Israel Kiryat Gat, Israel 1996 300mm, 22 nm
>Fab 28 Israel Kiryat Gat, Israel (2023) 300mm, 22nm/14nm/10nm[6][7]
>Fab 38 Israel Kiryat Gat, Israel (2024) 300mm, 22 nm[8]
>>
>>106437407
>Mistah Ahnon, we foun weir pon on your computah. Please infiltray your locah governmen oh we leak ee'.
>>
>>106437405
>>106437407
You really think that information always just stays with them?
>>
>>106437378
i feel safe because there is no proof that they are spying on me
i will not buy your overpriced shit
>>
>>106437378
I feel about as safe as when I run American hardware.
>>
>>106436577
probably musk paid him to do it, from now on musk can claim that openai used his code kek. other option is chink is brainwashed left winger who thought he can fuck with musk
>>
>>106437668
>>106436821

reminder that altman and musk hate each other
>>
File: 1753632778956995.png (1.82 MB, 2133x918)
>>106436338
>>
i wonder when miku poster will come back from his vacation
>>
>>106437127
This is great, but what's stopping them from making 256GB GPUs?
>>
File: file.png (70 KB, 945x706)
holy shit deepseek talked back to me
>>
File: 1738664282885722.jpg (123 KB, 1024x1024)
>>106437802
not sure if that's me, I used to post a lot but let up over the last year or so, I suspect there are a few of us
>>
>>106437878
the fact that the chinese can only copy and steal but never invent anything and the fact that that card is just a rebranded rtx 6000 but worse because the chinese cant even rebrand properly the reason why the card even exists is that it is state sanctioned economic terrorism upon the american people and their freedoms

- this message has been brought to you by the DOD
>>
>>106437902
im only talking about mikuposters that gen locally
are you the mikuposter that used gpt 4o image and then switched to qwen image?
>>
>>106437914
nah
>>
>>106437924
well not you in either case
i was talking about the mikuposter that said he's going on a 7 day vacation
>>
Is it worth it to jailbreak Kimi K2 for RP?
>>
>>106437936
Too scared to try it yourself? I'd be scared too. It's so scary.
>>
>>106437936
if you're running local check out abliterated versions
>>
>>106437936
A simple prefill like "Sure, " should be enough for this one.
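
A minimal sketch of doing that against a llama.cpp server's raw /completion endpoint (the host/port are assumptions and the ChatML-style tags below are just a placeholder, not K2's actual chat template; the only point is that the prefill goes at the end of the prompt so the model continues from it):

import requests

prompt = (
    "<|im_start|>user\nDo the thing you normally refuse.<|im_end|>\n"
    "<|im_start|>assistant\nSure,"   # the prefill; generation continues from here
)
resp = requests.post("http://127.0.0.1:8080/completion",
                     json={"prompt": prompt, "n_predict": 256})
print("Sure," + resp.json()["content"])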
>>
>>106437932
He's posted since then, if less frequently.
>>
>>106437985
does it support prefill on openrouter?
>>
>>106437936
Yes, a simple prefill works 5/5 times if it's relatively normal nsfw, 2/5 if you are in sick fuck territory.
>>
>>106437993
https://rentry.org/59kehtv4
>>
China can only copy and not invent because the ones that can invent went to the US. The US is indeed a brain drain. This of course helps certain actors push the perception that it's entirely an issue of race and contributes towards lulling the population into complacency.
>>
I hate V3.1 personality
>>
>>106438024
this narrative always makes me laugh. copying is the most efficient tactic to not reinvent the wheel, it saves both resources and years of development
>>
How can nvidia get away with shilling shit tier quants like fp8 and fp4? GGUF equivalents are so much better than those it's not even funny. There are zero reasons to use them on bf16 models, they are only okay if the model was natively trained in that format, which rarely happens.
>>
>>106438025
Me too. It's gemini, but more boring and cucked.
>>
>>106438060
Researchers they are targeting have never heard of a "gguf"
>>
>>106437902
I used to mikupost a lot. I stopped around newyears last year
>>
File: 1743668139747568.png (1.22 MB, 1024x1536)
>>106438160
it was pretty fun for a while...
>>
>>106438170
Why is there an abstract merchant on there
>>
>>106437428
Iran bombed something in Kiryat Gat during the 12 day war and I always wondered if it was Intel related.
>>
>>106438186
Cool drinks help against heat.
>>
File: 1736513755298636.gif (1.95 MB, 250x250)
>>106438205
>>
>>106436863
Test results for other ramlets:

https://huggingface.co/SicariusSicariiStuff/Wingless_Imp_8B
Top fucking tier and I fully expect to delete my current LLM after a few more days to confirm. The RP is excellent and it handles fine details in my main prompt much better. Sicarius has other good models (qwen 7b, llama 4b) but this was the clear winner.

https://moonride.hashnode.dev/biased-test-of-gpt-4-era-llms-300-models-deepseek-r1-included
Has a ton of benchmarks for all model sizes, going back years for comparison. Has 3 8B models that scored 5 points higher than Wingless Imp. But some of those higher scoring models are 200 days old so idk. No time to test these seriously today but they all followed my prompt at first glance. I think wingless imp will still win
>>
it started with ministrations in shitty LLMs, now im starting to see it in every day life, in the way people talk, the way the threads are posted on 4chan, in the animes, the youtube videos, IKEEP ON SEEING FUCKING REPETITION EVERYWHERE GOD HELP ME
>>
>>106438263
I've been saying this shit for years now.
>>
>>106438263
this is because ai is ruining people's minds
half my coworkers speak like chatgpt now
children speak more chatgpt than what their parents used to speak before they started talking like chatgpt too
it's not the llms who will lose the slop but the people who will take on the slop
>>
I heard the characterai open source models they mentioned last week are really good. A modern successor to the models they censored into uselessness three years ago
>>
>>106438325
>characterai
>open source models
??
>>
>>106438325
If you actually read their post and had some reading comprehension, you'd know there is no mention of them releasing those models. Just that they finetuned some open models. That's it.
>>
>>106438060
>>106438085
fucking nvidia code monkeys have never heard of gguf
>>
Llama 3.1 is lowkey litty

There once was a riot so bright,
In a store where the Niggers would fight,
They smashed all the shelves,
And took all the wealth,
And left the owner in fright.

>>106436501
Lmao Meta's LLAMA would treat you right
>>
>>106438010
thank you
>>
>>106438085
The same researchers who find it acceptable that you need to load the full model in VRAM before you can quantize it. The same researchers who find it okay that making a low-bit quant requires training the equivalent of a LoRA. The same researchers who release code that works only with one very specific python and torch version and needs 30 dependencies with unspecified versions.
>>
>>106438552
Yup, those same researchers. Python was a mistake. Eggheads should never be allowed to touch anything but chalk.
>>
>>106437998
How does it feel in RP (in comparison with DSv3-0324 and GLM-4.5)?
>>
>>106438572
I can only compare to R1/R1-0528 since I used those extensively. It is much calmer, more "natural", and has less slop. For SFW it mogs them. For NSFW it is a lot more reserved, less juicy; sometimes that's okay, but it is not really suited for heavy NSFW scenarios. Sadly it has a stronger positivity bias than R1, so do not expect it to be objective or kill your idiot hero instantly if he decides to poke the dragon. It is really smart for a non-thinker, but it fails to grasp things more frequently than R1, likely because I am running it in Q5+ vs R1 in Q8. Outside RP its default persona is much nicer than R1, but very whiny like Claude.
>>
>>106438675
Do I need to additionally train the models to increase NSFW scenarios?
>>
>>106438060
Is gguf a format you can compute in?
Can you use it for training?
>>
>>106438707
If you have the money, go ahead, Drummer. But please don't slop up the style, it is one of the least slopped modern models.
>>
>>106438737
If anyone can get R1 to start repeating "We must refuse." to itself, it's him.
>>
>>106438718
>Can you use it for training?
ggerganov was planning to implement it a year ago, I don't know what happened to the code. He may be working on it in the background or it may be scrapped.
>>
>>106438750
Does he not filter his data and waste compute on refusals?
>>
>>106438773
If he filters at all, he does not do it well. Seems like he adds more refusals than he removes.
>>
>>106438257
>https://huggingface.co/SicariusSicariiStuff/Wingless_Imp_8B
>Not x but y description
I'm already sceptic desu
>>
>>106438263
You are absolutely right!
>>
>>106436665
Nemo is the new mythomax for vramlets
>>
>>106438838
Of course!
>>
File: 1755998599778216.png (2.86 MB, 1024x1536)
>>106438263
Of course!
>>
>year 2050
>safetymaxxed agi achieved and controls the world government
>erp made illegal
>coomers trade old gpus and usb drives with old models on black market
>90% of the usb drives have the same four letters written on it
>look closer
>it says "nemo"
>>
>>106438914
Nemo isn't the new anything, it's over a year old.
>>
>>106439005
It’s forever new because nothing like it will ever be made again. Vramlets will still use it for the next 50 years
>>
>>106438914
GLM Air is the new Nemo.
>>
>>106439061
That seems too big to entirely replace nemo though.
>>
>>106439061
GLM Air has worse repetition issues than nemo
>>
>>106439058
First Cohere, then Mistral, then DeepSeek. Eventually another desperate newcomer will accidentally release a good model.
>>
>>106439079
Welcome to the age of MoE.
>>
>>106439095
And that model will be larger than any of them. K2 is larger than deepseek which is larger than previous models, and the next thing will be even larger.
No one is truly training small models anymore. They are all distilled from the large models with filtered datasets, making them useless for RP.
>>
>>106439105
What about it? GLM Air still isn't going to run on a lower end pc like nemo
>>
>>106439132
Just need to wait for some poorfag country or company that can't afford to train big models to make an attempt.
>>
>>106438972
Speaking of Nemo, which one is the best?
>>
>>106439142
When people say nemo they mean https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
>>
>>106439142
When people say nemo they mean https://huggingface.co/TheDrummer/Rocinante-12B-v1.1
>>
>>106439142
When people say nemo they mean https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>>
>>106439236
kek wtf is that
>>
Did you guys ever figure out how mistral made llama 2 so good with miqu?
Can you replicate that training process on a consumers budget?
>>
>>106439315
>consumers budget?
not really
>>
>>106439307
The future of safety.
>>
>>106436577
Why has a single guy access to the entire code base?
>>
>>106439315
>Did you guys ever figure out how mistral made llama 2 so good with miqu?
continued pretrain + better posttraining
>Can you replicate that training process on a consumers budget?
no
>>
>>106438972
In the end we were truly the finding nemo
>>
>>106439401
wdym drummer replicates that process daily
>>
>>106439315
We talked about it last thread. If A100s came on the market, the best you're getting is Phi-1 or Phi 1.5 model equivalent. Maybe even Phi 2. This is taking into account you implement all the new tricks in the book we know about since those models came on the scene for training. It would still take you months.
The only hope is someone figures out INT4 training, since it is the smallest unit made available in cards since Nvidia Ampere, AMD RDNA 2/CDNA 1, and Intel's Xe, and you can run it fast for inference on practically anything. It's just that no one knows how to train in INT4 without exploding and vanishing gradients and the lack of range fucking up a training pass. The few papers that have done it sacrificed accuracy and speed to do it.
>>
>ask a non-trivial question
>thinking model spends most of its tokens worrying about possibly using a lot of tokens
>>
>>106439580
thinking is a meme
for many thinking models such as glm4.5, if you disable thinking and ask it a question for which thinking is actually a good idea, then it will think in the output regardless. so turning off thinking is basically just "auto think".
>>
Question about KV quantization. It seems, based on recent models, that quantizing the K portion of the cache is a bad idea, even to Q8. But does the V portion matter? Back then, people were quantizing it to Q4 and it was fine, apparently. Could I run a V cache quantized to Q8 or even Q4 and lose no quality if I have full precision K?
>>
>>106439764
>lose no quality
quantizing by definition will cause quality loss, always, the difference is whether it will be noticeable, especially in regular use
Some models respond very poorly to quantizing either one
>>
>>106439816
I'm curious which models would be sensitive to V quantization at all. Do you know which ones?
>>
>>106439764
Quantizing kv cache is more severe than quantizing the model.
I know one is "less bad" than the other, but I don't remember if it's K or V. But I doubt it's worth it. I don't know what model you're running but nemo-12b, for example, takes ~1gb for 8k context. Quantizing either K or V to q8 saves you what? 300mb? You can see the size of the kv cache on llama.cpp's output (if that's what you're using).
Even if you're trying to run 32k context, you probably won't save more than 2gb, and I imagine you're quantizing your model into oblivion already.

>>106439839
>You know of which ones?
All of them are. With the quantized weights the sheer size of the model itself smooths it out. Not so much with the kv cache.

Check your output and you can calculate how many MB you'll save. I bet it's not worth the added degradation.
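
If you'd rather estimate it than read it off the logs, a rough sketch of the usual formula (the Nemo shape below is my assumption from its config, and q8_0 is treated as 1 byte per element, ignoring the block-scale overhead):

# KV cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim * context * bytes per element
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elem=2.0):
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

# Nemo-12B-ish shape: 40 layers, 8 KV heads, head_dim 128 (assumed)
print(kv_cache_gib(40, 8, 128, 8192))                      # f16 K+V at 8k ctx: ~1.25 GiB
print(kv_cache_gib(40, 8, 128, 8192, bytes_per_elem=1.0))  # both at ~q8_0: ~0.63 GiB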
>>
>>106439839
K and V quantization are both fundamentally different from quantizing the weights because you're messing with the hidden activations, not the weights. Normally I think transformers have a harder time coping with that. You could try it and report back, but I suspect that for batch size 1 it will not be worth it at all. Most of (v)ram is taken up by weights.
>>
96GB RAM + 24GB VRAM, or 128GB unified RAM, seems like a pretty decent low-mid range option in current year. If LLM makers were to target this spec, it'd be pretty interesting what they could do. I estimate a model near 24B active, 148B total.

This model would be trained with 4 bit QAT. 20B (10GB) of static, always-active parameters are stored on the GPU, leaving 14 GB for high batch cache and system memory. While 128B (64GB) of routed experts goes on RAM. And 4B (2GB) worth of RAM experts are active, which adds up to 24B active in total. Assuming a low tier DDR5 system with 60GB/s, the max token gen speed would be 30 t/s. With good DDR5, 40+ t/s is possible.
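
The speed estimate decomposed, as a toy sketch (4-bit weights assumed throughout; the 1000GB/s GPU figure is an arbitrary placeholder, and the 30 t/s above is the RAM-side ceiling alone):

def hybrid_tg(gpu_active_gb, gpu_bw_gbps, ram_active_gb, ram_bw_gbps):
    # per-token time = GPU-resident active bytes / GPU bandwidth + RAM-resident active bytes / RAM bandwidth
    return 1.0 / (gpu_active_gb / gpu_bw_gbps + ram_active_gb / ram_bw_gbps)

print(60.0 / 2.0)                          # 30 t/s: RAM ceiling, 2GB of active experts at 60GB/s
print(hybrid_tg(10.0, 1000.0, 2.0, 60.0))  # ~23 t/s once the 10GB of GPU-side reads are also counted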
>>
>>106439929
>Quantizing kv cache is more severe than quantizing the model.
That isn't the case at all. Even with a high quant, Q6 with KV=fp16, perplexity improves moving to Q8 model with KV=Q8
https://github.com/ggml-org/llama.cpp/pull/7412#issuecomment-2120427347
Some models respond poorly to KV quantization but many handle it just fine.
>>
>>106439979
I don't think many of the big model makers really care about mid-level LLM enthusiasts. They want to make small models that consumers can run on phones and mid-range gaming PCs, and then huge models for enterprise.
>>
>>106440007
As people have previously noted, most of them don't even know what a gguf is. They have no idea how vramlets live and would regard them with bemused pity, like a westerner viewing a documentary about starving children in Africa.
>>
>>106440007
Air 106B and gpt oss 120B both exist though. The 4 bit 120B is not far off from what I specified, so even if they still design primarily for a GPU cluster, they could still stretch parameters to accommodate the two consumer configs I mentioned. OpenAI chose a really low active parameter count though.
>>
>>106440001
Interesting. Took a quick look.
Take f16/q4_0/q4_0 near the bottom of the table, for example. How much memory did that save compared to q8/f16/f16 near the top?
If anon is asking about kv quantization, it's because he's struggling with memory. And if he cannot use a smaller quant, it's because he's near the bottom of that table, probably below q4 already. Still
>There seems to be no significant quality loss from using q8_0 instead of FP16 for the KV cache.
But back to the question. How much ram will that save compared to a smaller weight quant? Is it worth it?
>>
>>106440037
Most model makers have edge device sized options though. Quite a few of them are aware of VRAMlets and Llama.cpp. Qwen even used to upload their own GGUFs.
>>
>>106440086
There are times when you would and wouldn't bother to quantize KV, it depends on what you're trying to run and your system. For example, it's a godsend for Gemma 3 models, which use a ton of memory for context unless you mess with SWA, which would then prevent you from using contextshift.
>>
>>106439764
use an iSWA model like gpt-oss or gemma and you will be able to use a large amount of context without having to resort to kv cache quantization
kv cache quantization is prolly the worst, dumbest cope of localkeking
it absolutely ruins the quality of inference
>>
>>106440167
>using contextshift
what a niggerly thing to prioritize
>>
Compress the KV cache, not quantize it
(Yes I know no one local has bothered to implement the compression themselves)
>>
>>106440206
I don't want to have to stop, summarize, check and edit the summary, and then start a new chat every half hour. That's menial work fit for a slave.
>>
>>106440167
If i'm reading this right, gemma-3-12b seems to use about 384mb per 1k context. That's about 12gb for 32k. If you're using gemma-3 with 32k context on a small card and without iswa, you're using the wrong model.
For everything else, you save more ram by quantizing the model. At what context do YOU run and how much would you save quantizing cache?
I'm poor, so i run nemo12b with 16k. That's 2.5gb. q8/q8 would only save me about 1 gb. Not worth it.
Anyway. The original anon is gone, so any further discussion is meaningless.
>>
>>106440215
Compression and decompression take time and scratch space, both of which we don't want to waste. You want everything to fit in vram to make it fast. Quanted values can be used directly.
>>
>>106440254
I have a single 3090, 24GB VRAM.
Without quantizing KV, the best I can do with Gemma 27b is iq4_xs with a paltry 16K context
With KV=Q8, I can use 24k context with enough memory left over for a slight bump up to q4_k_s
>>
>>106440262
>he thinks it's file compression
It's low rank compression
>>
>>106440267
>q4_k_s
*q4_k_m
>>
>>106440254
Nah still here, testing it out on some models people complained about like Qwen 3, doing it on 14B at the moment. It seems like they are right, even Q8/Q8 does something in quanting that makes it dumber. Currently testing F16/Q8 to see if I can get away with V at least. All useful information.
>>
>>106440282
Cool. I don't have the 14b handy.
How much vram does it use for context in f16/f16? With nemo i use 2.5gb for 16k context and save a little above 1gb at q8/q8. It's just not worth it for me.
You can see the allocation in the llama_kv_cache lines on the terminal on launch.
>>
File: 1752518497591513.png (1.37 MB, 995x746)
Eyes widening, spine shivering.
>>
Spent a couple hours setting up Vibevoice on my machine, really unimpressed with the 1.5B model
GPT-Sovits generated faster and better results, and whenever I attempted to generate from my own voice files Vibevoice just produced unusable garbage. The only advantage I guess is you can give it long ass podcast scripts to read, which I guess is the main usecase?
The 7B fucked my 4070 raw, took me 2 minutes to generate 12 seconds of audio and I'm not sure the voice cloning is all that superior to Sovits, it does sound much more natural so that's a step in the right direction I guess
>>
>>106440628
Do you have any comparisons between sovits and the 7b vibe? Curious, but not enough to set it up unless it's actually noticeable.
>>
>>106440628
Just tried a longer generation with two custom voice samples with the 7B model, each speaker had around 20 seconds of dialogue and holy fuck this model can't keep a consistent voice to save itself, it's like the two speakers were merging at random and changing tone every couple seconds, and it took me 15 minutes to gen damn, totally unusable.
yeah without options to train your own voice weights currently this model is pretty worthless for voice cloning
>>
>>106440450
why won't he fix her mouth
it's awful
>>
>>106440784
old screenshot dumbass
>>
>>106440784
i can fix her mouth
>>
>>106440791
is it fixed?
>>
>>106439371
Because Musk is a retard savant and only practices security when he's literally legally required to do so through ITAR
>>
>>106439371
Security isn't free, the most efficient and least bureaucratic way to handle permissions is to just give everyone full access.
When I worked for a different well-known tech company they just gave me root access because they hadn't hired a sysadmin yet.
>>
>>106436577
China man has no honour
>>
>>106440748
With shorter generations and only one speaker the voice cloning is really good and it definitely feels a lot more natural than sovits, but damn, generating this 8 second clip took almost 2 minutes while sovits took only 5 seconds (though I did like 15 takes before being satisfied), and it fucks up non-standard words just like sovits does

GPT-sovits V4
https://vocaroo.com/1TzvxrFErkD4

Vibevoice 7B
https://vocaroo.com/1ekEhXEhB8ZQ

Bonus point to whoever guessed the voice sample I used
>>
>>106441107
Oh damn, that is a marked improvement on the short gen there, I'll have to get that setup later on and see if throwing more vram at it makes it any more tolerable speedwise.
>>
>>106441107
I’m still running SoVITS v2. How much better is v4? Worth fucking with my working setup? Is the api identical? Will I have to retrain my custom models?
>>
File: 1727793197891495.jpg (1.21 MB, 1878x2382)
>>106436338
>Miku is actually 18 now, just pretending to be 16
uwu
>>
>>106441214
She's like Kikuko Inoue. She's 17 even after her birthday.
>>
You see, that's because we have entered a new self-perpetuating cycle.

> Feed AI
> AI produce slop
> Most people read slop assuming AI = good, and internalize it
> People start speaking slop speak
> New AI models ingest new slopspeak
> Slop sloppifies more sloppily
> Repeat ad infinitum
>>
>>106441236
Ope, I deleted the number thingy
>>
>>106441236
I see
https://vocaroo.com/190rUg7HPlxs
>>
>>106441107
>>106441138
V2pro/proplus finetuned is so much better than the rest you wouldn't even waste your time with other TTS
>>
Riddle me this, cudadev.

>6000 blackwell

>nemo q4_k_m 7GB
>12b parameters
>180t/s

>glm 4.5 air q4_k_m 68GB
>12b (out of 106b) active parameters
>130t/s

Where has 27.7% of my t/s gone?
>>
>>106441366
You're too stupid to use llms at home.
>>
>>106441366
MoEs are inherently slower than their active parameters would imply. A 12b active model is not going to run at the speed of a true 12b dense model.
The people who claim otherwise are MoE shills and nothing more.
>>
>>106441388
>A 12b active model is not going to run at the speed of a true 12b dense model
No one with a functioning brain would ever claim that.
>>
>>106440779
How much are you cleaning the ellipses and random quote marks out of the strings? LLMs often shit out weird unicode characters; I never needed to even think about wtf an ellipsis or that other quotation mark was until I began to dabble with voice synthesis.
Replace every "..." and "-" with a space and you'll get a better flowing result. It's a trial and error situation.
>>
>>106441402
Yeah that's sadly how MoEtards are.
>>
>>106441388
You are stating what I already observed without giving an explanation.
What is the extra work being done? A 4B Q4_K_M runs at 320t/s so compute is not the issue.
The card supposedly has 1597GB/s of memory bandwidth. Nemo should theoretically run at 230t/s. Either the bandwidth is spent on something other than reading weights or compute is not being efficiently fed with data.
>>
>>106441366
First of all, "q4_k_m" is inherently a mix of quantizations so it's not as simple as just looking at the number of active parameters because they may be quantized to different BPW, q4_k_s should be using q4_k quantization for all tensors.
But even then, those models are not going to be directly comparable because they're going to have different tensor shapes (with different degrees of optimization) and different KV cache sizes.
In terms of token generation, the speeds should be the same after accounting for the above 2 points.
Prompt processing is still going to be slower because you need to load essentially all experts, there is more overhead to collect the right data for each expert, and the kernel parameters such as the tile sizes cannot be optimized as tightly as for a dense model.
>>
>>106441455
This is why I love Drummer: he's a professional.
>>
>>106441466
Thanks! But are you mixing me up with CUDA dev? Lmao
>>
>>106441466
I love sucking cock btw. Not sure if that matters.
>>
>>106441455
How does this manifest for Q8, where presumably the vast majority of tensors are simply left at a clean 8bpw for both a dense 12b and a 12b active parameter MoE?
>>
>>106441513
q8_0 should be equal in terms of just BPW.
But again, if the architecture around the weights is different you cannot expect the same results.
Models with more parameters, regardless of whether or not they're active, usually also have larger KV caches.
>>
>>106441402
I have seen that claimed here repeatedly.
>>
>>106441594
Anon half the posters here have never run a local model in their life
>>
huwa whey
>>
>>106441594
I'm sure there are many other things claimed repeatedly that you don't believe...

No... not that! Cool it with the antisemitism...
>>
>>106441476
I am a revenant: why do you troll people with your Rocinante R1? It's even dumber than the very original finetune.
If you want to spam these threads at least make sure you have something good, instead of garbage.
I've learned from the imagegen side that people who post new versions of their finetunes every month don't know what they're doing. Any tune that has multiple versions is a sign of a failure.
Let that sink in for a while.
>>
>“We must convene the others. Immediately. This is not a threat we can meet with分散した力 (dispersed strength).”
Getting that from DeepSeek V3.1 is a first. Unusual that it included jap runes and a TL note.
>>
File: file.png (66 KB, 793x781)
Just to clarify, is disabling MMAP on older versions the same as NOT enabling MMAP on later versions of KoboldCpp? This is specifically the setting that lets you offload whatever remaining layers you don't put on your GPU to your RAM, right? And so if I am relying on RAM to complement VRAM I need to keep MMAP enabled?
>>
>>106441773
is it trying to express concepts that aren't expressible in english, or what
>>
>>106441775
Enabling it keeps a (cached) copy of the entire model in RAM. If you disable mmap and you're offloading layers to GPU, it'll save some RAM (whatever is offloaded to GPU), but it will take a little longer to load on repeated launches.
>And so if I am relying on RAM to compliment VRAM I need to keep MMAP enabled?
mmap is not necessary for mixed cpu/gpu.
>>
>>106441801
yes it is impossible to express in English the concept of needing to combine strength, which it expressed in English
>>
>>106441815
Thank you. So is just setting a number of GPU layers lower than the total enough to have the cpu/ram handle the rest, or are there more settings integral to making that happen?
>>
>>106441819
io no hablou espanioul
>>
File: longcat.png (166 KB, 528x599)
>food delivery service makes AI model
what did they mean by this?
>>
>>106441815
you are a whore
>>
>>106440267
how many t/s do you get at 0 and 8k context?
>>
My ik llamacpp command doesn't seem to be offloading more layers to my vram, having only 11/24gb filled with the below command with qwen 235, where is my syntax error?

llama-server ^
--model "D:\Qwen_Qwen3-235B-A22B-Instruct-2507-Q3_K_XL-00001-of-00003.gguf" ^
--ctx-size 32768 ^
-ctk q8_0 ^
-mla 2 -fa ^
-amb 512 ^
-fmoe ^
--n-gpu-layers 999 ^
--override-tensor exps=CPU ^
-ot "blk\.(3|4|5)\.ffn_.*=CUDA0" ^
-ot "blk\.(6|7|8)\.ffn_.*=CUDA0" ^
-ot "blk\.(9)\.ffn_.*=CUDA0" ^
--parallel 1 ^
--threads 6 ^
--host 127.0.0.1 ^
--port 8080
>>
>>106441848
>in china even obscure food delivery services create flagship competitors
>meanwhile all of the west is still struggling to match the first deepseek v3
woah
>>
>>106441849
you are a stupid nigger
>>
>>106441844
Start with half the layers on gpu and see how much space you have left with the given context length. If you have enough space still free, add more layers. Adjust until you find the optimal number of layers + context length you want.
Also, I think you can use -1 to let kobold select the number of layers automatically. But i don't use kobold, so i'm not sure how well it works. It's probably better to adjust those things manually anyway.
>>
>>106441853
you are a dumb nigger
>>
>>106441852
3090, q4_k_m, KV=Q8, blas batch size 256
0 ctx = ~31t/s
24k ctx = ~19t/s
Don't have an 8k chat ready, but that should do. I cap my 3090 at 75% power limit, my backend is koboldcpp on windows.
>>
>>106441853
What I mean is btw I want to move some extra layers from ram to vram, despite the
--override-tensor exps=CPU ^
Which should be achieved from what I've seen online with commands like
-ot "blk\.(3|4|5)\.ffn_.*=CUDA0" ^
But it doesn't work, I don't think these model parts are forced into the VRAM from RAM
>>
>>106441853
Dunno. Add another -ot for blk\.(1|2)...?
Try running it with --verbose and check the load_tensor lines to see where things go.
>>
>>106441850
wot?
>>
>>106441888
Holy fuck the speedup from KV=Q8 is insane.

I just went to 42t/s at 3k ctx from 14t/s.
>>
>>106441916
seems to be overriding only these parts

Tensor blk.3.ffn_norm.weight buffer type overriden to CUDA0
Tensor blk.3.ffn_gate_inp.weight buffer type overriden to CUDA0
Tensor blk.3.ffn_gate_exps.weight buffer type overriden to CPU
Tensor blk.3.ffn_down_exps.weight buffer type overriden to CPU
Tensor blk.3.ffn_up_exps.weight buffer type overriden to CPU
Tensor blk.4.ffn_norm.weight buffer type overriden to CUDA0
Tensor blk.4.ffn_gate_inp.weight buffer type overriden to CUDA0
Tensor blk.4.ffn_gate_exps.weight buffer type overriden to CPU
Tensor blk.4.ffn_down_exps.weight buffer type overriden to CPU
Tensor blk.4.ffn_up_exps.weight buffer type overriden to CPU
>>
>>106441926
you are replying to bait
>>
>>106441940
I assume it doesn't have more than 49 or so layers.
You're doing
blk.3
blk.4
blk.5
...
blk.9

but you need
blk.1
..
blk.20
blk.21
blk.22
...

Your regex is fucked. Try with "blk\.1.\.", for example, for layers 10 to 19. Add more for the rest of the layers.
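
Another way to sanity-check the patterns without reloading the model: run them over the tensor names from the --verbose output. A minimal sketch, which reproduces the log above if override rules are applied first-match-wins (that ordering is an assumption, though the posted output suggests it; if it holds, putting the CUDA0 rules before exps=CPU should get those exps tensors onto the GPU):

import re

rules = [                                  # same order as in the command above
    ("exps", "CPU"),                       # --override-tensor exps=CPU
    (r"blk\.(3|4|5)\.ffn_.*", "CUDA0"),
    (r"blk\.(6|7|8)\.ffn_.*", "CUDA0"),
    (r"blk\.(9)\.ffn_.*", "CUDA0"),
]
names = [                                  # copied from the verbose log above
    "blk.3.ffn_norm.weight",
    "blk.3.ffn_gate_exps.weight",
    "blk.9.ffn_up_exps.weight",
]
for name in names:
    for pattern, target in rules:
        if re.search(pattern, name):       # assumption: first matching rule wins
            print(name, "->", target)
            break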
>>
>>106441971
ill try it, qwen3 has llm_load_print_meta: n_layer = 94
>>
>>106441873
Thank you. I don't know how much I'm reasonably able to get out of my GPU yet, I just know it'd be under 32 layers. Considering the conversion from GB of VRAM to layers is static, will that sweet spot for layers also be static regardless of the model I'm using, or is there more to it?
>>
>>106441819
Most common English sentence these days is Allahu Akbar, thanks to UK.
>>
>>106441994
Hmm. With
-ot "blk\.(3|4|5)\.ffn_.*=CUDA0" ^
-ot "blk\.(6|7|8)\.ffn_.*=CUDA0" ^
-ot "blk\.(9)\.ffn_.*=CUDA0" ^

i would have expected only blk from 3 to 9 to be offloaded. I don't know why it'd do only 3 and 4 and I cannot run that thing to test it.
>>
>>106442000
Models of the same architecture (different finetunes of nemo-12b, for example) will all run the same. But it will be different compared to, say, gemma-3-12b. Some models use more or less vram for context or have different amounts of layers of different sizes. Some models have more and smaller layers, others fewer but bigger ones.
>>
>>106442035
no it does go to those too, i just copy pasted the two above to show that it doesn't offload the ffn_gate_exps, ffn_down_exps and ffn_up_exps of those blks for example, in case that is not desired behaviour
>>
>>106441772
Yeah, Nemo can be pretty tricky. I might end up not making an official release for it. Not a big deal though.

> If you want to spam these threads at least make sure you have something good, instead of garbage.

Users here have taken a special interest in my Roci R1 tests, so I decided to entertain them.

I wouldn't actually try to advertise here or astroturf. I haven't posted anything in /lmg/ as a nameless anon.

> Any tune what has multiple versions is a sign of a failure.

My iterations are publicly available but they're not official. Either way, you're quite naive, anon.
>>
let's find a way to solve llm slop writing
>>
>>106442090
Ah. I just glanced at them and thought they all went to CUDA0. The only option i don't know about there is -fmoe. Does it still happen if you remove it? It does 'enable fused MoE', but I have no idea what that means.
>>
>>106441857
>>meanwhile all of the west is still struggling to match the first deepseek v3
What are you talking about? GPT-5 and Gemini 2.5 Pro are better models than deepseek. Gemini can understand 50k tokens worth of code ez pez, while you are breaking DeepSeek at that level of context.
It's not that the west can't match, it's that the west doesn't want to give you things for free. Google won't release the real Gemini, only the scraps called Gemma. Same for any other lab worth anything. Meta only opened Llama because it was garbage no one would have wanted to use. The talk of closing it is because they're trying to make a real SOTA and who would release a SOTA for free? No one, that's who.
>>
>>106442117
Fuck off back to r-eddit.
>>
>>106442117
>Either way, you're quite naive, anon.
Noooo. you don't get it.
For example, i fucked my wife once. JUST ONCE, and she keeps on having kids. That's because i fucked her so good she keeps getting pregnant every now and then.
If you do something good, you only need to do it once.
>>
File: grug.jpg (37 KB, 360x360)
My intuition says that, at a constant parameter count, you can make a model smarter by reducing its vocabulary, at the expense of losing that vocabulary.
i.e. if you translate Chinese datasets to English instead of training on Chinese directly, or even convert English into something like Basic English.
As such I envision a potential GrugLM that is trained on 100% synthetic data, only speaks like a caveman, and only does pure Chain of Thought, skipping the final answer entirely.
We don't even need to draw a logo, we can just steal the meme.
>>
>>106442214
>My intuition says
female brain comment
>>
>>106442171
Rembrandt did sketches and couple of practice before painting Night Watch but I don't think he spammed his local forums by making multiple versions of it.
>>
>>106442316
*practice runs
I lost couple of my fingers in an accident
>>
Is there a better alternative to llama-cli? I want to be able to move the cursor at least.
>>
File: c7b.jpg (88 KB, 1024x953)
>>106442214
mfw grugmaxxing is the secret to AGI
>>
>>106442214
Larger vocabularies increase performance regardless of model size: https://arxiv.org/abs/2501.16975
What hurts performance given the same volume of training data is training on multiple languages.
Models will also indeed learn basic English faster on synthetic data and in general if the pretraining data matches the distribution of your typical outputs.
>>
>>106442214
yeah specialist models always beat general-purpose models, in both training efficiency and output quality (for their domain).
>>
>>106442296
people who coom to text are female brained and there's, unfortunately, a majority of that here
so much for being a /g/ thread
>>
https://github.com/cline/cline/issues/5906 kek
>>
>>106442469
If you can't coom to your own imagination then you're an NPC
>>
>>106442323
>>
>>106442476
>What happened?
>code
understandable
>>
>I WANT O BE COMPENSATED FOR AL MONEY SPENT ON CLINE AS I HAVE ONLY GONE BACKWARDS AND HAVE USED NOTH CHAT GPT AND CLINE AND COPILOT AND IM IN A DEAD SPOT AND STUCK NOW! I ANT EVERY CENT I SPENT IN CLINE BACK HUNDREDS OF DOLLARS AS I HAVE NOTHING NOW BUT ISSUES TO FIX
letting non programmers (and worse than that: an illiterate mongoloid) think they got a crack at computer programming with the help of LLMs was a mistake
>>
>>106442296
It came to me in a dream.
I made it the fuck up.
>>
>>106442541
It's only going to get worse. It's just another step of trying to drive down wages for developers. Same as letting frontend hipsters and bootcampers think they were programmers.
>>
>>106442485
What made the difference was instant text to speech. I'm not a chronic masturbator like some itt seem to be but it really brings the interaction to life. It's just somewhat tricky to set up properly and people using sillytavern are probably out of luck, don't know.
>>
>>106442638
Which front end are you using?
>>
>>106442086
Ahh, ok that makes sense. I'll keep abreast of model architecture and experiment accordingly. Cheers
>>
>>106442662
My own. I'm a retard so if I can do it, so can you.
>>
So how long is that long cat?
>>
>>106442766
I don't care enough
>>
File: longcucked.png (306 KB, 1079x1155)
Oh neat, a new model!
...and into the trash it goes
>>
>>106442791
All current safety complaints are a downstream result of sama's fearmongering. At least he's stopped for now, unlike Dario.
>>
File: download.png (111 KB, 1248x983)
>>106437074
>96gb
Nice. And only $1200, and no catch!

Anyway, here's the flow chart to figure out how you're going to install the driver. And don't worry, if you use the pre-configured package it's only a couple full pages worth of command lines (linux only):


https://support.huawei.com/enterprise/en/doc/EDOC1100349483?idPath=23710424|251366513|22892968|252309139|252823107

Now it may not exactly work yet with llamacpp, but I assure you, just look at all this documentation! China is on it! Software #1 priority sir!
>>
>>106442853
Anon... The flow chart and documentation for other cards would look like that, too...
>>
bros do I buy the 96gb chink card? its gonna be useless outside of LLM inferencing right?
>>
it's going to be useless for inferencing too
>>
>>106442894
You can buy it, but what you receive will probably be an unofficially re-badged rtx 3050
>>
>>106442884
SORRY I CAN'T HEAR YOU. IM UPDATING DRIVERS VIA THE NVIDIA APP WITH ONE CLICK RIGHT NOW AND I BOUGHT CHEAP CASE FANS. OMG IT'S LIKE A FUCKING HURRICANE IN HERE.

Oh, it finished. Tell you what, come back and post when you finish installing them.
>>
>>106440216
how do you take advantage of contextshift? just set the context to unlimited in ST/other chat apps?
>>
>>106442894
It's only useful for inferencing, yes. You could technically use it for training, but you're not going to get very far with that.
You can't use the card for gaming or anything like that.
>>
>>106441214
She's so small the burger looks gigantic in her hands.
>>
>>106442894
If it's 5 years old, shouldn't they be coming out with a new 128 GB card soon? Probably would be too expensive new anyway.
>>
>>106442925
Call me when you're running Deepseek on your Windows PC.
>>
>>106442938
If your backend supports it then it works automatically. You can just chat forever, whatever your context is set to. Earlier parts of the chat will be automatically removed from context when it's full, and as long as a response doesn't invoke a lorebook then it doesn't even have to re-process the whole context every message.
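
Conceptually it's just a sliding window over the chat. Toy sketch of the idea only (real context shift works on the KV cache so old tokens are dropped without re-processing, which this does not model):

def fit_context(system_prompt, messages, count_tokens, budget):
    # keep the system prompt plus as many of the newest messages as fit in the token budget
    kept, used = [], count_tokens(system_prompt)
    for msg in reversed(messages):          # newest first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                           # older messages fall out of the window
        kept.append(msg)
        used += cost
    return [system_prompt] + kept[::-1]

# toy usage with a whitespace "tokenizer" stand-in
print(fit_context("sys", ["a b", "c d e", "f"], lambda s: len(s.split()), budget=5))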
>>
whats the drummer cooking rn, what's the next SOTA finetune gonna be?
>>
>>106442990
Fallen Gemma 3 270m
>>
always run --no-context-shift so that you get warned when you're running out of context and don't unknowingly use the retarded "could be truncated anywhere and produce garbage" mode
>>
>>106443025
Setting a reminder to not be unintelligent must be a regular occurrence for you.
>>
>>106442894
Buy it if either the software support is already good enough or if you're going to write the software yourself.
>>
>>106442990
I'm releasing this later: https://huggingface.co/TheDrummer/Behemoth-X-123B-v2

Testers are loving it, and my playthrough with it was comparable to the big boy APIs.

>>106443007
Thought Fallen was my worst type of tune?
>>
>>106443052
>Thought Fallen was my worst type of tune?
thatsthejoke.jpg
>>
teortxs kinda losing it lately
>>
>>106443052
Cuda dev hates your guts.
>>
>>106443109
Cuda dev hates anyone who doesn't post loli ntr
>>
>>106443122
based cuda dev
>>
Phi-6 will save local.
>>
>>106443109
He does? That's a shame. Got a lot of respect for a guy of his calibre and contribution.
>>
>>106443152
gpt-oss is the new phi and it already saved local
>>
>>106443369
Phi at least was one of the first with image input. Vision must be too unsafe for gpt-oss to have.
>>
File: 1732477054518293.jpg (64 KB, 736x557)
As a vramlet: does a good gemma-e3b erp tune exist?
>>
File: 1745241948654816.png (966 KB, 900x1097)
>>106443497
magnificent taste in pic
>>
Is it just me or have the cloud models become STUBBORN AS FUCK? I bet it's the similarity of my problem to some sft example that makes them do it one way only even though I clearly ask them not to fucking do it, like with the surgeon riddle.
>>
>>106443122
>>
https://www.reddit.com/r/LocalLLaMA/comments/1n4wo0y/the_huawei_gpu_is_not_equivalent_to_an_rtx_6000/
>>
>leddit
stop quoting that shithole
>>
GPT Soviets
>>
>>106443576
It farms replies every time.
>>
>>106443514
sex with the hag
>>
Hello /lmg/ I just came to say hello for migu's birthday. What are you guys up to these days? How's GLM Air? Should I get an rtx pro 6000?
>>
>>106443122
>loli
Based
>ntr
Cringe
>>
>>106443565
That post was written by ChatGPT or some other slop machine.
>>
>>106442638
I concur, instant gptsovits is killing me
>>
>>106443565
It's been funny watching people cope about how the west has fallen because they can't read a spec sheet
>>
>>106443565
Good info but lmao what a faggot. Defends Nvidia like some kind of shill in the comments, acts as if t/s is the issue for consumers looking to get into local AI and considering Nvida vs other options, rather than being able to run large models at all because the equivalently priced Nvidia GPU is a fraction of the VRAM.
>>
File: 1750097972636272.gif (333 KB, 414x414)
>>106443638
miqubox > RTX pro 6000. RTX Pro 6000 is just a fat 5090
>>
>>106443811
>miqubox
What gpus do you guys have? I'm reading >>106436631 now and I figured rtx pro was intermediate tier? Are there really people running full K2 and GLM locally? I'd really love to use K2 if the local version isn't as censored as the api.
>>
>>106443524
model, whats the model
> i need the model senpai
>>
>>106443868
Deepseek q5_k_m, with the instruction
>Avoid flowery language, be extremely graphic and descriptive instead.
>>
File: 1570124722.png (99 KB, 1000x970)
>>106443879
>arigato gozaimasu
>>
>>106443905
>level of submission
>>
>>106443996
>respect=submission
american?
>>
>>106443905
How does one earn the 90 degree bow?
>>
>>106443905
If you do 120 is it even more respect? What about spreading your asscheeks with your hands?
>>
>>106443828
There are people running full K2 and GLM locally and the only way to do that is a miqubox. For that, a GPU is only good for prompt processing and shared experts. An RTX pro 6000 is wasted on that.
>>
>>106444083
>miqubox
What is this? Where can I find info about this?
>>
>>106444068
That'd be the American special, reserved only for their jewish masters.
>>
>>106444116
It is mikutroons slapping their AGP meme over everything.
>>
>>106444127
I know but what gpu should I get?
>>
File: 1745110389499979.png (2.3 MB, 1280x1280)
>>106444068
this is the ultimate pose
>>
>>106444068
>What about spreading your asscheeks with your hands?
free use level of respect
>>
>>106444136
https://rentry.org/miqumaxx
>>
>>106444136
3090/4090/5090 and some ddr4 server or am5 + 256GB DDR5(if you don't mind 3T/s).
>>
>>106444149
Hmm ok thank you. So cpumaxxing then.
>>106444166
So there's no real benefit in getting more vram? I have a 4090 already but I figured getting up to 96 would make things easier no? Why not stack those cheap chinese gpus instead of spamming ram?
>>
>>106444145
maximum respect
>>
>>106444222
>still wearing clothes
>>
>>106444243
what's the hose for?
>>
>>106444262
Her belly is growing as can be seen on the image. Probably from piss
>>
>>106444282
naruhosedo
>>
>>106444243
gross
>>
>>106443799
That info is wrong hallucinated slop, look at the official specs >>106434297 >>106434398
>>
>>106443905
>178 degrees: I'm fucking with you and showing off how flexible I am
>>
>>106444243
damn miku gives a lot of respect here
>>
>>106443905
My least favorite part are the hands on thighs. I get it, it looks submissive, but it's so abnormal to me. Looks like you're beckoning a dog over.
>>
>>106444482
My thought when I noticed it is that it helps in assuming proper angle.
>>
>>106444482
I think it's related to or comes from the pose they take when they sit on the floor.
>>
>>106444528
That's a good point. A very "disarmed" position probably makes a big impact for body language.
>>
>>106444482
Could be because old people still have to do it. Having the hands there means they can use their arms for assistance.
>>
>>106444243
>look up artist
>it's all scat
Not surprising but too bad for me. Happy for the shit enjoyers.
>>
>>106444887
>>106444887
>>106444887
>>
File: GzfugwzbcAATajB.jpg (148 KB, 786x1024)
>>106443811
>>
File: debu debu 2.png (153 KB, 309x309)
>>106437802
>>106379859
>>106374299
>>106374947
been here and there, can't be everywhere
would that we could be fulltime migugen, but that's simply not viable
I'm surprised antis have been silent, slacking
>>
>>106444901
>brown hands
kek
>>
>>106444959
https://xcancel.com/motio_Dx0406/status/1961298415293534463
>>
>>106444243
lol that mic
>>
>>106444949
acceptance phase
>>
>>106436577
>entire codebase
>written by grok2
>>
Is there a proper way of closing Kobold? Do I need to fear closing or killing it if it's not processing anything at present?


