/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106762831 & >>106755904

►News
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106762831

--Paper: The Pitfalls of KV Cache Compression:
>106765718 >106765984 >106766108 >106766148 >106766761 >106766197
--Papers:
>106764212
--Evaluating GLM 4.6's roleplay performance and quantization efficiency:
>106763532 >106763653 >106763717 >106763827 >106763907 >106763914 >106764034 >106764029 >106764173 >106763671
--IBM Granite 4.0 enterprise model launch and documentation inconsistencies:
>106767652 >106767670 >106767732
--Director roleplay customization addon for managing character settings and environment:
>106763408 >106764995 >106765052 >106765045 >106765076 >106765156 >106765172 >106765183 >106765217 >106765326 >106765094 >106765123 >106765190 >106765225 >106765253 >106765303 >106765342 >106765390
--Liquid AI's LFM2-Audio 1.5B multimodal model capabilities and performance:
>106765758 >106765764 >106765934 >106766498 >106766751 >106766973
--Feasibility of building a local knowledge base with limited VRAM and RAM considerations:
>106764158 >106764219 >106764240 >106764254 >106764261
--Discussion on AI model quantization methods, jinja string editing, and new quantization types:
>106767116 >106767235 >106767244 >106767293 >106768177 >106768200 >106767251 >106767337
--ik-llama GPU utilization problems and offloading configuration:
>106763167 >106763227 >106763244
--Qwen3-30B-A3B model selection and GGUF quantization considerations for RTX 3090 VRAM limits:
>106764290 >106764312 >106764328 >106764340 >106764356 >106764360 >106764366 >106764402 >106764430 >106764489 >106764500 >106764622 >106764838
--Setting up a roleplay bot on 8GB VRAM hardware:
>106767312 >106767327 >106768048 >106768086
--Unsloth AI introduces Docker image for streamlined LLM training:
>106766089
--GLM 4.6 performance surpasses Deepseek R1 on gaming rig:
>106766318
--Miku (free space):
>106763663 >106768757

►Recent Highlight Posts from the Previous Thread: >>106762833

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>anon? you're not making your quants?
>GAHAHAHAHHAHAHA LOSER!
how do you respond?
>>
Gumilove
>>
alright eel smarter
>>
>>106768369
Why is Kuro such a bitch?
https://files.catbox.moe/5b8n7l.txt
>>
>>106769691
with a bullet
>>
File: 1738912286388860.png (36 KB, 469x171)
>>106758314
GLM doesn't output a newline after <think> and before </think> so you need to remove those from the reasoning formatting to get it to parse correctly.
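If you're on SillyTavern, that means the Reasoning Formatting prefix/suffix fields should end up looking roughly like this, with no \n baked into either side (field names may differ slightly by version):

Reasoning Prefix: <think>
Reasoning Suffix: </think>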
>>
I want to spend $8000 to draw 2d anime pictures. Do you think it will be enough? 2d anime pictures are very important to me.
>>
>>106769831
You can get a decent drawing tablet for half that.
>>
>>106769831
Image gen needs a lot less memory than text, so yeah, it'll be a banging 2d anime pictures generation machine.
>>
>>106769831
you can get a tablet for a lot cheaper, and you can get a gpu to do it for about the same $500
>>
>>106769831
That should be enough even for moving anime pictures.
>>
>>106769845
>>106769847
>>106769852
so how much do I really need?
maybe... 50 TOPS?
>>
>>106769866
so funny i forgot to laugh
>>
>>106769691
I lie down under her.
>>
>>106769902
gotcha, gear fag
>>
>>106769866
This is all I got.
https://www.pugetsystems.com/labs/hpc/whats-the-deal-with-npus/

Don't know how much software is even out there that can target an NPU.

Do tell us when you figure things out.
>>
Seriously? You just tell it "You're doing ERP" and it just turns off the safety features?
https://files.catbox.moe/ozn9ws.txt
>>
>>106769947
Sounds like my answer is it will make a decent chatbot/search tool, maybe do img -> txt, but that's about it.
Thanks for the link.
>>
File: 1750174960929728.jpg (75 KB, 383x908)
>tried to share my addon on leddit
>muh mod approval
>its 24hr later and no response

whelp i did what i could to advertise. i'm not going to bother following up and waiting. i made an addon that does what i want, for me, and i've shared it here a few times.

you can install my st addon by entering the address into st's extensions. https://github.com/tomatoesahoy/director

i'm not done working on my addon but updates are spontaneous at best. i'm more disappointed that when i feel my addon is good for release to everyone rather than just posting here occasionally, i'm met with walls of restriction. i followed their rules, made an account, waited 30 days. still cant post. so whatever, enjoy, you 4chan fucks
>>
>>106769725
>https://files.catbox.moe/ozn9ws.txt

I find this easier to read than the usual wall of purple prose, but the model was a bit loose with the formatting.
>>
Does GLM 4.6 have a habit of slipping into feminist lecture mindset, like GLM Air?

inb4 prompt issue
>>
>>106769831
NAI is $25 a month
>>
>>106770110
Don't dwell on it. After the guy who made localllama had his meltdown, admins put one of their puppet power mods in charge. It's a feed of sponsored and approved content, not public discourse.
>>
>>106770110
Stay here.
>>
>>106770080
>https://files.catbox.moe/ozn9ws.txt
>The "Uohhh!" is a sound of surprise and delight
still doesn't know the meaning :/

>>106769831
>2d anime pictures are very important to me.
don't you have whole *boorus, kemono, etc. for that? why do you need more?
>>
>>106770139
buy an ad kurumuz
>>
My GPU crashed so hard that it stopped being recognized by nvidia-smi. Is it over?
>>
>>106770207
>>106770215
it actually means a lot to see other people be understanding. i posted in good-faith. i really just wanted to share something i think helps solve the clothing/location issue models have. but then i realize i'm totally shut out from posting. at least, other than here
>>
>>106770262
Here is all you need.
>>
>Using ikllama 4.6 = 2.6 T/s at 10k ctx
>switch to regular llamacpp = 3.2T/s
?? Why was I using the memefork again?
>>
File: 9dfg87.png (7 KB, 407x218)
>>106770207
>meltdown
qrd?
>>106770252
until you reboot perhaps
>>106770300
few hours of trial and error to find the right memeflags then it'll be fast
also post cmdline
>>
>>106770324
--override-tensor exps=CPU -ngl 99
That is what I use for both. I have bad experiences with fmoe and I don't think any other flags apply to glm.
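For anyone copying this, the full launch line ends up looking something like the sketch below (model path and context size are placeholders, adjust to taste; the same flags work on both llama.cpp and ik_llama.cpp as far as I can tell):

llama-server -m GLM-4.6-IQ4_K.gguf -c 32768 -ngl 99 --override-tensor exps=CPU

-ngl 99 tries to put every layer on the GPU, then the override pushes any tensor whose name matches "exps" (the MoE expert weights) back to the CPU, so only the attention/shared tensors and the KV cache stay in VRAM.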
>>
>>106769660
https://files.catbox.moe/bpv4jk.mp4
>>
has glm 4.6 officially saved local?
I haven't had this much fun with a model on my rig in quite a while and even api deepseek never felt this good
>>
File: file.png (1.19 MB, 1216x832)
>>106770252
Did it give:
>"Unable to determine the device handle GPU0000:n is lost" ?
If so then try a hard power cycle and hope it works. Can be a driver crash, or actual hardware issues.
I had an issue like that about 5 months ago and it turned out to be a flaky connection that got better after I cleaned the dusty PCIe slot.
>>
>>106770366
im so glad miku is 2D
>>
>>106770110
Okay but where is the version of this where it describes body parts and their current status? I want the model to know the exact height and skin color and ear shape of my goblin maid. Seriously though, I wonder if something like that would help with things like the nala test or similar scenarios. Maybe explicitly describing the character as quadrupedal with paws in a scene description would improve the output even though it should be self-evident by describing the character as a lion.
>>
File: korn worry.jpg (10 KB, 193x245)
>>106770366
Miku-san, where.. from where does your cheese come from?
>>
>>106769660
Daily reminder
>mikusex
>Nvidia Spark is a tiny DGX computer not meant for LLM/imagen inference
>petra is the goat
>>
File: 1732511391275741.jpg (227 KB, 1536x2048)
/ourguy/ Kalomaze (Min P) is doing an AMA on Reddit
https://www.reddit.com/r/LocalLLaMA/comments/1nwaoyd/ama_with_prime_intellect_ask_us_anything/
>>
>>106770451
seems like he's reddits guy now
>>
>>106770366
Chinkiest miku ever
>>
>>106770356
got some spare GPU mem? still testing myself (also gonna be v hw dependent) but maybe first three+ layers back on GPU, flash-attn, run-time-repack, K/V quanting, batch sizes ..
>>
>>106770207
you mean the mod who instantly created a discord and twitter account for the subreddit somehow isn't a saint, color me shocked to find this out
>>
>>106770463
Always was.
>>
>>106770451
wow
>I’m Kalomaze (u/kindacognizant), a researcher at Prime Intellect, the lab behind:
>Distributed training efforts including INTELLECT-1 + INTELLECT-2
what switch to go from here to being responsible for some of the worst models ever
>>
>>106770480
the point is memefork isn't better
>>
>>106770498
Acquiring funding is like making a deal with satan.
>>
>>106770451
Someone ask him why his company uses total fucking garbage to train models and not only that they're doubling down on it by creating a huge synthetic dataset for MATH CODING and SCIENCE
>>
File: that chink is cooking.png (535 KB, 750x750)
>>106770498
money
>>
I've coomed for the 5th time today, and I can't get it up!
>>
>>106770513
No point asking. Assuming he doesn't ignore the question, the answer is obviously adding any nsfw, vulgar, or copyrighted content would get them shut down overnight.
>>
>>106770523
>I'd become a giga neet
>>
>>106770540
They make pills for that.
>>
>>106770477
Yeah, those are not Japanese aesthetics. Definitely chinky or gooky.
>>
File: 1748895901549445.jpg (89 KB, 973x796)
>>106770289
i know. it was silly to even try because the internet now is nothing but dead ends and degenerates
>>
>>106770546
How would vulgar or nsfw content (or just focusing on writing even a tiny bit) get them shut down? Whatever, just tell him to choke on a dick or something then
>>
File: monke music.png (112 KB, 497x356)
>>106770451
I enjoyed those times when he was trying threadly new sampler shit-throwing to see what happened. It was fun to try getting the experimental hacky stuff working even if most of it did not make the outputs definitively "better" overall. They did do something, that's for certain, so I thought it was cool to be part of the exploration of new land.
>>
>>106770366
not my miku
>>
>>106770588
>or just focusing on writing even a tiny bit
That doesn't improve the benchmark scores that get investors all hot and bothered.
>>
>>106770607
INTELLECT has some of the worst scores ever recorded though, like worse than llama 7b
>>
It's odd to me that ik_llama.cpp doesn't seem to work well with assistant prefills using the chat completion API.
llama.cpp just works.
>>
>>106770651
Less math and code isn't going to boost those scores though.
>>
I compared https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL
against https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF/tree/main/24GB%2B128GB_V3
and found ubergarm's to have significantly better token probabilities on my tests. There is one where ubergarm's quant has 70% on correct token, 16% on "trap" token while the other one has 20% on correct and 63% on trap while also being somehow slower at a smaller size
>>
>>106770651
Does it beat the original StableLM?
>>
air bros... wheres our 4.6???????????????
>>
>>106770451
>Hey, we are super aligned
yea...
>>
File: aaa.png (193 KB, 395x391)
What is the best model for editing images? (I want to add a dick on a dude's forehead) I'm using Easy Diffusion, but when I try it I get weird results, like it doesn't look like a proper image of a dude with a dick on his forehead, just something weird
>>
>>106770710
ppl and kld. Come back when you have those.
>>
File: Screenshot.png (25 KB, 569x202)
>>106770727
there is hope, thrust!
>>
Okay, something is wrong with GLM 4.6. Maybe it's the quant, I'm running IQ5_K but the model is fucked. It will sometimes randomly add chinese characters to the middle of sentences, put the actual response in its thinking and doesn't consistently follow formatting of previous responses correctly. I have to reroll most of the time and it's slower than R1.
I'm using temp 1 with everything else neutralized as recommended on the hf page with the GLM 4 context/instruct template.
>>
>>106770482
>MODERATOR OF
>r/LifeProTips
>r/ChatGPT
>r/PewdiepieSubmissions
>r/OpenAI
>r/GPT3
>… and 44 more
they are in good hands
>>
>>106770763
The best hands.
>>
>>106770753
Try using the chat completion api.
If that works, something is fucked with the context/instruct template somewhere somehow.
>>
>>106770451
Neat, you should head over there
then don't come back
>>
>>106770774
Do not be mean unto others.
>>
>>106770780
redditors are vermin, not people
>>
>>106770753
>everything else neutralized
Use top_k=40, top_p=0.95. If you neutralize everything, you'll always get shitty tokens now and then.
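If you want those as server-side defaults, something like this works on llama-server (temp 1 per the hf page; note most frontends send their own sampler values per request, so set them there too):

llama-server -m GLM-4.6.gguf --temp 1.0 --top-k 40 --top-p 0.95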
>>
>>106770827
Nope! If you have to resort to that you're using a shit model and should switch to something actually good.
>>
File: 1409885621682.gif (249 KB, 404x500)
>this is deeply misogynistic and reinforces problematic gender stereotypes
air-chan, yamero
>>
>>106770872
Thank you kalomaze, very safe.
>>
>>106770886
it's not like it refuses, it just insists on scolding me about sexism and toxic masculinity all the time
>>
>>106770905
So a bit of a Mixtral vibe, nostalgic.
>>
>>106770872
have you tried disabling thinking?
>>
>>106770905
have you told it to like sexism instead?
>>
>>106770840
lmao glmtards BTFO
>>
>>106770747
>coworker calls him a coomer
workplace bullying is not ok
>>
>>106771035
name a better local model
>>
>>106771098
nemo
>>
>>106771098
they're all in a sad state right now but i'd choose something that doesn't run at 2 tokens per second unless you are a retard who spends money on this
>>
>>106771105
Somebody should turn Nemo into a MoE.
>>
>>106770366
It's not Miku Cheese unless it's made from Miku Milk.
>>
>>106771105
>>106771134
Have you tried Rocinante — I love it!
>>
>>106771153
NeMoE
>>
>>106771205
'mo-moe
>>
>>106771205
>>
Well, >>106771205 has the name down.
Now we just need a finetooner to create it.
Anybody got DavidAU's phone number handy?
>>
At last, local is saved. I called qwen3 and glm4.5 shit while everyone praised them. The new glm however is a whole different story. It writes with the soul I've never seen before in a local model, not even deepseek.
>>
File: 1742695099511857.jpg (95 KB, 1190x906)
>>106771262
>>
>>106771262
>soul
zoomer opinions are less than worthless
>>
>>106771105
what's it like to fuck a retard?
>>
>>106770753
ubergarm/IK quants on non-ik_llama.cpp perchance? Haven't seen such issues with barts even at Q3_K_M, sometimes thinking gets messed up but that's my janky Silly config. remember newline before <think>
>>106771276
>not embracing the heartsovl of your model
get out
>>
>>106771247
I'm 100% sure there have already been Nemo based clowncar MoE made.
>>
>>106771134
I can get my glm running at 4-5 t/s. Honestly not bad compared to what I originally expected.
>>
>>106771333
>remember newline before <think>
You mean start reply with that?
>>
>>106770451
>Kalomaze is doing an AMA!
>he never answered any question
>>
why are we using GLM 4.6 again? this shit feels astroturfed as fuck now that i had a chance to use it. surely i should be getting better speeds than this with 30 layers offloaded. token generation is the same, although it ends up taking much more time since GLM wastes time thinking (i know i can turn it off). it's only like 30tk/s quicker than K2 despite being half the size (249GB vs 485GB). the results below are with an IQ5_K quant (ubergarm)

INFO [print_timings] prompt eval time = 32843.74 ms / 4807 tokens ( 6.83 ms per token, 146.36 tokens per second) | tid="128489640603648" id_slot=0 id_task=3408 t_prompt_processing=32843.743 n_prompt_tokens_processed=4807 t_token=6.832482421468692 n_tokens_second=146.35968866276903
INFO [print_timings] generation eval time = 250057.45 ms / 1591 runs ( 157.17 ms per token, 6.36 tokens per second) | tid="128489640603648" id_slot=0 id_task=3408 t_token_generation=250057.452 n_decoded=1591 t_token=157.16998868636077 n_tokens_second=6.362537837904546
INFO [print_timings] total time = 282901.20 ms | tid="128489640603648" id_slot=0 id_task=3408 t_prompt_processing=32843.743 t_token_generation=250057.452 t_total=282901.195

kimi k2 0905 smol_IQ4_XSS quant

INFO [print_timings] prompt eval time = 39716.81 ms / 4180 tokens ( 9.50 ms per token, 105.25 tokens per second) | tid="133210908811264" id_slot=0 id_task=1323 t_prompt_processing=39716.806 n_prompt_tokens_processed=4180 t_token=9.501628229665071 n_tokens_second=105.24511966042789
INFO [print_timings] generation eval time = 18771.55 ms / 121 runs ( 155.14 ms per token, 6.45 tokens per second) | tid="133210908811264" id_slot=0 id_task=1323 t_token_generation=18771.548 n_decoded=121 t_token=155.1367603305785 n_tokens_second=6.445925503852959
INFO [print_timings] total time = 58488.35 ms | tid="133210908811264" id_slot=0 id_task=1323 t_prompt_processing=39716.806 t_token_generation=18771.548 t_total=58488.35399999999
>>
>>106771590
Let them cook, they're busy!
>We are an open source agi Labs and ramping up our research team, our goal is to be competitive asap on capabilities with the big labs, we have compute, talent, and crowd source environment with verifier and the hub. Stay tuned for our next model release !
>>
>>106771605
>this shit feels astroturfed as fuck
no way take your meds schizo freak
>>
> Is there any local ai that lets me do TTS with elevenlabs-type quality?

I have a 3080 geforce tuf and a gigabyte ga-h61m-s1 mobo, so my pc is quite decent but not overkill. I still want to try messing with a good TTS.
>>
>>106771510
Yea, or on a second line in Assistant Prefix. I use the instruct sequences to more easily turn thinking on/off
>>
>>106770110
>remember always running it on windows
>reinstalled it on linux months ago
>for some dumb reason it doesn't show up in my silly tavern extension tab like it does on windows despite the fact it shows as installed and enabled when managing extensions
aside from that weird fuckery going on my end, it was one of my fav add-ons back then desu, mostly used the clothes and world info bits and it helped out a ton steering models, you did a good job anon!
>>
>>106771624
it's not good and now i know why nobody is posting GLM 4.6 logs. what's the point of using GLM 4.6 when K2 is only moderately slower in PP but still the same speed in TG? GLM 4.6 is lacking in its knowledge, it fails to understand niche things that kimi is able to pick up without issue. even niche stuff aside, kimi just knows way more and its dataset is more recent. you can ask it to tell you stuff that happened in December 2024 and it will answer factually.
>>
>>106771605
>why are we using [350b model] and not [1t model]
But that aside, do you not like the output?
>>
>>106771590
No shit, reddit AMAs are just promotional stunts. They only respond to the questions they want to, the ones that let them segue into marketing themselves with canned and preplanned responses.
>>
>>106771712
if i turn off thinking then it's reasonable. keeping thinking on is a mess though.
here's an example of the output with thinking. removed some of the response to fit the character limit. the thinking is slopped.

Okay, the user has responded through {user}. Let me break down the key elements of their response to guide my turn as {char}.

{user}'s actions/Dialogue:

Physical reaction: Breath knocked out, hugs back "timidly." This shows he's a bit overwhelmed and maybe shy, which fits his description. He's not rejecting the hug, but he's not as effusive as {char}.
Dialogue:
"Jeez, yeah yeah. It looks like I'm back for a while…" - Confirms his return, sounds a bit flustered/casual.
<blahblah>
{char}'s internal state:

She's ecstatic. Her absolute best friend, the boy she's been in love with for a decade, is back.
<blahblah>
My Plan for {char}'s Response:

Initial Reaction: <blahblah>
Physicality: <blahblah>
Dialogue:
She'll laugh loudly at his "welcome party" comment. Maybe tease him about it.
<blahblah>
Narrative Flow:
Start with her still hugging him, her immediate reaction to his awkwardness.
<blahblah>
End on a high note, maybe she grabs his hand and starts pulling him somewhere, or asks a million questions a mile a minute. The goal is to keep the interaction going and show her personality. I'll make it brief and stop at a point where {user} needs to react.
Drafting the response in my head:
>>
>>106771674
thanks for the compliment, it makes me happy to read. its funny that 4chan is the way it is, i've never got much negativity about developing my addon. you guys are always pretty supportive, telling me to keep going and stuff. it might not seem like much but reading the positive comments makes me wanna do more, so thanks
>>
>>106771605
>this shit feels astroturfed as fuck
no bro 4.6 is kino af ong it has svol bro fr fr the vibes bro bro
>>
So what layers should I offload to RAM for GLM 4.6 in ik_llama.cpp?
>>
GLM 4.6 just bought me a house and cured my dog's cancer!
>>
>>106771820
this but unironically
>>
Another reason I love glm-chan 4.6 is that I am sure she makes drummer jerk off while lubricating his cock with his tears. Glm chan reminds him that the days of his grift are numbered. Next air will be accessible to everyone and will easily beat the shit out of all the shittunes. Shittune placebo will die in 2026. Last chance to get a job you safety engineer retard.
>>
>>106771605
>why is this 30b active parameter MoE at Q5 running slower than this other 30b active parameter MoE at Q4_XXS
is this really the level we're having this discussion on? maybe ollama is more your speed, you won't have to worry about this sort of thing.
>>
>>106771820
You are either salty because they didn't release air or are one of those 3x p40 anons. I'll post logs tomorrow because it's past my bedtime
>>
>>106771801
You are welcome! I remember the first posts you made about working on it and going like "Oh shit gotta write it down when it releases" cuz I was trying to play around that issue of models forgetting scene stuff or steering it towards a play by writing some of the info on lorebooks or even just random notepad entries and then copy and pasting their contents on the author notes tab, but it was kinda janky to do so manually every time especially with so many entries... and your extension for me solved just that in a really neat and easy way, glad to see it's still going
>>
>>106771645
VibeVoice 7B. It's the best we got.
>>
>>106771958
why wont anybody have a serious discussion about this? is it because you don't have enough RAM to run kimi k2 yourself and have to rely on cope quants of GLM 4.6? post logs ffs otherwise i will just stick with the superior chink company.
>>
>>106771704
>i know why nobody is posting GLM 4.6 logs
https://files.catbox.moe/mwwdug.txt
https://files.catbox.moe/xs9vn5.txt
>>106769725
>>106770080
>>
File: 1757604949388241.png (1.45 MB, 900x1200)
WTF even is this LMAO: https://xcancel.com/wolflovesmelon/status/1971002333577482360
>>
File: 1729019232618558.jpg (96 KB, 411x980)
>>106771993
its a funny issue. ai only cares so much about the most recent thing (lowest context in the chat). there is bunch of addons now that all do similar things but it still ends up with reinjecting data into the ai at a point.

in my head i knew what needed to be done, but wasnt sure how it'd turn out. especially since i was using ai to develop the app (i'm ok with java, but not a programmer in it). i'm pretty happy with how things turned out

lately all i've done is add an image feature. so inside the folder of my addon, if you create an 'images' folder and then have an image that matches a name, it'll pop up a pic just like the card would if you clicked the image. picrel is belle from beauty and the beast, in her peasant outfit
>>
>>106772080
>post logs ffs otherwise i will just stick with the superior chink company.
If you actually tried both of these why would you give a shit about anyone else's logs
>>
>>106772136
a bit like having different alt gens of a card but more dynamic since you can switch it up based on what the setting says they are currently wearing? I kinda dig it desu
>>
>llama-kv-cache.cpp:764: GGML_ASSERT(ubatch.seq_id [s*n_tokens][0] == seq_id) failed
when doing --parallel 4 runs
ah well, it wasn't worth upgrading llama cpp to try granite
this shit is so ghetto
>>
i just woke up in the most suspicious way possible call me a schizo but im going to attribute it to divine forces awakening me to witness an amazing drop dont hold me to my word though plz
>>
>>106772401
kinda. i just wanted images that are associated with outfits or locations.

back when ai was pygmalion 2.7/6b and st was hardly a thing, one of the nice things the kobold ui did was highlight lorebook entries. any time it hit an entry, you could hover over that in the chat and see its entire entry plus a pic of it. i always wanted something similar for st, but since i can't do that, i'll settle on pics that pop up the same way a card pic does
>>
>>106772111
Is he dead yet?
>>
>>106772444
Do you remember if Miku said anything to you?
>>
>>106772425
Now try running GLM on CPU with vllm.
>>
>>106772444
glm already dropped and saved our cocks
>>
>>106772494
why is everyone waiting for llama.cpp to implement models when you can just run everything with vllm on cpu much faster
>>
>>106772524
>vllm on cpu much faster
it's not and because you don't get finegrained quants like exl3 or gguf
>>
>>106771704
Kimi felt schizo to me when I used it, like it doesn't know how to describe stuff naturally despite all that knowledge. And Deepseek is just boring and cucked now. New GLM doesn't have either of those issues. Guess it goes to show that having a gazillion parameters doesn't matter when all you care about is benchmaxxing.
>>
>>106772549
but there's no point in using those if you're running 8bit anyway
>>
>>106772524
I have never seen any of the vllm on CPU shills post t/s numbers.
>>
>>106772524
Because the people who did try it ran into bugs and errors. Turns out vllm is only production-ready if you plan on doing GPU-only inference and your GPUs are all the same VRAM. Also you might need to find the exact version of vllm that works with the model you want because not every version new versions can and have introduced bugs with old models.
>>
>>106772580
>not every version new versions
Somehow deleted a part of that, was meant to be
>not every version does as new viersions
>>
>>106772093
>https://files.catbox.moe/ozn9ws.txt
jesus christ... so its shits all retarded and it talks like a fag for everybody in its thinking process and just isn't me. it's so over.
>>
File: k2 params.png (72 KB, 402x699)
>>106772555
heres my parms anon. hope it helps you with getting non-schizo responses, kimi seems to behave the best with these.
>>
>>106772580
Not to mention the trial and error getting the pythonshit dependencies working. Even with conda it seems like the project is in a constant state of broken.
Or the fact that the issue tracker is full of ignored issues because all support and development happens on discord.
Or the fact that they only support the latest 3 gens of Nvidia cards, so AMDfags, Intelfags, Macfags, and P40fags are all out of luck.
>>
>>106772580
Because vLLM only works with GPU counts that are powers of 2 and I have 7 GPUs. Maybe one day I'll get an 8th GPU and connect it over that weird SAS port that does PCI-E shit.
>>
>>106772697
Honestly it is really fucked up. CPUbros don't know how good they have it mister gurglenov.
>>
>>106772739
Eastern Euro C/C++ programmers are a different breed.
>>
>>106772769
can confirm.
source: am rpcs3 dev
>>
glm chan is the semen demon. sign the pact now by buying at least 128GB's of ram.
>>
>>106772857
I'm not sure I want to deal with 3 t/s (if what the anon said in the other thread was not a lie).
>>
https://huggingface.co/Qwen/Qwen3-4B-SafeRL
finally, a model everyone can run on a potato while feeling very safe
>>
>>106772899
nah anon, you can get a whopping 6tk/s with GLM 4.6 and you only need 120GB of VRAM to do it. >>106771605
>>
>>106770753
I give up, something's wrong and I don't know what. Back to R1 for me.
>>
>>106772899
I couldn't run 70B's offloaded at 2T/s. I can run glm at 3T/s because you never have to reroll. Just look how often you reroll and if it is around 8 times then it is a no brainer.
>>
File: 5416.png (206 KB, 400x300)
I see most people here use the models for roleplaying, are there are any tools for using one as a voice assistant?

I'm thinking of making something like Cortana or Alexa for my grandma cuz her memory is getting weak. What's the best way about doing this? Any tips or tools?
>>
File: ohnoitsretarded.png (164 KB, 859x809)
>>106772929
GLM 4.6 will follow your previous responses format/style to a T to its detriment. If you put in weird formatting and split up sentences in weird ways you will get a response like the one in my screenshot. Does it do it on a fresh chat?
>>
What are the odds one of these chucklefucks successfully ban all chink tech in the west?
>>
>>106772970
you could try giving the new line a bit of a debuff, they probably let it see too much hard wrapped text in the pretraining.
>>
>>106772970
>Does it do it on a fresh chat?
Yes, this is with a card that doesn't put speech in quotations and has asterisks wrapped around all other text and it's not just that. It will output </think> in the middle of its response and start speaking for my character and on some rerolls, it will only output <think> and then end the response.
I'm assuming it's quant related, or there's some bug or something, I'm using ik_llama and I rebuilt it. I can't be bothered to figure it out, maybe I'll give it another try if it's still relevant here in a month or whatever is broken gets fixed.
>>
File: file.png (191 KB, 416x416)
thank you god emperor xi. my member isn't worthy this boon you bestowed upon us.
>>
>>106772914
Oh ok, thanks for pointing that out. Might consider it, but also wish we had more given that it's a reasoning model.
>>
>>106761230
Okay so you can replace the built one in portable_env/Lib/site-packages/llama_cpp_binaries/bin with a self-built one, and do not forget to build ik_llama with CUDA. Running GLM 4.6 with ooba's UI.
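In case anyone wants to do the same, the build step is the usual cmake dance; the CUDA flag below is what current llama.cpp/ik_llama builds use, but double-check the fork's README in case it differs:

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

Then copy the resulting binaries (llama-server and the shared libs next to it) over the ones shipped in that bin folder.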
>>
>>106773087
it is garbage with reasoning since reasoning blocks are 3 times longer than 4.5. but you don't need reasoning. everything just works. 16 times less useless detail. it even does things you didn't ask for but you realize that you actually want.
>>
mistral bros when is it our time to shine? isn't large supposed to be released soon since they upgraded medium recently?
>>
the glm gaslighting continues
>>
>>106773183
who cares? large would just be a downsized and fried r1 anyway
>>
>>106773183
>With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
>May 7, 2025
just a feeeeeeeeeeew more weeks
>>
>>106773194
get a job anon and you can maybe stop coping
>>
>>106773195
you act like more competition is a bad thing. we dont know how badly it performs until they release it. more competition breeds innovation.
>>
glm betrayed its userbase by not doing another air despite it being 99.999999% of what people actually used
yeah i'm sure you ""local"" users are now suddenly running 300b, how about you all go to /aicg/
>>
>>106773194
Say that after she gives you a blowjob. I dare you.
>>
>>106773216
>t. seething poorfag
>>
>>106773216
There'll be another time.
>>
>>106773216
128gb is enough for IQ3 you have no excuse to neglect your sex life by not buying it.
>>
>>106773216
i like running 1T models locally. $800 for 512GB of RAM in a first world country isn't that expensive. Its like 60% of a weekly paycheck for me.
>>
>>106773216
How does one not use 0.0000001% of a model?
>>
>>106773205
it's you guys who ought to stop coping
>it is garbage with reasoning since reasoning blocks are 3 times longer than 4.5. but you don't need reasoning.

>using thinking model without thinking
>because you would spend a literal eternity waiting for your shitty cpumaxxing to generate the first line of actually readable shit
>coping that disabled thinking works great
>in a glm model
>>
>>106773254
and then you sit there watch it do 2000 tokens of reasoning at 3 tokens per second, yeah
or maybe you're using the api?
>>
>>106773266
uh huh, that is why everyone here, reddit, and the novelai, silly tavern, featherless, and ai assisted writing discords are all praising it, huh? That all mostly used claude before?
>>
>>106773283
maybe those places are more up your speed then? don't shit up /lmg/ with your trash, nobody wants it here
>>
>>106773283
you need to go back
>>
>>106773266
>using thinking model without thinking
it's a hybrid isn't it? perfectly within their intent to use it without reasoning
>>
>>106773297
go enjoy your nemo then brown
>>
My IQ1 GLM finished downloading...
>>
>>106773280
I get 6-7 tk/s for generation and 110tk/s for processing. I can go through 4K of tokens every 35 seconds, it's really not as slow as you are imagining.
>>
>>106773266
All this seething when you could be enjoying actually good ERP.
>>
Can your local model sing a duet with you?
>>
>>106773304
>within their intent
fucking lmao
nobody sane gives a shit what the "intent" is, only what the actual performance amounts to in real use
there's no such a thing as an actual hybrid model, they all underperform terribly with reasoning turned off
>>
>>106771319
Things your mom says after your dad leaves your room at night
>>
>>106773320
i get like 220tk/s for prompt processing on a 128GB DDR4 + 24GB GDDR6X system

Generation is still ~5tk/s but it's incredible how fast this is compared to Deepseek R1
>>
>>106773331
not really
>>
>>106773359
ooooooh buuuuurn
>>
>>106773331
if you want a non-thinking then go run k2
it's much better than glm provided you are not poor
>>
File: kimiduet.png (264 KB, 1245x1176)
>>106773324
even kimi thinks this shit is cringe as fuck
>>
>>106773424
no it is not, k2 is retarded / schizophrenic
>>
>>106773431
i bet you ran it below q8
>>
>>106773324
>"Anon" with fem pfp
>>
>>106771205
>NeMoE
https://www.youtube.com/watch?v=qByKEu0zdco
>>
>>106773439
q6, and I still run glm at q8
>>
>>106773442
Got another one for me? I'll change it if you are feeling hurt.
>>
>>106773366
i could only get 140tk/s with 96GB of VRAM and the rest offloaded into RAM. what's your context? i was running at 64K and had the first 31 layers loaded into VRAM.
>>
>>106773467
changing it to something like Anona is a few clicks away
>>
>>106773523
But I want to continue roleplaying as a cute catboy. "Anona" sounds like a girl's name.
>>
>>106773564
lose some weight anon and take a shower
>>
>>106773584
Requests must be received through handwritten letters in flawless Palmer method business writing, and will be considered after appropriate payment has been confirmed. I accept and await receipt of 3 RTX PRO 6000 Blackwell Workstation edition GPUs.
>>
I'm just trying out GLM 4.6. Never tried GLM 4.5. Is it supposed to take much more VRAM per growing context length? I have -fa on.
>>
>>106773651
*compared to GLM 4.5 Air
>>
File: thing.png (1.45 MB, 832x1248)
>>106773564
well banano gave me this gay thing
>>
>>106773651
yes it's retarded as fuck how much VRAM it uses. check this shit for 64k

llama_new_context_with_model: n_ctx = 65536
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 512
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 6144.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 5888.00 MiB
llama_kv_cache_init: CUDA2 KV buffer size = 6144.00 MiB
llama_kv_cache_init: CUDA3 KV buffer size = 5632.00 MiB
llama_new_context_with_model: KV self size = 23808.00 MiB, K (f16): 11904.00 MiB, V (f16): 11904.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 1.16 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=1)
llama_new_context_with_model: CUDA0 compute buffer size = 3105.77 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 1136.02 MiB
llama_new_context_with_model: CUDA2 compute buffer size = 1136.02 MiB
llama_new_context_with_model: CUDA3 compute buffer size = 2448.00 MiB
llama_new_context_with_model: CUDA4 compute buffer size = 274.50 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 1104.05 MiB

meanwhile kimi at 40k

llama_new_context_with_model: KV self size = 2745.00 MiB, c^KV (f16): 2745.00 MiB, kv^T: not used
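If the GLM cache hurts that much, quantizing it claws a lot back at the usual quality cost. You already have flash attention on, which llama.cpp at least needs for the V side, so something like:

-fa -ctk q8_0 -ctv q8_0

should roughly halve that ~23.8 GiB of f16 KV cache; q4_0 shrinks it further but degrades long-context quality more noticeably.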
>>
>>106773675
You look cute!
>>
https://xcancel.com/deepseek_ai/status/1973331587774230573
HOLY FUCK
>>
>>106773737
HHHHHNNNNNNNNNGGGGGGGGGGGGGGGGGGGGG
>>
>>106773737
nice
>>
>>106773737
Are we back?
>>
>>106773765
I don't know retard, are we?
>>
>>106773737
this. changes. everything.
>>
>>106773737
sama thought he won when he stole miku at that concert
but local models are now back
>>
I wonder if we have people trolling or its just legit just LLMs responding
>>
>>106773737
b-bakana......
>>
>>106773813
im cooming and respoonding thanks to the power of local LLMs
>>
>>106772970
>hfoption
>gooption
>becaus
>torent
>torren
>optionl
>direcly
>termminal
>downlaod
>>
>>106773737
NANI?!
>>
>try IQ1 GLM 4.6
>get not even 6 t/s
It's actually over.
>>
>>106774016
Q1 is slow as shit somehow because it's not very optimized as far as I've heard. I once did some testing on R1-0528 and it turned out that Q1 ran about the same speed as Q5 and thus slower than Q4 on DDR4 8-channel
>>
File: mj53dnsrsper4jkp.gif (380 KB, 480x498)
>>106773216
>being this butthurt over not being able to afford some more cheap ddr4 ram to run local sota
must suck to be you
>>
I wish tavern allowed me to reference multiple lorebooks in a chat. i've shifted towards 'universal' lorebooks for recurring story settings and using the scenario field for specific character and summary info, but the scenario field has an annoying character limit, and gets cleared when you change a card name and maybe other things, so I have to remember to keep a hidden copy in a system note at the beginning of each chat. I hate having to remember to turn things on and off in the universal lorebook, especially when I have maybe 8 groups dedicated to a single world, for example. I suppose I could use author's note, now that I can use the thinking field or system notes for what I used to use author's note for, but it's easy to forget what you have in there. things like the director extension help my autism, but it's usually world history and setting and 5e stats that I'm worried about rather than clothing or weather. I probably need to take another look at it and play around with how it works internally, the concept of it can fix a lot of my problems
I like tavern for a lot of its features but I hate that I'm afraid of updating it. I should probably git checkout more often because maybe they're fixing janky things about it and maybe new features are worthwhile. I like the idea of the checkpoint branching stuff, but what I need are better ways of grouping things, moving things around, and copying stuff like groups. I do a lot of stuff manually in the files but it's annoying. Stuff like the tagging system but for individual chats, and a way to browse that easily in the interface would make my life a lot easier as well
>>
>>106774215
>I wish tavern allowed me to reference multiple lorebooks in a chat
stopped reading here because it does
>>
>>106773617
sex with suiseiseki's ball joints
>>
Does anyone have that guide on how youre supposed to load the really big models on the 24/128 deal? I keep getting the run out of vram error on ooba even though the estimated ram usage is below 24 on the gpu layers
>>
>>106773737
kek I didn't know only the post ID mattered
>>
File: additionallorebooks.png (16 KB, 464x243)
>>106774215
bruh
>>
>>106774461
Dunno about ooba. Here's one of my configs for ik_llama.cpp, single 3090 + 128: https://files.catbox.moe/homknt.txt
>>
henlo
I could totally keep using all of the proprietary models but I want to switch to local models for purely ethical reasons
>>
High-Fidelity Speech Enhancement via Discrete Audio Tokens
https://arxiv.org/abs/2510.02187
>Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low sampling rate codecs, limiting them to narrow and task-specific speech enhancement. In this work, we introduce DAC-SE1, a simplified language model-based SE framework leveraging discrete high-resolution audio representations; DAC-SE1 preserves fine-grained acoustic details while maintaining semantic coherence. Our experiments show that DAC-SE1 surpasses state-of-the-art autoregressive SE methods on both objective perceptual metrics and in a MUSHRA human evaluation. We release our codebase and model checkpoints to support further research in scalable, unified, and high-quality speech enhancement.
https://lucala.github.io/dac-se1/
https://github.com/ETH-DISCO/DAC-SE1
Repo isnt live. Might be cool
>>
>>106774487
aren't you just the cutest thing
>>
File: Base Image.png (1.71 MB, 1244x3388)
Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls
https://arxiv.org/abs/2510.01631
>Training data plays a crucial role in Large Language Models (LLM) scaling, yet high quality data is of limited supply. Synthetic data techniques offer a potential path toward sidestepping these limitations. We conduct a large-scale empirical investigation (>1000 LLMs with >100k GPU hours) using a unified protocol and scaling laws, comparing natural web data, diverse synthetic types (rephrased text, generated textbooks), and mixtures of natural and synthetic data. Specifically, we found pre-training on rephrased synthetic data \textit{alone} is not faster than pre-training on natural web texts; while pre-training on 1/3 rephrased synthetic data mixed with 2/3 natural web texts can speed up 5-10x (to reach the same validation loss) at larger data budgets. Pre-training on textbook-style synthetic data \textit{alone} results in notably higher loss on many downstream domains especially at small data budgets. "Good" ratios of synthetic data in training data mixtures depend on the model size and data budget, empirically converging to ~30% for rephrased synthetic data. Larger generator models do not necessarily yield better pre-training data than ~8B-param models. These results contribute mixed evidence on "model collapse" during large-scale single-round (n=1) model training on synthetic data--training on rephrased synthetic data shows no degradation in performance in foreseeable scales whereas training on mixtures of textbook-style pure-generated synthetic data shows patterns predicted by "model collapse". Our work demystifies synthetic data in pre-training, validates its conditional benefits, and offers practical guidance.
very cool. from meta. seems things are better than we thought
>>
Have I been gaslit or should a 94gb model be able to fit in 136GB (72VRAM+64DDR5)?
>>
>>106774647
depends
>>
>>106774659
Depends on what?
>>
File: Base Image.png (799 KB, 1200x2576)
Diffusion^2: Turning 3D Environments into Radio Frequency Heatmaps
https://arxiv.org/abs/2510.02274
>Modeling radio frequency (RF) signal propagation is essential for understanding the environment, as RF signals offer valuable insights beyond the capabilities of RGB cameras, which are limited by the visible-light spectrum, lens coverage, and occlusions. It is also useful for supporting wireless diagnosis, deployment, and optimization. However, accurately predicting RF signals in complex environments remains a challenge due to interactions with obstacles such as absorption and reflection. We introduce Diffusion^2, a diffusion-based approach that uses 3D point clouds to model the propagation of RF signals across a wide range of frequencies, from Wi-Fi to millimeter waves. To effectively capture RF-related features from 3D data, we present the RF-3D Encoder, which encapsulates the complexities of 3D geometry along with signal-specific details. These features undergo multi-scale embedding to simulate the actual RF signal dissemination process. Our evaluation, based on synthetic and real-world measurements, demonstrates that Diffusion^2 accurately estimates the behavior of RF signals in various frequency bands and environmental conditions, with an error margin of just 1.9 dB and 27x faster than existing methods, marking a significant advancement in the field.
https://rfvision-project.github.io/
pretty neat
>>
>>106774666
model quant, context, context quant, fa and other stuff
>>
>>106774686
I'm talking about GLM4.6 iq2_m
>>
>>106774647
easily
I can fit a 145gb model quant in 152gb
>>
llama_model_load: error loading model: missing tensor 'blk.92.nextn.embed_tokens.weight'
Why does this happen?
>>
>>106774728
you need to build the newest version of llama.cpp to run glm4.6
>>
File: Base Image.png (911 KB, 1200x2936)
ExGRPO: Learning to Reason from Experience
https://arxiv.org/abs/2510.02245
>Reinforcement learning from verifiable rewards (RLVR) is an emerging paradigm for improving the reasoning ability of large language models. However, standard on-policy training discards rollout experiences after a single update, leading to computational inefficiency and instability. While prior work on RL has highlighted the benefits of reusing past experience, the role of experience characteristics in shaping learning dynamics of large reasoning models remains underexplored. In this paper, we are the first to investigate what makes a reasoning experience valuable and identify rollout correctness and entropy as effective indicators of experience value. Based on these insights, we propose ExGRPO (Experiential Group Relative Policy Optimization), a framework that organizes and prioritizes valuable experiences, and employs a mixed-policy objective to balance exploration with experience exploitation. Experiments on five backbone models (1.5B-8B parameters) show that ExGRPO consistently improves reasoning performance on mathematical/general benchmarks, with an average gain of +3.5/7.6 points over on-policy RLVR. Moreover, ExGRPO stabilizes training on both stronger and weaker models where on-policy methods fail. These results highlight principled experience management as a key ingredient for efficient and scalable RLVR.
https://github.com/ElliottYan/LUFFY/tree/main/ExGRPO
Code not posted yet
https://huggingface.co/collections/rzzhan/exgrpo-68d8e302efdfe325187d5c96
>>
>>106774797
Neat
>>
looks like glm4.6 should get good speeds on a regular gaming pc
https://www.reddit.com/r/LocalLLaMA/comments/1nwimej/glm_46_local_gaming_rig_performance/
>>
is running at q8 on ooba or running a smaller quant on fp16 on ooba better?
>>
you people are running any models on normal RAM not Vram?

Isn't that like an hour per prompt
>>
>>106774958
>Q2
No, thanks. I will keep using gpt-oss.
>>
>>106774985
yeah
many of us are running it on ssd too which takes days
>>
>>106774989
>gpt-oss
ooof, masochist
>>
>>106774985
I stream my models from burned blu-ray discs
>>
>>106775087
Relevant

https://www.datacenterfrontier.com/cloud/article/11431537/inside-facebook8217s-blu-ray-cold-storage-data-center
>>
>>106774958
>>106774989
imagine 128GB modules and consumer boards supporting 512GB. why cant we live in this reality?
>>
>>106775240
Wish granted, but it's still dual channel.
>>
>>106775240
ddr6 should be a big jump in 2 years
>>
>>106775287
actually 3 years. i am waiting for the gb300 dgx station that is coming in a couple months. i hope it is less than $30k
>>
>like a physical blow
>She clasps her hands together, her knuckles white
Ahhh that's the stuff! GLM-chan I've missed you.
>>
where glm 4.6 air
>>
>>106775344
you're breathing it
>>
Can I do anything with 1x 4090?
>>
>>106775412
no. just give it to me for free
>>
>>106775337
nkdshi mska mska
>>
>>106775412
Mistral Small 3.1/3.2
Gemma 27b
Nemo 12b
Qwen 30bA3B
If you're looking to run fully in VRAM, these 4 are your best options, which is better depends on preferences and use case.
Beyond that, there's large MoEs like GLM that everyone is shilling lately, if you have at least 64GB then you can try GLM Air. If you have 128GB+ then GLM 4.6
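For the fully-in-VRAM options a launch is as simple as something like this (filename is just an example, pick whatever quant actually fits in 24GB alongside the context):

llama-server -m Mistral-Small-3.2-24B-Q4_K_M.gguf -ngl 99 -c 16384 -fa

-ngl 99 keeps every layer on the card; drop the context or the quant size if it doesn't fit.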
>>
chat, which small model (for 8gb vram) passes mesugaki test?
>>
>>106775453
>>106775412
>if you have at least 64GB then you can try GLM Air. If you have 128GB+
For this, I'm talking about regular RAM, using it in addition to your 4090.
>>
>>106775424
"We must refuse," says MISAKA as she attempts to protect her sister from the strange man.
>>
>>106775557
This is pedophilia
>>
>>106775593
They are both children so it's okay
>>
>>106775557
Kuruko really carried that mid anime desu, yuri is always welcome after all
>>
>>106775557
Game Master: **[EXCEPTION]:** Your hands find no protrusions on the surface, nothing to squish or to grip on to.
>>
File: 1755567910777537.png (203 KB, 399x399)
Found any GPT sources yet? Gemini has been obnoxiously prudish this past week even though my jb was working fine last month.
>>
>>106775630
Are you lost?
>>
>>106775632
Yes I'm sorry I thought I clicked /aicg/
>>
>>106775455
>chat
go back
>>
File: 1729922535573891.png (655 KB, 2005x741)
>>106775593
out of 10!
>>
File: 1737993304431567.jpg (131 KB, 1024x1024)
What glm quants are the fastest? Preferable in q4-q6 range
>>
File: wtf.png (100 KB, 576x446)
>>
>>106775784
How many ollama users could actually make an mi50 work anyway?
>>
>>106775784
who in their right mind uses that ollama abomination anyway
>>
>>106775781
Tbh it would be interesting to see speed benchmarks of the different quants. Don't remember anyone doing that.
>>
File: 1749749341258054.png (350 KB, 1180x1630)
What causes this mental illness?

I used to laugh at these people, now I just feel sad
>>
>>106775886
>luna lactea
>jackemled@furry.engineer
uhh seems being furry tranny causes it
>>
>>106775886
Furry aside, making it easier for scientists and researchers to shit out more Python is the last thing we need.
>>
>>106775903
I do actually use ai as my retard indian intern. These fearmongering retards dont realize that ai is just a tool, my ratio of actual work done / hours worked has never been as good as now. This week I coded a total of 2 hours.
>>
>>106775279
>Wish granted, but it's still dual channel.
I don't understand why Intel doesn't just market a 8 channel prosumer motherboard for Xeon Scalable. The processors aren't that expensive.
>>
>>106775886
He's right though, the intuition to know where the negative proof lies was developed by actually getting there. The ability will atrophy.
>>
>>106775987
I have a guy in my group who works as a vibe coder for n8n. He can't code but he can get UI done with some cloud tool.
>>
File: 1742952981750258.png (3 MB, 1716x1888)
>>106776023
>>
>>106776049
>socially unacceptable to touch your patient
>totally cool to guzzle down his piss
the middle ages were a strange time
>>
>>106776049
i was born in the wrong era
>>
File: 1756155999480183.png (2.38 MB, 1248x1824)
>>106776068
out of sight, out of mind
>>
>>106775843
>>106775781
https://github.com/ikawrakow/ik_llama.cpp/discussions/164
Tests are old and were done with a small model entirely on CPU, but the hierarchy of results is the same today, including with a GPU, assuming that at least some significant portion of the model is being loaded into system RAM.
>>
>>106776068
It makes perfect sense. It's not socially acceptable to physically examine other people's bodies in general. Why would that change just because one person is a doctor?
>>
>>106774985
I assume most of us are running the models on RAM primarily with some offload to GPU for context and stuff. On a DDR4 system it's generally limited to ~5t/s. It's slow if you want it to write script (which has limited tokenisation extent) but not unusable. One nice thing about GLM is that it goes through the prompt at like 200+ t/s vs Deepseek taking it at like 15 t/s
>>
I've been writing with IQ1 a bit. It's surprisingly not unusable. But it's somewhat dumb, repetitive, sloppy, and just not great. Yet it's also not necessarily worse than Air or 235B at the same memory size. Just different. Maybe if I had 192GB RAM instead it would be a lot better.
>>
>>106776217
4.6? I had a great experience with ubergarm's iq2_kl, it just knows how to write unlike 4.5 and qwen. Even at temp 1.
>>
>>106776188
Because you're asking said doctor to find out wtf is wrong with you?
>>
>>106776217
4.5 full gets good at around iq2, with the knowledge coming back at iq3 and above. DS v3 is the only model I found usable at IQ1. Usable, not good. DS needs at least q2k to feel normal.
>>
>>106776238
>>106776241
Yeah, 4.6. Q2 sounds cool. Q1 is the best I can do.
>>
A genetically modified mouse, genetically engineered mouse model (GEMM)[1] or transgenic mouse is a mouse (Mus musculus) that has had its genome altered through the use of genetic engineering techniques. Genetically modified mice are commonly used for research or as animal models of human diseases and are also used for research on genes. Together with patient-derived xenografts (PDXs), GEMMs are the most common in vivo models in cancer research.
>>
What did anon mean by that.
>>
>>106776273
The time of Local Mouse General is approaching. LLMs can't stay around forever. What's the next step? Wetware.
>>
Machine learning, GEMM, trans, mice to human trials, ARGH I'm noticing things
>>
>>106776293
Mus muculus GEMM transgenic mice are not real, take your meds weirdo
>>
>been using nothink for a while since it's slow and I wanted fast responses
>try out thinking for the first time
>the output is immediately more cucked
Fucking hell.
>>
can't even take a joke
>>
>>106772963
wholesome but local isn't there for real time stuff, gotta use an api like chatgpt 4o
>>
>>106776348
It's not there yet for real time stuff on grandma's pc
>>
>>106776363
Mossad won
>>
>>106776363
lmao, we won!
>>
File: 1742082385341710.png (458 KB, 1071x1011)
How much ram do you guys have? How much do you recommend?
>>
>>106776386
192 but I have dual channel setup and most models are too slow at a quant that takes up all of it so I usually keep it below 140-150gb
>>
>>106776386
128GB minimum to begin playing with the best open models
>>
>>106773216
At about q2, fat glm fits my gayming rig of a 5090 and 128 gb of ram and even at low quant it beats glm air. It's not some server motherboard or whatever cuz I'm too lazy to go and buy a whole new setup just for inference and was waiting to see if those llm shitboxes from nvidia and amd get interesting next year, but yeah you can run 300b locally without cpu maxxing
>>
>>106775784
lmao
They're not even the ones maintaining hardware support, they're literally just cockblocking their users.
>>
>>106776386
96
either 192 or >500 (server/mac studio)
>>
Does task manager showing full load on all cores mean my tps is bottlenecked by cpu?
>>
>>106776529
Not always. 100% in task manager does mean the CPU is working at max capacity, and it says nothing about active memory throughput, same goes for GPU core % in other monitoring software.
>>
>>106776551
does not mean*
>>
>>106773324
nice
>>
>>106776551
Thanks
>>
File: a100.png (312 KB, 1892x835)
Don't understand why A100s are still so pricey when RTX 6000 Pro is half the price. HBM vs GDDR but bandwidth is similar
>>
>>106776566
Something to do with nvidia's licensing when it comes to running consumer stuff in commercial servers? "I know what I got"-type eBay greed, not wanting to sell for so much less than it was worth in the past?
>>
>>106776597
Sounds plausible, forgot about their loicense shenanigans. Maybe there's better NVlink and virtualisation support, but doesn't seem there'd be any reason to pay more for personal inference uses unless I'm missing something obvious. Recall seeing them going secondhand for like 8K in the early Llama days, was tempted but it seemed like an insane price for one GPU, and now here we are.
>>
>>106776566
I was literally about to ask the same, lul. Yeah, BBCwell all the way. Now the question is how many to run a good glm4.6 quant.
>>
<|assistant|>\nMiku: <think></think>\n
or
<|assistant|>\n<think></think>\nMiku:
>>
What is the right way?
>>
File: 1758879465831864.gif (2.1 MB, 270x480)
>>106776741
>using think models
>>
>>106776755
instruct mode with 80k token system prompt like my nigga claude
>>
>>106776741
use basic logic to deduce where the model would expect its thinking block to be
>>
>Oho~ Want my tight little backdoor, huh? Been saving it all just for you~ Mmmph… *she moans softly as her own tiny fingers start playing with herself.* Let’sfacetouchmyassfirstandgetitallwetandslipperyforyouokaybaby?
are you?
>>
>>106776780
Both make sense, I can see logic in either variant
>>
>>106776741
<|assistant|><think></think>Miku:
hi
<|user|>Anon:
omg it migu
>>
>>106776741
<|assistant|>\nFaggot<think></think>\nFaggot:
>>
File: GLM.png (56 KB, 486x706)
>>106776741
<|assistant|>\n<think></think>\nMiku:
>>
>>106775682
can't even make a little silly joke around you sour faggots...
>>
>>106775455
you can use ollama cloud to run local models even on modest hardware
>>
>>106775682
NTA but If you wanted a real answer, then as always, the answer is Nemo.
>>
>>106776825
The model never inserts \nMiku: after </think> if you allow it to think, so I assume it doesn't expect character name in the final answer. After some limited testing, I believe that <|assistant|>\nMiku: <think></think> without \n at the end better adheres to formatting. I guess I have to use it more and see if it wasn't a fluke
>>
>>106776959
It never inserts character names because it was not trained to insert them anywhere. If you are going to force them in, after the think block is the right spot, because there is never anything between assistant and think.
>>
>>106777047
>there is never anything between assistant and think
Indeed, except a newline according to z.ai .jinja template
>>
>>106776929
alright faggot you're looking for nemo or rocinante, use lm studio because i know you're running on windows and it should show you which quant fits on your 8GB. good luck.
<|spoonfeed_end|>
>>
>>106776386
Some consumer motherboards support 256gb, look it up online so you don't lock yourself into a lower ram capacity. You can never have enough ram.
>>
>>106777198
The motherboard could have 3000 DIMM slots but the memory controller physically limits how much memory the CPU can address. 9K series Ryzen, for example, caps out at 192GB. So you could put 2x128GB RAM kits in a motherboard with 4 DIMM slots but you won't get 256GB of useable RAM.
>>
>>106777114
Exactly. There is also a newline after the empty think block, according to their template.
>>
>>106777256
I would take how much the CPU officially supports with a grain of salt because my 13700k officially supports 192gb but I'm running 256gb, which my mobo does support. You might need to win the silicon lottery but there's no harm in trying it if the dimms can be returned.
>>
>>106777256
Here's a link I found after 30 seconds of googling of people running 256gb on a 9950x:
https://forum.level1techs.com/t/256gb-4x64gb-ddr5-overclocking-results-w-9950x-and-msi-mag-x670e-tomahawk/228651
>>
>>106777351
9950X is like best binned silicon so if it is a silicon lottery thing you'd expect a lot of winners at that level.
Might be designed for 256GB but then under-declared because there's some lottery losers that couldn't do the full thing in testing.
>>
>>106777408
>>106777408
>>106777408


