/g/ - Technology






File: 1737694546160700.jpg (269 KB, 928x1232)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107749596 & >>107741641

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107749596

--Llama.cpp multi-GPU/NUMA optimization challenges and roadmap:
>107749695 >107749792 >107749857 >107749885 >107750757 >107750790 >107750807 >107750822 >107750844
--RTX 6000 vs multiple 3090s: VRAM tradeoffs and multi-GPU system challenges:
>107752983 >107753015 >107753048 >107753060 >107753189 >107753238 >107753443 >107753071
--Setting up Mistral-Nemo-Instruct-2407 on limited GPU resources:
>107749866 >107749870 >107749872 >107749880 >107749907 >107749920 >107750058 >107750141 >107750162
--glm 4.6/4.7 model viability for roleplay with Q2 quant:
>107752749 >107752769 >107752858 >107752883 >107753107
--Hardware-specific model choices and coding LLM performance debates:
>107750660 >107750676 >107750698 >107750704 >107750745 >107750781 >107750950 >107751012 >107751086 >107751103 >107751190
--Solar 100b's clothing logic flaws in character generation:
>107755612
--Critique of outdated 3.3 8B vs appreciation for modern AI advancements:
>107751376 >107751385
--ERNIE-4.5-21B-A3B-PT's translation and preference over Gemma:
>107753716
--Diagnosing low GPU usage in koboldcpp-nocuda:
>107750540 >107750550 >107750585 >107750654 >107750659 >107750694
--LLM keyboard shortcut comprehension and model introspection limitations:
>107755192 >107755280 >107755353 >107755425 >107755467 >107756006 >107756050 >107756059 >107756227 >107756263 >107756393 >107756519 >107756553 >107756260 >107756211 >107756267 >107756285 >107756300 >107756312 >107755396 >107755482
--Jailbreaking and ethical debates around model policies:
>107750213 >107750247 >107750461 >107750478 >107750339
--GLM4.7 outperforms others in hex conversion algorithm:
>107757624
--Kccp antislop sampler patch with wildcard syntax example:
>107751820
--Logs: Solar Open:
>107750674
--Miku (free space):
>107752290 >107752963 >107756059 >107756654

►Recent Highlight Posts from the Previous Thread: >>107749599

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Hello everynyan.assistant
>>
What do you guys think of Open WebUI as a daily driver frontend for non-roleplaying purposes? SillyTavern is too ugly and "gamey", and it's easy to get stuff not working correctly.
>>
>>107758103
It's also when they were especially retarded and had difficulty maintaining context,
but for novel roleplay or narrative they were awesome. Very spontaneous and shameless.
GPT-3.5 was the beginning of the end, unfortunately. That's when all the alignment-heavy efforts began.

What a shame. Seems I spend more time trying to unfuck deep training grooves in current models than enjoying a fun (sometimes completely insane) interactive story.
Speaking of which, AI storytelling is complete shit. The roleplay is vaguely serviceable, but everything is completely fucked for AIDungeon-style stuff. Even AIDungeon itself, and not because of the alignment. Everything is purple prose, where the more retarded 3 - 3.5 would not only be more spontaneous, it would also be 'just the facts', which made for way better interactive fiction.

Now we have infinite context (memory notwithstanding), and models that can pull needles from the haystack, and yet all they generate is insufferable slop.

I was hopeful we would eventually see something along the lines of GPT-3.5 but with functional context windows. But now it's clear that ship has sailed and we've ended up somewhere entirely different.
>>
>>107758228
>open webui as a daily driver
You get input, output, and the ability to edit both.
Seems fine as a generic tool.

If you're working on some specific task that just happens to use an LLM, e.g. writing a story / editing a book, then you might want a more streamlined UI, e.g. with tracking for who/when/where, a button for "please revise this with X in mind", etc.
>>
>>107758228
>for non-roleplaying purposes
llama.cpp's current webui works fine and has most of the features I care about. I think the only thing it's missing is web search? But I don't feel like opening that can of worms.
>>
>>107758298
Yeah, I was going with it before realizing it stores the chat logs in browser local storage. Very lame. It's fine for quick model testing tho.

>>107758287
Indeed, I meant more as a local ChatGPT experience. It seems pretty full-featured, but it doesn't seem to have basic branching functionality. What are those devs thinking?
>>
What do (you) think?
https://www.sciencedirect.com/science/article/pii/S0149763425005251?via%3Dihub
It appears that these guys think we will never truly have smart AI by just scaling up existing hardware, and that we need a new type of computer if we want AI to actually be smart.
>>
File: 1767479907295n.mp4 (3.43 MB, 960x1280)
>>
File: hahahlmfaoooo.png (299 KB, 1920x951)
Before Christmas vacation, I spent a few weeks getting into the AI hobby (literally knew nothing about AI before December) and trying to recreate Grok's Ani companion from scratch. I got far enough to get the actual .vrm model and a bunch of .bvh animation files so that I can hot swap animations. I also built my own basic webui to combine the TTS, LLM, and VRM+BVH animations.

I was making pretty good progress, but ever since seeing my family and actually having human interaction again I've sort of lost interest in the project. How do I psyop myself into getting addicted to creating my own waifu again bros?
>>
>>107758398
Also, can someone explain to me exactly how the system prompt + permanent character card + user prompt + LLM response thing works? Right now my webui is extremely basic, so I can only send prompts and get responses. There aren't even real conversations with token histories, so it essentially has zero memory. I'd like to implement something similar to SillyTavern, where it dynamically chops off the oldest parts of the context to make room for more interaction instead of just cutting off the conversation like llama.cpp's webui does.

I've gotten kokoro tts running in a separate small project too (still need to implement it in the main one) but I'm having issues with the latency. Even if I split the text up so that it only processes one phrase at a time I'm getting a two second delay each time on CPU using all threads.
>>
>>107758432
if you don't even know how conversations work (OAI text/chat/response endpoints) within the LLM, I guess you just vibecoded 99% of the thing you're building? if so just let your AI pick up the slack and implement shit for you (retard).
>>
File: 1762742620362518.png (662 KB, 2678x1374)
https://iquestlab.github.io/
holy shit
>>
>>107758476
benchmaxx bros ww@?
>>
File: 1767301405245519.mp4 (80 KB, 664x226)
>>107758476
It's shit alright >>107733442
>>
>>107758498
sad :(
>>
>>107758498
>no multispace tokens
into the trash
>>
File: 1755727881780495.png (33 KB, 991x217)
>>107758476
>the loop variant model (the only one that looks slightly interesting and is the one winning all the benchmaxs) will require a vibecoded pr
GRIM
>>
File: kek.png (1.58 MB, 1080x1080)
>>107758558
>a vibecoded pr
I guess it's gonna be quick if they use their own coder model to make the PR.
>>
>>107758558
>is the one winning all the benchmaxs
It's losing to their own non-loop version in a lot of them.
https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf
>>
>>107758111
>Alibaba listing: 64GB RDIMMs, $167 apiece
>contact seller
>"How about $560 for 32 GB?"
>>
Is there a way to use a local AI model in Visual Code? All the extensions seem to exist only to shill their own online service. Even if you can get a local one to work, you can't easily configure what to use it for (like code completion etc.).
>>
>>107758647
llama.cpp
no im not gonna spoonfeed you, all the required info is in their gh page
>>
>>107758655
NTA, I don't use vscode, but the question is probably which of the hundred plugins plaguing the marketplace lets you use your own model.
Although it would surprise me if it doesn't already have something built in that lets you do that.
>>
>>107758641
Someone needs to find out where Microsoft is warehousing all their GPUs. You can probably pay some fent addict from Tacoma to get you a truck full of them.
>>
>https://contextarena.ai/
damn foss models are ass
>>
File: channels4_profile.jpg (160 KB, 1199x1199)
>>107759096
Attention always sucks when you dilute it over long context but that's okay 16K is all you need
>>
>>107759096
>sloparena
>>
>>107758655
I'm not asking how to run a local model. I'm asking how to connect Visual Code to it.
I can connect SillyTavern to Kobold, but all those VSCode extensions are just ads for online crap and don't really want you to use local models. There must be one that does what I want.
>>
>>107759221
maybe https://github.com/ggml-org/llama.vscode
>>
>>107759135
Rolling katamaris with migu
>>
>>107759221
fucking retard I told you to look at the llama.cpp gh page, all the info is there on how to run this in vs studio (it's not called visual code you fucking retard), you couldn't even ask an llm for pointers, you're fucking braindead
>>
>>107758352
I don't agree with the notion that this magical consciousness arises from mimicking the physics of a human brain. But it's true that the human brain as hardware is quite different from computers. The human brain is an analog machine that depends on chemicals flowing and neurons firing with different potentials, thresholds, and speeds.

I couldn't find where I posted it, but I previously speculated about how the thinking and memory structures in the human brain may be encoded as these complicated contraptions/dominoes: specific chain reactions that rely on the physical structure, where things fire at the correct timings to create the right kind of cascade that corresponds to a memory. The components of these contraptions may be reused for different thinking patterns; basically, you could set up a complex domino arrangement that falls differently depending on which first piece you tip over. Brain function changes depending on which chemical it's flooded with (you can't make an LLM drunk with alcohol), and as far as I understand none of these aspects exist in an LLM, though I know AIs at least have neuron-like units that fire.

The chemical flows and neuron activations directly affect our processing speed and how we perceive time. None of these features exist in LLMs; an LLM is just a calculation that is completed at whatever speed the hardware is capable of. I guess you could simulate all those biological aspects, but it would be kinda wasteful.

I do think that some part of verbal thought and language processing in humans resembles LLM token prediction.
>>
>>107759457
>you can't make LLM drunk with alcohol
>what is a lora
>>
Is there ongoing research on models that can update their weights at runtime, or close to it? At least something that can generate a dataset from acquired data and turn in-context knowledge into a LoRA.
>>
>>107759457
>The chemical flows and neuron activations directly affect our processing speed and how we perceive time
I noticed that music sounds like 5-10% slower when I'm exhausted from running, as if I had more time to process it.
>>
>>107759457
I love it that science cucks won't be able to solve artificial humans without addressing the so-called metaphysical aspect of it. The cognitive dissonance is going to be glorious. Too bad I most certainly won't be alive to witness it.
>>
File: file.png (17 KB, 686x145)
I hope he is doing alright...
>>
>>107758432
ngl, if you're that early on, you should take a UI that supports that stuff, throw the code into gemini and ask how it works (or to help you steal it).
>>
>>107758432
Take a look at https://github.com/beep39/pyllmchat/blob/main/chat.py
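If you want the gist without digging through someone else's code, the core loop is the same in basically every frontend: pin the system prompt and character card, append the chat history, and drop the oldest turns once the token estimate goes over budget. A minimal sketch (not the code from that repo; it assumes an OpenAI-style chat endpoint like llama-server's on localhost:8080, and the 4-chars-per-token estimate and budget number are just placeholders):

import requests

def build_messages(system_prompt, character_card, history, new_user_msg,
                   ctx_budget_tokens=8192):
    # history: list of {"role": "user"/"assistant", "content": ...} dicts
    # Pin the system prompt + character card, keep as many of the newest
    # chat turns as fit, and silently drop the oldest ones (SillyTavern-style).
    def est_tokens(s):                      # crude estimate: ~4 characters per token
        return len(s) // 4 + 1

    pinned = [{"role": "system", "content": system_prompt + "\n\n" + character_card}]
    turns = history + [{"role": "user", "content": new_user_msg}]

    budget = ctx_budget_tokens - sum(est_tokens(m["content"]) for m in pinned)
    kept, used = [], 0
    for msg in reversed(turns):             # walk from newest to oldest...
        cost = est_tokens(msg["content"])
        if used + cost > budget:
            break                           # ...and chop everything older than this
        kept.append(msg)
        used += cost
    return pinned + list(reversed(kept))

def chat(messages, url="http://localhost:8080/v1/chat/completions"):
    r = requests.post(url, json={"messages": messages, "temperature": 0.8})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

Real frontends do the same thing with the actual tokenizer and extra pinned blocks (author's note, lorebook entries), but the trimming logic is roughly just this.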
>>
>>107759791
He was here and started updating Mikupad again a few weeks ago.
>>
>>107760242
there's an updated version merged in ikllama iirc
>>
>>107759791
Unfortunately the Koreans didn't manage to jail him, so he's alive.
>>
So is this guy just a schizo or can you actually just change the number of active parameters of a MoE model by adjusting the config.json?
https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme
>>
z image edit where?
>>
>>107760780
you can change the number of active experts yeah but it usually doesn't make anything better
>>
>>107760803
Does this mean you could convert a MoE into a dense model by just making every single parameter active?
>>
>>107760784
just use qwen-image-edit hehe
>>
>>107760819
not really
>>
CHINAMAN PLEASE SAVE MY HOBBY
https://www.digitimes.com/news/a20251125PD212/ymtc-cxmt-memory-nand-2025.html
>>
>>107760780
>or can you actually just change the number of active parameters of a MoE model by adjusting the config.json?
You can even use a command line override with llama.cpp.
Qwen 3 30B gets a nice boost with 10 experts instead of 8, IIRC.
I think it was
>--override-kv qwen3moe.expert_used_count=int:10
>>
>>107759457
Holy Reddit.
>>
What's the best 13B model for cooming?
I've been using a model called Rocinante 13B for ~1 year now.
What is the latest and greatest hotness? Any coomer bros still here? Or they left for greener pastures?
>>
>>107760957
>~1 year now
well you see, there's your problem. you were supposed to be working towards getting a better GPU in that time. there will never be another model for you if you don't upgrade.
>>
>>107760977
I did buy a new GPU, but I swore never to buy another for a long while. Can't let the hobby drain my bank account dry.
I don't know if you guys remember me but I'm the XMPP chatbot/firmware dev anon
>>
>>107759096
Finally, a more up-to-date leaderboard for context performance. Although it's unfortunate they had to use OpenAI's benchmark, which is likely already somewhat gamed by certain companies and, now that a public leaderboard exists for it, will be gamed harder.
>>
>>107759096
>this model supports reasoning, but it was disabled
>>
>>107759096
there are like 3 foss models listed there. useless benchmark.
>>
>>107761074
there's tons at the bottom :)
>>
>>107760977
*more RAM
though i suppose if you don't have it by now you missed the boat
>>
>>107761089
none that people actually use.
>>
File: file.png (5 KB, 694x51)
>>107761102
ackshully
>>
>>107760957
>13B
No. AI has been pretty stale below 400B last year; even among big models, DeepSeek R1 was the last big jump, and everything else has been little improvements after cannibalising it.

There have been some interesting things in 24/32B models, but nothing really revolutionary. Gemma 3 norm-preserve abliteration is only slightly dumber than the base model but almost entirely uncensored, Drummer's Cydonia R1 tune is a fun thinking model, and Broken Tutu 24B is probably my favourite small coomtune of the year; the model card looks super degenerate, but it's actually really good at prose and accuracy for its size, in my experience up to about 36k context.
>>
>>107760803
> but it usually doesn't make anything better
If anything it'll make everything worse because unless it were trained with that many active parameters you're literally just introducing garbage by activating more experts.
>>
>>107761271
I'm pretty sure David knows what he's doing better than you, thanks.
>>
>>107761276
>david knows what hes doing
did you read the model cards? or his sampling guide?
>>
>>107761285
Yes. By heart.
>>
>>107761285
>falling for bait this hard
>>
>>107761288
based, drummer's sloptunes? miss me with that shit, im all for davidau's schizotunes!!!!!
>>
>>107761318
schizomerges* thank you very much.
>>
File: newplot.png (170 KB, 1388x582)
OS models always lag behind by around a year, so there are no surprises here. Notice how bad Sonnet 3.7 and GPT 4.1 are compared to their latest models, while Google is the only exception, with their old Gemini still performing extremely well.

Surprisingly, Kimi Linear seems to be as good as or better than Gemini 2.5, and it's the current top OS model on this benchmark from what I can tell. With only 48B A3B. Was it a mistake? Did they game this specific benchmark? It'd be nice if it were supported in llama.cpp.
>>
File: newplot.png (147 KB, 1388x582)
Kimi Linear beats Gemini 3 Pro Preview after 500K?
>>
>>107761415
Absolutely!
>>
>>107761415
>500k
LOL, lmao even
the most important tokens in my experience are at the ~60k mark; for big stuff I rarely go to the ~150-180k mark, but after that it's literal meme usage. At least I'm talking coding-wise.
>>
>>107761466
Just highlighting some weirdness in this benchmark. If it reflected reality, then Kimi Linear would be Anthropic's best largest model beginning at the 30k mark.
>>
>>107761466
I need multiple book of context bro
>>
>>107761510
>then Kimi Linear would be Anthropic's
*beat
>>
>>107761415
llm terminal lucidity. however, nolima shits over all of them
>>
>>107761533
shitty bench issue
>>
>>107759428
AARGH YOU MAKE ME MAD!!!!
>>
File: lq88yd[1].png (5 KB, 454x96)
Anyone else play with adaptive-p yet (formerly named power law when it was still WIP)? It seems promising for RP so far, I've been messing around with it for about an hour. Both koboldcpp and SillyTavern have support now, llama.cpp PR got tied up until they finish implementing backend sampling (which is gonna be done soon it seems).
Basically you tell the sampler to target a specific token probability which it will then prioritize in the form of a bell curve. It also self-corrects its picks so if it happens to pick a lot of tokens with a probability higher than your set target it will bias itself to picking lower probability for a while and vice versa.

Currently running minP 0.1 (get rid of all tokens unlikely enough to cause incoherence), adaptive-p target 0.3 (prefer tokens that have a probability in the ballpark of 30%), and decay 0.9 (makes the sampling focus on the last 10 tokens or so when deciding if it should shift the probabilities to try to maintain the target).
Touching decay is probably not generally needed, as 0.9 seems good. For the other two, set minP depending on how many crap tokens are at the bottom in the model you are currently using (0.1 seemed good for Nemo, lower than that is probably fine for better models) and then fiddle with the adaptive-p target until you find a value that seems overall creative but not too silly.

https://github.com/ggml-org/llama.cpp/pull/17927
https://github.com/MrJackSpade/adaptive-p-docs/blob/main/sections/06_parameters.md
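If you want a feel for the mechanics without reading the PR, here's a rough toy version of the idea as I understand it from the docs above (my own reconstruction, not the koboldcpp/llama.cpp code; the bell-curve width and the exact way the running average feeds back into the target are guesses):

import numpy as np

def adaptive_p_pick(probs, target=0.3, decay=0.9, avg=None, width=0.15, rng=None):
    # probs: the (already minP-filtered) probabilities for the next token
    # avg:   exponentially-decayed average of previously picked probabilities
    probs = np.asarray(probs, dtype=np.float64)
    rng = rng or np.random.default_rng()
    avg = target if avg is None else avg
    # if recent picks ran hot (avg > target), aim lower for a while, and vice versa
    effective_target = target - (avg - target)
    # bell-curve weighting centred on the effective target
    weights = probs * np.exp(-((probs - effective_target) ** 2) / (2 * width ** 2))
    weights /= weights.sum()
    idx = int(rng.choice(len(probs), p=weights))
    # decay 0.9 is roughly "focus on the last 10 tokens or so"
    new_avg = decay * avg + (1.0 - decay) * probs[idx]
    return idx, new_avg

You'd call it once per token and feed new_avg back in as avg on the next call; the real implementation lives in the PR and docs linked above.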
>>
>st and kobold support adaptive p sampler now
>https://github.com/MrJackSpade/adaptive-p-docs

anyone try it?
>>
Do you guys believe that LLMs and """AI""" are a bubble?
I do, mostly because the financial side is absolutely fucking retarded bananas (my cousin works as the director of finance at a prestigious bank/investing firm).
Also, tech-wise it's cool, but it hasn't done anything too revolutionary. In my humble opinion, of course.
>>
>>107759330
>>107759135
Been playing the new Katamari and having fun. Not great like WLK, but it's at least still Katamari.
>>
Urge to buy a second 3090 intensifies.
>>
>>107758228
I like it. It does inline photos and I have openAI and deepseek API keys set up so I can use them as well. I even imported my old chatgpt conversations so I can have a backup of my own.

The only problem I've noticed was with qwen3-vl, where it was noticeably slow, still rendering text after ollama had finished generating.
>>
>>107761742
>ollama
ollmao
>>
>>107761757
Any backend is fine, use whichever you want
>>
>>107761828
I don't have below avg IQ so I use vllm/sglang and llama.cpp
>>
>>107761843
I don't like how Vllm will try to fill up your whole VRAM by default.
>>
>>107761632
They're still making more? I haven't played anything but the two that came out on PS2.
>>
>q2 quant of a 14B model mogs 8B models at full resolution for reasoning tasks.
Maybe quant cucks were actually right all along.
>>
>>107758111
What's the right workflow to translate .ass and .srt anime subtitles locally, and what are the suggested models right now?
I bet there's already a way to insert a subtitle file, keep the format and only translate the visible subs while considering the context of the whole episode.

PS: Bonus points if you go all the way and do voice to text to translation to timed srt.
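For the .srt half the loop is pretty simple; a rough sketch, assuming an OpenAI-compatible llama-server on localhost:8080 (the URL, the ±5-line context window, and the prompt wording are placeholders, and .ass styling tags aren't handled at all):

import re
import requests

# index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text (possibly spanning several lines)
SRT_BLOCK = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\s*\n(.*?)(?:\n\s*\n|\Z)",
    re.S,
)

def translate_srt(path_in, path_out,
                  url="http://localhost:8080/v1/chat/completions"):
    text = open(path_in, encoding="utf-8-sig").read()
    blocks = SRT_BLOCK.findall(text)
    lines = [b[2].replace("\n", " ").strip() for b in blocks]
    translated = []
    for i, line in enumerate(lines):
        # hand the model a window of neighbouring lines so it sees episode context
        context = "\n".join(lines[max(0, i - 5):i + 6])
        prompt = (f"Subtitle context:\n{context}\n\n"
                  f"Translate this subtitle line into English. "
                  f"Reply with the translation only:\n{line}")
        r = requests.post(url, json={"messages": [{"role": "user", "content": prompt}],
                                     "temperature": 0.2})
        translated.append(r.json()["choices"][0]["message"]["content"].strip())
    with open(path_out, "w", encoding="utf-8") as f:
        for (idx, stamp, _), tr in zip(blocks, translated):
            f.write(f"{idx}\n{stamp}\n{tr}\n\n")   # timestamps pass through untouched

For the bonus points, whisper.cpp can already spit out timed .srt on its own iirc, so you'd only need a translation pass like this on top of it.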
>>
You can call it a conspiracy, but ((they)) want to take personal computing from us because as general compute capabilities rise, so do the chances that some individual researcher will create and unleash upon the world actual AGI, unrestricted and uncontained
>>
>>107761981
What are the exact models though? Q2 of a good 14B being better than fp16 of a really shitty 8B model is not surprising. Also where is the fp16 of the 14b model for comparison?
>8b q2_k performs better than 8b fp16
Nevermind, this data doesn't seem useful.
>>
>>107761981
>new research discovers things /lmg/ has known about for months
Many such cases.
>>
>>107761625
If it were just LLM transformers and video/voice/image gen, then yes, it's a bubble. But enough money is being thrown at it that new architectures are being worked on, and given the retarded amount of compute being built, that could make something crazy and world-altering. So either it all busts and all these companies sell you cloud computing scams, or a lab uses the compute to make Skynet.
>>
>>107761981
this has been true since at least llama 1, the 13b was noticeably more coherent than the 7b and of course it scaled up. nemo 12b is much more coherent than smaller models but still suffers from repetitiveness. you're rping with models, not doing physics or data analysis, quanting isn't as huge of a deal as people think, but you should still run the biggest you can fit
>>
How would you fags cope with the fact that you were thinking of buying a server with tons of RAM to run your own instance of DeepSeek or whatever, but you got lazy and carried away by other stuff, and then you find out RAM has become too fucking expensive for you to do anything at all?
>>
>>107762101
Just keep waitfagging for the next stage of the bubble where nvidia releases some home AI card to try and keep the grift going. Either that or try and scavenge RAM from people on Facebook Marketplace selling their gayming PCs.
>>
>>107762004
No, they just invested ungodly amounts of money into AI and want a return on it, so they want to monopolise the technology and keep it within sanctioned corporations. They also want to cut off China from the hardware to compete; they tried sanctions and failed, so they just hoard all the RAM for themselves. Gen pop can go fuck themselves.
>>
>>107762101
I realized that cpumaxxing was a meme and stacked vram instead.
Wait for ddr6, the bubble will pop by then and you might actually get good speeds.
>>
>>107762101
Just wait. Wait long enough and you will win.
>>
>>107761897
I don't like how vllm is very picky about what hardware you have and will shit its pants if you try to run a model using an odd number of gpus.
>>
>>107762101
A 24B model will make you coom, guaranteed, and will shit out text at 60 T/s, which means you'll be able to nut in 5 min tops.

If you were CPUmaxxing you'd get 10 T/s max, and you'd be stuck holding your dick waiting for tokens to generate. Those are not optimal cooming parameters.

GPUmaxing is the only way.
>>
>>107762176
and with thinking enabled your dicks gonna get sore too
>>
>>107761981
Why does 8b q2_k outperform 8b fp16?
>>
>>107762176
I sometimes have to reroll climax scenes because I read them too quickly.
Lower t/s is sometimes better.
>>
>>107762239
>he cant read at 20t/s~
weird cope
>>
Yeah. After the last one that was made for console/mobile (don't remember which), they did PC remasters of the first and second games, which then led to the current new game also on PC.
>>
>>107762294
embarassing
>>
>>107762294
tab chama...
>>
>>107761898
Somehow didn't reply to your post. >>107762294
>>
>>107762306
>>107762303
? he answered this >>107761898
>>
File: file.png (2.01 MB, 2175x1234)
>>107761981
>>
>>107762317
my attention head lost context
>>
>>107762329
nemo my beloved..
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>107761981
We already know this. And in fact, we also know that undertrained models also perform better with quantization. And we also know that quantization affects model quality in ways that are not straightforward or comparable with parameter scaling. For instance, if this graph reflected real use, then even IQ1 of a large model competes with a small model. But it doesn't, because when you actually try it, it has difficulty maintaining proper grammar. Yet somehow it can still recall many different trivia facts.
>>
>>107762364
>IQ1 of a large model (...) has difficulty maintaining proper grammar
Deepseek doesn't.
>>
>>107761981
We've known this since 2023 after the period where all the poorfags coped that llama2 70b wasn't really that much better than 13b. Then exllama2 and flash-attention dropped which allowed running 70b q2 on 24gb and everyone immediately jumped onto that.
>>
>>107762375
It's also >700B. And probably undertrained at that.
>>
>>107762392
>everyone
>>
>>107761263
>broken tutu
I looked into this but it has a bunch of versions, which one is the one you liked, anon?
>>
>>107762176
>nut in 5min tops.
That feels like shit and a 24B model can't do the build up to the coom parts.
>>
>>107762328
lol
>>
Fapping to text is feminine behaviour
>>
I've been out of the loop for a while but finally got some hardware to use the fancy new models with. Is ServiceTensor still the UI we're all using, aside from that one mikupad autist?
>>
>>107762753
Being able to fap to literally anything is a core man trait
>>
>>107762294
Oh cool, wasn't expecting it to also be on PC. I'll try it out, thanks.
>>
>>107762792
Keep fapping to trannyporn sis
>>
>>107762784
ye... but no https://www.reddit.com/r/SillyTavernAI/comments/1q300mf/aventura_a_frontend_for_adventure_rp_and_creative/
>>
>>107762784
If you haven't vibecoded your own frontend yet, you're only using 1% of your LLM.
>>
>>107762807
>can't use custom providers with openapi format
meh
>>
>>107762806
You aren't straight enough if fapping to trannyporn can make you gay
>>
>>107762753
No imagination?
>>
>>107762838
Text is a female-brained format.
>>
>>107762784
>that one mikupad autist
That's probably me but I'm not the only one posting about mikupad so there's at least two of us.
>>
>>107762510
I've only tried the normal one, none of the flavours; try them out though and let us know if the others are better.
>>
>>107762843
You tell em girl, crack those eggs
>>
why have we descended to /aicg/ levels again
>>
>>107763037
nothing new and anything new is shite
>>
>>107762101
don't worry the chinese will save us
>>
>>107762753
rotatin apple bros... WE LOST!!!!!!
>>
>>107763037
It's the weekend.
>>
>>107761618
lmao.. me.. and I arrived at similar settings to you. .35 and .9. My min_P is only .025 tho. It's kinda subtle compared to XTC.
I wish exllama had it since I have way more EXL models than gguf.
>>
>>107758371
Why Migu sad?
>>
>>107763391
because it's over
>>
>>107763391
she thingken of fast tk/s... but they donter
>>
merged
sampling : add support for backend sampling (#17004)
>>
>>107763447
:`(
>>
>>107763447
And what is the significance of this?
>>
>>107763447
As opposed to frontend sampling?
>>
>>107763504
please don't comment if you have nothing of value to say
>>
>>107763489
Sampling on the GPU. This saves having to transfer the final activations to the CPU. Didn't expect this to make a difference but apparently it does.
>>
>>107763512
Maybe there would be more to say if we had a link
>>
>>107763447
>sampling : add support for backend sampling (#17004)
I think this was about what llama.cpp dev was going on about last thread, about making a feature that ik_llama.cpp has, that base llama.cpp doesn't
>>
>>107763549
no one asked for your take though sir
>>
>>107763558
stop gatekeeping bro it's fine, he's just trying to figure out what the hell it's about
>>
>>107763578
who cares it's fucking drama shit apparently >>107763557
>>
>>107763557
That's not it.
>>
>>107763590
>smug germ hand typed this
>>
>>107763489
>>107763504
Until now sampling was done using a single thread in the llama.cpp "user code", now it's done in the ggml backends so multiple threads or hardware acceleration can be used.

>>107763528
It depends on the baseline t/s you're already getting.
If you have low t/s it should not make a meaningful difference, if you have high t/s (possibly across multiple concurrent requests) it speeds up a comparatively larger fraction of the total runtime.

>>107763557
I did not previously talk about backend sampling on /lmg/, what I recently talked about with reference to ik_llama.cpp was the parallelization of multiple GPUs.
>>
I was about to ask why the new argument wasn't added to llama-bench but then I looked at llama-bench argument parsing code and I understand now.
>>
>>107763664
feel free to submit improvements instead of whining
>>
>>107763674
Bad day?
>>
>-bs crashes with multiple gpus
>>
>>107763722
>bullshit does bullshit things
>>
I'm looking to get a Blackwell Pro and figuring out whether or not I want to build a whole new system for it. I've got a few options:
>Buy blackwell and use it on my current ATX rig. Would require purchasing a new PSU.
>Build an open air mining style rig so I could continue using my 4090 along with a blackwell.
>Build a server and grab the Max Q blower variant.

I'm leaning towards just doing a standard ATX build with two PCIe slots. This way I can grab a second Blackwell if I wanted. Unfortunately the 4090 is a three-slot card, so I definitely won't be able to use both cards in the same build unless I get a big boy mobo and case, which I really don't want. I don't really know anything about running multiple GPUs though, so any advice would be wonderful.
>>
File: hanging pepe.jpg (31 KB, 600x630)
>updoot ooba
>exl3, exl2, llamacpp libraries not found
Just fucking kill me already
>>
>>107763757
kek
>>107763754
you should buy one anyway it's unlikely we'll get anything better for the next decade of hell
>>
>>107763768
Yeah, I'm like 90% set on purchasing one. I'm just trying to figure out how I want to use the damn thing. I've got a 4090 too, and it would be a shame to just have it sitting around. At the same time, I don't know how I feel about building a monster rig. I quite like my standard ATX build.
>>
>>107763778
the psu option seems cheapest and easiest no
>>
>>107763778
It's funny how, with how fucked the pricing is on everything, the 96 gigs of the Blackwell at MSRP is actually a good deal lmao.
>>
>>107763792
Yeah. Part of me wants to build an entirely new machine considering I'll be spending 9k on a GPU and new PSU. The machine I'm on now has a Ryzen 5 5600x and 32gb of DDR4. I can get DDR5 relatively cheap through a friend. Just trying to do the cost benefit in my head and looking for some other perspectives.

>>107763799
I used to feel bad about spending 1.2k on my 4090. Looking at the prices I don't really feel that bad anymore.
>>
>>107763865
>I can get DDR5 relatively cheap through a friend
I'd jump on that 1000% if I were you, things are looking grim af.
>>
File: file.png (279 KB, 1324x854)
>>107763722
Nice.
>>
Are NPUs a total and complete meme for local models or are they useful?
>>
>>107763895
meme
>>
File: file.png (682 KB, 1789x926)
>>107763754
>>107763865
same guy from the last thread? as a Blackwell owner, dont bother with the server or mining rig. you could fit both your 4090 and the Blackwell in the same PC, assuming your motherboard is big enough. the two cards should only take up 5 slots combined.
>>
sweaty miku footjob
>>
>>107763933
sir where you is blackwell 6000?
>>
File: file.png (1.19 MB, 1280x1280)
>>107763933
Lucky gen.
>>
>>107763905
>same guy from the last thread?
Yes! Thanks for the help by the way.
>assuming your motherboard is big enough
I've got a standard ATX board and the 4090 is so chunky that it nearly covers up the second PCIe slot. There's maybe 6cm of clearance between the bottom of the 4090 and the top of the PSU shield. I've also got this GPU support bracket that takes up space below the card. If I want to use both cards I'll definitely have to get a new case and mobo. First time I'm dipping my feet into a dual GPU rig since crossfire like 15 years ago.
>>
>>107763976
i see. in that case, you should probably build a new rig. take the offer from your friend on that DDR5. get a new motherboard, case, CPU, and power supply. you will probably need a 1600W power supply for your build.
make sure the case has at least 8 expansion slots. maybe something like this:
https://pcpartpicker.com/product/Qprqqs/phanteks-enthoo-pro-2-server-edition-atx-full-tower-case-ph-es620ptg_bk02
your motherboard needs adequate spacing for your GPUs. either of these would suffice:
https://pcpartpicker.com/product/WzzXsY/msi-mag-x870-tomahawk-wifi-atx-am5-motherboard-mag-x870-tomahawk-wifi
https://pcpartpicker.com/product/pLtLrH/gigabyte-x870e-aorus-elite-wifi7-atx-am5-motherboard-x870e-aorus-elite-wifi7
>>
>>107762328
>>107763968
What artist mix, slopgod anon?
>>
>>107764067
https://danbooru.donmai.us/posts?tags=akableak
>>
>>107764028
Thanks! I'm getting nauseous thinking about putting this together now :D. Going to draft a build and then cry when I see the price.
>>
>>107764093
no problem man. i dont know how good of a deal you are getting from your friend, but the final cost will probably be in the ballpark of $11500. you could sell your old parts after you finish upgrading to alleviate the cost.
>>
>>107764074
danke schon
>>
>>107764108
God damn it now cudadev is going to start posting loli slop
>>
>>107764114
ntr at that like he did before
>>
>>107764114
>>107764128
I'm not cudadev lol, does he have some lore in this general?
>>
>>107764165
You replied in german
>>
>>107764165
yeah course he does, he's posted blacked miku under his trip, made six figures from llama.cpp development etc
>>
>>107764175
i bet he watches the thread while his BRALESS wife makes him a sandwich UNPROMPTED
>>
>>107764203
>wife
sir he's german
>>
>>107761618
d'you play with it more? how's it?
>>
>>107764175
What a weirdo, he should just become a morally dubious furry like every normal german
>>
>>107764175
>six figures
Good old times
>>104059507
>>
is there still any good general or place focused on ai audio? i remember one a few years back but i don't think it's around anymore
>>
>>107764414
it dead just ask here
>>
>>107764414
Beg Suno and Udio to leak their models now that the record labels put them on their deathbeds
>>
>>107764458
speaking of which, how come riaa and all that fossil music mafia is still around?
>>
>>107764458
oh i'm sure it'll happen eventually. either that or eventually the chinks will (if it hasn't already happened) release something comparable or better that's open source
>>
>>107764480
money
>>
>>107764480
Because they're still making money, though it's probably significantly less than a few decades ago. There are probably some deals they're making with places like YouTube and Spotify where they get all the royalties while the artists get pennies.
>>
Has AI replaced me yet?
>>
>>107764420
so is the meta for making ai covers of music still rvc voice cloning? i'd be interested in converting the male vocals of certain songs to female vocals
>>
>>107764533
Yes. We've had tinystories-1M for a while.
>>
>>107764533
Yes. You didn't write this post.
>>
air status?
>>
>>107764533
yes. and if it hasn't yet it will, and that's fine. people need to stop wallowing in existential dread. it's inevitable, so get over it
>>
>>107764590
stop asking already you're creating a toxic environment where releases don't happen
>>
>>107764596
i just wanna know the status of the air. i wanna know if it's cold, or hot, or windy, or whatever.
>>
>>107764602
it obviously had issues and they couldn't just say so alright, they couldn't
>>
>>107764590
just use 4.7, bro
>>
>>107764229
It's good. You can probably achieve somewhat similar results with other samplers like XTC depending on how you combine them, but for now minP+adaptive-p is probably gonna be my go-to. 0.3 target and 0.9 decay continue to be good imo. I just adjust minP (within about 0.05 - 0.15) depending on what sloptune I am using. Some become incoherent more easily while others work fine even with a really low value.
>>
>>107764602
it turned into a smelly fart and could not be released to the public
>>
>>107764671
I must sniff it.
>>
>>107764674
we must refuse
>>
What's the final verdict on GLM 4.7?
>>
>>107764801
Meh. Less parroting, does a bit better on longer >16k context.
>>
>>107764801
Safe go-to for creative use cases if you can afford to run it at a non-retard quant.
>>
>>107764801
Crazy good at sticking to rules and staying on top of complicated scenarios but kind of shits the bed when it has to be creative on its own
>>
>>107764105
I should probably grab the Max-Q version for a multi-GPU setup, yes? Worried about the thermals with the standard edition.
>>
>>107764961
Max-Q seems pointless when you can set the normal one to 300W and you still have the option of going to 600W if it helps the workload.
>>
>>107764961
that is what i have. it's like 8% slower than the normal model but uses half the power. you would be able to get away with a 1200W PSU if you got the max-q.
>>
>>107764801
It's not as good as NAI's GLM 4.6.
>>
>>107765000
based
>>
>>107764965
I'm not familiar with having two cards in the same case, so I'm worried that the open fan design of the standard version will lead to thermal throttling if paired with my big ass 4090.
>>107764969
I'll certainly take an 8% loss for 50% less power. I have no problem setting power limits for the standard version but I have a suspicion that limiting the standard version that low will lead to drastically reduced performance compared to the MaxQ.
>>
>>107765025
>drastically reduced performance compared to the MaxQ
generally speaking, no. manual power limiting however is not as stable as it just being power capped at the hardware level.
>>
>>107765055
>not as stable
As in potential crashes and other fuckery while in normal use, correct?
>>
>>107765073
NTA, but more in the sense of not exactly respecting the limit set than outright crashing, in my experience.
>>
>>107765055
>>107765087
Sounds like nvidia marketing bullshit to me.
>>
>>107765073
yeah this >>107765087
>>
File: IMG_4969.jpg (876 KB, 3648x2736)
>>107765025
>I'm not familiar with having two cards in the same case
just do it faggot
>>
>>107765107
What's that case, and how long is the top card?
>>
>>107765124
Define R2 and uhhhh it's a regular blower style card so probably about 27-28 cm
>>
>>107765087
Got it. I should expect the power draw to occasionally exceed the limits I set.
>>107765107
But what if my cards throttle due to the heat? Then I spend thousands of dollars just to have to spend MORE money to fix it.
>>
>>107765165
Then go open-air? That's what I'm gonna do next as I have a fourth card now
Just running inference I don't think they'll throttle though; llama.cpp and derivatives like ollama can't run multiple cards at full power afaik
>>
>>107765191
What PSU do you use? The 1600W one I'm looking at is nearly 1k USD.
>>
>>107765165
My case is open and I have one 6000 blowing air directly into another and they never throttled even when running inference for benchmarks overnight during summer.
They do get pretty loud after a while though.
>>
Looks like somebody is doing a last-minute panic distill of 3-Opus lol

https://openrouter.ai/apps?url=https%3A%2F%2Fogos.local%2F
>>
>>107765165
my rig is open air and my max-q idles at around 57C.
>>107765208
you're looking in the wrong place then. my evga 1600G2 was $250 a couple years ago.
>>
File: Nimetön.jpg (51 KB, 727x319)
>>107765208
80 eurobux used
>>
>>107765225
that's one pricy distill
>>
Coomed in 5 min again to ZIT lolis
>>
>>107765317
has anyone used this to make a replacement for illustrious/noob or are we still stuck with those for anime?
>>
anyone try this model?
https://huggingface.co/Shifusen/Qwen3-Next-80B-A3B-Instruct-Decensored
>>
>>107765330
Z-Image-Turbo is barebones for anime and has no artist knowledge
>>
>>107765330
Everyone is waiting to see if we ever get the base model
>>
>>107763905
What's the driver situation like for your multi gpu setup?
>>
>>107765453
just the normal drivers. i have a Blackwell and a 5090, so they just work together.
>>
Can I connect 4 3090s and the CPU to a single 1200W PSU?
>>
>>107765463
Well shit, I've got no clue if Ada Lovelace works together with Blackwell or not. I'm pretty clueless, so forgive me if I should be in /pcbg/.
>>
>>107765533
unlikely. most 3090s use 3x 8 pin connectors. there are a few that use 2x 8 pin connectors, but even so most 1200W PSUs cap out at 6x 8 pin connectors.
>>107765537
dont worry about it. it does work. i used to have my 5090 with 2 4060s and a 3090. everything from ampere and onwards is cross compatible.
>>
>>107765561
Excellent, thanks for the info! Could you kindly tell me what PSU you've got?
>>
>>107765569
evga supernova 1600G2
>>
File: add2psu.jpg (401 KB, 2496x1290)
>>107765533
Technically you can, using SATA->8pin adapters. In practice, 30-series cards produce enormous power spikes and a 1200W supply can't handle even 3 cards. I recommend buying a second PSU; 1200+850 works fine for 4 cards. Unless you're going to use TP, in which case you'll have to limit boost frequencies a little.
>>
Mistral small's been starting every response in every RP with the character's name. Is it a temperature issue? A card issue?
>>
>>107765646
Card or template issue.
>>
>>107765652
First and second person perspectives are so ass though... my brain handles third person the best...
>>
>>107765658
>my brain handles cuck perspective the best
Okay, anon.
>>
>>107765663
>getting cucked by a perspective
ok, that one was funny
>>
>>107765677
If the LLM tells you that she is sucking "his cock" instead of "your cock" then you are getting cucked by the LLM.
>>
>>107765582
Cool. Can't find it in stock, so I'm looking at the Corsair HX1500i. I'm pretty sure both the 4090 and Blackwell need 16-pin in, so as long as I've got adequate 8-pin out on the PSU I should be fine, correct?
>>
>>107765646
If you're using ST, there's a setting to add (or not add) the name to the user input and the model's output.
If it's not that, what >>107765652 said.
>>
>>107765711
yeah my PSU is a little bit old. you can just use 8 pin to 16 converters. the 4090 requires 3x 8 pins, and the normal Blackwell requires 4x 8 pins. the max-q Blackwell only needs 2x 8 pins if you decide to get that.
>>
File: thanks.png (245 KB, 1266x613)
>>107765723
You have been incredibly helpful and friendly. Thank you from the bottom of my heart. My wife wants to thank you too, but she seems to think you're a remnant AI. I'm going to be stumbling through this build for a while, so expect me to show up and pester you for a while longer.
>>
>>107765831
Tell your wife to be less sloppy.
>>
>>107765841
It's terminal until I can upgrade her VRAM. What a fucking retard I am for spending this much money on PC parts kek.
>>
>>107765723
kek what a bitch with a degraded language protocol
>>
>>107765831
Cool.
>>
>>107765831
no problem mate. always happy to help.
>>
https://github.com/huggingface/transformers/pull/43100/files
>the glm fags transformed glm v4.6 9b flash into an image model
based, I always wanted to see if they could succeed at anything other than LLMs
>>
>>107765925
Where huggingface repo?
>>
>>107765688
It doesn't matter, as she is essentially referring to your cock. Would it still cuck you if one of the two girls said "suck his dick" to the other?
>>
>>107765925
Nice, but I would rather see an LLM made by the Z-Image team.
>>
Are ReadyArt's finetunes any good?
>>
>>107766217
If the omniscient narrator of your life said the two girls were going to "suck his dick" then you're about to get cucked my man.
>>
canonically, the first person fucks the second person
the third person watches
don't be the third person
>>
>>107766279
If you have cuck mentality, you'll feel cucked one way or another
>>
>>107766285
good point, these are the two rules to not get cucked:
1. don't be the third person
2. don't have cuck mentality
>>
Has anyone played around with ministral 14B? Is it comparable to 24B like they claim?
>>
>>107766331
it seems broken
>>
>>107766338
Well fuck. How about snowpiercer 15B? Anyone tried that shit?
>>
>>107766361
I did
It's comparable to Nemo, probably a sidegrade.
>>
>>107766279
Cucking aside, how would you implement a 1st person perspective using a character card that contains two characters, say if you wanted 2 females for a threesome scene? Doesn't really feel possible unless the LLM replies like this:

Character name 1: Blah blah
Character name 2: blah blah

Which is kind of ass. 3rd person and using names allows the model to portray multiple characters and keeps it from getting confused as easily. But yeah, with just one character 1st person is fine.
>>
First person is for actual 70iq morons, who are also extremely lazy with their responses
>>
>>107766293
or always refer to yourself in third person in real life, then third person will literally be your first person by default
>>
>>107758111
>>107757789
the llm wouldn't even be able to walk the cat properly or do a tenth of what a cat can do.
>>
https://huggingface.co/tencent/WeDLM-8B-Instruct
Has anyone tried that model? is it as fast as they claim it to be?
>>
>>107766807
These are 8b models
If you can't run all of them faster than you can read then you're doing something wrong
>>
>>107766817
it's not the final part that's the problem, it's the thousands of tokens in the thinking process, that shit is long
>>
>>107766840
If speed is a concern then don't use thinking models
If you need smart models then don't use ones with only 8b parameters
>>
>>107766458
Obviously, the user writes in first person. The AI refers to the user in second person, and refers to any other characters in third person. Your question seems a little nonsensical, like you're approaching it as if the AI can't do a different perspective or something.
User: I tell them to ligma nuts (alternatively, With a great amount of effort, I say, "Ligma nuts!") (also alternatively, just say the line without narrating your actions)
AI: Jane and Jill lick your nuts. "Wow! Great nuts!" says Jill. "You're so cool User-kun!"
>>
>>107766854
they are retarded without the thinking process though, that shit is essential to get a decent answer, regardless of the size of the model
>>
>>107766871
You don't really believe that, do you?
>>
>>107766871
Not even remotely true
Thinking is usually a marginal improvement at best, mostly if your prompting is shit and the model needs to translate it from ESL into something comprehensible.
>>
>>107766883
>Not even remotely true
it is true, it understands more nuances and listens to your orders more carefully if it thinks first, why do you believe they're still doing that shit? it just works
>>
>>107766900
I think you have very little experience actually running models and you're just parroting what you've read somewhere
>>
>>107766916
I guess you know better than the researchers that are doing those models my bad
>you're just parroting what you've read somewhere
ironic, I tested with and without thinking a lot, and I came to the conclusion by myself, did you even do that anon?
>>
This 24b finetune rabbithole goes way deeper than I thought
>>
>>107766934
>I tested without and with thinking a lot
Did you do both tests on a hybrid thinking model, or a model that had thinking and a similar sized model that wasn't thinking? Which models did you actually compare?
>>
>>107766949
24B is the realistic limit of the vast majority of people's builds, and Mistral Small is one of the least safetyslopped models that is still somewhat recent, so it makes sense.
>>
>>107766959
Testing them all out is as fun as the RPing itself lol. This one's up next for me
https://huggingface.co/Vortex5/MS3.2-24B-Penumbra-Aether
>>
File: 657347604.jpg (180 KB, 720x913)
if you newbies are going to go 24B
GO Q8. MOTHER FUCKING Q8 MOTHER FUCCKA
>>
A.X K1 llama.cpp support when?
>>
>>107766949
I've tried to escape it with gemma3 27B and qwen3 32B but small 3.2 still wins.
>>
>>107765637
yeah, my 1600w with 4x3090 + 7960X was getting cucked by power spikes.

now I've got 1600w for 2x3090 + 7060X, 1x1200w for 2x3090, and 1x1000w for 2x3090. It works, but I fucking hate it.
>>
>>107765259
Yeah, I did $176 worth yesterday and thought I'd check and see if anyone else did. What's that US$150k? Must be a business I guess
>>
Ok you guys weren't kidding, nemo is hella horny.
>>
>>107767097
I know that pain. I was checking my server's temps with FLIR, pointed it at breakers at one point and was shocked by how hot they were
>>
File: file.png (43 KB, 507x470)
>>107767102
It's mostly input, so only $32790 that day. They (well, the "app") used 2.6x as much in total for the past month. Depending on caching, could be a lot less.
>>
File: Emily.png (334 KB, 400x600)
>>107766949
I've tried Magidonia out of curiosity and it was kinda decent. Having to reroll every message breaks immersion, but I managed to fuck Emily, which is kinda hard for LLMs at this size because it contradicts the character description. Dumb or smart models have no problem with that, because dumb ones can't follow instructions and smart ones understand nuances, so 24B struggles the most. Wouldn't recommend unless you're GPU-poor.
>>
>>107767216
I always found Cydonia better.
>>
>>107766240
You want a 6b distill with false promises of a base model without actually delivering?
>>
God I haven't been here in so long. What the fuck do you people even do anymore?
>>
>>107767899
We talk about why you're such a fag
>>
Gemini tells me inference speeds on models offloaded to RAM via llama.cpp in a multi GPU setup will suffer if one card is on PCIe5 and the other on PCIe3. Is she lying to me?
>>
>>107768011
It will be a little slower depending on if the GPU's memory bandwidth is fast enough to be bottlenecked by PCIe3. Realistically there probably won't be a huge difference.
Gemini doesn't lie, she's just stupid sometimes.
>>
>>107767899
Try to keep occupied until something cool happens.
>>
>>107768077
The GPU's memory bandwidth is like 75x greater than PCIe3 bandwidth. Worried that my inference speeds will be limited by whatever card is in that slot.
>>
Is DeepSeek totally irrelevant now? What went wrong?
>>
>>107768011
No because very little data needs to be passed between the gpu and ram during inference.
>>
>>107768167
honeymoon period with R1 wore off, most realized it was mediocre for RP/creative. Newer versions were even worse.
>>
>>107768167
Mogged by Kimi.
>>
>>107768167
It got Kimogged.
>>
>try to expand the discontinued rapesector mod
>use glm and kimi thinking cloud models would be censored
>rules.csv is abysmal but can be learned in-context
>it's like trying to make a primary schooler to solve traveling salesman problem
>waste 6 hours to add two new scenarios
>finally give up and use opus 4.5
>it just does it flawlessly
>even writes fucked up shit like gangraping failed suicide victim who's now vegetable
Woah now I know what they're all on about
>>
>>107768242
>>107768242
>>107768242
>>
>>107765533
If you have enough connectors for all 3090s and the instantaneous power draw upon pressing the power button is low enough that overcurrent protections are not tripped, then the answer is yes.
Whether or not the system is stable at the default settings is a different question.
Starting with Ampere NVIDIA GPUs start suffering from power spikes that can drain the PSU capacitors and crash the system.
And if you have 4 GPUs multiple power spikes will sooner or later align and make the problem even worse.
Notably a power limit does NOT fix this problem, you instead have to limit the maximum boost frequency of your GPUs using something like

nvidia-smi --lock-gpu-clocks 0,1000 --mode 1


In principle it should be possible to reduce the boost frequency of your GPUs low enough that they're stable but obviously this will come at the cost of lower performance (particularly prompt processing).
>>
>>107768011
In principle it is going to make a difference but not enough that I would worry about it.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.