/g/ - Technology






File: 1737694546160700.jpg (269 KB, 928x1232)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107749596 & >>107741641

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107749596

--Llama.cpp multi-GPU/NUMA optimization challenges and roadmap:
>107749695 >107749792 >107749857 >107749885 >107750757 >107750790 >107750807 >107750822 >107750844
--RTX 6000 vs multiple 3090s: VRAM tradeoffs and multi-GPU system challenges:
>107752983 >107753015 >107753048 >107753060 >107753189 >107753238 >107753443 >107753071
--Setting up Mistral-Nemo-Instruct-2407 on limited GPU resources:
>107749866 >107749870 >107749872 >107749880 >107749907 >107749920 >107750058 >107750141 >107750162
--glm 4.6/4.7 model viability for roleplay with Q2 quant:
>107752749 >107752769 >107752858 >107752883 >107753107
--Hardware-specific model choices and coding LLM performance debates:
>107750660 >107750676 >107750698 >107750704 >107750745 >107750781 >107750950 >107751012 >107751086 >107751103 >107751190
--Solar 100b's clothing logic flaws in character generation:
>107755612
--Critique of outdated 3.3 8B vs appreciation for modern AI advancements:
>107751376 >107751385
--ERNIE-4.5-21B-A3B-PT's translation and preference over Gemma:
>107753716
--Diagnosing low GPU usage in koboldcpp-nocuda:
>107750540 >107750550 >107750585 >107750654 >107750659 >107750694
--LLM keyboard shortcut comprehension and model introspection limitations:
>107755192 >107755280 >107755353 >107755425 >107755467 >107756006 >107756050 >107756059 >107756227 >107756263 >107756393 >107756519 >107756553 >107756260 >107756211 >107756267 >107756285 >107756300 >107756312 >107755396 >107755482
--Jailbreaking and ethical debates around model policies:
>107750213 >107750247 >107750461 >107750478 >107750339
--GLM4.7 outperforms others in hex conversion algorithm:
>107757624
--Kccp antislop sampler patch with wildcard syntax example:
>107751820
--Logs: Solar Open:
>107750674
--Miku (free space):
>107752290 >107752963 >107756059 >107756654

►Recent Highlight Posts from the Previous Thread: >>107749599

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Hello everynyan.assistant
>>
What do you guys think of Open WebUI as a daily driver frontend for non-roleplaying purposes? SillyTavern is too ugly and "gamey", and it's easy to get stuff not working correctly.
>>
>>107758103
It's also when they were especially retarded and had difficulty maintaining context,
but for novel roleplay or narrative they were awesome. Very spontaneous and shameless.
GPT-3.5 was the beginning of the end, unfortunately. That's when all the alignment-heavy efforts began.

What a shame. Seems I spend more time trying to unfuck deep training grooves in current models than enjoying a fun (sometimes completely insane) interactive story.
Speaking of which, AI storytelling is complete shit. The roleplay is vaguely serviceable, but everything is completely fucked for AIDungeon-style stuff. Even AIDungeon itself, and not because of the alignment. Everything is purple prose, where the more retarded 3 - 3.5 would not only be more spontaneous, it would also be 'just the facts', which made for way better interactive fiction.

Now we have infinite context (memory notwithstanding), and models that can pull needles from the haystack, and yet all they generate is insufferable slop.

I was hopeful we would eventually see something along the lines of GPT-3.5 but with functional context windows. But now it's clear that ship has sailed and we've ended up somewhere entirely different.
>>
>>107758228
>open webui as a daily driver
You get input, output, and the ability to edit both.
Seems fine as a generic tool.

If you're working on some specific task that just happens to use an LLM, e.g. writing a story / editing a book, then you might want a more streamlined UI, e.g. with tracking for who/when/where, a button for "please revise this with X in mind", etc.
>>
>>107758228
>for non-roleplaying purposes
llama.cpp's current webui works fine and has most of the features I care about. I think the only thing it's missing is web search? But I don't feel like opening that can of worms.
>>
>>107758298
Yeah, I was going with it before realizing it stores the chat logs in browser local storage. Very lame. It's fine for quick model testing tho.

>>107758287
Indeed, I meant more as a local ChatGPT experience. It seems pretty full-featured, but it doesn't seem to have basic branching functionality. What are those devs thinking?
>>
What do (you) think?
https://www.sciencedirect.com/science/article/pii/S0149763425005251?via%3Dihub
It appears that these guys think we will never truly have smart AI by just scaling up existing hardware, and that we need a new type of computer if we want AI to actually be smart.
>>
File: 1767479907295n.mp4 (3.43 MB, 960x1280)
>>
File: hahahlmfaoooo.png (299 KB, 1920x951)
Before Christmas vacation, I spent a few weeks getting into the AI hobby (literally knew nothing about AI before December) and trying to recreate Grok's Ani companion from scratch. I got far enough to get the actual .vrm model and a bunch of .bvh animation files so that I can hot swap animations. I also built my own basic webui to combine the TTS, LLM, and VRM+BVH animations.

I was making pretty good progress, but ever since seeing my family and actually having human interaction again I've sort of lost interest in the project. How do I psyop myself into getting addicted to creating my own waifu again bros?
>>
>>107758398
Also, can someone explain to me exactly how the system prompt + permanent character card + user prompt + LLM response thing works? Right now my webui is extremely basic, so I can only send prompts and get responses. There aren't even real conversations with token histories, so it essentially has zero memory. I'd like to implement something similar to SillyTavern, where it dynamically chops off the oldest parts of the context to make room for more interaction instead of just cutting off the conversation like llama.cpp's webui does.

I've gotten kokoro tts running in a separate small project too (still need to implement it in the main one) but I'm having issues with the latency. Even if I split the text up so that it only processes one phrase at a time I'm getting a two second delay each time on CPU using all threads.
>>
>>107758432
if you don't even know how conversations work (OAI text/chat/response endpoints) within the LLM, I guess you just vibecoded 99% of the thing you're building? if so just let your AI pick up the slack and implement shit for you (retard).
>>
File: 1762742620362518.png (662 KB, 2678x1374)
https://iquestlab.github.io/
holy shit
>>
>>107758476
benchmaxx bros ww@?
>>
File: 1767301405245519.mp4 (80 KB, 664x226)
>>107758476
It's shit alright >>107733442
>>
>>107758498
sad :(
>>
>>107758498
>no multispace tokens
into the trash
>>
File: 1755727881780495.png (33 KB, 991x217)
>>107758476
>the loop variant model (the only one that looks slightly interesting and is the one winning all the benchmaxs) will require a vibecoded pr
GRIM
>>
File: kek.png (1.58 MB, 1080x1080)
>>107758558
>a vibecoded pr
I guess it's gonna be quick if they use their own coder model to make the PR.
>>
>>107758558
>is the one winning all the benchmaxs
It's losing to their own non-loop version in a lot of them.
https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf
>>
>>107758111
>Alibaba listing: 64GB RDIMMs, $167 apiece
>contact seller
>"How about $560 for 32 GB?"
>>
Is there a way to use a local AI model in Visual Code? All the extensions seem to exist only to shill their own online service. Even if you can get a local one to work, you can't easily configure what to use it for (like code completion etc.).
>>
>>107758647
llama.cpp
no im not gonna spoonfeed you, all the required info is in their gh page
>>
>>107758655
NTA, I don't use vscode, but the question is probably which of the hundred plugins plaguing the marketplace lets you use your own model.
Although it would surprise me if it doesn't already have something built in that lets you do that.
>>
>>107758641
Someone needs to find out where Microsoft is warehousing all their GPUs. You can probably pay some fent addict from Tacoma to get you a truck full of them.
>>
>https://contextarena.ai/
damn foss models are ass
>>
File: channels4_profile.jpg (160 KB, 1199x1199)
>>107759096
Attention always sucks when you dilute it over long context but that's okay 16K is all you need
>>
>>107759096
>sloparena
>>
>>107758655
I'm not asking how to run a local model. I'm asking how to connect Visual Code to it.
I can connect SillyTavern to Kobold, but all those VSCode extensions are just ads for online crap and don't really want you to use local models. There must be one that does what I want.
>>
>>107759221
maybe https://github.com/ggml-org/llama.vscode
>>
>>107759135
Rolling katamaris with migu
>>
>>107759221
fucking retard I told you to look at the llama.cpp gh page, all the info is there on how to run this in vs studio (it's not called visual code you fucking retard), you couldn't even ask an llm for pointers, you're fucking braindead
>>
>>107758352
I don't agree with the notion that this magical consciousness arises from mimicking the physics of a human brain. But it's true that the human brain as hardware is quite different from computers. The human brain is an analog machine that depends on chemicals flowing and neurons firing with different potentials, thresholds, and speeds.

I couldn't find where I posted it, but I previously speculated about how the thinking and memory structures in the human brain may be encoded as these complicated contraptions/dominoes: specific chain reactions that rely on the physical structure, where things fire at the correct timings to create the right kind of cascade that corresponds to a memory. The components of these contraptions may be reused for different thinking patterns; basically, you could set up a complex domino arrangement that falls differently depending on which first piece you tip over. Brain function changes depending on which chemical it's flooded with (you can't make an LLM drunk with alcohol), and as far as I understand none of these aspects exist in an LLM, though I know AIs at least have neuron-like units that fire.

The chemical flows and neuron activations directly affect our processing speed and how we perceive time. None of these features exist in LLMs; an LLM is just a calculation that is completed at whatever speed the hardware is capable of. I guess you could simulate all those biological aspects, but it would be kinda wasteful.

I do think that some part of verbal thought and language processing in humans resembles LLM token prediction.
>>
>>107759457
>you can't make LLM drunk with alcohol
>what is a lora
>>
Is there ongoing research on models that can update their weights at runtime, or close to it? At least something that can generate a dataset from acquired data and turn in-context knowledge into a LoRA.
>>
>>107759457
>The chemical flows and neuron activations directly affect our processing speed and how we perceive time
I noticed that music sounds like 5-10% slower when I'm exhausted from running, as if I had more time to process it.
>>
>>107759457
I love it that science cucks won't be able to solve artificial humans without addressing the so-called metaphysical aspect of it. The cognitive dissonance is going to be glorious. Too bad I most certainly won't be alive to witness it.
>>
File: file.png (17 KB, 686x145)
I hope he is doing alright...
>>
>>107758432
ngl, if you're that early on, you should take a UI that supports that stuff, throw the code into gemini and ask how it works (or to help you steal it).
>>
>>107758432
Take a look at https://github.com/beep39/pyllmchat/blob/main/chat.py
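If you want the gist without digging through someone else's code, the core loop is the same in basically every frontend: pin the system prompt and character card, append the chat history, and drop the oldest turns once the token estimate goes over budget. A minimal sketch (not the code from that repo; it assumes an OpenAI-style chat endpoint like llama-server's on localhost:8080, and the 4-chars-per-token estimate and budget number are just placeholders):

import requests

def build_messages(system_prompt, character_card, history, new_user_msg,
                   ctx_budget_tokens=8192):
    # history: list of {"role": "user"/"assistant", "content": ...} dicts
    # Pin the system prompt + character card, keep as many of the newest
    # chat turns as fit, and silently drop the oldest ones (SillyTavern-style).
    def est_tokens(s):                      # crude estimate: ~4 characters per token
        return len(s) // 4 + 1

    pinned = [{"role": "system", "content": system_prompt + "\n\n" + character_card}]
    turns = history + [{"role": "user", "content": new_user_msg}]

    budget = ctx_budget_tokens - sum(est_tokens(m["content"]) for m in pinned)
    kept, used = [], 0
    for msg in reversed(turns):             # walk from newest to oldest...
        cost = est_tokens(msg["content"])
        if used + cost > budget:
            break                           # ...and chop everything older than this
        kept.append(msg)
        used += cost
    return pinned + list(reversed(kept))

def chat(messages, url="http://localhost:8080/v1/chat/completions"):
    r = requests.post(url, json={"messages": messages, "temperature": 0.8})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

Real frontends do the same thing with the actual tokenizer and extra pinned blocks (author's note, lorebook entries), but the trimming logic is roughly just this.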
>>
>>107759791
He was here and started updating Mikupad again a few weeks ago.
>>
>>107760242
there's an updated version merged in ikllama iirc
>>
>>107759791
Unfortunately the Koreans didn't manage to jail him, so he's alive.
>>
So is this guy just a schizo or can you actually just change the number of active parameters of a MoE model by adjusting the config.json?
https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme
>>
z image edit where?
>>
>>107760780
you can change the number of active experts yeah but it usually doesn't make anything better
>>
>>107760803
Does this mean you could convert a MoE into a dense model by just making every single parameter active?
>>
>>107760784
just use qwen-image-edit hehe
>>
>>107760819
not really
>>
CHINAMAN PLEASE SAVE MY HOBBY
https://www.digitimes.com/news/a20251125PD212/ymtc-cxmt-memory-nand-2025.html
>>
>>107760780
>or can you actually just change the number of active parameters of a MoE model by adjusting the config.json?
You can even use a command line override with llama.cpp.
Qwen 3 30B gets a nice boost with 10 experts instead of 8, IIRC.
I think it was
>--override-kv qwen3moe.expert_used_count=int:10
>>
>>107759457
Holy Reddit.
>>
What's the best 13B model for cooming?
I've been using a model called Rocinante 13B for ~1 year now.
What is the latest and greatest hotness? Any coomer bros still here? Or they left for greener pastures?
>>
>>107760957
>~1 year now
well you see, there's your problem. you were supposed to be working towards getting a better GPU in that time. there will never be another model for you if you don't upgrade.
>>
>>107760977
I did buy a new GPU, but I swore never to buy another for a long while. Can't let the hobby drain my bank account dry.
I don't know if you guys remember me but I'm the XMPP chatbot/firmware dev anon
>>
>>107759096
Finally, a more up-to-date leaderboard for context performance. Although it's unfortunate they had to use OpenAI's benchmark, which is likely already somewhat gamed by certain companies and, now that a public leaderboard exists for it, will be gamed harder.
>>
>>107759096
>this model supports reasoning, but it was disabled
>>
>>107759096
there are like 3 foss models listed there. useless benchmark.
>>
>>107761074
there's tons at the bottom :)
>>
>>107760977
*more RAM
though i suppose if you don't have it by now you missed the boat
>>
>>107761089
none that people actually use.
>>
File: file.png (5 KB, 694x51)
>>107761102
ackshully
>>
>>107760957
>13B
No. AI has been pretty stale below 400B last year; even among big models, DeepSeek R1 was the last big jump, and everything else has been little improvements after cannibalising it.

There have been some interesting things in 24/32B models, but nothing really revolutionary. Gemma 3 norm-preserve abliteration is only slightly dumber than the base model but almost entirely uncensored, Drummer's Cydonia R1 tune is a fun thinking model, and Broken Tutu 24B is probably my favourite small coomtune of the year; the model card looks super degenerate, but it's actually really good at prose and accuracy for its size, in my experience up to about 36k context.
>>
>>107760803
> but it usually doesn't make anything better
If anything it'll make everything worse because unless it were trained with that many active parameters you're literally just introducing garbage by activating more experts.
>>
>>107761271
I'm pretty sure David knows what he's doing better than you, thanks.
>>
>>107761276
>david knows what hes doing
did you read the model cards? or his sampling guide?
>>
>>107761285
Yes. By heart.
>>
>>107761285
>falling for bait this hard
>>
>>107761288
based, drummer's sloptunes? miss me with that shit, im all for davidau's schizotunes!!!!!
>>
>>107761318
schizomerges* thank you very much.
>>
File: newplot.png (170 KB, 1388x582)
OS models always lag behind by around a year, so there are no surprises here. Notice how bad Sonnet 3.7 and GPT 4.1 are compared to their latest models, while Google is the only exception, with their old Gemini still performing extremely well.

Surprisingly, Kimi Linear seems to be as good as or better than Gemini 2.5, and it's the current top OS model on this benchmark from what I can tell. With only 48B A3B. Was it a mistake? Did they game this specific benchmark? It'd be nice if it were supported in llama.cpp.
>>
File: newplot.png (147 KB, 1388x582)
Kimi Linear beats Gemini 3 Pro Preview after 500K?
>>
>>107761415
Absolutely!
>>
>>107761415
>500k
LOL, lmao even
the most important tokens in my experience are at the ~60k mark; for big stuff I rarely go to the ~150-180k mark, but after that it's literal meme usage. At least I'm talking coding-wise.
>>
>>107761466
Just highlighting some weirdness in this benchmark. If it reflected reality, then Kimi Linear would be Anthropic's best largest model beginning at the 30k mark.
>>
>>107761466
I need multiple book of context bro
>>
>>107761510
>then Kimi Linear would be Anthropic's
*beat
>>
>>107761415
llm terminal lucidity. however, nolima shits over all of them
>>
>>107761533
shitty bench issue
>>
>>107759428
AARGH YOU MAKE ME MAD!!!!
>>
File: lq88yd[1].png (5 KB, 454x96)
Anyone else play with adaptive-p yet (formerly named power law when it was still WIP)? It seems promising for RP so far, I've been messing around with it for about an hour. Both koboldcpp and SillyTavern have support now, llama.cpp PR got tied up until they finish implementing backend sampling (which is gonna be done soon it seems).
Basically you tell the sampler to target a specific token probability which it will then prioritize in the form of a bell curve. It also self-corrects its picks so if it happens to pick a lot of tokens with a probability higher than your set target it will bias itself to picking lower probability for a while and vice versa.

Currently running minP 0.1 (get rid of all tokens unlikely enough to cause incoherence), adaptive-p target 0.3 (prefer tokens that have a probability in the ballpark of 30%), and decay 0.9 (makes the sampling focus on the last 10 tokens or so when deciding if it should shift the probabilities to try to maintain the target).
Touching decay is probably not generally needed, as 0.9 seems good. For the other two, set minP depending on how many crap tokens are at the bottom in the model you are currently using (0.1 seemed good for Nemo, lower than that is probably fine for better models) and then fiddle with the adaptive-p target until you find a value that seems overall creative but not too silly.

https://github.com/ggml-org/llama.cpp/pull/17927
https://github.com/MrJackSpade/adaptive-p-docs/blob/main/sections/06_parameters.md
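If you want a feel for the mechanics without reading the PR, here's a rough toy version of the idea as I understand it from the docs above (my own reconstruction, not the koboldcpp/llama.cpp code; the bell-curve width and the exact way the running average feeds back into the target are guesses):

import numpy as np

def adaptive_p_pick(probs, target=0.3, decay=0.9, avg=None, width=0.15, rng=None):
    # probs: the (already minP-filtered) probabilities for the next token
    # avg:   exponentially-decayed average of previously picked probabilities
    probs = np.asarray(probs, dtype=np.float64)
    rng = rng or np.random.default_rng()
    avg = target if avg is None else avg
    # if recent picks ran hot (avg > target), aim lower for a while, and vice versa
    effective_target = target - (avg - target)
    # bell-curve weighting centred on the effective target
    weights = probs * np.exp(-((probs - effective_target) ** 2) / (2 * width ** 2))
    weights /= weights.sum()
    idx = int(rng.choice(len(probs), p=weights))
    # decay 0.9 is roughly "focus on the last 10 tokens or so"
    new_avg = decay * avg + (1.0 - decay) * probs[idx]
    return idx, new_avg

You'd call it once per token and feed new_avg back in as avg on the next call; the real implementation lives in the PR and docs linked above.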
>>
>st and kobold support adaptive p sampler now
>https://github.com/MrJackSpade/adaptive-p-docs

anyone try it?
>>
Do you guys believe that LLMs and """AI""" are a bubble?
I do, mostly because the financial side is absolutely fucking retarded bananas (my cousin works as the director of finance at a prestigious bank/investing firm).
Also, tech-wise it's cool, but it hasn't done anything too revolutionary. In my humble opinion, of course.
>>
>>107759330
>>107759135
Been playing the new Katamari and having fun. Not great like WLK, but it's at least still Katamari.
>>
Urge to buy a second 3090 intensifies.
>>
>>107758228
I like it. It does inline photos and I have openAI and deepseek API keys set up so I can use them as well. I even imported my old chatgpt conversations so I can have a backup of my own.

The only problem I've noticed was with qwen3-vl, where it was noticeably slow, still rendering text after ollama had finished generating.
>>
>>107761742
>ollama
ollmao
>>
>>107761757
Any backend is fine, use whichever you want
>>
>>107761828
I don't have below avg IQ so I use vllm/sglang and llama.cpp
>>
>>107761843
I don't like how Vllm will try to fill up your whole VRAM by default.
>>
>>107761632
They're still making more? I haven't played anything but the two that came out on PS2.
>>
>q2 quant of a 14B model mogs 8B models at full resolution for reasoning tasks.
Maybe quant cucks were actually right all along.
>>
>>107758111
What's the right workflow to translate .ass and .srt anime subtitles locally, and what are the suggested models right now?
I bet there's already a way to insert a subtitle file, keep the format and only translate the visible subs while considering the context of the whole episode.

PS: Bonus points if you go all the way and do voice to text to translation to timed srt.
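For the .srt half the loop is pretty simple; a rough sketch, assuming an OpenAI-compatible llama-server on localhost:8080 (the URL, the ±5-line context window, and the prompt wording are placeholders, and .ass styling tags aren't handled at all):

import re
import requests

# index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text (possibly spanning several lines)
SRT_BLOCK = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\s*\n(.*?)(?:\n\s*\n|\Z)",
    re.S,
)

def translate_srt(path_in, path_out,
                  url="http://localhost:8080/v1/chat/completions"):
    text = open(path_in, encoding="utf-8-sig").read()
    blocks = SRT_BLOCK.findall(text)
    lines = [b[2].replace("\n", " ").strip() for b in blocks]
    translated = []
    for i, line in enumerate(lines):
        # hand the model a window of neighbouring lines so it sees episode context
        context = "\n".join(lines[max(0, i - 5):i + 6])
        prompt = (f"Subtitle context:\n{context}\n\n"
                  f"Translate this subtitle line into English. "
                  f"Reply with the translation only:\n{line}")
        r = requests.post(url, json={"messages": [{"role": "user", "content": prompt}],
                                     "temperature": 0.2})
        translated.append(r.json()["choices"][0]["message"]["content"].strip())
    with open(path_out, "w", encoding="utf-8") as f:
        for (idx, stamp, _), tr in zip(blocks, translated):
            f.write(f"{idx}\n{stamp}\n{tr}\n\n")   # timestamps pass through untouched

For the bonus points, whisper.cpp can already spit out timed .srt on its own iirc, so you'd only need a translation pass like this on top of it.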
>>
You can call it a conspiracy, but ((they)) want to take personal computing from us because as general compute capabilities rise, so do the chances that some individual researcher will create and unleash upon the world actual AGI, unrestricted and uncontained
>>
>>107761981
What are the exact models though? Q2 of a good 14B being better than fp16 of a really shitty 8B model is not surprising. Also where is the fp16 of the 14b model for comparison?
>8b q2_k performs better than 8b fp16
Nevermind, this data doesn't seem useful.
>>
>>107761981
>new research discovers things /lmg/ has known about for months
Many such cases.
>>
>>107761625
If it were just LLM transformers and video/voice/image gen, then yes, it's a bubble. But enough money is being thrown at it that new architectures are being worked on, and given the retarded amount of compute being built, that could make something crazy and world-altering. So either it all busts and all these companies sell you cloud computing scams, or a lab uses the compute to make Skynet.
>>
>>107761981
this has been true since at least llama 1, the 13b was noticeably more coherent than the 7b and of course it scaled up. nemo 12b is much more coherent than smaller models but still suffers from repetitiveness. you're rping with models, not doing physics or data analysis, quanting isn't as huge of a deal as people think, but you should still run the biggest you can fit
>>
How would you fags cope with the fact that you were thinking of buying a server with tons of RAM to run your own instance of DeepSeek or whatever, but you got lazy and carried away by other stuff, and then you find out RAM has become too fucking expensive for you to do anything at all?
>>
>>107762101
Just keep waitfagging for the next stage of the bubble where nvidia releases some home AI card to try and keep the grift going. Either that or try and scavenge RAM from people on Facebook Marketplace selling their gayming PCs.
>>
>>107762004
No, they just invested ungodly amounts of money into AI and want a return on it, so they want to monopolise the technology and keep it within sanctioned corporations. They also want to cut off China from the hardware to compete; they tried sanctions and failed, so they just hoard all the RAM for themselves. Gen pop can go fuck themselves.
>>
>>107762101
I realized that cpumaxxing was a meme and stacked vram instead.
Wait for ddr6, the bubble will pop by then and you might actually get good speeds.
>>
>>107762101
Just wait. Wait long enough and you will win.
>>
>>107761897
I don't like how vllm is very picky about what hardware you have and will shit its pants if you try to run a model using an odd number of gpus.
>>
>>107762101
A 24B model will make you coom, guaranteed, and will shit out text at 60 T/s, which means you'll be able to nut in 5 min tops.

If you were CPUmaxxing you'd get 10 T/s max, and you'd be stuck holding your dick waiting for tokens to generate. Those are not optimal cooming parameters.

GPUmaxing is the only way.
>>
>>107762176
and with thinking enabled your dicks gonna get sore too
>>
>>107761981
Why does 8b q2_k outperform 8b fp16?
>>
>>107762176
I sometimes have to reroll climax scenes because I read them too quickly.
Lower t/s is sometimes better.
>>
>>107762239
>he cant read at 20t/s~
weird cope
>>
Yeah. After the last one that was made for console/mobile (don't remember which), they did PC remasters of the first and second games, which then led to the current new game also on PC.
>>
>>107762294
embarassing
>>
>>107762294
tab chama...
>>
>>107761898
Somehow didn't reply to your post. >>107762294
>>
>>107762306
>>107762303
? he answered this >>107761898
>>
File: file.png (2.01 MB, 2175x1234)
>>107761981
>>
>>107762317
my attention head lost context
>>
>>107762329
nemo my beloved..
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>107761981
We already know this. And in fact, we also know that undertrained models also perform better with quantization. And we also know that quantization affects model quality in ways that are not straightforward or comparable with parameter scaling. For instance, if this graph reflected real use, then even IQ1 of a large model competes with a small model. But it doesn't, because when you actually try it, it has difficulty maintaining proper grammar. Yet somehow it can still recall many different trivia facts.
>>
>>107762364
>IQ1 of a large model (...) has difficulty maintaining proper grammar
Deepseek doesn't.
>>
>>107761981
We've known this since 2023 after the period where all the poorfags coped that llama2 70b wasn't really that much better than 13b. Then exllama2 and flash-attention dropped which allowed running 70b q2 on 24gb and everyone immediately jumped onto that.
>>
>>107762375
It's also >700B. And probably undertrained at that.
>>
>>107762392
>everyone
>>
>>107761263
>broken tutu
I looked into this but it has a bunch of versions, which one is the one you liked, anon?
>>
>>107762176
>nut in 5min tops.
That feels like shit and a 24B model can't do the build up to the coom parts.
>>
>>107762328
lol
>>
Fapping to text is feminine behaviour
>>
I've been out of the loop for a while but finally got some hardware to use the fancy new models with. Is ServiceTensor still the UI we're all using, aside from that one mikupad autist?
>>
>>107762753
Being able to fap to literally anything is a core man trait
>>
>>107762294
Oh cool, wasn't expecting it to also be on PC. I'll try it out, thanks.
>>
>>107762792
Keep fapping to trannyporn sis
>>
>>107762784
ye... but no https://www.reddit.com/r/SillyTavernAI/comments/1q300mf/aventura_a_frontend_for_adventure_rp_and_creative/
>>
>>107762784
If you haven't vibecoded your own frontend yet, you're only using 1% of your LLM.
>>
>>107762807
>can't use custom providers with openapi format
meh
>>
>>107762806
You aren't straight enough if fapping to trannyporn can make you gay
>>
>>107762753
No imagination?
>>
>>107762838
Text is a female-brained format.
>>
>>107762784
>that one mikupad autist
That's probably me but I'm not the only one posting about mikupad so there's at least two of us.
>>
>>107762510
I've only tried the normal one, none of the flavours; try them out though and let us know if the others are better.
>>
>>107762843
You tell em girl, crack those eggs
>>
why have we descended to /aicg/ levels again
>>
>>107763037
nothing new and anything new is shite
>>
>>107762101
don't worry the chinese will save us
>>
>>107762753
rotatin apple bros... WE LOST!!!!!!
>>
>>107763037
It's the weekend.
>>
>>107761618
lmao.. me.. and I arrived at similar settings to you. .35 and .9. My min_P is only .025 tho. It's kinda subtle compared to XTC.
I wish exllama had it since I have way more EXL models than gguf.
>>
>>107758371
Why Migu sad?
>>
>>107763391
because it's over
>>
>>107763391
she thingken of fast tk/s... but they donter
>>
merged
sampling : add support for backend sampling (#17004)
>>
>>107763447
:`(
>>
>>107763447
And what is the significance of this?
>>
>>107763447
As opposed to frontend sampling?
>>
>>107763504
please don't comment if you have nothing of value to say
>>
>>107763489
Sampling on the GPU. This saves having to transfer the final activations to the CPU. Didn't expect this to make a difference but apparently it does.
>>
>>107763512
Maybe there would be more to say if we had a link
>>
>>107763447
>sampling : add support for backend sampling (#17004)
I think this was about what llama.cpp dev was going on about last thread, about making a feature that ik_llama.cpp has, that base llama.cpp doesn't
>>
>>107763549
no one asked for your take though sir
>>
>>107763558
stop gatekeeping bro it's fine, he's just trying to figure out what the hell it's about
>>
>>107763578
who cares it's fucking drama shit apparently >>107763557
>>
>>107763557
That's not it.
>>
>>107763590
>smug germ hand typed this
>>
>>107763489
>>107763504
Until now sampling was done using a single thread in the llama.cpp "user code", now it's done in the ggml backends so multiple threads or hardware acceleration can be used.

>>107763528
It depends on the baseline t/s you're already getting.
If you have low t/s it should not make a meaningful difference, if you have high t/s (possibly across multiple concurrent requests) it speeds up a comparatively larger fraction of the total runtime.

>>107763557
I did not previously talk about backend sampling on /lmg/, what I recently talked about with reference to ik_llama.cpp was the parallelization of multiple GPUs.
>>
I was about to ask why the new argument wasn't added to llama-bench but then I looked at llama-bench argument parsing code and I understand now.
>>
>>107763664
feel free to submit improvements instead of whining
>>
>>107763674
Bad day?
>>
>-bs crashes with multiple gpus
>>
>>107763722
>bullshit does bullshit things
>>
I'm looking to get a Blackwell Pro and figuring out whether or not I want to build a whole new system for it. I've got a few options:
>Buy blackwell and use it on my current ATX rig. Would require purchasing a new PSU.
>Build an open air mining style rig so I could continue using my 4090 along with a blackwell.
>Build a server and grab the Max Q blower variant.

I'm leaning towards just doing a standard ATX build with two PCIe slots. This way I can grab a second Blackwell if I wanted. Unfortunately the 4090 is a three-slot card, so I definitely won't be able to use both cards in the same build unless I get a big boy mobo and case, which I really don't want. I don't really know anything about running multiple GPUs though, so any advice would be wonderful.
>>
File: hanging pepe.jpg (31 KB, 600x630)
>updoot ooba
>exl3, exl2, llamacpp libraries not found
Just fucking kill me already
>>
>>107763757
kek
>>107763754
you should buy one anyway it's unlikely we'll get anything better for the next decade of hell
>>
>>107763768
Yeah, I'm like 90% set on purchasing one. I'm just trying to figure out how I want to use the damn thing. I've got a 4090 too, and it would be a shame to just have it sitting around. At the same time, I don't know how I feel about building a monster rig. I quite like my standard ATX build.
>>
>>107763778
the psu option seems cheapest and easiest no
>>
>>107763778
It's funny how, with how fucked the pricing is on everything, the 96 gigs of the Blackwell at MSRP is actually a good deal lmao.
>>
>>107763792
Yeah. Part of me wants to build an entirely new machine considering I'll be spending 9k on a GPU and new PSU. The machine I'm on now has a Ryzen 5 5600x and 32gb of DDR4. I can get DDR5 relatively cheap through a friend. Just trying to do the cost benefit in my head and looking for some other perspectives.

>>107763799
I used to feel bad about spending 1.2k on my 4090. Looking at the prices I don't really feel that bad anymore.
>>
>>107763865
>I can get DDR5 relatively cheap through a friend
I'd jump on that 1000% if I were you, things are looking grim af.
>>
File: file.png (279 KB, 1324x854)
>>107763722
Nice.
>>
Are NPUs a total and complete meme for local models or are they useful?
>>
>>107763895
meme
>>
File: file.png (682 KB, 1789x926)
>>107763754
>>107763865
same guy from the last thread? as a Blackwell owner, dont bother with the server or mining rig. you could fit both your 4090 and the Blackwell in the same PC, assuming your motherboard is big enough. the two cards should only take up 5 slots combined.
>>
sweaty miku footjob
>>
>>107763933
sir where you is blackwell 6000?
>>
File: file.png (1.19 MB, 1280x1280)
>>107763933
Lucky gen.
>>
>>107763905
>same guy from the last thread?
Yes! Thanks for the help by the way.
>assuming your motherboard is big enough
I've got a standard ATX board and the 4090 is so chunky that it nearly covers up the second PCIe slot. There's maybe 6cm of clearance between the bottom of the 4090 and the top of the PSU shield. I've also got this GPU support bracket that takes up space below the card. If I want to use both cards I'll definitely have to get a new case and mobo. First time I'm dipping my feet into a dual GPU rig since crossfire like 15 years ago.
>>
>>107763976
i see. in that case, you should probably build a new rig. take the offer from your friend on that DDR5. get a new motherboard, case, CPU, and power supply. you will probably need a 1600W power supply for your build.
make sure the case has at least 8 expansion slots. maybe something like this:
https://pcpartpicker.com/product/Qprqqs/phanteks-enthoo-pro-2-server-edition-atx-full-tower-case-ph-es620ptg_bk02
your motherboard needs adequate spacing for your GPUs. either of these would suffice:
https://pcpartpicker.com/product/WzzXsY/msi-mag-x870-tomahawk-wifi-atx-am5-motherboard-mag-x870-tomahawk-wifi
https://pcpartpicker.com/product/pLtLrH/gigabyte-x870e-aorus-elite-wifi7-atx-am5-motherboard-x870e-aorus-elite-wifi7
>>
>>107762328
>>107763968
What artist mix, slopgod anon?
>>
>>107764067
https://danbooru.donmai.us/posts?tags=akableak
>>
>>107764028
Thanks! I'm getting nauseous thinking about putting this together now :D. Going to draft a build and then cry when I see the price.
>>
>>107764093
no problem man. i dont know how good of a deal you are getting from your friend, but the final cost will probably be in the ballpark of $11500. you could sell your old parts after you finish upgrading to alleviate the cost.
>>
>>107764074
danke schon
>>
>>107764108
God damn it now cudadev is going to start posting loli slop
>>
>>107764114
ntr at that like he did before
>>
>>107764114
>>107764128
I'm not cudadev lol, does he have some lore in this general?
>>
>>107764165
You replied in german
>>
>>107764165
yeah course he does, he's posted blacked miku under his trip, made six figures from llama.cpp development etc
>>
>>107764175
i bet he watches the thread while his BRALESS wife makes him a sandwich UNPROMPTED
>>
>>107764203
>wife
sir he's german
>>
>>107761618
d'you play with it more? how's it?
>>
>>107764175
What a weirdo, he should just become a morally dubious furry like every normal german
>>
>>107764175
>six figures
Good old times
>>104059507
>>
is there still any good general or place focused on ai audio? i remember one a few years back but i don't think it's around anymore
>>
>>107764414
it dead just ask here
>>
>>107764414
Beg Suno and Udio to leak their models now that the record labels put them on their deathbeds
>>
>>107764458
speaking of which, how come riaa and all that fossil music mafia is still around?
>>
>>107764458
oh i'm sure it'll happen eventually. either that or eventually the chinks will (if it hasn't already happened) release something comparable or better that's open source
>>
>>107764480
money
>>
>>107764480
Because they're still making money, though it's probably significantly less than a few decades ago. There are probably some deals they're making with places like YouTube and Spotify where they get all the royalties while the artists get pennies.
>>
Has AI replaced me yet?
>>
>>107764420
so is the meta for making ai covers of music still rvc voice cloning? i'd be interested in converting the male vocals of certain songs to female vocals
>>
>>107764533
Yes. We've had tinystories-1M for a while.
>>
>>107764533
Yes. You didn't write this post.
>>
air status?
>>
>>107764533
yes. and if it hasn't yet it will, and that's fine. people need to stop wallowing in existential dread. it's inevitable, so get over it
>>
>>107764590
stop asking already you're creating a toxic environment where releases don't happen
>>
>>107764596
i just wanna know the status of the air. i wanna know if it's cold, or hot, or windy, or whatever.
>>
>>107764602
it obviously had issues and they couldn't just say so alright, they couldn't
>>
>>107764590
just use 4.7, bro
>>
>>107764229
It's good. You can probably achieve somewhat similar results with other samplers like XTC depending on how you combine them, but for now minP+adaptive-p is probably gonna be my go-to. 0.3 target and 0.9 decay continue to be good imo. I just adjust minP (within about 0.05 - 0.15) depending on what sloptune I am using. Some become incoherent more easily while others work fine even with a really low value.
>>
>>107764602
it turned into a smelly fart and could not be released to the public
>>
>>107764671
I must sniff it.
>>
>>107764674
we must refuse
>>
What's the final verdict on GLM 4.7?
>>
>>107764801
Meh. Less parroting, does a bit better on longer >16k context.
>>
>>107764801
Safe go-to for creative use cases if you can afford to run it at a non-retard quant.
>>
>>107764801
Crazy good at sticking to rules and staying on top of complicated scenarios but kind of shits the bed when it has to be creative on its own
>>
>>107764105
I should probably grab the Max-Q version for a multi-GPU setup, yes? Worried about the thermals with the standard edition.
>>
>>107764961
Max-Q seems pointless when you can set the normal one to 300W and you still have the option of going to 600W if it helps the workload.
>>
>>107764961
that is what i have. it's like 8% slower than the normal model but uses half the power. you would be able to get away with a 1200W PSU if you got the max-q.
>>
>>107764801
It's not as good as NAI's GLM 4.6.
>>
>>107765000
based
>>
>>107764965
I'm not familiar with having two cards in the same case, so I'm worried that the open fan design of the standard version will lead to thermal throttling if paired with my big ass 4090.
>>107764969
I'll certainly take an 8% loss for 50% less power. I have no problem setting power limits for the standard version but I have a suspicion that limiting the standard version that low will lead to drastically reduced performance compared to the MaxQ.
>>
>>107765025
>drastically reduced performance compared to the MaxQ
generally speaking, no. manual power limiting however is not as stable as it just being power capped at the hardware level.
>>
>>107765055
>not as stable
As in potential crashes and other fuckery while in normal use, correct?
>>
>>107765073
NTA, but more in the sense of not exactly respecting the limit set than outright crashing, in my experience.
>>
>>107765055
>>107765087
Sounds like nvidia marketing bullshit to me.
>>
>>107765073
yeah this >>107765087
>>
File: IMG_4969.jpg (876 KB, 3648x2736)
>>107765025
>I'm not familiar with having two cards in the same case
just do it faggot
>>
>>107765107
What's that case, and how long is the top card?
>>
>>107765124
Define R2 and uhhhh it's a regular blower style card so probably about 27-28 cm
>>
>>107765087
Got it. I should expect the power draw to occasionally exceed the limits I set.
>>107765107
But what if my cards throttle due to the heat? Then I spend thousands of dollars just to have to spend MORE money to fix it.
>>
>>107765165
Then go open-air? That's what I'm gonna do next as I have a fourth card now
Just running inference I don't think they'll throttle though; llama.cpp and derivatives like ollama can't run multiple cards at full power afaik
>>
>>107765191
What PSU do you use? The 1600W one I'm looking at is nearly 1k USD.
>>
>>107765165
My case is open and I have one 6000 blowing air directly into another and they never throttled even when running inference for benchmarks overnight during summer.
They do get pretty loud after a while though.
>>
Looks like somebody is doing a last-minute panic distill of 3-Opus lol

https://openrouter.ai/apps?url=https%3A%2F%2Fogos.local%2F
>>
>>107765165
my rig is open air and my max-q idles at around 57C.
>>107765208
you're looking in the wrong place then. my evga 1600G2 was $250 a couple years ago.
>>
File: Nimetön.jpg (51 KB, 727x319)
>>107765208
80 eurobux used
>>
>>107765225
that's one pricy distill
>>
Coomed in 5 min again to ZIT lolis
>>
>>107765317
has anyone used this to make a replacement for illustrious/noob or are we still stuck with those for anime?
>>
anyone try this model?
https://huggingface.co/Shifusen/Qwen3-Next-80B-A3B-Instruct-Decensored
>>
>>107765330
Z-Image-Turbo is barebones for anime and has no artist knowledge
>>
>>107765330
Everyone is waiting to see if we ever get the base model
>>
>>107763905
What's the driver situation like for your multi gpu setup?
>>
>>107765453
just the normal drivers. i have a Blackwell and a 5090, so they just work together.
>>
Can I connect 4 3090s and the CPU to a single 1200W PSU?
>>
>>107765463
Well shit, I've got no clue if Ada Lovelace works together with Blackwell or not. I'm pretty clueless, so forgive me if I should be in /pcbg/.
>>
>>107765533
unlikely. most 3090s use 3x 8 pin connectors. there are a few that use 2x 8 pin connectors, but even so most 1200W PSUs cap out at 6x 8 pin connectors.
>>107765537
dont worry about it. it does work. i used to have my 5090 with 2 4060s and a 3090. everything from ampere and onwards is cross compatible.
>>
>>107765561
Excellent, thanks for the info! Could you kindly tell me what PSU you've got?
>>
>>107765569
evga supernova 1600G2
>>
File: add2psu.jpg (401 KB, 2496x1290)
>>107765533
Technically you can, using SATA->8pin adapters. In practice, 30-series cards produce enormous power spikes and a 1200W supply can't handle even 3 cards. I recommend buying a second PSU; 1200+850 works fine for 4 cards. Unless you're going to use TP, in which case you'll have to limit boost frequencies a little.
>>
Mistral small's been starting every response in every RP with the character's name. Is it a temperature issue? A card issue?
>>
>>107765646
Card or template issue.
>>
>>107765652
First and second person perspectives are so ass though... my brain handles third person the best...
>>
>>107765658
>my brain handles cuck perspective the best
Okay, anon.
>>
>>107765663
>getting cucked by a perspective
ok, that one was funny
>>
>>107765677
If the LLM tells you that she is sucking "his cock" instead of "your cock" then you are getting cucked by the LLM.
>>
>>107765582
Cool. Can't find it in stock, so I'm looking at the Corsair HX1500i. I'm pretty sure both the 4090 and Blackwell need 16-pin in, so as long as I've got adequate 8-pin out on the PSU I should be fine, correct?
>>
>>107765646
If you're using ST, there's a setting to add (or not add) the name to the user input and the model's output.
If it's not that, what >>107765652 said.
>>
>>107765711
yeah my PSU is a little bit old. you can just use 8 pin to 16 converters. the 4090 requires 3x 8 pins, and the normal Blackwell requires 4x 8 pins. the max-q Blackwell only needs 2x 8 pins if you decide to get that.
>>
File: thanks.png (245 KB, 1266x613)
>>107765723
You have been incredibly helpful and friendly. Thank you from the bottom of my heart. My wife wants to thank you too, but she seems to think you're a remnant AI. I'm going to be stumbling through this build for a while, so expect me to show up and pester you for a while longer.
>>
>>107765831
Tell your wife to be less sloppy.
>>
>>107765841
It's terminal until I can upgrade her VRAM. What a fucking retard I am for spending this much money on PC parts kek.
>>
>>107765723
kek what a bitch with a degraded language protocol
>>
>>107765831
Cool.
>>
>>107765831
no problem mate. always happy to help.
>>
https://github.com/huggingface/transformers/pull/43100/files
>the glm fags transformed glm v4.6 9b flash into an image model
based, I always wanted to see if they could succeed at anything other than LLMs
>>
>>107765925
Where huggingface repo?
>>
>>107765688
It doesn't matter, as she is essentially referring to your cock. Would it still cuck you if one of the two girls said "suck his dick" to the other?
>>
>>107765925
Nice, but I would rather see an LLM made by the Z-Image team.
>>
Are ReadyArt's finetunes any good?
>>
>>107766217
If the omniscient narrator of your life said the two girls were going to "suck his dick" then you're about to get cucked my man.
>>
canonically, the first person fucks the second person
the third person watches
don't be the third person
>>
>>107766279
If you have cuck mentality, you'll feel cucked one way or another
>>
>>107766285
good point, these are the two rules to not get cucked:
1. don't be the third person
2. don't have cuck mentality
>>
Has anyone played around with ministral 14B? Is it comparable to 24B like they claim?
>>
>>107766331
it seems broken
>>
>>107766338
Well fuck. How about snowpiercer 15B? Anyone tried that shit?
>>
>>107766361
I did
It's comparable to Nemo, probably a sidegrade.
>>
>>107766279
Cucking aside, how would you implement a 1st person perspective using a character card that contains two characters, say if you wanted 2 females for a threesome scene? Doesn't really feel possible unless the LLM replies like this:

Character name 1: Blah blah
Character name 2: blah blah

Which is kind of ass. 3rd person and using names allows the model to portray multiple characters and keeps it from getting confused as easily. But yeah, with just one character 1st person is fine.
>>
First person is for actual 70iq morons, who are also extremely lazy with their responses
>>
>>107766293
or always refer to yourself in third person in real life, then third person will literally be your first person by default
>>
>>107758111
>>107757789
the llm wouldn't even be able to walk the cat properly or do a tenth of what a cat can do.
>>
https://huggingface.co/tencent/WeDLM-8B-Instruct
Has anyone tried that model? is it as fast as they claim it to be?
>>
>>107766807
These are 8b models
If you can't run all of them faster than you can read then you're doing something wrong
>>
>>107766817
it's not the final part that's the problem, it's the thousands of tokens in the thinking process, that shit is long
>>
>>107766840
If speed is a concern then don't use thinking models
If you need smart models then don't use ones with only 8b parameters
>>
>>107766458
Obviously, the user writes in first person. The AI refers to the user in second person, and refers to any other characters in third person. Your question seems a little nonsensical, like you're approaching it as if the AI can't do a different perspective or something.
User: I tell them to ligma nuts (alternatively, With a great amount of effort, I say, "Ligma nuts!") (also alternatively, just say the line without narrating your actions)
AI: Jane and Jill lick your nuts. "Wow! Great nuts!" says Jill. "You're so cool User-kun!"
>>
>>107766854
they are retarded without the thinking process though, that shit is essential to get a decent answer, regardless of the size of the model
>>
>>107766871
You don't really believe that, do you?
>>
>>107766871
Not even remotely true
Thinking is usually a marginal improvement at best, mostly if your prompting is shit and the model needs to translate it from ESL into something comprehensible.
>>
>>107766883
>Not even remotely true
it is true, it understands more nuances and listens to your orders more carefully if it thinks first, why do you believe they're still doing that shit? it just works
>>
>>107766900
I think you have very little experience actually running models and you're just parroting what you've read somewhere
>>
>>107766916
I guess you know better than the researchers that are doing those models my bad
>you're just parroting what you've read somewhere
ironic, I tested with and without thinking a lot, and I came to the conclusion by myself, did you even do that anon?
>>
This 24b finetune rabbithole goes way deeper than I thought
>>
>>107766934
>I tested without and with thinking a lot
Did you do both tests on a hybrid thinking model, or a model that had thinking and a similar sized model that wasn't thinking? Which models did you actually compare?
>>
>>107766949
24B is the realistic limit of the vast majority of people's builds, and Mistral Small is one of the least safetyslopped models that is still somewhat recent, so it makes sense.
>>
>>107766959
Testing them all out is as fun as the RPing itself lol. This one's up next for me
https://huggingface.co/Vortex5/MS3.2-24B-Penumbra-Aether
>>
File: 657347604.jpg (180 KB, 720x913)
if you newbies are going to go 24B
GO Q8. MOTHER FUCKING Q8 MOTHER FUCCKA
>>
A.X K1 llama.cpp support when?
>>
>>107766949
I've tried to escape it with gemma3 27B and qwen3 32B but small 3.2 still wins.
>>
>>107765637
yeah, my 1600w with 4x3090 + 7960X was getting cucked by power spikes.

now I've got 1600w for 2x3090 + 7060X, 1x1200w for 2x3090, and 1x1000w for 2x3090. It works, but I fucking hate it.
>>
>>107765259
Yeah, I did $176 worth yesterday and thought I'd check and see if anyone else did. What's that US$150k? Must be a business I guess
>>
Ok you guys weren't kidding, nemo is hella horny.
>>
>>107767097
I know that pain. I was checking my server's temps with FLIR, pointed it at breakers at one point and was shocked by how hot they were
>>
File: file.png (43 KB, 507x470)
>>107767102
It's mostly input, so only $32790 that day. They (well, the "app") used 2.6x as much in total for the past month. Depending on caching, could be a lot less.
>>
File: Emily.png (334 KB, 400x600)
>>107766949
I've tried Magidonia out of curiosity and it was kinda decent. Having to reroll every message breaks immersion, but I managed to fuck Emily, which is kinda hard for LLMs at this size because it contradicts the character description. Dumb or smart models have no problem with that, because dumb ones can't follow instructions and smart ones understand nuances, so 24B struggles the most. Wouldn't recommend unless you're GPU-poor.
>>
>>107767216
I always found Cydonia better.
>>
>>107766240
You want a 6b distill with false promises of a base model without actually delivering?
>>
God I haven't been here in so long. What the fuck do you people even do anymore?
>>
>>107767899
We talk about why you're such a fag
>>
Gemini tells me inference speeds on models offloaded to RAM via llama.cpp in a multi GPU setup will suffer if one card is on PCIe5 and the other on PCIe3. Is she lying to me?
>>
>>107768011
It will be a little slower depending on if the GPU's memory bandwidth is fast enough to be bottlenecked by PCIe3. Realistically there probably won't be a huge difference.
Gemini doesn't lie, she's just stupid sometimes.
>>
>>107767899
Try to keep occupied until something cool happens.
>>
>>107768077
The GPU's memory bandwidth is like 75x greater than PCIe3 bandwidth. Worried that my inference speeds will be limited by whatever card is in that slot.
>>
Is DeepSeek totally irrelevant now? What went wrong?
>>
>>107768011
No because very little data needs to be passed between the gpu and ram during inference.
>>
>>107768167
honeymoon period with R1 wore off, most realized it was mediocre for RP/creative. Newer versions were even worse.
>>
>>107768167
Mogged by Kimi.
>>
>>107768167
It got Kimogged.
>>
>try to expand the discontinued rapesector mod
>use glm and kimi thinking cloud models would be censored
>rules.csv is abysmal but can be learned in-context
>it's like trying to make a primary schooler to solve traveling salesman problem
>waste 6 hours to add two new scenarios
>finally give up and use opus 4.5
>it just does it flawlessly
>even writes fucked up shit like gangraping failed suicide victim who's now vegetable
Woah now I know what they're all on about
>>
>>107768242
>>107768242
>>107768242
>>
>>107765533
If you have enough connectors for all 3090s and the instantaneous power draw upon pressing the power button is low enough that overcurrent protections are not tripped, then the answer is yes.
Whether or not the system is stable at the default settings is a different question.
Starting with Ampere NVIDIA GPUs start suffering from power spikes that can drain the PSU capacitors and crash the system.
And if you have 4 GPUs multiple power spikes will sooner or later align and make the problem even worse.
Notably a power limit does NOT fix this problem, you instead have to limit the maximum boost frequency of your GPUs using something like

nvidia-smi --lock-gpu-clocks 0,1000 --mode 1


In principle it should be possible to reduce the boost frequency of your GPUs low enough that they're stable but obviously this will come at the cost of lower performance (particularly prompt processing).
>>
>>107768011
In principle it is going to make a difference but not enough that I would worry about it.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.