/g/ - Technology


File: GPU.jpg (58 KB, 448x570)
Because I have a communication disorder and only read articles without ever replying or posting, Reddit banned me from posting this question on LocalLLaMA.
I think I have a better chance of getting help here than in /wsr/, so I posted here.

Current configuration and system software:
CPU: AMD Ryzen Threadripper 7960X 24-Cores
RAM: 256GB
MB: Pro WS TRX50-SAGE WIFI
GPU: (see picture) Because of the thickness of the graphics card, it can only be installed this way.
System & Software: Win 10 pro, Geforce Game Ready 560.94, Python 3.11.9, oobabooga/text-generation-webui: 1.14, koboldcpp_cu12: 1.74, SillyTavern: 1.12.5

Main purpose: run 100B+ models as Q8_0 GGUF and 8.0bpw EXL2, plus 70B and 72B unquantized original models.
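For scale, a rough back-of-envelope estimate (my own numbers, not from this thread): Q8_0 stores about 8.5 bits per weight, so a 100B-parameter GGUF is roughly 100e9 x 8.5 / 8 ≈ 106 GB of weights before the KV cache, and a 70B model in its original 16-bit weights is roughly 70e9 x 2 bytes ≈ 140 GB, so these targets only fit if the load is actually split across every card.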
>>
File: slow.jpg (1.76 MB, 2500x2332)
The current speed is abnormally slow and pathetic, almost the same as when I force-ran Q8 GGUF on my old computer split across GPU+CPU (see pictures).

Please help me figure out what I have set wrong, or teach me how to use another, faster backend.
I don't train models and don't connect other devices; I just use this for local role play, and the context length is 16384.
I don't know English at all and can only read through web translation. I only understand Ctrl+C and Ctrl+V and then entering the program code. I can't understand a description that just says "--tag", and I don't even know which window to type it in. I would be very grateful if you could provide screenshots with the explanation.
Although I heard that Aphrodite can use multiple GPUs at the same time and will be faster, I don't understand how to install it at all. Is it just a matter of installing WSL, opening CMD on the desktop, pasting "pip install -U aphrodite-engine==0.6.0" into the black window, and pressing Enter?
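In case it helps, a minimal sketch of that WSL route, assuming the default Ubuntu distro under WSL2 and the GeForce driver you already have; the only command taken from this post is the pip line, and the venv name and apt packages are my own additions. First, in a normal Windows CMD window (reboot when it asks):

    wsl --install

Then, inside the Ubuntu (WSL) window that opens afterwards, not the plain Windows CMD window:

    sudo apt update && sudo apt install -y python3-venv python3-pip
    python3 -m venv ~/aphrodite-venv
    source ~/aphrodite-venv/bin/activate
    pip install -U aphrodite-engine==0.6.0

After that, the aphrodite command only exists inside that Ubuntu window while the venv is activated.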

Thanks to everyone who had the time and patience to help me.
>>
>banned from reddit
>comes to 4chan expecting replies
>>
>>102285757
Yes, I usually don’t use any platform to communicate with real people.
If you're going to say that people with communication disabilities like me don't deserve to get help, then I'll admit it.
Thank you for going out of your way to ridicule me.
>>
>>102285629
the people in /lmg/ would probably find this interesting
>>
>>102285888
Thank you for taking the time to reply.
I'll go to that post and try.
>>
File: file.png (157 KB, 995x954)
>>102285629
I humbly give you the answer of my 8b llama 3.1 LLM kek
Looks cool anon, but I doubt many people will have experience with such hardware configuration. I only have an RTX 4090 to run Stable Diffusion and basic LLMs. Hope you find what you're looking for.
>>
>>102286018
Thank you very much for taking the time to reply to me.
It seems to say it will run faster on the older Geforce Game Ready 511.23?
Since I am not familiar with computer programs, I am very scared to touch the BIOS settings.
I will try to translate and understand the content of this picture.
Thank you very much sincerely.
>>
>>102285817
We all have communication disabilities, you should stay with us.
>>
>>102285817
This entire site is filled to the brim with toxicity, anon.
Learn to ignore the shitposters and focus on the good parts.
Also, lurk more.
>>
File: file.png (16 KB, 649x541)
>>102286103
Hold up, what's your background?
Also, I did check on your BIOS, and it seems you will need someone to update it for you.
https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-trx50-sage-wifi/helpdesk_bios?model2Name=Pro-WS-TRX50-SAGE-WIFI
It is not hard though, just follow your gut: you download a zip, run BIOSRenamer.exe (it renames your BIOS file to "A5497.CAP"), copy the file to a USB stick, reboot your PC, enter the BIOS, open the BIOS update tool, choose the USB drive, find the file, select it, and wait.
>>
>>102286137
Thank you for your gentle message.

>>102286145
Thank you for your comfort.

Sorry for my slow reply; I type very slowly and I have to translate first.
Thank you for taking the time to respond to me.
>>
>>102286212
>Sorry for
Bro, stop apologizing and thanking. It just makes you seem pathetic.
Have some confidence in yourself; you have every right to have it.
>>
>>102286166
Thanks for taking the time to help me find these.
Logically speaking, my BIOS is already up to date. After all, I bought this computer less than a month ago. I just installed the PSU and GPU from the old computer.
But I will still try to check this website and my BIOS version number.
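If it helps, one quick way to check the installed version without going into the BIOS at all (assuming a stock Windows 10 setup), typed into a CMD window:

    wmic bios get smbiosbiosversion

Compare the number it prints with the newest version listed on the ASUS page above.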
>>
>>102286236
I don't know any other way of communication.
I use this method at work.
I don't have any friends in my life so I use LLM
I don't know how to offend anyone without using thank yous and apologies.
I don't want to cause you any discomfort.
Please allow me to thank you for your reply...
>>
>>102286299
>I don't know any other way of communication.
You can literally teach yourself to stop saying "sorry" and "thank you" every two sentences.
You are autistic, not retarded. Learn to mask like every other autist out there.
>>
>>102286299
>>I don’t know how to *not* offend anyone without saying thank you and apologizing.

Translation error, I didn’t check it properly.
>>
>>102286299
On 4chan, when you want to thank someone you call them a pedophile instead.
It's a term of endearment.
>>
>>102286326
Watch out, this anon is lying. We do not call each other pedos.
However, putting "you fucking nigger" at the end of your sentence means "I disagree with you, but respect your beliefs".
Calling each other faggot is also a form of endearment.
>>
>>102286323
I try hard to reduce the repetition of these words.
But I still want to pay my respects to you.
>>102286326
>>102286433
Please allow me to decline the use of these words
Translations revealed that one of the words described someone who behaved inappropriately towards a young child, and the other was an insult for dark-skinned people.
I don't mean to offend anyone.
>>
>>102286488
>But I still want to pay my respects to you.
Wanting does not mean you should. This is something you should learn.
>Please allow me to decline the use of these words
No, fuck off. You either adapt to this site or you can get the hell out.
>I don't mean to offend anyone.
Your entire existence offends me. Please change yourself to my liking so you stop offending me.
>>
File: 1725004419938531.png (600 KB, 828x528)
>>102286578
Anon, don't listen to this retard, keep up the good work.
Also why are you translating everything? Where are you from? Don't you know English?
>>
>doing inference on 20k+ worth of gpus on windows
lol
lmao even
>>
>>102285629
Can't you just wait a while for your degenerate porn? Or kys, just a thought
>>
>>102285629
Everyone berating you ITT is a tryhard, doesn't surprise me since /g/ is full of underages.
Anyways, try exllama2 with tensorcores on and autosplit off. Can you show the memory usage of the GPUs while running vanilla transformers and exl2?
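For reference, this is roughly what that looks like when launching text-generation-webui from the command line. It's only a sketch: flag names can change between releases, so check "python server.py --help" first, and the model name and the per-GPU gigabyte values are placeholders:

    python server.py --loader exllamav2 --model YourModel-8.0bpw-exl2 --gpu-split 20,20,20,20 --max_seq_len 16384

--gpu-split takes one value per GPU (GB of VRAM to use on each card); leaving --autosplit off means that manual split is what actually gets used.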
>>
>>102287342
Btw, are you offloading the KQV cache?
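Not sure what OP is currently passing, but for comparison a bare-bones koboldcpp launch that keeps everything on the GPUs looks roughly like this (the exe and model names are placeholders, and the KV-offload toggle itself sits in the launcher GUI, so I won't guess its command-line name):

    koboldcpp_cu12.exe --model YourModel-Q8_0.gguf --usecublas --gpulayers 999 --contextsize 16384

--gpulayers 999 just means "offload every layer"; lower it if VRAM runs out.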
>>
>>102286326
>It's a term of endearment.
this but unironically
>>
Tech support on /g/? 26 replies? What the fuck is going on?
>>
>>102285629
Hey OP.
It looks like you're diving in pretty quickly; have you tried running through LM Studio?
LM Studio's performance isn't as good as running ooga/kobold, but it's 80% of the way there.
Could you post utilization/VRAM usage graphs during processing?
Cheers.
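For the graphs, something as simple as this in a CMD window while the model is generating is enough (nvidia-smi ships with the GeForce driver; the 1-second refresh is just a suggestion):

    nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv -l 1

Screenshot that output for each backend (transformers, exl2, GGUF) and post it.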
>>
Also
>>102285742
and
>>102279239

Both generals have people who can answer your questions; you'll get abrasive answers because that's our nature, but eventually someone will help.
>>
Lmao
This eggless OP has been driven away.
Just go to /lmg/ and you'll find out.
>>
>>102285629
>thank you for the gentle message kind sir
LMAOOOO KYS FAGGOT
>>
being from reddit should be an automatic rangeban
>>
>>102288606
Getting banned from reddit for being too autistic gets a pass in my book.
>>
>>102288029
Half of the board is techsupport ㄟ( , )ㄏ
>>
>>102285629
finally, a computer that can run the PC port of GTA IV
>>
File: faggot.jpg (30 KB, 367x451)
>>102285629
>spend 20k in GPUs
>cant even use them
Nvidia users everyone



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.