/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106769660 & >>106762831

►News
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106769660

--Papers:
>106774512 >106774610 >106774669 >106774797
--Frustrations with mod approval and sharing a character customization addon:
>106770110 >106770207 >106770482 >106770215 >106770262 >106770425 >106771674 >106771801 >106771993 >106772136 >106772401 >106772464
--GLM 4.6 struggles with speed and knowledge compared to K2 despite smaller size:
>106771605 >106771704 >106772093 >106772555 >106772685 >106771712 >106771798 >106771958 >106772080 >106772191
--GLM 4.6 model erratic behavior and potential quantization/formatting issues:
>106770753 >106770772 >106770827 >106771431 >106772929 >106772970 >106773019 >106773025 >106771510 >106771662
--Local model performance benchmarks and hardware optimization discussions:
>106773216 >106773254 >106773280 >106773320 >106773366 >106773493 >106776426
--Optimizing chat system formatting for AI interactions:
>106776741 >106776825 >106776959 >106777047 >106777114
--GLM 4.6's high VRAM consumption at large context lengths:
>106773651 >106773712
--VRAM management challenges for large models on 24GB GPUs:
>106774461 >106774484
--GLM-4.6 model quantization performance comparison:
>106770710 >106770745
--2d anime image generation hardware budget and NPU software limitations:
>106769831 >106769845 >106769847 >106769852 >106769866 >106769947 >106770102
--Replacing llama.cpp binaries with CUDA-optimized builds for GLM 4.6 via ooba's UI:
>106773113
--Recommended RAM for local LLMs: 128GB minimum, 192GB dual-channel, >500GB server options:
>106776386 >106776395 >106776400 >106776494 >106777198 >106777256 >106777308 >106777351
--A100 pricing vs consumer GPUs and commercial licensing considerations:
>106776566 >106776597 >106776653 >106776693
--Logs:
>106769725 >106770080
--Miku (free space):
>106769691 >106770398 >106770451 >106770366 >106770215

►Recent Highlight Posts from the Previous Thread: >>106769663

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Not Migu, abandon the thread
>>
Is my DDR4 really crippling my performance on GLM 4.6 that much?
>>
new model when
>>
>>106777694
This. It's been over 24 hours since the last new model drop. Local is dead.
>>
Did anyone manage to get gpt-oss-120b to ERP properly?
>>
>>106777689
https://www.servethehome.com/guide-ddr-ddr2-ddr3-ddr4-and-ddr5-bandwidth-by-generation/
>>
>>106777408
whomst is this purple slut?
>>
>>106777689
The biggest crippling factor is memory channels. A shitty ddr4-2400 epyc with 8 channels is going to run much faster than a ddr5-6000 gayman board with 2 channels.
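Rough napkin math for the theoretical peak (each populated channel is 64 bits wide, so bytes/s ≈ channels × MT/s × 8; real-world numbers land well below this):
[code]
# theoretical peak bandwidth: channels * MT/s * 8 bytes per 64-bit channel
def peak_gb_s(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

print(peak_gb_s(8, 2400))   # 8ch DDR4-2400  -> 153.6 GB/s
print(peak_gb_s(2, 6000))   # 2ch DDR5-6000  ->  96.0 GB/s
print(peak_gb_s(12, 6400))  # 12ch DDR5-6400 -> 614.4 GB/s
[/code]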
>>
>>106777726
So then yes. I have octo-channel DDR4 2400MT/s. I need a next gen EPYC now.
>>106777777
Nice digits. What you are describing is exactly what I have.
>>
>>106777777
Checked.
>>
>>106777728
purple teto
>>
Do I need to set the Oobabooga parameters in addition to the Silly Tavern ones?
>>
Hey guys. Is 4.6 really that yappy with its thinking? I've been trying an IQ1 quant and the thinking is like 2 paragraphs most of the time in RP. Did the quanting kill its reasoning capability?
>>
>>106777689
>numbers you getting vs numbers ddr5 people are getting
>you happy with your numbers?
>you got money to go to ddr5?
>>
>>106777808
no the sillytavern ones will take precedence when the user prompt is submitted
>>
>>106777858
Thanks!
>>
>>106777852
7.5t/s vs 15t/s
No
Also no
>>
File: 1749143747579428.png (232 KB, 2016x1374)
>>106777781
I made this exact switch a couple of months ago. Here are some rudimentary bandwidth tests I did at the time. The speed gain isn't that much if you're only keeping the experts on CPU and the rest on GPU, but it's still a huge jump.
>>
>>106777725
Define "proper RP"
>>
whats the flavour of the month model for vramlets (16 gbs)
>>
File: glm_miku.png (27 KB, 400x500)
GLM-chan drew migu.
>>
>>106777728
Utane Uta. Never heard of her.
>>
>>106777850
It's about this yapping: >>106772093
>>
is there anything I can run on 6gb vram 32gb system ram?
>>
>>106778073
4.6? That's pretty fucking good. These things have come a long way in 2 years.
>>
>>106778105
Mistral Nemo Instruct Q4KS pretty slowly, Qwen 3 30B A3B not that slowly.
>>
>>106777996
Shit. That is the exact data I was looking for. How much did you spend on the upgrade?
>>
>>106778105
Anything up to 30B, really.
>>
>>106777996
>ddr5-6400 x12
can you even run them with expo/xmp bro? I guess they're running at the standard JEDEC speed, no? is it ecc?
>>
>>106777996
Glad I didn't fall for the cpumaxxing meme.
>>
>>106778214
yeah bro let's just buy a stack of h100s, it's way better
>>
>>106778214
cope
>>
>>106778156
MB: 1300€ (a single socket mb would've been like 500 bucks cheaper)
CPU: 2600€
RAM: 3800€ (12x64GB Samsung M321R8GA0EB2-CCP)
It's quite a bit of money, not even considering what I had already lying around. It's probably not worth it if you're only looking for ERP at better speeds.
>>106778198
ECC RAM is a basic requirement for Epyc processors so they won't run with anything else. There's also no EXPO with these processors, so all those cheaper Threadripper ECC kits that run at 4800 natively with a potential EXPO boost to 6000 will only run at their native speed. You need DIMMs that do the higher speed natively, which adds to the price.
>>
>>106778404
My projections put it at about $12K for a worthwhile upgrade. You can get an EPYC 9124 for $900, but then it would be very slow. A good 8x96gb kit is around $4500.
>>
>>106778453
for 9005 series, what's minimum required CCDs to utilize all 12 channels again?
>>
File: 1740319565517440.png (87 KB, 858x530)
>>106778453
>EPYC 9124
You have to be careful here. The cheaper EPYC processors often can't make use of all their memory channels due to technical constraints. This means their bandwidth is going to be less than advertised.
For Epyc 9004 the cutoff is the 9334, which has those weird dual memory links, while the 9005s have the 9135 and 9175F, which come close to saturating their channels.
https://jp.fujitsu.com/platform/server/primergy/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
Here's some data on dual-socket builds Fujitsu has gathered in benchmarks. Check page 14.
>>
>>106778112
Yeah.
>>
>>106778486
No idea.
>>106778493
I was thinking of going with a threadripper pro anyway. I was just using the 9124 as an example.
>>
>>106778506
Don't quote me on it but I'm pretty sure I came across something saying that Threadripper has the same issue on the cheaper models while I was doing research on this retarded CCD bottleneck issue for my build. So be careful.
>>
Hope I'm in the right place. I've never used AI before. It's all I ever hear about online, so I assume I'm very late to this stuff. Can my gaming PC run AI stuff? It has a 5090 with 128gb of ram.
>>
>>106778554
That's not bad at all.
Download koboldcpp, go to huggingface, search for bartowski glm air gguf, download the Q6 or Q8 version.
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
Also, look for Silly Tavern to use as a frontend.
There's some information in the OP that you can use, even if a little outdated.
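Once koboldcpp is running you can also sanity-check it from Python before hooking up Silly Tavern (port and endpoint are the defaults as far as I remember, adjust if yours differ):
[code]
# quick check that the koboldcpp API is up and which model it loaded
import urllib.request
print(urllib.request.urlopen("http://127.0.0.1:5001/api/v1/model").read().decode())
[/code]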
>>
>>106778554
yes. you can run shit on that definitely. I think glm might be doable, read this and the last thread for details
>>
>>106778537
what CPU did you end up going with?
>>
>>106778576
wow fast response thanks for the info! will check those out.
>>
>>106778594
Epyc 9355. I was considering the 9135 but nobody could properly explain why exactly it's showing those speeds in the fujitsu benchmarks despite being a 2 CCD model going by its tiny L3 cache, which is why I didn't trust it. The 9175F was only like 200 bucks cheaper than the 9355 while only having 16 cores instead of 32 so I went for the latter.
If you're fine with 4800mhz RAM there's always those 600 euro chinese 9334 QS on ebay that other anons have used for their CPUMAXX builds.
>>
>>106778624
Those will get you on the right track, but you'll have to fiddle with stuff and learn as you go.
For example, you want to put all the layers of the model on the gpu but offload most/all expert tensors to the CPU/RAM. You'll figure out what that means by fucking around with the koboldcpp UI and reading their wiki.
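If you end up on plain llama.cpp instead of the koboldcpp UI, the same idea looks roughly like this; a sketch only, the gguf filename is a placeholder and the expert-tensor regex varies by model, so check llama-server --help:
[code]
# all layers on the GPU, expert (MoE) tensors overridden back to CPU/RAM
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "glm-air-Q6_K.gguf",        # placeholder filename
    "-ngl", "999",                    # offload every layer to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",    # ...but keep expert tensors in system RAM
    "-c", "16384",
])
[/code]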
>>
>>106778632
that sounds like a good choice. I saw some redditor somewhere reporting suspiciously low t/s numbers on the 9175F for deepseek at q4. could be wrong but personally I thought that CPU manages to enter compute-bound territory somehow
>>
>>106778537
I was thinking of at least a 48 core threadripper pro, that should be fine right?
>>
>>106778782
what is "fine" really? WRX90 supports 8 memory channels. that's fine compared to Ryzen (2), but EPYC is a little more fine (12)
>>
>>106778813
and btw it costs less, no reason to go MEMERIPPER when ayypic is there (unless youre a smelly gamer and care about high niggahertz)
>>
>>106778073
what drawing language was the output in? SVG?
>>
File: file.png (190 KB, 782x723)
Will I ever have a local LLM that doesn't have the left mental illness?
>>
>>106778554
you can run a lower quant of glm 4.6
>>
>>106778701
Yeah, I also came across somebody saying that too few cores might fail to saturate the memory channels or some shit in actual use. No clue if that's bullshit or not but it didn't help that the only hard testing I found for the 9175F was on those pointless asrock ddr5 mainboards that only have 8 ddr5 DIMM slots.
>>106778782
Sounds like it but I'm really not into the subject matter with Threadripper models. At least with Epyc, the biggest 4 CCD model should be the 9334/9335 ones which have 32 cores but with dual memory links to compensate, so their speeds are okay. Meanwhile everything else with 32+ cores has 8 or more CCDs. This means that with Epyc, you should be fine with any CPU that's 32 cores or above.
But I have no clue if this also applies to Threadripper or if there's any exotic shit going on here.
>>
>>106778840
SVG, yes.
>>
>>106778813
>>106778819
I did not really see much of a reason to go for EPYC because I would not be able to use 12 channel memory while also using 4 GPUs unless I use risers and a non standard case, which is what I currently have, and don't really want to do anymore. I have scoured the Internet for every single motherboard and none of them have everything that I need.
>>106778915
I see. Maybe I should take a closer look at the CCDs.
>>
File: 2028-7-09.png (473 KB, 745x437)
Current consumer grade hardware technology is already outdated.. I demand we leap frog in time NOW!
>>
>>106778979
chinese inference-focused machine that runs 800b/40a moe models at 50t/s for $3000 any day now for sure
>>
>>106778915
yeah. and btw, even with 8 channels, the guy was at like 9 t/s, and that's for q4 remember. sounds low to me
>>
>>106778855
What model is that?
>>
File: G2Pk9qxaYAAIFnx.jpg (22 KB, 540x354)
>>106777728
>>
File: file.jpg (120 KB, 1954x409)
>>106777578
>>106778073
>>
Bilibili now supports CN->EN video translation with voice cloning. Any guess what the model might be?
>>
people are waking up to benchmarks
https://www.reddit.com/r/LocalLLaMA/comments/1nx18ax/glm_46_is_a_fuking_amazing_model_and_nobody_can/
>>
>>106779140
RP isn't a real world use case for productive people.
>>
>>106779095
Yes, and?
>>
>>106779153
its talking about coding
>>
CoomBench (Vanilla/Extended)
>>
>>106779104
IndexTTS2
>>
Would I be able to fit a Blackwell Pro 6000 in this motherboard while in a case?
>>
>>106779140
Artificial Analysis has so much fucking wrong with it it's hilarious
>>
File: saaaaar.png (40 KB, 1290x222)
>>106779140
>>
File: 13-145-568-01.png.png (1.19 MB, 1280x1715)
>>106779220
forgot image
>>
>>106779230
The CPU cooler and RAM is definitely going to block you on this stupid mainboard layout
>>
File: iu[2].jpg (36 KB, 474x266)
>>106779230
That's a rack server motherboard right?
>>
>>106778923
Do you have the prompt? I don't think there's a standardized LMG SVG mikugen test prompt
>>
>>106779246
Thought so. So in other words, quad GPUs with 12 channel memory in a normal case is not possible.
>>106779256
Yes. I just hate my current server rack. I would prefer a workstation configuration.
>>
>>106779230
no, but you can use risers
>>
>>106779282
this
>>
>30b worse than 8b
moesisters our response?
>>
>>106779282
I could use risers in a normal case, would the RAM still interfere if I were to mount my GPUs horizontally?
>>
>>106779230
just use pcie risers
>>
>>106779306
>3b worse than 8b
dense sissies??
>>
File: migu2.png (33 KB, 400x400)
>>106779270
I don't have the prompt for that one, but for this it was "Draw me a Miku as SVG." The other one was similar.
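If anyone wants to run the same test against whatever they have loaded, something like this works against any OpenAI-compatible endpoint (URL/port are assumptions for a local server):
[code]
# send the mikugen prompt and dump whatever SVG comes back to a file
import json, re, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": "Draw me a Miku as SVG."}],
        "max_tokens": 4096,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    text = json.loads(r.read())["choices"][0]["message"]["content"]

m = re.search(r"<svg.*?</svg>", text, re.S)
if m:
    open("miku.svg", "w").write(m.group(0))
[/code]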
>>
>>106779323
if the active parameters is the only thing that matters, MoE would be useless
>>
>>106779317
In a rack?
>>
>>106778840
I look like this
>>
>>106779276
Quad GPUs without risers would be impossible on that mainboard anyway. The first and third from the right are immediately next to the next slot so any 2 slot card would block the one to the left of them.
>>
>>106779349
It is.
>>
File: 1728257099098402.png (729 KB, 1000x1000)
>>106779351
This is also called a rack
>>
>>106779306
- I like that it's the same org that produced both.
- Are the numbers close enough for the accuracy to be considered pretty much the same?
- In which case, at the inference end, it boils down to trading memory for tok/s.
- At the training end, maybe there is a difference in cost?
>>
>>106779349
Computation cost growth is quadratic wrt param size
>>
>>106779401
>>106779414
then what's the point of MoE?
>>
>>106779428
ignore the poor fag, he is trying to cope with his 70B
>>
/lmg/ Sirs — is it wise to install Linux? I'm afraid my performance will tank.
>>
>>106779025
Mistral AI + Nvidia
>>
>>106779095
Miku.sh was the ultimate mistake. it single-handedly instigated the genesis of the thinkslop we suffer today
>>
>>106779433
Anyone calling other people poorfags should be required to post their H100s with a timestamp, otherwise shut the fuck up
>>
>>106779512
you should commit sudoku for not already being on linux
>>
>>106779535
But I have other software that I need to use... I'm already 75% committed to installing Linux. Just need to wrap up some backups. Dual-booting is stupid, it's all or nothing btw.
Undervolting my gpu is the biggest issue but apparently that's "fine" too in Linux.
>>
>>106779530
>their h100s
thx for proving you're a retarded vramlet
>>
>>106779512
>is it wise to install Linux?
what benefit are you looking to get out of using linux? I wouldn't switch OS'es for no reason at all.
>I'm afraid my performance will tank.
given you have things set up correctly on windows/linux, the performance difference is negligible. in my experience getting llama.cpp to run at full-speed is easier on linux than it is on windows, but I am biased.
>>
>>106779559
I don't think there's anything inherently stupid about dual booting btw. do whatever makes you happy
>>
File: 23l73n8v4amf1.mp4 (1.43 MB, 960x540)
Enough is enough! I've had it with corpo scum Nvidia stalling progress via delivering bottom barrel RND scrap! If we don't have synth wifes performing backflips onto our cocks within the next 5 years it will be because of Nvidia! We need next generation hardware coming out every 6 months this is the change required if we are to pass the great filter we must hurry the fuck up! BEFORE THE NEXT CARRINGTON EVENT WIPES OUT OUR FUTURE!
>>
>>106779571
>benefit
Uhh, I love unix-like systems but haven't used anything like that at home since I had some SGI machines (irix) ages ago. At work, yes, but that's completely different as it's just about using certain software.
I can easily transfer my stuff to linux as most of my personal llama-server stuff is python based anyway.
>>106779591
Not per se, but I mean that eventually you'll spend more time on one of the two systems, and therefore dual-booting is sort of a fallacy and a waste of time.
>>
messing around with VibeVoice 7B on my rtx 5090. Input audio is cleaned up with Resemble Enhance and acon digital deverberate 3.
input audio
https://www.youtube.com/watch?v=1Jp4Ce8yStA
output file
https://vocaroo.com/1iNMH2wAVkPH
>>
>>106779619
full-send switch to linux, I wouldn't worry about performance issues. just make sure you choose a distro that plays nice with cuda drivers. anything with debian lineage will be easy to set up.
>>
>>106779632
It only has mid/high frequencies left.
>>
>>106779632
looks like he's talking on a phone, I like that effect but c'mon
>>
File: copium.png (178 KB, 400x388)
>>106779612
That won't happen again it was a fluke. We are safe.
>>
>>106779512
>Linux
What's your alternative? Microsoft only makes an ad delivery and behavioural analytics system now. That it can also run programs is incidental.
>>
>>106779512
Using linux can give you more headroom to fit models and run things faster. It's honestly ideal for local stuff and that's why I switched.
>>
>>106779560
Oh you don't have any? Or B100s? Or B200s? Or how about GB100s (you definitely don't have those)? Or even A100s? Then shut the fuck up and never call anyone else poor ever again
>>
>>106779843
I have a 1.5TB ddr5 setup, who the fuck is running shitty tiny models anymore
>>
>>106779852
p-poor fag
>>
Still no new model drops today? Fucking aye man. I need more models!
>>
ring-1t ggufs fucking when
>>
>>106779871
>spend $10K for 96GB to run shitty tiny model at 100 tks vs spending $20k to run the best cloud level models at 30tks+
hmmm...
>>
>>106779852
Literally anyone who doesn't want to drop several grand on advanced shivers down their spine, because it's a fucking HOBBY and there isn't a single company that's actually trying to make reasonably sized models for the average consumer and everyone who enables this dumbass "just make the models bigger! (so they can get marginally better benchmarks without any innovation or effort, so they can squeeze out more investor money)" idea is part of the problem and you need to stop sucking off corporate models that are not made for you and will soon be beyond your reach completely because they will just keep making them bigger until you can't even make a ram-based build that fits Q1-XXXS
>>
>>106779914
that is some crazy cope there. There is no free lunch, bigger = better.
>>
>>106779919
Imagine for a second, that they stopped innovating on computers in the 20th century, and just kept making them bigger and adding more shit on instead of making the parts smaller until they couldn't fit computers into buildings anymore. Do you realize how fucking stupid that is? Do you realize how fucking stupid you sound? Do you get it? Stop repeating buzzword phrases like a mindless drone and think for a second, dipshit
>>
>>106779914
The entire point of open source is to btfo openai and jewgle. Zucc gang is obviously too retarded to do it, so we need chinks. Thus chinese have high priority to make smollm work, because it's the ultimate blow to the west, which is fully dependent on winning the AI race. Our economy would instantly implode if that ever happens.
>>
>>106779939
imagine just adding more transistors, that would be crazy. /s
>>
>>106779883
ddr5 ain't getting 30tks even on empty context, you faggot
>>
>>106779939
If we had good alternative prospects then I expect we'd pursue them.
>>
>>106779961
>/s
They have a site for you over here, check it out >>>/reddit/
>>
>>106779973
12 channels faggot, this aint your desktop
>>
>>106779914
words words words, but every day that goes by is another day I'm getting good use out of my setup and enjoying life
I'm glad there are giant models, because it enables performance that appears otherwise impossible
>>
>>106779973
nta, but I think you could hit 30tk/s on sota with a well-spent $20k
>>
>>106779989
still not happening outside of your dream, maybe half that on empty context and a tenth of that with a character card
>>
>>106779718
>younger generations dont even use desktop computers at all, just their phones
>desktop market share rapidly shrinking as old users die off
>microsofts brilliant plan to fix it is to further drive all their old users away by turning windows into an AD and spy platform
oh pajeetsoft, nobody will miss you
>>
>>106779998
oh you "think", well i'm convinced
>>
>>106780006
you are just plain wrong
>>
>>106780006
>not having a threadripper
look at this coping poorfag
>>
>>106780020
>>106780028
post benchmarks, never seen more than 15tks on empty context but go ahead and prove me wrong with your richfag rig
>>
I just looked at threadripper CPU prices and had a shock...
WHAT THE FUCK ARE THOSE PRICES?
13K USD FOR A CPU? WTF?
>>
>>106780066
first time seeing workstation prices?
>>
File: Untitled.jpg (173 KB, 1409x635)
>>106780066
>96 cores
But that's a supercomputer...
>>
>>106780066
threadripper is overinflated vs epyc for the kind of performance you're looking to achieve. Check for chink QS/ES versions on eBay if you feel like rolling the dice
>>
>>106779914
It's just a hobby for early adopters but I don't see why it couldn't grow to be a media giant like film/vidya
>>
>>106780066
Could try looking for QS and ES chips on ebay?
Though the 9__5 QS/ES chips looked more gimped relative to final silicon than the 9__4 QS/ES did.
>>
>>106780155
https://www.reddit.com/r/nvidia/comments/1mf0yal/
we can't let reddit win bros, wheres our richanons?
>>
>>106780155
>incredible
>colossal
>cooler no included
>>
What happened to "safety" at OpenAI?
https://files.catbox.moe/as7xpq.mp4
>>
>>106780173
>2xL40S, 2x6000 ADA
That's a poorfag build.
>>
>>106780194
>safety
You're after safety... for machines?
>>
>>106780204
haha yeah even my rig is better, i sure hope anons in here don't have less than that
>>
>>106780225
>for machines?
what about humans?
https://files.catbox.moe/nr3fk0.mp4
>>
>>106780188
It needs a water bucket.
>>
>>106780264
Try other figures from history.
>>
>>106780173
>2xL40S, 2x6000 ADA, 4xRTX 6000 PRO
how much money is that for the GPUs alone?
>>
What would be the best local model for ERP (and other tasks, I don't want a horny encyclopedia or writing assistant, or maybe I do now that I think about it) if I have 12GB of VRAM (RTX 3060)?
>>
>>106780393
As a friend of mine would say as he ran into people who had no idea what they were getting into: "you're fucked"
>>
>>106780393
Nemo
>>
I NEED
NEW
MODELS!!!!
>>
>>106780504
train your own
>>
>>106780511
That's the best way to lose interest in using the models, actually.
>>
>>106780466
Oh I've already been there. I simply took a break and now that I'm back I prefer to ask rather than trying every new model one by one.

>>106780469
I can't find one specific model named "Nemo". Is it on Huggingface?
>>
>>106780573
>Is it on Huggingface?
Is it in the guide in the OP?
>>
>>106780504
You need to spend time engaged in a hobby that demands work be put in to obtain a reward.
>>
>>106780504
https://huggingface.co/DavidAU
>>
>t-there's no way they're running those models locally, noooooooooooooooooooooo... how will I cope?
>>
>>106780504
you don't. change your system prompt instead
>>
>>106780618
holy slopping hell of 8B 4B unproductivity
>>
https://files.catbox.moe/diri53.mp4
bruh...
>>
>>106780720
>8B 4B
You are small time, check this out.
>https://huggingface.co/DavidAU/Qwen2.5-Godzilla-Coder-51B
>>
https://www.reddit.com/r/SillyTavernAI/comments/1nuhidb/your_opinions_on_glm46/
>>
>>106780768
Mhm, interdasting...but for coding I really only consider the top5 api options and dont fuck with local. unfortunately that's required, unless you want to spend more time tard wrangling than vibing.
>>
File: file.png (44 KB, 657x727)
>>106778073
Huge improvement compared to past models. Must have put some data in there. How does it do when asked to draw using PIL or matplotlib?
Olds for reference:
>>102080804
>>102079522
>>102080359
>>102082930
>>
>>106780602
Well, the guide references "nemo 12b instruct gguf Q4", for which the first result on HF is https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct, but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/
>>
>>106780887
>https://rentry.org/recommended-models
>https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
>>
>>106780887
>but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/
That's what we in the biz call a "happy fluke".
>>
File: qwen uwu miku.png (54 KB, 919x1183)
>>106780879
This was Qwen qwq, SVG
>>
>switch to wsl ubuntu
>constantly OOMs with ooba on the same settings as before
yeah linux is useless
>>
File: qwencodermikusvg.png (12 KB, 380x578)
qwen coder's naive attempt
>>
https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025/
>The CUDA backend for llama.cpp can now run on ZLUDA. We've done some preliminary measurements and found the performance to be within range of the results measured by Phoronix on ROCm (Latest Open-Source AMD Improvements Allowing For Better Llama.cpp AI Performance Against Windows 11 - Phoronix). We're interested in your feedback, if it doesn't work or you are getting worse performance than with ROCm, please share in the issues.
>>106781053
What does wsl stand for again?
>>
Maybe it's ooba?
>>
why is stuff like flash attention and triton so slow to be added to windows, there is a trillion dollars in ai atm
>>
>>106781061
>What does wsl stand for again?
windows subsystem for linux

>ZLUDA
Huh, I thought that project was dead.

>>106781103
>there is a trillion dollars in ai atm
And almost nothing of it being invested in these software projects.
>>
>>106780194
>>106780264
You think automated systems are smart enough to tell apart some indirect political point from a movie?
>>
>>106781103
Those are corpo projects and corpos by and large only care about datacenter use.
>>
>>106781060
I don't like this Miku
>>
>>106781053
ooba is an antiquated piece of shit
just use lm studio
>>
File: glm-miku-horrors.svg.png (68 KB, 1697x1127)
>>106778073
Cute!
picrel Q3_K_M hmm
>>
I like feet
>>
>>106778073
>>106781197
What's the exact prompt?
>>
>>106781217
What kind of feet? Remove your socks, look down, and coom. That is if you have feet.
>Can a person without legs wash their feet?
>>
File: Azula-Test.png (2.76 MB, 1644x812)
>>106777408
Good evening /lmg/. Made yet another slop tune. This time trained on an entire 4chan board :)

https://huggingface.co/AiAF/bf16_Merged-11268_gemma-2-2b-it-co-sft-qlora

Dataset used: https://huggingface.co/datasets/AiAF/co-sft-dataset
>>
>>106781317
cool, what made you pick /co/?
>>
>>106781053
>switch to wsl ubuntu
Why would you do this? Linux is only faster than Windows because it doesn't have massive amounts of bloatware running in the background; under WSL you're still on Windows, so you're not going to get any extra performance that way.
>>
>>106781408
On a whim. I almost did /r9k/ at first but I felt like training it on a blue board's posts instead. Surprisingly, even at over 11,000 steps the training loss hasn't plateaued yet and the eval loss still continues to drop. Maybe after the 10th epoch I'll call it quits, merge that one, and then pick another board. Got any recommendations? By the way, the original source dataset was ripped from this repo if anyone's interested:

https://huggingface.co/datasets/lesserfield/4chan-datasets
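
For anyone curious, the skeleton of a QLoRA SFT run like this looks roughly as follows (my guesses at the knobs, not the actual training script; repo ids taken from the links above, and the exact trainer API shifts between trl versions):
[code]
# rough QLoRA sketch: 4-bit base model + LoRA adapters on the attention projections
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "google/gemma-2-2b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb,
                                             device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))
model.print_trainable_parameters()

ds = load_dataset("AiAF/co-sft-dataset")  # the dataset linked above
# ...then hand model/tok/ds to trl's SFTTrainer (or a plain Trainer loop)
[/code]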
>>
>>106781452
>Got any recommendations?
i'd just pick the most schizo board desu though not sure which that'd be
>>
So how much vram do I need to future proof my ai generation for the next five years? I really don't want to spend 10k on an Ada 6000 pro just to be outclassed next year. I had a 3060 for 5 years and a 4070 for 2 years, and I am thinking I just go with the 5090 desktop since the others were laptops. I want to be able to generate video and train ckpts and Loras and maybe even train video ckpts. Would it make sense to get a desktop that can hold several cards and just upgrade by buying the latest again in a couple of years? Or just go full retard and get a system with 98gb vram?
>>
>>106781500
/vg/ at your service
>>
>>106781502
>next five years
lol we can't even predict next year
>>
>>106781514
Fair point.
>>
File: mikusvgprobs.png (176 KB, 1455x1487)
>>106781257
Make something up? interesting to observe the sampling
>>
>>106781502
The best thing you can do to "future proof" is get as much vram as you can on a single relatively modern card and stack those as time goes on.
>>
>>106781592
there is this thing called electricity, good luck running enough cards on residential power, if you really want to run these local your best bet is a server or a mac
>>
>>106781502
qrd on the image/video-gen space?
more frames + greater res = need more vram for reasonable perf ?
or does it top out at 8gb or something regardless of what you're doing?
>>
>>106781185
>ooba is open sourced
>lm studio is closed source
Easy choice. Now go buy an ad.
>>
>>106781502
Get a chink 4090D or an rtx pro 6000. I've seen some anons on /ldg/ complain that the 32gb on a 5090 isn't enough for good video gen.
>Would it make sense to get a desktop that can hold several cards
Maybe for LLMs but if you don't care then one card is enough.
>>
>>106781715
do NOT get a 4000 series card, 5000+ has too many speed ups these days to not have.
>>
>>106781592
So basically stack 3090s since it's GDDR6X. And 4 of them is 96gb. I guess my next question is: does the memory type matter for generating and training?
>>
>>106781715
isn't good enough or they're just pathetic brain fried zoomers who can't wait a few more seconds for a gen? which one is it
>>
>>106781726
3090s lock you to slow text gen, for video gen a 5090 is like 8x faster, 4x faster than a 4090
>>
File: glm-miku6.svg.png (67 KB, 908x917)
>>106781197
a little more >prompt engineering and top_p 0.98
>>
>>106781611
I don't live in a third world cunt
>>
>>106781755
>let me just run 12 400W cards off a single circuit
I sure hope you are not retarded enough to run a single system off of multiple circuits anon... or you are going to eventually find out why that is a bad idea and lose all your gpus
>>
>>106781617
Asking the wrong dude. I'm the one with the questions, I just want to train and make HQ visuals
>>
>>106781725
Like what? Sage attention in general trades quality for speed btw.
>>106781739
Not enough vram to make long videos or ones at a high resolution and you can't run wan without quanting it at 32gb. I don't know the specifics about training, but I imagine you need more vram and that anon wants to future proof. Models aren't going to get smaller.
>>
File: hotelmining.jpg (3.02 MB, 1560x9600)
>>106781611
>>106781767
>not splicing someone else's feed to run more GPUs
>>
>>106781452
>Got any recommendations?
/v/ or /vg/
>>
>>106781776
sage 3.1 +nunchaku, plus soon dc gen, there are other ones as well, and no the difference in quality is / will be almost nothing
>>
>>106781780
the point is that he would have to have a commercial electric panel put in / have his house rewired or at least a server room + you will need cooling
>>
>>106781452
Cool
>>
>>106781767
12 cards? At max my idea was 4 3090s. I can see where your concern is, and thanks for the clarification despite my snark. I will take your point into consideration as well. So thanks again.
>>
>>106781828
96GB is nothing these days though, there is no model worth using cept maybe glm air that would fit
>>
>>106781798
>sage 3.1
Sage attention 2 definitely lowered the quality of my gens with flux/chroma, it's not lossless. Sage 3 should be worse since it uses FP4.
>nunchaku
Equivalent or slightly better than a Q4 quant.
Don't know about the others but there's always a trade off, that includes lightning loras too.
>>
>>106781841
>Sage attention 2 definitely lowered the quality of my gens with flux/chroma
the difference is negligible and could be fixed with like 1 extra step and be much faster still
>>
Hey friends, this Cydonia is lit AF frfr

https://huggingface.co/BeaverAI/Cydonia-24B-v4q-GGUF/tree/main

Try it out! Will release it soon
>>
>>106781841
same with lightning loras, you don't use them for 100% of steps; you establish motion without them, then use them for the steps after, which greatly speeds it up
>>
>>106781835
Anon, what you smoking? 96gb is enough to do 99.95% of everything you could want to do. Why you fudding?
>>
>>106781848
>the difference is negligible
Maybe if you're running flux/chroma quanted but I run it at bf16 and there's also a noticeable difference between q8/fp8 and bf16. This is with complex prompts but the point stands. It's not negligible if the model starts dropping details.


