/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106769660 & >>106762831

►News
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106769660

--Papers:
>106774512 >106774610 >106774669 >106774797
--Frustrations with mod approval and sharing a character customization addon:
>106770110 >106770207 >106770482 >106770215 >106770262 >106770425 >106771674 >106771801 >106771993 >106772136 >106772401 >106772464
--GLM 4.6 struggles with speed and knowledge compared to K2 despite smaller size:
>106771605 >106771704 >106772093 >106772555 >106772685 >106771712 >106771798 >106771958 >106772080 >106772191
--GLM 4.6 model erratic behavior and potential quantization/formatting issues:
>106770753 >106770772 >106770827 >106771431 >106772929 >106772970 >106773019 >106773025 >106771510 >106771662
--Local model performance benchmarks and hardware optimization discussions:
>106773216 >106773254 >106773280 >106773320 >106773366 >106773493 >106776426
--Optimizing chat system formatting for AI interactions:
>106776741 >106776825 >106776959 >106777047 >106777114
--GLM 4.6's high VRAM consumption at large context lengths:
>106773651 >106773712
--VRAM management challenges for large models on 24GB GPUs:
>106774461 >106774484
--GLM-4.6 model quantization performance comparison:
>106770710 >106770745
--2d anime image generation hardware budget and NPU software limitations:
>106769831 >106769845 >106769847 >106769852 >106769866 >106769947 >106770102
--Replacing llama.cpp binaries with CUDA-optimized builds for GLM 4.6 via ooba's UI:
>106773113
--Recommended RAM for local LLMs: 128GB minimum, 192GB dual-channel, >500GB server options:
>106776386 >106776395 >106776400 >106776494 >106777198 >106777256 >106777308 >106777351
--A100 pricing vs consumer GPUs and commercial licensing considerations:
>106776566 >106776597 >106776653 >106776693
--Logs:
>106769725 >106770080
--Miku (free space):
>106769691 >106770398 >106770451 >106770366 >106770215

►Recent Highlight Posts from the Previous Thread: >>106769663

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Not Migu, abandon the thread
>>
Is my DDR4 really crippling my performance on GLM 4.6 that much?
>>
new model when
>>
>>106777694
This. It's been over 24 hours since the last new model drop. Local is dead.
>>
Did anyone manage to get gpt-oss-120b to ERP properly?
>>
>>106777689
https://www.servethehome.com/guide-ddr-ddr2-ddr3-ddr4-and-ddr5-bandwidth-by-generation/
>>
>>106777408
whomst is this purple slut?
>>
>>106777689
The biggest crippling factor is memory channels. A shitty ddr4-2400 epyc with 8 channels is going to run much faster than a ddr5-6000 gayman board with 2 channels.
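Rough napkin math for the theoretical peak (each populated channel is 64 bits wide, so bytes/s ≈ channels × MT/s × 8; real-world numbers land well below this):
[code]
# theoretical peak bandwidth: channels * MT/s * 8 bytes per 64-bit channel
def peak_gb_s(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

print(peak_gb_s(8, 2400))   # 8ch DDR4-2400  -> 153.6 GB/s
print(peak_gb_s(2, 6000))   # 2ch DDR5-6000  ->  96.0 GB/s
print(peak_gb_s(12, 6400))  # 12ch DDR5-6400 -> 614.4 GB/s
[/code]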
>>
>>106777726
So then yes. I have octo-channel DDR4 2400MT/s. I need a next gen EPYC now.
>>106777777
Nice digits. What you are describing is exactly what I have.
>>
>>106777777
Checked.
>>
>>106777728
purple teto
>>
Do I need to set the Oobabooga parameters in addition to the Silly Tavern ones?
>>
Hey guys. Is 4.6 really that yappy with its thinking? I've been trying an IQ1 quant and the thinking is like 2 paragraphs most of the time in RP. Did the quanting kill its reasoning capability?
>>
>>106777689
>numbers you getting vs numbers ddr5 people are getting
>you happy with your numbers?
>you got money to go to ddr5?
>>
>>106777808
no the sillytavern ones will take precedence when the user prompt is submitted
>>
>>106777858
Thanks!
>>
>>106777852
7.5t/s vs 15t/s
No
Also no
>>
File: 1749143747579428.png (232 KB, 2016x1374)
>>106777781
I made this exact switch a couple of months ago. Here are some rudimentary bandwidth tests I did at the time. The speed gain isn't that much if you're only keeping the experts on CPU and the rest on GPU, but it's still a huge jump.
>>
>>106777725
Define "proper RP"
>>
whats the flavour of the month model for vramlets (16 gbs)
>>
File: glm_miku.png (27 KB, 400x500)
GLM-chan drew migu.
>>
>>106777728
Utane Uta. Never heard of her.
>>
>>106777850
It's about this yapping: >>106772093
>>
is there anything I can run on 6gb vram 32gb system ram?
>>
>>106778073
4.6? That's pretty fucking good. These things have come a long way in 2 years.
>>
>>106778105
Mistral Nemo Instruct Q4KS pretty slowly, Qwen 3 30B A3B not that slowly.
>>
>>106777996
Shit. That is the exact data I was looking for. How much did you spend on the upgrade?
>>
>>106778105
Anything up to 30B, really.
>>
>>106777996
>ddr5-6400 x12
can you even run them with expo/xmp bro? I guess they're running at the standard JEDEC speed, no? is it ecc?
>>
>>106777996
Glad I didn't fall for the cpumaxxing meme.
>>
>>106778214
yeah bro let's just buy a stack of h100s, it's way better
>>
>>106778214
cope
>>
>>106778156
MB: 1300€ (a single socket mb would've been like 500 bucks cheaper)
CPU: 2600€
RAM: 3800€ (12x64GB Samsung M321R8GA0EB2-CCP)
It's quite a bit of money, not even considering what I had already lying around. It's probably not worth it if you're only looking for ERP at better speeds.
>>106778198
ECC RAM is a basic requirement for Epyc processors so they won't run with anything else. There's also no EXPO with these processors, so all those cheaper Threadripper ECC kits that run at 4800 natively with a potential EXPO boost to 6000 will only run at their native speed. You need DIMMs that do the higher speed natively, which adds to the price.
>>
>>106778404
My projections put it at about $12K for a worthwhile upgrade. You can get an EPYC 9124 for $900, but then it would be very slow. A good 8x96gb kit is around $4500.
>>
>>106778453
for 9005 series, what's minimum required CCDs to utilize all 12 channels again?
>>
File: 1740319565517440.png (87 KB, 858x530)
>>106778453
>EPYC 9124
You have to be careful here. The cheaper EPYC processors often can't make use of all their memory channels due to technical constraints. This means their bandwidth is going to be less than advertised.
For Epyc 9004 the cutoff is the 9334, which has those weird dual memory links, while the 9005s have the 9135 and 9175F, which come close to saturating their channels.
https://jp.fujitsu.com/platform/server/primergy/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
Here's some data on dual-socket builds Fujitsu has gathered in benchmarks. Check page 14.
>>
>>106778112
Yeah.
>>
>>106778486
No idea.
>>106778493
I was thinking of going with a threadripper pro anyway. I was just using the 9124 as an example.
>>
>>106778506
Don't quote me on it but I'm pretty sure I came across something saying that Threadripper has the same issue on the cheaper models while I was doing research on this retarded CCD bottleneck issue for my build. So be careful.
>>
Hope I'm in the right place. I've never used AI before. It's all I ever hear about online, so I assume I'm very late to this stuff. Can my gaming PC run AI stuff? It has a 5090 with 128gb of ram.
>>
>>106778554
That's not bad at all.
Download koboldcpp, go to huggingface, search for bartowski glm air gguf, download the Q6 or Q8 version.
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
Also, look for Silly Tavern to use as a frontend.
There's some information in the OP that you can use, even if a little outdated.
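Once koboldcpp is running you can also sanity-check it from Python before hooking up Silly Tavern (port and endpoint are the defaults as far as I remember, adjust if yours differ):
[code]
# quick check that the koboldcpp API is up and which model it loaded
import urllib.request
print(urllib.request.urlopen("http://127.0.0.1:5001/api/v1/model").read().decode())
[/code]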
>>
>>106778554
yes. you can run shit on that definitely. I think glm might be doable, read this and the last thread for details
>>
>>106778537
what CPU did you end up going with?
>>
>>106778576
wow fast response thanks for the info! will check those out.
>>
>>106778594
Epyc 9355. I was considering the 9135 but nobody could properly explain why exactly it's showing those speeds in the fujitsu benchmarks despite being a 2 CCD model going by its tiny L3 cache, which is why I didn't trust it. The 9175F was only like 200 bucks cheaper than the 9355 while only having 16 cores instead of 32 so I went for the latter.
If you're fine with 4800mhz RAM there's always those 600 euro chinese 9334 QS on ebay that other anons have used for their CPUMAXX builds.
>>
>>106778624
Those will get you on the right track, but you'll have to fiddle with stuff and learn as you go.
For example, you want to put all the layers of the model on the gpu but offload most/all expert tensors to the CPU/RAM. You'll figure out what that means by fucking around with the koboldcpp UI and reading their wiki.
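If you end up on plain llama.cpp instead of the koboldcpp UI, the same idea looks roughly like this; a sketch only, the gguf filename is a placeholder and the expert-tensor regex varies by model, so check llama-server --help:
[code]
# all layers on the GPU, expert (MoE) tensors overridden back to CPU/RAM
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "glm-air-Q6_K.gguf",        # placeholder filename
    "-ngl", "999",                    # offload every layer to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",    # ...but keep expert tensors in system RAM
    "-c", "16384",
])
[/code]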
>>
>>106778632
that sounds like a good choice. I saw some redditor somewhere reporting suspiciously low t/s numbers on the 9175F for deepseek at q4. could be wrong but personally I thought that CPU manages to enter compute-bound territory somehow
>>
>>106778537
I was thinking of at least a 48 core threadripper pro, that should be fine right?
>>
>>106778782
what is "fine" really? WRX90 supports 8 memory channels. that's fine compared to Ryzen (2), but EPYC is a little more fine (12)
>>
>>106778813
and btw it costs less, no reason to go MEMERIPPER when ayypic is there (unless youre a smelly gamer and care about high niggahertz)
>>
>>106778073
what drawing language was the output in? SVG?
>>
File: file.png (190 KB, 782x723)
Will I ever have a local LLM that doesn't have the left mental illness?
>>
>>106778554
you can run a lower quant of glm 4.6
>>
>>106778701
Yeah, I also came across somebody saying that too few cores might fail to saturate the memory channels or some shit in actual use. No clue if that's bullshit or not but it didn't help that the only hard testing I found for the 9175F was on those pointless asrock ddr5 mainboards that only have 8 ddr5 DIMM slots.
>>106778782
Sounds like it but I'm really not into the subject matter with Threadripper models. At least with Epyc, the biggest 4 CCD model should be the 9334/9335 ones which have 32 cores but with dual memory links to compensate, so their speeds are okay. Meanwhile everything else with 32+ cores has 8 or more CCDs. This means that with Epyc, you should be fine with any CPU that's 32 cores or above.
But I have no clue if this also applies to Threadripper or if there's any exotic shit going on here.
>>
>>106778840
SVG, yes.
>>
>>106778813
>>106778819
I did not really see much of a reason to go for EPYC because I would not be able to use 12 channel memory while also using 4 GPUs unless I use risers and a non standard case, which is what I currently have, and don't really want to do anymore. I have scoured the Internet for every single motherboard and none of them have everything that I need.
>>106778915
I see. Maybe I should take a closer look at the CCDs.
>>
File: 2028-7-09.png (473 KB, 745x437)
Current consumer grade hardware technology is already outdated.. I demand we leap frog in time NOW!
>>
>>106778979
chinese inference-focused machine that runs 800b/40a moe models at 50t/s for $3000 any day now for sure
>>
>>106778915
yeah. and btw, even with 8 channels, the guy was at like 9 t/s, and that's for q4 remember. sounds low to me
>>
>>106778855
What model is that?
>>
File: G2Pk9qxaYAAIFnx.jpg (22 KB, 540x354)
>>106777728
>>
File: file.jpg (120 KB, 1954x409)
>>106777578
>>106778073
>>
Bilibili now supports CN->EN video translation with voice cloning. Any guess what the model might be?
>>
people are waking up to benchmarks
https://www.reddit.com/r/LocalLLaMA/comments/1nx18ax/glm_46_is_a_fuking_amazing_model_and_nobody_can/
>>
>>106779140
RP isn't a real world use case for productive people.
>>
>>106779095
Yes, and?
>>
>>106779153
its talking about coding
>>
CoomBench (Vanilla/Extended)
>>
>>106779104
IndexTTS2
>>
Would I be able to fit a Blackwell Pro 6000 in this motherboard while in a case?
>>
>>106779140
Artificial Analysis has so much fucking wrong with it it's hilarious
>>
File: saaaaar.png (40 KB, 1290x222)
>>106779140
>>
File: 13-145-568-01.png.png (1.19 MB, 1280x1715)
>>106779220
forgot image
>>
>>106779230
The CPU cooler and RAM is definitely going to block you on this stupid mainboard layout
>>
File: iu[2].jpg (36 KB, 474x266)
>>106779230
That's a rack server motherboard right?
>>
>>106778923
Do you have the prompt? I don't think there's a standardized LMG SVG mikugen test prompt
>>
>>106779246
Thought so. So in other words, quad GPUs with 12 channel memory in a normal case is not possible.
>>106779256
Yes. I just hate my current server rack. I would prefer a workstation configuration.
>>
>>106779230
no, but you can use risers
>>
>>106779282
this
>>
>30b worse than 8b
moesisters our response?
>>
>>106779282
I could use risers in a normal case, would the RAM still interfere if I were to mount my GPUs horizontally?
>>
>>106779230
just use pcie risers
>>
>>106779306
>3b worse than 8b
dense sissies??
>>
File: migu2.png (33 KB, 400x400)
>>106779270
I don't have the prompt for that one, but for this it was "Draw me a Miku as SVG." The other one was similar.
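If anyone wants to run the same test against whatever they have loaded, something like this works against any OpenAI-compatible endpoint (URL/port are assumptions for a local server):
[code]
# send the mikugen prompt and dump whatever SVG comes back to a file
import json, re, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": "Draw me a Miku as SVG."}],
        "max_tokens": 4096,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    text = json.loads(r.read())["choices"][0]["message"]["content"]

m = re.search(r"<svg.*?</svg>", text, re.S)
if m:
    open("miku.svg", "w").write(m.group(0))
[/code]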
>>
>>106779323
if the active parameters is the only thing that matters, MoE would be useless
>>
>>106779317
In a rack?
>>
>>106778840
I look like this
>>
>>106779276
Quad GPUs without risers would be impossible on that mainboard anyway. The first and third from the right are immediately next to the next slot so any 2 slot card would block the one to the left of them.
>>
>>106779349
It is.
>>
File: 1728257099098402.png (729 KB, 1000x1000)
>>106779351
This is also called a rack
>>
>>106779306
- I like that it's the same org that produced both.
- Are the numbers close enough for the accuracy to be considered pretty much the same?
- In which case, at the inference end, it boils down to trading memory for tok/s.
- At the training end, maybe there is a difference in cost?
>>
>>106779349
Computation cost growth is quadratic wrt param size
>>
>>106779401
>>106779414
then what's the point of MoE?
>>
>>106779428
ignore the poor fag, he is trying to cope with his 70B
>>
/lmg/ Sirs — is it wise to install Linux? I'm afraid my performance will tank.
>>
>>106779025
Mistral AI + Nvidia
>>
>>106779095
Miku.sh was the ultimate mistake. it single-handedly instigated the genesis of the thinkslop we suffer today
>>
>>106779433
Anyone calling other people poorfags should be required to post their H100s with a timestamp, otherwise shut the fuck up
>>
>>106779512
you should commit sudoku for not already being on linux
>>
>>106779535
But I have other software that I need to use... I'm already 75% committed to installing Linux. Just need to wrap up some backups. Dual-booting is stupid, it's all or nothing btw.
Undervolting my gpu is the biggest issue but apparently that's "fine" too in Linux.
>>
>>106779530
>their h100s
thx for proving you're a retarded vramlet
>>
>>106779512
>is it wise to install Linux?
what benefit are you looking to get out of using linux? I wouldn't switch OS'es for no reason at all.
>I'm afraid my performance will tank.
given you have things set up correctly on windows/linux, the performance difference is negligible. in my experience getting llama.cpp to run at full-speed is easier on linux than it is on windows, but I am biased.
>>
>>106779559
I don't think there's anything inherently stupid about dual booting btw. do whatever makes you happy
>>
File: 23l73n8v4amf1.mp4 (1.43 MB, 960x540)
Enough is enough! I've had it with corpo scum Nvidia stalling progress via delivering bottom barrel RND scrap! If we don't have synth wifes performing backflips onto our cocks within the next 5 years it will be because of Nvidia! We need next generation hardware coming out every 6 months this is the change required if we are to pass the great filter we must hurry the fuck up! BEFORE THE NEXT CARRINGTON EVENT WIPES OUT OUR FUTURE!
>>
>>106779571
>benefit
Uhh, I love unix-like systems but haven't used anything like that at home since I had some SGI machines (irix) ages ago. At work, yes, but that's completely different as it's just about using certain software.
I can easily transfer my stuff to linux as most of my personal llama-server stuff is python based anyway.
>>106779591
Not per se, but I mean that eventually you'll spend more time on one of the two systems, and therefore dual-booting is sort of a fallacy and a waste of time.
>>
messing around with VibeVoice 7B on my rtx 5090. Input audio is cleaned up with Resemble Enhance and acon digital deverberate 3.
input audio
https://www.youtube.com/watch?v=1Jp4Ce8yStA
output file
https://vocaroo.com/1iNMH2wAVkPH
>>
>>106779619
full-send switch to linux, I wouldn't worry about performance issues. just make sure you choose a distro that plays nice with cuda drivers. anything with debian lineage will be easy to set up.
>>
>>106779632
It only has mid/high frequencies left.
>>
>>106779632
looks like he's talking on a phone, I like that effect but c'mon
>>
File: copium.png (178 KB, 400x388)
>>106779612
That won't happen again it was a fluke. We are safe.
>>
>>106779512
>Linux
What's your alternative? Microsoft only makes an ad delivery and behavioural analytics system now. That it can also run programs is incidental.
>>
>>106779512
Using linux can give you more headroom to fit models and run things faster. It's honestly ideal for local stuff and that's why I switched.
>>
>>106779560
Oh you don't have any? Or B100s? Or B200s? Or how about GB100s (you definitely don't have those)? Or even A100s? Then shut the fuck up and never call anyone else poor ever again
>>
>>106779843
I have a 1.5TB ddr5 setup, who the fuck is running shitty tiny models anymore
>>
>>106779852
p-poor fag
>>
Still no new model drops today? Fucking aye man. I need more models!
>>
ring-1t ggufs fucking when
>>
>>106779871
>spend $10K for 96GB to run shitty tiny model at 100 tks vs spending $20k to run the best cloud level models at 30tks+
hmmm...
>>
>>106779852
Literally anyone who doesn't want to drop several grand on advanced shivers down their spine, because it's a fucking HOBBY and there isn't a single company that's actually trying to make reasonably sized models for the average consumer and everyone who enables this dumbass "just make the models bigger! (so they can get marginally better benchmarks without any innovation or effort, so they can squeeze out more investor money)" idea is part of the problem and you need to stop sucking off corporate models that are not made for you and will soon be beyond your reach completely because they will just keep making them bigger until you can't even make a ram-based build that fits Q1-XXXS
>>
>>106779914
that is some crazy cope there. There is no free lunch, bigger = better.
>>
>>106779919
Imagine for a second, that they stopped innovating on computers in the 20th century, and just kept making them bigger and adding more shit on instead of making the parts smaller until they couldn't fit computers into buildings anymore. Do you realize how fucking stupid that is? Do you realize how fucking stupid you sound? Do you get it? Stop repeating buzzword phrases like a mindless drone and think for a second, dipshit
>>
>>106779914
The entire point of open source is to btfo openai and jewgle. Zucc gang is obviously too retarded to do it, so we need chinks. Thus chinese have high priority to make smollm work, because it's the ultimate blow to the west, which is fully dependent on winning the AI race. Our economy would instantly implode if that ever happens.
>>
>>106779939
imagine just adding more transistors, that would be crazy. /s
>>
>>106779883
ddr5 ain't getting 30tks even on empty context, you faggot
>>
>>106779939
If we had good alternative prospects then I expect we'd pursue them.
>>
>>106779961
>/s
They have a site for you over here, check it out >>>/reddit/
>>
>>106779973
12 channels faggot, this aint your desktop
>>
>>106779914
words words words, but every day that goes by is another day I'm getting good use out of my setup and enjoying life
I'm glad there are giant models, because it enables performance that appears otherwise impossible
>>
>>106779973
nta, but I think you could hit 30tk/s on sota with a well-spent $20k
>>
>>106779989
still not happening outside of your dream, maybe half that on empty context and a tenth of that with a character card
>>
>>106779718
>younger generations dont even use desktop computers at all, just their phones
>desktop market share rapidly shrinking as old users die off
>microsofts brilliant plan to fix it is to further drive all their old users away by turning windows into an AD and spy platform
oh pajeetsoft, nobody will miss you
>>
>>106779998
oh you "think", well i'm convinced
>>
>>106780006
you are just plain wrong
>>
>>106780006
>not having a threadripper
look at this coping poorfag
>>
>>106780020
>>106780028
post benchmarks, never seen more than 15tks on empty context but go ahead and prove me wrong with your richfag rig
>>
I just looked at threadripper CPU prices and had a shock...
WHAT THE FUCK ARE THOSE PRICES?
13K USD FOR A CPU? WTF?
>>
>>106780066
first time seeing workstation prices?
>>
File: Untitled.jpg (173 KB, 1409x635)
>>106780066
>96 cores
But that's a supercomputer...
>>
>>106780066
threadripper is overinflated vs epyc for the kind of performance you're looking to achieve. Check for chink QS/ES versions on eBay if you feel like rolling the dice
>>
>>106779914
It's just a hobby for early adopters but I don't see why it couldn't grow to be a media giant like film/vidya
>>
>>106780066
Could try looking for QS and ES chips on ebay?
Though the 9__5 QS/ES chips looked more gimped relative to final silicon than the 9__4 QS/ES did.
>>
>>106780155
https://www.reddit.com/r/nvidia/comments/1mf0yal/
we can't let reddit win bros, wheres our richanons?
>>
>>106780155
>incredible
>colossal
>cooler no included
>>
What happened to "safety" at OpenAI?
https://files.catbox.moe/as7xpq.mp4
>>
>>106780173
>2xL40S, 2x6000 ADA
That's a poorfag build.
>>
>>106780194
>safety
You're after safety... for machines?
>>
>>106780204
haha yeah even my rig is better, i sure hope anons in here don't have less than that
>>
>>106780225
>for machines?
what about humans?
https://files.catbox.moe/nr3fk0.mp4
>>
>>106780188
It needs a water bucket.
>>
>>106780264
Try other figures from history.
>>
>>106780173
>2xL40S, 2x6000 ADA, 4xRTX 6000 PRO
how much money is that for the GPUs alone?
>>
What would be the best local model for ERP (and other tasks, I don't want a horny encyclopedia or writing assistant, or maybe I do now that I think about it) if I have 12GB of VRAM (RTX 3060)?
>>
>>106780393
As a friend of mine would say as he ran into people who had no idea what they were getting into: "you're fucked"
>>
>>106780393
Nemo
>>
I NEED
NEW
MODELS!!!!
>>
>>106780504
train your own
>>
>>106780511
That's the best way to lose interest in using the models, actually.
>>
>>106780466
Oh I've already been there. I simply took a break and now that I'm back I prefer to ask rather than trying every new model one by one.

>>106780469
I can't find one specific model named "Nemo". Is it on Huggingface?
>>
>>106780573
>Is it on Huggingface?
Is it in the guide in the OP?
>>
>>106780504
You need to spend time engaged in a hobby that demands work be put in to obtain a reward.
>>
>>106780504
https://huggingface.co/DavidAU
>>
>t-there's no way they're running those models locally, noooooooooooooooooooooo... how will I cope?
>>
>>106780504
you don't. change your system prompt instead
>>
>>106780618
holy slopping hell of 8B 4B unproductivity
>>
https://files.catbox.moe/diri53.mp4
bruh...
>>
>>106780720
>8B 4B
You are small time, check this out.
>https://huggingface.co/DavidAU/Qwen2.5-Godzilla-Coder-51B
>>
https://www.reddit.com/r/SillyTavernAI/comments/1nuhidb/your_opinions_on_glm46/
>>
>>106780768
Mhm, interdasting...but for coding I really only consider the top5 api options and dont fuck with local. unfortunately that's required, unless you want to spend more time tard wrangling than vibing.
>>
File: file.png (44 KB, 657x727)
>>106778073
Huge improvement compared to past models. Must have put some data in there. How does it do when asked to draw using PIL or matplotlib?
Olds for reference:
>>102080804
>>102079522
>>102080359
>>102082930
>>
>>106780602
Well, the guide references "nemo 12b instruct gguf Q4", for which the first result on HF is https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct, but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/
>>
>>106780887
>https://rentry.org/recommended-models
>https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
>>
>>106780887
>but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/
That's what we in the biz call a "happy fluke".
>>
File: qwen uwu miku.png (54 KB, 919x1183)
>>106780879
This was Qwen qwq, SVG
>>
>switch to wsl ubuntu
>constantly OOMs with ooba on the same settings as before
yeah linux is useless
>>
File: qwencodermikusvg.png (12 KB, 380x578)
qwen coder's naive attempt
>>
https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025/
>The CUDA backend for llama.cpp can now run on ZLUDA. We've done some preliminary measurements and found the performance to be within range of the results measured by Phoronix on ROCm (Latest Open-Source AMD Improvements Allowing For Better Llama.cpp AI Performance Against Windows 11 - Phoronix). We're interested in your feedback, if it doesn't work or you are getting worse performance than with ROCm, please share in the issues.
>>106781053
What does wsl stand for again?
>>
Maybe it's ooba?
>>
why is stuff like flash attention and triton so slow to be added to windows, there is a trillion dollars in ai atm
>>
>>106781061
>What does wsl stand for again?
windows subsystem for linux

>ZLUDA
Huh, I thought that project was dead.

>>106781103
>there is a trillion dollars in ai atm
And almost nothing of it being invested in these software projects.
>>
>>106780194
>>106780264
You think automated systems are smart enough to tell apart some indirect political point from a movie?
>>
>>106781103
Those are corpo projects and corpos by and large only care about datacenter use.
>>
>>106781060
I don't like this Miku
>>
>>106781053
ooba is an antiquated piece of shit
just use lm studio
>>
File: glm-miku-horrors.svg.png (68 KB, 1697x1127)
>>106778073
Cute!
picrel Q3_K_M hmm
>>
I like feet
>>
>>106778073
>>106781197
What's the exact prompt?
>>
>>106781217
What kind of feet? Remove your socks, look down, and coom. That is if you have feet.
>Can a person without legs wash their feet?
>>
File: Azula-Test.png (2.76 MB, 1644x812)
>>106777408
Good evening /lmg/. Made yet another slop tune. This time trained on an entire 4chan board :)

https://huggingface.co/AiAF/bf16_Merged-11268_gemma-2-2b-it-co-sft-qlora

Dataset used: https://huggingface.co/datasets/AiAF/co-sft-dataset
>>
>>106781317
cool, what made you pick /co/?
>>
>>106781053
>switch to wsl ubuntu
Why would you do this? Linux is only faster than Windows because it doesn't have massive amounts of bloatware running in the background; under WSL you're still on Windows, so you're not going to get any extra performance that way.
>>
>>106781408
On a whim. I almost did /r9k/ at first but I felt like training it on a blue board's posts instead. Surprisingly, even at over 11,000 steps the training loss hasn't plateaued yet and the eval loss still continues to drop. Maybe after the 10th epoch I'll call it quits, merge that one, and then pick another board. Got any recommendations? By the way, the original source dataset was ripped from this repo if anyone's interested:

https://huggingface.co/datasets/lesserfield/4chan-datasets
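
For anyone curious, the skeleton of a QLoRA SFT run like this looks roughly as follows (my guesses at the knobs, not the actual training script; repo ids taken from the links above, and the exact trainer API shifts between trl versions):
[code]
# rough QLoRA sketch: 4-bit base model + LoRA adapters on the attention projections
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "google/gemma-2-2b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb,
                                             device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))
model.print_trainable_parameters()

ds = load_dataset("AiAF/co-sft-dataset")  # the dataset linked above
# ...then hand model/tok/ds to trl's SFTTrainer (or a plain Trainer loop)
[/code]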
>>
>>106781452
>Got any recommendations?
i'd just pick the most schizo board desu though not sure which that'd be
>>
So how much vram do I need to future proof my ai generation for the next five years? I really don't want to spend 10k on an Ada 6000 pro just to be outclassed next year. I had a 3060 for 5 years and a 4070 for 2 years, and I am thinking I just go with the 5090 desktop since the others were laptops. I want to be able to generate video and train ckpts and Loras and maybe even train video ckpts. Would it make sense to get a desktop that can hold several cards and just upgrade by buying the latest again in a couple of years? Or just go full retard and get a system with 98gb vram?
>>
>>106781500
/vg/ at your service
>>
>>106781502
>next five years
lol we can't even predict next year
>>
>>106781514
Fair point.
>>
File: mikusvgprobs.png (176 KB, 1455x1487)
>>106781257
Make something up? interesting to observe the sampling
>>
>>106781502
The best thing you can do to "future proof" is get as much vram as you can on a single relatively modern card and stack those as time goes on.
>>
>>106781592
there is this thing called electricity, good luck running enough cards on residential power, if you really want to run these local your best bet is a server or a mac
>>
>>106781502
qrd on the image/video-gen space?
more frames + greater res = need more vram for reasonable perf ?
or does it top out at 8gb or something regardless of what you're doing?
>>
>>106781185
>ooba is open sourced
>lm studio is closed source
Easy choice. Now go buy an ad.
>>
>>106781502
Get a chink 4090D or an rtx pro 6000. I've seen some anons on /ldg/ complain that the 32gb on a 5090 isn't enough for good video gen.
>Would it make sense to get a desktop that can hold several cards
Maybe for LLMs but if you don't care then one card is enough.
>>
>>106781715
do NOT get a 4000 series card, 5000+ has too many speed ups these days to not have.
>>
>>106781592
So basically stack 3090s since it's GDDR6X. And 4 of them is 96gb. I guess my next question is: does the memory type matter for generating and training?
>>
>>106781715
isn't good enough or they're just pathetic brain fried zoomers who can't wait a few more seconds for a gen? which one is it
>>
>>106781726
3090s lock you to slow text gen, for video gen a 5090 is like 8x faster, 4x faster than a 4090
>>
File: glm-miku6.svg.png (67 KB, 908x917)
>>106781197
a little more >prompt engineering and top_p 0.98
>>
>>106781611
I don't live in a third world cunt
>>
>>106781755
>let me just run 12 400W cards off a single circuit
I sure hope you are not retarded enough to run a single system off of multiple circuits anon... or you are going to eventually find out why that is a bad idea and lose all your gpus
>>
>>106781617
Asking the wrong dude. I'm the one with the questions, I just want to train and make HQ visuals
>>
>>106781725
Like what? Sage attention in general trades quality for speed btw.
>>106781739
Not enough vram to make long videos or ones at a high resolution and you can't run wan without quanting it at 32gb. I don't know the specifics about training, but I imagine you need more vram and that anon wants to future proof. Models aren't going to get smaller.
>>
File: hotelmining.jpg (3.02 MB, 1560x9600)
>>106781611
>>106781767
>not splicing someone else's feed to run more GPUs
>>
>>106781452
>Got any recommendations?
/v/ or /vg/
>>
>>106781776
sage 3.1 +nunchaku, plus soon dc gen, there are other ones as well, and no the difference in quality is / will be almost nothing
>>
>>106781780
the point is that he would have to have a commercial electric panel put in / have his house rewired or at least a server room + you will need cooling
>>
>>106781452
Cool
>>
>>106781767
12 cards? At max my idea was 4 3090s. I can see where your concern is, and thanks for the clarification despite my snark. I will take your point into consideration as well. So thanks again.
>>
>>106781828
96GB is nothing these days though, there is no model worth using cept maybe glm air that would fit
>>
>>106781798
>sage 3.1
Sage attention 2 definitely lowered the quality of my gens with flux/chroma, it's not lossless. Sage 3 should be worse since it uses FP4.
>nunchaku
Equivalent or slightly better than a Q4 quant.
Don't know about the others but there's always a trade off, that includes lightning loras too.
>>
>>106781841
>Sage attention 2 definitely lowered the quality of my gens with flux/chroma
the difference is negligible and could be fixed with like 1 extra step and be much faster still
>>
Hey friends, this Cydonia is lit AF frfr

https://huggingface.co/BeaverAI/Cydonia-24B-v4q-GGUF/tree/main

Try it out! Will release it soon
>>
>>106781841
same with lightning loras, you don't use them for 100% of steps; you establish motion without them, then use them for the steps after, which greatly speeds it up
>>
>>106781835
Anon, what you smoking? 96gb is enough to do 99.95% of everything you could want to do. Why you fudding?
>>
>>106781848
>the difference is negligible
Maybe if you're running flux/chroma quanted but I run it at bf16 and there's also a noticeable difference between q8/fp8 and bf16. This is with complex prompts but the point stands. It's not negligible if the model starts dropping details.


