/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106769660 & >>106762831

►News
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106769660

--Papers:
>106774512 >106774610 >106774669 >106774797
--Frustrations with mod approval and sharing a character customization addon:
>106770110 >106770207 >106770482 >106770215 >106770262 >106770425 >106771674 >106771801 >106771993 >106772136 >106772401 >106772464
--GLM 4.6 struggles with speed and knowledge compared to K2 despite smaller size:
>106771605 >106771704 >106772093 >106772555 >106772685 >106771712 >106771798 >106771958 >106772080 >106772191
--GLM 4.6 model erratic behavior and potential quantization/formatting issues:
>106770753 >106770772 >106770827 >106771431 >106772929 >106772970 >106773019 >106773025 >106771510 >106771662
--Local model performance benchmarks and hardware optimization discussions:
>106773216 >106773254 >106773280 >106773320 >106773366 >106773493 >106776426
--Optimizing chat system formatting for AI interactions:
>106776741 >106776825 >106776959 >106777047 >106777114
--GLM 4.6's high VRAM consumption at large context lengths:
>106773651 >106773712
--VRAM management challenges for large models on 24GB GPUs:
>106774461 >106774484
--GLM-4.6 model quantization performance comparison:
>106770710 >106770745
--2d anime image generation hardware budget and NPU software limitations:
>106769831 >106769845 >106769847 >106769852 >106769866 >106769947 >106770102
--Replacing llama.cpp binaries with CUDA-optimized builds for GLM 4.6 via ooba's UI:
>106773113
--Recommended RAM for local LLMs: 128GB minimum, 192GB dual-channel, >500GB server options:
>106776386 >106776395 >106776400 >106776494 >106777198 >106777256 >106777308 >106777351
--A100 pricing vs consumer GPUs and commercial licensing considerations:
>106776566 >106776597 >106776653 >106776693
--Logs:
>106769725 >106770080
--Miku (free space):
>106769691 >106770398 >106770451 >106770366 >106770215

►Recent Highlight Posts from the Previous Thread: >>106769663

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Not Migu, abandon the thread
Is my DDR4 really crippling my performance on GLM 4.6 that much?
new model when
>>106777694This. It's been over 24 hours since the last new model drop. Local is dead.
Did anyone manage to get gpt-oss-120b to ERP properly?
>>106777689
https://www.servethehome.com/guide-ddr-ddr2-ddr3-ddr4-and-ddr5-bandwidth-by-generation/
>>106777408whomst is this purple slut?
>>106777689The biggest crippling factor is memory channels. A shitty ddr4-2400 epyc with 8 channels is going to run much faster than a ddr5-6000 gayman board with 2 channels.
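Napkin math for the channel argument (a sketch; these are theoretical peaks, and real STREAM-style results land noticeably lower):

# Peak bandwidth = channels * MT/s * 8 bytes (64-bit bus per channel)
def peak_bw_gb_s(channels, mt_s):
    return channels * mt_s * 8 / 1000

print(peak_bw_gb_s(8, 2400))   # 8ch DDR4-2400 epyc: 153.6 GB/s
print(peak_bw_gb_s(2, 6000))   # 2ch DDR5-6000 gayman board: 96.0 GB/s
print(peak_bw_gb_s(12, 6400))  # 12ch DDR5-6400 epyc: 614.4 GB/s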
>>106777726
So then yes. I have octo-channel DDR4 2400MT/s. I need a next gen EPYC now.
>>106777777
Nice digits. What you are describing is exactly what I have.
>>106777777Checked.
>>106777728purple teto
Do I need to set the Oobabooga parameters in addition to the Silly Tavern ones?
Hey guys. Is 4.6 really that yappy with its thinking? I've been trying an IQ1 quant and the thinking is like 2 paragraphs most of the time in RP. Did the quanting kill its reasoning capability?
>>106777689
>numbers you getting vs numbers ddr5 people are getting
>you happy with your numbers?
>you got money to go to ddr5?
>>106777808no the sillytavern ones will take precedence when the user prompt is submitted
>>106777858Thanks!
>>106777852
7.5t/s vs 15t/s
No
Also no
>>106777781
I made this exact switch a couple of months ago. Here are some rudimentary bandwidth tests I did at the time. The speed gain isn't that much if you're only keeping the experts on CPU and the rest on GPU, but it's still a huge jump.
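For napkin math on what those bandwidth numbers buy you: CPU decode speed is roughly bandwidth divided by the bytes streamed per token (active params times bytes per weight). A minimal sketch, assuming the experts are the only thing left in RAM:

# Every generated token streams the active weights through memory once,
# so tokens/s ~= usable bandwidth / bytes touched per token.
def est_tps(mem_bw_gb_s, active_params_b, bytes_per_weight=0.5):  # ~Q4
    return mem_bw_gb_s / (active_params_b * bytes_per_weight)

# GLM 4.6 has ~32B active params:
print(est_tps(150, 32))  # ~9 t/s on 8ch DDR4-2400-class bandwidth
print(est_tps(400, 32))  # ~25 t/s on 12ch DDR5-class effective bandwidth

Real numbers come in lower (the 7.5 vs 15 t/s above) because you never hit theoretical bandwidth and the GPU-resident layers aren't free.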
>>106777725Define "proper RP"
whats the flavour of the month model for vramlets (16 gbs)
GLM-chan drew migu.
>>106777728Utane Uta. Never heard of her.
>>106777850It's about this yapping: >>106772093
is there anything I can run on 6gb vram 32gb system ram?
>>106778073
4.6? That's pretty fucking good. These things have come a long way in 2 years.
>>106778105Mistral Nemo Instruct Q4KS pretty slowly, Qwen 3 30B A3B not that slowly.
>>106777996Shit. That is the exact data I was looking for. How much did you spend on the upgrade?
>>106778105Anything up to 30B, really.
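Rough fit check for that hardware class (a sketch; Q4 GGUFs land around 0.6 bytes/weight once the mixed-precision tensors are counted, and KV cache comes on top):

def gguf_gb(params_b, bytes_per_weight=0.6):
    return params_b * bytes_per_weight

print(gguf_gb(12))  # Nemo 12B ~7 GB: too big for 6 GB VRAM alone, fine split across VRAM+RAM
print(gguf_gb(30))  # Qwen3 30B A3B ~18 GB: sits mostly in the 32 GB RAM, but only ~3B active params keep it usable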
>>106777996
>ddr5-6400 x12
can you even run them with expo/xmp bro? I guess they're running at standard JEDEC speeds, no? is it ecc?
>>106777996Glad I didn't fall for the cpumaxxing meme.
>>106778214yeah bro let's just buy a stack of h100s, it's way better
>>106778214cope
>>106778404
MB: 1300€ (a single socket mb would've been like 500 bucks cheaper)
CPU: 2600€
RAM: 3800€ (12x64GB Samsung M321R8GA0EB2-CCP)
It's quite a bit of money, not even considering what I had already lying around. It's probably not worth it if you're only looking for ERP at better speeds.
>>106778198
ECC RAM is a basic requirement for Epyc processors so it won't run with anything else. There's also no EXPO with these processors, so all those cheaper Threadripper ECC kits that run at 4800 natively with a potential EXPO boost to 6000 will only run at their native speed. You need DIMMs that do this speed natively, which adds to the price.
>>106778404My projections put it at about $12K for a worthwhile upgrade. You can get an EPYC 9124 for $900, but then it would be very slow. A good 8x96gb kit is around $4500.
>>106778453for 9005 series, what's minimum required CCDs to utilize all 12 channels again?
>>106778453
>EPYC 9124
You have to be careful here. The cheaper EPYC processors often can't make use of all their memory channels due to technical constraints, which means their bandwidth is going to be less than advertised. For Epyc 9004 the cutoff is the 9334, which has those weird dual memory links, while the 9005s have the 9135 and 9175F, which come close to saturating their channels.
https://jp.fujitsu.com/platform/server/primergy/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
Here's some data on dual-socket builds Fujitsu has gathered in benchmarks. Check page 14.
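The logic behind that cutoff, as a sketch. The per-link figure below is an assumed illustration value, not an AMD spec: each CCD reaches the IO die over a GMI link with finite bandwidth, so achievable bandwidth is min(DIMM bandwidth, CCDs x links x per-link bandwidth):

GMI_READ_GB_S = 60.0  # assumed per-link read bandwidth, for illustration only

def achievable_bw(channels, mt_s, ccds, links_per_ccd=1):
    dimm_bw = channels * mt_s * 8 / 1000      # theoretical DIMM-side peak
    fabric_bw = ccds * links_per_ccd * GMI_READ_GB_S
    return min(dimm_bw, fabric_bw)

print(achievable_bw(12, 4800, 2))                   # 2-CCD part: fabric-capped around 120 GB/s
print(achievable_bw(12, 4800, 4, links_per_ccd=2))  # 9334-style dual GMI links: DIMM-capped ~461 GB/s
print(achievable_bw(12, 4800, 8))                   # 8+ CCDs: DIMM-capped ~461 GB/s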
>>106778112Yeah.
>>106778486
No idea.
>>106778493
I was thinking of going with a threadripper pro anyway. I was just using the 9124 as an example.
>>106778506Don't quote me on it but I'm pretty sure I came across something saying that Threadripper has the same issue on the cheaper models while I was doing research on this retarded CCD bottleneck issue for my build. So be careful.
Hope I'm in the right place. I've never used AI before. It's all I ever hear about online, so I assume I'm very late to this stuff. Can my gaming PC run AI stuff? It has a 5090 with 128gb of ram.
>>106778554
That's not bad at all.
Download koboldcpp, go to huggingface, search for bartowski glm air gguf, and download the Q6 or Q8 version.
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
Also, look for Silly Tavern to use as a frontend.
There's some information in the OP that you can use, even if a little outdated.
>>106778554yes. you can run shit on that definitely. I think glm might be doable, read this and the last thread for details
>>106778537what CPU did you end up going with?
>>106778576wow fast response thanks for the info! will check those out.
>>106778594
Epyc 9355. I was considering the 9135, but nobody could properly explain why exactly it's showing those speeds in the fujitsu benchmarks despite being a 2 CCD model (going by its tiny L3 cache), which is why I didn't trust it. The 9175F was only like 200 bucks cheaper than the 9355 while only having 16 cores instead of 32, so I went for the latter. If you're fine with 4800MHz RAM there's always those 600 euro chinese 9334 QS on ebay that other anons have used for their CPUMAXX builds.
>>106778624
Those will get you on the right track, but you'll have to fiddle with stuff and learn as you go.
For example, you want to put all the layers of the model on the GPU but offload most/all expert tensors to the CPU/RAM. You'll figure out what that means by fucking around with the koboldcpp UI and reading their wiki.
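In raw llama.cpp terms the same idea looks something like the line below (hedged: flag spelling is from recent llama.cpp builds, so check --help on your version, and the model filename is just a placeholder; koboldcpp exposes the equivalent as a MoE/tensor override option in its UI):

llama-server -m glm-4.5-air-Q6_K.gguf --n-gpu-layers 999 --override-tensor "ffn_.*_exps=CPU"

This nominally puts every layer on the GPU, then overrides the per-expert FFN tensors back onto CPU RAM, which is exactly the "all layers on GPU, experts on CPU" split described above.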
>>106778632that sounds like a good choice. I saw some redditor somewhere reporting suspiciously low t/s numbers on the 9175F for deepseek at q4. could be wrong but personally I thought that CPU manages to enter compute-bound territory somehow
>>106778537I was thinking of at least a 48 core threadripper pro, that should be fine right?
>>106778782what is "fine" really? WRX90 supports 8 memory channels. that's fine compared to Ryzen (2), but EPYC is a little more fine (12)
>>106778813and btw it costs less, no reason to go MEMERIPPER when ayypic is there (unless youre a smelly gamer and care about high niggahertz)
>>106778073what drawing language was the output in? SVG?
Will I ever have a local LLM that doesn't have the left's mental illness?
>>106778554you can run a lower quant of glm 4.6
>>106778701
Yeah, I also came across somebody saying that too few cores might fail to saturate the memory channels or some shit in actual use. No clue if that's bullshit or not, but it didn't help that the only hard testing I found for the 9175F was on those pointless asrock ddr5 mainboards that only have 8 ddr5 DIMM slots.
>>106778782
Sounds like it, but I'm really not into the subject matter with Threadripper models. At least with Epyc, the biggest 4 CCD model should be the 9334/9335 ones, which have 32 cores but with dual memory links to compensate, so their speeds are okay. Meanwhile everything else with 32+ cores has 8 or more CCDs. This means that with Epyc, you should be fine with any CPU that's 32 cores or above. But I have no clue if this also applies to Threadripper or if there's any exotic shit going on here.
>>106778840SVG, yes.
>>106778813
>>106778819
I did not really see much of a reason to go for EPYC because I would not be able to use 12-channel memory while also using 4 GPUs unless I use risers and a non-standard case, which is what I currently have and don't really want to do anymore. I have scoured the Internet for every single motherboard and none of them have everything that I need.
>>106778915
I see. Maybe I should take a closer look at the CCDs.
Current consumer grade hardware technology is already outdated.. I demand we leap frog in time NOW!
>>106778979chinese inference-focused machine that runs 800b/40a moe models at 50t/s for $3000 any day now for sure
>>106778915yeah. and btw, even with 8 channels, the guy was at like 9 t/s, and that's for q4 remember. sounds low to me
>>106778855What model is that?
>>106777728
>>106777578>>106778073
Bilibili now supports CN->EN video translation with voice cloning. Any guess what the model might be?
people are waking up to benchmarks
https://www.reddit.com/r/LocalLLaMA/comments/1nx18ax/glm_46_is_a_fuking_amazing_model_and_nobody_can/
>>106779140RP isn't a real world use case for productive people.
>>106779095Yes, and?
>>106779153its talking about coding
CoomBench (Vanilla/Extended)
>>106779104IndexTTS2
Would I be able to fit a Blackwell Pro 6000 in this motherboard while in a case?
>>106779140Artificial Analysis has so much fucking wrong with it it's hilarious
>>106779140
>>106779220forgot image
>>106779230
The CPU cooler and RAM are definitely going to block you on this stupid mainboard layout
>>106779230That's a rack server motherboard right?
>>106778923Do you have the prompt? I don't think there's a standardized LMG SVG mikugen test prompt
>>106779246
Thought so. So in other words, quad GPUs with 12 channel memory in a normal case is not possible.
>>106779256
Yes. I just hate my current server rack. I would prefer a workstation configuration.
>>106779230no, but you can use risers
>>106779282this
>30b worse than 8b
moesisters our response?
>>106779282I could use risers in a normal case, would the RAM still interfere if I were to mount my GPUs horizontally?
>>106779230just use pcie risers
>>106779306
>3b worse than 8b
dense sissies??
>>106779270
I don't have it for that one, but for this one it was "Draw me a Miku as SVG." The other one was similar.
>>106779349
if the active parameters were the only thing that mattered, MoE would be useless
>>106779317In a rack?
>>106778840I look like this
>>106779276Quad GPUs without risers would be impossible on that mainboard anyway. The first and third from the right are immediately next to the next slot so any 2 slot card would block the one to the left of them.
>>106779349It is.
>>106779351This is also called a rack
>>106779306
- I like that it's the same org that produced both.
- Are the numbers close enough for the accuracy to be considered pretty much the same?
- In which case, at the inference end, it boils down to trading memory for tok/s.
- At the training end, maybe there is a difference in cost?
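Putting rough numbers on that memory-for-tok/s trade (a sketch): per-token cost follows the active parameters, while capacity follows the total, which is the whole argument for MoE:

# Per-token memory traffic (and compute) scales with ACTIVE params;
# what the model can know scales with TOTAL params.
def per_token_gb(active_params_b, bytes_per_weight=0.5):  # ~Q4
    return active_params_b * bytes_per_weight

print(per_token_gb(8))  # dense 8B: 4.0 GB streamed per token
print(per_token_gb(3))  # 30B-A3B MoE: 1.5 GB per token, with ~4x the total params of the dense 8B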
>>106779349Computation cost growth is quadratic wrt param size
>>106779401>>106779414then what's the point of MoE?
>>106779428ignore the poor fag, he is trying to cope with his 70B
/lmg/ Sirs — is it wise to install Linux? I'm afraid my performance will tank.
>>106779025Mistral AI + Nvidia
>>106779095Miku.sh was the ultimate mistake. it single-handedly instigated the genesis of the thinkslop we suffer today
>>106779433Anyone calling other people poorfags should be required to post their H100s with a timestamp, otherwise shut the fuck up
>>106779512you should commit sudoku for not already being on linux
>>106779535
But I have other software that I need to use... I'm already 75% committed to installing Linux. Just need to wrap up some backups. Dual-booting is stupid, it's all or nothing btw.
Undervolting my gpu is the biggest issue but apparently that's "fine" too in Linux.
>>106779530
>their h100s
thx for proving you're a retarded vramlet
>>106779512
>is it wise to install Linux?
what benefit are you looking to get out of using linux? I wouldn't switch OSes for no reason at all.
>I'm afraid my performance will tank.
given you have things set up correctly on windows/linux, the performance difference is negligible. in my experience getting llama.cpp to run at full speed is easier on linux than on windows, but I am biased.
>>106779559I don't think there's anything inherently stupid about dual booting btw. do whatever makes you happy
Enough is enough! I've had it with corpo scum Nvidia stalling progress via delivering bottom barrel RND scrap! If we don't have synth wifes performing backflips onto our cocks within the next 5 years it will be because of Nvidia! We need next generation hardware coming out every 6 months this is the change required if we are to pass the great filter we must hurry the fuck up! BEFORE THE NEXT CARRINGTON EVENT WIPES OUT OUR FUTURE!
>>106779571
>benefit
Uhh, I love unix-like systems but haven't used anything like that at home since I had some SGI machines (irix) ages ago. At work, yes, but that's completely different as it's just about using certain software. I can easily transfer my stuff to linux as most of my personal llama-server stuff is python based anyway.
>>106779591
Not per se, but I mean that eventually you'll spend more time on one system than the other, and therefore dual-booting is sort of a fallacy and a waste of time.
messing around with VibeVoice 7B on my rtx 5090. Input audio is cleaned up with Resemble Enhance and Acon Digital DeVerberate 3.
input audio
https://www.youtube.com/watch?v=1Jp4Ce8yStA
output file
https://vocaroo.com/1iNMH2wAVkPH
>>106779619full-send switch to linux, I wouldn't worry about performance issues. just make sure you choose a distro that plays nice with cuda drivers. anything with debian lineage will be easy to set up.
>>106779632It only has mid/high frequencies left.
>>106779632looks like he's talking on a phone, I like that effect but c'mon
>>106779612That won't happen again it was a fluke. We are safe.
>>106779512
>Linux
What's your alternative? Microsoft only makes an ad delivery and behavioural analytics system now. That it can also run programs is incidental.
>>106779512Using linux can give you more headroom to fit models and run things faster. It's honestly ideal for local stuff and that's why I switched.
>>106779560Oh you don't have any? Or B100s? Or B200s? Or how about GB100s (you definitely don't have those)? Or even A100s? Then shut the fuck up and never call anyone else poor ever again
>>106779843I have a 1.5TB ddr5 setup, who the fuck is running shitty tiny models anymore
>>106779852p-poor fag
Still no new model drops today? Fucking aye man. I need more models!
ring-1t ggufs fucking when
>>106779871
>spend $10K for 96GB to run shitty tiny models at 100 tks vs spending $20k to run the best cloud level models at 30tks+
hmmm...
>>106779852
Literally anyone who doesn't want to drop several grand on advanced shivers down their spine, because it's a fucking HOBBY and there isn't a single company that's actually trying to make reasonably sized models for the average consumer. Everyone who enables this dumbass "just make the models bigger! (so they can get marginally better benchmarks without any innovation or effort, so they can squeeze out more investor money)" idea is part of the problem. You need to stop sucking off corporate models that are not made for you and will soon be beyond your reach completely, because they will just keep making them bigger until you can't even make a ram-based build that fits Q1-XXXS.
>>106779914that is some crazy cope there. There is no free lunch, bigger = better.
>>106779919
Imagine for a second that they stopped innovating on computers in the 20th century and just kept making them bigger and adding more shit on instead of making the parts smaller, until they couldn't fit computers into buildings anymore. Do you realize how fucking stupid that is? Do you realize how fucking stupid you sound? Do you get it? Stop repeating buzzword phrases like a mindless drone and think for a second, dipshit
>>106779914The entire point of open source is to btfo openai and jewgle. Zucc gang is obviously too retarded to do it, so we need chinks. Thus chinese have high priority to make smollm work, because it's the ultimate blow to the west, which is fully dependent on winning the AI race. Our economy would instantly implode if that ever happens.
>>106779939imagine just adding more transistors, that would be crazy. /s
>>106779883ddr5 ain't getting 30tks even on empty context, you faggot
>>106779939
If we had good alternative prospects then I expect we'd pursue them.
>>106779961
>/s
They have a site for you over here, check it out >>>/reddit/
>>106779973
12 channels faggot, this ain't your desktop
>>106779914
words words words, but every day that goes by is another day I'm getting good use out of my setup and enjoying life
I'm glad there are giant models, because it enables performance that appears otherwise impossible
>>106779973nta, but I think you could hit 30tk/s on sota with a well-spent $20k
>>106779989still not happening outside of your dream, maybe half that on empty context and a tenth of that with a character card
>>106779718
>younger generations dont even use desktop computers at all, just their phones
>desktop market share rapidly shrinking as old users die off
>microsofts brilliant plan to fix it is to further drive all their old users away by turning windows into an AD and spy platform
oh pajeetsoft, nobody will miss you
>>106779998oh you "think", well i'm convinced
>>106780006you are just plain wrong
>>106780006
>not having a threadripper
look at this coping poorfag
>>106780020
>>106780028
post benchmarks, never seen more than 15tks on empty context but go ahead and prove me wrong with your richfag rig
I just looked at threadripper CPU prices and had a shock...
WHAT THE FUCK ARE THOSE PRICES?
13K USD FOR A CPU? WTF?
>>106780066first time seeing workstation prices?
>>106780066
>96 cores
But that's a supercomputer...
>>106780066threadripper is overinflated vs epyc for the kind of performance you're looking to achieve. Check for chink QS/ES versions on eBay if you feel like rolling the dice
>>106779914It's just a hobby for early adopters but I don't see why it couldn't grow to be a media giant like film/vidya
>>106780066
Could try looking for QS and ES chips on ebay? Though the 9__5 QS/ES chips looked gimped harder relative to final silicon than the 9__4 QS/ES ones did.
>>106780155
https://www.reddit.com/r/nvidia/comments/1mf0yal/
we can't let reddit win bros, wheres our richanons?
>>106780155
>incredible
>colossal
>cooler no included
What happened to "safety" at OpenAI?
https://files.catbox.moe/as7xpq.mp4
>>106780173
>2xL40S, 2x6000 ADA
That's a poorfag build.
>>106780194
>safety
You're after safety... for machines?
>>106780204haha yeah even my rig is better, i sure hope anons in here don't have less than that
>>106780225
>for machines?
what about humans?
https://files.catbox.moe/nr3fk0.mp4
>>106780188It needs a water bucket.
>>106780264Try other figures from history.
>>106780173
>2xL40S, 2x6000 ADA, 4xRTX 6000 PRO
how much money is that for the GPUs alone?
What would be the best local model for ERP (and other tasks, I don't want a horny encyclopedia or writing assistant, or maybe I do now that I think about it) if I have 12GB of VRAM (RTX 3060)?
>>106780393As a friend of mine would say as he ran into people who had no idea what they were getting into: "you're fucked"
>>106780393Nemo
I NEED
NEW
MODELS!!!!
>>106780504train your own
>>106780511That's the best way to lose interest in using the models, actually.
>>106780466
Oh I've already been there. I simply took a break and now that I'm back I prefer to ask rather than trying every new model one by one.
>>106780469
I can't find one specific model named "Nemo". Is it on Huggingface?
>>106780573
>Is it on Huggingface?
Is it in the guide in the OP?
>>106780504You need to spend time engaged in a hobby that demands work be put in to obtain a reward.
>>106780504
https://huggingface.co/DavidAU
>t-there's no way they're running those models locally, noooooooooooooooooooooo... how will I cope?
>>106780504you don't. change your system prompt instead
>>106780618holy slopping hell of 8B 4B unproductivity
https://files.catbox.moe/diri53.mp4 bruh...
>>106780720
>8B 4B
You are small time, check this out.
>https://huggingface.co/DavidAU/Qwen2.5-Godzilla-Coder-51B
https://www.reddit.com/r/SillyTavernAI/comments/1nuhidb/your_opinions_on_glm46/
>>106780768
Mhm, interdasting... but for coding I really only consider the top 5 API options and don't fuck with local. Unfortunately that's required, unless you want to spend more time tard wrangling than vibing.
>>106778073
Huge improvement compared to past models. Must have put some data in there. How does it do when asked to draw using PIL or matplotlib?
Olds for reference:
>>102080804
>>102079522
>>102080359
>>102082930
>>106780602
Well, the guide references "nemo 12b instruct gguf Q4", for which the first result on HF is https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/
>>106780887
>https://rentry.org/recommended-models
>https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
>>106780887>but it's uploaded by Nvidia so I doubt it's gonna comply with NSFW requests :/That's what we in the biz call a "happy fluke".
>>106780879This was Qwen qwq, SVG
>switch to wsl ubuntu
>constantly OOMs with ooba on the same settings as before
yeah linux is useless
qwen coder's naive attempt
https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025/
>The CUDA backend for llama.cpp can now run on ZLUDA. We've done some preliminary measurements and found the performance to be within range of the results measured by Phoronix on ROCm (Latest Open-Source AMD Improvements Allowing For Better Llama.cpp AI Performance Against Windows 11 - Phoronix). We're interested in your feedback, if it doesn't work or you are getting worse performance than with ROCm, please share in the issues.
>>106781053
What does wsl stand for again?
Maybe it's ooba?
why is stuff like flash attention and triton so slow to be added to windows, there is a trillion dollars in ai atm
>>106781061
>What does wsl stand for again?
windows subsystem for linux
>ZLUDA
Huh, I thought that project was dead.
>>106781103
>there is a trillion dollars in ai atm
And almost nothing of it being invested in these software projects.
>>106780194
>>106780264
You think automated systems are smart enough to tell apart some indirect political point from a movie?
>>106781103Those are corpo projects and corpos by and large only care about datacenter use.
>>106781060I don't like this Miku
>>106781053
ooba is an antiquated piece of shit
just use lm studio
>>106778073
Cute!
picrel Q3_K_M hmm
I like feet
>>106778073
>>106781197
What's the exact prompt?
>>106781217
What kind of feet? Remove your socks, look down, and coom. That is, if you have feet.
>Can a person without legs wash their feet?
>>106777408
Good evening /lmg/. Made yet another slop tune. This time trained on an entire 4chan board :)
https://huggingface.co/AiAF/bf16_Merged-11268_gemma-2-2b-it-co-sft-qlora
Dataset used: https://huggingface.co/datasets/AiAF/co-sft-dataset
>>106781317cool, what made you pick /co/?
>>106781053
>switch to wsl ubuntu
Why would you do this? Linux is faster than Windows because it doesn't have massive amounts of bloatware running in the background, so you're not going to get any extra performance doing that.
>>106781408
On a whim. I almost did /r9k/ at first but I felt like training it on a blue board's posts instead. Surprisingly, even at over 11,000 steps the training loss hasn't plateaued yet and the eval loss still continues to drop. Maybe after the 10th epoch I'll call it quits, merge that one, and then pick another board. Got any recommendations? By the way, the original source dataset was ripped from this repo if anyone's interested: https://huggingface.co/datasets/lesserfield/4chan-datasets
>>106781452
>Got any recommendations?
i'd just pick the most schizo board desu, though not sure which that'd be
So how much vram do I need to future proof my ai generation for the next five years? I really don't want to spend 10k on an Ada 6000 pro just to be outclassed next year. I had a 3060 for 5 years and a 4070 for 2 years, and I am thinking I just go with the 5090 desktop since the others were laptops. I want to be able to generate video and train ckpts and loras, and maybe even train video ckpts. Would it make sense to get a desktop that can hold several cards and just upgrade by buying the latest again in a couple years? Or just go full retard and get a system with 98gb vram?
>>106781500/vg/ at your service
>>106781502
>next five years
lol we can't even predict next year
>>106781514Fair point.
>>106781257Make something up? interesting to observe the sampling
>>106781502The best thing you can do to "future proof" is get as much vram as you can on a single relatively modern card and stack those as time goes on.
>>106781592there is this thing called electricity, good luck running enough cards on residential power, if you really want to run these local your best bet is a server or a mac
>>106781502
qrd on the image/video-gen space?
more frames + greater res = need more vram for reasonable perf?
or does it top out at 8gb or something regardless of what you're doing?
>>106781185
>ooba is open sourced
>lm studio is closed source
Easy choice. Now go buy an ad.
>>106781502
Get a chink 4090D or an rtx pro 6000. I've seen some anons on /ldg/ complain that the 32gb on a 5090 isn't enough for good video gen.
>Would it make sense to get a desktop that can hold several cards
Maybe for LLMs, but if you don't care then one card is enough.
>>106781715do NOT get a 4000 series card, 5000+ has too many speed ups these days to not have.
>>106781592
So basically stack 3090s since it's GDDR6X. And 4 of them is 96gb. I guess my next question is: does the type of memory matter for generating and training?
>>106781715isn't good enough or they're just pathetic brain fried zoomers who can't wait a few more seconds for a gen? which one is it
>>106781726
3090s lock you into slow text gen; for video gen a 5090 is like 8x faster, and 4x faster than a 4090
>>106781197a little more >prompt engineering and top_p 0.98
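For anyone wondering what top_p 0.98 actually does: it samples from the smallest set of tokens whose probabilities sum to 0.98 and throws away the tail. A minimal sketch of nucleus sampling:

import numpy as np

def top_p_sample(logits, p=0.98):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]              # most to least likely
    cdf = np.cumsum(probs[order])
    keep = order[:np.searchsorted(cdf, p) + 1]   # smallest prefix with mass >= p
    kept = probs[keep] / probs[keep].sum()       # renormalize the nucleus
    return int(np.random.choice(keep, p=kept))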
>>106781611I don't live in a third world cunt
>>106781755
>let me just run 12 400W cards off a single circuit
I sure hope you are not retarded enough to run a single system off of multiple circuits anon... or you are eventually going to find out why that is a bad idea and lose all your gpus
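The arithmetic (a sketch, assuming a common 15 A / 120 V residential circuit and the usual 80% continuous-load derating):

cards, card_w = 12, 400
circuit_w = 120 * 15 * 0.8           # ~1440 W continuous per circuit
print(cards * card_w)                # 4800 W for the GPUs alone
print(cards * card_w / circuit_w)    # ~3.3 circuits' worth before CPU, fans, PSU losses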
>>106781617Asking the wrong dude. I'm the one with the questions, I just want to train and make HQ visuals
>>106781725
Like what? Sage attention in general trades quality for speed btw.
>>106781739
Not enough vram to make long videos or ones at a high resolution, and you can't run wan without quanting it at 32gb. I don't know the specifics about training, but I imagine you need more vram, and that anon wants to future proof. Models aren't going to get smaller.
>>106781611
>>106781767
>not splicing someone else's feed to run more GPUs
>>106781452>Got any recommendations?/v/ or /vg/
>>106781776sage 3.1 +nunchaku, plus soon dc gen, there are other ones as well, and no the difference in quality is / will be almost nothing
>>106781780the point is that he would have to have a commercial electric panel put in / have his house rewired or at least a server room + you will need cooling
>>106781452Cool
>>106781767
12 cards? At max my idea was 4 3090s. I can see where your concern is, and thanks for the clarification through my snark. I will take your point into consideration as well. So thanks again.
>>106781828
96GB is nothing these days though, there is no model worth using cept maybe glm air that would fit
>>106781841
>sage 3.1
Sage attention 2 definitely lowered the quality of my gens with flux/chroma, it's not lossless. Sage 3 should be worse since it uses FP4.
>nunchaku
Equivalent or slightly better than a Q4 quant.
Don't know about the others, but there's always a trade-off, and that includes lightning loras too.
>>106781841
>Sage attention 2 definitely lowered the quality of my gens with flux/chroma
the difference is negligible and could be fixed with like 1 extra step and be much faster still
Hey friends, this Cydonia is lit AF frfr
https://huggingface.co/BeaverAI/Cydonia-24B-v4q-GGUF/tree/main
Try it out! Will release it soon
>>106781841
same with lightning loras, you don't use them for 100% of steps; you establish motion without them, then use them for the steps after, which greatly speeds it up
>>106781835
Anon, what you smoking? 96gb is enough to do 99.95% of everything you could want to do. Why you fudding?
>>106781848
>the difference is negligible
Maybe if you're running flux/chroma quanted, but I run it at bf16, and there's also a noticeable difference between q8/fp8 and bf16. This is with complex prompts, but the point stands. It's not negligible if the model starts dropping details.