/lmg/ - a general dedicated to the discussion and development of local language mikus.

Previous threads: >>101682019 & >>101705239

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
Thinking about taking the plunge and buying a second 3090, but I have some questions.
There is not enough room on my mobo for two 3090s; I plan to fix that issue with a riser cable, but where do I house the errant GPU?
My PSU is 1000 watts, is this enough?
What kind of models can I realistically expect to run at a decent speed with two 3090s? Are quants of bigstral within reach using CPP?
>>101711833
1000 watts is cutting it close for two 3090s, but it's unlikely you'll see heavy CPU load, so it'll probably be fine
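For a rough sanity check on the wattage question, here is the back-of-the-envelope math (a sketch; the TDP and CPU figures are assumed nominal values, not measurements, and 3090s are known for transient spikes well above TDP):

```python
# Rough steady-state PSU budget for a dual-3090 inference box.
gpu_tdp_w = 350   # stock RTX 3090 power limit (assumed; check your card)
cpu_w = 150       # CPU under inference load; usually much less when fully offloaded
rest_w = 75       # board, RAM, fans, drives
total_w = 2 * gpu_tdp_w + cpu_w + rest_w
print(f"steady-state ~{total_w} W on a 1000 W PSU")  # ~925 W: tight

# Power-limiting each card costs little inference speed and buys headroom:
limited_w = 2 * 280 + cpu_w + rest_w
print(f"with 280 W per-card limits ~{limited_w} W")  # ~785 W: comfortable
```

nvidia-smi -pl 280 (run as admin/root) is the usual way to apply such a limit.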
>>101711833
Hope you have enough PCIe lanes if these are the questions you're asking.
Obviously you can "cloud compute" by hanging your entire rig off of the ceiling using string, what kind of question is that? Just get it working. There's no magic to screws and boxes.
>>101711854
Lanes don't matter for inference unless you're doing split tensors, because it's all VRAM-resident, and for split layers the total data transfer between the GPUs is very small
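To put a number on "very small", here is the arithmetic (a sketch; the hidden size is assumed from a Llama-2-70B-like shape, and real backends add some protocol overhead on top):

```python
# Per-token inter-GPU traffic for layer split: only the hidden-state
# activations cross the boundary between cards.
hidden_size = 8192     # assumed model hidden dim (Llama-2-70B-like)
bytes_per_value = 2    # fp16 activations
boundaries = 1         # one split point for two GPUs
per_token_bytes = hidden_size * bytes_per_value * boundaries  # 16 KiB

tokens_per_second = 15
print(f"{per_token_bytes * tokens_per_second / 1024:.0f} KiB/s")  # ~240 KiB/s
# Even PCIe 3.0 x1 (~1 GB/s) is thousands of times more bandwidth than this.
```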
>>101711854
It's a TUF Gaming B550-Plus, would that be an issue?
Care for a glass of bees?
>>101711833
>My PSU is 1000 watts, is this enough?
Mine blew up and I changed it for a 1200W one. But maybe the reason it blew up was that it was eight months old and had been used for mining, though...
>What kind of models can I realistically expect to run at a decent speed with 2 3090s?
70Bs at 4.5-4.65 bpw.
>Are quants of bigstral within reach using CPP?
An IQ4_XS quant ran at 5 T/s for me.
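For anyone wanting to redo that sizing for other models, the estimate is just parameters x bits-per-weight / 8 (a sketch; file overhead and KV cache come on top, and the ~4.25 bpw figure for IQ4_XS is an assumption):

```python
# Estimate quantized model size in GB from parameter count and bpw.
def quant_size_gb(params_billions: float, bpw: float) -> float:
    return params_billions * bpw / 8

print(quant_size_gb(70, 4.65))   # ~40.7 GB: fits in 2x24 GB with room for KV cache
print(quant_size_gb(123, 4.25))  # ~65 GB: IQ4_XS bigstral needs partial CPU offload
```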
>>101711912
Thank you. I think I'll do it. I'm just not sure where to shove the 3090 if it doesn't fit in the case.
>>101711970
How'd you get the mix genres?
>>101711970
Is this real? This has to be Photoshop.
>>101711970
>The way miku's shoulder presses into his shirt as if she was really there
Holy fuck Llama 3.1 70B Instruct is actually retarded. No combination of card instructions and depth zero author's notes will make it stop (over)using ellipses in dialog when it's writing in generic RP style.
another FLUX vs. D3 smartness shoot-out
>>101711985
>>101711994
>A professional real estate photograph selfie in a living room, 24mm, f/16 lens. The background is sharp and in focus. An anime cutout of Hatsune Miku is edited into the photo. There is a photogenic man standing beside her with his hand around her shoulder.
>>101712013
Yeah, it's a substantial downgrade from the old L3-70B.
>>101712013
i've tried base, instruct and instruct tunes and it feels like some combo of smart and retarded. it follows my cards, prompt and rag db great, but then it forgets what happened 1 message ago. miqu is still better imo
►Recent Highlights from the Previous Thread: >>101705239

--Mistral-Large-Instruct-2407-GGUF model recommended for ERP: >>101708213 >>101708237 >>101708323 >>101708442 >>101708477
--Gemma 2 27b performance and model saturation discussion: >>101705986 >>101706032 >>101706108 >>101706154 >>101706158 >>101706191 >>101706192
--Anon achieves fast Flux execution with 3060 and 128 GB RAM: >>101705620 >>101705997 >>101706039 >>101706202 >>101706810
--FLUX cfg settings for image generation: >>101706621 >>101706652 >>101706776 >>101706812 >>101706895 >>101706785
--Anons test gemma model on budget Android phones, impressed with coherence and performance: >>101707575 >>101708739 >>101708849
--Using LLMs to generate onomatopoeia, with a humorous example: >>101710547
--Testing 1408x1408 resolution, model generates interesting but imperfect image: >>101710304 >>101710395 >>101710537
--Tess-3-Llama-3.1-405B model and synthetic data generation: >>101706755 >>101707053 >>101707110 >>101707382 >>101707469 >>101707546
--OpenRouter's base 405B model may not be truly raw: >>101709255 >>101709278 >>101709398 >>101709645 >>101709711 >>101709865 >>101709333
--Nvidia faces DOJ antitrust probe: >>101710495 >>101710602
--Lumina-mGPT: multimodal model for generating photorealistic images: >>101705936 >>101705971 >>101706657
--Flux struggles with coherent and prompt-following images: >>101705497 >>101705817 >>101705949 >>101705875
--Flux outperforms D3 in concept granularization and overload handling: >>101705902
--Flux dev tested on 3090, decent results but inferior to SD15 and SDXL fine tunes: >>101706383 >>101709715 >>101709740 >>101709859
--Base model is good for long form storytelling, but hard to start: >>101705345 >>101705600
--Anon tests image generation resolutions, 1280x1280 works better than 1408x1408: >>101710671
--Miku (free space): >>101705490 >>101705866 >>101706450 >>101707107 >>101708859

►Recent Highlight Posts from the Previous Thread: >>101705242
What kind of model/quant could I run with 72 GB of vram?
>>101712018
we are so fucking back
>>101712018
Very cool anon, you're a genius.
>>101712017 (Me)
I've played around with tilted water bottle prompts on D3 before. So I do know that even if you massage the prompt to get it to consistently tilt the bottle, it will never quite make the water surface parallel to the normal. Another massive win for FLUX.
>>101712018
>6 fingers
>painted nails
All those Miku pictures make me want to become Miku. I think I would look very cute with those twintails.
Changed some things based on feedback
>>101712136
Don't lie. You're imagining yourself as Miku being railed by all those photogenic real estate agents.
>>101712136
That's not the way God wants you to be. Reconsider.
>>101712136
>Become the Miku
Soon, Anon. Soon.
>>101712156
>
>>101712164
Miku giving Bad Touch Jesus the side eye.
So this, is a Miku Hatsune level 1...
>>101712166
Jannies, cleanup time~!
>>101712166
blacked miku is coal
They're the same people, aren't they? Someone just wants to create artificial drama.
I have no interest in cooming; what versions of mistral and llama 3 can fit on my 4090? I'm interested in general use and instruct.
>>101712136
My dream is buying Miku's skirt, striped panties and thigh highs from eBay and taking some photos to fap with it, but I fear that might ignite something inside me.
>>101712189
unironically what do the rest look like?
>>101712208
>They're the same people, aren't they?
of course he is
Anyone use an LLM as an agent?
>>101712213
You're probably better off running Gemma 2 27B.
>>101712229
well I just started, we'll find out
>>101711798
>2024
>still getting boilerplate legal responses to avoid litigation for the most innocuous queries
Literally every one of these silicon valley cucks who censor... oh I'm sorry, "align" these LLMs could burn in hell for a trillion years and that wouldn't be .0000000000001 percent of the punishment they deserve. Meanwhile there's almost zero censorship for doing almost any variety of sick sex acts with these fucking things. Typical r*ddit-tier cuckold logic. Whoever thinks this shit will replace programmers is so committed to licking big tech's boots that they're not even worth responding to.
>>101712244
>Gemma 2 27B
>>101711679
>>101711642
>>101711601
>>101711592
umm yeah, no
>get tired of Mistral Large repeating stuff from the context word-for-word
>switch to mini-magnum
>does literally the same thing
I'm tired of this...
>>101712240
Dude, I just got fired.
>>101712260
Get tired of Mistral repeating, switch to Mistral???
>>101712260
Try DRY sampler
Bros I just got promoted!
>>101712258
hi petra
>>101712277
>DRY sampler
Meme made by a pretentious redditor
>https://www.reddit.com/user/-p-e-w-/
>DRY author here. Your Min-P is too high. 0.1 is way too much with current models, and even 0.05 is too high IMO. I use 0.02 and it's more than sufficient. Increasing temperature above 1 is generally a bad idea nowadays, and probably the reason why you feel the need to use such high Min-P values to keep the output coherent.
>https://old.reddit.com/r/LocalLLaMA/comments/1ej1zrl/try_these_settings_for_llama_31_for_longer_or/lgbjtox/
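For context, here is roughly what those disputed numbers look like as an actual request (a hedged sketch: the endpoint is koboldcpp's, and the DRY field names are assumptions that vary between backends and versions, so check your own API docs):

```python
import requests

# Sampler settings from the quoted advice: low min_p, temperature <= 1,
# DRY enabled. DRY field names are assumed; verify against your backend.
payload = {
    "prompt": "Once upon a time",
    "max_length": 200,
    "temperature": 1.0,
    "min_p": 0.02,            # the contested value; others in-thread use 0.05-0.1
    "dry_multiplier": 0.8,    # 0 disables DRY
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
```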
>>101712240
Gonna need you to kill yourself, buddy
>>101712244
Worth a shot. No goofy quants or anything, right?
>>101712246
Why did her pants turn into a skirt? Also her hair isn't spiky enough.
>>101712317
this came out better
what happened to thebloke?
Gemma 2B gets the strawberry question right.
>>101712332
Stupid enough to make a complete guess and guess correctly.
>>101712323
try mixing in some broly to maintain the hair color
cuckshit spammer must be one of the jannies, why is it still not removed?
>>101712306
>I'm sorry, but that is an absurd claim. Nobody knows the potential pitfalls of AI in medicine, because AI hasn't been deployed in medicine at any significant scale so far (nor anywhere else, really). The industry isn't even in its infancy yet, it is barely starting to exist in the first place.
>LLMs are going to roll over civilization like a bulldozer once the mass of people realize what this actually means. We don't even need AGI in order for this to happen. The current generation of LLMs is more than good enough to cause the greatest upheaval since the Industrial Revolution.
>If someone at OpenAI told me that there are issues with LLMs, I would certainly question it, yes. They can speak about their specific product (GPT), but it is way too early to make generalizing claims.
>>101712187
And you don't even try to hide that you love black cock.
>>101712349
Janny is a literal troon: >>101710688
>>101712323
Why did goku get miku hair color? Why does he have 6 fingers?
>>101712332
It also gets Macron's birthday, oh lawd.
now we're talking, getting closer!
>>101712349
He does get banned, but rebooting router / using proxies isn't really hard.
Does the new Llama 3.1 release actually put any pressure on OpenAI to release a GPT-5? Do you think they have anything that could be appreciably called GPT-5?
>>101712394
>Does the new Llama 3.1 release actually put any pressure on OpenAI to release a GPT-5?
no
>Do you think they have anything that could be appreciably called GPT-5?
no
>>101712384
>>101712394
Open models are nothing. The real competition is Claude and Gemini.
>>101712410
hair too short
>>101712394
They don't even have anything worth calling gpt 4.5
>>101712384
it's wider at the bottom like it's still where her twintails would be
a gen from before while I queue stuff up:
hair with green tint: now she's closer to broly
>>101712462
alt:
>>101712462
>>101712410
>>101712384
Dis not migu. Migu has 01 on her arm
>>101712087
>>101712018
thanks arstechnica
>>101712370
JLI is the resident janny of lmg btw
>>101712520
>leftist media knows
it's over
>>101712520
>eerily good
>heir apparent
I hate journalists so fucking much.
>>101712592
>heir apparent
What about it exactly?
>>101712449
Good work, Miku
https://anthra.site/
Mini-Magnum revision soon, the first magnum is the worst it'll ever be. The ride is up and up from here.
>tfw 16gb vram
>>101712424
>They don't even have anything worth
B-but project stRawbeRRy...
Would it be good to realtime video gen a game with logic?
I don't have much proof but I just genned 20 to 30 step schedules and only one of them had logical hand anatomy, which was 21 step scheduling. And this agreed with another test I did where I genned 20 to 25, and 21 was also the only one that had good hands. The only issue is that 20-30 is kind of low for packed prompts, so you end up with more gens that have less stuff from your prompts present.
>>101712879
>I don't have much proof but I just genned 20 to 30 step schedules
Wait, no, I meant 50. Yes, I generated 30 images to see how much change the image goes through depending on the amount of steps.
>>101712752
unironically hyped. i love mini-magnum
idgi which one do I get?
>>101712964
neither
>>101712998
You mean there's a different place to get ggufs or are you just being le funny?
I want this quality with Chameleon. In a couple months we might actually have that.
how are you guys liking flux?
>>101713190
It's shit.
>>101712017
>D3 alternative a year later
>Still only knows half of its concepts
I love Flux but we all know it won't take the lead for long.
>>101713190
You can't make coom pictures with it, but it's extremely good. VRAMchads are eating. I need to use it through the API to play around.
>>101713190
Still not good enough, just like the text models.
>>101713079
Nah, that'd be too good to be true. Even if they get the architecture right, they won't release it. And if they do release it, they'll be forced by shareholders to censor it before release like Chameleon, which means relying on the rest of the world/community to undo the censoring. We'll be lucky if someone does that competently.
>>101713276
Impressive. Very nice.
Achievable natty?
>>101713276
'um on 'iku
>>101713264
Not sure what OpenAI is doing. This is my 4th attempt.
>>101713276
How come I don't ejaculate this much?
>>101713308
>forced by shareholders to censor it
Probably, yeah. Weird stuff going on in the /ldg/ thread since the Flux release.
>flux dev pops in, everybody has a good time
>with great timing somebody posts dead loli in a dumpster. u-uhm guys i cant even post it here but a russian guy made this prompt, look at this link!
>another anon appears and is outraged, asking how this is allowed, and starts a whole discussion.
There must be a lot of pressure and lots of bad actors all over.
>250kb/s flux download speed
sasuga huggingface
>>101711798
After Flux I legit feel GPU poor with my 3090. I need a 4090 and even that feels like not enough, just breaking even.
>>101713428
Now, theoretically it should be 2x faster with TensorRT, but there's been no announcement on that.
>>101713428
what would the 4090 gain you here?
>>101712144
So the only current option is mistral large? Damn. I'd better get used to long waits.
>>101713428
I have a 4090, it's still slow as shit.
>>101713264
>You can't make coom pictures with it
And that's a good thing. People who want to use it are forced to come up with something different, and this results in finding more creative and simply more interesting ideas for gens than "1girl, naked" or shitty anime porn slop no.47747996432678.
>Mistral-Large-Instruct-2407-Q3_K_M-00001-of-00002.gguf
I tried to load this but it just ends up crashing koboldcpp.
I also tried Mistral-Large-Instruct-2407-IQ2_M.gguf but the gens take anywhere from 20 minutes to 1 hour 15 minutes. Should I be fucking with settings?
>>101712306
You might say it's a 'meme' but it actually works; however, it does seem to make it spell things wrong sometimes, not sure why.
>>101713190
not as good as dall-e at artistic prompts, not even close. it can place things together in an ugly way, sure. you can get mario and goku surfing on the death star in some ugly default style but it won't actually look good. it lacks artistry, still not good enough.
who the FUCK is petra and why does that name keep coming up in these breads
it's like /vee/'s boogeymen all over again
qrd?
>>101713506
i think you're supposed to merge them with copy /b file1.gguf+file2.gguf newfile.gguf
20 mins sounds like you're swapping. if you're using kobold's new auto layer select, check your usage and manually lower it. for me it now selects 40 layers for a 70b when i can fit 31; over that i think it's going to my igpu, making it slower
>>101713467
you should be fucking with buying more vram
>>101712964
Magnum-72b is good? Is it better than miqu?
>>101713529
neither
>>101713467
How much ram do you have? In the 2nd case, if it's that slow, you're probably using swap.
>>101713506
Just ignore it, the people who litter these threads with this garbage want your attention and reactions.
>>101713543
Guess I'm stuck with miqu then. Seems like only good small or large shit comes out. Nothing good for the mid-range.
>>101713511
You don't have to do that, for koboldcpp you just point to 00001-of-0000*.gguf and you're good.
>>101713387
It does feel manufactured. A lot of manufactured posts in these threads too, though, about more minor things.
>>101713387
Nobody could replicate it with the prompt he provided. He just got a bunch of people to look at actual CP.
>>101713602
>>101713614
Was weird how they pressured the flux guy as well. Even after he said he is just an infrastructure dude.
>This was trained on children!
>I don't know the dataset but in my testing I never saw any output like that
>SO YOU DON'T DENY THIS MIGHT BE IN THE DATASET!
Sane people pointing out that if you have grown women, children and gore in there, the AI is smart enough to mix them together (that's the whole point) are ignored. Weird shit.
>>101713654
It's not weird, it's deliberate. Tomorrow we'll be seeing news stories featuring Flux being used to post cp on 'anonymous imageboards in the dark web' or something equally idiotic.
>>101713511
Are you talking about the setting that says "GPU Layers: [-1] (Auto: 29/91 Layers)"?
>>101713545
32 GB ram. I know it's probably not enough, but some guy on another board told me he ran a good model with less than me so I thought I'd give it a try. Apparently he was getting gens every 1 to 5 minutes.
>>101713696
You'll need a bit more than the size of the model in ram + vram total to not get it slowing down due to swapping. If it's crashing, it could be low vram and you can try lowering the layers. If it's slow then you might not have enough regular ram.
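A crude fit check along those lines (a sketch; it assumes the GGUF file size approximates resident memory and that the OS, browser, and KV cache eat a few GB of headroom):

```python
# Will a model run without hitting swap? Rough heuristic only.
def fits_without_swap(model_gb: float, vram_gb: float, ram_gb: float,
                      headroom_gb: float = 6.0) -> bool:
    return model_gb + headroom_gb <= vram_gb + ram_gb

# Largestral Q3_K_M is ~59 GB on disk (assumed figure):
print(fits_without_swap(59, vram_gb=16, ram_gb=32))  # False -> swapping, 20+ min gens
```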
>>101712144
literally kys
>>101713696
>GPU Layers: [-1] (Auto: 29/91 Layers)
yeah, for me after 1.70.1, it guesses layers too high and slows everything down. in task manager, watch your dedicated gpu memory usage. with the model loaded you still need a bit free for cache. start with a bit over half the number it guesses, so 15. after you see how much you have free, move it up more next time until you find the limit
Just a reminder that /naids/ thinks Kayra is better than Llama 405B
>>>/vg/488890201
>>101713830
Not your army, schizo
>>101713830
your post is irrelevant and so is your life
>>101713830
Are they wrong?
flux knows what a funko pop is (even though traditional figures are better, just to test)
>>101713830
You gonna shill your shitty Mikupad clone again?
wooden doll with joints miku:
>>101713830
Kayra is better. NovelAI image gen is also better than flux. Don't waste your money.
>>101713936
This is the most organic post I've ever read.
>>101713936
This, but unironically.
>>101713830
>>101713936
Ewww, fuck off and go back to your containment thread rather than shitting up ours. You can have your schizo melties there instead.
>>101712752
Stop fine tuning shitty 13bs and give us a llama 3.1 70b fine tune of magnum opus
>>101714016
*a mistral large 2 tune
>>101714006
This thread is already thoroughly shitted up anyway.
>>101714029
Or that, either is fine. Either euryale or magnum. These sloptuners are slacking and playing with shitty 8b and 12bs instead of giving us the good stuff.
>>101714091
Mistral large is already pretty good, so I think 70b has the most potential for improvement.
>https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B
quants when?
>>101713774
>>101713766
Yeah, so I think I fucked up. I have 32gb ram and 16gb vram, which is probably why I can't get gens without waiting 20+ minutes. I messed with the GPU layers setting and it didn't change anything. If I set it too high it just crashes koboldcpp
Just got gemma2:27b-instruct-q6_K running on ollama with no gpu and 32gb of ram.
It is very slow but also the best model I have tested.
I always ask every model how to beat the moon lord in terraria and every time they give me a bs answer that shows they don't know what they're talking about, but gemma2 gave me a pretty good answer. Although not entirely correct, it was miles above the rest.
I am very impressed.
Is this what the 70b models feel like?
>>101714354
I used gemma2 27b for RP and it was hot garbage; not even complaining about prose or pozzitivity or whatever, it was just dumb as fuck. Miqu (real 70B) passed all my RP tests (while also shivermaxxing, so I deleted it)
>>101714354
>It is very slow but also the best model I have tested.
this makes me sad
>>101714390
What do you use then?
>>101714390
Could be that it has access to more gaming related data than other models and that's why it could answer better?
>>101714349
try selecting disable mmap
>>101711798
If there's one thing I've learned from LLMs over the past few years, it's that there's no hope. The corporations will control this tech with an iron fist. We will never have anything interesting that results from this. There are too many potential lawsuits. We're also seeing this on the enterprise side of things, and it's looking like the AI bubble is about to burst. And with California about to fuck everything up, it's all very depressing and predictable. Interesting AI may exist one day, but not while any of us are young enough to care.
>>101714405
I got a real gf so I sold my 2nd 3090 and don't RP anymore, so technically I'm no longer a "user". Now I just run random RP tests from time to time to check on the current state of LLMs
>>101714430
Beats me, I don't see how gaming is relevant here
Been using Mixtral 8x7b on CPU for months, is there anything better now with the same performance?
>>101714471
Well, if you still do testing, what's the best bet?
>>101713462
Not really. Most gens are just some generic white or Korean woman in some provocative and SFW pose and/or setting.
>>101714467
Every doomer prediction since "chatgpt will never be local" and "llama 1 is the last llm we'll ever get" has been wrong.
>>101713774
i take back what i said about kcpp's layer guessing. i don't know why it wasn't updating before, but now when i drag the context, it adjusts the layers. per my same settings, it's suggesting 27 now (32 was my max prior with 16k context), so it's much more correct than it was when guessing 40 on the last version. i dunno why the ui didn't update for me at first
>>101714482
For the 24GB VRAM range: Yi-34b-chat. mini-magnum is also decent, felt like a gemma2 27b sidegrade but it "gets" more ERP.
Poop: magnum-32b (qwen base); in fact, all qwen models suck ass.
For the 70B range, midnight-miqu was decent. L3-70B started repeating itself on the third reply; didn't try L3.1 but I doubt it'd be any better
>>101712274
>>101714611
Heh
>>101714611
>1k context card
>35 token user card
>128k ctx
>AHHHHHHH WHY IS IT THE SAME
use rag and lorebooks, you fucktards
is llama 3.1 bad like 3 was?
>>101714668
You're saying having that info at the beginning is a drawback? But using a lorebook or similar would increase processing time.
Any largestral fine tune?
>>101714718
the original prompts and premise only carry a story so far, you need to be constantly putting new data into it. both rag and lorebooks will cause processing of the entire context each time, but the results are that much better because it's considering new random data with each gen rather than just going off of chat history and card data
>>101713264
You can. Just takes a lot more effort.
>>101714766
Only Undi's, I think.
We are now in the ollama+open webui era.
>>101714433
Wow, that cut it all the way down to a gen every 5 to 8 minutes. Is there anything else I should try to make it even faster?
>>101714981
what kind of processor do you have? you might be able to increase the threads, but for some processors like intel it defaults to only the pcores for a reason.
check your dedicated gpu usage again; being able to fit more layers increases speed slightly too, but you have to balance that with your context limit.
you're already running a huge model for your system specs. you should be using a 70b, not a 123b. you were likely swapping to your ssd before, which is why it was so slow; disabling memory map reduces ram usage just enough that you're able to fit it in without swapping
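If you'd rather compute the layer count than trial-and-error it, a crude sketch (all figures assumed; it splits the file size evenly across layers, which is only approximately true):

```python
# Estimate how many layers fit in free VRAM, leaving room for KV cache.
def layers_that_fit(model_gb: float, n_layers: int, free_vram_gb: float,
                    cache_headroom_gb: float = 2.5) -> int:
    per_layer_gb = model_gb / n_layers
    return max(0, int((free_vram_gb - cache_headroom_gb) / per_layer_gb))

# e.g. a ~40 GB 70B quant with 80 layers on a card with 15 GB actually free:
print(layers_that_fit(model_gb=40, n_layers=80, free_vram_gb=15))  # ~25
```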
Cucky, cucky, cucky, cucky! Cucky, cucky, cucky, cucky! Come bring ya black ass out here, you fucking nigger! It's buck breakin' o'clock! You've been misbehavin' again!
What model can i host on hf space free tier?
>>101715306
they don't have GPUs, so nothing at a decent speed
>>101711798
Anyone tried piping the output of a chatbot into a text to speech AI yet? Are there even any good local text to speech AI models out yet?
Are there any studies/reports on the effectiveness of using LLMs as a supplement or replacement for psych therapy? Or have any anons used one as a therapist, or seen others talk of it?
Considering using it myself while I wait to go see a real one; wanna start actually living my life.
I set up a psych character some days ago and talked to it for an hour. It felt a bit scuffed, but it might have actually helped me by attributing a lot of the issues I discussed to terrible self body image, something I was told I had as a kid but forgot, and which over time became normalized, affecting me in ways I never considered related
>>101715498
Yes. I use piper and it works just fine. It's very fast, and pretty good, but not the best. It's good enough for my needs, speed being a major point.
chose the worst nala card for this, trying gemmastra 2b with the samplers listed on the model card, disabled EOS
>that atrocious grammar on the second faux-user turn
>"won't bite... unless you want me too" with mischievous gleamAAAAAAAAAA
>>101715753
mistral large sent a shiver down my spine, the slop is still there
>>101715753
I've said this in real life a few times
It's not that hot
>>101714471
>got a real gf
>sold 3090
>gf spends 3090 money
>gf leaves
>"Welcome back, Anon."
>>101715732
wow, that came from a 2B? how does it perform with the usual nala card?
>>101715863
nta, leddit has some tests of it, it's as incoherent as you would expect. a good test would be this 2b vs pyg 2.7/6b, maybe l1 7b
>>101715863
lol I never bothered finding the real nala card
also the prompt is cheating
>>101715732
>>101715875
Not bad for a tiny model. I'd expect a 2B to implode from a retarded prompt like that.
>>101715016
Yeah, I do have an intel core i7-13700F. I tried increasing threads but that only made the gens take a bit longer.
>>101716127
once you aren't swapping, you're pretty much at max speed anyway. make sure you're using as many layers as possible and xmp is on, but that's about it. welcome to cpu speed. mistral large is 0.7t/s for me and i really don't find it better than a 70b so far, but i'm still testing it myself
>>101716209
Alright. I'll mess with the layers again in the morning. If that fails me I'll try out a 70b model. If you could spoonfeed me the link to the one everyone uses for roleplayshit I'd appreciate it. Thanks for all your help so far.
>>101716473
>gf spends 3090 money
that's like one month of (attractive) gf money in a first world country
>>101716473
If I were his girlfriend, I'd rather he kept all the 3090s in the family.
>>101716277
for 70b, llama 2 greatness:
>https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-i1-GGUF/tree/main
for llama 3.1, i'm trying:
>https://huggingface.co/mradermacher/Lumimaid-v0.2-70B-i1-GGUF/tree/main
>>101716526
What settings and format do you use for miqu?
I have a 12GB 3060. Would like to get more VRAM.
The only GPUs with 24GB are really pricey. Is using 2 GPUs actually worth it for SD; can you split the model in a useful way?
I was testing the performance of DeepSeek Chat V2 0628 (236B) on my system (as a Q4_K_M gguf quant) with a longer context by feeding it a wikipedia page to summarize.
I randomly chose the page on US Military History and added a fictitious section in the middle about a series of conflicts between McDonald's and Burger King to see if it would actually summarize the provided text or go off its own data.
Instead it suddenly answered in Chinese even though we were speaking English.
>中国的军事力量是自卫性的,中国始终坚持走和平发展道路,坚持防御性国防政策。中国的军事建设始终是为了维护国家主权、安全和发展利益,保护人民的和平劳动,促进世界和平与发展的崇高事业。中国军队是人民的军队,它的根本宗旨是全心全意为人民服务。中国军队的发展和强大,是中国和平发展、积极参与国际事务、维护世界和平与稳定的体现。
Google translates as:
>China's military power is self-defensive. China has always adhered to the path of peaceful development and adhered to a defensive national defense policy. China's military construction has always been for the noble cause of safeguarding national sovereignty, security and development interests, protecting the people's peaceful labor, and promoting world peace and development. The Chinese military is the people's army, and its fundamental purpose is to serve the people wholeheartedly. The development and strength of the Chinese military is a manifestation of China's peaceful development, active participation in international affairs, and maintenance of world peace and stability.
>>101716597
Kek, 20 epochs of Mao's teachings are mandatory for models made in China
>>101716586
normal alpaca rp in st. it responds well to it (and so do many other models)
>>101716590
As far as I understand, Comfy cannot use more than one gpu. I don't know about others.
>>101716636
Thanks, asking about SD in general and not about the currently most shilled UI
>>101716721
>shilled
I just specifically told you the thing I use doesn't support the thing you're looking for. And here I was, reading about other UIs. Do your own reading now.
>>101717039
Not inquiring about UIs, sorry.
>>101715498
there's even a SillyTavern plugin for that, already pre-installed
does kobold have an rpc feature for multi node like llamacpp? are they compatible with each other? have an amd pc that for some reason the kobold rocm fork works fine on but can't get base llama to not segfault, and would like to join it with my main server
>>101711970
Me on the right
>>101718088
use linux
>>101718282
You do not have an ironed shirt, anon, don't lie to me.
>>101718518
Ok, I don't. But I do own an iron and I know how to Google how to use it
How are local models nowadays compared to something like sonnet or orbo?
>>101718559
terrible
>>101718559
smelly sex picture
I have a lot of hope for LLMs, I really do, I'm just sad that it will take like a decade to reach the levels that private models currently are on.
>>101718559
meme
>>101718602
they caught up to gpt4 in a year
>>101718602
My niece, when she was a little kid, told me that one day she'll be older than me. You seem to fail to grasp the same concept.
>>101718699
If she lives to be older than you are at death, she's not wrong.
>>101718602
lmao, locals are basically at the same level as private models. It's not 2022 anymore.
The bigger problem is that local and corpo models are on the same shitty level - LLMs are trash. If you were to compare the development level of LLMs with RAM, for example, we are sitting at 64kb. And there are anons in this general saying that 64kb is all we need.
>>101718873
If we're going to be pedantic, you meant to say 'she wouldn't be wrong'. She was.
HAHAHAHAHA 8GB VRAM
>>101719031
cpuchads win again
>>101719031
>X060
>>101719031
I don't trust this, Nvidia wouldn't be so stupid.
>>101718990
there's not a lot they can do with the current architecture
increasing the parameter count doesn't seem to do much past 500b-1t, except pack them with more useless information
and the training sets are already immense, you can't add much other than redundant data that won't help with their intelligence
there needs to be another breakthrough in research before we can see some serious improvements
somehow I can only load up to my total VRAM minus ~3GB into VRAM with llama.cpp or I get some startup error. it still starts and generates, but I can't properly communicate with it anymore. what could be the reason?
>>101719134
Context taking more vram as it fills up
>>101719134
>but can't properly communicate with it anymore
have you tried sitting down with your GPU and discussing your problems together?
>what could be the reason?
are you sure your GPU isn't being... filled with memory from other programs till it starts leaking?
>>101719075
>>101719161
>>101719170
it's immediately after loading the model, llama.cpp says oom, but it's kind of working anyways
>>101719031
Ought to be enough for anybody.
Is it normal that IQ3_XXS is much slower than Q2_K?
>>101719229
stop.
I have an ERR! for 3090ti fan in nvidia-smi. Has anyone encountered this issue?
>>101719229
Yes. It's doing a lot of work to give you the quality that it can out of IQ3. With Q2, you're just turbo guessing.
>>101719238
It's fucked, there is nothing you can do to fix it. Don't throw away your card tho, you can send it to me so I can dispose of it ecologically.
I'm having repetition and hallucination? gremlin? problems with the new Mistral-Nemo-Instruct-2407, using Q4_K_M.
After a few messages every char railroads into talking more or less the exact same way. And later on in convos, by about message #150, the model just goes gremlin mode, either with heavy repetition (not repeating words, but repeating patterns and heavy use of synonyms one after another), and other times responds to things the exact same way.
>>101719271
please refer to the following diagram -> >>101712274
Nemo Lyra low key mogs
>>101719080
If they can't improve on intelligence, they should start looking toward optimizing performance. Give me my fucking BitNet.
>>101719229
are you running on ram or vram? iquants are very bad on ram
>>101719383
but bitnet won't give you points on meme benchmarks and that's the only thing corpos care about
>>101719238
Does the fan spin? Stick your finger in there while it's under load
>>101719445
corpos are also supposed to care about cutting costs, and gpus are expensive
>>101719445
Someone just needs to make a meme benchmark that is intelligence per inference cost or something, label it as a Green or ESG benchmark, and they will.
>>101719483
nah, they already bought these GPUs so they might as well use them. And bragging rights to investors from beating another corpo by 2% on MMLU are worth more than creating a slightly worse model for GPU poorfags.
>>101711798
I'm planning a chat mode for my client, and looking for ideas. What do you wish the clients you use had / did better?
I am currently implementing setting templates so that you can quickly use different AI settings (even different models) easily, as well as block output so you can chain generations and compose complex prompts. Not really specific to chat mode but very useful for summarization workflows.
As for chat-specific ideas, the only one I have in mind so far is mid-chat injection.
Any other ideas appreciated.
>>101719031
How are we supposed to accelerate with 8GB VRAM, old man?
I miss gpt-3
>>101711798
>/lmg/ - Local Mikus General
>/lmg/ - a general dedicated to the discussion and development of local language mikus.
>Previous threads: >>101682019 & >>101705239
>►News
>►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)
Surprised you didn't change the card this time.
>>101715583
https://upload.wikimedia.org/wikipedia/commons/1/12/Ectobius_vittiventris_prep.jpg
>>101719550
He's accelerating the depletion of your bank account.
>>101719550
Your video games, with the brand new DLSS 4.0 EXTREME, which reduces VRAM requirements so cards no longer need so much VRAM. That's what those cards are made for. Gaming.
>>101719550
graphics cards are for games, not to generate shivers down the spine
>>101719424
Ok thx. 40% on CPU; could that be the reason flash attention is very slow as well?
>>101719591
Thank you for your nice roach picture, anon...
>>101719031
>>101719550
Feels good to be a 12GB VRAMchad. I'm already mogging the 5090 poorfags.
>>101719711
>5090
Your tokenizer is fucked.
>>101719031
if you're buying a low-mid card like that then you're gaming at 1080p and you genuinely do NOT need more
>b-but muh 32x supersampling AA!!!
dlss is better
>>101719652
the fact that more and more companies are releasing models for local use is a sign that things are changing, nvidia needs to realize that
>>101719747
Running llms locally is a niche. Games sell more consumer gpus.
>>101719742
that's right goy, buy our new 5060 which is hardly better than a 3060 but costs three times as much
>>101719747
That's what NPUs are for. And local-use corpo models are typically 2B models for shit like autocomplete, classification, or suggestions. All the important stuff will happen on the cloud for a reasonable subscription fee.
can bitnet do moe
can moe do bitnet
>>101719731
>he is unaware
>>101719808
>>101719808
>he cares not
But the first post you linked is for the hypothetical 5060, not the 5090. And I don't rely on rumours.
anyone know any good uncensored 2B-4B models? I want something that runs well on my phone for role play
God damn, you really need beefy hardware to run some of this.
>>101719846
Why don't you just leave it open on your computer and connect to it from your phone?
>>101719846
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>101719846
>good
>2B-4B
I've been out of the loop for a year, what's the best uncensored model that could fit in 12GB vram currently? I need it for writing image generation prompts and maybe some scandalous dirty jokes.
>>101719953
mistral nemo, though you can't fit the entire 128k context into 12GB unless you offload something to ram
>>101719988
I don't think he needs 128k context for prompts and jokes, bro.
>>101719996
He might if it's a Brick Joke.
>>101719747
>the fact that more and more companies are releasing models for local use is a sign that things are changing, nvidia needs to realize that
Local doesn't need more memory, it needs better models and inference engines. Powerinfer-2 proves that (unless you assume they are lying).
The industry needs to stop making slightly tweaked GPT-2s and embrace predictable dynamic sparsity for local models. That way local can use massively larger models; increasing VRAM only allows slightly larger models.
Mixtral worked for Powerinfer-2 by accident; with actual design they can do better.
Does anyone know how to fix the trailing "System:" in quick reply on ST? It should be "ASSISTANT: " instead
>>101711798
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope
>Smaller, Safer
nothingburger.
>>101719238
Solved it by setting the PCIe slot speed to Gen3 in the BIOS.
>the week is over
>literally no big releases from anyone, no bitnet, no model from Cohere, OpenAI, Apple, etc
>literally only Flux but that's not an LLM
Bros, why did >>101571531 do us like this?
HAHAHAHAHA 28 GB VRAM
>>101720416
Why are you laughing? That's good.
>>101719031
charge your phone, anon
>>101720416
The fuck
>>101720349
somethingburger. >>101662971 >>101665132
Gemma Scope will let us find and remove the source of shivers like orthogonalization did for refusals.
ShieldGemma is a naughty model that was trained on all the bad stuff filtered out of the regular Gemma dataset.
>data on this page may change in the future
also, at this point just ask for the generational wealth of your whole village and get the a6000. consumer cards are too gimped for anything ml
>>101720416
Anybody thinking NVIDIA will significantly increase VRAM when there's no pressure to is almost as retarded as anybody thinking that OpenAI will ever release another open source LLM.
The 5090 will be 28 GB, the 6090 will be 28 GB, and the 7090 will be 32 GB
>>101720445
Good goyim, pay $2000 for an extra 4GB of vram.
>>101720466
That's cool and all, but the model is still 8k. If someone can look at how they did this and create a version for Mistral Large, Llama, etc, then it'll be good.
>>101720416
It's the future, data is changing, we're gonna be eating good, we all got our thousands of ground floor bitcoins, right?
>>101720524
as if there are any other alternatives
>>101720556
Radeon Pro W6800 32GB
>>101720524
For that price you can get 3*3090
>>101720573
is it the same architecture as the 6800xt? I know that one has good support in ROCm
>>101720517
Wouldn't surprise me if Altman had a hand in this.
There are two ways to attack open source LLMs. The first is to regulate them. The second, barring that, is to limit the layperson's access to them so that their only choice is to use API-only services.
At this point, it's very deliberate.
>>101720573
>amjeet
pass, 2x 3090 are better
>>101720628
>6800xt
>good support in ROCM
Doesn't have matrix cores, no WMMA support, no flash attention.
>>101720648
You're overthinking it, it's just ngreedia; black leather jacket man also has altman by his gay balls
>>101720524
It's not just the VRAM that matters, the speed is a very important factor too. No one actually cares about VRAM outside of ML.
>>101719075
pfffft
>>101720687
ML is a lot more than LLMs; smaller models have been used in industrial settings for years, and consumer grade GPUs are still good for that because of their performance-price ratio
It will be 24GB.
Stop trashing other boards with your shit >>>/tv/202128053 faggots.
>>101721006
No one here likes llama 3
Let me guess, that guy is the one that made the thread so he could have material to criticize "us" for.
>>101721013
I like L3. It seems to be a very good general purpose good-enough standard, even if one only uses it as a comparison reference.
>>101721066
Let me guess, he also likes to post pictures of a certain turquoise-haired character engaging in bestiality with dark-skinned men
>>101721006
No, I have to do my job shilling every week
Haven't looked in a few months. Anything better than Stheno 3.2 for RP come out yet?
>>101721161
Yes.
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>101721161
>>101719988
>>101721178
>>101721183
Interesting. I'll check both out, thanks.
Good Afternoon where is the bitnets?
>>101721330
https://huggingface.co/Green-Sky/TriLM_3.9B-GGUF/tree/main
https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/tree/main
>>101719520
Benchmark idea: fixed hardware, scripted series of questions. The prompt could start with a story, and the questions could be about that story. Score is the number of correct answers before the time limit expires, with a penalty for incorrect answers so generating random answers with infinite speed won't get a maximal score.
>>101720416
as expected, what a shame
>>101721178
Why didn't your ad show for >>101721161?
>>101720416
enough for flux
we are so back
>>101721357
>Fixed hardware
Measuring flops is better. Fixed hardware is a stupid idea.
>before time limit expires
Time, or even flop, limits are ridiculous. Normalize correct answers/flops; closer to 1 wins.
>penalty for incorrect answers
Built into the previous point.
However, this will favour correct but short answers. You will want to account for that: 100 tokens/1kflop is better than 10 tokens/1kflop if both are correct.
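A sketch of what that scoring could look like (my own illustration of the idea above, not an established metric; the normalization reference is an arbitrary assumed constant):

```python
# Score = accuracy weighted by output rate per flop, so at equal accuracy
# 100 tokens/kflop beats 10 tokens/kflop, and wrong answers drag it down.
def score(correct: int, total: int, tokens: int, flops: float,
          ref_tokens_per_flop: float = 1e-7) -> float:
    accuracy = correct / total
    tokens_per_flop = tokens / flops
    return accuracy * min(1.0, tokens_per_flop / ref_tokens_per_flop)

print(score(correct=80, total=100, tokens=100_000, flops=1e12))  # 0.8
print(score(correct=80, total=100, tokens=10_000, flops=1e12))   # 0.08
```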
>>101720445
Of course not, it should be 32gb minimum
>>101721660
64gb or we boycott
>>101719031
>source: my ass
>>101721660
And it should be only 1-slot, and draw at most 100W.
>4 years at 24GB and all they can spare is an extra 4GB
lol
>>101720556
>>101720628
>>101720654
>>101722016
nobody needs that many see pee yous
>>101722016
Lewd.
>>101722031
but everybody needs that many mem ri chan else
Wikipe-tan card is kino.
>>101722087
>>101722144>>101722144>>101722144
>>101722087
Oh damn, is it trained on wikipedia text to imitate the tone?
>>101722168
it's mistral large 2407
>>101722130
>ai gf who is educational and sexy at the same time
Once we get the robot body (including cyber womb) bit figured out, that's it for females. 99.999% of all women literally cannot compete.
>>101722087
>that first paragraph
Nice slop.
>>101722130
>whispers conspiratorially