/g/ - Technology






File: file.png (1.8 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language mikus.

Previous threads: >>101682019 & >>101705239
►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Thinking about taking the plunge and buying a second 3090, but I have some questions.

There isn't enough room on my mobo for two 3090s. I plan to fix that with a riser cable, but where do I house the errant GPU?

My PSU is 1000 watts, is this enough?

What kind of models can I realistically expect to run at a decent speed with 2 3090s? Are quants of bigstral within reach using CPP?
>>
>>101711833
1000 watts is cutting it close for two 3090s, but it's unlikely you'll see heavy CPU load at the same time, so it'll probably be fine
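napkin math, assuming stock power limits: 2x350W for the cards + ~150W CPU + ~100W for board/drives/fans is ~950W sustained, and 3090s spike above that. if the PSU ever trips, power-limiting barely costs any tokens/s, e.g. nvidia-smi -i 0 -pl 280 and nvidia-smi -i 1 -pl 280 (needs admin; 280W is just a common pick, tune it)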
>>
>>101711833
hope you have enough PCIe lanes if these are the questions you're asking.
obviously you can "cloud compute" by hanging your entire rig off of the ceiling using string, what kind of question is that? Just get it working. There's no magic to screws and boxes.
>>
>>101711854
lanes don't matter for inference unless you're doing tensor split, because it's all VRAM resident, and for layer split the total data transfer between the GPUs is very small
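rough numbers: with layer split only the hidden state crosses the link per token. a 70B has hidden size 8192, so in fp16 that's 8192 * 2 bytes = 16 KB per boundary per token; even at 20 t/s that's a few hundred KB/s, trivial even over a x1 riser. tensor parallel is what actually hammers the bus.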
>>
>>101711854
It's a tuf gaming b550-plus, would that be an issue?
>>
File: ComfyUI_00119_.png (333 KB, 512x512)
Care for a glass of bees?
>>
>>101711833
>My PSU is 1000 watts, is this enough?
Mine blew up and I replaced it with a 1200W one. But maybe the reason it blew up was that it was eight months old and had been used for mining...
>What kind of models can I realistically expect to run at a decent speed with 2 3090s?
70Bs at 4.5-4.65 bpw.
>Are quants of bigstral within reach using CPP?
An IQ4_XS quant ran at 5 T/s for me.
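For scale, rough weight-size math: 123B at ~4.25 bpw (IQ4_XS) is 123e9 * 4.25 / 8 ≈ 65 GB, more than the 48 GB two 3090s give you, so some layers end up in system RAM, hence the ~5 T/s. A 70B at 4.65 bpw is ~41 GB and fits entirely, which is why it's so much faster.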
>>
>>101711912
Thank you. I think I'll do it. I'm just not sure where to shove the 3090 if it doesn't fit in the case.
>>
File: ComfyUI_00045_.png (1.47 MB, 1280x1024)
>>
>>101711970
How'd you get the mix genres?
>>
>>101711970
Is this real? This has to be Photoshop.
>>
>>101711970
>The way miku's shoulder presses into his shirt as if she was really there
>>
Holy fuck Llama 3.1 70B Instruct is actually retarded. No combination of card instructions and depth zero author's notes will make it stop (over)using ellipses in dialog when it's writing in generic RP style.
>>
File: water flux vs d3.png (954 KB, 1445x807)
another FLUX vs. D3 smartness shoot-out
>>
File: ComfyUI_00031_.png (1.67 MB, 1536x1024)
>>101711985
>>101711994
>A professional real estate photograph selfie in a living room, 24mm, f/16 lens. The background is sharp and in focus. An anime cutout of Hatsune Miku is edited into the photo. There is a photogenic man standing beside her with his hand around her shoulder.
>>
>>101712013
Yeah, it's a substantial downgrade from the old L3-70B.
>>
>>101712013
i've tried base, instruct and instruct tunes and it feels like some combo of smart and retarded. it follows my cards, prompt and rag db great, but then it forgets what happened 1 message ago. miqu is still better imo
>>
File: ComfyUI_00052_.png (379 KB, 512x512)
►Recent Highlights from the Previous Thread: >>101705239

--Mistral-Large-Instruct-2407-GGUF model recommended for ERP: >>101708213 >>101708237 >>101708323 >>101708442 >>101708477
--Gemma 2 27b performance and model saturation discussion: >>101705986 >>101706032 >>101706108 >>101706154 >>101706158 >>101706191 >>101706192
--Anon achieves fast Flux execution with 3060 and 128 GB RAM: >>101705620 >>101705997 >>101706039 >>101706202 >>101706810
--FLUX cfg settings for image generation: >>101706621 >>101706652 >>101706776 >>101706812 >>101706895 >>101706776 >>101706785
--Anons test gemma model on budget Android phones, impressed with coherence and performance: >>101707575 >>101708739 >>101708849
--Using LLMs to generate onomatopoeia, with a humorous example: >>101710547
--Testing 1408x1408 resolution, model generates interesting but imperfect image: >>101710304 >>101710395 >>101710537
--Tess-3-Llama-3.1-405B model and synthetic data generation: >>101706755 >>101707053 >>101707110 >>101707382 >>101707469 >>101707546
--OpenRouter's base 405B model may not be truly raw: >>101709255 >>101709278 >>101709398 >>101709645 >>101709711 >>101709865 >>101709333
--Nvidia faces DOJ antitrust probe: >>101710495 >>101710602
--Lumina-mGPT: multimodal model for generating photorealistic images: >>101705936 >>101705971 >>101706657
--Flux struggles with coherent and prompt-following images: >>101705497 >>101705817 >>101705949 >>101705875
--Flux outperforms D3 in concept granularization and overload handling: >>101705902
--Flux dev tested on 3090, decent results but inferior to SD15 and SDXL fine tunes: >>101706383 >>101709715 >>101709740 >>101709859
--Base model is good for long form storytelling, but hard to start: >>101705345 >>101705600
--Anon tests image generation resolutions, 1280x1280 works better than 1408x1408: >>101710671
--Miku (free space): >>101705490 >>101705866 >>101706450 >>101707107 >>101708859

►Recent Highlight Posts from the Previous Thread: >>101705242
>>
What kind of model/quant could I run with 72 GB of vram?
>>
>>101712018
we are so fucking back
>>
>>101712018
Very cool anon, you're a genius.
>>
>>101712017 (Me)
I've played around with tilted water bottle prompts on D3 before. So I do know that even if you massage the prompt to get it to consistently tilt the bottle, it will never quite keep the water surface level (perpendicular to gravity). Another massive win for FLUX.
>>
>>101712018
>6 fingers
>painted nails
>>
All those Miku pictures make me want to become Miku. I think I would look very cute with those twintails.
>>
File: LLM-history-fancy.png (732 KB, 6285x1307)
Changed some things based on feedback
>>
>>101712136
Don't lie. You're imagining yourself as Miku being railed by all those photogenic real estate agents.
>>
>>101712136
That's not the way God wants you to be. Reconsider.
>>
File: ComfyUI_00093_.jpg (367 KB, 1024x1024)
>>101712136
>Become the Miku
Soon, Anon. Soon.

>>101712156
>
>>
>>101712164
Miku giving Bad Touch Jesus the side eye.
>>
File: 1695558067008110.png (1.12 MB, 1024x1024)
So this, is a Miku Hatsune level 1...
>>
>>101712166
Jannies, cleanup time~!
>>
>>101712166
blacked miku is coal
>>
They're the same people aren't they.
Someone just wants to create artificial drama.
>>
I have no interest in cooming; what versions of mistral and llama 3 can fit on my 4090? I'm interested in general use and instruct.
>>
>>101712136
My dream is buying Miku's skirt, striped panties and thigh highs from eBay and taking some photos to fap with, but I fear that might ignite something inside me.
>>
>>101712189
unironically what do the rest look like?
>>
>>101712208
>They're the same people aren't they.
of course he is
>>
Anyone use an LLM as an agent?
>>
>>101712213
You're probably better off running Gemma 2 27B.
>>
File: 1711593409601033.png (1.24 MB, 1024x1024)
>>101712229
well I just started, we'll find out
>>
>>101711798
>2024
>still getting boilerplate legal responses to avoid litigation for the most innocuous queries
Literally every one of these silicon valley cucks who censor... oh I'm sorry, "align" these LLMs could burn in hell for a trillion years and that wouldn't be .0000000000001 percent the punishment they deserve. Meanwhile there's almost zero censorship for doing almost any variety of sick sex acts with these fucking things. Typical r*ddit-tier cuckold logic. Whoever thinks this shit will replace programmers is so committed to licking big tech's boots that they're not even worth responding to.
>>
>>101712244
>Gemma 2 27B
>>101711679
>>101711642
>>101711601
>>101711592
umm yeah, no
>>
>get tired of Mistral Large repeating stuff from the context word-for-word
>switch to mini-magnum
>does literally the same thing
I'm tired of this...
>>
>>101712240
Dude, I just got fired.
>>
File: graph.png (4 KB, 502x397)
>>101712260
Get tired of Mistral repeating switch to Mistral ???
>>
>>101712260
Try DRY sampler
>>
File: 1447592714450.jpg (321 KB, 640x750)
Bros I just got promoted!
>>
>>101712258
hi petra
>>
>>101712277
>DRY sampler
Meme made by a pretentious redditor
>https://www.reddit.com/user/-p-e-w-/
> DRY author here. Your Min-P is too high. 0.1 is way too much with current models, and even 0.05 is too high IMO. I use 0.02 and it's more than sufficient. Increasing temperature above 1 is generally a bad idea nowadays, and probably the reason why you feel the need to use such high Min-P values to keep the output coherent.
>https://old.reddit.com/r/LocalLLaMA/comments/1ej1zrl/try_these_settings_for_llama_31_for_longer_or/lgbjtox/
>>
>>101712240
Gonna need you to kill yourself buddy
>>
>>101712244
worth a shot
no goofy quants or anything right?
>>
>>101712246
Why did her pants turn into a skirt? Also her hair isn't spiky enough.
>>
File: 1705265020543311.png (1.33 MB, 1024x1024)
>>101712317
this came out better
>>
what happened to thebloke?
>>
Gemma 2B gets the strawberry question right.
>>
>>101712332
Stupid enough to make a complete guess and guess correctly.
>>
>>101712323
try mixing in some broly to maintain the hair color
>>
cuckshit spammer must be one of the jannies, why is it still not removed?
>>
>>101712306
> I'm sorry, but that is an absurd claim. Nobody knows the potential pitfalls of AI in medicine, because AI hasn't been deployed in medicine at any significant scale so far (nor anywhere else, really). The industry isn't even in its infancy yet, it is barely starting to exist in the first place.
> LLMs are going to roll over civilization like a bulldozer once the mass of people realize what this actually means. We don't even need AGI in order for this to happen. The current generation of LLMs is more than good enough to cause the greatest upheaval since the Industrial Revolution.
> If someone at OpenAI told me that there are issues with LLMs, I would certainly question it, yes. They can speak about their specific product (GPT), but it is way too early to make generalizing claims.
>>
>>101712187
And you don't even try to hide that you love black cock.
>>
>>101712349
Janny is a literal troon: >>101710688
>>
>>101712323
Why did goku get miku hair color? Why does he have 6 fingers?
>>
>>101712332
It also gets macron’s birthday, oh lawd.
>>
File: 1721907628541060.png (1.3 MB, 1024x1024)
now we're talking, getting closer!
>>
>>101712349
He does get banned, but rebooting router / using proxies isn't really hard.
>>
Does the new Llama3.1 release actually put any pressure on OpenAI to release a GPT-5.

Do you think they have anything that could be appreciably called GPT-5?
>>
>>101712394
>Does the new Llama3.1 release actually put any pressure on OpenAI to release a GPT-5.
no
>Do you think they have anything that could be appreciably called GPT-5?
no
>>
File: 1713544734895824.png (1.22 MB, 1024x1024)
>>101712384
>>
>>101712394
Open models are nothing. The real competition is Claude and Gemini.
>>
>>101712410
hair too short
>>
>>101712394
They don't even have anything worth calling gpt 4.5
>>
>>101712384
it's wider at the bottom like it's still where her twintails would be
>>
File: 1701255559263705.png (1.11 MB, 1024x1024)
a gen from before while I queue stuff up:
>>
File: 1721689554914991.png (1.24 MB, 1024x1024)
hair with green tint: now she's closer to broly
>>
File: 1696143828832695.png (1.25 MB, 1024x1024)
>>101712462
alt:
>>
>>101712462
>>101712410
>>101712384
Dis not migu. Migu has 01 on her arm
>>
File: 1698730566408798.png (17 KB, 319x162)
>>101712087
>>101712018
thanks arstechnica
>>
>>101712370
JLI is the resident janny of lmg btw
>>
>>101712520
>leftist media knows
it's over
>>
>>101712520
>eerily good
>heir apparent
I hate journalist so fucking much.
>>
>>101712592
>heir apparent
What about it exactly?
>>
>>101712449
Good work, Miku
>>
https://anthra.site/

Mini-Magnum revision soon,
the first magnum is the worst it'll ever be. The ride is up and up from here.
>>
File: 1704904585028190.png (138 KB, 256x256)
>tfw 16gb vram
>>
>>101712424
>They don't even have anything worth
B-but project stRawbeRRy...
>>
File: 1722735969423.png (430 KB, 573x573)
>>
Would it be good to realtime video gen a game with logic?
>>
I don't have much proof but I just genned 20 to 30 step schedules and only one of them had logical hand anatomy, the 21-step schedule. That agreed with another test I did where I genned 20 to 25, and 21 was also the only one that had good hands. The only issue is that 20-30 steps is kind of low for packed prompts, so you end up with more gens that have less stuff from your prompt present.
>>
>>101712879
>I don't have much proof but I just genned 20 to 30 step schedules
Wait, no, I meant 50. Yes, I generated 30 images to see how much the image changes depending on the number of steps.
>>
>>101712752
unironically hyped. i love mini-magnum
>>
File: file.png (241 KB, 2399x1392)
idgi which one do I get?
>>
>>101712964
neither
>>
>>101712998
You mean there's a different place to get gguf's or are you just being le funny?
>>
File: image (10).jpg (155 KB, 1024x768)
I want this quality with chameleon.
In a couple months we might actually have that.
>>
how are you guys liking flux?
>>
>>101713190
Its shit.
>>
>>101712017
>D3 alternative a year later
>Still only knows half of its concepts
I love Flux but we all know it won't take the lead for long.
>>
File: image (12).jpg (151 KB, 1024x768)
>>101713190
You can't make coom pictures with it, but it's extremely good.
VRAMchads are eating. I need to use it through the api to play around.
>>
>>101713190
Still not good enough, just like text model.
>>
>>101713079
Nah that'd be too good to be true. Even if they get the architecture right, they won't release it. And if they do release it, they'll be forced by shareholders to censor it before release like Chameleon, which means relying on the rest of the world/community to undo the censoring. We'll be lucky if someone does that competently.
>>
>>101713276
Impressive. Very nice.
>>
Achievable natty?
>>
>>101713276
'um on 'iku
>>
>>101713264
Not sure what openai is doing.
This is my 4th attempt.
>>
>>101713276
How come I don't ejaculate this much?
>>
>>101713308
>forced by shareholders to censor it
probably yeah.
weird stuff going on in the /ldg/ thread since the flux release.
>flux dev pops in, everybody has a good time
>with great timing somebody posts dead loli in a dumpster. u-uhm guys i cant even post it here but a russian guy made this prompt look at this link!
>another anon appears and is outraged asking how this is allowed and starts a whole discussion.
There must be a lot of pressure and lots of bad actors all over.
>>
>250kb/s flux download speed
sasuga hugginface
>>
>>101711798
After Flux I legit feel GPU poor with my 3090. I need a 4090, and even that feels like not enough, just breaking even.
>>
>>101713428
Now, theoretically it should be 2x faster with TensorRT, but there is no announcement on that.
>>
>>101713428
what would the 4090 gain you here
>>
>>101712144
So the only current option is mistral large? Damn. I'd better get used to long waits.
>>
>>101713428
I have a 4090, it's still slow as shit.
>>
>>101713264
>You can't make coom pictures with it
And that's a good thing. People who want to use it are forced to come up with something different, and this results in finding more creative and simply more interesting ideas for gens than "1girl, naked" or shitty anime porn slop no.47747996432678.
>>
File: 1699382432690393.gif (1.57 MB, 500x281)
>Mistral-Large-Instruct-2407-Q3_K_M-00001-of-00002.gguf
I tried to load this but it just ends up crashing koboldcpp.

I also tried Mistral-Large-Instruct-2407-IQ2_M.gguf but the gens take anywhere from 20 minutes to 1 hour 15 minutes. Should I be fucking with settings?
>>
>>101712306
You might say it's a 'meme', but it actually works. However, it does seem to make the model spell things wrong sometimes, not sure why.
>>
File: GF0ikXhbsAAW4Fu.jpg (507 KB, 2048x2048)
>>101713190
not as good as dall-e at artistic prompts, not even close. it can place things together in an ugly way sure. you can get mario and goku surfing on the death star in some ugly default style but it won't actually look good. it lacks artistry, still not good enough.
>>
who the FUCK is petra and why does that name keep coming up in these breads
it's like /vee/'s boogeymen all over again
qrd?
>>
>>101713467
i think you're supposed to merge them with copy /b file1.gguf file2.gguf newfile.gguf

20 mins sounds like you're swapping. if you're using kobold's new auto layer select, check your usage and manually lower it. for me it now selects 40 layers for a 70b when i can fit 31; over that i think it's going to my igpu, making it slower
>>
>>101713467
you should be fucking with buying more vram
>>
>>101712964
Magnum-72b is good? Is it better than miqu?
>>
>>101713529
neither
>>
>>101713467
How much ram do you have? In the 2nd case if it's that slow you're probably using swap.
>>
>>101713506
Just ignore it, people who litter these threads with this garbage wants your attention and reactions.
>>
>>101713543
Guess I'm stuck with miqu then. Seems like only good small or large shit comes out. Nothing good for the mid-range.
>>
>>101713511
You don't have to do that, for koboldcpp you just point to 00001-of-0000*.gguf and you're good.
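(if you ever do want one file, llama.cpp ships a merge tool for the sharded format: llama-gguf-split --merge Mistral-Large-Instruct-2407-Q3_K_M-00001-of-00002.gguf merged.gguf, iirc. copy /b only worked on the old raw byte splits.)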
>>
>>101713387
It does feel manufactured. A lot of manufactured posts in these threads too though about more minor things.
>>
>>101713387
Nobody could replicate it with the prompt he provided. He just got a bunch of people to look at actual CP.
>>
>>101713602
>>101713614
Was weird how they pressured the flux guy as well. Even after he said he is just an infrastructure dude.
>This was trained on children!
>I dont know the dataset but in my testing i never saw any output like that
>SO YOU DONT DENY THIS MIGHT BE IN THE DATASET!
Sane people who point out that if you have a grown woman, children, and gore in there, the AI is smart enough to mix them together (that's the whole point) are ignored.
Weird shit.
>>
>>101713654
It's not weird, it's deliberate. Tomorrow we'll be seeing news stories featuring Flux being used to post cp on 'anonymous imageboards in the dark web' or something equally idiotic.
>>
>>101713511
Are you talking about the setting that says "GPU Layers: [-1 ] (Auto: 29/91 Layers)?
>>101713545
32 GB vram. I know it's probably not enough, but some guy on another board told me he ran a good model with less than me so I thought I'd give it a try. Apparently he was getting gens every 1 to 5 minutes.
>>
>>101713696
You'll need a bit more than the size of the model in ram + vram total to not get it slowing down due to swapping. If it's crashing it could be low vram and you can try lowering the layers. If it's slow then you might not have enough regular ram.
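Putting numbers on it: Q3_K_M is ~3.9 bpw, so a 123B is roughly 123e9 * 3.9 / 8 ≈ 60 GB of weights; if RAM + VRAM doesn't comfortably clear that, the OS pages to disk and speed falls off a cliff. The IQ2_M (~2.7 bpw, ~41 GB) is far easier to fit.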
>>
>>101712144
literally kys
>>
>>101713696
>GPU Layers: [-1 ] (Auto: 29/91 Layers)
yeah, for me after 1.70.1 it guesses layers too high and slows everything down. in task manager, watch your dedicated gpu memory usage; with the model loaded you still need a bit free for cache. start with a bit over half the number it guesses, so 15, and after you see how much you have free, move it up next time until you find the limit
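e.g. (flag names from recent koboldcpp builds, check --help on yours): koboldcpp --model Mistral-Large-Instruct-2407-IQ2_M.gguf --gpulayers 15 --contextsize 8192, then raise --gpulayers run by run while dedicated VRAM stays under the limit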
>>
Just a reminder that /naids/ thinks Kayra is better than Llama 405B
>>>/vg/488890201
>>
>>101713830
Not your army, schizo
>>
>>101713830
your post is irrelevant and so is your life
>>
>>101713830
Are they wrong?
>>
File: 1698308431270578.png (1.24 MB, 1024x1024)
flux knows what a funko pop is (even though traditional figures are better, just to test)
>>
>>101713830
You gonna shill your shitty Mikupad clone again?
>>
File: 1704712997481735.png (1.26 MB, 1024x1024)
wooden doll with joints miku:
>>
>>101713830
Kayra is better. Novel ai image gen is also better than flux. Don't waste your money.
>>
>>101713936
This is the most organic post I've ever read.
>>
>>101713936
This, but unironically.
>>
>>101713830
>>101713936
Ewww, fuck off and go back to your containment thread rather than shitting up ours. You can have your schizo melties there instead.
>>
>>101712752
Stop fine tuning shitty 13bs and give us llama 3.1 70b fine tune of magnum opus
>>
>>101714016
*a mistral large 2 tune
>>
>>101714006
This thread is already thoroughly shitted up anyway.
>>
>>101714029
Or that, either is fine. Either euryale or magnum. These sloptuners slacking and playing with shitty 8b and 12bs instead of giving us the good stuff.
>>
>>101714091
Mistral large is already pretty good, so I think 70b has the most potential for improvement.
>>
>https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B
quants when
>>
>>101713774
>>101713766
Yeah so I think I fucked up. I have 32gb ram and 16gb vram which is probably why I can't get gens without waiting 20+ minutes. I messed with the GPU layers setting and it didn't change anything. If I set it too high it just crashes koboldcpp
>>
Just got gemma2:27b-instruct-q6_K running on ollama with no gpu and 32gb of ram.
It is very slow but also the best model I have tested.
I always ask every model how to beat the moon lord in terraria, and every time they give me a bs answer that shows they don't know what they're talking about, but gemma2 gave me a pretty good answer. Although not entirely correct, it was miles above the rest.
I am very impressed.
Is this what the 70b models feel like?
>>
>>101714354
I used gemma2 27b for RP and it was hot garbage. Not even complaining about prose or pozzitivity or whatever, it was just dumb as fuck. Miqu (real 70B) passed all my RP tests (while also shivermaxxing, so I deleted it)
>>
>>101714354
>It is very slow but also the best model I have tested.
this makes me sad
>>
>>101714390
What do you use then?
>>
>>101714390
Could be that it has access to more gaming related data than other models and that's why it could answer better?
>>
>>101714349
try selecting disable mmap
>>
>>101711798
If there's one thing I've learned from LLMs over the past few years, it's that there's no hope. The corporations will control this tech with an iron fist. We will never have anything interesting that results from this. There are too many potential lawsuits. We're also seeing this on the enterprise side of things, and it's looking like the AI bubble is about to burst. And with California about to fuck everything up, it's all very depressing and predictable. Interesting AI may exist one day, but not while any of us are young enough to care.
>>
>>101714405
I got a real gf so I sold my 2nd 3090 and don't RP anymore, so technically I'm no longer a "user". Now I just run random RP tests from time to time to check on the current state of LLMs
>>101714430
Beats me, I don't see how gaming is relevant here
>>
Been using Mixtral 8x7b on CPU for months, is there anything better now with the same performance?
>>
>>101714471
Well if you still do testing what's the best bet?
>>
>>101713462
Not really. Most gens are just some generic white or Korean woman in some provocative and SFW pose and/or setting.
>>
>>101714467
Every doomer prediction since "chatgpt will never be local" and "llama 1 is the last llm we'll ever get" has been wrong.
>>
>>101713774
i take back what i said about kcpp's layer guessing. i don't know why it wasn't updating before, but now when i drag the context, it adjusts the layers. per my same settings, it's suggesting 27 now (32 was my max prior with 16k context), so it's much more correct than the last version guessing 40. i dunno why the ui didn't update for me at first
>>
>>101714482
For 24GB VRAM range, Yi-34b-chat, mini-magnum is also decent, felt like a gemma2 27b sidegrade but it "gets" more ERP
Poop: magnum-32b (qwen base), in fact, all qwen models suck ass
For 70B range midnight-miqu was decent, L3-70B started repeating itself on the third reply, didn't try L3.1 but I doubt it'd be any better
>>
File: 1722733668845724.png (4 KB, 502x397)
>>101712274
>>
>>101714611
Heh
>>
>>101714611
>1k context card
>35 token user card
>128k ctx
>AHHHHHHH WHY IS IT THE SAME
use rag and lorebooks you fucktards
>>
is llama 3.1 bad like 3 was?
>>
>>101714668
You're saying having that info at the beginning is a drawback? But using a lorebook or similar would increase processing time.
>>
Any largestral fine tune?
>>
>>101714718
the original prompts and premise only carry a story so far, you need to be constantly putting new data into it. both rag and lorebooks will cause processing of the entire context each time, but the results are that much better because it's considering new random data with each gen rather than just going off of chat history and card data
>>
>>101713264
You can. Just takes a lot more effort.
>>
>>101714766
Only Undi's, I think.
>>
We are now in the ollama+open webui era.
>>
>>101714433
Wow that cut it all the way down to a gen every 5 to 8 minutes. Is there anything else I should try to make it even faster?
>>
>>101714981
what kind of processor do you have? you might be able to increase the threads but for some processors like intel it defaults to only the pcores for a reason.
check your dedicated gpu usage again, being able to fit more layers increases speed slightly too, but you have to balance that with your context limit
you're already running a huge model for your system specs. you should be using a 70b, not a 123b. you were likely swapping to your ssd before which is why it was so slow, disabling memory map reduces ram usage just enough that you're able to fit it in without swapping
>>
File: ComfyUI_00011_.png (1.55 MB, 1024x1024)
Cucky, cucky, cucky, cucky! Cucky, cucky, cucky, cucky! Come bring ya black ass out here, you fucking nigger! It's buck breakin' o'clock! You've been misbehavin' again!
>>
What model can i host on hf space free tier?
>>
>>101715306
they don't have GPUs, so nothing at a decent speed
>>
>>101711798
Anyone tried piping the output of a chatbot into a text to speech AI yet? Are there even any good local text to speech AI models out yet?
>>
File: 1714171673217016.jpg (160 KB, 2048x1365)
Are there any studies/reports on the effectiveness of using LLMs as a supplement or replacement for psych therapy? Or have any anons used one as a therapist or seen others talk of it?
Considering using it myself while I wait to go see a real one, wanna start actually living my life
I set up a psych character some days ago and talked to it for an hour. It felt a bit scuffed, but it might have actually helped me by attributing a lot of the issues I discussed to terrible self body image, something I was told I had as a kid but forgot, and over time it became normalized, affecting me in ways I never considered related
>>
>>101715498
Yes. I use piper and it works just fine. It's very fast, and pretty good, but not the best. It's good enough for my needs, speed being a major point.
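minimal pipe, assuming you've downloaded a voice file (model names vary): echo 'hello anon' | piper --model en_US-lessac-medium.onnx --output_file hello.wav. fast enough on CPU to run per message.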
>>
File: foreplay.png (462 KB, 813x2417)
chose the worst nala card for this, trying gemmastra 2b with samplers listed on model card, disabled EOS
>that atrocious grammar on second faus-user turn
>>
>"won't bite... unless you want me too" with mischievous gleam
AAAAAAAAAA
>>
>>101715753
mistral large sent a shiver down my spine, the slop is still there
>>
>>101715753
I've said this in real life a few times
It's not that hot
>>
>>101714471
>got a real gf
>sold 3090
>gf spends 3090 money
>gf leaves
>"Welcome back, Anon."
>>
>>101715732
wow, that came from a 2B? how does it perform with the usual nala card?
>>
>>101715863
nta leddit has some tests of it, its as incoherent as you would expect. a good test would be this 2b vs pyg 2.7/6b, maybe l1 7b
>>
>>101715863
lol I never bothered finding the real nala card
also the prompt is cheating
>>
>>101715732
>>101715875

Not bad for a tiny model. I'd expect 2B to implode from a retarded prompt like that.
>>
>>101715016
Yeah I do have an intel core i7-13700F. I tried increasing threads but that only made the gens take a bit longer.
>>
>>101716127
once you aren't swapping you're pretty much at max speed anyways. make sure you're using as many layers as possible and xmp is on, but that's about it. welcome to cpu speed. mistral large is 0.7t/s for me and i really don't find it better than a 70b so far, but still testing it myself
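(for a 13700F the thread count to try is 8, the number of P-cores; letting it spill onto the E-cores usually makes generation slower, not faster)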
>>
File: 1696857740840216.png (1.26 MB, 1000x1545)
>>101716209
Alright. I'll mess with the layers again in the morning. If that fails me I'll try out a 70b model. If you could spoonfeed me the link to the one everyone uses for roleplayshit I'd appreciate it. Thanks for all your help so far.
>>
File: 1722757372318341.jpg (95 KB, 1024x1024)
>>101715811
>gf spends 3090 money
thats like one month of (attractive) gf money in a first world country
>>
>>101716473
If I were his girlfriend, I'd rather he kept all the 3090s in the family.
>>
>>101716277
for 70b
llama 2 greatness
>https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-i1-GGUF/tree/main

for llama 3.1, i'm trying
>https://huggingface.co/mradermacher/Lumimaid-v0.2-70B-i1-GGUF/tree/main
>>
>>101716526
What settings and format do you use for miqu?
>>
I have a 12GB 3060. Would like to get more VRAM.
The only GPUs with 24GB are really pricy. Is using 2 GPUs actually worth it for SD; can you split the model in a useful way?
>>
File: image.png (299 KB, 755x901)
I was testing DeepSeek Chat V2 0628, 236B's performance on my system (as a Q4_K_M gguf quant) with a longer context by feeding it a wikipedia page to summarize.
I randomly chose the page on US Military History and added a fictitious section in the middle about a series of conflicts between McDonald's and Burger King to see if it would actually summarize the provided text or go off its own data.

Instead it suddenly answered in Chinese even though we were speaking English.
>中国的军事力量是自卫性的,中国始终坚持走和平发展道路,坚持防御性国防政策。中国的军事建设始终是为了维护国家主权、安全和发展利益,保护人民的和平劳动,促进世界和平与发展的崇高事业。中国军队是人民的军队,它的根本宗旨是全心全意为人民服务。中国军队的发展和强大,是中国和平发展、积极参与国际事务、维护世界和平与稳定的体现。
Google translates as:
>China's military power is self-defensive. China has always adhered to the path of peaceful development and adhered to a defensive national defense policy. China's military construction has always been for the noble cause of safeguarding national sovereignty, security and development interests, protecting the people's peaceful labor, and promoting world peace and development. The Chinese military is the people's army, and its fundamental purpose is to serve the people wholeheartedly. The development and strength of the Chinese military is a manifestation of China's peaceful development, active participation in international affairs, and maintenance of world peace and stability.
>>
>>101716597
Kek 20 epochs of Mao's teachings are mandatory for models made in China
>>
>>101716586
normal alpaca rp in st. it responds good to it (and many other models too)
>>
>>101716590
As far as i understand, Comfy cannot use more than one gpu. I don't know about others.
>>
>>101716636
Thanks, asking about SD in general and not about the currently most shilled UI
>>
>>101716721
>shilled
I just specifically told you the thing i use doesn't support the thing you're looking for. And here i was reading about other uis. Do your own reading now.
>>
>>101717039
Not inquiring about UIs, sorry.
>>
>>101715498
there's even a SillyTavern plugin for that already pre-installed
>>
does kobold have an rpc feature for multi node like llamacpp? are they compatible with each other? have an amd pc that for some reason the kobold rocm fork works fine on but can't get base llama to not segfault, and would like to join it with my main server
>>
>>101711970
Me on the right
>>
>>101718088
use linux
>>
>>101718282
You do not have an ironed shirt anon, don't lie to me.
>>
>>101718518
Ok I don't. But I do own an iron and I know how to Google how to use it
>>
File: 1706859635657725.jpg (2.62 MB, 2894x4093)
How are local models nowadays compared to something like sonnet or orbo?
>>
>>101718559
terrible
>>
>>101718559
smelly sex picture
>>
I have a lot of hope for LLMs, I really do, I'm just sad that it will take like a decade to reach the levels that private models currently are on.
>>
>>101718559
meme
>>
>>101718602
they caught up to gpt4 in a year
>>
>>101718602
My niece, when she was a little kid, told me that one day she'll be older than me. You seem to fail to grasp the same concept.
>>
>>101718699
If she lives to be older than you are at death she’s not wrong.
>>
>>101718602
lmao, locals are basically at the same level as private models. It's not 2022 anymore.
The bigger problem is that local and corpo models are on the same shitty level - LLMs are trash. If you were to compare the development level of LLMs with RAM, for example, we are sitting at 64 KB. And there are anons in this general saying that 64 KB is all we need.
>>
>>101718873
If we're going to be pedantic, you meant to say 'she wouldn't be wrong'. She was.
>>
File: 8GB.png (529 KB, 1170x2532)
HAHAHAHAHA 8GB VRAM
>>
>>101719031
cpuchads win again
>>
>>101719031
>X060
>>
>>101719031
I don't trust this, Nvidia wouldn't be so stupid.
>>
>>101718990
there's not a lot they can do with the current architecture
increasing the parameter count doesn't seem to do much past 500b-1t, except pack them with more useless information
and the training sets are already immense, you can't add much other than redundant data that won't help with their intelligence
there needs to be another breakthrough in research before we can see some serious improvements
>>
somehow I can only fill VRAM minus ~3GB with llama.cpp or I get some startup error. it still starts and generates but I can't properly communicate with it anymore, what could be the reason?
>>
>>101719134
Context taking more vram as it fills up
>>
>>101719134
>but can't properly communicate with it anymore
have you tried sitting down with your GPU and discussing your problems together?
>what could be the reason?
are you sure your GPU isn't being... filled with memory from other programs till it starts leaking?
>>
>>101719075
>>
>>101719161
>>101719170
it's immediately after loading the model, llama.cpp says oom, but it's kind of working anyways
>>
>>101719031
Ought to be enough for anybody.
>>
Is it normal that IQ3_XXS is much slower than Q2_K?
>>
>>101719229
stop.
>>
I have an ERR! for 3090ti fan in nvidia-smi. Has anyone encountered this issue?
>>
>>101719229
Yes. It's doing a lot of work to give you the quality that it can out of IQ3. Q2, you're just turbo guessing.
>>
>>101719238
It's fucked, there is nothing you can do to fix it. Don't throw away your card tho, you can send it to me so I can dispose of it ecologically.
>>
I'm having repetition and hallucination? gremlin? problems with the new Mistral-Nemo-Instruct-2407, using Q4_K_M.

After a few messages every char railroads into talking more or less the exact same way. And later on in convos, by about message #150, the model just goes gremlin mode, sometimes with heavy repetition (not repeating words, but repeating patterns and heavy use of synonyms one after another), and other times responding to things the exact same way.
>>
>>101719271
please refer to the following diagram -> >>101712274
>>
Nemo Lyra low key mogs
>>
>>101719080
If they can't improve on intelligence, they should start looking toward optimizing performance. Give me my fucking BitNet.
>>
>>101719229
are you running on ram or vram? iquants are very bad on ram
>>
>>101719383
but bitnet won't give you points on meme benchmarks and that's the only thing corpos care about
>>
>>101719238
Does the fan spin? Stick your finger in there while it's under load
>>
>>101719445
corpos are also supposed to care about cutting costs and gpus are expensive
>>
>>101719445
Someone just needs to make a meme benchmark that is intelligence per inferencing cost or something and label it as a Green or ESG Benchmark and they will.
>>
>>101719483
nah, they already bought these GPUs so they can as well use them. And bragging rights to investors from beating another corpo by 2% on MMLU is worth more than creating a slightly worse model for GPU poorfags.
>>
File: 1722455246530456.png (608 KB, 850x1103)
>>101711798
I'm planning a chat mode for my client, and looking for ideas.

What do you wish the clients you use had / did better?

I am currently implementing Setting templates so that you can quickly use different AI settings (even different models) easily, as well as block output so you can chain generations and compose complex prompts. Not really specific to chat mode, but very useful for summarization workflows.

As for chat specific ideas, the only ones I have in mind so far is mid-chat injection.

Any other ideas appreciated.
>>
File: accelerate.png (1.12 MB, 1320x764)
>>101719031
How are we supposed to accelerate with 8GB VRAM, old man?
>>
I miss gpt-3
>>
>>101711798
>/lmg/ - Local Mikus General
>/lmg/ - a general dedicated to the discussion and development of local language mikus.
>Previous threads: >>101682019 & >>101705239
>►News
>►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)
Surprised you didn't change the card this time.
>>
>>101715583
https://upload.wikimedia.org/wikipedia/commons/1/12/Ectobius_vittiventris_prep.jpg
>>
>>101719550
He's accelerating the depletion of your bank account.
>>
>>101719550
Your video games with the brand new DLSS 4.0 EXTREME, which reduces VRAM requirements so cards no longer need so much VRAM? That's what those cards are made for. Gaming.
>>
>>101719550
graphic cards are for games, not to generate shivers down the spine
>>
>>101719424
Ok thx, ~40% of it is on CPU. Could that be the reason flash attention is very slow as well?
>>
>>101719591
Thank you for your nice roach picture anon...
>>
>>101719031
>>101719550
Feels good to be a 12GB VRAMchad. I'm already mogging the 5090 poorfags.
>>
>>101719711
>5090
Your tokenizer is fucked.
>>
>>101719031
if you're buying a low-mid card like that then you're gaming at 1080p and you genuinely do NOT need more
>b-but muh 32x supersampling AA!!!
dlss is better
>>
>>101719652
the fact that more and more companies are releasing models for local use is a sign that things are changing, nvidia needs to realize that
>>
>>101719747
Running llms locally is a niche. Games sell more consumer gpus.
>>
>>101719742
that's right goy, buy our new 5060 which is hardly better than a 3060 but costs three times as much
>>
>>101719747
That's what NPUs are for. And local use corpo models are typically 2B models for shit like autocomplete, classification, or suggestions. All the important stuff will happen on the cloud for a reasonable subscription fee.
>>
can bitnet do moe
>>
can moe do bitnet
>>
>>101719731
>he is unaware
>>
File: fm1ciczayamv96e.jpg (30 KB, 524x329)
>>101719808
>>
>>101719808
>he cares not
But the first post you linked is for the hypothetical 5060, not the 5090. And i don't rely on rumours.
>>
anyone know and good uncensored 2B-4B models?
I want something that runs well on my phone for role play
>>
God damn, you really need beefy hardware to run some of this.
>>
>>101719846
Why don't you just leave it open on your computer and connect to it from your phone?
>>
>>101719846
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>
>>101719846
>good
>2B-4B
>>
I've been out of the loop for a year, what's the best uncensored model that could fit in 12GB vram currently? I need it for writing image generation prompts and maybe some scandalous dirty jokes.
>>
>>101719953
mistral nemo, though you can't fit the entire 128k context into 12GB unless you offload something to ram
>>
>>101719988
I don't think he needs 128k context for prompts and jokes, bro.
>>
>>101719996
He might if it's a Brick Joke.
>>
>>101719747
>the fact that more and more companies are releasing models for local use is a sign that things are changing, nvidia needs to realize that
Local doesn't need more memory, it needs better models and inference engines. Powerinfer-2 proves that (unless you assume they are lying).
The industry needs to stop making slightly tweaked GPT-2's and embrace predictable dynamic sparsity for local models. That way local can use massively larger models; increasing VRAM only allows slightly larger ones.
Mixtral worked for Powerinfer-2 by accident, with actual design they can do better.
>>
File: 1707952051475914.png (648 B, 107x25)
Does anyone know how to fix the trailing "System:" in quick reply on ST? It should be "ASSISTANT: " instead
>>
>>101711798
>>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope:
>>>>Smaller, Safer
nothingburger.
>>
>>101719238
Solved it by setting the PCIe slot speed to Gen3 in the BIOS.
>>
>the week is over
>literally no big releases from anyone, no bitnet, no model from Cohere, OpenAI, Apple, etc
>literally only Flux but that's not an LLM

Bros why did >>101571531 do us like this?
>>
HAHAHAHAHA 28 GB VRAM
>>
>>101720416
Why are you laughing, that's good.
>>
>>101719031
charge your phone anon
>>
>>101720416
The fuck
>>
>>101720349
somethingburger.
>>101662971
>>101665132
Gemma Scope will let us find and remove the source of shivers like orthogonalization did for refusals.
ShieldGemma is a naughty model that was trained on all the bad stuff filtered out of the regular Gemma dataset.
>>
>data on this page may change in the future
also at this point just ask for the generational wealth of your whole village and get the a6000. consumer cards are too gimped for anything ml
>>
>>101720416
Anybody thinking NVIDIA will significantly increase VRAM when there's no pressure to is almost as retarded as anybody thinking that OpenAI will ever release another open source LLM
The 5090 will be 28 GB, the 6090 will be 28 GB, and the 7090 will be 32 GB
>>
>>101720445
Good goyim, pay $2000 for an extra 4GB of vram.
>>
>>101720466
That's cool and all but the model is still 8k. If someone can look at how they did this and create a version for Mistral Large, Llama, etc, then it'll be good.
>>
File: nvidiaisourguy.png (300 KB, 1080x1511)
>>101720416
It's the future, data is changing, we're gonna be eating good, we all got our thousands of ground floor bitcoins, right?
>>
>>101720524
as if there are any other alternatives
>>
>>101720556
Radeon Pro W6800 32GB
>>
File: 1706498505542593.gif (1.59 MB, 267x200)
>>101720524
For that price you can get 3*3090
>>
>>101720573
is it the same architecture as 6800xt? I know that one has good support in ROCM
>>
>>101720517
Wouldn't surprise me if Altman and co. had a hand in this.
There are two ways to attack open source LLMs. The first is to regulate them. The second, barring that, is to limit the layperson's access to them so that their only choice is to use API-only services.
At this point, it's very deliberate.
>>
>>101720573
>amjeet
pass, 2x 3090 are better
>>
>>101720628
>6800xt
>good support in ROCM
Doesn't have matrix core, no WMMA support, no flash attention.
>>
>>101720648
You're overthinking, it's just ngreedia, black leather jacket man also has altman by his gay balls
>>
>>101720524
It's not just the VRAM that matters, the speed is a very important factor too. No one actually cares about VRAM outside of ML.
>>
>>101719075
pfffft
>>
>>101720687
ML is a lot more than LLMs, smaller models have been used in industrial settings for years, and consumer grade GPUs are still good for that because of their performance-price ratio
>>
It will be 24GB.
>>
File: 1695314498385949.png (28 KB, 192x208)
Stop trashing other boards with your shit >>>/tv/202128053 faggots.
>>
>>101721006
No one here like llama 3
>>
Let me guess, that guy is the one that made the thread so he could have material to criticize "us" for.
>>
>>101721013
I like L3. It seems to be a very good general purpose good-enough standard, even if one only uses it as a comparison reference.
>>
>>101721066
Let me guess, he also likes to post pictures of a certain turquoise-haired character engaging in bestiality with dark-skinned men
>>
>>101721006
No I have to do my job shilling every week
>>
Haven't looked in a few months. Anything better than Stheno 3.2 for RP come out yet?
>>
>>101721161
Yes.
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>
>>101721161
>>101719988
>>
>>101721178
>>101721183
Interesting. I'll check both out, thanks.
>>
Good Afternoon where is the bitnets?
>>
>>101721330
https://huggingface.co/Green-Sky/TriLM_3.9B-GGUF/tree/main
https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/tree/main
>>
>>101719520
Benchmark idea. Fixed hardware, scripted series of questions. Prompt could start with a story and questions could be about that story. Score is the number of correct answers before the time limit expires, with a penalty for incorrect answers so generating random answers at infinite speed won't have a maximal score.
>>
>>101720416
as expected, what a shame
>>
>>101721178
Why didn't your ad show for >>101721161?
>>
>>101720416
enough for flux
we are so back
>>
>>101721357
>Fixed hardware
Measuring flops is better. Fixed hardware is a stupid idea.
>before time limit expires
Time, or even flops limits, are ridiculous. Normalized correct answers/flops. Closer to 1 wins.
>penalty for incorrect answers
Built into the previous point.

However, this will favour correct but short answers. You will want to account for that. 100 tokens/1kflop is better than 10 tokens/1kflop if both are correct.
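quick sketch of the scoring I mean, all weights made up:

def bench_score(n_correct: int, n_wrong: int, tokens_out: int, flops: float) -> float:
    # Toy benchmark score: penalized accuracy per unit of compute, plus a
    # small credit for token throughput so correct-but-terse answers don't
    # automatically win. The 0.5 and 0.1 weights are arbitrary.
    flops = max(flops, 1.0)                           # avoid division by zero
    accuracy = (n_correct - 0.5 * n_wrong) / flops    # penalized answers per flop
    throughput = tokens_out / flops                   # tokens per flop
    return accuracy + 0.1 * throughput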
>>
>>101720445
Of course not, it should be 32gb minimum
>>
>>101721660
64gb or we boycott
>>
>>101719031
>source: my ass
>>
>>101721660
And it should be only 1-slot, and draw at most 100W.
>>
>4 years at 24GB and all they can spare is an extra 4GB
lol
>>
>>101720556
>>101720628
>>101720654
>>
>>101722016
nobody needs that many see pee yous
>>
>>101722016
Lewd.
>>
>>101722031
but everybody needs that many mem ri chan else
>>
Wikipe-tan card is kino.
>>
>>101722087
>>
>>101722144
>>101722144
>>101722144
>>
>>101722087
Oh damn, is it trained on wikipedia text to imitate the tone?
>>
>>101722168
it's mistral large 2407
>>
>>101722130
>ai gf who is educational and sexy at the same time
Once we get the robot body (including cyber womb) bit figured out, that's it for females. 99.999% of all women literally cannot compete.
>>
>>101722087
>that first paragraph
Nice slop.
>>
>>101722130
>whispers conspiratorially


