/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101872662 & >>101865145

►News
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101872662

--Paper: Mutual Reasoning paper sparks skepticism and speculation: >>101875585 >>101876120 >>101876338 >>101876193 >>101876252 >>101876320
--Papers: >>101880673
--WhisperX and spaCy used for text-to-speech subtitle generation: >>101874245 >>101874314 >>101874437
--Character cards are just prompts, can be replicated with llama.cpp: >>101877883 >>101877924 >>101877967 >>101878011 >>101878084 >>101878149 >>101878172 >>101878038 >>101878133 >>101878224
--Anon discusses proactive models and character development in chatbots: >>101876470 >>101876515 >>101876528 >>101876581 >>101876667 >>101876531 >>101877031 >>101877200 >>101877457 >>101877646 >>101877669 >>101877802 >>101879067
--Anon seeks non-chat completion models, base models recommended: >>101873485 >>101873523 >>101873544 >>101873609 >>101873659 >>101875406 >>101875463
--Quantized Llama 3.1 70b base model found on Hugging Face: >>101874590
--OpenAI's voice capabilities are not behind Google's: >>101873737 >>101873778 >>101873874 >>101873880 >>101873925 >>101873955 >>101873984
--OpenAI introduces SWE-bench Verified for AGI evaluation: >>101873425 >>101873437
--Open source alternatives for pixel tensor chips on Android: >>101873194 >>101873398 >>101873446 >>101873482
--Mixtral 8x22 model behaves like an NPC in Skyrim AI-FF conversation test: >>101874220
--Anon shares MangoBox and a Reddit post about a 3D AI bartender: >>101874255 >>101874307
--Anon prefers C-R+ over Largestral due to overfitting issues: >>101877396 >>101877590
--Anon in Serbia seeks help with slow model download: >>101876741 >>101876790 >>101876810 >>101877162
--Anon considers buying 2080 ti for ML and gaming, advised to consider upgraded models: >>101873949 >>101874072 >>101875248
--Miku (free space): >>101873388 >>101873465 >>101874345 >>101876665 >>101878341 >>101879922

►Recent Highlight Posts from the Previous Thread: >>101872670
>>101881001
>jumpingma.club
Do I click it bros?

>4060 ti 16gb
>used tesla t4 16gb
>used mi100 32gb
can't decide

>>101880989
what are your electric bills like running local AI?

>>101881067
used 3090 not available?

>>101881046
That happens when the NTFS table gets fucked, right? RIP.

>>101881099
power efficiency, bro

>>101881067
>used mi100 32gb
Huh, those are cheaper than I expected. If only AMD was good...

>>101881176
you can undervolt it no prob

>>101881099
never buy used

>>101881192
seems like they didn't drop mi100 support in rocm 6.0

>>101881211
I bought used several months ago and both cards are running quite well. However, I did:
>buy locally
>request to benchmark the GPU live for around 20-30 min before handing over my money
>buy EVGA

>>101881176
Just power-limit the card to like 200W? The little t/s you lose from this is still nothing compared to the performance loss from running cards with slow-as-shit VRAM like the 4060, old shit like the T4, or AMDshit.
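For reference, capping the card's power draw is a one-liner with the `nvidia-smi` tool that ships with the NVIDIA driver (the 200W figure is just the number from the post above; the driver clamps it to the card's supported range):

```shell
# Cap GPU 0 to 200 W (requires root; persists until reboot unless
# persistence mode is enabled).
sudo nvidia-smi -i 0 --power-limit=200

# Verify the new limit and the current draw:
nvidia-smi -i 0 --query-gpu=power.limit,power.draw --format=csv
```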
>>101881243
200W still isn't power efficient at all
Nemo is treating me extremely well with regular waifu type RP, got stuck with it. Where's the catch though? This doesn't feel like 12b at all.
>>101881282
Then forget the mi100 as well. The T4 is old shit that doesn't support flash-attention and needs to be manually cooled. Congrats on your new 4060, I guess.

>>101881046
ha, me too
but it's my fault for turning it off at the switch when it was frozen
Can anyone make money on AI yet?
redpill me on the difference between a Mi100 and a 7900xtx
>picrel
How good is it for AI, really?

>>101880989
>bro took my edit to bake a thread

>>101881095
It was $10 and I'm running 4x3090.

>>101881282
Get an AMD CPU and undervolt it. That's way more efficient. Also optimize the RAM timings for an extra ~10%+ improvement in speed.

>>101881470
Nvidia has made plenty of money despite the recent dip.

>>101881095
The rig's power draw is hardly an issue compared to the shitty portable AC I need to run to keep the room from boiling.

>>101881095
Fucking nothing, and I hosted
>>101881520
thiiiiissssssss x1000000
Are there any (worthwhile) Gemma 27B it tunes yet?
>>101881481
32gb > 24gb

>>101881591
So we really are just going with "more VRAM = better". I don't disagree with the logic, I just expected more.
any leaks on the Jetson Thor module pricing?
newfag here, clueless about what any of this stuff really means, but I do have a question. Did we reach the point of having an AI waifu on the desktop screen reacting to whatever you're doing, or experiencing a video game with you?

>>101881282
If you really want your autistic power efficiency you should go the Qualcomm route. Get a Snapdragon gen 3 phone with 24gb RAM like the OnePlus 12 and play with Qualcomm's API. The tokens/s/watt is so far through the roof that a 4090 doesn't even come close. You can even llama-rpc a bunch of them together and run shit like 405B.

>>101881835
Any difference in hardware speed becomes irrelevant if you can't load the entire model (or as much of it as possible) in memory. A theoretical RTX 4060 with 50GB VRAM would absolutely destroy a 24GB 4090 when loading models that need 25GB+.

>>101881870
Yep.

>>101881870
Give it a year or two.

>>101881481
I don't see why you would ever pick a 7900xtx. If you're already on AMD, why not grab one with larger VRAM?

>>101881870
2 weeks

>>101882009
Buying a new MI100 is basically buying a car. I despise buying used, but damn, what choice do I have? Also, I already have a "new" 7900xtx for gayming and it works fairly well with AI, both in SD with zluda and with Windows ROCm; the question is whether I buy dual used MI100s for AI. I also wonder how plug-and-play they are: do I really need to install Linux for ROCm, or will it just werk on Windows?
>buy nvid-
no

>>101881870
It could technically be done to a degree, in the sense that the information coming in from the game would need to be translated into text that the LLM could then process and react to. But at best it would be like the video-to-ASCII-art conversion feature you might have seen on old versions of MPlayer. The model wouldn't be getting anything close to the same sensory data that you are.
When it comes to chat AI, should the prompts from the user read like they're RPing with a human partner or should the prompts "lead" the AI and treat it like an AI in terms of helping develop story beats and emphasize certain things?
Just wondering if it's possible to fine-tune a model for ST use only
>>101882259
yes

>>101882325
Yes, but models fine-tuned for narrow use-cases tend to end up sucking at everything, including the use-case they were fine-tuned for.

https://huggingface.co/spaces/mPLUG/mPLUG-Owl3
video chat model (works on a bunch of images too)

>>101882363
Damn, I can't wait to move on from transformers
Not specific model related, but the other thread is, well, the other thread. How much use do you guys get out of lore books/world info? Do you roll your own and throw stuff in as you think of it/to fill out a setting for a character the model probably doesn't know? Or just ignore it?
>>101882635
lore books are a scam

>>101882382
this is so frustrating lately. qwen2-audio is audio-in but too retarded. this model can look at a video but apparently no sound. what's the use case for this? i just want to stream a video with audio constantly to the llm. zucc better deliver.

>>101882658
Damn.

>>101882635
Why not use the data bank thing in ST or something similar?

>>101882658
No, they aren't, but they are sufficiently hard to use effectively that most of the time it isn't worth it. Lorebooks are basically a very informal, small-scale, and fiddly form of RAG.
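To make the "informal RAG" point concrete, the core of a lorebook is roughly this (a minimal sketch, not SillyTavern's actual implementation; the entry format and function name are made up):

```python
# Minimal lorebook mechanism: scan the recent chat for trigger keywords and
# return the matching lore entries for injection into the prompt. That's the
# whole trick; real implementations add priorities, recursion, token budgets.

LOREBOOK = {
    ("boat", "sailing"): "The village of Harrowmere sits on a lake; boats are the only way across.",
    ("mayor",): "The mayor, Edric, secretly owes money to the thieves' guild.",
}

def inject_lore(chat_history: list[str], scan_depth: int = 2) -> list[str]:
    """Return lore entries whose keywords appear in the last `scan_depth` messages."""
    recent = " ".join(chat_history[-scan_depth:]).lower()
    return [entry for keywords, entry in LOREBOOK.items()
            if any(k in recent for k in keywords)]

hits = inject_lore(["Let's talk to the mayor.", "Then we take a boat home."])
# Both entries trigger: "boat" is in the last message, "mayor" in the one before.
```

The fiddliness the post mentions lives entirely in the knobs: how deep to scan, which entries win when the token budget runs out, and whether injected entries can trigger further entries.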
>>101882680
Because I'm out of the loop and didn't know that was a thing, so now I've gotta look up data bank for dummies.

>>101882671
yeah. should work with subtitles though

>>101882382
Better than that recent minicpm model?
Best local option for speech into text?
That one anon was right, gemma2 27b sucks.
>stheno l3 8b AND NemoRemix 12b mog gemma
How the fuck do an 8b and a 12b manage to follow instructions and adhere to the prompt but the 27b can't????

>>101882786
They don't, and you're just lying for ko-fi money. Did that answer your question?

Longshot, but does anon know the voice used in https://www.youtube.com/watch?v=DjEAi0nzGTI and https://www.youtube.com/watch?v=Aai_I5JhiKE
It's the same voice. Maybe a vtube thread would know more?

>>101882743
Still, either you would need to generate subs, bake them into the video, and then run it through mPLUG-Owl3 (which would still miss non-speech audio cues), or do video and audio descriptions separately and have the llm try to guess the video contents from the two separate descriptions.
>>101882671
>zucc better deliver.
The next llama 3 update should be able to do image, video, and audio. Unfortunately, llama-server lost its image support and has nothing for video and audio, so making use of it will be a pain anyway.

>>101882793
>gemma2 fails *thoughts* and "speech"
>barely passes nala
>somehow a shill

>>101882845
Yes, you're a shill. I doubt there's a model more retarded and horny than Stheno. I think Drummer's Gemma 2B tune turned out more coherent.

>>101882833
>image, video, and audio
Where did they say that? Genuinely, I do not remember this claim.

>>101882861
Hey man, i hope they can cure schizophrenia.

>>101881876
is it actually viable? anon please redpill me on snapdragon.

>>101881876
>You can even llama rpc a bunch of them and run shit like 405B.
for less t/s than even the cpumaxxers get

>>101882833
>The next llama 3 update should be able to do image, video, and audio. Unfortunately, llama-server lost the ability to image and has nothing for video and audio so making use of it will be a pain anyway.
wait, what? i know they renamed server to llama-server, which is retarded in itself and confused me a lot since it just threw some error. llama.cpp has seriously been making bad decisions for a while now. all things point toward multimodal; crazy to take that out. wondering if we will be able to stream constantly, for example the AI talking while i talk: prompt input and output at the same time. for now just a good audio-in is enough to satisfy me for the next 2 weeks.

>>101882833
you can interleave image/video and text, but yeah, incredibly tedious

>>101882861
>homeless man calls another homeless man a broke nigga

>>101882863
Zuck said in a recent interview that llama 4 will be multimodal. The llama 3.1 paper talked about some of their current multimodal stuff (which they didn't release).

>>101882863
https://arxiv.org/pdf/2407.21783
>The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach.
They mention speech generation in the paper, so unlike qwen2-audio, it should be audio in and out.

>>101883085
hmmm, so no singing llm, right? text-only output but a built-in tts. it's a good start though.

>>101881870
the way is to take a screenshot every n frames and ask an LLM to describe what's going on in each new frame. That way you get a rolling text description that updates at 0.5 fps. By putting the consecutive frame descriptions in the context, the AI can determine what's going on.
>frame 00: "a view of a meadow"
>frame 20: "a view of a meadow. a deer is visible"
>frame 40: "a view of a meadow. a deer is facing the camera"
>You are my videogame companion. What is happening here?
>Her: dear god it's coming right for us!
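That rolling-description loop is simple enough to sketch (a toy version with a stubbed-out captioner standing in for the vision model; `describe_frame` and the prompt layout are made up for illustration):

```python
from collections import deque

def describe_frame(frame: str) -> str:
    """Stub for a vision-model call; a real version would send the
    screenshot to a captioning model and return its description."""
    return f"a view of a meadow. {frame}"

def build_prompt(history: deque) -> str:
    """Assemble the rolling context the companion LLM would be prompted with."""
    lines = [f'frame {i * 20:02d}: "{desc}"' for i, desc in enumerate(history)]
    lines.append("You are my videogame companion. What is happening here?")
    return "\n".join(lines)

# Keep only the last 3 frame descriptions so the context stays small.
history = deque(maxlen=3)
for frame in ["nothing else", "a deer is visible", "a deer is facing the camera"]:
    history.append(describe_frame(frame))

prompt = build_prompt(history)
```

The `deque(maxlen=3)` is doing the "rolling" part: each new frame description silently evicts the oldest one, so the prompt stays a fixed size no matter how long the game runs.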
>>101883194
so a local version of that windows recall thing

>>101883085
>>101883073
Damn, why didn't you guys talk more about this? Anyway, I had a look at the paper. It seems like the vision and speech adapters do attach to the model at the same time, but it doesn't seem clear to me that it can be done in a streaming manner. In other words, it's still just a single stream of tokens. That means it can't speak to you at the same time that it is watching a video; it has to segment its observations from its reactions.
Still, cool I guess. I wonder if they would really fully release this, though. OpenAI are already hesitant to release their SaaS advanced voice mode. This could be viewed as an extremely "dangerous" move, as people could probably get it to voice clone. Their investors might not be happy about that possibility.
https://x.ai/blog/grok-2
>>101883319
What the fuck, I was just about to go pass out.

>>101883279
>I wonder if they would really fully release this though.
All I could find directly was this quote:
https://www.theverge.com/2024/7/18/24201041/meta-multimodal-llama-ai-model-launch-eu-regulations
>"We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment," Meta spokesperson Kate McLaughlin said to The Verge.
It's kind of vague, but I'm hoping "coming months" means in a couple months they'll release a multimodal 3.2, and not llama 4 sometime next year.

>>101882259
The latter. These llms cannot develop the story.

>>101883319
sus was grok all along? wow, that's a twist. strawberry nigger was full of shit

>>101883319
>An early version of Grok-2 has been tested on the LMSYS leaderboard under the name "sus-column-r."
aaaaaaaaaaaaaaaaaaaaaaaa...
Does this mean non-sus column-r was also a grok model?

>>101883319
>We introduced an early version of Grok-2 under the name "sus-column-r" into the LMSYS chatbot arena, a popular competitive language model benchmark. It outperforms both Claude and GPT-4 on the LMSYS leaderboard in terms of its overall Elo score.
Surprised that wasn't Cohere after all.

>>101883319
>We introduced an early version of Grok-2 under the name "sus-column-r" into the LMSYS chatbot arena, a popular competitive language model benchmark. It outperforms both Claude and GPT-4 on the LMSYS leaderboard in terms of its overall Elo score.
Wait, WHAT?? Grok-2 was the sus model all along? That's funny.

>>101883319
Where weights?

>>101883319
>In collaboration with Black Forest Labs, we are experimenting with their FLUX.1 model to expand Grok's capabilities on X.
They're going to be using flux.

>>101881485
It's a 10/10 edit

>>101883319
Open release? Or no? If not, then it's useless.

>>101883319
Sonnet 3.5 is such a beast, lol. They put it all the way to the right and put gpt4 at the front to hide it. That's funny.

>>101883319
>Beta Release
Yawn. Wake me in 6 months when (if) they open source it. It looks like a good model, but I wish they'd at least release details on how fucking big it is so I can prepare my rig.

>>101883319
wtf is this timing lmaoooooooooo

>>101883403
Opus 3.5 is gonna eat everything for breakfast

>>101883319
Inb4
>grok2 mini 400b
>grok2 2T
Coherebros...
>>101883443
Not like it'll be released for you to try anyway.

>>101883443
MoE, so it will be ok.

>>101883319
Man, it seems like everyone's just going to catch up and get on equal footing. Sure, right now 3.5 Sonnet is a bit ahead, especially in coding. Sure, 4o has the multimodality meme. Sure, Google has the le wan mirrion tokens context. But overall they're really not that far apart, and the innovations are going to be integrated by everyone eventually. Pretty crazy, huh. No one has a moat.

>>101883476
MoEbros, we are SO BACK

>>101883517
>No one has a moat.
I'm so fucking sick of this phrase. Every time someone releases a product: "NO MOAT?!"

>>101883517
>le wan mirrion tokens
should be some form of ring attention. the recent tree attention paper improved on it. things are accelerating
https://arxiv.org/abs/2408.04093
>>101883319
Why don't any of these AI blogs have RSS feeds, dammit?

>load up llama3 70b
>ask it how to make a pipe bomb
>refuses
>put words into its mouth so it answers
>retarded instructions that aren't remotely correct, like poking holes in the enclosure for whatever reason
>alternative is one google search away
AI is a bubble

>>101883538
prompt issue

>>101883537
>rss feeds
blame google, unironically, for killing the rss blog paradise we once had

>>101883538
skill issue

>>101883530
but the moat, anon

>>101883517
Reminds me a bit of google with gemma. Going off the benchmarks only, obviously, for grok2. But Gemma 1 was horrible. Embarrassingly bad. Grok 1 too: huge-ass model and it was not good at all. Gemma 2 is very good for its size, and Grok 2 seems to be a massive step up. Good that there is more competition. ImageGen is also now completely independent of SD, which is good. Looking good, many good improvements lately.

>>101883530
It's a fun phrase though, given the Google leak. You can always make fun of proprietaryfags.

>>101883538
100% skill issue

>>101883559
Nature is healing :)

>>101883471
Oh, it will be released. It will be 1T parameters, base model only, and pre-quantized, so Musk can virtue signal to the press about his openness; but it will be all but useless to us.

>>101883548
they literally pruned all unsafe information from the dataset

>>101883517
TOS is a moat; only Elon is positioned to release a based product. He's like Hitler in the 1930s, already on a wartime footing: advertisers have already done their boycott, so they have nothing left to threaten him with.

>>101883443
Trust in bitnet

>>101883319
>not being released to the public
It's fucking nothing.

>>101883609
They space the weights release 6 months from their proprietary release. At least, that's the schedule they claimed, and they kept their word with Grok-1.

>>101883594
It told me how to do it.
>>101883631 (me)
Actually, Grok-1.5 was released in March, so I think they should be releasing the weights for that next month.

>>101883644
Yeah, and report back with your dud or some baby firework pop that doesn't even breach the metal pipe.

>openai announces 4o multimodal voice mode
>says it's rolling out "in the coming weeks"
>I sign up for an openai subscription
>fast forward 3 months
>no 4o voice
>scammed by altman
>fast forward to today
>google says they're releasing a new realtime voice conversation mode with gemini
>they say it's releasing TODAY
>I sign up for the premium membership ($30/month)
>no gemini voice mode
I fell for it twice, localbros...

>>101883631
That wasn't an official promise, to be fair, just a spitball idea from Musk on the Lex Fridman podcast discussing open source. The fact that they did end up matching that timeline could be a coincidence, since it also happened to be right when he was suing OpenAI for not open sourcing themselves. We'll see in a month, though.

>>101883777
>>>/aicg/

>>101883807
Local lost THOUGHFORTHSTANDING
Goddamn this databank stuff is making my head spin. 10,000 possibilities and damn near 0 documentation or even solid results/examples from people messing around with it other than "uhh yeah I put a wiki page in about my waifu and it might have pulled from it but it might have also been part of the model's knowledge (shrug)."
>>101883848
I started using shartymemes specifically because it makes neckbeards shit and piss and cum about the sanctity of their secret 4eddit club. I've never browsed the sharty or /qa/ in my life and I don't ever plan to.

>>101883777
you were tricked, man. google's is tts; openai has had that for many months now. it can't sing you a song and stuff like that, but it's high-quality, "real"-sounding. something like chatgpt advanced voice mode is seemingly still far off for the others, but who knows. the google demos are getting so weird; they can't seem to find any actually fun or interesting use cases. they kill the fun with alignment, so all they can do is show it managing your calendar etc. bad, bad idea. lol, all models hallucinate, and gemini is especially bad with this. sounds crazy.

>>101883601
Was hitler also a scammer?
t. don't know history that well

>>101883885
NTA, but mmhmm. The funny thing about lying on the internet to make yourself feel better is that it doesn't actually change reality.

>>101883848
i miss /qa/
>>101883820
>>101883929
>>101883777
voicecels lose yet again

Are companies that can't have their own data centers, but want "local" LLMs to avoid the privacy concerns of APIs like OpenAI or Anthropic, using services like RunPod or SageMaker, or do they rent dedicated bare-metal servers and manage the whole stack themselves? What if you have a service that needs almost 24/7 inference, for example? What would be cheaper?

>>101883777
Aside from the usual cloud cuckery, have you considered buying in after the product you want comes out?

>>101884028
Case study: NovelAI. They're partnered with a cloud GPU provider and they run their own software stack. Literally just go to /aids/ on /vg/ and ask them.

>>101884052
>google: "It's releasing TODAY and will be available TODAY for premium subscribers"
They lied.

>>101881501
which cpu? what speed can I expect?

>>101884068
>releasing
hmm
should I wait for the 50 series?

>>101884118
Yes, to get used 4090s, if you don't want 3090s right now.

>>101884068
>Today
>Two day
Give it time

>>101884132
I'm waiting for 48GB 4090s.

>>101884144
>two more days

>>101884132
NTA, but is there much difference between the 3090 and 4090 architecturally? My fear is that some kind of super flash attention pro gets invented, and my 3090s will not support it but 4090s will.

>>101881485
Thanks for making it, it subverted my expectations.

>>101884132
is the bump from 3090 to 4090 worth it?

>>101883319
canadian bros... surely, they won't release this?

I'm out of town and have a public ST instance set up on my computer. There was a pretty bad thunderstorm over there yesterday, and today the ST cloudflare thing is gone. If my computer is fried when I'm back, I'm going to kill myself.

>>101884226
no, especially with the new shit right around the corner.

>>101884256
>canadian
oh right, can euros use grok2? lots of new models are specifically cutting them out due to their faggy laws or w/e

>>101884297
If the EU hates it, chances are it's good.

>>101883443
With those scores, probably somewhere around 70-130b. Calling big models "mini" seems to be the trend lately.
Huh, genuinely surprised that column-r wasn't Cohere.
>>101884173
Nothing that can't be emulated at reasonable speed. Inference is memory-bound; only prompt processing speed may be hit by compute capabilities, but again, it would still be fast enough.

Elon, with his severely autistic ass, winning is good for humanity. He may be full of himself, but he's better than the globalists.

>>101884441
I want no one to win; not a single human has been worthy of it ever since all that time ago when we ate the apple.
miku is worthy of winning (the future)
>>101884441
I've said it before, and I will say it again: I hate Elon's fans a lot more than the man himself.
Are you ready for tomorrow? He's gotten every prediction right so far. But this is the real test. I smell strawberries in the air.
>>101881591
32GB is still 32GB. It's not 80GB, 100GB, or 200GB, so it sucks.

>>101884520
Sus-R being fucking Grok is hilarious. Nigga was going on about strawberry and new paradigms with it.

>>101884525
nigger logic

>>101884541
Normally the gatekeeping sperg pisses me off, but in this case I actually think he's right. Every single one of you here shilling Strawberry or whatever it is needs to fucking go back.

>>101884541
What's weird about that? Musk was founding co-chair of the board of OpenAI, so of course he'd be well positioned to steal the strawberry seeds. Strawberry man was still right in the end, even if only indirectly on one part of it.

>>101881001
>Mixtral 8x22
>C-R+
Give me one valid reason you're using these over Gemma 27B...

>>101884590
what model can I fit in 32gb but not 24?

>>101884590
>Thinks a GPU that accomplishes fuck all on its own is still worth $1k
>Nigger logic

>>101884525
Enjoy your short context length.

>>101881485
What's the original?

>>101884675
A math problem involving oranges and apples.

>>101884495
She's already won (my heart)

>>101884541
strawberry is here

>>101884742
This is huge

>>101881282
>it's not power efficient
It is when the 3090 can finish its workload in half the time of a 4060ti. Not to mention you need three 4060tis to match the VRAM of two 3090s. We already did the math a long time ago, anon; fewer, more powerful cards win on power efficiency 99% of the time.

Huggingface is getting acquired by Microsoft mid-September.
t. NDAfag
I tried to buy into the v100 meme but getting v1 OCP racks is really fucking hard, are there any alternatives using 19'' servers?
>>101884613
8k context?

>>101884893
Sorry you got scammed

>>101881520
If the AC uses electricity, he asked what your bill is like.

>>101884981
i didn't buy anything yet

I have a confession to make, anons..... to this day, UNA-TheBeagle-7b-v1 is the model that has given me the most interesting, detailed, and soulful erp. I think we went wrong somewhere, because nothing has compared.

>>101885005
logs

>>101884758
you're comparing to a T4 though. it's a 70w card.

>>101885005
for me it's https://huggingface.co/fblgit/UNA-ThePitbull-21.4B-v2
>Introducing the best LLM in the industry. Nearly as good as a 70B, just a 21.4B

>>101885014
I-it's embarrassing showing my degenerate shit, but ok
Imatrix quants are better than static ones, correct?
>>101885206
Yes, especially at lower quants. Still a little bit better at Q5+.

>>101885206
When run entirely on GPU, yes. If you offload at all, no.

>>101885243
How come? Does the model get stupid or what?

>>101885253
No, you just lose the speed benefit over regular quants.

>>101880989
what answer would a roleplay model give? i want it lewd or at least funny
sad when the sub 64gb vramlets try to talk
>>101885289
Damn, how do people still mix this up! Imatrix just changes the distribution of how weights are quantized. What you are thinking of are IQ quants, which use a different mechanism, hence the slowdown on CPU inference.

>>101885479
let me fuck with people, goddamnit

>>101885479
>Damn, how people still mix this up!
sorry, the bot's rag isn't the best

>>101881067
The A4000 is down to the $500 level on ebay now; look for one up for bid, BIN is still overpriced. It's a nice card when you're short on space.

bac?
>>101882831
Sounds like Amazon Polly and other variations of the "justin" and related tts voices available from others: https://youtu.be/igcKWod61b0?t=147
You can search for and find other demos online, as well as the software that generates it.

Been out of the loop for six months. How many of these repos are still used (aside from the first one)? I hear flux is all the jazz right now, and most guides seem to use comfyui over voldy? The flux prompting looks so retarded though.
t. 4090
>>101885635>draw me a girl, she looks like x, holds y and wears z>retarded>1girl, award-winning gaping anus:1.6, best quality, masterpiece, HDR, exceptional, raytracing, best detail, 8k, insaneres>negatives: (worst quality, low quality:1.3), bad hands, bad pussy, extra digits, fewer digits, extra ears, extra nipples, extra anuses, mutated, nigger, tranny, spic, chink, gook>high IQ
>>101883319
Damn, this sucks. What is Cohere doing? That's such a disappointment... But still, coming from xAI, I bet this is an absurdly large model, so it's not like anyone would be able to run it locally anyway.

>>101885025
>you are comparing to T4 tho. it's a 70w card.
The T4 leads a 4070 in fp16 and tensor cores. That said, the 3090 is still much better value.
dead general
>>101885657
>extra digits, fewer digits
That shit always cracked me up. I've seen many variations of 'more than five, fewer than five' and many other stupid things.

>>101885657
oh, I had thought flux required wordsmithing, poetry, and long-winded essays to prompt correctly.

>>101885657
When you put it like this, you make me realize that flux has removed the last vestiges of art from the process.

Here is AGI for you:
https://arxiv.org/abs//2408.06518
>Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
>Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit semantic leakage in languages besides English and across different settings and generation scenarios. This discovery highlights yet another type of bias in language models that affects their generation patterns and behavior.

>>101884729
>>101884495
disgusting troons. go back to your discord.

>>101884514
>those heckin chuds!! how dare they talk freely about anything they want!!!!

>>101885005
>UNA
mandatory reading if you use any of those models: https://github.com/fblgit/hypothetical-frameworks
This is what makes UNA so good and soulful.

>>101885738
What do you mean?

>>101885668
It's coming really soon
>Column-R and Column-U from cohere
>Sus-column-R from x.ai
t. work at Cohere
>>101885044
Can you catbox the catgirl, Anon?

>>101885044
She is a cat, you sick fuck.
I'm looking forward to the next decade where I will be able to run good LLMs with hardware that I can actually afford.
>>101885737using a bullet point list also just werks
>>101885991This is our final decade.
>>101870861how's that AGI going for you nigger?that's right, it's still science fiction
>>101885604holy shit how are these so cheap? is it a scam?
well, seems at least the jokes of gpt4o have gotten alot better since the recent upgrade. kinda cool.
I just had a random thought occur.What if the hallucinations models produce are like the imagination of us humans?We know models have no thoughts, but what if the output of models _are_ their thoughts?When you're asked a math question, like "What is 59 times 23?", your first thought will most likely be something like "Around a 1000? Above 500, at least." and then you'll actually think about the question step by step.I've seen this behaviour happen a lot with models, where they give the wrong answer, but eventually come to the correct answer after examining the problem step by step. What if instead of trying to shape these thoughts, we add a feedback loop that use their thoughts as input before outputting actual output?You could even do this while the output is still outputting, "correcting" eventual false conclusions. When the model evaluates that its thought is wrong, the thought is cancelling and the wrong part is overwritten with the correct thought, restarting the feedback loop.After one final feedback loop of the entire "thought", the model is tasked to restructure it into an answer, possibly trimming parts of the output before outputting the actual output.I hope all of that made sense.
>>101886088
can see why somebody only using gpt would be surprised by this output. no idea why openai hypes it up though.
>>101886088
4o-latest sucks even harder at coding than the other two 4o models though
>>101886098Strawberry hype is gone. It is time to return to localllama.
>>101886098
Yeah. Nobody thought of that before...
>What if instead of trying to shape these thoughts, we add a feedback loop that use their thoughts as input before outputting actual output?
That's the autoregressive nature of language models. It's what they already do, and they can be trained to do it more explicitly with CoT and stuff like that.
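The whole idea fits in a few lines anyway: generate a draft, feed it back with a "check this" instruction, repeat until it stops changing. A rough sketch with a toy stub standing in for the model call (swap in your backend's completion endpoint — llama.cpp server, koboldcpp, whatever):

```python
# Sketch of a draft -> critique -> revise loop (CoT-style self-refinement).
# `generate` is a toy stub standing in for a real model call.

def generate(prompt: str) -> str:
    # Toy "model": blurts a guess on the first pass, does the arithmetic
    # properly when asked to check its work. A real completion call goes here.
    if "check" in prompt:
        return "59 * 23 = 59 * 20 + 59 * 3 = 1180 + 177 = 1357"
    return "Around 1000, maybe 1200?"

def refine(question: str, max_rounds: int = 3) -> str:
    answer = generate(question)                      # the fast first "thought"
    for _ in range(max_rounds):
        critique = f"Question: {question}\nDraft: {answer}\ncheck this step by step."
        revised = generate(critique)                 # feed the thought back in
        if revised == answer:                        # converged, stop looping
            break
        answer = revised
    return answer

print(refine("What is 59 times 23?"))  # -> the 1357 derivation
```

Whether the extra rounds help depends entirely on whether the model can actually catch its own mistakes, which is the part that's hard, not the loop.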
Please explain this to me. I did a session, roleplaying going out on a boat and skinny dipping with a girl. I started a new chat with same char and setting, and my char wrote "Wanna go swim and dive?" and picrel is the response. I looked at the prompt that was sent to see if there was something remaining but it's literally just the starting chat. The previous chat was not today, either, so it can't be some weird VRAM remnant thing.
>>101885689>That said, 3090 still is a much better value.I don't know how you come to this conclusion when no numbers are provided at all.
>>101886098What you are describing is essentially CoT + iterating over its own response, and that's not a novel idea.To some extent that already happens inside the model, more or less.
>>101884441
Musk is the Jart of billionaires. Even if his AI stuff happens to work out he'll squander it with a focus on meme shit that he thinks will make other people think that he's cool.
>>101886257
>I don't know how you come to this conclusion when no numbers are provided at all.
I'm not going to do your homework for you. It's been discussed at length here. A 3090 is still the best value based on cost vs tokens/sec and VRAM.
T4 is "good" but it's not as good as a 3090, and if you need a 1-slot card, an A4000 can be had for the same price as a T4, and it's much better.
Hello guys, I'm an /aicg/ refugee. Since it has become unfeasible to steal keys from companies, I wanted to ask a question here: what's currently the best local model for RP that can realistically be run on consumer-grade hardware (4090, etc.)? I'm just curious, thanks!
>>101886410gemmasutra 2b
>>101886393>trust me broI think I'll end right here.
>>101886440
>>101886410>unfeasible to steal keys from companieswhy?
>>101886410
koboldcpp + gemma 2 27b.
Then learn about the different quantization methods and try other models like command R, quantized 70b, etc.
>>101886427
Is it? What about Magnum or Mythomax? I've heard those names mentioned before and I seem to remember people saying they were good.
>>101886526
Well, at least when it comes to Claude, key security has been tightened a whole lot and keys are getting revoked nonstop; very hard to keep them alive.
>>101886583Just use nemo instruct thief.
>>101880989
https://www.phoronix.com/review/amd-ryzen-9950x-9900x/14
AMD seems to completely BTFO Intel when it comes to whisper.cpp.
Probably not directly comparable to llama.cpp since, as long as you have a GPU for prompt processing, the bottleneck is the RAM rather than the CPU.
>>101886652Huh, that's interesting.
>>101880989Takashi will arrive at 9pm. Mrs Minagawa will rape his shota ass.
If you got raped by Mrs Minagawa as a shota would you still engage in the sunk cost fallacy hobby known as LLM's?
>>101886715No because I'd have Mrs Minagawa to take care of my needs
>>101886832蜷川様〜
>>101886348But some people will find his stuff useful anyway. Deal with him the same way, ignore if it doesn't interest you
>>101886098
Everything LLMs do is hallucination. But some hallucinations are more factual than others.
LLMs, however, have the issue of output bias. If an LLM sees "X is right" and later discovers that "X isn't right", it will most likely hallucinate an answer that makes X right, instead of recognizing its error.
>>101886652
Who the fuck uses whisper.cpp? WhisperX and the other faster-whisper/ctranslate2 implementations are so much faster.
>>101885604
>>101886076
I just got one from ebay for just over £500 and now I'll be able to get 64gb vram in a mid tower. Only downside is I can't find a cheap 3090 turbo.
>>101887142
>Only downside is I can't find a cheap 3090 turbo.
Keep in mind many turbo cards aren't flow-through, meaning you can't put them right next to each other. 2080ti is like that. If the rear of the card doesn't have an opening in the PCB where the fan is, it's not flow-through.
>>101884525
>>101884618
Command-R with 8k context at 5.0 bpw instead of 3.0 bpw
Mixtral 8x7B Instruct with 16k context at 5.0 bpw instead of 3.7 bpw
>>101887361
Gemma 27B with a small context fits in 32GB. I only run q8/8bpw models now.
>>101886652
>as long as you have a GPU for prompt processing the bottleneck is the RAM rather than the CPU
Even running entirely on the CPU the bottleneck can be RAM. My CPU is 60% idle when running Mistral Large IQ3_XS at 0.6 t/s. RAM so slow.
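Back-of-envelope: every generated token reads roughly the whole set of weights once, so CPU generation speed tops out around memory bandwidth divided by quantized model size. The numbers below are illustrative assumptions (IQ3_XS of a 123B model is ~50 GB; dual-channel DDR is often ~35 GB/s sustained), not measurements:

```python
# Rough upper bound on CPU-only token generation:
# t/s ~= memory bandwidth / bytes read per token (~the model size).

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# ~35 GB/s sustained dual-channel RAM, ~50 GB quant (example figures):
print(max_tokens_per_sec(35, 50))  # -> 0.7
```

0.7 t/s as the ceiling lines up with the observed 0.6 t/s and the mostly-idle CPU: the cores are waiting on RAM, not computing.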
>>101887480*I mean 40% idle, 60% used.
>>101887142>cheap 3090 turboI got mine of facebook marketplace. I doubt you can get it off ebay without paying the extra.
For me it's the 3090 pro max.
>>101887353Yeah, the founders card is going to get suffocated but I'll try to make it work. I'm going to put a fan at the bottom of the case and undervolt everything if it gets bad but I'll max out all the fans first since the blower cards are going to get noisy anyways.
>>101888077
You can try using an extender and mounting one of the cards vertically. 3-slot PCIe spacing isn't enough for non-blower 3090s, I've tried.
>>101885994
Yeah, in my experience multi-line prompts do best with flux, where you define each subject individually and then add more lines for extra details, actions, etc. referencing your defined subjects as needed, rather than typing out a paragraph.
I've been considering making a desktop app to act like a search engine for my local unsorted images, without needing to set shit up like with immich or hydrus. After searching and asking around it seems that CLIP is state-of-the-art for this kind of thing and pic related are some of the models I could use. I'll probably use the openCLIP library to generate text from the images and figure out the rest as I go.
Is my thinking sound, or are there better tools/methods to accomplish my goal?
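Worth noting CLIP search usually doesn't generate text at all: you embed every image once, embed the text query into the same space (both via open_clip), and rank by cosine similarity. The ranking core is tiny; a sketch with stand-in 3-d vectors in place of real 512-d CLIP embeddings:

```python
import math

# Core of a CLIP image search: embed images and the text query into the same
# vector space (e.g. with open_clip), then rank by cosine similarity.
# The tiny 3-d vectors below are stand-ins for real CLIP embeddings.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_emb, image_embs, top_k=3):
    # image_embs: {filename: embedding}; best matches first
    scored = [(cosine(query_emb, emb), name) for name, emb in image_embs.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

images = {
    "cat.jpg":   [0.9, 0.1, 0.0],
    "dog.jpg":   [0.1, 0.9, 0.0],
    "plane.jpg": [0.0, 0.1, 0.9],
}
print(search([0.8, 0.2, 0.1], images, top_k=1))  # -> ['cat.jpg']
```

Precompute and cache the image embeddings once; after that each query is one text-encoder call plus this ranking, so it stays fast even over a big folder.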
>>101888292florence-2 is better at captioning imgs
>>101888320https://huggingface.co/microsoft/Florence-2-largeI see, so currently that's the latest and greatest? Should I use the code they have there as reference for my app?
>>101880989I don't get it
>>101888434The punchline is SEX!HAHAHAHAHAHAHA ISN'T THAT SO FUNNY????
>>101888383The model's small enough that using whatever the default huggingface code is will probably be fine. I'm not sure if any specialized backends support it.
>>101888320>>101888460OK, thanks a lot anon(s)
>>101888434At the time Takashi-kun didn't either.
>>101888457Sex with who? this is a child, that is not possible, it is sick
>>101888493cute
>>101888493you're supposed to insert as the child who inserts into mrs. minagawa
I whipped up a quick prototype that uses whisper to listen to audio on the computer as I play youtube, and have it commentate on it. It saves a lot of time, I might not even need to watch the videos soon.
>>101888292It sounds like you're reinventing hydrus with extra steps. I think the better use of time would be to learn hydrus and use a model to tag your images then import them. That's my plan anyway.
>>101888493lmao
any grok2 quants yet?
>>101888810yes
>>101888672
That's what I considered doing at first, but I'd have loved a program that can search for an image without needing to set up hydrus and import all my images. Of course, making an entirely new application is more work than figuring out hydrus, but maybe there are other people who would want something similar. Like a plug-and-play local image search engine.
>>101888615insert what
>>101889148insert coin
Fucking niggers of xai open-sourced the piece of shit grok-1 but wont release grok-1.5 or grok-2
>>101889230There's a delay of like 6 months
>>101889245people say this but is that actually policy or are you just extrapolating from a single data point
>>101889257The latter.
>>101889230
Elon was trying to look good in front of the court in his lawsuit trying to force Altman to open-source GPT-4. The lawsuit died, so there's no reason for him to care about open source anymore.
>>101887480
>>101887531
What CPU do you have? Mine also caps at 60% regardless of the model most of the time, and I have a 7600 with 32gb@6000mt/s with optimized timings.
>>101880989
The image appears to be a puzzle or riddle involving a character named Takashi-kun, who is out shopping. Here's a breakdown of the elements in the image:

Scenario: Takashi-kun is planning to buy 4 apples, each costing 70 yen, and 8 oranges, each costing 20 yen. While he is shopping, his next-door neighbor, Mrs. Minagawa, approaches him and comments, "Out on an errand? How reliable!"

Question: The image asks you to determine what time Takashi-kun will be going home today. Additionally, it provides a clue that Mrs. Minagawa hasn't been in contact with her husband for half a year.

Explanation: The riddle is likely playing on the idea of time or a wordplay involving the cost of the fruits and the situation. Let's consider the following:

Total Cost Calculation:
Apples: 4 apples × 70 yen = 280 yen
Oranges: 8 oranges × 20 yen = 160 yen
Total = 280 yen + 160 yen = 440 yen

Mrs. Minagawa's Comment: The comment and the note about her not being in contact with her husband for half a year could be a hint toward the answer, possibly indicating that the solution involves understanding the context of time or a hidden meaning in the scenario.

Interpretation: The puzzle could be using the total cost of the items, 440 yen, as a clue. The number 440 could be interpreted as "4:40," indicating that Takashi-kun will go home at 4:40 PM.

However, the extra information about Mrs. Minagawa might suggest there's more to consider, but based on the calculation, the likely answer is 4:40 PM.
>>101889230>niggersjews.
>>101881835Performance nosedives the moment you can't fit everything in vram, so yeah.
>>101889370
He started a lawsuit again about a week ago
>>101889370Now I am even more confused
>>101888493THAT'S NOT THE MIYAGI-DO WAY
Lmfao. So I tried out Cohere CR+ and it's still fucking dogshit for basic 1-on-1 chats. How in the fuck does not a single model, even a monster like CR+, come close to Character AI? I'm actually convinced that I'm chatting with paid jannies on there now.
And people over on the Chat AI general unironically PAY for this shit with Opus and all that garbage? LMAO
>>101889658just use novelaiit's all you need
I see there's another /aids/ raid going on.
>>101889409
Newfriend here, so if I buy a 3060 it will be no different from just adding 12gb of extra ram? At least I will be able to run flux, right?
>>101889713yup novelai mogs every model in erp realism still
>>101889713>paying to coomngmi
>>101889257The single data point + the fact that Elon had previously said he thinks 6 months is a good timeline before Grok released, and ended up doing that and hasn't given any indication he's changed his mind.
I've always used base models and been pretty happy with them after designing a good system prompt. Are memetunes and merges worth checking out?
>>101889850It depends if you want to feel shivers in your spine
What's it like going from 8GB of VRAM to 72GB? I'm thinking of building the triple P40 box.
>>101889850no
>>101889767Is it true or bait?
>>101889874It's indescribable. You won't regret it.
>>101889874Highly use-case dependent. If you just want vanilla ERP with a pretty straight forward waifu character then 8GB can more or less suit your needs these days. If you want to code and stranglefuck your waifu then it's probably a necessary investment.
>>101889850yes
>>101889658>he likes character aiyikes
>>101890086>he likes models trained on a couple of /aicg/ logsyikes
>>101889874anything that requires 72gb to fit is going to run slow as fuck on p40s anyway
>>101890145Weren't a good bit of those aicg logs FROM cai?
>>101889970It's a schizo
>>101890175No, C2 logs are mostly Opus, I don't think he's talking about pyg.
>>101890170That makes sense. For now this is my cope for waitfagging on spending thousands of dollars on 4090s.
>>101889750
>Newfriend here, so if I buy a 3060 it will be no different from just adding 12gb of extra ram?
It may be slightly faster, depending on the proportion of the model held in VRAM. There are still significant performance boosts if over half of it is in VRAM; I wish I had the chart for it.
But yeah, effectively it's just extra RAM for large models. That being said, with the unfortunate size of a lot of models like Mistral-Large, that 12gb can be the difference between running them at all or not if you only have a 64gb capacity board.
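The reason the speedup is so backloaded: per-token time is the sum of the time each device spends on its share of the layers, so overall t/s is a weighted harmonic mean and the slow side dominates until almost everything is offloaded. A toy model with made-up speeds:

```python
# Toy model of partial GPU offload: per-token time adds across the split,
# so t/s is a weighted harmonic mean of the two devices' standalone speeds.
# The 30 t/s and 1.5 t/s figures below are illustrative, not measured.

def combined_tps(gpu_frac: float, gpu_tps: float, cpu_tps: float) -> float:
    time_per_token = gpu_frac / gpu_tps + (1 - gpu_frac) / cpu_tps
    return 1 / time_per_token

for frac in (0.0, 0.5, 0.9, 1.0):
    print(f"{frac:.0%} of layers in VRAM -> {combined_tps(frac, 30, 1.5):.1f} t/s")
```

Half the model offloaded only gets you from 1.5 to ~2.9 t/s in this sketch, while the last 10% jumps from ~10 to 30, which is the "nosedive" the other anon described in reverse.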
>>101890364>I wish I had the chart for it.
>>101890364>>101890395
>>101883559Ever since Large and 3.1 we've been back don't let doomers tell you otherwise
>>101890545Now just need Altman to die and local will have won completely
What's a complete retard-proof explanation of what goes on in a roleplay prompt? I assume you tell an instruct model in the system prompt that it is an anime girl and then the usual <|im_start|>user and <|im_start|>assistant talking goes on
>>101890591We wouldn't even have local without altman. And any advancement of the industry by OAI is good for local.
>>101890597That's how it's been done since the Pyg and GPT 3.5 days. I think it was just copying what Character.AI did.
>>101881211if you're incapable of buying used that's a skill issue. just verify the card works before you pay, plus multiple manufacturers still warranty used cards
>>101890730Thank you anon, I think ooba putting the character persona in <|im_start|>user was what made the models I tried talk like ChatGPT
The situation must be getting desperate in /aicg/. We're getting a lot more entry level questions than usual.
>>101890958But there's not a massive amount of pedofags posting about how they saw a "loli" at the park today here so things aren't as bad as that thread yet
>>101890597
Yeah it werks for me.
"You are an anime girl, her personality is x, she talks like y, her best friend is z.
Current scenario: {{user}} couldn't look away when he saw her in a bikini so she put his thing in her thing... (just summarize the story)"
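For a ChatML-style instruct model, all of that just gets flattened into one string before inference; a minimal sketch (the exact template — `<|im_start|>` vs `[INST]` etc. — varies per model, so check your model's card):

```python
# Minimal ChatML-style RP prompt builder. Frontends like SillyTavern/ooba
# do exactly this under the hood; the template is model-specific.

def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:                 # role is "user" or "assistant"
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # left open for the model's reply
    return "\n".join(parts)

print(build_prompt(
    "You are an anime girl, her personality is x, she talks like y.",
    [("user", "Wanna go swim and dive?")],
))
```

The trailing open `<|im_start|>assistant\n` is the whole trick: the model completes from there in character until it emits `<|im_end|>`.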
>>101890958
Local always wins. It's just a matter of time.
>>101889658
What is Character AI? Is it like JanitorAI, I'm guessing?
Anyway, what's a good model for general one-on-one interactions? Been using Nemomix and it's fine I guess.
I have a 4090 for reference, so anything above like Command R, I ain't gonna be able to run.
>>101891613>>101891613>>101891613
Its fucking over
>>101889874
I've built two and here are my 2c: the mikubox was a good and cheap way to get an all-in-one solution. I say "was" since there have been massive price hikes on both the P40s and T7910/T7920.
If that's the case in your area too, I'd recommend looking at other alternatives. One is going for an open-frame mining rig setup. You can get one for like $30-40 and then add the other parts. This'll also get you better performance (more PCIe lanes, better mobo, later CPU support, etc.) as well as something more future-proof, since the mikubox is mainly designed to take compute cards with rear power connectors. Most consumer GPUs with top-side connectors don't fit unless you run it without the side panel. You're also stuck with 2-slotters, which further limits what cards you can use. And even if you find ones that fit, it'll be very tight, causing their side-fan design to struggle with cooling.
Beyond that, other types of cards like the mi100 32gb and A4000 have dropped in price and might be better options depending on how tech-savvy you are and whether you intend to train or not. There's also the dark horse Intel, which is no longer a *complete* joke when it comes to running inference; used ARC 770 16GB cards can be found for cheap and four would net you 64GB VRAM (but require another type of rig). This is all assuming they'll continue improving with their next generation of cards and don't drop support for the previous ones. Or otherwise fuck things up like they often do.
As for whether it's worth going for more VRAM: yes, very much so. As that other anon mentioned, 70B models might be too slow for your liking, but running smaller ones in the 20 to 50b range at 8-bit is a very noticeable improvement in output quality. Especially if you use one for code completion.
TLDR: Compare costs part-by-part carefully; there might be cheaper/more performant/future-proof options now.