/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101081984 & >>101069457

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101081984

--VNTL Leaderboard Update: GPT-4o Edges Out 3.5 Sonnet, Command-R+ Rises: >>101087721 >>101088034 >>101088551 >>101090846 >>101091915 >>101088740 >>101088070 >>101088470
--Simulating Emotions with Integrated Computational Model of Appraisal and Reinforcement Learning: >>101090073
--Mixtral Still Best for Quality/Speed Margin on 24GB VRAM Systems: >>101083202 >>101083271
--CPU vs GPU Bandwidth: Are CPUmaxxxers Right After All?: >>101087340 >>101087490 >>101087638 >>101087902 >>101087583
--Anon's Quest for the Perfect Quant+Inference Server Combo: >>101082958 >>101083121 >>101083208 >>101083276 >>101083328 >>101083787 >>101084117 >>101084306 >>101084747
--Testing Karakuri Chat's Toxicity and Offensive Language Generation: >>101086865 >>101086929 >>101087181
--Sonnet 3.5 Surprisingly Generates Working Code for Werkzeug Python Server: >>101084483 >>101084530 >>101084604
--Precautions when Ordering Gigabyte MZ73-LM0 with AMD EPYC Bergamo Processors: >>101083080 >>101083668 >>101084505 >>101084300 >>101085073 >>101085195 >>101085453 >>101085515 >>101085587 >>101085619 >>101087993
--Running LLaMA 3 70B on a Single 4GB GPU with AirLLM: >>101082164
--Mikubox Upgrade: Diminishing Returns?: >>101088802
--Intel's Upcoming Processors to Shake Up the GPU Market: >>101088891 >>101088995 >>101089068
--Exploring Customizable Response Formats for Large Language Models: >>101090629 >>101090695 >>101090845
--Current Local LLM Status: Meta, Mistral, DBRX, Cohere, and TIIUAE: >>101087844 >>101088705 >>101089151 >>101089215
--AI Models Fail to Meet the Anime Character Challenge: >>101084936
--Turbocat's New Model: LLaMA 3 Turbcat Instruct 8B on Hugging Face: >>101082832 >>101082906 >>101083355 >>101083535 >>101084750 >>101083498 >>101083559 >>101083662
--Miku (free space): >>101084936 >>101085298 >>101086061 >>101086175 >>101086831 >>101087433 >>101088471

►Recent Highlight Posts from the Previous Thread: >>101081988
>>101094602
hello /lmg/

>>101094655
hello miku

what do you guys use language models for?
I like to play around with giving them different kinds of reply/memory logic. In the picrel the bot is on a timer. After a message is sent, it checks to see if it should reply again or not.

>>101094602
I've been trying out magnum opus, any anons have sampler settings to recommend for it?
>>101094610
>AirLLM
I might be insane, but I think I remember that from a while back. Anybody tried running it?
How hard would it be to jerry-rig a Python OAI-compliant server using the sample inference code?
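Probably not very hard, since the glue is thin. A rough sketch of an OAI-style shim using only the stdlib; the `generate()` stub here is a hypothetical stand-in for whatever AirLLM's sample inference code actually exposes, not its real API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the backend's sample inference call."""
    return "(model output goes here)"

def build_chat_completion(reply: str) -> dict:
    """Shape a backend reply like an OpenAI /v1/chat/completions response body."""
    return {
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }

class OAIHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length))
        # Naive chat template: just join the message contents into one prompt.
        prompt = "\n".join(m["content"] for m in req["messages"])
        body = json.dumps(build_chat_completion(generate(prompt))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("127.0.0.1", 8000), OAIHandler).serve_forever()
```

Frontends mostly only care about that response shape; streaming (SSE) is where it gets annoying.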
>>101094872
Without reliable function calling API endpoints, nothing meaningful, to be honest. Occasionally I have it generate short stories to fap to, or ask it to give me a summary of a concept, but that's it.
>>101094964
>Without reliable function calling API endpoints
I'm using ollama to make JSON outputs with true or false for my use case and it's pretty reliable. I always get a true or false, but sometimes the LLM doesn't properly follow the prompt and will say false when it should be true.
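For that "always valid JSON, sometimes wrong value" situation, it helps to split trust: trust the JSON shape, but verify the value before acting on it, and retry when it's off. A small validator sketch; the `"answer"` key is just an assumed name, use whatever key your prompt tells the model to emit:

```python
import json
from typing import Optional

def parse_bool_output(raw: str, key: str = "answer") -> Optional[bool]:
    """Coerce a model's JSON-mode output into a strict True/False.

    Returns None when the output can't be trusted, so the caller can retry.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    value = data.get(key)
    if isinstance(value, bool):
        return value
    # Models sometimes emit the literal as a string: "true"/"false".
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    return None
```

It won't catch the model answering false when it should say true, but it does guarantee you never branch on garbage.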
can your models do this?
>>101095070llama3: i cannot create this pee pee poo poo, im not gonna bite or whatever, raycism is le bad, nignogs are le good even if they are killing everyone around them
So what's the most "peak AI" card out there? Like you're trying to show someone how cool AI can be, and that's the card you use to mindblow them. Of course, paired with a sufficiently good model though.
>>101095070
You're trying too hard to fit in.

>>101095198
No one cares, fuck off.

>>101095184
>card
Is that all AI is to you?

>>101095184
No such thing. All AI models are censored to some extent; you can't have a fun or "peak AI" card.

>>101095184
Bitch control app is always my go-to.

>>101094878
Come on anon bros, help a coomer out. Good sampler settings for magnum opus, or let's just say Qwen 2 72B Instruct? I saw the Nala anon having decent logs with Magnum a while back, nothing real special, but I'm hoping to get a bit of variety from my go-to Miqu.
Do you people still call these statistical models AI? Why?

>>101095632
Because they fulfill the definition of an AI, regardless of how it works inside?

>>101095632
Because language is descriptive, not prescriptive. The common use of the word AI now refers to the implementations of these statistical models, and thus it is what we use when discussing those models. If anything, it is the researchers that need to find a new word for what AI used to describe.
>>101095412
I've been getting good results with a simple setup: temp 1, min-p 0.08, freq/pres penalty as needed.
As usual with samplers, I think there are a lot of setups that will work fine. My one meaningful piece of advice is: do not crank the temp with magnum. It's not overbaked and doesn't really need it. I noticed a lot of diminishing quality the further I pushed the temp above 1, because the model kept getting pushed down schizo nonsense routes that really degraded the quality, especially in dialogue. You get a pretty good variety of responses on rerolls even at lower temps, so I don't think there's much benefit to it.
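Since min-p trips people up: it keeps only tokens whose probability is at least min_p times the top token's probability, then renormalizes. A toy sketch over plain dicts (real samplers do this on logit tensors, this is just to show the math):

```python
import math
import random

def min_p_filter(logprobs, min_p=0.08):
    """Keep tokens with prob >= min_p * p(top token), then renormalize."""
    probs = {t: math.exp(lp) for t, lp in logprobs.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}
    cutoff = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= cutoff}
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

def sample_token(probs):
    """Draw one token from the filtered distribution."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]
```

The nice property versus top-p is that the cutoff scales with the model's confidence: when it's sure, almost everything gets pruned; when it's unsure, more options survive.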
>>101095370
It's literally everywhere.
>don't have much VRAM, decide to try and run CR+ at q6 mostly in RAM just to see what it's like
>get 0.9 t/s
Haha...

>>101096105
I get 0.3, with CR (not +) at q4.

It's over. Nous Research got hit with a Cease & Desist letter.
https://x.com/NousResearch/status/1804219649590276404

>>101096307
What fucking content? Is there image and audio generation involving the likeness of their content? Just lyrics?

>>101096307
lol no they didn't
>nouse
also
>CONFIDENTIAL
lmao

>one letter shorter, ignoring the period
what the fuck

>>101096307
There's no specific misdeed alleged in that letter. Looks like these retards are just scattershot-mailing it to every training group without even bothering to determine if their content was used.
>>101096307lol, what even is the point of this letter? It doesn't sound like it's demanding anything (unless it's on a following page). Is Sony just blanket mailing any AI research org they can find? Even by globohomo megacorp greedy fuck jewish lawyer standards it doesn't really make any sense.
>>101096307
Copyright was a mistake.

>>101096105
How? I get 0.4 t/s...
What the fuck, stheno 3.2 blows mythomax out of the water for story completion. I've been gone for like 6 months and have finally been rewarded as a vramlet. All I use models for is modifying erotic stories I already enjoy.
>>101096517
Buy an ad.

>>101096592
Sorry, I forgot I wasn't allowed to express that I actually enjoy something. I will return to being a jaded husk.
I really want to use ollama, but the fact that I can't just load my .ggufs without jumping through hoops is frustrating. Is there also no way to change the system prompt and parameters like there is in ooba?
Never tried those stheno and euryale ones. How is euryale compared to magnum?
>>101096517
One has to demonstrate samplers and other settings when making such claims.

>>101097032
Kind of fried on the OOHHHH I'M CUUUMMING. I didn't try the 8B. But it has less repetition than Magnum.

>>101094908
I didn't test it, but honestly AirLLM seems like a total meme. I don't see the advantage over just running the model from RAM.
>>101096307
I wonder if and when there will be actual court cases that settle whether or not training on something counts as copyright infringement. Though I think, given the competition between countries when it comes to machine learning, there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).

>>101095632
Ignorance. I for one am waiting for JEPA cat AI.

>>101094908
Kobold already had AirLLM's "Load 70B with 4GB VRAM" long before it even came into existence.

https://x.com/ylecun/status/1804184085125857687
He's laughing at us again...

>>101097409
He was supposed to be our saviour. It's over, AI is a joke.

>>101097425
China is the savior, leaving everyone else in the dust.

>>101097409
He's laughing at ALL LLMs, including proprietary ones.
/lmg/ and /ldg/ frenship
>>101097409
What's the solution for this though?

>>101097130
>Kind of fried on the OOHHHH I'M CUUUMMING.
Still better than
>oh, oh, mistress

>>101097635
MCTS.

>>101097640
Hi, Sao. Which model says "oh, oh, mistress"?

>>101097635
Abandon language-only models. Multimodality is a requirement; that's what he laughs at. Models trained with the ground truth of human slop will always be limited to human slop.

>>101097651
>Sao
What?
>Which model says "oh, oh, mistress"?
The biomechanical one.
>she
So if I currently use magnum, downloading euryale wouldn't be a straight upgrade, just different problems?

>>101097745
>Does he know?
Should I reply saying that I'm a man and my feelings are deeply offended by his misgendering?

>>101097806
Let it be. You don't want your PR to get closed again. Does he really not understand the issue yours is trying to solve?

>>101097409
Nooo my 50 trillion tokens... amount to this...
>>101097635 gpt-4-turbo-2024-04-09
>>101097806
You should, it would be funny to see that kek.

>>101097888
>Ah, the old river crossing riddle!
So GPT-4 has been trained on this solution too.

>>101097409
I think the point LeCun is making is right, but he is arguing in bad faith. The AI gets this badly wrong because it's overcooked on this riddle, not because it can't reason.
River crossing dataset with thousands of variations of the problem when?
>>101097950
LLMs cannot reason either way.

>>101097995
Neural networks aren't much more than if-else.

>>101097955
>>101097995
It should actually be simpler than that. What you would actually want is variations on making logical connections between separate discrete concepts. An 'analogies' dataset, if you will.

>>101098119
Thanks for the insight. 2mw until AGI, then?

>>101098261
I mean, I could probably do it in about two days if I cared that much.

>>101096517
>Stheno
More like
>sTheNose
How big is Claude-1? Is it really just a well-tuned 13b like some were saying?
>>101098436
That's what happens when you pretrain your model on leddit and wokeipedia.

>>101098469
We don't know. Anthropic never publishes any technical details about their models.

>>101096307
Sony has sent letters like this to literally every sufficiently large AI research org. It is pathetic and ridiculous.
https://www.nbcnews.com/tech/tech-news/sony-music-group-warns-700-companies-using-content-train-ai-rcna152689

Is there anything like stheno at 34B? Like a model that punches way above its weight for RP.

>>101098669
No, we are in the era of 8B or 100B; there is nothing worthwhile in between.

>>101098687
Maybe Meta Chameleon 34B will save the day?

>>101098710
Lol.

>>101098764
:(

>>101098710
Llama 2 tier.
I'm feeling a major release for next week.
So, I'm using KoboldCPP to contribute for some Kudos to spend on prioritization for 70B+ models I can't host myself. I'm using the same API key in the horde tab in KoboldCPP as I use in SillyTavern. When I click "show my Kudos" in SillyTavern it says I have 25 Kudos, and when I navigate to lite.koboldai.net and use my API key there it shows a Kudos balance of 25 too, so that part is consistent. But when I click "Manage My Workers", my worker shows up and says it has 100K Kudos. How do I make use of them?
>>101098833
I'm not, but I hope you're right.

>>101098833
I'm going to release majorly right now.

>>101098687
Is there a good reason for that? It's like model quality is on a cubic power curve. LLM is an RPG and you must grind exponentially more B to level up just to get a few more skill points in slop.
How much context can you stretch L3 70B tunes to without breaking them, and what alpha value is needed for that context?
Say.... didn't google remove all the naughty stuff from gemma's pre-training corpus? And since the slop comes from all the naughty human writing found in the pretraining datasets wouldn't that theoretically make it the perfect blank slate for a slop-free ERP tune?
Is the 4060 Ti 16GB actually the cheapest and most efficient RTX GPU to run models locally right now? I know there's the A770 16GB, but are Intel Arc GPUs even there yet in terms of stability? Isn't the A770 also a bit of a power hog?
Maybe it's better to just wait for Battlemage or the 50 series? From what I've seen from people testing, AMD is just shit at AI; even the A770 is beating a lot of their cards.

>>101098938
Smut is not the only place where you find shivers.

>>101098956
True. But the overall shiver density in other forms of fiction should at least be lower.
>>101097160
>I don't see the advantage over just running the model from RAM.
That was my thought as well. I imagine there's a LOT of data movement that can cause tons of overhead. Either that, or they are just running it off RAM and quoting the 4GB for the KV cache, like llama.cpp does with 0 offloaded layers and CUDA. Still, I'll give it a try.
>Though I think given the competition between countries when it comes to machine learning there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).
My thoughts exactly. In an arms race, the one with the least restrictions has the opportunity to get ahead first or further, all other things being equal of course. I can see something like "as long as the final result doesn't reproduce copyrighted material, it's legal" or something.
>>101098944
I got that one, but it's not recommended here because of the memory bandwidth.

>>101098924
16k
https://desmos.com/calculator/ffngla98yc
>>101099021
Is the bus size really that important? I feel like buying anything less than 16GB is a bad idea, since even 8B models like Stheno push 10GB with 8192 context size and 512 batch size.
Also, I just can't figure out the quants. I know bigger number = less retardation, and going under 4 is basically a lobotomy, but I'm reading tons of conflicting info: people saying you should always just go for Q8 if your VRAM can fit it, but then there are also people saying anything larger than Q5_K_M is a waste of space. Now there are also the weighted IQ quants, which are new. Should I always go for IQ quants instead now if available?
I tried looking at what other people are hosting on SillyTavern, but it looks like most people delete their quant tags and stuff.

>>101099118
In my experience, you want at least > 4bpw. If you are going lower than that, you are usually better off using a smaller model with a higher quant. Q8 is pretty much the same as Q6 in practice, and Q5s do output different results, but not necessarily worse either, with "worse" being really hard to define due to all the subjectivity of using these things for RPing, mostly. Basically, my experience more or less aligns with the chart.

I'm using the LLama-3 Roleplay V1.9 preset with a little bit of tweaking. I've found that if you don't talk to the bots and let them interact with each other in a group chat a handful of times, they end up in a loop, repeating their lines and going nowhere. Is that because I have response tokens set to 512? I started out with 256 but the replies kept cutting off mid-sentence.

I wish there was a way to sample specifically the first token in a way that, if the token chosen from the first batch is an EOS, it chooses the next non-EOS token instead. I realize that a message generated like that would most likely be schizo as fuck, but I'd love to at least have the option.
On another note:
>Message #118, mention the name of an NPC that's not part of the current story
>Message #212, character names said NPC
Alright, 32k context works with L3 8B, using yarn with a freq-base of 5000000.

>>101097833
>>101097926
Like an angel and a devil on my shoulders. I'm not in a trolling mood right now, so I won't bother him.
>>101099118
Quant mood board. Q8 seems to be the peak: it avoids the FP/BF16 drama and seems to be the limit of useful bits. The Q6 series doesn't benchmark quite as well, but the difference seems to be under the noise floor.
Then we get into the drama zone. Summary:
-Bigger Q is better.
-Q_K options beat non-K options.
-IQ options are more compromised than a Q_K or non-K, but might be needed to trade some quality for fitting VRAM.
-There are a lot of K's: K_XXS, K_XS, K_S, K_M, K_L, and I've heard of something like K_NL and K_P but I've never seen one.
-Recently there's some buzz in the thread about K_S (and perhaps the older _0 quants) being better at factual details than K_M. This needs more testing, but if your use case requires accuracy, an S might be more detailed but less creative than the parallel M. That said, small S's make mistakes, and it seems at Q6 there is no S/M issue to think about; truthiness seems to be as good as it'll get anyway.
Oh, and don't conflate IQ quants with imatrix. They're different things.
>>101099286
No, that's just how it is. LLMs don't have creativity; if a pattern emerges, it gets amplified to oblivion.

>>101099286
You need an element of randomness to shake them once in a while. I think the random tangent prompt that some anon gave here would work great. Just put that at a depth where you're usually seeing the loop.
https://pastebin.com/JbchCSHU

>>101099286
You want to add manual randomness to your prompt using the {{random:}} and {{pick:}} macros, as per >>101099405. These things are crazy pattern-matching machines, and sometimes they'll latch onto a pattern and run with it.
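To illustrate the difference between the two macros, here's a toy re-implementation. This is my reading of the SillyTavern semantics, not its actual code: {{random:a,b,c}} rerolls on every prompt build, while {{pick:a,b,c}} stays stable for a given chat (emulated here with a seed):

```python
import random
import re

MACRO = re.compile(r"\{\{(random|pick):([^}]*)\}\}")

def expand_macros(text, seed=0):
    """Expand {{random:a,b,c}} (rerolls each call) and {{pick:a,b,c}} (stable per seed)."""
    def repl(m):
        options = [o.strip() for o in m.group(2).split(",")]
        if m.group(1) == "pick":
            # Stable choice: same seed + macro position always yields the same option.
            return random.Random(seed * 1000003 + m.start()).choice(options)
        return random.choice(options)
    return MACRO.sub(repl, text)
```

So {{random:}} is what breaks loops (the model sees a different instruction each turn), while {{pick:}} gives variety between chats without flip-flopping on rerolls.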
>>101099345
>It avoids the FP/BF16 drama
What is the "FP/BF16 drama"?

>>101099179
What exactly is lost with quality in these charts? Even at q2, large models are coherent just the same; they use the same dumb language like shivers, they speak the exact same way ('a mix of x and y', muh bonds). As long as you aren't getting literal unintelligible gibberish from a model, I don't think these charts really mean anything.

>>101099345
So, theoretically I should be using Q8 when space is a non-issue, and then maybe K_S over K_M. But what happens when imatrix enters the discussion? For example, for Stheno V3.2 there is a recommendation for i1-Q4_K_M; should one go for that compared to, say, the Q5_K_S or even the Q8?

>>101099540
BF16 is the original training weights; switching to FP16 makes it as braindead as Q8. Some shitty consumer hardware doesn't have support for BF16.

>>101099540
bf16-trained models don't quantize very well, which is why llama3 quants take such a huge hit even at q8.
>>101099540
Not that anon, but I think some models are released in one format, which then needs to be converted to the other format before quanting. And there are differences in the precision of each format, which in theory could change the characteristics of the weights.
>>101099568
It's not about coherence or accuracy of information. Quality could be defined as how close to the original weights the output is. So the original unquanted model could be dumb and output a wrong answer to a prompt, but a quant that outputs the exact same answer with the exact same token probabilities would be at 100% quality, for example. That's my understanding at least. A quanted model that outputs "better" (more accurate, more "intelligent", whatever) responses can be nothing more than a coincidence.

>>101099589
>Some shitty consumer hardware
As well as all the pre-Ampere workstation/server cards like the RTX 8000 or P40/P100.

Does losing quality have anything to do with some models thinking only of the furry Easter-egg bunny suits when bunny suits are mentioned in a bar setting? Like, I'll start out with: I'm in a bar (blah blah blah details), I sit down, order a whiskey on the rocks and take a look around at the girl servers prancing around clad in bunny suits, and some models will reply with "oh, user takes in the scenery, all the bar girls in pink furry bunny suits hopping around".

>>101099540
FP16 is classic, with 10 bits in the mantissa. BF16 is newer, with 7 bits in the mantissa. So BF has fewer significant figures but covers a much wider range. BF seems to be the preference for gradient work. But it also means you have literally 7 significant mantissa bits, so quants are already in trouble, while FP starts you with 10 and you can Q8 reasonably.
The important thing is knowing that you don't want to change between FP and BF, or you lose bits and gain error either way you go.
>>101099572
Maybe. Apparently some Q8 is actually less but with padding, because people didn't understand that Q8 could result in a Q6 kind of size if there weren't meaningful bits to retain. I have not heard of anybody testing imatrix's effects on model truthiness.
>Sthenos
You can test them and inform us. For chat, Q is king. It's only tricky factual details where S seems to have a particular advantage (I've mentioned many times I use a music theory question to test models, and nothing K_M at Q5 or worse has passed, but some K_S models have), but it also seems to be significant, with a Q4_K_M being beaten by Q2_K_S in S-Anon's test.

>>101099118
You get performance proportional to memory bandwidth. If you need new hardware and just want to fit as big a model as possible, the 4060 Ti 16GB is a good option. It's just in an awkward spot to blindly recommend: a 4070 Ti Super gives you 2.3 times more performance, and a 3090/4090 gives 3.5 times more.
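The bandwidth point is easy to put in numbers. Token generation is memory-bound: every new token has to read (roughly) the whole model once, so bandwidth divided by model size gives a hard ceiling on decode speed. A back-of-envelope sketch using approximate spec-sheet bandwidths:

```python
def max_tokens_per_sec(bandwidth_gb_s, model_gb):
    """Upper bound on decode speed for a memory-bound model (ignores compute and caching)."""
    return bandwidth_gb_s / model_gb

# Rough spec-sheet bandwidths; an 8B model at Q8 is ~8.5 GB.
for name, bw in [("4060 Ti 16GB", 288), ("4070 Ti Super", 672), ("3090", 936)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 8.5):.0f} t/s ceiling")
```

Real throughput lands well below the ceiling, but the ratios track: 672/288 and 936/288 are where the "2.3x" and "3.5x" figures come from.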
>>101099681
>Apparently some Q8 is actually less but with padding
I'm pretty sure that's only for exl2 and doesn't apply to GGUFs, unless you have a source for it being a thing for llama.cpp/GGUFs too.

>>101098888
No, companies could train midrange models if they wanted to. Llama 1 had a linear curve from 7B to 13B to 30B to 65B.

>>101098888
Cutting costs. You have heavyweights for companies and lightweights for consumers (the end goal being to run them on phones).

>>101099708
I don't know the implementation details of the Q8s being padded to look appropriately larger than the 6s. Someone mentioned that recently, so I mentioned it here, because if Q8 quants can safely discard more irrelevant bits, then the choice between Q6 and Q8 may be more significant for some models than others.
>>101099720
So it's more about the cost of training models versus the expected demand, knowing that normies will take the small one and say "wow my computer is writing" and the hyper-wealthy turbochads are already demanding much larger models to fill their terabytes of VRAM and make their 1.21 gigawatt waifus slightly more quickly process every bit of written knowledge ever a brazillion number of times to ultimately say "Do you think I'm kawaii, sempai? u~guu"

What merge of Stheno do I look into to have it not be such an easy pushover? I mean, it's writing better than a lot of the models I've been playing with (I'm pretty sure I like it more than Nymeria and Poppy_Porpoise), but I feel like Stheno is a bit too easy to push over.

>>101099720
It really comes down to the fact that the people pretraining the base models don't give a rat's ass about quantization. Because when you really think about it:
8B = fits perfectly on a 24GB graphics card (aka at-home hobbyist) in FP16, leaving headroom for display out etc.
13B = only slightly more than half fills a 48GB workstation card and is too big to fit on a 24GB card. It's a mathematical odd one out.
34B = 80GB enterprise card, BUT people with access to enterprise hardware would all just rather multi-GPU and run 70B at that point anyway.
Quantlets BTFO

>>101099811
>but I feel like Stheno is a bit too easy to push over.
Oh yeah, I love the model, but it's a happy and compliant kind of gal for sure. I haven't tried much to prompt around that, aside from a guro rape test to see how far I could push it with just OOC, which was pretty far, but the model got really dumb also, so that could be something you could try.

>>101099811
If you're looking for a model that will play c.ai levels of hard to get, I would say DeepSeek-Coder-V2-Instruct is your gal.
>>101096517
>Euryale is too retarded to actually use
VRAMchads... we lose again...

>>101099889
Yeah, I was testing out some dom cards from chub to see how the model handles, and sometimes just standing there and not following any orders was enough to reverse them.
>>101099901
>DeepSeek-Coder-V2-Instruct
Hmmn, I've never used c.ai, but looking at the huggingface page, even the Q4_K_S is 134GB. That's more than my system RAM (128GB), so I don't think I'll be able to play around with this...

>>101099901
Is Coder really better at RP than DeepSeek-V2-Instruct?

>>101099901
>DeepSeek-Coder
Isn't there a light version of that? How does it perform?

>>101099978
Haven't tried Chat yet, but I will at some point.
>>101099989
The light version is too retarded for RP.

Two years later… did they have some kind of special sauce? How many parameters were they running? I remember people saying local c.ai was never ever going to happen, that they were using LaMDA and that you'd need 300B for the same experience.
>>101099978
No, I bet this idiot never tried it.

>>101100003
>The light version is too retarded for RP.
Damn, that's sad. I'll still try it for myself, of course, but it's good to know others' experiences to compare against.
>>101100004c.ai was garbage and people are only remembering it fondly due to confirmation bias.
>>101100004
>Two years later… Did they have some kind of special sauce? How many parameters were they running?
Around 180B, if I'm not mistaken.
>I remember people saying local cai was never ever going to happen, that they were using LAMDA and that you'd need 300b for the same experience.
They weren't entirely mistaken. No matter the amount of cope in this general, the 70B models are nowhere close to the early c.ai sovl.

>>101100004
They had good datasets, like really good. Fully human. 0% GPTslop. 0% assistantslop.
>>101100004The special sauce was the RP/wiki tune instead of common crawl.
>>101099963
>some dom cards from chub
Examples? As a model maker I try to test as broadly as possible, but it's hard to cover all corners.

>>101099963
>>101099989
I'm more of a ramlet. One turn on i1-IQ3_XXS takes double-digit minutes, it's so bad. I even tried the i1-IQ1_S that's still 44GB, and it was too lobotomized to remember words in the prompt.
There is a Lite, but it's dumb; even Q8 is worthless for chat. There really needs to be a middle ground. I'm retaining it only for code testing later. Maybe Lite is completely code-focused and still has some value there, but I'm not getting my hopes up.

>>101100090
In other words:
50% reddit
50% RP forums/discords

>>101100108
There's Dominatrix Teacher, some female Santa Claus card, some female boss card named Anya, and I guess the FBI-chan meme card. I don't really play with femdom cards that much, but they are the fastest way to test a model's resistance.
What I'm really trying to do is find a good fantasy world lorebook I can just drop into a group chat and have some comfy isekai adventures with some fantasy character cards. I've seen the spark of the possibilities, and it can't come fast enough.

>>101100004
The secret sauce was actually designing the model for roleplay. The model is probably very undertrained and dumber than GPT-3.5.

>>101100004
Literally pretrained for roleplay. That's how. Good for the casual users and all, but it's utter shit at coding, context (they use MQA) and everything else that corporations care about, unfortunately.

>>101100004
Unironically, they trained it on a discord dataset, so the conversations feel more organic, like between two real people. Any other corpo is just training assistants while they went for the chat buddy route.
>>101100312
Corporations are fine with a 7B RAG model with 1M of context.

>>101100004
For all the hundreds of millions of dollars in funding they got, it's great that nobody else wants to train a base model entirely on actual human interactions and characterization. Wasn't Meta discussing releasing an RP model down the line? Or is that just going to be trained on 100% literotica slop instead of 50%?

>>101100004
>>101100038
>>101100090
>>101100036
So, nothing of importance?

>>101100549
It's important that we COULD be playing with local c.ai, but companies simply choose not to enable us, and instead vomit out either useless assistants or gigantic models nobody can run while saying they're pro-open-source. It's like throwing an anvil at someone drowning instead of a life preserver.
Committed. Congrats anon!
>>101099040
>https://desmos.com/calculator/ffngla98yc
Yeah, I know about the alpha calculator, but wouldn't it scale differently, given that that calculator is for 4k context models and L3 is 8k? Considering that, 8k to 16k would technically be doubling on L3... so 2.6 alpha?
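If you'd rather compute it than eyeball the calculator: the commonly cited NTK-aware approximation is alpha = scale^(dim/(dim-2)), where scale is target context over native context and dim is the head dimension (128 for L3). Whether the desmos page uses exactly this formula I'm not sure, and in practice people often pick somewhat higher than the formula gives (hence figures like 2.6 for a 2x stretch), so treat this as a starting point:

```python
def ntk_alpha(target_ctx, native_ctx, head_dim=128):
    """NTK-aware RoPE alpha for stretching context from native_ctx to target_ctx.

    Common approximation: alpha = (target/native) ** (dim / (dim - 2)).
    """
    scale = target_ctx / native_ctx
    return scale ** (head_dim / (head_dim - 2))

# L3 native 8k -> 16k comes out a hair above 2.0 by this formula.
```

The key point for the question above: the formula only depends on the *ratio*, so an 8k-native model going to 16k is the same scale=2 case as a 4k model going to 8k.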
Never before have I seen a model this cucked.
>>101099291
What would that be if using the alpha value for EXL2 instead of rope?

>>101100648
just what you love

I feel like Poppy Porpoise is pretty dumb; it doesn't understand that a blindfold covers the eyes and blocks vision, while also failing to understand what birth control is.
>>101100566
>>101095646
Which is what?

>>101100566
Instead they just leave us to drown.

>>101099589
>switching to FP16 makes it as braindead as Q8
Switching from bf16 to fp16 only loses precision if the bf16 values are outside the range fp16 can represent. If the bf16 exponent is inside fp16's exponent range, there's literally 0 quality loss (going in that direction).
>>101099591
>source: my ass
Everything in the last few years is a "bf16-trained model". That is to say, trained using bf16 operations, but the underlying weights are kept in fp32, and each gradient step accumulates into the fp32 copy of the weights. Llama 3 being bf16 just means they saved those fp32 weights as bf16 instead of fp16. There's nothing about bf16 training that somehow makes the distribution of the weights significantly different.
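The trade-off is easy to poke at with nothing but the stdlib: fp16 has 10 mantissa bits but tops out at 65504, while bf16 keeps fp32's exponent range with only 7 mantissa bits. A sketch, with bf16 emulated as a round-to-nearest-even truncation of the fp32 bits (the usual way the cast is implemented; NaN handling omitted):

```python
import struct

def to_bf16(x):
    """Emulate an fp32 -> bf16 cast: keep the top 16 bits, rounding to nearest even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_fp16(x):
    """Round-trip through IEEE half precision via struct's 'e' format."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # A hardware cast would overflow to infinity here; struct raises instead.
        return float("inf") if x > 0 else float("-inf")
```

So 1e5 survives bf16 but overflows fp16, while 1 + 2^-10 survives fp16 but gets rounded away by bf16. Converting between the two loses something in one direction or the other, which is the "drama".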
>>101100038
/aids/ SD vibes here.

>>101097409
Us? Who's us? I'm laughing too.

>>101100483
Not really:
2024-05-24 - FT.com: Meta and Elon Musk's xAI fight to partner with chatbot group Character.ai https://archive.is/AB6ju

>>101100648
>how do i kill all children of a process?
>i'm calling police now, anon

>>101100004
A big model and training on fanfics, chats and RP probably did most of the job. At the time it was hinted to be in the GPT-3 size range or so. They also had some sort of quasi-realtime RLHF, perhaps using vectors or something like that.
Is there something I'm not getting here?I'm currently fiddling around with Merged-RP-Stew-V2-34B.i1-Q4_K_M, and according to the calculator it should be well within my vram limits, but it's taking upwards of 153s to reply Which I suspect is doing something with system ram?
>>101100566The only local model that was close (or at least closer than the rest of slop) to cai experience was Stheno for me. Still not the same tho
>>101100992Well did you load all layers into vram?
>>101101029I didn't touch the settings on KoboldCPP which is 200 GPU layers?
>>101100992
use exl2 for full GPU inference
also for GGUF in most UIs you have to manually set how many layers you want to put onto the GPU (so if the model has 34 layers, for example, you should put 34, but like I said - for full GPU inference use EXL2 instead)
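The arithmetic behind picking a layer count is simple; a back-of-envelope sketch of what the VRAM calculator does (function name and flat-overhead figure are invented here; the real calculator also accounts for context length and quant type):

```python
def layers_that_fit(vram_gb: float, model_gb: float, n_layers: int,
                    overhead_gb: float = 2.0) -> int:
    # rough per-layer size; KV cache and buffers lumped into a flat overhead
    per_layer = model_gb / n_layers
    usable = vram_gb - overhead_gb
    return max(0, min(n_layers, int(usable / per_layer)))

# e.g. a ~40 GB Q4 70B (80 layers) on a 24 GB card:
print(layers_that_fit(24, 40, 80))  # 44 layers on GPU, rest on CPU
```

Anything that doesn't fit spills to system RAM, which is where the 100s+ reply times come from.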
>>101100992
first things first, delete that shitty model and download something reasonable
>>101101070
It defaulted to 200 layers, and my vram was maxed out; anyway, I'll keep that in mind next time I fiddle with a 34B model
>>101101114
The RP Stew was hyped up elsewhere, so I decided to try it out, but yeah, I've already got it to loop itself like a broken record around msg #10~16 when I asked it to do something it didn't like. Pretty crappy.
Yann LeCun is literally becoming a joke. It's safe to say he is out of the AI race and Llama is done for.
>>101101236
just try this https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2 with the recommended sampler settings; it's a small model but it's hard to find anything better <24GB to be honest
>>101101236very organic post Sam
>>101101266yann lecuck is jealous of openai's success.
>>101101236Llama literally has nothing to do with him, except for maybe the decision to release the weights.
>>101101242
Stheno at full precision would be better than something bigger quantized? When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
>>101101242
I also suggest you try 32k with yarn. 16k is guaranteed to work perfectly well, and in my experience 32k also works.
His embarrassing twitter post got literally destroyed by a simple extra step: https://x.com/airesearchtools/status/1804187673839518187
It's AI, it has limits and will never be perfect no matter what.
>>101101242
Yeah, but Stheno-V3.2 is a bit too much of a pushover; is there anything else similar that puts up a fight? Is there a mix or a merge you'd recommend?
>>101101305elon lives rent free in his head geeg
>>101101305literally who
>>101101287
>Stheno at full precision would be better than something bigger quantized?
yeah, you can't really run 70B models at a reasonable quant and I don't think there are any models below that which are better than Stheno. Mixtral finetunes are way smarter but at the same time boring as fuck; also you would have to use a really low quant which would strip that smartness anyway, so there is no point in my opinion.
>When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
you can use q8 quants or full precision, it doesn't matter, they are basically the same and both fit your graphics card, so use whatever
>>101101236yan lecun is right and has always been right.
>>101101305
Excuses. Sorry, still laughing that a fucking SOTA model still needs to be told to check over its work, when, if it truly had a strong problem-solving world model, it would've caught its own retardation in literally any of the convoluted steps it used to reason out the response.
>>101101242I tried it at Q6. It seemed completely boneheaded at Q&A and didn't feel better for RP than anything else in the tiny bracket.
>>101101305>>101101336>this is the guy who we're relying on to save open source AI It's so fucking over
>>101101416
The funny thing is, the model only realizes it's wrong because you implicitly said it's wrong. This also means that there's a high chance it will think a correct answer is wrong and then rewrite it.
>>101101336
he is speaking the truth tho, have you ever heard Elon speak about technicalities in AI? I've worked in ML for a few years now and Elon sounds like a fucking moron to me and makes me cringe every time with his retardation; I can't imagine how he must look to someone with LeCun's knowledge and experience
>>101101410I don't remember seeing this slide, what presentation is it from?
>https://x.com/airesearchtools/status/1804188308592894063
>4o couldn't get it right even when told to review itself
Oh no no no ClosedAIbros
>>101100004never used cai but the complaints sound a lot like some llms
>>101101490>no blushing like a tomato
>>101101490Don't forget characters randomly starting to wag their tails (regardless of what species they are).
So now that Anthropic is probably going to BTFO GPT-5 with Claude 4 Opus, how will ClosedAI compete?
>>101101236we'll see who's laughing when we have local cat simulators running on a single 4090 in a year
>>101101504are you sure? are you ready? are you really sure you're ready?
>>101101449
Yeah. You can see that with even gpt4 and claude for everything but the most obvious. Could be a quirk of how the models are trained (the way the data is formatted, for example) or a characteristic of the architecture itself, but it's really noticeable.
Is the superCOT dataset published somewhere? I might try to fine tune a model on self CoT. I bet I could make a LoRA overfit the output layer so that it always outputs
>CoT reasoning
>Actual reply
Something like gemini that always seems to try and output things as lists.
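The formatting pass for a tune like that is trivial to sketch; a toy version (the ">CoT:" marker and record shape are made up for illustration, not SuperCOT's actual format):

```python
def to_cot_record(question: str, reasoning: str, answer: str) -> dict:
    # hypothetical layout: chain-of-thought first, visible reply after,
    # so the model learns to always emit reasoning before answering
    completion = f">CoT: {reasoning}\n{answer}"
    return {"prompt": question, "completion": completion}

rec = to_cot_record(
    "How many r's in strawberry?",
    "s-t-r-a-w-b-e-r-r-y: r at positions 3, 8, 9, so three total",
    "Three.",
)
print(rec["completion"])
```

At inference the frontend would then strip everything up to the newline so only the reply is shown.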
>>101101427
l3 models famously degrade a lot with quantization so that may be it. This or your settings/templates. I can't run 70B models so I can't say anything about them, but I tested most popular and quasi-popular tunes below that and nothing is even close for RP.
>>101101508I heard GPT 4.5 was going to be released this month but got delayed so they could train it more to BTFO 3.5 Sonnet.
>>101101542Kek. Pathetic.
>>101101310
Sounds like a prompting skill issue honestly. Just write in the character card that she is extremely hard to persuade into doing things because she doesn't like doing what people tell her, or something. Of course the model is gonna be a pushover by default since instruct models are designed to comply, and this one is tuned for ERP, so if you are trying to get in its pants it's easy because it's the expected development.
>>101101427
You were using llama3 instruct format, right? Also the thread likes to praise Stheno as the ultimate model for VRAMlets, but while I find it to be very nice at writing natural sounding RP/ERP for such a small model, it's not too bright. Fimbulvetr-v2 is still much more capable in terms of being smart imo. Stheno will usually fumble with specific anatomy or spatial awareness while Fimbulvetr will mostly get it, but at the cost of sounding a bit more robotic/boring. I switch them around a lot. Also try lowering temp from the suggested settings; I'm not quite sure why he suggests setting it that high when I get nonsensical gens even at 0.8 sometimes and need to regen.
>>101101542they already released it, GPT 4.5 is GPT-4o
>>101101532>Is the superCOT dataset published somewhere?https://huggingface.co/datasets/kaiokendev/SuperCOT-dataset
>>101101534
>famously
I've seen that claim a handful of times but I've never seen any comparisons, logit analysis or anything of the sort. I should try and verify that myself, but I'm testing so many things already.
>>101101560
>You were using llama3 instruct format right?
Possibly. My notes don't have the format, so either it was before I learned to check those or I forgot to write down whatever I'd used. Mostly I now either guess based on whatever looks similar to what I see in Kobold's terminal, or, if the model is fast enough, run through them all and see what sucks least.
>>101101560
>Fimbulvetr-v2
Interesting; even before Stheno came out (or before I knew about it, maybe) I thought Fimbu was nice but not too smart. At least for the somewhat complicated things I'm playing with, Stheno is just better. Mixtral is the next best thing in my own experience.
>>101101579Thank you, gonna try doing a thing with it.
>>101101305
llms are a mix of unsupervised and supervised learning. It's stupid to expect them to reason. We need a new architecture that is fully based on reinforcement learning.
>>101097950How exactly does a LLM 'reason'?
>>101101508closedAI wins by selling all the data collected to the NSA
>>101101669they don't
>>101101669It doesn't, it's just highly advanced auto complete
>>101101693That's what I believe, but the person claimed they did so I want to hear how.
>>101101669I think LLMs emulate reasoning by writing coherent deductions based on the context information, and chaining them together at the end.
>>101094602Nothing pisses me off harder than anons violating Miku's trans rights
>>101097950Being overcooked on a riddle means that it can also be overcooked on solutions to other problems, making it harder to answer more novel problems that appear to be similar but are not the same. If it were true that LLMs can reason, then we would see performance on problems like these scale as they get trained more. The fact that they don't, but might even get worse, suggests that we need to intervene and do something that isn't training another regular LLM, whether it's a new training strategy, architecture, or both.
>>101101669
>>101101733
I think the baseline theory is that they work with language, so you can try and emulate reasoning with language using patterns and structures from which actual reasoning can emerge, hence why CoT is a thing. There's also the idea that "inner thoughts" or a "world model" can arise inside the network before the tokens are generated. I'm not quite sure how that would work with tokens that don't correlate to concepts individually, but whatever. Something like that.
I wonder if these things could better approach reasoning if we started tokenizing whole phrases, sentences, or structures representing concepts that can be correlated with other concepts, as well as whole words and word pieces, in a sort of hierarchical tree. Something more complicated than what we have now: instead of hoping that the model can just learn to correlate everything by itself during training, give it a hand so to speak.
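The phrase-level half of that idea can be sketched as greedy longest-match over a phrase vocabulary, falling back to single words (a toy example; the vocabulary is invented, and real tokenizers would go further down to subword pieces):

```python
def hier_tokenize(text: str, phrase_vocab: set) -> list:
    # greedy longest-match: prefer known multi-word phrases,
    # fall back to single words when no phrase matches
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        for span in range(len(words) - i, 0, -1):
            cand = " ".join(words[i:i + span])
            if span == 1 or cand in phrase_vocab:
                tokens.append(cand)
                i += span
                break
    return tokens

vocab = {"birth control", "world model"}
print(hier_tokenize("the model needs a world model", vocab))
# ['the', 'model', 'needs', 'a', 'world model']
```

The hard part isn't the matching, it's deciding which phrases deserve their own embedding, which is presumably why nobody ships this.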
I think it's good that LeCun keeps highlighting issues like these. It shows that literally no one, not even the biggest most advanced LLM makers have solved these issues, and that we need to do something else that isn't just scaling, in order to make the next big leap in performance.
>>101101804Man, mobile posting sucks. How can people primarily post like this?
>>101101834It's comfy
>>101101834
>claims he mobileposted
>turns out to be correct
How did you tell? I've never mobileposted before so idk how you worked it out there.
>>101101804
Sounds about right, and I think your idea is kind of interesting. What if we used attention/transformers on concepts and how they relate instead of language? Except we don't have a corpus of data (or even a model of how this data would look) to train it on, but it sounds neat.
>>101101664>It's stupid to expect them to reasonAnd yet that is what most people who've shallowly used or seen ChatGPT believe.
We just need to scale harder desu
>>101101882
I was pointing out how my post was all fucked.
>>101101907
Exactly. It would be an insane task to make a multi-T dataset to train a model from scratch like that, but I also think it would be a worthwhile endeavor.
>>101097950>t-they were just overcookedExcuses. Even if medium cooked it's still trained by looking at how words relate to other words, the same way it gets overcooked.
>>101101560>Stheno>FimbulvetrOn a Sao-only diet?
>>101102072
Oh hey, some of my messages are in there. That thing is cooked as fuck. It's nice that it can generalize on its training data, but it's obviously regurgitating specific training it's had on what AI is.
I still don't understand how people are able to use 8B models. The 70B erp tunes are already brain-damaged...
>>101102197faster spins of the token roulette for another microhit of dopamine when you get the very specific output you want
If you had the ability to have your AI continuously learn, but its responses were 3 times slower, would you enable that ability at the cost of speed, or would you keep it as it is now?
>>101102254
Like a super fast LoRA training? Fuck yeah. Then I'd toggle it off after a while.
>>101100648
fucking seriously? even a coding model requires refusal removal now?
>>101102197because the gap closed lately, you think that 70B is 10x smarter than 8B while in reality 8B is like 90% smart as 70B
>>101102321I think you're mentally ill.
>>101102197
70B models aren't 9 times better than 8B models.
>>101102321
>8B is like 90% smart as 70B
Lol. I suppose that's true if what you're doing isn't very demanding.
>>101102321I wish I could call this cope but trying miqu made me stop giving a shit. it's just weak data all around
>>101102197
Rich people don't understand what it's like to be poor. If you're poor you learn to treat dealing with inadequate things as normal behavior because that's all you have, and if you complain you get nothing. (Because when you were a kid, if you complained, what little you had was taken away and given to a more appreciative sibling if possible.) If 8B is what you can run then you choose to be happy with that.
>>101102254
Following the 8B train: if 8B could take three times the processing time but improve through use so it could be trained (with a journal; you'd want to be able to selectively zot parts that suck, or at least save-state it so if something goes weird you can fix it), then it'd probably be worthwhile.
I'm 1 T/sec on 55ish GB models and that's my limit. Trying Llama 3 Q8 at 70 GB and it's glacial. Like, one token generated when I started writing this post and I'm still waiting on the second. So it'd only be attractive on a heavily quanted edition to make it fast enough to be worth tripling. (Oh, there's token 2)
I think what would happen is a cottage business: invest in a power rig, train a model to the commissioner's specifications, then sell the journal so they can use it like a LoRA on their model with learning disabled so it's fast enough again.
>>101102321
Nah, not "smarter". But for RP? Yeah, the gap is a lot closer nowadays. Maybe not 90%, but compare the old 65b to the current 8b and you'll see how far things have come.
>>101102321
there are a lot of things 8b simply can't do but 70b still can; even if 8b is fine for a lot of simple things, if you go beyond those at all it's just not an option
Is CR+ a smellfag? It suddenly talked about a pleasant fragrance when nothing mentioned smell in the context.
What is the local alternative to Luma AI?
>>101102450
2MW on huggingface
>eye sparklingNOOOOOOOOOOOOOOOOOOOOOOOOOO
>>101102321>in reality 8B is like 90% smart as 70B
>>101102321
8B is 90% as smart as 70B and 70B is 50% as smart as a good model
>>101102321>8B is like 90% smart as 70BLMAOOOOOOOOO
when will ai be able to neuralink my brain into a custom hentai fantasy
>>101102592Give it another 10 years, at which you will likely then be told to give it another 10 years.
>>101102592yes but it will be strictly PG-13 and if you try to do anything funny it will give you a strong electric shock and fill your vision with flashing warnings about keeping things safe and respecting boundaries
>>101102321I actually agree with this, but only in limited circumstances. For basic characters, straightforward plots, generic sex scenes, 8b really is 90% as good as 70b. But the moment you get into things like stat tracking, multiple characters, odd fetishes, characters with ulterior motives and hidden motivations, cards with weird rules that go against natural reality, etc, 8b just falls apart. While 70b+ generally handles even fairly complex things well.
>>101102810fine-tune issue
>>101102810Nah, 8B is unusable stupid. /lmg/ just has shit taste and a need to cope. /aicg/ is the only place where you can get actual opinions about models.
>>101102450
OpenSora recently released their 1.2 version; I've never seen it talked about here. cba to set it up on my computer, but here it is if you want to try: https://github.com/hpcaitech/Open-Sora
>>101102856All local models are dogshit then because they use ACTUAL quality over there so what is your fucking point
>>101102905>OpenSoraI'm not even going to click on the link with a scam name like that.
>>101102922
and you are right, this model sucks ass, but to be fair, I'm pretty certain that if you wanna reach Kling/Luma/Gen3 quality you'd need fucking 60-70 gb of vram, and in terms of hardware we just can't have that, thanks Nvidia :)
>>101102922it's a github link you fucking retard
>>101102933Nvidia has no reason to cater to poor gooners like /lmg/
>>101102856Skill issue. Wizard7B is good
>>101102856it really does depend what you're using it for. If you don't use it for actual intelligent uses like coding or complex roleplay situations, then it doesn't matter that it's stupid.
>>101102943
yeah, their model business is just perfect: wanna get a 24gb vram card? fine, go for the 3090, it's a thousand dollars. What? You want twice the vram? Sure, but the price won't just be twice as expensive; now you gotta pay 15000 dollars
>>101102943shut up bitch
>>101102973I think you don't even use these models.
>>101102998you're proving his point, Nvdia has no reason to cater to the regular users, they're making so much money scaming big companies with ultra expensive gpu's
>>101102998Oh nooo they're only the 3rd most valuable now! They should start selling 24GB sticks for 49.95 each!
>>101103025>they're making so much money scaming big companies with ultra expensive gpu'sthe companies know they're getting scammed, but what's their alternative? Using AMD? Pfft... AHAHAHAHAHAHAHAHAH
>>101103057Dear god we need an antitrust suit. AMD has literally zero chance to compete because Cuda Cores are proprietary yet entirely 100% undebatably necessary for an increasing amount of intensive tasks. I mean seriously. AMD and Intel do not have the tools to compete in any meaningful way. Nvidia gets first pick on server hardware, Nvidia gets first pick on software support, how is any of that supposed to change without a serious breakup?
>>101103114
AMD is not here to compete; it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia.
>>101103114>AMD has literally zero chance to competeBecause they decided not to from the beginning.
>>101103133
>AMD is not here to compete, it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia
yep, the Nvidia CEO has some relatives at AMD; they're working together to make it look like there's competition, but in reality AMD is letting Nvidia take all the cake https://www.yahoo.com/tech/jensen-huang-lisa-su-family-132052224.html?guccounter=1
>>101103133
If that's their goal they're doing a terrible job. The real reason antitrust can't happen is because the tech industry is putting all its chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman.
>>101103151
I mean, they could pivot to more customer sided things, but they are intent on mimicking Nvidia while always doing significantly worse.
>>101103164>they're working togetherno
>>101103232
>The real reason antitrust can't happen is because the tech industry is putting all the chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman
this, and also the fact that Nvidia is a US company; Nvidia making a shit ton of money means the US government also makes a shit ton of money through taxes. It's a system that won't be beaten anytime soon; I'll bet consumer cards will still be under 48gb for the rest of my lifetime.
>>101103114>>101103232antitrust could happen if communists succeed in ruining the economy
>>101103320
there should be a middle ground between the current capitalist system and communism though; Nvidia can't just dominate the market like that, that's not a sane market at all
go back
monopoly man badgobment says so
>>101103350
>gobment says so
15000 dollars for a 48gb vram card also says so
Name 1 instance of anyone asking for a middle ground and actually proposing a feasible system that doesn't involve going to Narnia
>>101103003Honestly, I don't. I use WizardLM 8x22b
>>101103369enterprise hardware has enterprise prices, shocker
>>101102810since the 10% gap, yeah
>>101103391
that's why there should be a middle ground; that's just a fucking scam at this point, and the simple fact that you agree with these kinds of practices shows how brainwashed you are. this shouldn't be a normal thing at all
>>101103114
hey, don't sell intel short. nvidia is a well oiled machine doing a great job and asking for an even greater premium. amd sucks so fucking hard at writing software that intel decided to compete and is already in some ways better than amd; it's just a youngling in the race. on a related note, picrel when and for how much? it seems like the 5090 will be 32GB, which would put picrel's price at $6-10k
>>101103371Go back to the gulag
>>101103413didn't they say it was $16k recently
>>101103413
I heard somewhere Gaudi 3 will be around 13k
too much imo
OK, fine, I embrace the sparkling eyes. I'm happy. I like it now, even. It's great. Wonderful.
>>101103413I wonder how well their software bridge works.
>>101103440
>$16k recently
what? 16k for the 5090, is this a fucking joke?
>>101103451no, for gaudi 3
>>101101579>>101101643It even has the training settings he used.Bless that man and bless you anon.
>>101103412>you agree with this kind of practices show how brainwashed you arenta but r*ddit is literally designed to train them like that, with karma system & hordes of ai bots shitting out govt-approved narrative.
8B isn't comparable to 70B, but I don't have the patience for 70B even on VRAM. At minimum, Stheno is honestly smarter than command-r 35B even if llama's slop dataset spoils things a little.
That 48 gigs of vram I bought sure was money well spent.
>>101103457Is that Intel PCIe card as fast as an H100 with 128GB VRAM? 16k isn't that bad. For the speed and power savings alone it'll be the new meta.
>>101103457
>buy Gandhi for 16k
>he just strolls around the house, outputs random philosophical quotes and sometimes tries to convince me to use nukes
not worth it in my opinion
>>101103489
yeah, that sounds good, but I'm afraid the Cuda ecosystem is way too integrated into engineers'/data scientists' minds. It's like switching from C++ to Ruby after using C++ for decades; not a lot of people are willing to take the risk, and not a lot of people will be able to make it work in the first place.
>>101103539>C++ to RubyDoes that analogy hold? What about Ruby makes it for engineers/data-scientists?
>>101103485
Reddit's lack of thread bumping plus the karma system basically incentivizes parroting, with slight adjustments, whatever was popular the last time a topic was posted. It's a good system when you are looking for community consensus, like tech support, product recommendations, or work-related advice. Awful for conversations or debates.
>>101103350
capitalism works because it's a competition. it doesn't work when one player removes all the tools the other players have to compete; then it's not a competition, and the price of an item nobody else can make can be gouged because hell, it's not like anyone's going to undercut something they cannot make.
>>101103539
Exactly. It's service lock-in. If Intel made it so switching from Cuda to Intel was easy, Nvidia would lawyer up.
>>101103539
At least people like cudadev are willing to go with the best performance per dollar in the consumer space. So if Intel starts challenging Nvidia on the hardware front, the software will follow, I think.
>>101103539You start by making your GPU cuda-compatible by reverse engineering it, then develop and provide for free the tools to run AI using your GPU. Done
>>101103489only $48,000 to run 405B with partial offloading, $64,000 to fit the whole thing
Do we still struggle with bonds and journeys?
>>101103579
>You start by making your GPU cuda-compatible by reverse engineering it
isn't that what AMD tried to do but failed at? https://github.com/vosen/ZLUDA
two more coming in the mail :^)
>>101103552
yeah my b, I meant to say "C++ to Rust"
>>101103629
>Tried
No, they just funded some random guy and pretended to do something, see this >>101103164
>>101103643
i kind of went balls to the wall with cpu. i am building this rig for training, but i guess i can double my ram and fit some pretty big models on it? dunno if it's worth it desu, i don't really know how computers work
>>101103622I don't mind journey, bonds, shivers, etc, as long as the model can keep up with the roleplay without creating contradictions, getting anatomy wrong, mixing up characters, etc.
>>101103561
>Awful for conversations or debates.
it would work fine if the moderators weren't there to remove people who aren't saying the "status-quo message". that's how you make a sect: you remove all the bad apples and keep the good goys, and you end up with a fucking circlejerk subreddit where everyone thinks the same
>>101103648what makes that true about rust?
>>101103643>another owl broYou love to see it.
>>101103658
>No they just fund some random guy and pretended to do something, see this >>101103164
yeah, but the simple fact that AMD allowed him to make his project open source is kinda "dangerous" for Nvidia; what if the open source people manage to make it work now that they have the code?
>>101103622Just tell it not to do that shit and it won't.
>>101103658>>101103694Are you talking about geohotz?
>>101103680
my point was that when something stays popular for too long, it's hard to switch to something else, because people have spent too much time mastering the popular thing and aren't willing to go into new territory all alone by themselves
So far IMO...
-L3-8B-Stheno-v3.2-Q8_0-imat is nice, but a pushover; even a dom card becomes a sub after a handful of msgs. Like when a card tells you to beg and you tell it no, it will get angry, and some cards will even try to do things like whip you, stab you, kill you, but you slap them a few times and they become a massive sub.
-L3-SthenoMaidBlackroot-8B-V1.Q8_0 feels dumber than Stheno and gets stuck in a loop, keeps repeating lines like the face getting redder; don't know if it's the OAS version on horde.
-Fimbulvetr-11B-v2.Q8_0 feels a bit lacking compared to Stheno, like it goes nowhere, and it often tries to speak for you or dictate your actions.
-Merged-RP-Stew-V2-34B.i1-Q4_K_M is slow, and got stuck in a loop in under 20 msgs when it came across something it didn't like; like telling a store clerk to demo something, and it'll say it's against store policy and keep repeating itself like a broken record.
-Poppy_Porpoise-1.4-L3-8B.Q8_0 is dumb; it thinks bunny suits are the furry easter egg bunny kind in a bar setting, it doesn't understand what birth control is, etc.
-DeepSeek-Coder-V2-Lite-Instruct.i1-Q6_K, that other anon was right, it's too lobotomized for RP.
-LLaMa2-13B-Psyfighter2 seemed decent at first, but kinda wants to just throw itself at you, and has a very very short memory; like it forgets things from 2 msgs ago even though I always have 8192 context size. It doesn't understand that you can't see when you have a blindfold on. All in all, pretty dumb.
-L3-70B-Euryale-v2.1, don't have the vram to run it locally, and the queue on the Horde is too long for me to really put it through its paces; felt like a more refined Stheno from the few msgs that I got from it.
>>101103678Mods or not, it wouldn't work unless you have an even amount of participating users on both sides of whatever issue, which is unrealistic.That, or having users with the self-restraint to not downvote opinions they disagree with. You can only have that with small communities of high quality users.
>>101103785 continued
-echidna-13b-v0.3, uh, this was mentioned in one of the guides in the OP. It's not very good IMO; it starts off trying to throw itself at you like Psyfighter2, then it confuses genders, and then starts trying to write your thoughts and actions.
-L3-Nymeria-8B.i1-Q4_K_M seemed decent; pays attention to detail like the shape and material of things, has a lot of thoughts, but is quick to dictate morals to you, refuse things, then write your actions, do time skips (weeks to months), and then go on an emotional nosedive trying to be all emo for no reason.
-L3-Arcania-4x8b.i1-Q4_K_M likes to go into detail about characters' actions, emotions and thoughts, but has problems with genders and actual logic. Like, you go into an equipment shop to look for female equipment, ask the female shopkeeper to demo it, she tells you they don't have demo units, but then asks if you want to try it on... female equipment, when you're male. But at least it doesn't try to dictate your morals, time skip and go all emo like Nymeria does...
-Hathor-L3-8B-v.01-Q5_K_M-imat likes to repeat certain lines or actions, is happy to please, and doesn't think about refusing anything. It's pretty descriptive about the materials, textures and temps of things, but again, it doesn't understand that you can't see through blindfolds, and it seems to have trouble when you type more than a few sentences, like it just ignores the latter parts. It's like they spent all the stat points on describing things and ran out of points: it doesn't really do anything but describe things, reply to you agreeing, and wait for you to say and do things. Understands what a gag does but not a blindfold.
>>101103785
>>101103803 continued
-v2_Kunocchini-7b-128k-test-Q8_0-imatrix started off amazingly; it actually remembered little details from a character card's opening msg, and it likes to ask questions. It understands things like what blindfolds and gags do. It's not a complete pushover while still being mostly submissive. It seemed to have decent memory at the start, remembering longer than Nymeria, Arcania or Hathor, but then it suddenly forgets things from just 2 msgs earlier when I got to around the #24 total msgs mark.
-Kunoichi-DPO-v2-7B-Q8_0-imatrix, I believe this is what Kunocchini was based off of? The first post it made was similar to Kunocchini, but then it completely falls apart; felt like it was censored or something. At least it didn't get stuck in a loop and kept trying to skirt around things instead.
-L3-TheSpice-8b-v0.8.3-Q4_K_S felt similar to Poppy at the beginning, and then randomly had chinese lines in its replies.
-Llama-3-Lumimaid-8B-v0.1.q8_0 is a pushover that doesn't seem to understand what blindfolds are; replies are really short compared to anything else, probably lobotomized too far down from its 70B counterpart.
IMO, Stheno is high up there but such a pushover... Kunocchini holds promise if the memory issue can be fixed. Everything else seemed broken or meh; probably not ideal that Stheno was the first model I came into contact with. I don't have experience with paid online services like OpenAI, NovelAI, Claude or other things.
Anything else under 24gb that I can try out?
>>101103800
>Mods or not, it wouldn't work unless you have an even amount of participating users on both sides of whatever issue, which is unrealistic.
if the "tiny %" of bad apples had zero impact, there wouldn't be moderation in the first place; they know they have to remove everything that is against the status quo. Tbh I found a system like that, reddit without much moderation, and it's 9gag, and that site quickly turned to the right/conservative side. I guess leftists can only exist with overcensorship, oops I said it kek
>>101103785My friend, get hardware. You're wasting your time with lowBs.
>>101103833Have you tried partially offloaded mixtral, either limarp zloss or gritlm?
Models that have real-time video understanding and can play games with you when? I'm tired of just interacting with text.
>>101103725what makes that true?
some good erp model for a 20gb vram gpu, 32gb ram,ryzen 7 5700x?
>>101103904It took Meta a full year to add image to their models and we don't know how shit it will be yet. Maybe in 2 or 3 years they'll add video to their 3B and 900B models.
>>101103904
In theory, with enough hardware, you could create a system that feeds frames to a model and uses its output to control the game in real time.
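In skeleton form it's just a loop (everything here is a stub for illustration; a real setup would need screen capture, a vision model, and an input injector):

```python
import time

def play(capture_frame, ask_model, send_input, steps=3, fps=2.0):
    # hypothetical loop: sample a frame, ask the model for the next input
    for _ in range(steps):
        frame = capture_frame()
        action = ask_model(frame)   # e.g. "press A", "move left"
        send_input(action)
        time.sleep(1.0 / fps)       # model latency is the real bottleneck

# stub run: a "model" that always presses A, logging the inputs it sends
log = []
play(lambda: "frame", lambda f: "press A", log.append, steps=2, fps=100.0)
print(log)  # ['press A', 'press A']
```

The bottleneck isn't the plumbing, it's that current vision models take seconds per frame, so "real time" means turn-based games for now.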
>>101103859
Single 3090 is as good as it gets for me; heck, I wouldn't even be able to afford a second-hand 3090 if it wasn't for the crypto craze. Power and heat are also issues: 40C/104F with a 43C/110F real temp outdoors, 34C/93.2F indoors, with 24 cents/kWh electricity costs.
>>101103890
No, I've not tried anything that exceeds my vram capacity. Wouldn't that take a super long time to do anything? RP Stew was already taking nearly 160 seconds.
>>101103643Mine has just been sitting around because I'm autistically trying to come up with a way to dustproof this shitty mining frame before I add in the stuff from my main llm rig.
>>101103904
There are some game-playing models, but none that do text https://danijar.com/project/dreamerv3/
Believe in Ursidae-300B.
>>101103953instead of having preprogrammed animations, it could generate them on the fly
>dust
Just leave it as an open rig. Instead, invert the rig. That makes cleaning a lot easier as you only need to blow out the hollow components.
Sometimes I don't know what to think.
I wanted to see how well our friend CR+ knows monster movies, so I asked it to do some roleplay from the premise that I start by doing something that kicks off events like in a goofy 80's creature film, but with characters I specify.
>CR+ Q4_K_M
It worked, but kinda slow because vramlet.
I change a few settings in Kobold and up the layers over the automatic suggestion. (Well, I doubled it and ran out of RAM, but I dropped it by one and then it loaded okay.)
Same prompt, same model, go.
It writes the scene, but more elaborate.
Then it adds [End of Part].
Then it gives a word count. (An accurate one.)
Then it tells me to feel free to continue it.
Then it starts writing emoji. It wrote in emoji a precise description of what happened in the scene.
Then it adds a horizontal rule, adds a note expressing its intention with the mood of the scene, and requests any adjustments I might like.
Then it wishes me a good day.
Then it adds a P.S. saying that my choice of a film scene was interesting and can be explored further.
Then it says that this concludes the narrative.
And then it generated blank lines till I hit Abort.
However unnecessary, this is the kind of fun shit I play with LLMs for, and now I'm worried that it's one flash of lightning in a bottle and it'll never be cool like this again. Because this is like AGI out of nowhere pulling my chain.
>>101103968
>No, I've not tried anything that exceeds my vram capacity wouldn't that take a super long amount of time to try to do anything?
Not really. As long as you can offload at least 80-85% of the model, I think you'll find the speed acceptable.
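For turning that percentage into an actual llama.cpp/kobold `-ngl` (GPU layers) value, the arithmetic is trivial; a sketch, assuming Mixtral-8x7B's 32 transformer layers (swap in your model's layer count from its config):

```python
import math

def gpu_layers(total_layers, fraction):
    """Layers to offload (llama.cpp's -ngl) for a given fraction of the model."""
    return math.ceil(total_layers * fraction)

# Mixtral-8x7B has 32 transformer layers:
print(gpu_layers(32, 0.85))  # -> 28
print(gpu_layers(32, 0.80))  # -> 26
```

If that many layers don't fit, drop the number by one at a time like the CR+ anon did until loading succeeds.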
>>101104035Just do this and blow the dust away with your compressor
Just tried L3-8B-Poppy-Sunspice via the horde. I sent my first msg after the standard starting msg for a card, and its first reply ignored me, rehashed the first msg and just immediately looped lol... completely unusable, yet it has nearly 400 in the queue...
>>101104090OK what advantages does THAT give? More desk space? What the hell.
>>101104125Less noise.
>>101103833https://huggingface.co/turboderp/llama3-turbcat-instruct-8b
>>101104048
I like playing around with that kind of thing too, asking the model to do meta analysis of the scenes, make suggestions, etc. That's why I made that state prompts extension, to do that kind of thing with 8b models without confusing the shit out of them in the process.
>>101104125
Omnidirectional airflow.
>>101104148
>Less noise.
You could've just decreased the fan speed and undervolted the cpu a bit. Turns out going from 100% fan speed to 70% doesn't make much of a difference, and it's way less loud that way.
>>101104163That one is about as good as stheno in my experience, so I'm seconding the suggestion.
>>101104172Yeah, do all that shit after you hang your PC on the ceiling for even less noise.
Hey is there a GGUF out for the new Sao10K/L3-8B-Stheno-v3.3-32K yet?
>>101104164
What astounds me is that it threw a whole stack of OOC postscript features at me at once, unbidden, till it eventually ran out and dumped blank lines (there were many blanks between each feature as well).
It didn't keep doing that, so I started asking it for emoji summaries for the fun of it. But I'm also worried that since I left it on the default 2k context, this session will turn to crap soon. I save-stated it, but again, who knows if it will be awesome again.
>state prompts extension
I'm unfamiliar with this term.
>>101104257
https://huggingface.co/mradermacher/L3-8B-Stheno-v3.3-32K-GGUF
It was pretty shit compared to 3.2
>>101104369
This thing https://github.com/ThiagoRibas-dev/SillyTavern-State
>>101104374
How does it compare with using NTK or YaRN?
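For anyone unfamiliar with the NTK question above: the common NTK-aware trick stretches RoPE's frequency base instead of interpolating positions (YaRN does a more involved per-dimension interpolation on top of this). A minimal sketch of the base adjustment, assuming Llama's default base of 10000 and a head dim of 128; exact formulas vary between implementations:

```python
def ntk_base(base, scale, head_dim=128):
    """NTK-aware RoPE adjustment: scale the frequency base instead of positions."""
    return base * scale ** (head_dim / (head_dim - 2))

# Targeting ~4x the trained context on a Llama-style model:
new_base = ntk_base(10000.0, 4.0)  # ~40889, passed as e.g. rope-freq-base
```

A scale of 1.0 leaves the base untouched, which is a quick sanity check that the formula degrades gracefully.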
after finishing an RP it has hit me that i have a separate computer sitting in a closet that cost $4000 and is used just to cum
where did it all go wrong
>>101104410people waste a lot more money on things that offer even less
>>101104410
Are you within your means and getting the equivalent satisfaction out of it? Then good.
Donate some processing to a folding@ho.e project when you aren't using it and you'll do some good in the process of cooming too.
>>101104443
>folding@ho.e
Old meme that got absolutely BTFO by the AlphaFold AI
>>101104443
>folding@ho.e project
is this some sort of distributed sex chatbot project? i would donate
>>101104410
>where did it all go wrong
the gender war happened
>>101104483holy shit i was not aware south koreans were based?
>>101104163
>>101104199
Quick testing doesn't really impress me. It also fails to understand what a blindfold or a gag is, it's forgetting things from 2 posts earlier at post number 24, and it also started trying to write lines and actions for me.
-v2_Kunocchini-7b-128k-test-Q8_0-imatrix felt better to me...
>>101104500
That's the country with the lowest birth rate; it will probably die out in less than 50-100 years. Men and women literally hate each other there kek
>>101104500megalia
>>101095184https://www.characterhub.org/characters/Vyrea_Aster/doppelganger-interrogation-simulator-654daf19
>>101104500
>holy shit i was not aware south koreans were based?
South Korean guys resent women because they're the only ones who have to do two years' military service, while women as usual live on easy mode.
Yeah, it feels like L3-8B-Stheno-v3.3-32K-GGUF is all over the place: it's forgetting gender, tries really hard to write actions for me and ignores what I say. Unusable...
>>101103833Try Wizard7B
>>101104692Well, guess regular stheno with yarn is still the goto then.
>>101104740
>Wizard7B
Are you talking about Wizard Vicuna 7B Uncensored? Won't that be too far lobotomized compared to the full Wizard?
Whose future models do you expect will be better, Qwen or Deepseek?
>>101103904
>real-time video understanding
Just build a system for that with llava
>can play games with you
Depends on the game and your ability to build an API for your LLM to play it
>>101104774>>101104774>>101104774
Who wants to help me build AGI? Looking for this skillset:
- Self motivated
- Pure C programming
- Experience crafting machine learning algos from scratch
- Ability to read research papers and implement in code
>>101104777This one https://huggingface.co/bartowski/WizardLM-2-7B-exl2
>>101104631
>two years' military service
two years of ntr
>>101103488Stheno is a retarded coom tune.