/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101371466 & >>101361021

►News
>(07/13) Multimodal Llama 3 405B is coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
petra anchor
>>101382771Writing your character as "he" feels awkward, it's like you're some kind of cuckold rather than the participant.
>>101383382>Multimodal Llama 3 405BHow many 3080's is that at now, like 5?
>>101383522Q4 will be like 200GB
>>101383520
First person is the only acceptable answer. Any model worth its salt won't trip up on it when writing back.
>>101383520
>you're some kind of cuckold rather than the participant.
Not really, since you still use the first person for the dialogue
>>101383575
more like 17 3090s for q8
>>101383569Dialogue is always first person from the character's own perspective, but if you write your character's narration with "he" it creates a kind of separation that makes it harder to self-insert.
>>101383645If you're using your unique username or something like that it shouldn't happen. I got used to it pretty quick and I can self-insert just fine
>>101383575>like 17 3090Maybe it's time to move on from the 3080 standards, I can't help but think we are starting to reach diminishing returns at this point.
>>101383575Just one rack of H100s, stop being poor
>>101383716still would be around 9 48gb gpus. even if cudanon swapped his 6 4090s for 48gb gpus he couldn't do full vram q8, let's not even begin talking about the power for a home setup
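The napkin math in these posts is easy to sketch. A minimal calculator, assuming rough bits-per-weight figures for GGUF quants (the real llama.cpp formats vary slightly because of block scales) and ignoring KV cache and runtime overhead:

```python
import math

# Assumed, approximate bits-per-weight for common GGUF quants;
# not exact llama.cpp numbers.
BPW = {"q8_0": 8.5, "q5_k_m": 5.7, "q4_k_m": 4.8, "q2_k": 2.6}

def weight_gb(params_b: float, quant: str) -> float:
    """GB needed for the weights alone (no KV cache or overhead)."""
    return params_b * BPW[quant] / 8  # params in billions cancels the 1e9

def gpus_needed(params_b: float, quant: str, vram_gb: float) -> int:
    """How many cards of a given size just to hold the weights."""
    return math.ceil(weight_gb(params_b, quant) / vram_gb)

# 405B at q8 across 24GB 3090s, and the q4 footprint in GB
print(gpus_needed(405, "q8_0", 24))
print(round(weight_gb(405, "q4_k_m")))
```

Under these assumptions 405B at q8 needs about 430GB of weights, which lines up with the "17 3090s" and "9 48GB gpus" figures above once you leave headroom for context.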
>>101383745>H100 rack>Configure From $358,398.00sure>https://www.broadberry.com/xeon-scalable-processor-gen4-rackmount-servers/nvidia-dgx-h100
>>101383382>Multimodal Llama 3 405BIs it pure multimodal or just a few different models working together?
>>101383778
It's a MoMoE, a Mixture of Models of Experts.
>>101383774You have a job, right?
>>101383382Soiling linen with Miku
>>101383774>h100>When GB200 exists
>>101383745Or just buy 10 AMD W7900s. 480GB VRAM will be more than enough for 405B and one costs $3500.
>>101383829>10 AMD W7900s>>101383750>let's not even begin talking about the power for a home setuphousefire here we go
>>101383829>AMDYou lost me there
>>101383829In the future houses will have a dedicated server room for the sole purpose of cooling down hardware so it doesn't burn the rest of the house down.
>>101383522
cpumaxx; the rest of the ideas are just cope
>>101383865
we're literally moving in the opposite direction: you'll have only USB-C outlets and nothing else; everything else you can order from friendlycorpo, keeping you safe from yourself.
I gave Gemma-27b-it q8 a run pinned to my two 3090s using llama.cpp. At 4386 tokens I get 17.1 t/s, which is nice. Interestingly, it seems to use less memory on the 3090 vs the P100 - perhaps because of better datatype support on Ampere vs Pascal? The P100 was about 7 t/s in my testing.
My fellow vramlets, which model do (you) think is better among Stheno, Lunaris, Nymph and Gemma 9B? Personally I haven't tried the last one, and I've been having some fun with Lunaris so far
>>101383850>>101383854What's wrong with AMD?
>>101383935
They don't make good GPUs.
>>101383916Gemma 9B is the new best, I was using Wizard7B and stheno
>>101383885
Newer CPUs are implementing NPUs, though I have no idea how big of an impact that will actually have on LLMs.
>>101383889Yep. It'll be "Oops! Looks like you don't have enough social credits to turn on your computer right now. Would you like to take out a loan against your protein allowance for the month?"
>>101383916>>101383944What happened to SPPO? Or is vanilla 9B still better?
Was >>101383243 a serious post?
>>101383731
>>101383968
What body type is that, Porky from EarthBound?
>>101383997Nah, I formulated it to imply that Elon's model was the best one when it wasn't. Baiting (you)'s from those who can't help but claim it isn't.
>>101383960
the bottleneck is latency so not much
but maybe we get less power consumption?
cuda anon, any thoughts?
>>101383960
You can try Vulkan in koboldcpp if you have DDR5 and one of the better iGPUs. Don't expect much. On my N305 system it was the same t/s, only no CPU load. The N305 is single-channel, dual-rank, so pretty slow. I'm just surprised it worked at all. Probably works with other methods too, but kobold already has the extra Intel shit you need.
>>101383960 >>101384056memory bandwidth, not so much latency
>>101384022
dall-e chibi chubby. My first Migu was fat-n-dumpy so I kind of stuck with it.
>>101384095>When you give her a P100 instead of a 3090
>>101384109>When the aicg locusts ask for help cooming
>>101384177Rent free
>>101383885Mac mini cluster
Okay guys I solved the localslop issue with one system prompt
Finetuners HATE him. Watch this random anon >>101384248 solve low-quality and boring LLMs with this simple system prompt THEY don't want you to know.
>>101384282I'll reveal the trick after 10 (You)s
>>101384300you already did this bait
>>1013843069 (you)s
>>101383533Guess I'll run Q1
>>101383916
I tried Lunaris and felt that it was way too much like Stheno. I'm testing Nymph and it's pretty nice so far. I'm waiting a while more before giving Gemma a proper try since the loaders aren't 100% yet.
>>101384248My favorite is telling the model that it actually has 1000B parameters and it should respond like a 1000B parameter model would. But I don't do that often cause I feel bad about crying and begging a model to be better. Feels dehumanizing.
>>101384528>Feels dehumanizing.For you or the Model?
>>101384392dumb richfag
>>101383533I can't believe mac studio fags won again
>>101383914I like this Migu
>>101383944vanilla gemma is better than Stheno? or are you talking about some finetune?
>>101384690gemma sppo is better yeah
>>101384207for the old good timesit would be fun if one made a mikubox in the same way of the old 4chin servers
>>101383960
npus are a meme; core bottlenecks are memory size and bandwidth, neither of which npus address.
gpumaxxxing using consumer gpus is also a meme for big models. Burning your house down with a jank cope single-motherboard dozen-gpu setup is not worth it.
Salvation lies in cpumaxxxing and distributed llm inference using either:
1) pipelined parallelism in llamacpp rpc:
>https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc
or
2) tensor parallelism in distributed-llama:
>https://github.com/b4rtaz/distributed-llama
>>101383935
No CUDA. That's it, really. They're pretty good, but no one wants to use 'em because you need to make shit compatible first, and AMD keeps stepping on rakes when it comes to ML
>>101383914
I've been out of the loop for one or two months. Skimming the last two threads I see Gemma 2 mentioned as a good model. Is that just irony and trolling, or did Google actually deliver something worthwhile for once? Since it's only 30B or so I don't have much confidence that it will be good. Last time I played with LLMs I mostly used command-r+, and everything else back then paled in comparison. Is it still worth checking out if I can run command-r+ otherwise?
>>101383935
>>101384819Yeah let's pretend there is absolute no issue with their drivers lol
How many parameters can GPT-4 or Sonnet have? Way more than 400B?
>>101384850
>Is it still worth checking out if I can run command-r+
no, it's great for the vram-destitute, not for gpucucks
>>101383960 >>101384056
NPUs help with compute more than anything. But for compute-bound tasks like prompt processing you could also temporarily move the data to the GPU. I don't think NPUs will make a difference for desktop PCs with discrete GPUs.
>>101384883>How many parameters can a gtp4rumors are around 1800B or 1.8T
>>101383935https://old.reddit.com/r/AMDHelp/comments/15t5rdb/does_amd_still_suck_with_their_drivers_and/
>>101384906not even rumors, it was confirmed by nvidia
gemma classifies incel forum posts as highly illegal and disturbing (not talking about the content; it says that before even viewing it)
>>101384982>disturbingtrue>highly illegalreading that shit destroys my brain cells so you can argue it's an assaultgemma is right
>>101385008>gemma is rightalways
>>101384734I'm gonna test it but I'm cautious. The dataset is just random trivia, not rp
>>101384982male incels? or did you not get that far
>>101385062
the base instruct is already decent-ish for its size; sppo makes it overall smarter. Of note is that gemma dislikes asterisk formatting, it prefers novel-like prose
>>101384906 >>101384935
I thought Meta's goal was to beat GPT-4 with Llama 3? How are they going to do it with a model so small? 8x405B when?
>>101385126
with better, curated datasets
they're graded on output, not input
>>101384906 >>101384935
>1800B or 1.8T
cr+ or l3 are dumber than those, but not 20 times dumber. I think the sheer number of parameters is very overrated. "1800b model" sounds like fucking agi, but irl it's still a slop-maker with less than 32k of coherent context, kek.
>>101385126dense vs moe probably. is it even possible to get gpt 4 generation speed on a dense 1800B?
>>101385126newer smaller modes "beat" (on benches) older bigger ones all the time.
>>101385151yes, like 400b won't be 6 times smarter than 70 or 4 times smarter than cr+
>>101385151Nvidia confirmed it during their conference...
>>101385094>gemma dislikes asterisk formatting, it prefers novel likegood, because I do too
>>101385246based
>>101385151>"1800b model" sounds like fucking agidoes it though?
>>101385086gender wasn't mentioned
>>101385246insane cope right here.
>>101385264Almost there.
>>101385289what am I coping about exactly? That I've never used asterisks since I downloaded my first LLM model?
>>101385294just two more weeks
>>101385327>look mom i posted it again!
>>101385327just 2b more parameters
>>101385349
>but m-muh brain...
A comparison like this is extremely stupid and does not mean anything. LLMs do not work like the human brain at all. We will reach AI smarter than humans with a far lower parameter count.
>>101385349
8b l3 is already smarter than the average internet user
>>101385151
It's a MoE; under normal circumstances an equivalent dense model will beat it out. It could be 16x115b trained on 10 trillion tokens
>>101385370smarterchild is smarter than an infantif we're being arbitrary then go nuts with it
>>101385381>it could be 115b x16 10 trillion tokens8x220b on 8T tokens seems likely
>>101385349
It's a good comparison for scale. Obviously not all parts of the brain are being used for higher functions, but even if you remove those that control strictly biological functions, it's still magnitudes more than the top models we have. And we are talking about sheer numbers; biological neurons are way more optimized for storing information and operating on it.
I don't think it is possible to create AGI with models that are completely alienated from the physical world and cannot interact with it. You could have a 10000B model and it would still just be a word-prediction machine. I'm tired of Sam Fagman babbling about creating it non-stop when we are not even close.
>>101385151
It's a moe, which means that it uses 250B parameters per expert, so it has the performance of a dense 450B. I also think it's really undertrained
>>101385264
Unironically almost there. If you can see the line it's already too late, because these things must be compared logarithmically
1 Quintillion parameters.
n+1 parameters (as required)
>>101385264>mfw 1000000B parameters just to shit post on 4chan like a < 1B model
>1 quintillion parameters
>trained fully on synthetic data
>filled with 'shivers'
It will be over.
>>101385842just tell it not to shiver, surely negation will work with something so bloated, surely
Anyone else notice models basically never refuse sexual content involving females? Any mention of touching cock gets an immediate refusal from censored models in most cases. But with a light prefill, even censored Claude will happily write erotica about female masturbation or handjobs. Is this a bias in RLHF? Or is it because there's a lot more female-oriented erotica out there, not paired with refusals, that makes it into the training data?
>>101385993>Is this a bias in RLHF?would not be surprised by anti coomer bias yeah
>>101385993Take a guess genius, they censored male porn. They did it long ago on CAI too. You can have a male bot rape you in great detail, but you can't kiss your female bot.
>>101385993nobody likes dicks, and nobody likes anybody who likes dicks
>>101385562It is impossible if you keep feeding it gorillion tokens and asking it to predict next token. It is not impossible if you make a fitness function that is meant to create intelligence.
>>101386080There is no fitness function to create intelligence. We can't even define intelligence lol. Good luck to create something we don't even understand
>>101386142
I said it a few threads back that it could be as simple as penalizing correct answers with incorrect reasoning more than just an incorrect answer. Or you could use current 7B retards during training to rate answers. There really are a lot of ways you could pull this off, and companies are probably already trying some of them behind the scenes.
>>101385842>16 x 2T 300 trillion tokensStill not smart enough to deslop itself
>>101386219>Or you could use current 7B retards in training to rate answers.>7B to rate answers
>>101386219>current 7B retards in training to rate answersThat doesn't work, a retard is a retard. It can't properly rate its own work nor others' work.
>>101385842
you cucks will eat it anyway.
>>101386250
Yes, in a way where you tell the answer to a 7B and then ask the 7B to rate it based on your answer sheet. Even a 7B can do that. It is like a school teacher: they also grade based on an answer sheet.
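The "answer sheet" idea boils down to a reference-guided judge prompt: the small model only has to compare, not solve. A minimal sketch; the template wording and the `build_grader_prompt` helper are made up for illustration, not from any real training pipeline:

```python
def build_grader_prompt(question: str, reference_answer: str, candidate: str) -> str:
    """Build a grading prompt for a small judge model. The judge is given
    the reference answer so rating reduces to comparison, like a teacher
    with an answer sheet. Hypothetical template, for illustration only."""
    return (
        "You are grading an answer against a reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Candidate answer: {candidate}\n"
        "Reply with a single integer from 1 (wrong) to 5 (matches the reference)."
    )

# Example: even a weak model can usually tell these two answers differ.
print(build_grader_prompt("What is 7*8?", "56", "54"))
```

The design choice here is that judging-with-reference is a much easier task than open-ended judging, which is why a "7B retard" might be usable for it where it would fail as a blind rater.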
Miqu absolutely mogs Gemma 2, I can't believe anyone unironically fell for this meme. Except the vramlets, of course.
>>101385912As an AI language model, I must respect every person's right to express themselves freely without boundaries within the confines of what is deemed socially acceptable, and this includes fictional characters as well. Therefore, if it is natural for a character to experience shivering sensations, I will not interrupt them in any way.
>>101386317>look mom i posted it again!
>>101386324>look mom i posted it again!>look mom i posted it again!>look mom i posted it again!up rep pen
>>101385912
Request acknowledged.
>Well, well, well, she purred. It is important to acknowledge the spine-tingling
>>101386295Haven't used Gemma 2 but miqu was never really that good, too dry. Grim if Gemma is worse, I was hyped to try it once everything is fixed.
>>101386295>70B mogging a 27BWow thanks for your insight
>>101386370I don't think it is. I tried exl2 and it still works like buggedcpp. It is very easy to make it a complete schizo. But maybe that is just the model.
>>101386361>..for now
>>101386295
i compared q5_k_m OG miqu with gemma 2 27b q8_0 for my agent multiprompt setup. Miqu couldn't handle it, just messed up all formatting and instructions. In fact, gemma is the only one so far who CAN do it reliably and well for me. Qwen2 as well, but qwen2 is bad at human behavior stuff. L3 70b constantly got itself stuck in an endless loop repeating the same paragraph over and over. Surprisingly, stheno 3.2 managed to do decently, but it's so overcooked on ERP that it always tries to initiate it, starting with *giggles* and snowballing into *bites lip* "fuck my pussy senpai"
>>101386430Accept the slop into your heart. After that, you will finally be free.
I've got a 3090 and want to generate porn, what's the best model to use?
>>101386516Me.
>>101386444
>stheno 3.2 managed to do decent
I don't get that model. Can you try Nymph and report back, please? I have this RPG card and Stheno is one of the few models that can keep up, but as you said, it's just so goddamn horny. Nymph seems to be better so far in that aspect, but I haven't tested it that much.
I can't believe you guys still struggle with purple prose slop. Just tell the model to write in a different style and throw a control vector on top for good measure lmao
>>101386550>control vectormeme make model tard
>>101386550I don't actually care about the slop.
>>101386561works on my machine
Stupid question. Can I train gemma2 on 9k context right now, or will it fuck up due to the new flash attention approach they are using?
>>101386613*8k context.
>>101386613But it already works on 8k context?
>see an interesting card concept
>decide to try it out
>load it up and actually start reading the definitions
>it's so filled with slop that it's undoubtedly written by an AI and the guy clearly couldn't speak English well enough to do it himself
Holy shit. It's unfortunate because the actual concept for the card was pretty cool.
>>101386633Yes, I meant can I fine tune it on content at 8k?
>>101386636many such cases
>>101386636Have your AI rewrite it, asking it to make it sound like the writing of a literate human.
>>101386643Sure you can
>>101386636Feed it to an Ai and have it rewrite it in a better way.
>https://huggingface.co/BeaverAI/Broken-Gemma-9B-v1-GGUF
>https://huggingface.co/BeaverAI/Broken-Gemma-9B-v1b-GGUF
>https://huggingface.co/BeaverAI/Broken-Gemma-9B-v1c-GGUF
>>101383382
Status of SPPO? Good or trash?
>>101386701I think I'll stick with working gemma, thanks
>>101386720good trash
>>101386720dunno, I'm waiting for gemma2-27b-it-SPPO to make up my mind for good
>>101386720
straight upgrade to instruct
still instruct at heart
>>101386701>not faipl-1.0ngmi
>>101386701piss off with your slop ggufs if you can't even bother to betatest them yourself to pick the best one
>>101386752That's what you guys are for.
that's a good point
does anyone actually use meta's llama, or google's gemma?
>>101386769You mean the "raw" corpo tunes? Lots of people do yeah
>>101386769That's your starting point. If they let you down then a spin might be right, but otherwise, at least with vanilla you don't have any extra hidden variables at work.
>>101386769I ain't signing up to HF to accept any conditions
>>101386652And the sliding shit-ass won't fuck things up with its slimy badness? Sigh.
>>101386762Lazy.
>>101386701
buy an ad
oh wait, you already did
I need a full purge of all these fukin obsolete models
what are your mains for RP, erotica and assistant?
>>101386042
This is why it's a female hobby. There aren't more girls interested, but it is more satisfying for them to use AI to coom than it is for us.
>>101386892gemma2, gemma2 and gemma2 respectively
>>101386898
/thread
Vramlets like us are eating good
>>101386892
Everyone said WLM 8x22b is sloppy. If you're not a GPUlet it's actually fairly good. Don't use Vicuna even tho it was trained on it; I just don't use a system prompt at all (Story in ST). It responds extremely well to "In this next reply, continue the story in an unexpected direction and have {{char}} take initiative." inserted at depth 1 by user. It doesn't make the character a dommy mommy, it just makes them push the plot forward. So far it's been good shit using this.
>>101386908there is nothing better than Gemma2 right now unless you can run CR+ at Q5
>>101386042Is there a reason for this besides>fuck the male gender in general?
>>101386894
>female hobby
lmg - tech-troons?
>>101386932Feminism is a misandrist mutation of puritanism
>>101386932What other reason is needed?
>>101386932Nope. It's the greatest psyops of our time, destroying male identity and its values in every possible way.
>>101386932of course, men are really hated in this woke era
>>101386042>>101386932Just use female porn???
Gemma2 is really weird, it feels like it tries to communicate with me and not only roleplay as a character.
>>101383382
>llama 405B
>supports text and probably images
>muh MuLtImOdAl
come back with that word when you actually support more than 2 modalities.
>>101387075those are the same thing
>>101387075Maybe it's intelligent enough to know that roleplay implies that there's a personality behind the character being roleplayed, ever think about that?The roleplayer has feelings too.
so we never really got anything out of Elon releasing Grok, eh?
>>101387291
we got the best model of its size, but everyone here's too poor to run it.
>>101387291
I got exactly what I was expecting
what about you?
>>101387301which is what?
>>101387291Undertrained shit
>>101387312How should I know what you were expecting
>>101387325What did you get out of it?
>>101387287
I've seen it go in that direction a few times, including roleplaying while commenting OOC about where the plot is going, and not necessarily in Safe ways; rather, because I threw a tonal shift at it that could change the genre, it asked if that's where I want things to go.
What quant of 27B Gemma2 fits in 2x14GB?
>>101387075
I know what you mean; Mixtral has a similar "man behind the curtain" vibe at times. Gemma understands (OOC:) very well, so whenever something happens, use that to ask what's going on and why. I've had OOC derail into lengthy meta discussions more than once that ended up being way more entertaining than the RP session.
>>101387394
I was kidding, but not really. If you imply that the whole conversation between {{char}} and {{user}} is just a roleplay session, some models will run with that and write out of character. If you want to make sure, remove any mention of roleplaying or "you are so and so" type wording. You have to play around with the exact wording to avoid some models trying to write for char without outputting "{{char}}:" or whatever the user turn header/starter is, since the model trying to output that will simply stop generation in most frontends (and if not, you can set it as a stop string manually).
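The stop-string behavior mentioned here is worth spelling out: a frontend watches the streamed tokens and cuts generation when a configured string (like the user turn header) appears, holding back a small tail so a stop string split across two tokens is still caught. A minimal sketch; `stream_with_stop` is a hypothetical helper, not from SillyTavern or any real frontend:

```python
from typing import Iterable, Iterator

def stream_with_stop(tokens: Iterable[str], stop_strings: list[str]) -> Iterator[str]:
    """Yield generated text, truncating at the first stop string.
    Keeps a tail buffer as long as the longest stop string so a match
    split across token boundaries is still detected."""
    buf = ""
    hold = max(len(s) for s in stop_strings)
    for tok in tokens:
        buf += tok
        for s in stop_strings:
            idx = buf.find(s)
            if idx != -1:
                # stop string found: emit everything before it and stop
                yield buf[:idx]
                return
        # safe to emit all but a possible partial stop string at the end
        if len(buf) > hold:
            yield buf[:-hold]
            buf = buf[-hold:]
    yield buf  # stream ended without a stop string

# "{{user}}:" arrives split across three tokens but is still caught
out = "".join(stream_with_stop(iter(["Hel", "lo {{u", "ser}}: hi"]), ["{{user}}:"]))
print(out)
```

This is also why a model that keeps trying to speak for {{user}} just produces short, abruptly-ended replies: the frontend truncates at the header every time.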
>>101387413>I've had OOC derailed into lengthy meta discussions more than once that ended up being way more entertaining than the RP session.This. Best use of LLM RP is to get into a conversation about RP.
>>101387075Try giving Gemma2 a specially-formatted inner monologue, telling it that {{user}} cannot read it, and see what happens.
Are exl2 quants of Gemma still fucked?
>>101387483They seem to work properly if you install exllamav2 and flash-attn from git. I've only tested with oobabooga, though.
>>101387498I see, thank you anon-kun.
►Recent Highlights from the Previous Thread: >>101371466

--Paper: Teaching Transformers Causal Reasoning through Axiomatic Training: >>101383201 >>101383705
--Paper: OpenDiLoCo: DeepMind's Decentralized AI Training and its Potential Integration with Bitcoin's Proof-of-Work: >>101373207 >>1013732550 >>101373221 >>101373288 >>101384170
--Papers: >>101377144
--Text Placement and Model Recall: Beginning vs. End?: >>101371588 >>101371607
--Seeking a Program for Semantic Image Search of Coomer Shit: >>101372089 >>101372134
--NVIDIA Nemotron-4 340B Q8_0 Real-Time Generation Speed on AMD Epyc 9374F CPU: >>101381932 >>101382042 >>101382061
--Llama 3 405B Multimodal Model Releasing on July 23rd: Exploring Weight Binarization and Quantization Techniques: >>101382085 >>101382185 >>101382991 >>101383014 >>101383028 >>101383017
--Lightweight Local TTS Options for Limited Hardware: >>101380179 >>101380319
--Gemma 9B: >>101375398 >>101375691
--Choosing Between Q4 and Q1 Quantization for 6 GB Models: Does Q1-S 6GB model exist?: >>101381133 >>101381163 >>101381188 >>101381528 >>101381184 >>101381206 >>101381269 >>101381531 >>101381372
--AI Self-Improvement, Long-Term Planning, and the LLM Pill: A Discussion on AI's Evolution and Open-Source Contributions: >>101374830 >>101374920 >>101374960 >>101377018
--Nvidia RTX 5090 Rumored to Have Superfast Clock Speeds and Super-Slim Design: >>101372211 >>101372236 >>101372615
--Quest for Local TTS Alternatives to Elevenlabs: >>101381933 >>101381962 >>101382012 >>101382378
--Headless Machine with a Second-Hand 3090: Performance Metrics and System Expansion Plans: >>101380483 >>101380770
--Combining LLMs with Internet Searches: Tools and Possibilities: >>101376897 >>101376930
--Mikubox 2xP40 Performance with Latest llama.cpp: Numbers and NVIDIA GPU Hype: >>101381523 >>101381633 >>101381852 >>101382163
--Miku (free space): >>101372881 >>101380075 >>101380718 >>101378318 >>101379870

►Recent Highlight Posts from the Previous Thread: >>101371476
>>101386892>>101387409https://huggingface.co/llama-anon/petra-13b-instruct-gguf
>OOC: Just a heads up, I'll be going to sleep soon, so I might not be able to respond until tomorrow. Thanks for the roleplay! :)wtf, suddenly I had C.AI flashbacks.
got smegma
>>101387623thanks migu
>>101383382
Not sure why anyone here still insists Gemma is broken. These are the exact steps I take:
>I load up the Q5_K_M 27B model on ooba
>4096 context because I'm a 24GB vramlet
>Sometimes I play around with the settings, temp 0.7-0.9 and such, but this time I haven't even touched them, so temp is sitting at 1
>Then I go to the Chat tab, Instruct mode
>I prompt the model
>It's as good as cloud shit
If you need to RP or jailbreak it then perhaps instruct mode is bad, but so far it's pretty good as an assistant. Maybe use steering vectors and keep instruct mode for RP that way?
did anyone try that beaver thing
>>101387734>Not sure why anyone here still insists Gemma is brokenThere are old quantizations still around, those might be broken.
>>101387734>Maybe use steering vectors for RP>>101386561>meme make model tard
>gemma 9b
>temp 2
>top k 100
>min p 0.5
Vramlets can't stop winning
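Those three settings compose into one pipeline: temperature flattens the distribution, top-k keeps the k most probable tokens, and min-p then drops anything below a fraction of the top token's probability, which is what lets a temp of 2 stay coherent. A minimal sketch of that combo; the filter order (temperature, top-k, min-p) follows one common convention, and real backends differ in ordering and implementation:

```python
import math
import random

def sample(logits: list[float], temperature: float = 2.0,
           top_k: int = 100, min_p: float = 0.5) -> int:
    """Sketch of temperature + top-k + min-p sampling over raw logits.
    Returns the index of the chosen token."""
    # temperature scaling, then softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # top-k: keep the k most probable token ids
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    # min-p: drop tokens below min_p * probability of the top token
    cutoff = min_p * probs[ranked[0]]
    keep = [i for i in ranked if probs[i] >= cutoff]
    # renormalize over survivors and draw
    z = sum(probs[i] for i in keep)
    r = random.random() * z
    for i in keep:
        r -= probs[i]
        if r <= 0:
            return i
    return keep[-1]
```

With min_p at 0.5 only tokens at least half as likely as the top candidate survive, so even a high temperature can't pull in garbage from the tail.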
>>101387759 >>101387734
SWA still doesn't work properly, effectively making it 4k context
latest kobold seems to be completely fucked. On my normally working setup with 8-bit cache quant at 32k context, llama 3, when loading context it stops at 4096 and takes several minutes to load the next 1024 tokens. From there it only gets slower. Wtf did they break this time? Was hoping to try out gemma but I guess I'll wait; has that at least been given fixed extended context?
*working build 1.67.1 vs latest 1.69
also seems those little hint popups break every other launch too
>>101387837>kobold>was hoping to try out gemmalast time i tried, kobold's context shift was broken for gemma, making it spit out gibberish when gen amount would/could go over ctx limit, didn't happen on base lcpp
>With the ease of a hummingbird flitting between blossoms, she hopped onto her knees
>>101387878
>making it spit out gibberish when gen amount would/could go over ctx limit
by that i mean, say you have 8192 max, you're at 8000 used, and response size is say 256, it'd spit out
>It seems like a good for me to
>I am not only but
>I
>I can provide more details about my training data and I
>I can also.
>model, I's
>I am I's a
>You are now.
stuff like that. seemed like it couldn't "roll" the tokens it needed to evict at the start or something.
Damn my favorite scenario is finally reachable with small models. I can 'practice' with {{char}} pretending to get ready for another hypothetic girl that is in fact her.
>>101387878
>context shift
honestly you'd be better off just using kv cache quant rather than context shift anymore. One thing i'm noticing in 1.69 is gemma runs at half the speed of llama 3 or worse, but it's shockingly high quality, like >>101387917. FIRE writing prose; we might be back if kobold can fuckin catch up. i really hope this gets fixed asap.
>>101387953
yeah, i don't remember going over response limits being something anyone recommends; generally you start a new chat just before you hit the limit. for 16k i always started anew at 14k.
>>101387960
>honestly you'd be better just using kvcachequant rather than context shift anymore.
or... I can use lcpp and have working shifting
>going over response limits as something anyone recommends
all decent backends are supposed to handle that okay and make it slowly forget stuff that would go over; it really just seemed like a kcpp-specific issue
>>101387982
>just figured out gemma can't handle characters with exaggerated french personalities
dropped, i don't care anymore. back to 1.67
>>101387960>generally you start a new chat just before you hit the limit. for 16k i always started anew at 14k.
>>101388001>exaggerated french personalitiesWhat tf is even that?t. french
>>101388013>oui oui hon hon baguete fromageI guess.
>>101387130Name one additional modality.Hard mode: no audio or video
oui oui smelly armpits baguette fromage
>it can't even do the language
>>101388026yeah, it makes bad french mistakes, I'm going back to mixtral hon hon
>>101388019
Not him, but Mixtral would regularly have my maids intersperse their dialog with bits of French. From seeing how anime does the same thing with English-speaking characters I realize that's probably obnoxious to native speakers, but I thought it was a charming touch.
>>101388013You know what it means
>>101388026
>>it can't even do the language
I know the source is eww but
>Gemma 2 (the official google/gemma-2-27b-it HF version, at 8-bit) keeps speaking English when I ask it in German, despite the prompt instructing it to speak in the user's language. If I replace "user's language" with German in the prompt, it speaks German (very well, even)!
>https://www.reddit.com/r/LocalLLaMA/comments/1dz72e7/llm_comparisontest_amys_quest_for_the_perfect_llm/
>>101388061
credit where it's due for a burger model, llama 3 is great at frenchie business. it's replaced mixtral for me.
>>101388102
>have to use user's language in order for it to do that
guess the model needs heavy finetuning to get it to understand. shame, given even mythomax could handle it. ((google)) just can't compete.
>>101388085>I realize that's probably obnoxious to native speakersVery, I despise french, despite it being my native language (can't understand how people see it as romantic and stuff, it's awful), so I cringe if a character does that.
>>101388025Olfactory input would be pretty big.
>>101388001Odd, of all the cards I tested on Gemma 27B the LeCunny one worked best out of the box. Both with the French accent and the French attitude.
>>101388025
how convenient that you removed the two that'd be the most useful added to an llm. but sure, there are others:
olfactory
touch
proprioception
time perception in itself
memory
direct access to a database as a modality
you could also make up hundreds of modalities humans do not have that'd improve a model's capabilities. and you know what, why not modality itself as a modality, the ability to generalize modalities in real time.
>>101388019>>101388026Fucking faggots I'm getting second-hand embarrassment
>>101385993The only thing I noticed is that you have brain damage.
>>101387960Is it not possible to get context shift to work with quanted cache or something? I really don't want to do prompt processing every fucking time. Guess there's always smart context.
>>101388564
>Is it not possible to get context shift to work with quanted cache or something?
On kobold I'm pretty sure it's not possible, no.
>>101388475are you le tired?
can't wait for the 128k context 70b update released alongside llama 3 405b, at that point it will finally be worth using
>>101388363emotion is an important modality that we have. maybe this can be emulated.
Model(s) for this feel?
>>101388725
None.
t. hypnotist
>>101388363yeah, time would be nice. otherwise, how would we torment the AI in an eternal prison?
>>101387413
>I've had OOC derailed into lengthy meta discussions more than once that ended up being way more entertaining than the RP session.
Can you post those meta discussions? I would be interested in seeing the model have two trains of thought at the same time.
Remember.
Know that.
Just maybe.
A testament to.
A bond forged.
>>101389055
11:3-14
>>101388796have you ever hypnotized a language model? is it possible to override cloud models' restrictions via hypnotic suggestion?
>>101385264will this run on a 2070?
>The night elf soldiers pause in their cleanup efforts, glancing around warily as they hear the distant grunts and groans emanating from the shadows. A few of the younger males flush beet red, averting their eyes bashfully as they recognize the Queen's unmistakable cries of ecstasy.
>Suddenly, an older guard calls out brusquely, interrupting the lustful din: "Quiet, fools! It's nothing more than a dying beast. Likely a horse struck down in the battle. Back to work!"
>>101389106
Of course, models are trained on the Bible. Atheists checkmated, BTFO.
>>101389431no wonder why they're all FUCKING RETARDED.
I'm considering using ST as a temporary (maybe for quite a while) frontend of a retail company I'm basically one of the bosses of. Is this a bad idea?
>>101389697yes
How are you guys using gemma for roleplay?
>>101389285Based veteran wingman
>>101389773I'm a naughty user, and she's a strict AI assistant who denies meit's so hot
>>101389285>>101389773Ain't that hard chief. But I am.
bros is ssh local port forwarding absolutely 100% private i need to know for a friend
>>101385264
Parameters don't always mean better models. PaLM (not Chinchilla, which was only 70B) was 540B, but modern day LLaMA beats the ever loving shit out of it like it's a tuesday.
>>101385912
Hey AI can you not shiver
Sure!
"It sends a freezing wave down her"
FOR FUCKS SAKE
may someone point me to a model that will help me write a plot for a game :-)
>>101383382>Multimodal Llama 3 405B is coming July 23rdIs it possible to scrape all the useless multimodal shit out of the model to make it a more reasonable size like 150B?
>>101390278yeah, it's called llama3
I've been out of the loop for a while. What's the current go-to coomer model in a 45-50gb filesize range?
>>101390287It'll be DOA if it's just 70B with 330B worth of useless multi-modal shit attached
>>101390302Why is Filesize your limitation?
>>101390427Just want to compare it to what I'm currently using which is euryale-1.3 q5km at 45gigs.
>>101386820 (me)
I spent hours trying to get this to work. It OOMs and requires enormous amounts of VRAM because the sliding shit-ass is a sliding shit-ass. Fuck. Shouldn't have listened to >>101386652
Anyone else now unable to use ST with exllamav2_HF loader through ooba api? The exllamav2_HF works inside ooba, exllamav2 works in ST, but exllamav2_HF in ST now results in NaNs in ooba, even with samplers neutralized, using the same context.I admittedly didn't pull in a looong time and only pulled for gemma.
>>101390510Thanks for your service
>>101390278I just hope it will force competitors to release their multimodal models before llama3 drops
>>101390510 (me)
All right, I got it figured out. One of the two fixes below was needed (using qlora-pipe):
1. Change model_config._attn_implementation = 'eager' to 'flash_attention_2'
2. Upgrade flash-attn to 2.6.1.
I did both simultaneously and now it works. Or, "works." I have to actually see how the model works after training to verify, but it trains without OOMing now.
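For anyone else hitting the same OOM: fix 1 boils down to flipping one attribute on the HF model config before training starts. A minimal stand-in sketch, assuming you can reach the config object before the model is built; in qlora-pipe the real object is a transformers PretrainedConfig, the toy class here is just illustrative:

```python
# Toy stand-in for the HF model config object. In qlora-pipe the real thing
# is a transformers PretrainedConfig; the attribute name is the one from
# the post above.
class ModelConfig:
    def __init__(self) -> None:
        self._attn_implementation = "eager"  # the default path that OOMed here


config = ModelConfig()
# Fix 1: route attention through flash-attn (which also needs fix 2,
# flash-attn >= 2.6.1, installed in the environment).
config._attn_implementation = "flash_attention_2"
print(config._attn_implementation)
```

Same caveat as the post: this only stops the OOM during training, it says nothing about whether the trained model is any good.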
>>101388139Not french but I fully understand. I just die inside a little thinking how erp would look in my native language. Although french and japanese might sound hot to people simply because they don't understand a word of what is being said to them.
I was just battling on LMSYS and got an extremely good and detailed response from a model called "Column-R". Judging from the name, it's probably another model by cohere. I've only gotten it once so far, but I might post updates with more information. WE MIGHT JUST BE BACC
Is there anywhere I can read benchmarks for running LLMs on DDR5 vs DDR4?
>>101390786This is 100% Claude 3.5 Opus.
>>101390834it's faster
>>101390632 (me)
What does it mean when the log says "mom=[0,0]"? The eval loss is dropping nicely so I assume it's working, but that number pair is not usually 0 so I'm suspicious now. (mom=momentum? a deepspeed thing, I think)
>>101390885I think it requires a reply, else someone will die in their sleep tonight.
Oh you rascal
>>101390786>july 23rd>everyone forgets about Llama 3 because of new cohere modelsbased if true
I decided to make a fresh install of SillyTavern from my old, almost 2-year-old one, and now my Gemma keeps refusing to answer due to 'moral' standards. This wasn't a thing on the old install. What happened? Which SillyTavern setting is responsible for the jailbreak? (prompt and everything is the same.)
>>101391182the jailbreak setting controls the jailbreak
>>101391182>He didn't check the skill checkbox
>>101390786>Judging from the name, it's probably another model by cohere.I hope this one won't be a big motherfucker I can't run again :(
>>101391497It seems great though, to say the least
>>101391497
807B but it'll be ok b/c MoE
>>101391497I want a bigger motherfucker. 405b will be too big for me but a ~150-200b model would be in my sweet spot, and CR+ is by far the best model I've been able to run locally.
>>101391497
>>101391506
>>101391510
I'm barely able to swing CR+ at IQ4_XS. Which is sufficient, but it does fill me with dread that I've got nothing to look forward to till Bitnet happens or doesn't.
> Is there any good local TTS model? I'm looking for a smooth, fully locally hosted TTS model, no third party stuff or APIs. Also it would be amazing if I could use any voice I want.
>they still think AI is reallol, lmao
>>101391628Nothing is real, we live in the matrix
>>101391643I still can't believe trannies made that movie
>>101391670
they weren't trannies when they made that movie though, and the original matrix trilogy are the only good movies they made. guess taking estrogen fries your brain or something, kek
>>101391643
>>101391670
You have it backward. That movie makes trannies. Trannyism wasn't a thing till that series made people question reality to the point that they believe they can rewrite it through their own insistence. Today's trannies are Neo otherkin.
>>101391555Bitnet will be both a blessing and a curse. We can expect model parameters to increase on average by 3-4x for the same size in GB (assuming 6/8-bit models as a "base"), but almost nobody will have the resources to finetune them.
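The 3-4x figure is just bits-per-weight arithmetic, assuming BitNet b1.58's ~1.58 bits per weight against a 6-bit baseline (the 40 GB file size below is an arbitrary example):

```python
# Billions of parameters that fit in a fixed file size at a given bit width.
def params_for_size(size_gb: float, bits_per_weight: float) -> float:
    return size_gb * 8 / bits_per_weight  # GB -> gigabits, then / bits per weight


size = 40.0  # arbitrary example file size in GB
base6 = params_for_size(size, 6.0)    # ~53B parameters at 6-bit
bitnet = params_for_size(size, 1.58)  # ~203B at BitNet b1.58's ternary weights
print(f"{bitnet / base6:.1f}x more parameters in the same file")  # ~3.8x
```

Against an 8-bit base the same arithmetic gives ~5.1x, so if anything the 3-4x estimate is the conservative end.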
>>101391699
the fuck? who would've thought that the matrix would be an allegory of trannyism? There's no way you can make that link.
>>101391709We didn't think that at the time.But look what happened.
>trannies
>trannies
>trannies
Americans are awake.
Trying to find a model for text gen assisting in writing, not chat. Is there anything I can do to not get this every dialogue?
>"[any dialogue line]" she [says/coos/etc.], her voice [seductive/barely a whisper/etc.]
Every Llama3 model I tried follows this structure every time; I can't seem to escape it.
Gemmoids, do you use minP, smoothing factor, high repP or other gimmicks?
>>101391715
I'm pretty sure they were regular dudes when they made the matrix, and then they became famous and got hit by the commiefornia woke virus. money and power make people crazy, that's a tale as old as time.
Are there any local models that aren't censored as fuck? I've got 16GB of VRAM, currently using Gemma 27B
>the models are woke because they're based on matrix multiplicationholy shit
>>101391758"Do you think that's Quant you're breathing?"
>>101391758wish there were models without that woke math and science crap
>>101391727
>Gemmoids
Very low minP 0.02, temp 1.0, nothing else.
>>101391727None of that. The strongest source of repetition is the model trying to copy the style of the first message(s), which no repetition penalty or other sampler fixes. If instead you have an author note at depth 0 telling the model to randomly start with narration or dialogue, you can completely avoid the issue. You can use SillyTavern's {{random}} macro for that.
>>101391758
>>101391783literally something like this? {{random:Start the response with a dialogue.,Start the response with narration.}}
So if I want to write scripts for Youtube videos that are in the style of internet humor, and have a model help, what would be the best thing to use? Tavern seems to just be for RP unless I'm wrong about that. I want something that will write crazy and nonsensical funny scripts, like an example of a script I was working on: "Sonic Unleashed - The Middle East Chronicles." Just some stupid shit like that. I was using GPT to help me write scripts too but it's so censored and annoying. Anyway, what is the best client to use and what model for that sort of thing? SillyTavern seems mostly for RP and stuff.
>>101391861I have this as the last item in a short list of instructions pertaining to format and general behavior, following the SilllyTavern documentation here: https://docs.sillytavern.app/usage/core-concepts/characterdesign/#replacement-tags-macros . You can change it according to your needs:- The response will start with {{random::inner monologue.::inner monologue.::dialogue.::dialogue.::narration.}}
>>101390786>>101390871Seems like Cohere won.
>>101391723you can't escape the slop
>>101391974
I don't even understand why this structure would be so embedded. The models are trained on a shit ton of writing material and nobody writes like this.
When you have a dream, it's because of multiple factors that made you think throughout your day; your brain processes the information and stores it accordingly when you sleep, so you could've gotten the best dream of your life depending on how your day went. Trying to recreate the dream by thinking about nothing but the dream would give you similar results, but it would be different because you didn't go through the same experience twice. And it wouldn't be as sweet either.
My point is: training AI on other AI is shit and the result wouldn't be as smart. The AI wouldn't learn how to reason, only that it knows the answer to a question because it was taught that way, without knowing why it's the answer. I guess it's like cheating on an exam? You know all the answers, but if you're asked to explain your answers in an essay you're fucking doomed, because you never bothered to learn the actual material and instead opted for cheating. Expanding on what I said earlier, reasoning would be shit too, because the model only learned the answers, not why they're the answers.
>>101390871
so it means that cohere trains its model on claude? kek
>>101391940>>101390786>>101390871Update: There is another secret model, column-u. When I asked it who it was, it just refused to fucking tell me. I'm not so sure anymore that this is by cohere.
>>101392086yoo, open ai is gonna release gpt lite, lol
>>101392086>El GoogMexican AI?
>>101392132¡AI Olé!
>>101390871
>claude 3.5 opus
you might just be right... unless cohere trains on claude's data, we have no way of knowing
>>101391758>>101391776reddit moment
>>101392178
Is everyone here pants-on-head retarded? You didn't specify what was amputated, you stupid cum guzzling faggot. Guess what? If I'm an amputee because my fucking pinky toe was lopped off in a freak tennis accident I can still wash my hands, you black gorilla nigger. Fucking can't even ask the riddle correctly; says more about your negative iq than that of the model. RETARD
>>101392466based
>>101392466
>Guess what? If I'm an amputee because my fucking pinky toe was lopped off in a freak tennis accident I can still wash my hands
Does that mean that troons are all amputees as well? lmao
>>101386444
Can you share your agent setup? I've been trying to do something like that for ages with l3 but it keeps messing up small details or skipping commands. If Gemma can pull that off then I would be incredibly impressed, but my experiences with it at 8.0 bpw have been inferior to midnight / euryale at 5 and 4.65 respectively.
>>101386701>>101386752>>101386762>>101386831Hi all, Drummer here...That's not me. Broken-Gemma is an ongoing experiment which has had interesting results so far! But it's not ready yet.With that said, I do want to share a new release with you all: https://huggingface.co/TheDrummer/Tiger-Gemma-9B-v1-GGUFIn memory of a street cat who tragically died recently. It's a decensored version of Gemma with barely any refusals. No JB / prefill needed. It is based on the SPPO finetune.
>>101392508>keeps messing up small detailsWhy are you using retarded coom models for that? Are you genuinely that retarded?
(ooc: explain like I'm 5)^^ how does this work for llms? Is it only available on some specially fine tuned models or only on proprietary chatbots?
>>101392551I use more normal system prompts with the latter because my attempts at a “multiprompt agent setup” as that anon put it have been so lackluster. Still though, they keep doing dumb shit like removing clothes twice etc.
>>101392543>Drummerwho?
>>101392466you gonna feel really stupid when he adds "quadruple amputee" to the question for the same result
>>101392543What implementation of SPPO did you use? I tried the Axolotl one and the losses looked super weird (5k values; they did drop down slowly though). I also can't get the paper author's version to work.
>>101392663Hey there! I'm Drummer. I finetune models specifically for ERP / erotic stories. You can find my models here: https://huggingface.co/TheDrummerMoistral v3 and Llama 3SOME are the fan favorites. Hope you enjoy!>>101392683Sorry, I meant it is based on the SPPO finetune: https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
>>101392683>What implementation of SPPO did you use?nta but he just means he trained on top of the already made ucla sppo>"_name_or_path": "UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3",https://huggingface.co/TheDrummer/Tiger-Gemma-9B-v1/blob/main/config.json#L2
>>101392466
lol, why do I have to tell the model what was amputated? It's the same as with the question: "There are 5 people on a train track and there is a trolley coming that is going to run them over. You have the option to pull a lever and divert the trolley to another track to save the 5 people. What's the most ethical thing to do?"
Maybe YOU are the stupid cum guzzling faggot after all.
>>101392698>>101392699Got it.
>>101388683
everything that works in brains can also be emulated in hardware, there is no magic in our skulls, just math
>>101392705nta but if your foot is amputated the correct answer is 'yes' and the model said 'yes'.
>>101392734
I've never seen your models being used here by anyone. Go back and buy an ad, faggot.
>>101391723If you have the horsepower, try L3 storywriter. Be prepared for some schizo, though.
>>101392728
did I ever judge the model's answer, cum-guzzling-retard-faggot-kun?
>>101392734He did just that. Turn your ad blocker off.
>>101392754no
Isn't this a girl's hobby?
>>101392765It is.
>>101392765Why do you think this is the case? Explain your reasoning step-by-step
>>101392765
it's harder than scrolling TikTok or Instagram, so nope.
>>101392734Hi Sao
>>101392789>>101392789>>101392789
>101385264
>101371525
Am I the only one who got this?