/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101722144 & >>101711798

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101722144

--VRAM vs base RAM for AI model performance: >>101724758 >>101724830 >>101724881 >>101725011 >>101725115 >>101725253 >>101725293 >>101725395 >>101725304 >>101725049 >>101725256
--Merging model parts with llama.cpp or kobold.cpp: >>101723086 >>101723286 >>101723425 >>101723461 >>101723732 >>101723506 >>101725458
--Measuring AI model performance: time vs flops and grading answers: >>101725583 >>101725750
--Disabling mmap solves oobabooga loading issue with large model: >>101724028 >>101724374 >>101725455 >>101725603 >>101725685 >>101725716 >>101724848
--Running 405B model on consumer hardware is possible but slow: >>101728928 >>101728969 >>101728957 >>101728974 >>101728996
--Largestral 2.65bpw model shows promise for RPing with high context: >>101725065 >>101726288
--Img2img workflow shared and troubleshooted: >>101723532 >>101723678 >>101724684 >>101724742 >>101725219 >>101725262
--Anon discusses the rumored NVIDIA GeForce RTX 5090 specs and potential upgrade value: >>101723601 >>101723647 >>101724206 >>101725255 >>101728450
--Anime character cloning into AI bot concept discussed: >>101728689 >>101728723 >>101728952
--Smoothing issue in ST, possible loader problem: >>101727395 >>101727523 >>101727629
--Nala test card found in Character Hub archives: >>101726757 >>101726810 >>101727544
--Gemma 2.7B model fails to run on 4090 GPU: >>101729309 >>101729326 >>101729332 >>101729345 >>101729410 >>101729499 >>101729476 >>101729506 >>101729552 >>101729854 >>101730417 >>101730494 >>101730520 >>101730585
--Flux can be trained using SimpleTuner fine-tuning kit: >>101724480 >>101725668
--Anons discuss using LLMs in learning and work, with mixed opinions on benefits and risks: >>101722324 >>101722604 >>101723384 >>101723872 >>101723933
--Miku (free space): >>101722488 >>101723653 >>101725929 >>101727373 >>101730207

►Recent Highlight Posts from the Previous Thread: >>101722145
Nigger
cough
AI is fake, your waifu is a stochastic parrot, and you're jerking off to matrix multiplication.
>>101732172Is mistral large v2 the best model for rp right now?
>>101732469try it and see
>>101732469
I have one PC with 3x 24GB cards and a 128GB macbook M3M so I was considering running it.
Are there any good finetunes?
I've been spoiled by Sonnet 3.5.
>>101732405
>AI is fake, your waifu is a stochastic parrot, and you're jerking off to matrix multiplication.
Lucky me. All of those are my fetishes.
>>101732405Worry not, this fad is dying fast.
>>101732405yes, and?
>>101732781kek bitter nocoiner
>>101732781I for one would love cheap large vram GPUs being dumped onto the market if the bubble bursts.
>>101732805don't look at the charts right now anon...
>>101732405still makes me cum
>>101732781
>>101726717oi whats chatbox? is that a sillytavern kind of frontend?
>>101733041time to slurp the dip
>>101733041won't need any of those in a post-scarcity society
>>101732179
TESS L3.1 70B
https://huggingface.co/migtissera/Tess-3-Llama-3.1-70B
>>101733041what's up?
What is the best Mistral Large for Vramlets with a single 4090?
>>101732405Real life women also do matrix multiplications in their brain. And they use most of their compute to cheat on you and exploit you.
>>101733174Intel shat in their pants so the other giants are getting hit in collateral damage
>>101733276Just like AI, lying and gaslighting.
>>101733275The one that fits your poverty build?
>>101733277no, I mean, what stocks are up?
>>101733298
>he will be reposting this retarded screen to the end of his life
is your ego so fragile that you keep getting triggered by a bunch of numbers?
Is there a UI/backend that can display the attention associations between tokens? Is that even something that makes sense, since there are many layers and parallel heads?
I'm really curious to know _why_ it's generating a specific token.
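Not aware of a turnkey UI, but the data such a view would plot is just the per-head weight matrix from scaled dot-product attention, and yes, there's one per layer per head, which is why visualizers make you pick. A toy numpy sketch of what a single head's weights look like (illustrative shapes and names, not any backend's API):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    # scaled dot-product attention for one head:
    # weights[i, j] = how much query token i attends to key token j
    # (a real decoder also applies a causal mask before the softmax)
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 tokens, head dim 8
K = rng.normal(size=(5, 8))
W = attention_weights(Q, K)
print(W.shape)        # one row of weights per query token
print(W.sum(axis=1))  # each row sums to 1
```

For real models, Hugging Face transformers will hand you these matrices per layer and per head if you call the model with `output_attentions=True`; the caveat is that raw attention is only a rough proxy for "why this token," since the value vectors and later layers reshuffle everything.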
Crazy to think we're so close to a 12b standard future. Flux coming out is only solidifying this theory for me, after llama 3/Nemo.
>>101733379I wish my needs were basic enough that a 12b llm was able to satisfy me.
>https://huggingface.co/Gryphe/Pantheon-RP-1.5-12b-Nemo
Okay, that's pretty cool.
Two passes of training with two datasets, with one being half instruct.
Another one for my testing list.
>>101733421That schizo tulpa shit is a very weird grift. I prefer the regular kofi encouragement methods.
>>101733504I prefer when he made MythoMax
>>101733518That too. But if he keeps going this route he could at least add Undi the bumbling belgian persona.
>>101733518mythomax is a meme, it wasn't even good compared to other L2 finetunes
>>101733298Can I be proud to post blurry JPGs with garbage quality?
>>101732811this, imagine the H100 panic sell (not going to happen)
>>101732781Good, more GPUs for me.
>>101732405>jerking off to matrix multiplication.hot
>>101733504
I think so too, but I'm interested in the model because he seems to at least have something approaching a method and an actual concrete final result in mind.
These models are small enough that I can just download and test as many of them as I want anyhow, so I'll at least give it a fair shake.
Here's hoping it's not dumb as bricks.
>>101733518
That's the mythomax guy? Huh.
>>101733298I don't like /pol, but I don't like AI reflecting DEI hypocrisy, either.
What is the best Mixtral 8x7b finetune for ERP?
>>101733518
>MythoMax
It was a merged frankenmodel that took the finetuning work of others. I never got the hype since I was able to run 65b back then.
>>101733896
https://huggingface.co/vicgalle/Merge-Mixtral-Prometheus-8x7B
https://huggingface.co/papers/2406.07188
>>101733899
>frankenmodel
no, just a regular merge. also it was the l2 era, so if you were running the "large" models of the time it'd be 70b, poser newfriend
>>101733896limarp zloss.
>>101733979If I recall there weren't many L2 70B coomtunes at that time.
undster ownage throwback
LLM cooming is in a constant state of tech demo for me. Every time I download a new generation model I have fun with it 2-3 times. I notice a marginal improvement and then I get tired of handholding it through the process of making me coom. It is all so tiresome...
>>101734035And he is still here... He must have a humiliation fetish.
grim, literally
>Sadly repetition is a typical issue with Mistral-trained models, and hard to get rid of.
There, trust the experts. Mistral confirmed repeating mess.
>>101734050What is needed is something with the basic structure of Corruption of Champions/Trials in Tainted Space/Lilith's Throne but with an LLM to RP the characters.
>>101733041
does anyone know what the duck this Indian guy is on about?
>>101733174>>101733277Wait wait wait. Isn't there at least one of those companies that doesn't use intel but they got hit too because computers?
>>101734228he's talking about using RAG with a pdf file
>>101734237ahh OK.
>>101734236investors retarded, news at 11
I know 70b+ users will seethe in response to this, but I'm honestly starting to think that OG Mixtral was peak local. I've tried multiple finetunes of both Gemma and L3, and they were all disappointing by comparison.
>>101734340
Dolphin mixtral 2.5 was local peak; it's all downhill from there.
>touch me, make me feel aliveBitch, are you undead? I hate these fucking cliches so much.
>>101734340
that's surprising, because while being fast and smart, Mixtral is incredibly boring and doesn't have a single good finetune
>>101734340Just need mixtral but trained on the new dataset (what they did with nemo)
>>101734340I believe you because if mixtral is peak then it is absolutely over. And it is absolutely over.
>>101734408That's why I'm downloading Limarp-ZLOSS. Dolphin is both smart and compliant, but it doesn't have a sufficiently perverted vocabulary to be a good coombot. A Mixtral finetune with Dolphin's compliance and MLewd/Noromaid's perversion would be sublime.
>>101734340
>>101734379
>>101734430
>>101734451
totally organic, and the mention of an ((undisloppa)) model towards the end of that last post doesn't give it away immediately
>>101734389Next time a bot says that just random pull out exorcism items and say "wtf I thought you were already alive, undead bitch" and exorcise that fool.
>>101734476medications reminder sir
>>101734451Where's mixtral magnum btw? MistralAI even released the official finetune guide seeing how people struggled with it
>>101734495and i don't need to remind you to take yours :)
>>101733174Nobody answered correctly. The reason is Japan
>>101734524based nips crashing the economy with no survivors
>-10% on overpriced tech stocks>crash
Guess I can try llama 3 8.1 now.
>>101734570
10% is the max they're allowed to drop
this is literally 'not great not terrible' territory
>>101734476
The only thing that really bugs me about these one or two schizos who develop seething obsessions with specific people is that you can tell how morally justified they think they are, instead of realising that it's actually because they have no other reason to exist.
>>101734575
>llama 3 8.1 now.
Was I a heavily quantized LLM all along? How dire.
Bought an ultimate gaming rig, and what models can I run?
- 24" Viewsonic flatscreen monitor
- Dell gaming mouse
- Windows 11
- Intel 14600KF gaming processor
- Memory with 64 gigabytes of DDR5 ram power
- One TERABYTE Solid State Drive
- BluRay player
- Seasonic power supply with 650 wattage
- Sound Blaster sound card
- GeForce RTX 3060 with 12 GB of DDR5 vram power
>>101734635>only 24"ngmi, you need bigger screen to run bigger models
>>101734635I would say lol but you are just baiting.
>>101734635https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>101734502Wasn't the official guide "tune it several times and test the results to see which one randomly came out best"? It turned out Mixtral had no secret to getting it right every time. It's highly RNG. That's (a) time consuming and (b) it completely rules out ESL shitters who excrete downgrade-tunes they don't and can't personally test.
>>101734707Based, finally ERP on my phone
>>101732781>1.2 Trillion wiped out in 12 hours
Other than ERPing, what do you use bots for?
Now that the stock market crashed and VC money runs dry, what happens to LLM development?
>go back into my chat logs to look at some kino i missed
>hmm this is a nice log from march 25th..
>hunt down the model i used
>Cerebrum-1.0 8x7b
>(((4096 context)))
how the fuck?
https://www.youtube.com/watch?v=EpRRwgyeBakThoughts?
>>101734476
>>101734599
I honestly have to half take this back, I respect if people still use mixtral, I just don't respect undisloppers.
Looking at this log >>101734796 and comparing my recent ones, good fucking lord, the new models are completely unusable by comparison because they hit repetition traps relatively early into the context.
That mixtral wasn't even trained for anything over 4096 and my log was almost 10k tokens exactly, no repetition, perfectly in character.
Fuck, I feel like shit right now.
>>101734635Those guys are funny though
>>101734788If I had as much compute as OpenAI, I've often thought that I'd use it to host a card modelled on Kevin Flynn from TRON, give him the complete source code of Blender, UE5, and whatever else he might need via RAG, and then tell him to invent the Grid IRL; or at least succeed with the Metaverse where DataZuck failed.
What copypasta jailbreaks Qwen2 72B Instruct?
>>101734796
>8x7b
>4096 context
uh no? all mixtrals are 32k...
> "max_position_embeddings": 32768,
https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b/blob/main/config.json
>>101734839>I honestly have to half take this back, I respect if people still use mixtral, I just don't respect undisloppers.Then you're still mentally ill. MLewd was awesome, and I will ignore anyone who says otherwise.
>>101734881You don't need jailbreaks for local models..
>>101734903>You don't need jailbreaks for local models..
>>101734895go bak petrus
>>101734903God if only
>>101734913We both know that you lie awake at night, dreaming about sucking my dick.
>>101734890
my bad, im going a bit retarded having to read some other thread about local models + undi poster popped a few of my working brain cells
yes, mixtral was 32k, but i faintly remember something about it not doing well up to that context, i always used it at 16k. Could be Mandela effect memory.
>>101734943
>yes, mixtral was 32k, but i faintly remember something about it not doing well up to that context, i always used it at 16k. Could be mandella effect memory.
Claimed 32k, actual 32k, yeah you're tripping.
https://github.com/hsiehjackson/RULER
It's time to admit that the smarter a model is, the less sovl it has. Smart models tend to associate input with certain concepts they have containerized. Oh, this personality is trope X. This scene is cliche Y. Let's draw inspiration from the most uninspired slop associated with it in my pretraining dataset.
Dumb models were like "Ummm I'm not sure what to make of that, let's see: *outputs a token sequence of its schizo interpretation that ranges from underwhelming to absolute kino*"
So what's the consensus? Retvrn to mixtral?
>>101734707Is this actually good enough for roleplaying?
>>101734982If you believe in Petrus sure, though he also shills a 4k context l2 merge, so there's that.
>>101734985It's 2B. What do you think?
>>101734976
>It's time to admit that the smarter a model is the less sovl it has.
I think it's a combination of training data and intelligence. Models are more intelligent now, but I suspect that training data is only going to get worse over time, rather than better. It's the same copy degradation problem that makes genetic cloning a bad idea.
>>101734968
Knew i wasnt just going crazy/forgetting things that werent even a year ago yet.
That said, again, it holds up very impressively and can make shit up on the spot, which is part of the log that made me do a double take. Not once have i had recent models do anything remotely creative like that.
>>101734976
Starting to suspect this a tiny bit. While i don't think we'll get anywhere going back to models that are lobotomized by comparison when taking into account "intelligence", I do think when models are able to do shit in a relaxed sort of way they end up a lot better.
Sterile is almost always worse than dumb, which is why mythomax, to anyone here that isnt a newfag and retarded, is king of the earlier models.
>>101735022
this is what i'm suspecting more than his theory about intelligence. It's has to be the training data.
>>101735001I would never recommend a 2B model for either roleplay or anything else. I also used the larger GemmaSutra and found it underwhelming. Text quality was fine, but it's very, very "meh." Nothing remarkable at all.
>>101732860oh no. anyway. (anon, i started buying in 2015 and i didnt stop)
>>101734976Non pure transformer slop AI will be able to make sovl of the caliber never before seen
>>101735033
>It's has to be the training data.
Of course it is, we're filtering more and more stuff. For a truly great model you should want it to know everything it can; a true "internet archive" model would be insane at roleplay since it would have seen schizo rants, random fics of obscure series, etc. Claude clearly doesn't do (much) filtering in pre-training; they lobotomize after in safety tuning.
>>101734895Do people like reading all-orange text like that?
>>101735033>Sterile is almost always worse than dumb, which is why mythomax to anyone here that isnt a newfag and retarded is king of the earlier models.Yes and no. I don't mind a smart sterile/dry base model if the splicer Anons (Undi, NeverSleep, bartowski, ehartford, DoctorShotgun, the Nous boys etc) can create good finetunes which give it back vocabulary.
>>101735127Undi, Undi and Undi-Jr, a quanter? Dolphin gptsloppa? that's who you're putting your "thrust" in?
>>101735127>nous boysteknium's shit is undistilled gptslop of the highest order
>>101735127>can create good finetunes which give it back vocabularythey can't, which is why they're memes. Those sloppa tunes were always the worst ones and why im raving about Cerebrum right now looking back. They were always the models with the worst slopism issues, saw most of my shivers down the quivering petite frames and ministrations from them.Dumb 13bs were and still are better than anything from those faggots.
>>101734985
>>101735007
it's not bad as long as you don't use a complex prompt... it's surprisingly good for a small model
Has there been a single finetune that made you go "wow, this is impressive"? To me, it seems like every fine tune is either identical to the original model or dumber.
>>101735201guanaco 65b
>>101735201
I'll be called a shill, but Stheno, specifically 3.2, as 3.3 did make it a lot dumber, at least for the specific card I use to test these models.
That said, the model really isn't perfect. It's very one note in its tone and crazy horny, but it worked fine for what I was trying to do, at least better than other models in the same range.
Now I'm hoping Nemo and its fine tunes will perform better.
Mixtral limarp zloss wasn't wow worthy, but it was a direct upgrade over mixtral-instruct as far as I could tell.
>>101735201SPPO is black magic to me
First contact gpt
>>101735201
As a model maker who's been doing this since llama 1, improving models has become harder with each generation. It's like trying to teach a kid a new skill (llama 1 era) vs trying to teach a 95 year old senile old bastard a new skill (llama 3+).
Still trying though. My depraved dataset is niche enough that I doubt it'll be replaced completely by base models anytime soon. The gap is closing, though, which I think goes to show how much models are improving each generation.
>>101735148>>101735155https://www.youtube.com/watch?v=Z57Nqki0FuI
>>101735201Nearly every "fine tune" I use gets repetitive.
>>101735262>>760Mytes>>1.3BytesThings can only go up from here
>>101735276hi moxxie! dory 12b is trash btw
>>101735271
>As a model maker who's been doing this since llama 1, improving models has become harder with each generation. It's like trying to teach a kid a new skill (llama 1 era) vs trying to teach a 95 year old senile old bastard a new skill (llama 3+).
Is there evidence that Meta have deliberately tried to make it more difficult?
>>101735313
I don't think they have. I think it's a matter of the early era models being unsaturated, thus responsive to new data, whereas models are closer to saturation with each generation, so it's harder to teach them. Especially if you are trying to teach them something they were deliberately excluded from seeing, such as NSFW content.
>>101735313
>Is there evidence that Meta have deliberately tried to make it more difficult?
No? The models are just smarter/better by default so it's harder to improve on already better models? Also your tunes aren't making a dent in 15T tokens of L3 tess 3 405 was what 0.005% of that or something?
Fine-tuning allows the model to exploit spurious correlations, which lead to bad out-of-distribution performance.
>>101735313yes especially google models detect it
I thought finetuning was still valid since the official instructs were aligned to be cucked?
>>101735348
> The models are just smarter/better by default so it's harder to improve on already better models?
Yes.
> Also your tunes aren't making a dent in 15T tokens of L3 tess 3 405 was what 0.005% of that or something?
A dent they are making. The problem is whether the dent is:
>>101735351
>Fine-tuning allows the model to exploit spurious correlations, which lead to bad out-of-distribution performance.
I think this can be the case, but I think you can alleviate it to some degree. I always put MMLU Pro benchmarks on all my models alongside the base model they were trained against. It's not perfect but it gives you an idea what was sacrificed to make the model not shit at creative writing.
>>101735383I always use gpt2
>>101735313If it wasn't - we'd have the best model by now.
>>101735007>What do you think?I am thinking SEEEEEEEXXXXXXXXX
>>101735444>the best modelWhat would that even be like?
As model capacity increases, the risk of memorization increases.
>>101732781>useless hardwareNo one ever needs more general purpose compute.
>>101734788
>what happens to LLM development?
With the current state of the world I think that all development will grind to a halt now. We will get some nice 5-10 years of stagnation and abysmal progress because it didn't bring the cash in quickly enough. And the reality is that while LLMs could be a dead end, the whole neural net thing has a lot more potential. It just needs a different approach than predict-next-token. But good luck explaining that to retards who have money and want more money back already.
>>101732179Thanks for the recap.I sometimes miss topics even though I was in the thread all day.
tfw years of proompting and cooming to ai chats and now for the first time i'm starting to reach the 8k context limit i set
it's so joever, i'll never be able to coom to sub-2k "stories" ever again
>>101733041
well, at least we got some nice toys to play with before everything comes crumbling down
>>101733159Is it better than the normal Instruct?
>>101734084He actually does something, though. Unlike most shitters here
>>101734796
>(((4096 context)))
>how the fuck?
One of the first LLM enlightenment steps is the realization that more context isn't always better. It gives the model an opportunity to shit out more of its most common phrases, and once it does that it will pick up on them being in the text multiple times. This creates a shivertastic feedback loop. You would need to explicitly train for lack of repetition in long context training examples. And then you would need to go a step further and teach it to distinguish between repeating formatting and repeating stuff you don't want repeated. I don't think companies creating coding / assistant bots have any incentive to do that. You will never get the perfect coombot you want from any of the big companies unless they decide to make a coombot. Owari.
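For reference, the blunt mitigation backends actually ship for that feedback loop isn't training, it's a repetition penalty applied at sampling time. A minimal numpy sketch of the usual convention (divide positive logits by the penalty, multiply negative ones, as popularized by the CTRL paper and used by llama.cpp's repeat_penalty); illustrative only, not lifted from any backend:

```python
import numpy as np

def apply_repeat_penalty(logits, prev_token_ids, penalty=1.1):
    # Downweight every token id that already appeared in the context.
    # Positive logits are divided by the penalty, negative ones multiplied,
    # so "already seen" tokens always become less likely.
    out = logits.copy()
    for t in set(prev_token_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = np.array([2.0, 1.0, -1.0, 0.5])
penalized = apply_repeat_penalty(logits, prev_token_ids=[0, 2], penalty=1.5)
print(penalized)  # token 0: 2.0 -> ~1.33, token 2: -1.0 -> -1.5
```

Which is exactly why it's a band-aid: it can't tell repeated formatting apart from repeated slop either, it just punishes everything that showed up once.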
No question about it, looks like my system's gotten slower. Or it's some fuckery with ((python))/((nvidia drivers)). Even that cerebrum model is egregiously slow now; it's taking 30 minutes to even load the first 2048 ctx. I thought there had to be an explanation for why even nemo/llama3.1 were super slow regardless of the context, given those are low parameter.
I have no idea where to even begin conclusively troubleshooting this and narrowing down the problem. Could someone guide me a bit here?
>>101735677Not doing anything and not spreading placebo is better than doing something and spreading placebo.
>>101735674
Judging by what he said about his 405 tune I'd doubt it.
>Each Tess version (v1.0, v1.5, v3.0) uses a new and improved dataset. Tess-3 has 500K samples of 16K context length, distilled from Opus-3, Sonnet-3.5, Nemotron, GPT4-Turbo and DeepSeek Coder-V2. Then the samples go through filtering, sometimes manually. Just to say that it’s not the same datasets as previous models.
>It is trained with QLoRA
>This model is quite something, and very special!
>Uncensored my man. There’s no censorship or biases in my models.
>>101735705
>placebo
Erm, Mistral llama is /pol ready chud!
>>96345096
>Mistal-Llama is fully /pol ready.
>>101735201SuperCOT
Experimenting with heavy Top A and minor Typical P on Dolphin Mixtral 2.5, and moving them up in the sampler order accordingly. Min P 0.05, DynTemp 0.4-2.35, Smoothing 0.24, Mirostat 2 5 0.95.
I know some of you will still consider this very purple prose, but I haven't seen this little repetition for a very long time.
>>101735890Aren't TopA and MinP basically doing the same thing? The only difference is how the value pushes the cutoff forward.
MOGS Everything released before it:
https://huggingface.co/concedo/KobbleSmall-2B
>Training was done in under 3 hours on a single NVIDIA T4 GPU with qLora (LR 1.5e-4, rank 16, alpha 16, batch size 2, gradient acc. 4, 2048 ctx).
>>101735940
https://artefact2.github.io/llm-sampling/index.xhtml
At least according to this, they're basically all doing the same thing, just in slightly different ways. I used to think that just turning them all off was better, but as that screenshot demonstrates, using a few of them apparently can remove slop.
>>101735313
I don't think they made it intentionally more difficult per se, but they've clearly (openly, read the papers they publish) been filtering the source data more and more, purging anything even remotely "problematic" and removing any websites that contain "bad" content.
I believe that the more filtered the pretraining dataset is, the harder it is to bring out "bad" behaviour in the model, because it doesn't have it in its "memory".
Finetuning works better at reinforcing what's already there.
>>101735940
Yep.
Both remove low probability tokens in different ways.
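Concretely, both cutoffs are one thresholding step over the output distribution; the difference is just how the threshold scales with the top token's probability. A sketch of the commonly cited definitions (Min P linear in p_max, Top A quadratic); illustrative, not any backend's exact implementation:

```python
import numpy as np

def min_p_mask(probs, min_p=0.05):
    # keep tokens with p >= min_p * p_max (threshold linear in p_max)
    return probs >= min_p * probs.max()

def top_a_mask(probs, a=0.2):
    # keep tokens with p >= a * p_max**2 (threshold quadratic in p_max,
    # so the cutoff loosens faster as the distribution flattens)
    return probs >= a * probs.max() ** 2

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
print(min_p_mask(probs))  # threshold 0.05 * 0.5 = 0.025: keeps everything
print(top_a_mask(probs))  # threshold 0.2 * 0.25 = 0.05: drops the 0.03 token
```

Because Top A's threshold shrinks with the square of p_max, it relaxes faster than Min P when the model is uncertain; that's the "different ways" part.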
>>101735890sampler soup
>>101735972 (me)
What I'm observing at the moment at least, is that the way to get good results seems to be to crank temperature as high as I can before complete incoherence, (which for me means 2.35) and then very selectively use some of the filters to remove repetition and stereotyped expressions. Mind you, there are still times when I get repeated paragraphs and other weirdness, and I don't claim to know why, yet. I suspect that even though Dolphin's outright refusal text has been removed, in some cases it can use repetition as a means of soft refusal/filtering.
>>101735984But you don't want to add anything, so shut the fuck up.
>>101735943Ok concedo
>>101735943KOBO GEMMASUTRA 2B > KOBBLESHIT
>>101736018
>What I'm observing at the moment at least, is that the way to get good results seems to be to crank temperature as high as I can before complete incoherence, (which for me means 2.35) and then very selectively use some of the filters to remove repetition and stereotyped expressions. Mind you, there are still times when I get repeated paragraphs and other weirdness, and I don't claim to know why, yet. I suspect that even though Dolphin's outright refusal text has been removed, in some cases it can use repetition as a means of soft refusal/filtering.
>>99861949 >>99829692 >>99824769 >>99821409 >>99821121 >>99820928 >>99805819
Petrus rediscovers the "meta" that was used during the smooth sampling shill era.
>>101736074proof first
concedo's designed mind
>>101735943post logs.
>>101736113Discord cult
>>101736100
>>101736113
welcome to llms, where you either have a computer scientist reading papers for breakfast and being the most boring guy alive, or this... no in-between.
>>101736113I refuse to believe these are real people. No one speaks like that.
>>101735890>someone as large as anonImmersion ruined
>>101736085You're only making yourself look bad, Anon; not me. I'm not the only one who has told you that you are obsessed.
i still don't get why open source tards run samplers
when i run claude or gpt4 over proxy all it offers is topp and temperature, so thats all you need
if those other samplers were so good then openai and claude would offer them too
>>101736140k but is the post wrong tho? principle of anon and shit???
>>101736141The API does offer sampler settings so...
>>101736154topp temperature and penalty yes but none of those others that people pretend are good
>>101736141I used to think temperature was all you needed, but the people who use that alone will still complain about stereotyped phrases; "ministrations, shivers down spines," etc. Samplers can work sometimes to get rid of those, although it also seems to depend on how the card is written as well.
>>101736141They use samplers, you just don't get to control all of them. You have no idea what they're doing under the hood.
>>101736180
>although it also seems to depend on how the card is written as well.
oh no. oh no
>>101736135They speak like that because kobold is a discord cult, and in there it makes sense to obsess over "kobo", "kobo won" and whatever
>>101736135
the cult leader made AI-generated music for his Tiny release
>https://cdn-uploads.huggingface.co/production/uploads/63cd4b6d1c8a5d1d7d76a778/zjHfohCnEu2Y9CWSWgf0n.mp4
>kobbo kobbo tiny
>>101736197bro i just fuckin commented on that post to make myself laugh. I can believe there's discord cults for everything but don't lump me in with these nutcases.
>>101735485Is that AI generated? Sauce?
>>101736208>I was just pretending to be retarded
>>101736235>KOBO
>>101736235
No i wasn't pretending, i did it because i find it funny.
Now explain the kobo thing, because i am not going to be called a discordfag and take it sitting down.
>>101736207Kobble won
possibly a retarded question
but is there no sort of "hybrid chat" mode in ST (or similar tools)?
"hybrid" meaning something along the lines of:
>start normal text adventure with the AI playing my narrator """character""" (not really a character per se, just used to generate text), this is in "story view" without message bubbles, avatars or anything
>the AI generates the usual "You are [character], [description] [setting]" etc.
>after writing the intro the AI provides 4 suggestions or the option to type your own action
>performing an action might get you into dialogue with an NPC
[YOU ARE HERE]
>the ai automatically switches to "chat mode" with the NPC - this DOES NOT have to be a fully visible interface change (though it would be nice), as long as it lets you in some way determine exactly what to say for each response while in conversation with that NPC
>once the conversation has ended it switches back to "narrator" mode and continues the story, providing its 4 options as usual
>repeat for any NPC that is engaged in dialogue
is something like that possible at all? i just don't like the AI impersonating me and determining exactly what i say in dialogue with NPCs
i can already get the first half working, it's just the dialogue handling that's not ideal
maybe there's some extension that handles that or something i could add to my prompt?
>>101736266
>maybe there's some extension that handles that
Yup. It is called 2MW.
>>101736196
>>101736197
>>101736208
>>101736235
>>101736241
>>101736247
>>101736256
>>101736266
Incidentally, this conversation proves that you are a hypocrite; and it is your hypocrisy which is still able to make me angry. I try and start a conversation about samplers which is entirely relevant to /lmg, and that provokes you to retaliate, because you think I am supposedly ruining the thread. Other people talk about things that have nothing to do with local language models, and they get a complete pass from you.
It isn't about staying on topic, with you; it's about a personal vendetta.
>>101736347>It isn't about staying on topic, with you; it's about a personal vendetta.Is it victim complex hours now?
>>101736347you don't understand these threads, The only thing that really keeps them moving and not down to page 10 is humor. So poking fun at discordfags (and me getting caught in the crossfire) is pretty par for the course here.also stop arguing faggot you can just ignore bad faith arguments.
>>101736347>I try and start a conversation about samplers which is entirely relevant to /lmga discussion that's been had with every new meme sampler ever released btw>Other people talk about things that have nothing to do with local language models, and they get a complete pass from you.Yeah threads are shit, nothing i can do about it other than annoy you, sad innit mate?
>>101736347Admittedly I never visited any generals for a long time but it is incredible how many of you retards think everyone you hate is one person.
>>101736347>waaaaaah why is this not like my discord safeplace :(
>>101736399Petrus likes reddit not discord he spent twelve years there or something explains his reddit spacing and general attitude
>>101736388>a discussion that's been had with every new meme sampler ever released btwWhich is a bad thing and must immediately be stopped, but random shitposting about Discord is completely fine, and anyone who points out this inconsistency has a victim complex?OK. I think I understand now. Thanks for helping me clear this up.
>>101736423>anyone who points out this inconsistency has a victim complex?no, just you in particular>I also pissed enough people off in my own right, (mainly due to my support of Undi) that the confusion between me and Petra was somewhat deliberate. >although I know I will receive shrieks and howls in response. >Even more so if someone shits on this post.>I know that the people who hate me will most likely try and use said post as a means of getting me banned.>everyone who attacks him is mindbroken incel scum
kobo won btw
>>101736441
What did it win?
>>101736435
So you're admitting that the reason why you do it in my case is because you get a reaction from me. I'll remember that the next time you try and claim that it's because I'm supposedly ruining the thread.
Test
>>101736460
>So you're...
no, stop being schizo and putting words in people's mouths, that's unsanitary
>>101736136
What? That's grammatically correct.
>>101736471
He's saying he's not large, so his immersion is ruined by that. reading comprehension, ya know?
chameleon.cpp never
>>101736495
Good. Way too dangerous for the schizos in this thread.
>>101736486
Yep, realized afterwards. I guess I'm so scanning-for-slop-pilled that I failed to consider the most obvious immersion-ruined joke there.
>>101736512
Is that log slop?
>>101736524
I dunno, I usually RP my fetish (which has nothing to do with sex) so my library of slop is very different from the normal person's. However, it does seem to be free of a lot of language that I see frequently in most other people's logs. It's definitely purple, though. FEELS like slop, even if it's not, y'know?
>>101736552
what fetish
>>101736561
Probably the piss anon. He likes it when you hold him by the belly and tell him "you can do it" when he tries to piss himself.
>>101736561
>>101736575
Nope, stomach growling guy. I did make my own dataset and train a LoRA like the piss guy did, though.
>>101736524
>...meeting his gaze with a mixture of x and y
>(eyes shining) (with a mixture of x and y) (again)
>her voice trembled, (eyes shining) (again)
>her body trembled (nice repetition there mixtral)
also two times pussy aching in pleasure
yes
OpenAI is about to blow your mind with Active Inference... So what is it?
> First introduced in the early 2000s in a series of papers by neuroscientist and theoretical neurobiologist Karl Friston - active inference is a theory of how the brain uses statistical inference and generative world models to predict sensory inputs and guide actions to minimize prediction errors - helping explain human perception, action and learning
> Perception updates our generative world model to reduce errors in prediction while actions change our environment to align with our predictions - minimizing the probability of errors in our predictions
> It is likely that with a combination of enough compute, advancements in continuous learning / information retrieval with causal grounding and layers of active inference methods like GoT, AoT, CoV and MCTS - we may be inching closer and closer to a generative, continually learning model that operates at near-human levels of cognition
I suspect that we might start to see the emergence of "energy-based" models (EBM) that operate more dynamically and continuously learn with hot-swappable memory partitions that evolve over time.
They will intelligently route and adjust the level of compute needed for more sophisticated means of active inference based on the complexity of a given query. These models will also allow users to explicitly define how much energy / reasoning strength is needed at inference time.
I'll be posting more on how this works under the hood soon with layered graphs-of-thought (GoT) and algos like monte-carlo tree search (MCTS)...
For more on active inference check out these incredible papers:
> Friston, K. (2003). "Learning and inference in the brain."
> Friston, K. (2005). "A theory of cortical responses."
> Friston, K. (2006). "Free-energy principle for perception and action."
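The perception/action loop described above can be caricatured in a few lines: both updates descend on the same squared prediction error, a drastically simplified stand-in for Friston's free energy. This is a toy sketch, not anything OpenAI has published, and every name in it is made up for illustration.

```python
def active_inference_step(belief, world, lr=0.1):
    """One toy step: perception pulls the model toward the world,
    action pushes the world toward the model's prediction."""
    error = world - belief          # sensory prediction error
    belief = belief + lr * error    # perception: update the generative model
    world = world - lr * error      # action: change the environment to match
    return belief, world, error ** 2

belief, world = 0.0, 1.0
for _ in range(50):
    belief, world, sq_err = active_inference_step(belief, world)
# the prediction error shrinks geometrically (factor 1 - 2*lr per step),
# so belief and world converge on each other
```

Both "minimize error by updating the model" and "minimize error by acting on the world" are the same gradient step here, which is the whole gag of the free-energy framing.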
>>101736590
>but I haven't seen this little repetition for a very long time.
kek
>>101736590
How do I produce non-slop?
>>101736590
This, it's like... generically slop. I didn't really explain it properly in my feedback. It's free of the usual isms, but the structure is extremely sloppy. Lots of XYshit.
>>101736594
I can't wait for /lmg/ to turn into anons telling each other that they have given up on trying to make their bot do their fetish properly. And all the smug "skill issue - mine is doing what I want" that will follow.
>>101736594
go to bed sam
>>101736621
You explained it well enough. I was able to understand.
>>101736583
I'm honestly a little envious. I wish I had a fetish I love enough to be motivated to learn the tougher parts of this.
>>101736640
That is the main purpose of electric women - motivating you to do shit.
>>101736594
>OpenAI brings out new paradigm of AI
>First generation makes SKYNET look like a vegetable by comparison
>Model says things which aren't woke/"safe" enough
>Media hitpieces ensue
>NERF NERF NERF ALIGN ALIGN ALIGN
>Model IQ reduced to <40
>"Safety" restored
>Lather, rinse, repeat
>>101736594
nothingburger.
>"Please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please,"
What usually causes a model to do this?
>>101736594
Cool, I can finally teach models that when I'm kissing feet I expect them to not have calluses
Let's have a high quality discussion: am I the only one who finds these models predictable and repetitive? Is there a solution to this problem? Like with flux, I can generate a wide variety of images. With llama, mistral, I get the same boring replies every time.
>>101736797
different mediums and different ways of producing results; imagegen works with noise - infinite results, while language is finite here.
>>101736760
Shit model, overbaked tune... or maybe it really wants that thing... like really really really really really really really really...
tiny gemma 2 can be finetuned in google colab btw https://colab.research.google.com/drive/1FeFeM1viF6jNJDYgUXflRNFL0ue252zD?usp=sharing
how viable are used m1 macs with 32gb?
>>101736877
Is it possible that when you try to ERP with models that have virtually no vocabulary related to sex, they start producing repetition and other weird things because of that?
>>101736891
just get an a6000 or you'll never be happy
>>101736909
Big models are not much better than small models these days
>>101736797
if you were writing "anime girl with huge tits" into flux every time you would get very samey images, just like what's happening to you in llms when you write "you are an expert roleplayer acting as an anime girl with big tits. user: ahh ahh mistress. assistant:"
>>101736927
vramlet cope
>>101736900
Could be, but i've seen models just go into a loop where the only reasonable token is one it just used, and keep going forever in many contexts. I've only seen it with <= 4B models, and old ones at that. What model are you using?
>>101736974
I have 72 gb vram across 3 GPUs. I just don't know what to run. Mistral large is very good I guess, but it's not much better than gemma 27b
>>101736760
Once you write the 3rd please it can't resist the allure of repeating the pattern.
>>101736956
How do I make it less predictable then, considering I am myself predictable?
>>101736974
>le vramlet
You're only shitting this out because most people can't refute your *random-huge-model-name* hypetrain claims.
>>101736974
Buyer's remorse cope. 30b is the upper limit of what anyone would ever actually need
>>101728689
google AI has a 2M context window, couldn't you just feed it a dozen light novel volumes and then ask it to describe a certain character?
>>101737055
i have 4gb vram
>>101736927
Is this an Apple ad?
>>101736266
flip to page 5
>>101736607
2020
how to fix masochistic bots being all like:
"is that all you got coward? You're not gonna break me this easily"
after i fucking sawed her arm off or something.
model: mistral large q6
SillyTavern settings: untouched
i look in the descriptions of the cards and nowhere does it say they should behave like this wth
>>101736927
Searingly vindictive, elitist response about "VRAMlet cope" from desperately insecure 3090 rack owners incoming.
>>101736889
>2B
I don't have sex with infants, thanks.
>>101736994
I was using Mixtral Dolphin 2.5. I'm now about to use Limarp-ZLOSS for I think the first time.
>>101737192
>2B isn't popular anymore
What went wrong?
>>101737003
If I had your VRAM, I'd probably be running Goliath.
>>101737262 (me)
my name is Alpin, btw.
>>101737252
I want at least a 70B with a body of 2B.
>>101737274
Hi Alpin
man I love mistral large
>>101737252
she's the ideal choice for stoic coombots
she's still popular what do you mean
>>101737262
Hi Alpin, I did try Goliath (q4 k m). I don't think it's a particularly good model. It also has a context length of 4096.
https://github.com/ggerganov/llama.cpp/pull/8878
>ggml : make GeLU faster and more accurate on CPU
>This change makes GeLU go 8x faster on Intel, 3x faster on Apple Silicon, and 2x faster on Threadripper. It's the world's most popular activation function, crucial to models such as Whisper and Gemma. On those models, this change can have a noticeable improvement in performance. That's because GeLU is usually the most time-consuming op except for matrix multiplication
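For context, the "few ulp of accuracy" the PR trades away is the gap between exact GeLU and its vectorizable tanh approximation. A quick sketch of the math (not the actual ggml code):

```python
import math

def gelu_exact(x):
    # exact definition: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation used by most fast implementations
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

# the two agree closely over typical activation ranges
worst = max(abs(gelu_exact(x / 10) - gelu_tanh(x / 10)) for x in range(-50, 51))
```

The tanh form avoids erf entirely, which is what makes it easy to vectorize; the tiny disagreement is exactly the kind of "off by one token" change the PR warns about.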
whos the ai now, bitch?
>>101737454
Imagine a world where Jart isn't a snake.
>>101737454
lcpp boutta get jarted again, remember to disable mmap folks
>>101737184
VRAMLET COPE
>>101737526
>Tue Mar 19 15:27:50 2024
>>101737526
her unbelievably high amount of shivers and mischievous grins...
gpt holding..
>>101737503
>>101737504
I expect a crash and/or perplexity loss on most systems. I still run with --no-mmap.
>>101737535
The rack has burned down since then
>>101737565
hi petra
>>101737029
Go to other human writers and artists for inspiration, unironically. People need inspiration/input from others, or even something as creative as the brain winds up becoming repetitive and samey.
>>101737454
>Test failure appears to be unrelated. Something having to do with JSON. ./tests/test-backend-ops -b CPU perf works fine locally on my Mac Studio.
>Works on my machine + humble brag
>>101737565
>Due to the sensitive nature of activation functions, I encourage you all to evaluate its impact on model output before merging. Vectorizing GeLU required trading away a few ulp of worst case accuracy compared to libm. LLMs normally have limitless tolerance for errors, but due to the nature of tanhf() this is a case where even off by ones can cause user-visible changes in model output. It is my belief, based on my own personal experiments so far, that this code works well for llama.cpp, whisper.cpp, gemma, etc.
>This software was developed by Mozilla Ocho and ARM Limited. It first appeared in llamafile which offers you llama.cpp / whisper.cpp / stable-diffusion.cpp with the most bleeding edge performance optimizations and binary distributability.
Literally doing an ad in the PR too.
>>101737589
kek, guess what the error is:
>23: GELU(type=f32,ne_a=[7,13,19,23],v=0): [GELU] NMSE = 0.000001290 > 0.000000100 FAIL
I wonder if it's related, surely not.
>>101737454
>8x faster
OH MY GOD. 8 TIMES FASTER?!?!?!?! OH MY GOD!!!!! Activation is like 10% of token compute right?
>>101737669
Never trust jart's word, remember his previous claim:
>Your inference commands should load 100x faster
>You may be able to safely load models 2x larger
It resulted in a shitty option that anyone who wants to optimize llama.cpp disables.
>>101737680
>It resulted in a shitty option that anyone who wants to optimize llama.cpp disables.
Wait, mmap?
>>101737689
yeah, that's from jart, well kinda, and quite a few things recommend disabling it
>>101737689
>https://github.com/ggerganov/llama.cpp/pull/613
Why does ollama use the SSD so much before downloading a model?
I'm guessing it writes in the spot where it will store the model it is about to download, but why?
It is very annoying with the bigger models and it takes a very long time.
>>101737763
I assume it writes an empty file with the size of the actual model to make sure there's enough storage. May help with fragmentation, but i doubt that's the reason. I bet on the first option.
Also, lol, ollama.
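If it is preallocating, the idea looks roughly like this (a guess at what ollama might be doing, not its actual code): reserve the full file size before the download starts, so the transfer can't die on a full disk halfway through and the blocks stay mostly contiguous.

```python
import os

def preallocate(path, size_bytes):
    # Reserve the space up front; fails immediately if the disk is full.
    with open(path, "wb") as f:
        if hasattr(os, "posix_fallocate"):    # Linux and most Unixes
            os.posix_fallocate(f.fileno(), 0, size_bytes)
        else:                                 # fallback: sparse file, no real reservation
            f.truncate(size_bytes)
```

posix_fallocate actually touches the allocation on disk, which would explain the heavy SSD activity before the download even starts.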
mistral large is good, it sends a shiver down my spine
>>101733421
Facts about finetunes from the finetune itself.
So last week was disappointing, are we going to get a Cohere release this week?
>>101737763
Why are you using ollama? They basically obfuscate all files so that you stay in their ecosystem, same way their API has more features than the OAI-compatible one (all the extra samplers are only in their API).
>>101737801
Stopping everything and forcing the girl you've been sloppy topping to write a recipe for chicken wings or an entire python program is my favorite shit, it's so funny.
>>101737796
>>101737813
It just works, until it doesn't.
That problem is the reason I'm going to stop using it, or at least use it less often.
I'm just a lazy guy
>>101737841
I can't ever recommend ollama to a novice because they use terrible defaults: using the q4_0 quant is stupid, defaulting to a 2048 context window is stupid, having to rebuild a model to change parameters is stupid. Also, just importing a GGUF is annoying; it copies the entirety of the model into those obfuscated blobs, for example. Hell, they don't even respect the xdg dirs on linux and put everything in your home dir.
>>101737911
It's kind of funny. The things people are "supposed" to use like ollama and vllm are all garbage.
>>101738031
What makes vllm garbage?
>>101737911
Well, I don't understand half of what you said, so that's the reason why I use it.
I'm building llama.cpp as we speak
Is nemo the best RP model? I'm having 500-post long sessions with 0 tard wrangling and the default sys prompt, vs the usual 20/30 with other models.
>>101738031
>>101738055
vllm in comparison is quite good; at least it's not just a wrapper around another project. It has really good batching performance. The major drawbacks are that it's not made for personal usage (quants took a long time to be implemented, for example), all the performance improvement compared to just using pytorch is in batching, and it's horrible to compile.
>>101733421
>>101737803
I'm trying it and it seems pretty cool so far.
It replies to requests with comprehensive but concise responses and it seems as intelligent as the official instruct.
It did fall into a repetition trap at one point however.
Now to see if it can get in and out of roleplaying without getting retarded like Celeste 1.9 (1.6 works fine).
>>101734153
>pic
>current month
>retard doesnt know what DRY is
should be an insta ban
holy crap, does it know?
>>101734976
i really cant imagine how much of a retard you have to be to pretend to be retarded online 24/7 every day, just for attention
what a grim existence, hope the basilisk nukes the entire world
>>101738135
>doesnt know what DRY is
A reddit meme?
https://www.reddit.com/user/-p-e-w-/
>DRY author here
https://old.reddit.com/r/LocalLLaMA/comments/1ej1zrl/try_these_settings_for_llama_31_for_longer_or/lgbjtox/
>>101738169
>guilt by association into an ad hom
lowest of the low iq
>>101737807
Memes aside, they never said anything about a new model since R+. There are some mentions here and there of raising money and partnering with other companies.
Two beta features that I'm aware of came out within the past month: JSON output for the /v1/chat API, and "Prompt Tuner" in the Cohere Dashboard that lets you input a prompt with variables and define criteria, and it will iterate through variations of the prompt to find which produces the best response.
And a variation of their Rerank model that gooners also don't care about.
They seem more focused on corporate features and support than fighting for the position of the next greatest models.
>>101738169
>a reddit meme
Bro has been raw dogging new models without DRY
>>101738222
Yup, and getting no repetition regardless, insane right? Crazy to think decent models don't need band aids to work at all
*pours water on your sampling* Heh... not so dry anymore now, huh?
>>101738232
Every open source model past llama1 has terrible repetition problems
>>101738149
>i burst into laughter at my funny joke
based user
>>101738258
Works on my machine, mistrals do repeat like mad, others don't, at all.
have computers entered the gpu age where gpus are more important than cpus for the next decades?
>>101738096
I had to ask it to respond in alphabetical order in order to make it not go into a literal endless loop.
>Why is nobody taking about InternLM 2.5 20B?
>This model beats Gemma 2 27B and comes really close to Llama 3.1 70B in a bunch of benchmarks. 64.7 on MATH 0 shot is absolutely insane, 3.5 Sonnet has just 71.1. And with 8bit quants, you should be able to fit it on a 4090.
>Vibe check puts it in range of Llama 3 70B for me
>https://huggingface.co/internlm/internlm2_5-20b-chat
>https://www.reddit.com/r/LocalLLaMA/comments/1ekr75a/why_is_nobody_taking_about_internlm_25_20b/
Jart is the primary mikuposter.
>>101737333
Checked. Are any of them capable of acting like a normal person, ideally one which recalls basic facts you've mentioned?
AI friend
>>101738272
Even llama2 was repetitive. I remember the guy who made chub.ai coming here and complaining about it
>>101738273
No, because consoomers will basically be guaranteed to be CPUmaxxing for eons to come, because Nvidia has such a fucking tight grip on secondhand server GPUs now and refuses to make anything of reasonable size for less than 10k dollars. Either RAM is going to get a LOT better, or this is gonna shunt us into the era of specially made, M2-esque systems that are kind of a mix of both and have way more contact with memory.
>>101738304
>>101738258
Do you think llama 1 era models didn't repeat a ton because they were really retarded?
>>101738304
Worked for me? What can I say except cope sampling issues? You're probably causing the repetitions by using forty different samplers at once.
>>101738309
nvidia announced a 3 month delay on their b-whatever to replace the h100.
:^)
Or, sales prospects are dismal and they are collectively trying to save the Harris campaign.
>>101738135
It's not worth it, it seems to make it dumber.
I like the idea of dry, and it does work fairly well, but I've also noticed the model gets around it by doing shit like fucking up the tenses of the verbs it uses in its slop phrases so that it has an excuse to write them again, or otherwise writing increasingly-incoherent variations on the same slop instead of actually becoming more original
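For anyone wondering what it's actually doing: the core of DRY is small enough to sketch. This is a paraphrase of the published description, not the reference implementation; the parameter defaults are the commonly cited ones. A candidate token is penalized in proportion to how long a context suffix it would extend into a verbatim repeat, which is exactly why near-repeats with swapped tenses slip through.

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    # match_len = length of the longest context suffix that, followed by
    # `candidate`, already occurred verbatim earlier in the context.
    seq = context + [candidate]
    match_len = 0
    for n in range(1, len(context) + 1):
        tail = seq[-(n + 1):]                      # suffix of length n + candidate
        if any(context[i:i + len(tail)] == tail
               for i in range(len(context) - len(tail) + 1)):
            match_len = n
        else:
            break
    if match_len < allowed_length:
        return 0.0
    # exponential in match length: verbatim loops get crushed fast, but a
    # single changed token resets the match, hence the incoherent slop variants
    return multiplier * base ** (match_len - allowed_length)
```

The returned value is subtracted from the candidate's logit before sampling, so short incidental repeats (below allowed_length) cost nothing while long ones become effectively impossible.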
>>101735890
>Experimenting with heavy Top A and minor Typical P on Dolphin MIxtral 2.5, and moving them up in the sampler order accordingly. Min P 0.05, DynTemp 0.4-2.35, Smoothing 0.24, Mirostat 2 5 0.95.
Sampler soup leads to:
>>101736590
>...meeting his gaze with a mixture of x and y
>(eyes shining) (with a mixture of x and y) (again)
>her voice trembled, (eyes shining) (again)
>her body trembled (nice repetition there mixtral)
>also two times pussy aching in pleasure
Samplers are cope
>but my guess is that it is something that was useful in the early days when base models used to fall in repetition loops quite easily. Today, there is almost 0 reasons to use it. So probably it is not worth investing in it
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951389775
>Is this the base model or the instruct model? My experience with the instruct model is that it never enters repetition loops with temp 0 and all repetition penalties disabled.
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951874469
>>101738282
I'll check it out.
Thanks mr shill
>>101738343
the 0.8 multiplier as recommended doesn't fuck up either nemo 13b or largestral 2
Greetings Gentlemen. I've been away from this world for a couple of years. What is the best LLM to run on an old Asus Ryzen 7, GTX 1650 (4GB VRAM), 8GB RAM? Is there anything worth running? Should I stick to Claude/ChatGPT? I really don't like sharing personal data with these cloud services.
>>101738284
I would be a mikuposter too if I could run flux
>>101738468
Yes it does, you think it just has zero drawbacks?
>>101735890
>Mirostat 2 5 0.95.
>>98852913
>Inb4 "Mirostat disables all other samplers." Shut the fuck up, I don't care. Also shut the fuck up about accusations that I am "deliberately spreading misinformation" because I am not telling anyone that they are legally obligated to listen to me.
>>97888891
>Mirostat is known to make mixtral repeat / dumb. Also mirostat disables all those settings besides temp and rep pen.
erm.
>>101738517
seems like you have other settings that clash with it then. remove everything except minp 0.1, since no, it doesnt fuck up anything else, when it comes to creative writing of any kind that is, obviously you arent going to use it for an assistant and coding
>>101738479
what's your use case?
>>101738479
>What is the best LLM to run over an old Asus Ryzen 7, GTX 1650 (4GB VRAM), 8GB Ram?
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
Or
https://huggingface.co/concedo/KobbleSmall-2B
>>101738558
I do not have other settings. You just haven't noticed the effect.
>>101738595
just wait a few weeks, then DRY will be just like dynatemp, smooth sampling, etc
>>101738620
this is such a blatant lie I can only consider it low quality trolling
>>101738620
claude and chatgpt are themselves repetitive.
>>101738573
General idea bouncing on personal plans/topics. Hopefully reading/talking to documents. Research assistant (this is likely too much).
>>101738585
Thank you, I'll look these up.
>>101738282
>MMLU 73.5
Meh
>be mistral model
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>[/INST]
>>101738282
>Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements
stopstopstop plz no more safety
>>101738630
works on my machine
>>101738595
>You just haven't noticed the effect.
feel free to post it nigger
>>101738672
but its on clouds and stuff not your machine
>>101738680
my uncle works in OpenAI
>>101738657
>be retard
>cant post logs
>cant turn on dry
>cant run models larger than 30b above q4
every time
>>101738691
sorry to hear that anon
>>101738694
>be retard
>be retard
>be retard
>be retard
>be retard
>>101738705
>pretending to be this retarded
youre lower iq than a tranny lmao
>>101738694
>cant turn on dry
why won't people use my cope sampler reee
>>101738705
>>101738657
>>101738620
you vill not use local models goy, you VILL send us all your data instead
incredible how companies shit themselves so hard at a single FOSS general on 4chan they have to spend money on acting retarded
>>101738714
so organic, right? I see him in every thread recently
is there a local TTS model with voice cloning yet?
>>101738772
feel free to post a single log comparison where dry doesnt work for you tranny, keep dilating
>>101738753
>say that models from one particular corpo are repetitive
>HURR DURR WHY YOU HATE LOCAL, MUST BE SAM ALTMAN IN HIS OWN LIZARD FORM
mental illness
You're trying way too hard to fit in -p-e-w-
>>101738815
>no pic
didnt read troony, keep seethin
>>101732405
real women are the same
>>101738782
RVC
>>101738782
XTTS is the SOTA local TTS model as far as I know. You can completely finetune it and/or give it reference audio. But having tried to finetune a model on a game character's voice lines, the result is not great: you can recognize the character, but you can tell that it is TTS.
Does fine-tuning Llama 3.1 (with a LoRA) degrade its ability to be prompted?
Or fine-tuning models in general?
Are the models limited by ram speed?
If so, does that mean I could run a 400b model just as fast (or rather slow lol) as a 30b model if I have enough ram?
>>101738923
>or fine tuning models in general
see >>101735383
>>101738938
>If so does that mean I could run a 400b model just as fast (or rather slow lol) as a 30b model if I have enough ram?
No, it has to do far more math too
>>101737801
This is very stupid, and I love it. I'll probably try some of it.
mistral large has me getting way too into the RP and writing multiple full paragraphs after a career of being a one-sentence lazy replier
>>101738971
I see. So at a certain point they start to get limited by the CPU rather than the memory speed?
>>101739012
both really
>>101739008
I'll only get worse from here, anon
Godspeed
>>101739012
At one point or another, the entire model has to be moved from ram to cache and then to registers. The bigger the model, both in parameters and actual size, the more you have to move around. For cpu inference, ram speed is the biggest bottleneck.
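A rough back-of-envelope check of that bottleneck: every parameter has to stream through the memory bus once per generated token, so a memory-bound ceiling on speed is simply bandwidth divided by model size. The sizes and bandwidths below are illustrative assumptions, not measurements:

```python
def max_tokens_per_sec(model_size_gb, bandwidth_gb_s):
    # Each token reads the whole model once, so throughput is
    # capped at bandwidth / model size (ignoring compute and cache).
    return bandwidth_gb_s / model_size_gb

# e.g. a ~40 GB quantized 70B on dual-channel DDR5 at ~80 GB/s:
print(max_tokens_per_sec(40, 80))   # 2.0 tokens/s ceiling
```

It also answers the 400b-vs-30b question above: with the same ram, a model roughly 13x bigger on disk has a ceiling roughly 13x lower, compute aside.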
>>101738282
But how good is it at RP?
>>101739065
According to a Redditor:
>Refuses to generate content which is not appropriate for all users. But it's really good at answering enterprise resource planning questions, which is almost as hot, right? Right?
https://old.reddit.com/r/LocalLLaMA/comments/1ekr75a/why_is_nobody_taking_about_internlm_25_20b/lgmuw2h/
>>101739089
useless then. Why would I use anything but claude 3.5 for work shit.
>>101739089
>anon, why do you hate us?
>we just want to make the world a better place, so why are you angry at us
>we can't have kids access the model
>this is for the best
I hate western culture
>>101732172
>Tranime pic
Lol, fucking loser
https://youtube.com/watch?v=bO-NaEj2dQ0
>>101739165
And yes, I know the model is chinese, but it was the US puritans who started this shit
>>101738096
>>101738281
Well shit, scratch that. I was using the mistral nemo format by accident when the correct one is chatml.
It still repeated entries when I asked it to list some things, but not endlessly.
With the correct template, it's not bad! It's even proactively engaging with me (the user), asking questions, wanting to know more about what we are doing, etc.
Not bad at all.
Now to see how capable it is at switching from Game Master to NPC and back.
>>101739181
>he uses a tranime site
Lol, fucking loser
>>101739066
I like this Miku
>>101739219
Doesn't seem like tranime to me saar
https://old.reddit.com/r/LocalLLaMA/comments/1ekx1bi/going_to_commit_to_llama_and_mistral/
>Going to commit to Llama and Mistral
>submitted 17 minutes ago by migtissera
>Hey everyone, I'm thinking of only committing to Llama and Mistral models from now on. And even with those models, I'm putting the lower bound at 70B parameters. There's plenty of guys finetuning other models, but I feel like having a focus is needed right now to preserve the quality of models. What sizes and models do you usually run?
What a tragic loss for vramlets.
>>101739262
Notice how "japanese animation" was the first thing mentioned.
No need to cope like this.
Notice how it's one schizo that suddenly started screeching and calling other people trannies out of the blue in a thread about technology.
>>101739281
who?
>>101739196
Decent enough as an assistant, considering its size. Rate the build.
>>101739291
>>101739304
Samefag
>>101739313
Tess 3 405B guy
https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B
>>101739281
Lmg: a thread where discord users discuss reddit threads. A tragedy.
>>101739325
Notice how both posts start with the same word.
Notice how many faggots are in this thread
>>101739325
uh-huh
>>101739323
>upgrade from r5 5600x to r9 3900x
eh
>upgrade gpu for more vram
from 3060 12gb to 3080 10gb
eh again
>>101739403
>Nooo you can't discuss the fact a great model tuner decided to leave vramlets with undi and co.
>>101739350
>"What is inspect element?"
>>101739408
>>101739403
>a great model tuner
This is him, isn't it?
>talking
>>101739408
you are inspected element, don't call me that
Dont use textgenwebui
>>101739357
Yeah, okay, it's kinda dumb.
>>101739408
I am inspect element. Do you have something to say to me, buddy?
>>101739624
I inspected your mother's elements, pal.
>>101739574
It is a simulated woman so that's just as expected.
>>101739747
>>101739747
>>101739747
>>101738691
My dad is ceo of Nintendogs international (a subsidiary of Nintencats) and he said openai has no actual human workers. Therefore you should be banned for being a ROBOT.